# Treerack Syntax Definition Language

The Treerack library uses a custom grammar description language derived from EBNF (Extended Backus-Naur Form). It allows for the concise definition of recursive descent parsers.

A syntax file consists of a series of Production Rules (definitions), each terminated by a semicolon.

## Production Rules

A rule assigns a name to a pattern expression. Rules may include optional flags to modify the parser's behavior or the resulting AST (Abstract Syntax Tree).

```
RuleName = Expression;
RuleName:flag1:flag2 = Expression;
```

## Flags

Flags are appended to the rule name, separated by colons. They control AST generation, whitespace handling, and error propagation.

- `alias`: Transparent Node. The rule validates input but does not create its own node in the AST. Child nodes (if any) are attached to the parent of this rule.
- `ws`: Global Whitespace. Marks this rule as the designated whitespace handler. The parser will attempt to match (and discard) this rule between tokens throughout the entire syntax.
- `nows`: No Whitespace. Disables automatic whitespace skipping inside this rule. Useful for defining tokens like string literals where spaces are significant.
- `root`: Entry Point. Explicitly marks the rule as the starting point of the syntax. If omitted, the last defined rule is implied to be the root.
- `kw`: Keyword. Marks the content as a reserved keyword.
- `nokw`: No Keyword. Prevents the rule from matching text that matches a defined `kw` rule. Essential for distinguishing identifiers from keywords (e.g., ensuring `var` is not parsed as a variable name).
- `failpass`: Pass Failure. If this rule fails to parse, the error is reported as a failure of the parent rule, not this specific rule.

## Expressions

Expressions define the structure of the text to be parsed. They are composed of terminals, sequences, choices, and quantifiers.

## Terminals

Terminals match specific characters or strings in the input.

- `"abc"` (string): Matches an exact sequence of characters.
- `.` (any char): Matches any single character (wildcard).
- `[123]`, `[a-z]`, `[123a-z]` (class): Matches a single character from a set or range.
- `[^123]`, `[^a-z]`, `[^123a-z]` (not class): Matches any single character not in the set.

## Quantifiers

Quantifiers determine how many times an item must match. They are placed immediately after the item they modify.

- `?`: Optional (zero or one).
- `*`: Zero or more.
- `+`: One or more.
- `{n}`: Exact count. Matches exactly n times.
- `{n,}`: At least. Matches n or more times.
- `{,m}`: At most. Matches between 0 and m times.
- `{n,m}`: Range. Matches between n and m times.

## Composites

Complex patterns are built by combining terminals and other rules.

### 1. Sequences

Items written consecutively are matched in order.

```
// Matches "A", then "B", then "C"
MySequence = "A" "B" "C";
```

### 2. Grouping

Parentheses `(...)` group items together, allowing quantifiers to apply to the entire group.

```
// Matches "AB", "ABAB", "ABABAB"...
MyGroup = ("A" "B")+;
```

### 3. Choices

The pipe `|` character represents a choice between alternatives.

The parser evaluates all provided options against the input at the current position and selects the best match based on the following priority rules:

1. _Longest Match_: The option that consumes the largest number of characters takes priority. This eliminates the need to manually order specific matches before general ones (e.g., "integer" will always be chosen over "int" if the input supports it, regardless of their order in the definition).
2. _First Definition Wins_: If multiple options consume the exact same number of characters, the option defined first (left-most) in the list takes priority.

```
// Longest match wins automatically:
// Input "integer" is matched in full by the "integer" option,
// even though "int" comes first.
type = "int" | "integer";

// Tie-breaker rule:
// If input is "foo", both options match 3 characters.
// Because 'keyword' is defined first, it takes priority over 'identifier'.
// (Use :kw and :nokw to control such situations where applicable.)
content = keyword | identifier;
```

## Comments

Comments follow C-style syntax and are ignored by the definition parser.

- Line comments: Start with `//` and end at the newline.
- Block comments: Enclosed in `/* ... */`.

## Examples

- [JSON](examples/json.treerack)
- [Scheme](examples/scheme.treerack)
- [Treerack (itself)](../syntax.treerack)
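
Alongside the full examples linked above, the following is a minimal sketch of how flags, quantifiers, grouping, and choices might combine in one small syntax. The rule names (`number`, `string`, `value`, `list`, `document`) are hypothetical and chosen only for illustration, and the snippet has not been checked against the library. It uses only constructs described in this document.

```
// Hypothetical syntax for nested lists of numbers and quoted strings,
// e.g. [1, 'two', [3]]

// Skipped between tokens everywhere except inside :nows rules.
// Only the space character is handled here, to keep the sketch small.
ws:ws = " ";

// Tokens: no whitespace skipping between their characters.
number:nows = [0-9]+;
string:nows = "'" [^']* "'";

// Transparent choice: 'value' creates no AST node of its own.
value:alias = number | string | list;

// Zero or more comma-separated values between brackets.
list = "[" (value ("," value)*)? "]";

// Explicit entry point (it would also be the implied root, being last).
document:root = value;
```

Because `value` is an alias, a `list` node would hold its `number`, `string`, and nested `list` children directly, with no intermediate `value` nodes.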