Treerack Syntax Definition Language
The Treerack library uses a custom grammar description language derived from EBNF (Extended Backus-Naur Form). It allows for the concise definition of recursive descent parsers.
A syntax file consists of a series of production rules (definitions), each terminated by a semicolon.
Production rules
A rule assigns a name to a pattern expression. Rules may include optional flags to modify the parser's behavior or the resulting AST (Abstract Syntax Tree).
rule-name = expression;
rule-name:flag1:flag2 = expression;
Flags
Flags are appended to the rule name, separated by colons. They control AST generation, whitespace handling, and error propagation.
- alias: transparent node. The rule validates input but does not create its own node in the AST. Child nodes (if any) are attached to the parent of this rule.
- ws: global whitespace. Marks this rule as the designated whitespace handler. The parser will attempt to match (and discard) this rule between tokens throughout the entire syntax.
- nows: no whitespace. Disables automatic whitespace skipping inside this rule. Useful for defining tokens like string literals where spaces are significant. The nows flag is automatically applied to character sequences such as "abc" or [abc]+.
- root: entry point. Explicitly marks the rule as the starting point of the syntax. If omitted, the last defined rule is implied to be the root.
- kw: keyword. Marks the content as a reserved keyword.
- nokw: no keyword. Prevents the rule from matching text that matches a defined kw rule. Essential for distinguishing identifiers from keywords (e.g., ensuring var is not parsed as a variable name).
- failpass: pass failure. If this rule fails to parse, the error is reported as a failure of the parent rule, not this specific rule.
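To make the effect of the alias flag concrete, here is a minimal Python sketch of how an alias node's children could be lifted into its parent. It is an illustration only, not Treerack's implementation: the `Node` class, its `alias` field, and `flatten_aliases` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical AST node; "alias" stands in for a rule declared with :alias.
    name: str
    text: str = ""
    alias: bool = False
    children: List["Node"] = field(default_factory=list)


def flatten_aliases(node: Node) -> List[Node]:
    """Return the nodes that take this node's place in its parent's child list."""
    lifted: List[Node] = []
    for child in node.children:
        lifted.extend(flatten_aliases(child))
    if node.alias:
        # An alias rule creates no node of its own: its children move up.
        return lifted
    return [Node(node.name, node.text, False, lifted)]


# Models a rule like "expression:alias = number;" used inside "statement".
tree = Node("statement", children=[
    Node("expression", alias=True, children=[Node("number", "42")]),
])
result = flatten_aliases(tree)[0]
print([child.name for child in result.children])  # ['number']
```

In the resulting tree, "number" hangs directly off "statement" and no "expression" node appears.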
Expressions
Expressions define the structure of the text to be parsed. They are composed of terminals, sequences, choices, and quantifiers.
Terminals
Terminals match specific characters or strings in the input.
- "abc" (string): Matches an exact sequence of characters. Equivalent to [a][b][c].
- . (any char): Matches any single character (wildcard).
- [123], [a-z], [123a-z] (class): Matches a single character from a set or range.
- [^123], [^a-z], [^123a-z] (not class): Matches any single character not in the set.
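The class and not-class terminals can be modeled with a few lines of Python. The sketch below is not taken from Treerack: `class_matches` is a hypothetical helper, and it assumes a class body made only of single characters and ranges, with no escape sequences.

```python
def class_matches(spec: str, ch: str) -> bool:
    """Check one character against a class such as "[123a-z]" or "[^a-z]".

    Sketch only: escape sequences are not handled.
    """
    body = spec[1:-1]  # drop the surrounding brackets
    negated = body.startswith("^")
    if negated:
        body = body[1:]
    found = False
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # A range such as a-z.
            found = found or body[i] <= ch <= body[i + 2]
            i += 3
        else:
            # A single listed character.
            found = found or body[i] == ch
            i += 1
    # A negated class matches exactly when the plain class does not.
    return found != negated


print(class_matches("[123a-z]", "q"))  # True
print(class_matches("[123a-z]", "4"))  # False
print(class_matches("[^a-z]", "A"))    # True
```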
Quantifiers
Quantifiers determine how many times an item must match. They are placed immediately after the item they modify.
- ?: optional (zero or one).
- *: zero or more.
- +: one or more.
- {n}: exact count. Matches exactly n times.
- {n,}: at least. Matches n or more times.
- {,m}: at most. Matches between 0 and m times.
- {n,m}: range. Matches between n and m times.
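Every quantifier in the list above reduces to a pair of bounds on the repetition count. The following Python sketch shows that mapping. `quantifier_bounds` is a hypothetical helper written for this illustration, not part of Treerack, and `None` is used here to mean "no upper limit".

```python
import re
from typing import Optional, Tuple


def quantifier_bounds(q: str) -> Tuple[int, Optional[int]]:
    """Map a quantifier to (min, max); a max of None means unbounded."""
    simple = {"?": (0, 1), "*": (0, None), "+": (1, None)}
    if q in simple:
        return simple[q]
    m = re.fullmatch(r"\{(\d*)(,?)(\d*)\}", q)
    if m is None or (m.group(1) == "" and m.group(3) == ""):
        raise ValueError(f"not a quantifier: {q!r}")
    low, comma, high = m.groups()
    if not comma:
        # {n}: exactly n times.
        return int(low), int(low)
    # {n,}, {,m} or {n,m}: a missing side defaults to 0 or unbounded.
    return (int(low) if low else 0, int(high) if high else None)


for q in ["?", "*", "+", "{3}", "{2,}", "{,5}", "{2,5}"]:
    print(q, quantifier_bounds(q))
```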
Composites
Complex patterns are built by combining terminals and other rules.
1. Sequences
Items written consecutively are matched in order.
// matches "A", then "B", then "C":
my-sequence = "A" "B" "C";
2. Grouping
Parentheses (...) group items together, allowing quantifiers to apply to the entire group.
// matches "AB", "ABAB", "ABABAB"...:
my-group = ("A" "B")+;
3. Choices
The pipe | character represents a choice between alternatives.
The parser evaluates all provided options against the input at the current position and selects the best match based on the following priority rules:
- longest match: the option that consumes the largest number of characters takes priority. This eliminates the need to manually order specific matches before general ones (e.g., "integer" will always be chosen over "int" if the input supports it, regardless of their order in the definition).
- first definition wins: if multiple options consume the exact same number of characters, the option defined first (left-most) in the list takes priority.
// longest match wins automatically: input "integer" is matched in full by the "integer" option, even though "int" comes first.
type = "int" | "integer";
// Tie-breaker rule: if input is "foo", both options match 3 characters. Because 'keyword' is defined first, it takes
// priority over 'identifier'. (Use :kw and :nokw to control such situations where applicable.)
content = keyword | identifier;
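The two priority rules (longest match, then first definition) can be sketched in Python as follows. This is a model of the selection rule only, under stated assumptions: regular expressions stand in for the alternatives, and `choose` is a hypothetical function, not Treerack's API or its actual algorithm.

```python
import re
from typing import List, Optional, Tuple


def choose(options: List[Tuple[str, str]], text: str, pos: int = 0) -> Optional[Tuple[str, int]]:
    """Pick among (name, pattern) alternatives: longest match, then first defined."""
    best: Optional[Tuple[str, int]] = None
    for name, pattern in options:
        m = re.compile(pattern).match(text, pos)
        if m is None:
            continue
        length = m.end() - pos
        # Strict '>' keeps the earlier option when lengths are equal.
        if best is None or length > best[1]:
            best = (name, length)
    return best


type_rule = [("int", "int"), ("integer", "integer")]
content = [("keyword", "foo|bar"), ("identifier", "[a-z]+")]

print(choose(type_rule, "integer"))  # ('integer', 7)
print(choose(content, "foo"))        # ('keyword', 3)
print(choose(content, "food"))       # ('identifier', 4)
```

The second call is a tie at three characters, so the option listed first is returned. The third call is not a tie, so the longer match is returned regardless of order.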
Comments
Comments follow C-style syntax and are ignored by the definition parser.
- line comments: start with // and end at the newline.
- block comments: enclosed in /* ... */.