2026-01-18 22:52:27 +01:00
# Treerack Syntax Definition Language

The Treerack library uses a custom grammar description language derived from EBNF (Extended Backus-Naur Form).
It allows for the concise definition of recursive descent parsers.

A syntax file consists of a series of Production Rules (definitions), terminated by semicolons.
## Production rules

A rule assigns a name to a pattern expression. Rules may include optional flags to modify the parser's behavior
or the resulting AST (Abstract Syntax Tree).

```
rule-name = expression;
rule-name:flag1:flag2 = expression;
```
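As an illustration, the following grammar matches dates such as `2026-01-18`. It is a sketch, and all rule names are hypothetical. It also uses a flag, quantifiers and sequences, which are described in the sections below.

```
// :alias: 'digit' validates input but creates no AST node of its own
digit:alias = [0-9];

year = digit{4};
month = digit{2};
day = digit{2};

// no rule is flagged :root, so 'date', the last rule, is the entry point
date = year "-" month "-" day;
```

Because `digit` is an alias, the `year`, `month` and `day` nodes would be the leaves of the resulting tree.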
## Flags

Flags are appended to the rule name, separated by colons. They control AST generation, whitespace handling, and
error propagation.

- `alias`: transparent node. The rule validates input but does not create its own node in the AST. Child
  nodes (if any) are attached to the parent of this rule.
- `ws`: global whitespace. Marks this rule as the designated whitespace handler. The parser will attempt to
  match (and discard) this rule between tokens throughout the entire syntax.
- `nows`: no whitespace. Disables automatic whitespace skipping inside this rule. Useful for defining tokens
  like string literals where spaces are significant. The `nows` flag is applied automatically to character
  sequences like `"abc"` or `[abc]+`.
- `root`: entry point. Explicitly marks the rule as the starting point of the syntax. If omitted, the last
  defined rule is implied to be the root.
- `kw`: keyword. Marks the content as a reserved keyword.
- `nokw`: no keyword. Prevents the rule from matching text that matches a defined `kw` rule. Essential for
  distinguishing identifiers from keywords (e.g., ensuring `var` is not parsed as a variable name).
- `failpass`: pass failure. If this rule fails to parse, the error is reported as a failure of the parent rule,
  not this specific rule.
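The following sketch combines several of these flags in a grammar for comma-separated lists of lowercase words, such as `foo, bar2, baz`. The grammar and its rule names are hypothetical. To keep it short, only the space character counts as whitespace.

```
// :root: 'list' is the entry point even though it is not the last rule
list:root = item ("," item)*;

// :alias: 'item' creates no node; its 'word' child is attached to 'list'
item:alias = word;

// :nows: no whitespace is skipped inside a word, so "ba r" is not one word
word:nows = [a-z] [a-z0-9]*;

// :ws: spaces are matched and discarded between tokens everywhere else
space:ws = " ";
```

For the input `foo, bar2, baz`, the spaces after the commas would be consumed by `space`. Because `item` is an alias, the `word` nodes would hang directly under `list`.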
## Expressions

Expressions define the structure of the text to be parsed. They are composed of terminals, sequences, choices,
and quantifiers.

## Terminals

Terminals match specific characters or strings in the input.

- `"abc"` (string): Matches an exact sequence of characters. Equivalent to `[a][b][c]`.
- `.` (any char): Matches any single character (wildcard).
- `[123]`, `[a-z]`, `[123a-z]` (class): Matches a single character from a set or range.
- `[^123]`, `[^a-z]`, `[^123a-z]` (not class): Matches any single character not in the set or range.
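The rule fragments below use each kind of terminal once. The rule names are illustrative only. The sketch assumes that a class may combine several ranges, as in `[0-9a-fA-F]`.

```
// string: the exact characters "nil"; equivalent to [n][i][l]
nil = "nil";

// class: a single hexadecimal digit
hex-digit = [0-9a-fA-F];

// any char: a quote, any single character, then a quote (e.g. 'x')
char-literal:nows = "'" . "'";

// not class: one character that is neither a comma nor a semicolon
plain-char = [^,;];
```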
## Quantifiers

Quantifiers determine how many times an item must match. They are placed immediately after the item they modify.

- `?`: optional (zero or one).
- `*`: zero or more.
- `+`: one or more.
- `{n}`: exact count. Matches exactly n times.
- `{n,}`: at least. Matches n or more times.
- `{,m}`: at most. Matches between 0 and m times.
- `{n,m}`: range. Matches between n and m times.
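The following hypothetical rules, with names chosen only for this sketch, apply each quantifier to a single item. No `ws` rule is assumed in this fragment, so whitespace skipping plays no role.

```
// ? and +: an optional leading minus sign, then one or more digits
integer = "-"? [0-9]+;

// *: a letter followed by zero or more letters or digits
name = [a-z] [a-z0-9]*;

// {n}: exactly three digits, e.g. a status code such as 404
status = [0-9]{3};

// {n,}: two or more "=" characters
separator = "="{2,};

// {,m}: zero to two digits
short-number = [0-9]{,2};

// {n,m}: one to three digits
octet = [0-9]{1,3};
```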
## Composites

Complex patterns are built by combining terminals and other rules.

### 1. Sequences

Items written consecutively are matched in order.

```
// matches "A", then "B", then "C":
my-sequence = "A" "B" "C";
```
### 2. Grouping

Parentheses `(...)` group items together, allowing quantifiers to apply to the entire group.

```
// matches "AB", "ABAB", "ABABAB"...:
my-group = ("A" "B")+;
```
### 3. Choices

The pipe `|` character represents a choice between alternatives.

The parser evaluates all provided options against the input at the current position and selects the best match
based on the following priority rules:

1. _longest match_: the option that consumes the largest number of characters takes priority. This eliminates the
   need to manually order specific matches before general ones (e.g., "integer" will always be chosen over "int" if
   the input supports it, regardless of their order in the definition).
2. _first definition wins_: if multiple options consume the exact same number of characters, the option defined
   first (left-most) in the list takes priority.

```
// longest match wins automatically: for the input "integer", the "integer" option is chosen, even though "int"
// comes first.
type = "int" | "integer";

// tie-breaker rule: if the input is "foo", both options match 3 characters. Because 'keyword' is listed first, it
// takes priority over 'identifier'. (Use :kw and :nokw to control such situations, where applicable.)
content = keyword | identifier;
```
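One way to settle the keyword/identifier tie with flags is sketched below. The rule names and the keyword set are hypothetical.

```
// :kw: "var" and "let" are reserved keywords
keyword:kw = "var" | "let";

// :nokw: 'identifier' cannot match text that a :kw rule matches, so "var" is never an identifier
identifier:nokw = [a-z]+;

content = keyword | identifier;
```

With these flags, the input `var` should be parsed as a `keyword` and the input `foo` as an `identifier`, whatever the order of the options.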
## Comments

Comments follow C-style syntax and are ignored by the definition parser.

- line comments: start with `//` and end at the newline.
- block comments: enclosed in `/* ... */`.
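Both forms can appear between rules in a syntax file, as in this illustrative fragment. The rule `bit` is hypothetical.

```
/*
   A block comment may span
   several lines.
*/
bit = [01]; // a line comment runs to the end of the line
```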
## Examples

- [JSON](examples/json.treerack)
- [Scheme](examples/scheme.treerack)
- [Treerack (itself)](../syntax.treerack)