# Treerack Manual

This manual describes the primary use cases and workflows supported by Treerack.

## Prerequisites

We assume a working installation of the standard Go tooling.

This manual relies on the treerack command-line tool. We can install it using one of the following methods.

**A. source installation (requires make):**

1. clone the repository: `git clone https://code.squareroundforest.org/arpio/treerack`
2. navigate to the source directory and run `make install`. To install it to a custom location, set the `prefix`
   environment variable, e.g. run `prefix=~/.local make install`
3. verify the installation: run `treerack version` and `man treerack`

**B. via go install:**

Alternatively, we _may be able to_ install directly using the Go toolchain:

1. run `go install code.squareroundforest.org/arpio/treerack/cmd/treerack@latest`
2. verify: `treerack help`

## Hello syntax

A basic syntax definition looks like this:

```
hello = "Hello, world!"
```

This definition matches only the exact string "Hello, world!" and nothing else. To check that the definition
itself is valid, run:

```
treerack check-syntax --syntax-string 'hello = "Hello, world!"'
```

If successful, the command exits silently with code 0. (We can append `&& echo ok` to print a confirmation on
success.)

To test the syntax against actual input content:

```
treerack check --syntax-string 'hello = "Hello, world!"' --input-string 'Hello, world!'
```

To visualize the resulting Abstract Syntax Tree (AST), use the `show` subcommand:

```
treerack show --syntax-string 'hello = "Hello, world!"' --input-string 'Hello, world!'
```

The output will be raw JSON:

```
{"name":"hello","from":0,"to":13,"text":"Hello, world!"}
```

For a more readable output, add the `--pretty` flag:

```
treerack show --pretty --syntax-string 'hello = "Hello, world!"' --input-string 'Hello, world!'
```

The output then looks like this:

```
{
  "name": "hello",
  "from": 0,
  "to": 13,
  "text": "Hello, world!"
}
```

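
The JSON output can be consumed with the Go standard library. The following is a minimal sketch that assumes only the field names visible in the output above; the `astNode` type and the `decodeNode` function are hypothetical and unrelated to any type in the generated parser. It also assumes that `from` and `to` are offsets into the input; for ASCII input like this one, byte and character offsets coincide.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// astNode mirrors the JSON objects printed by `treerack show`.
// Nodes with children omit "text", and leaves omit "nodes".
type astNode struct {
	Name  string     `json:"name"`
	From  int        `json:"from"`
	To    int        `json:"to"`
	Text  string     `json:"text"`
	Nodes []*astNode `json:"nodes"`
}

// decodeNode parses a single JSON-encoded AST node.
func decodeNode(data []byte) (*astNode, error) {
	var n astNode
	if err := json.Unmarshal(data, &n); err != nil {
		return nil, err
	}
	return &n, nil
}

func main() {
	input := "Hello, world!"
	output := []byte(`{"name":"hello","from":0,"to":13,"text":"Hello, world!"}`)

	n, err := decodeNode(output)
	if err != nil {
		panic(err)
	}

	// Assumption: from/to delimit the matched text within the input.
	fmt.Println(n.Name, input[n.From:n.To] == n.Text) // hello true
}
```
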
### Handling errors

If our syntax definition is invalid, `check-syntax` fails:

```
treerack check-syntax --syntax-string 'foo = bar'
```

The command above fails because the parser `foo` references the parser `bar`, which is not defined.

We can use `check` or `show` to detect input content that does not match the syntax. Using the hello syntax,
we can try the following:

```
treerack check --syntax-string 'hello = "Hello, world!"' --input-string 'Hi!'
```

The command reports that parsing the input failed, and that it failed in the parser `hello`.

## Basic syntax - An arithmetic calculator

In this section, we will build a basic arithmetic calculator. It will read a line from standard input, parse it
as an arithmetic expression, compute the result, and print it, effectively creating a REPL (Read-Eval-Print
Loop).

We will support addition `+`, subtraction `-`, multiplication `*`, division `/`, and grouping with parentheses `()`.

`acalc.treerack`:

```
// Define whitespace characters.
// The :ws flag marks this as the global whitespace handler.
ignore:ws = " " | [\t] | [\r] | [\n];

// Define the number format.
//
// The :nows flag ensures we do not skip whitespace *inside* the number token. We support integers, floats, and
// scientific notation (e.g., 1.5e3). Arbitrary leading zeros are disallowed to prevent confusion with octal
// literals.
num:nows = "-"? ("0" | [1-9][0-9]*) ("." [0-9]+)? ([eE] [+\-]? [0-9]+)?;

// define the supported operators:
add = "+";
sub = "-";
mul = "*";
div = "/";

// Grouping logic.
//
// Expressions can be enclosed in parentheses. This references 'expression', which is defined later,
// demonstrating recursive definitions. The :alias flag prevents 'group' from creating its own node in the AST;
// only the nodes matched inside the parentheses will appear.
group:alias = "(" expression ")";

// Operator Precedence.
//
// We group operators by precedence levels to ensure correct order of operations.
//
// Level 0 (High): Multiplication/Division
op0:alias = mul | div;

// Level 1 (Low): Addition/Subtraction
op1:alias = add | sub;

// Operands for each precedence level.
//
// operand0 can be a raw number or a grouped expression.
operand0:alias = num | group;

// operand1 can be a higher-precedence operand or a completed binary0 operation.
operand1:alias = operand0 | binary0;

// Binary Expressions.
//
// We define these hierarchically. 'binary0' handles high-precedence operations (mul/div).
binary0 = operand0 (op0 operand0)+;
binary1 = operand1 (op1 operand1)+;
binary:alias = binary0 | binary1;

// The generalized Expression.
//
// An expression is either a raw number, a group, or a binary operation.
expression:alias = num | group | binary;

// Root Definition.
//
// The final result is either a valid expression or the "exit" command. Since 'expression' is an alias, we need
// a concrete root parser to anchor the AST. Note: The :root flag is optional here because this is the last
// definition in the file.
result = expression | "exit"
```

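
To get a feel for which tokens the `num` rule accepts, it can be mirrored with a regular expression. This is a hand translation for experimenting only, not something Treerack produces; `numPattern` and `isNum` are hypothetical names. The pattern is anchored, so the whole string must be a single number token, which corresponds to the `:nows` flag forbidding whitespace inside the token.

```go
package main

import (
	"fmt"
	"regexp"
)

// numPattern is a hand-translated equivalent of the num rule:
//
//	"-"? ("0" | [1-9][0-9]*) ("." [0-9]+)? ([eE] [+\-]? [0-9]+)?
var numPattern = regexp.MustCompile(`^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$`)

// isNum reports whether s is exactly one number token.
func isNum(s string) bool {
	return numPattern.MatchString(s)
}

func main() {
	// "007" is rejected (leading zeros), as are "1." and ".5" (digits required
	// on both sides of the dot) and "1 000" (no whitespace inside the token).
	for _, s := range []string{"42", "-0.5", "1.5e3", "007", "1.", ".5", "1 000"} {
		fmt.Printf("%q -> %v\n", s, isNum(s))
	}
}
```
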
### Testing the syntax

#### 1. Simple number

```
treerack show --pretty --syntax acalc.treerack --input-string 42
```

Output:

```
{
  "name": "result",
  "from": 0,
  "to": 2,
  "nodes": [
    {
      "name": "num",
      "from": 0,
      "to": 2,
      "text": "42"
    }
  ]
}
```

#### 2. Basic operation

```
treerack show --pretty --syntax acalc.treerack --input-string "42 + 24"
```

Output:

```
{
  "name": "result",
  "from": 0,
  "to": 7,
  "nodes": [
    {
      "name": "binary1",
      "from": 0,
      "to": 7,
      "nodes": [
        {
          "name": "num",
          "from": 0,
          "to": 2,
          "text": "42"
        },
        {
          "name": "add",
          "from": 3,
          "to": 4,
          "text": "+"
        },
        {
          "name": "num",
          "from": 5,
          "to": 7,
          "text": "24"
        }
      ]
    }
  ]
}
```

#### 3. Precedence check

```
treerack show --pretty --syntax acalc.treerack --input-string "42 + 24 * 2"
```

Output:

```
{
  "name": "result",
  "from": 0,
  "to": 11,
  "nodes": [
    {
      "name": "binary1",
      "from": 0,
      "to": 11,
      "nodes": [
        {
          "name": "num",
          "from": 0,
          "to": 2,
          "text": "42"
        },
        {
          "name": "add",
          "from": 3,
          "to": 4,
          "text": "+"
        },
        {
          "name": "binary0",
          "from": 5,
          "to": 11,
          "nodes": [
            {
              "name": "num",
              "from": 5,
              "to": 7,
              "text": "24"
            },
            {
              "name": "mul",
              "from": 8,
              "to": 9,
              "text": "*"
            },
            {
              "name": "num",
              "from": 10,
              "to": 11,
              "text": "2"
            }
          ]
        }
      ]
    }
  ]
}
```

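
To make the chosen precedence visible at a glance, the tree above can be rendered back into an infix expression with explicit parentheses. This is a minimal sketch that assumes only the output shape shown above (`name`, `text`, `nodes`); `astNode`, `parenthesize`, `render` and `precedenceAST` are hypothetical names, not part of Treerack.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// astNode mirrors the JSON printed by `treerack show`; "from" and "to" are ignored here.
type astNode struct {
	Name  string     `json:"name"`
	Text  string     `json:"text"`
	Nodes []*astNode `json:"nodes"`
}

// precedenceAST is the precedence check output above, in compact form.
const precedenceAST = `{"name":"result","from":0,"to":11,"nodes":[{"name":"binary1","from":0,"to":11,"nodes":[` +
	`{"name":"num","from":0,"to":2,"text":"42"},{"name":"add","from":3,"to":4,"text":"+"},` +
	`{"name":"binary0","from":5,"to":11,"nodes":[{"name":"num","from":5,"to":7,"text":"24"},` +
	`{"name":"mul","from":8,"to":9,"text":"*"},{"name":"num","from":10,"to":11,"text":"2"}]}]}]}`

// parenthesize renders a node as infix text. Leaves contribute their text, single-child
// wrappers (such as the root) are transparent, and every other node is wrapped in parentheses.
func parenthesize(n *astNode) string {
	switch len(n.Nodes) {
	case 0:
		return n.Text
	case 1:
		return parenthesize(n.Nodes[0])
	}
	parts := make([]string, 0, len(n.Nodes))
	for _, child := range n.Nodes {
		parts = append(parts, parenthesize(child))
	}
	return "(" + strings.Join(parts, " ") + ")"
}

// render decodes a JSON AST and returns its parenthesized form.
func render(data []byte) (string, error) {
	var root astNode
	if err := json.Unmarshal(data, &root); err != nil {
		return "", err
	}
	return parenthesize(&root), nil
}

func main() {
	s, err := render([]byte(precedenceAST))
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // (42 + (24 * 2))
}
```
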
#### 4. Grouping override

```
treerack show --pretty --syntax acalc.treerack --input-string "(42 + 24) * 2"
```

Notice that no `group` node appears, because `group` is an alias, and that the addition (`binary1`) is now an
operand of the multiplication (`binary0`):

```
{
  "name": "result",
  "from": 0,
  "to": 13,
  "nodes": [
    {
      "name": "binary0",
      "from": 0,
      "to": 13,
      "nodes": [
        {
          "name": "binary1",
          "from": 1,
          "to": 8,
          "nodes": [
            {
              "name": "num",
              "from": 1,
              "to": 3,
              "text": "42"
            },
            {
              "name": "add",
              "from": 4,
              "to": 5,
              "text": "+"
            },
            {
              "name": "num",
              "from": 6,
              "to": 8,
              "text": "24"
            }
          ]
        },
        {
          "name": "mul",
          "from": 10,
          "to": 11,
          "text": "*"
        },
        {
          "name": "num",
          "from": 12,
          "to": 13,
          "text": "2"
        }
      ]
    }
  ]
}
```

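
Before generating any code, the tree above can already be evaluated from the JSON printed by `treerack show`, which is handy for prototyping. This is a minimal sketch that assumes only the output shape shown above; `astNode`, `evalNode`, `evalJSON` and `groupingAST` are hypothetical names. Binary nodes alternate operand, operator, operand, and so on, and are folded left to right.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// astNode mirrors the JSON printed by `treerack show`; "from" and "to" are ignored here.
type astNode struct {
	Name  string     `json:"name"`
	Text  string     `json:"text"`
	Nodes []*astNode `json:"nodes"`
}

// groupingAST is the grouping override output above, in compact form.
const groupingAST = `{"name":"result","from":0,"to":13,"nodes":[{"name":"binary0","from":0,"to":13,"nodes":[` +
	`{"name":"binary1","from":1,"to":8,"nodes":[{"name":"num","from":1,"to":3,"text":"42"},` +
	`{"name":"add","from":4,"to":5,"text":"+"},{"name":"num","from":6,"to":8,"text":"24"}]},` +
	`{"name":"mul","from":10,"to":11,"text":"*"},{"name":"num","from":12,"to":13,"text":"2"}]}]}`

// evalNode evaluates a decoded AST node.
func evalNode(n *astNode) (float64, error) {
	if n.Name == "num" {
		// the token text was already validated by the num rule
		return strconv.ParseFloat(n.Text, 64)
	}
	if len(n.Nodes) == 1 { // e.g. the root "result" node
		return evalNode(n.Nodes[0])
	}
	if len(n.Nodes)%2 == 0 { // a binary node needs operand (operator operand)+
		return 0, fmt.Errorf("unexpected node %q", n.Name)
	}

	value, err := evalNode(n.Nodes[0])
	if err != nil {
		return 0, err
	}
	for i := 1; i+1 < len(n.Nodes); i += 2 {
		operand, err := evalNode(n.Nodes[i+1])
		if err != nil {
			return 0, err
		}
		switch op := n.Nodes[i].Name; op {
		case "add":
			value += operand
		case "sub":
			value -= operand
		case "mul":
			value *= operand
		case "div":
			value /= operand
		default:
			return 0, fmt.Errorf("unknown operator %q", op)
		}
	}
	return value, nil
}

// evalJSON decodes a JSON AST and evaluates it.
func evalJSON(data []byte) (float64, error) {
	var root astNode
	if err := json.Unmarshal(data, &root); err != nil {
		return 0, err
	}
	return evalNode(&root)
}

func main() {
	v, err := evalJSON([]byte(groupingAST))
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // 132
}
```
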
## Generator - Implementing the calculator

We will now generate the Go parser code and integrate it into a CLI application.

Initialize the project:

```
go mod init acalc && go mod tidy
```

Generate the parser:

```
treerack generate --syntax acalc.treerack > parser.go
```

Implement the application logic in `main.go`:

```
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log"
	"os"
	"strings"
)

var errExit = errors.New("exit")

// repl runs the Read-Eval-Print Loop. It returns when the input is exhausted or the user enters "exit".
func repl(input io.Reader, output io.Writer) {
	// use buffered io, to be able to read the input line-by-line:
	buf := bufio.NewReader(input)

	// our REPL loop:
	for {
		// print a basic prompt:
		if _, err := output.Write([]byte("> ")); err != nil {
			log.Fatalln(err)
		}

		// read the input and handle the errors:
		expr, err := read(buf)

		// Handle EOF (Ctrl+D)
		if errors.Is(err, io.EOF) {
			output.Write([]byte{'\n'})
			return
		}

		// Handle explicit exit command
		if errors.Is(err, errExit) {
			return
		}

		// Handle parser errors (allow user to retry)
		var perr *parseError
		if errors.As(err, &perr) {
			log.Println(err)
			continue
		}

		if err != nil {
			log.Fatalln(err)
		}

		// Evaluate and print
		result := eval(expr)
		if err := print(output, result); err != nil {
			log.Fatalln(err)
		}
	}
}

func read(input *bufio.Reader) (*node, error) {
	line, err := input.ReadString('\n')

	// ReadString returns io.EOF together with the data when the last line has no trailing newline. We still
	// evaluate that line; the next call then returns an empty line and io.EOF:
	if err != nil && (!errors.Is(err, io.EOF) || line == "") {
		return nil, err
	}

	// Parse the line using the generated parser
	expr, err := parse(bytes.NewBufferString(line))
	if err != nil {
		return nil, err
	}

	if strings.TrimSpace(expr.Text()) == "exit" {
		return nil, errExit
	}

	// Based on our syntax, the root node always has exactly one child:
	// either a number or a binary operation.
	return expr.Nodes[0], nil
}

// eval always returns the calculated result as a float64:
func eval(expr *node) float64 {
	var value float64
	switch expr.Name {
	case "num":
		// the number format in our syntax is based on the JSON spec, so we can piggy-back on it for the number
		// parsing. In a real application, we would need to handle the errors here anyway, even if our parser
		// already validated the input:
		json.Unmarshal([]byte(expr.Text()), &value)
		return value
	default:
		// Handle binary expressions (recursively)
		// Format: Operand [Operator Operand]...
		// We iterate over a local slice, so the AST itself is not modified.
		nodes := expr.Nodes
		value, nodes = eval(nodes[0]), nodes[1:]
		for len(nodes) > 0 {
			var (
				operator string
				operand  float64
			)

			operator, operand, nodes = nodes[0].Name, eval(nodes[1]), nodes[2:]
			switch operator {
			case "add":
				value += operand
			case "sub":
				value -= operand
			case "mul":
				value *= operand
			case "div":
				value /= operand // float64 division by zero yields ±Inf, or NaN for 0/0, instead of panicking
			}
		}
	}

	return value
}

func print(output io.Writer, result float64) error {
	_, err := fmt.Fprintln(output, result)
	return err
}

func main() {
	// for testability, we define the REPL loop in a separate function so that the test code can call it with
	// in-memory buffers as input and output. Our main function calls it with the stdio handles:
	repl(os.Stdin, os.Stdout)
}
```

### Running the calculator

Our arithmetic calculator is now ready. We can run it with `go run .`. An example session may look like this:

```
$ go run .
> (42 + 24) * 2
132
> 42 + 24 * 2
90
> 1 + 2 + 3
6
> exit
```

We can find the source files for this example here: [./examples/acalc](./examples/acalc).

## Important Note: Unescaping

Treerack does not automatically handle escape sequences (e.g., converting `\n` to a literal newline). If our
syntax supports escaped characters, as is common in string literals, the user code is responsible for
"unescaping" the raw text of the AST node.

This is analogous to how we needed to parse the numbers in the calculator example to convert the string
representation of a number into a Go `float64`.

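
As an illustration, assume a hypothetical string-literal rule whose escapes follow Go's rules (for example `\n`, `\t`, `\"`, `\u00e9`) and whose node text still includes the surrounding double quotes. Under that assumption, the standard library's `strconv.Unquote` can do the unescaping; a syntax with different escape rules needs its own decoder. `unescapeString` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strconv"
)

// unescapeString converts the raw text of a (hypothetical) string-literal node, including its
// surrounding double quotes, into the string it denotes. It assumes Go escape rules and
// returns strconv.ErrSyntax for malformed input.
func unescapeString(raw string) (string, error) {
	return strconv.Unquote(raw)
}

func main() {
	// raw node text as it appears in the input: quotes and backslashes included
	raw := `"line one\nline two\t(tab) \u00e9"`

	s, err := unescapeString(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```
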
## Programmatically loading syntaxes

While generating static code via `treerack generate` is the recommended approach, we can also load definitions
dynamically at runtime.

```
package parser

import (
	"io"
	"code.squareroundforest.org/arpio/treerack"
)

// initAndParse reads a syntax definition, initializes it, and parses content with it.
func initAndParse(syntax, content io.Reader) (*treerack.Node, error) {
	s := &treerack.Syntax{}
	if err := s.ReadSyntax(syntax); err != nil {
		return nil, err
	}

	if err := s.Init(); err != nil {
		return nil, err
	}

	return s.Parse(content)
}
```

Caution: Be mindful of security implications when loading syntax definitions from untrusted sources.

## Programmatically defining syntaxes

In rare cases where a syntax must be constructed computationally, we can define rules via the Go API:

```
package parser

import (
	"io"
	"code.squareroundforest.org/arpio/treerack"
)

func initAndParse(content io.Reader) (*treerack.Node, error) {
	s := &treerack.Syntax{}

	// whitespace:
	s.Class("whitespace-chars", treerack.Alias, false, []rune{' ', '\t', '\r', '\n'}, nil)
	s.Choice("whitespace", treerack.Whitespace, "whitespace-chars")

	// digits, given as a single character range from '0' to '9':
	s.Class("digit", treerack.Alias, false, nil, [][]rune{{'0', '9'}})

	// a number is one or more digits, without whitespace between them:
	s.Sequence("number", treerack.NoWhitespace, treerack.SequenceItem{Name: "digit", Min: 1})
	s.Class("operator", treerack.None, false, []rune{'+', '-'}, nil)

	// the root: number, operator, number
	s.Sequence(
		"expression",
		treerack.Root,
		treerack.SequenceItem{Name: "number"},
		treerack.SequenceItem{Name: "operator"},
		treerack.SequenceItem{Name: "number"},
	)

	if err := s.Init(); err != nil {
		return nil, err
	}

	return s.Parse(content)
}
```

## Summary

We have demonstrated how to use the Treerack tool to define, test, and implement a parser. We recommend the
following workflow:

1. draft: define a syntax in a `.treerack` file.
2. verify: use `treerack check` and `treerack show` to validate building blocks incrementally.
3. generate: use `treerack generate` to create embeddable Go code.

**Links:**

- the detailed documentation of the treerack definition language: [./syntax.md](./syntax.md)
- treerack command help: [../cmd/treerack/readme.md](../cmd/treerack/readme.md) or, if the command is installed,
  `man treerack`, or `path/to/treerack help`
- the arithmetic calculator example: [./examples/acalc](./examples/acalc)
- additional examples: [./examples](./examples)

Happy parsing!