# Treerack Manual

This manual describes the primary use cases and workflows supported by Treerack.

## Prerequisites

We assume a working installation of the standard Go tooling.

This manual relies on the treerack command-line tool. We can install it using one of the following methods.

**A. source installation (requires make):**

1. clone the repository: `git clone https://code.squareroundforest.org/arpio/treerack`
2. navigate to the source directory and run `make install`. To install to a custom location, set the `prefix`
environment variable, e.g. `prefix=~/.local make install`
3. verify the installation: run `treerack version` and `man treerack`

**B. via go install:**

Alternatively, we _may be able to_ install directly using the Go toolchain:

1. run `go install code.squareroundforest.org/arpio/treerack/cmd/treerack@latest`
2. verify: `treerack help`

## Hello syntax

A basic syntax definition looks like this:

```
hello = "Hello, world!"
```

This definition matches only the exact string "Hello, world!" and nothing else. To test the validity of this
rule, run:

```
treerack check-syntax --syntax-string 'hello = "Hello, world!"'
```

If successful, the command exits silently with code 0. (We can append `&& echo ok` to confirm successful
execution.)

To test the syntax against actual input content:

```
treerack check --syntax-string 'hello = "Hello, world!"' --input-string 'Hello, world!'
```

To visualize the resulting Abstract Syntax Tree (AST), use the `show` subcommand:

```
treerack show --syntax-string 'hello = "Hello, world!"' --input-string 'Hello, world!'
```

The output will be raw JSON:

```
{"name":"hello","from":0,"to":13,"text":"Hello, world!"}
```

For more readable output, add the `--pretty` flag:

```
treerack show --pretty --syntax-string 'hello = "Hello, world!"' --input-string 'Hello, world!'
```

...then the output will look like this:

```
{
    "name": "hello",
    "from": 0,
    "to": 13,
    "text": "Hello, world!"
}
```
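
Because `show` emits plain JSON, other programs can consume its output directly. As a minimal sketch, the following Go program decodes the output above. The `shownNode` type and the `decodeShown` helper are hypothetical names for this illustration and are not part of Treerack; the `nodes` field is taken from the nested outputs shown later in this manual.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// shownNode mirrors the JSON fields printed by `treerack show`.
// The type name is hypothetical; only the field names come from the output.
type shownNode struct {
	Name  string       `json:"name"`
	From  int          `json:"from"`
	To    int          `json:"to"`
	Text  string       `json:"text,omitempty"`
	Nodes []*shownNode `json:"nodes,omitempty"`
}

// decodeShown parses a JSON document into a shownNode tree.
func decodeShown(doc string) (*shownNode, error) {
	var n shownNode
	if err := json.Unmarshal([]byte(doc), &n); err != nil {
		return nil, err
	}

	return &n, nil
}

func main() {
	n, err := decodeShown(`{"name":"hello","from":0,"to":13,"text":"Hello, world!"}`)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}

	// from is inclusive, to is exclusive: 13 is the length of the matched text.
	fmt.Printf("%s [%d, %d): %q\n", n.Name, n.From, n.To, n.Text)
}
```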

### Handling errors

If our syntax definition is invalid, `check-syntax` will fail:

```
treerack check-syntax --syntax-string 'foo = bar'
```

The above command will fail because the parser called `foo` references an undefined parser `bar`.

We can use `check` or `show` to detect when the input content does not match a valid syntax. Using the hello
syntax, we can try the following:

```
treerack check --syntax-string 'hello = "Hello, world!"' --input-string 'Hi!'
```

The command reports that parsing the input failed, and that it failed in the parser `hello`.

## Basic syntax - An arithmetic calculator

In this section, we will build a basic arithmetic calculator. It will read a line from standard input, parse it
as an arithmetic expression, compute the result, and print it—effectively creating a REPL (Read-Eval-Print
Loop).

We will support addition `+`, subtraction `-`, multiplication `*`, division `/`, and grouping with parentheses `()`.

acalc.treerack:

```
// Define whitespace characters.
// The :ws flag marks this as the global whitespace handler.
ignore:ws = " " | [\t] | [\r] | [\n];

// Define the number format.
//
// The :nows flag ensures we do not skip whitespace *inside* the number token. We support integers, floats, and
// scientific notation (e.g., 1.5e3). Arbitrary leading zeros are disallowed to prevent confusion with octal
// literals.
num:nows = "-"? ("0" | [1-9][0-9]*) ("." [0-9]+)? ([eE] [+\-]? [0-9]+)?;

// define the supported operators:
add = "+";
sub = "-";
mul = "*";
div = "/";

// Grouping logic.
//
// Expressions can be enclosed in parentheses. This references 'expression', which is defined later,
// demonstrating recursive definitions. The :alias flag prevents 'group' from creating its own node in the AST;
// only the child 'expression' will appear.
group:alias = "(" expression ")";

// Operator Precedence.
//
// We group operators by precedence levels to ensure correct order of operations.
//
// Level 0 (High): Multiplication/Division
op0:alias = mul | div;

// Level 1 (Low): Addition/Subtraction
op1:alias = add | sub;

// Operands for each precedence level.
//
// operand0 can be a raw number or a grouped expression.
operand0:alias = num | group;

// operand1 can be a higher-precedence operand or a completed binary0 operation.
operand1:alias = operand0 | binary0;

// Binary Expressions.
//
// We define these hierarchically. 'binary0' handles high-precedence operations (mul/div).
binary0 = operand0 (op0 operand0)+;
binary1 = operand1 (op1 operand1)+;
binary:alias = binary0 | binary1;

// The generalized Expression.
//
// An expression is either a raw number, a group, or a binary operation.
expression:alias = num | group | binary;

// Root Definition.
//
// The final result is either a valid expression or the "exit" command. Since 'expression' is an alias, we need
// a concrete root parser to anchor the AST. Note: The :root flag is optional here because this is the last
// definition in the file.
result = expression | "exit"
```

### Testing the syntax

#### 1. Simple number

```
treerack show --pretty --syntax acalc.treerack --input-string 42
```

Output:

```
{
    "name": "result",
    "from": 0,
    "to": 2,
    "nodes": [
        {
            "name": "num",
            "from": 0,
            "to": 2,
            "text": "42"
        }
    ]
}
```

#### 2. Basic operation

```
treerack show --pretty --syntax acalc.treerack --input-string "42 + 24"
```

Output:

```
{
    "name": "result",
    "from": 0,
    "to": 7,
    "nodes": [
        {
            "name": "binary1",
            "from": 0,
            "to": 7,
            "nodes": [
                {
                    "name": "num",
                    "from": 0,
                    "to": 2,
                    "text": "42"
                },
                {
                    "name": "add",
                    "from": 3,
                    "to": 4,
                    "text": "+"
                },
                {
                    "name": "num",
                    "from": 5,
                    "to": 7,
                    "text": "24"
                }
            ]
        }
    ]
}
```

#### 3. Precedence check

```
treerack show --pretty --syntax acalc.treerack --input-string "42 + 24 * 2"
```

Output:

```
{
    "name": "result",
    "from": 0,
    "to": 11,
    "nodes": [
        {
            "name": "binary1",
            "from": 0,
            "to": 11,
            "nodes": [
                {
                    "name": "num",
                    "from": 0,
                    "to": 2,
                    "text": "42"
                },
                {
                    "name": "add",
                    "from": 3,
                    "to": 4,
                    "text": "+"
                },
                {
                    "name": "binary0",
                    "from": 5,
                    "to": 11,
                    "nodes": [
                        {
                            "name": "num",
                            "from": 5,
                            "to": 7,
                            "text": "24"
                        },
                        {
                            "name": "mul",
                            "from": 8,
                            "to": 9,
                            "text": "*"
                        },
                        {
                            "name": "num",
                            "from": 10,
                            "to": 11,
                            "text": "2"
                        }
                    ]
                }
            ]
        }
    ]
}
```

#### 4. Grouping override

```
treerack show --pretty --syntax acalc.treerack --input-string "(42 + 24) * 2"
```

Notice that the `group` alias produces no node of its own, while the addition (`binary1`) now appears as an
operand of the multiplication (`binary0`):

```
{
    "name": "result",
    "from": 0,
    "to": 13,
    "nodes": [
        {
            "name": "binary0",
            "from": 0,
            "to": 13,
            "nodes": [
                {
                    "name": "binary1",
                    "from": 1,
                    "to": 8,
                    "nodes": [
                        {
                            "name": "num",
                            "from": 1,
                            "to": 3,
                            "text": "42"
                        },
                        {
                            "name": "add",
                            "from": 4,
                            "to": 5,
                            "text": "+"
                        },
                        {
                            "name": "num",
                            "from": 6,
                            "to": 8,
                            "text": "24"
                        }
                    ]
                },
                {
                    "name": "mul",
                    "from": 10,
                    "to": 11,
                    "text": "*"
                },
                {
                    "name": "num",
                    "from": 12,
                    "to": 13,
                    "text": "2"
                }
            ]
        }
    ]
}
```

## Generator - Implementing the calculator

We will now generate the Go parser code and integrate it into a CLI application.

Initialize the project:

```
go mod init acalc && go mod tidy
```

Generate the parser:

```
treerack generate --syntax acalc.treerack > parser.go
```

Implement the application logic in main.go.

main.go:

```
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log"
	"os"
	"strings"
)

var errExit = errors.New("exit")

// repl runs the Read-Eval-Print Loop.
func repl(input io.Reader, output io.Writer) {

	// use buffered io, to be able to read the input line-by-line:
	buf := bufio.NewReader(input)

	// our REPL loop:
	for {
		// print a basic prompt:
		if _, err := output.Write([]byte("> ")); err != nil {
			log.Fatalln(err)
		}

		// read the input and handle the errors:
		expr, err := read(buf)

		// Handle EOF (Ctrl+D)
		if errors.Is(err, io.EOF) {
			output.Write([]byte{'\n'})
			os.Exit(0)
		}

		// Handle explicit exit command
		if errors.Is(err, errExit) {
			os.Exit(0)
		}

		// Handle parser errors (allow user to retry)
		var perr *parseError
		if errors.As(err, &perr) {
			log.Println(err)
			continue
		}

		if err != nil {
			log.Fatalln(err)
		}

		// Evaluate and print
		result := eval(expr)
		if err := print(output, result); err != nil {
			log.Fatalln(err)
		}
	}
}

func read(input *bufio.Reader) (*node, error) {
	line, err := input.ReadString('\n')
	if err != nil {
		return nil, err
	}

	// Parse the line using the generated parser
	expr, err := parse(bytes.NewBufferString(line))
	if err != nil {
		return nil, err
	}

	if strings.TrimSpace(expr.Text()) == "exit" {
		return nil, errExit
	}

	// Based on our syntax, the root node always has exactly one child:
	// either a number or a binary operation.
	return expr.Nodes[0], nil
}

// eval always returns the calculated result as a float64:
func eval(expr *node) float64 {
	var value float64
	switch expr.Name {
	case "num":

		// the number format in our syntax is based on the JSON spec, so we can piggy-back on it for the number
		// parsing. In a real application, we would need to handle the errors here anyway, even if our parser
		// already validated the input:
		json.Unmarshal([]byte(expr.Text()), &value)
		return value
	default:

		// Handle binary expressions (recursively)
		// Format: Operand [Operator Operand]...
		value, expr.Nodes = eval(expr.Nodes[0]), expr.Nodes[1:]
		for len(expr.Nodes) > 0 {
			var (
				operator string
				operand  float64
			)

			operator, operand, expr.Nodes = expr.Nodes[0].Name, eval(expr.Nodes[1]), expr.Nodes[2:]
			switch operator {
			case "add":
				value += operand
			case "sub":
				value -= operand
			case "mul":
				value *= operand
			case "div":
				value /= operand // floating-point division by zero yields ±Inf (NaN for 0/0)
			}
		}
	}

	return value
}

func print(output io.Writer, result float64) error {
	_, err := fmt.Fprintln(output, result)
	return err
}

func main() {
	// for testability, we define the REPL loop in a separate function so that the test code can call it with
	// in-memory buffers as input and output. Our main function calls it with the stdio handles:
	repl(os.Stdin, os.Stdout)
}
```
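
The `eval` function above ignores the error returned by `json.Unmarshal`. As a sketch of the error handling that its comment alludes to, the hypothetical `parseNumber` helper below (not part of the example sources) reports conversion failures. A literal can satisfy the `num` rule and still fail to convert: `1e999` is syntactically valid, but it does not fit into a float64.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseNumber converts the raw text of a num node into a float64.
// The function name is hypothetical; it is a variant of the conversion
// in eval that returns the error instead of dropping it.
func parseNumber(text string) (float64, error) {
	var value float64
	if err := json.Unmarshal([]byte(text), &value); err != nil {
		return 0, fmt.Errorf("invalid number %q: %w", text, err)
	}

	return value, nil
}

func main() {
	for _, text := range []string{"42", "-1.5e3", "1e999"} {
		value, err := parseNumber(text)
		if err != nil {
			// 1e999 ends up here: it overflows float64.
			fmt.Println("error:", err)
			continue
		}

		fmt.Println(value)
	}
}
```

With such a helper, `eval` would need to return an error as well, so that the REPL can report the failure and continue with the next line.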

### Running the calculator

Our arithmetic calculator is now ready. We can run it via `go run .`. An example session may look like this:

```
$ go run .
> (42 + 24) * 2
132
> 42 + 24 * 2
90
> 1 + 2 + 3
6
> exit
```

We can find the source files for this example here: [./examples/acalc](./examples/acalc).

## Important Note: Unescaping

Treerack does not automatically handle escape sequences (e.g., converting `\n` to a literal newline). If our
syntax supports escaped characters, which is common in string literals, the user code is responsible for
unescaping the raw text from the AST node.

This is analogous to how we needed to parse the numbers in the calculator example to convert the string
representation of a number into a Go float64.
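
As a minimal sketch of such a step, the helper below resolves a small set of backslash escapes in the raw text of a node. The `unescape` function is hypothetical and not part of Treerack, and the supported escape set (`\n`, `\t`, `\\`, `\"`) is an assumption made for this illustration; a real syntax defines its own set.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// unescape resolves an assumed set of backslash escapes: \n, \t, \\ and \".
// It is a hypothetical helper for illustration, not a Treerack API.
func unescape(raw string) (string, error) {
	var b strings.Builder
	escaped := false
	for _, r := range raw {
		if !escaped {
			if r == '\\' {
				escaped = true
			} else {
				b.WriteRune(r)
			}

			continue
		}

		// The previous character was a backslash: translate the escape.
		escaped = false
		switch r {
		case 'n':
			b.WriteRune('\n')
		case 't':
			b.WriteRune('\t')
		case '\\', '"':
			b.WriteRune(r)
		default:
			return "", fmt.Errorf("unsupported escape sequence: \\%c", r)
		}
	}

	if escaped {
		return "", errors.New("dangling backslash at end of input")
	}

	return b.String(), nil
}

func main() {
	// The raw node text of a string literal, with the surrounding quotes already removed.
	text, err := unescape(`first line\nsecond line`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Println(text)
}
```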

## Programmatically loading syntaxes

While generating static code via `treerack generate` is the recommended approach, we can also load definitions
dynamically at runtime.

```
package parser

import (
	"io"
	"code.squareroundforest.org/arpio/treerack"
)

func initAndParse(syntax, content io.Reader) (*treerack.Node, error) {
	s := &treerack.Syntax{}
	if err := s.ReadSyntax(syntax); err != nil {
		return nil, err
	}

	if err := s.Init(); err != nil {
		return nil, err
	}

	return s.Parse(content)
}
```

Caution: Be mindful of security implications when loading syntax definitions from untrusted sources.

## Programmatically defining syntaxes

In rare cases where a syntax must be constructed computationally, we can define rules via the Go API:

```
package parser

import (
	"io"
	"code.squareroundforest.org/arpio/treerack"
)

func initAndParse(content io.Reader) (*treerack.Node, error) {
	s := &treerack.Syntax{}

	// whitespace:
	s.Class("whitespace-chars", treerack.Alias, false, []rune{' ', '\t', '\r', '\n'}, nil)
	s.Choice("whitespace", treerack.Whitespace, "whitespace-chars")

	// number: one or more digits, the range is given as a {from, to} pair:
	s.Class("digit", treerack.Alias, false, nil, [][]rune{{'0', '9'}})
	s.Sequence("number", treerack.NoWhitespace, treerack.SequenceItem{Name: "digit", Min: 1})
	s.Class("operator", treerack.None, false, []rune{'+', '-'}, nil)
	s.Sequence(
		"expression",
		treerack.Root,
		treerack.SequenceItem{Name: "number"},
		treerack.SequenceItem{Name: "operator"},
		treerack.SequenceItem{Name: "number"},
	)

	if err := s.Init(); err != nil {
		return nil, err
	}

	return s.Parse(content)
}
```

## Summary

We have demonstrated how to use the Treerack tool to define, test, and implement a parser. We recommend the
following workflow:

1. draft: define a syntax in a .treerack file.
2. verify: use `treerack check` and `treerack show` to validate building blocks incrementally.
3. generate: use `treerack generate` to create embeddable Go code.

**Links:**

- the detailed documentation of the treerack definition language: [./syntax.md](./syntax.md)
- treerack command help: [../cmd/treerack/readme.md](../cmd/treerack/readme.md) or, if the command is installed,
  `man treerack`, or `path/to/treerack help`
- the arithmetic calculator example: [./examples/acalc](./examples/acalc).
- additional examples: [./examples](./examples)

Happy parsing!