Derivation Tree Reference#

Fandango constraints make use of symbols (anything enclosed in <...>) to express desired or required properties of generated inputs. During evaluation, any <SYMBOL> returns a derivation tree representing the composition of the produced or parsed string.

This derivation tree has a type of DerivationTree, as do all subtrees. DerivationTree objects support 213 functions, methods, and operators directly; after converting DerivationTree objects into standard Python types such as str, the entire Python ecosystem is available.

Derivation Tree Structure#

Let’s have a look at the structure of a derivation tree. Consider this derivation tree from the ISO 8601 date and time grammar:

_images/8a2abf9346eca73e3eecb8bb4a4ffa0041af1d7979c688828e21ae8804e7b3e9.png

The elements of the tree are designated as follows:

Node

Any connected element of a tree.

Child

An immediate descendant of a node. In the above tree, the <start> node has one child, <iso8601datetime>.

Descendant

A child of a node, or a descendant of one of its children. <iso8601month> is a descendant of <iso8601date>. All nodes (except the root node) are descendants of the root node.

Parent

The parent of a node \(N\) is the node that has \(N\) as a child. <start> is the parent of <iso8601datetime>.

Root

The node without parent; typically <start>.

Terminal Symbol

A node with at least one child.

Nonterminal Symbol

A node without children.

Concatenating all terminal symbols (using str(<SYMBOL>) or <SYMBOL>.value()) in a derivation tree yields the string represented. For the above tree, this would be 2025-10-27.

Evaluating Derivation Trees#

Since standard Python functions do not accept derivation trees as arguments, one must first convert them into an acceptable type. The value() method plays a central role in this.

<SYMBOL>.value() -> str | int | bytes

The method <SYMBOL>.value() evaluates <SYMBOL> and returns its value, which represents the concatenation of all descendants. The type of the return value is

  • A Unicode string (str) if all descendants are Unicode strings.

  • An integer (int) if all descendants are bits.

  • A byte string (bytes) in all other cases, notably if

    • any of the descendants of <SYMBOL> is a byte string, or

    • the descendants of <SYMBOL> contain bits and other elements.

  • None if <SYMBOL> expands into zero elements

Unicode Strings#

Any derivation tree that is composed only from Unicode strings will evaluate into a Python Unicode string (str). The descendants of <SYMBOL> are concatenated as strings.

Example - in

<foo> ::= <bar> "bara"
<bar> ::= "bar"

<foo>.value() will be the Unicode string "barbara".

This is the most common case.

Integers#

Any derivation tree that is composed only from bits will evaluate into a Python integer (int). The descendants are concatenated as bits; the most significant bit comes first.

Example - in

<foo> ::= <bar> 0 1 1
<bar> ::= 0 1 0

<foo>.value() will be the integer 0b010011, or 19.

Byte Strings#

Any derivation tree where any of the descendants is a byte string, or where the descendants are bits and other elements, will evaluate into a Python byte string (bytes). The descendants are all concatenated as follows:

  1. Any bit sequence \(B\) is converted into a bytes string with the most significant bit coming first.

  2. Any Unicode string \(S\) is converted into a bytes string representing \(S\) in UTF-8 encoding.

  3. The resulting byte strings are all concatenated.

Example 1 - in

<foo> ::= <bar> "foo"  # a Unicode string
<bar> ::= b"bar"       # a byte string

<foo>.value() will be the byte string b"barfoo".

Warning

If you mix byte strings and Unicode strings a grammar, Fandango will issue a warning.

Example 2 - in

<foo> ::= <bar> b"foo"    # a byte string
<bar> ::= 1 1 1 1 1 1 1 1 # a bit string

<foo>.value() will be the byte string b"foo\xff".

Warning

If you mix bits and Unicode strings in a grammar, Fandango will issue a warning.

General DerivationTree Functions#

These functions are available for all DerivationTree objects, regardless of the type they evaluate into.

Converters#

Since any <SYMBOL> has the type DerivationTree, one must convert it first into a standard Python type before passing it as argument to a standard Python function.

str(<SYMBOL>) -> str

Convert <SYMBOL> into a Unicode string. Byte strings in <SYMBOL> are converted using latin-1 encoding.

bytes(<SYMBOL>) -> bytes

Convert <SYMBOL> into a byte string. Unicode strings in <SYMBOL> are converted using utf-8 encoding.

int(<SYMBOL>) -> int

Convert <SYMBOL> into an integer, like the Python int() function. <SYMBOL> must be an int, or a Unicode string or byte string representing an integer literal.

float(<SYMBOL>) -> float

Convert <SYMBOL> into a floating-point number, like the Python float() function. <SYMBOL> must be an int, or a Unicode string or byte string representing a float literal.

complex(<SYMBOL>) -> complex

Convert <SYMBOL> into a complex number, like the Python complex() function. <SYMBOL> must be an int, or a Unicode string or byte string representing a float or complex literal.

bool(<SYMBOL>) -> bool

Convert <SYMBOL> into a truth value:

  • If <SYMBOL> evaluates into an integer (because it represents bits), the value will be True if the integer is non-zero.

  • If <SYMBOL> evaluates into a string (bytes or Unicode), the value will be True if the string is not empty.

Note that Python applies bool() conversion by default if a truth value is needed; hence, expressions like <flag_1> and <flag_2>, where both flags are bits, are allowed.

Node Attributes#

<SYMBOL>.sym() -> str | int | bytes

The symbol of the node:

  • for nonterminals, the symbol as a string (say, "<SYMBOL>")

  • for terminals, the value:

    • for Unicode strings, the value of the string (type str);

    • for bits, either 0 or 1 (type int)’

    • for bytes, the value of the byte string (type bytes).

<SYMBOL>.is_terminal() -> bool

True if <SYMBOL> is a terminal node.

<SYMBOL>.is_nonterminal() -> bool

True if <SYMBOL> is a nonterminal node.

<SYMBOL>.is_regex() -> bool

True if the (terminal) symbol of <SYMBOL> is a regular expression.

Accessing Children#

len(<SYMBOL>) -> int

Return the number of children of <SYMBOL>.

Note

To access the length of the string represented by <SYMBOL>, use len(str(<SYMBOL>)).

<SYMBOL>[n] -> DerivationTree

Access the nth child of <SYMBOL>, as a DerivationTree. <SYMBOL>[0] is the first child; <SYMBOL>[-1] is the last child.

Note

To access the nth character of <SYMBOL>, use str(<SYMBOL>)[n].

<SYMBOL>[start:stop] -> DerivationTree

Return a new DerivationTree which has the children <SYMBOL>[start] to <SYMBOL>[stop-1] as children. If start is omitted, children start from the beginning; if stop is omitted, children go up to the end, including the last one.

<SYMBOL>.children() -> list[DerivationTree]

Return a list containing all children of <SYMBOL>.

<SYMBOL>.children_values() -> list[str | int | bytes]

Return a list containing the values of all children of <SYMBOL>.

Note

Each element of the list can have a different type, depending on the type the value() method returns.

<SYMBOL_1> in <SYMBOL_2>

Return True if <SYMBOL_1> == CHILD for any of the children of <SYMBOL_2>.

VALUE in <SYMBOL>

Return True if VALUE == CHILD.value() for any of the children of <SYMBOL>.

Accessing Descendants#

<SYMBOL>.descendants() -> list[DerivationTree]

Return a list containing all descendants of <SYMBOL>; that is, all children and their transitive children.

<SYMBOL>.descendant_values() -> list[str | int | bytes]

Return a list containing the values of all descendants of <SYMBOL>; that is, the values of all children and their transitive children.

Note

Each element of the list can have a different type, depending on the type the value() method returns.

Accessing Parents#

<SYMBOL>.parent() -> DerivationTree | None

Return the parent of the current node, or None for the root node.

Accessing Sources#

<SYMBOL>.sources() -> list[DerivationTree]

Return a list containing all sources of <SYMBOL>. Sources are symbols used in generator expressions out of which the value of <SYMBOL> was created; see the section on data conversions for details.

Comparisons#

<SYMBOL_1> == <SYMBOL_2>

Returns True if both trees have the same structure and all nodes have the same values.

<SYMBOL> == VALUE

Returns True if <SYMBOL>.value() == VALUE

<SYMBOL_1> != <SYMBOL_2>

Returns True if both trees have a different structure or any nodes have different values.

<SYMBOL> == VALUE

Returns True if <SYMBOL>.value() == VALUE.

<SYMBOL_1> <|>|<=|>= <SYMBOL_2>

Returns True if <SYMBOL_1>.value() <|>|<=|>= <SYMBOL_2>.value().

<SYMBOL> <|>|<=|>= VALUE

Returns True if <SYMBOL>.value() <|>|<=|>= VALUE.

Debugging#

See the section on output formats for details on these representations.

<SYMBOL>.to_bits() -> str

Return a bit representation (0 and 1 characters) of <SYMBOL>.

<SYMBOL>.to_grammar() -> str

Return a grammar-like representation of <SYMBOL>.

<SYMBOL>.to_tree() -> str

Return a tree representation of <SYMBOL>, using Tree(...) constructors.

repr(<SYMBOL>) -> str

Return the internal representation of <SYMBOL>, as a DerivationTree constructor that can be evaluated as a Python expression.

Type-Specific Functions#

The bulk of available functions comes from the Python standard library.

Unicode Strings#

For derivation trees that evaluate into Unicode strings (str), all Python String methods are available, such as <SYMBOL>.startswith(), <SYMBOL>.endswith(), <SYMBOL>.strip(), and more. The method is invoked on the <SYMBOL>.value() string value.

Integers#

For derivation trees that evaluate into integers (int), all Python numeric operators and functions are available, including +, -, or abs(), as well as bitwise operators such as <<, &, ~, etc. Symbols can be used on either side of an operator; the operator is applied on the <SYMBOL>.value() integer value.

In addition, the Python methods on integer types can be used, such as <SYMBOL>.to_bytes() or <SYMBOL>.bit_count(). Methods are invoked on the <SYMBOL>.value() integer value.

Byte Strings#

For derivation trees that evaluate into byte strings (bytes), all Python bytes methods are available, including <SYMBOL>.decode(), <SYMBOL>.startswith(), <SYMBOL>.endswith(), etc. The method is invoked on the <SYMBOL>.value() byte string value.