Accessing Input Elements#
When dealing with complex input formats, attaching constraints can be complex, as elements can occur multiple times in a generated input. Fandango offers a few mechanisms to disambiguate these, and to specify specific contexts in which to search for elements.
Derivation Trees#
So far, we have always assumed that Fandango generates strings from the grammar. Behind the scenes, though, Fandango creates a far richer data structure - a so-called derivation tree that maintains the structure of the string and allows accessing individual elements. Every time Fandango sees a grammar rule
<symbol> ::= ...
it generates a derivation tree whose root is <symbol>
and whose children are the elements of the right-hand side of the rule.
Let’s have a look at our persons.fan
spec:
<start> ::= <person_name> "," <age>
<person_name> ::= <first_name> " " <last_name>
<first_name> ::= <name>
<last_name> ::= <name>
<name> ::= <ascii_uppercase_letter><ascii_lowercase_letter>+
<age> ::= <digit>+
The <start>
rule says
<start> ::= <person_name> "," <age>
Then, a resulting derivation tree for <start>
looks like this:

As Fandango expands more and more symbols, it expands the derivation tree accordingly.
Since the grammar definition for <person_name>
says
<person_name> ::= <first_name> " " <last_name>
the above derivation tree would be extended to

And if we next extend <age>
and then <digit>
based on their definitions
<age> ::= <digit>+
our tree gets to look like this:

Repeating the process, it thus obtains a tree like this:

Note how the tree records the entire history of how it was created - how it was derived, actually.
To obtain a string from the tree, we traverse its children left-to-right,
ignoring all nonterminal symbols (in <...>
) and considering only the terminal symbols (in quotes).
This is what we get for the above tree:
Ex Pltz,18
And this is the string Fandango produces. However, viewing the Fandango results as derivation trees allows us to access elements of the Fandango-produced strings and to express constraints on them.
Diagnosing Derivation Trees#
To examine the derivation trees that Fandango produces, use the --format=grammar
output format.
This produces the output in a grammar-like format, where children are indented under their respective parents.
As an example, here is how to print a derivation tree from persons.fan
:
$ fandango fuzz -f persons.fan -n 1 --format=grammar
<start> ::= <person_name> ',' <age> # Position 0x0000 (0); 'Pl Seov,5150'
<person_name> ::= <first_name> ' ' <last_name> # Position 0x0001 (1); 'Pl Seov'
<first_name> ::= <name>
<name> ::= <ascii_uppercase_letter> <ascii_lowercase_letter> # 'Pl'
<ascii_uppercase_letter> ::= <_ascii_uppercase_letter>
<_ascii_uppercase_letter> ::= 'P' # Position 0x0002 (2)
<ascii_lowercase_letter> ::= <_ascii_lowercase_letter>
<_ascii_lowercase_letter> ::= 'l' # Position 0x0003 (3)
<last_name> ::= <name>
<name> ::= <ascii_uppercase_letter> <ascii_lowercase_letter> <ascii_lowercase_letter> <ascii_lowercase_letter> # 'Seov'
<ascii_uppercase_letter> ::= <_ascii_uppercase_letter>
<_ascii_uppercase_letter> ::= 'S' # Position 0x0004 (4)
<ascii_lowercase_letter> ::= <_ascii_lowercase_letter>
<_ascii_lowercase_letter> ::= 'e' # Position 0x0005 (5)
<ascii_lowercase_letter> ::= <_ascii_lowercase_letter>
<_ascii_lowercase_letter> ::= 'o' # Position 0x0006 (6)
<ascii_lowercase_letter> ::= <_ascii_lowercase_letter>
<_ascii_lowercase_letter> ::= 'v' # Position 0x0007 (7)
<age> ::= <digit> <digit> <digit> <digit> # '5150'
<digit> ::= <_digit>
<_digit> ::= '5' # Position 0x0008 (8)
<digit> ::= <_digit>
<_digit> ::= '1' # Position 0x0009 (9)
<digit> ::= <_digit>
<_digit> ::= '5' # Position 0x000a (10)
<digit> ::= <_digit>
<_digit> ::= '0' # Position 0x000b (11)
We see how the produced derivation tree consists of a <start>
symbol, whose <first_name>
and <last_name>
children expand into <name>
and letters; the <age>
symbol expands into <digit>
symbols.
The comments (after #
) show the individual positions into the input, as well as the values of compound symbols.
What is the full string represented by the above derivation tree?
Solution
It’s 'Pl Seov,5150'
, as you can find on the right-hand side of the first line.
Tip
The --format=grammar
option is great for debugging, especially binary formats.
Specifying Paths#
One effect of Fandango producing derivation trees rather than “just” strings is that we can define special operators that allow us to access subtrees (or sub-elements) of the produced strings - and express constraints on them.
This is especially useful if we want constraints to apply only in specific contexts - say, as part of some element <a>
, but not as part of an element <b>
.
Accessing Children#
The expression <foo>[N]
accesses the N
-th child of <foo>
, starting with zero.
If <foo>
is defined in the grammar as
<foo> ::= <bar> ":" <baz>
then <foo>[0]
returns the <bar>
element, <foo>[1]
returns ":"
, and <foo>[2]
returns the <baz>
element.
In our persons.fan
derivation tree for Ex Pltz
, for instance, <start>[0]
would return the <person_name>
element ("Ex Pltz"
), and <start>[2]
would return the <age>
element (18
).
We can use this to access elements in specific contexts.
For instance, if we want to refer to a <name>
element, but only if it is the child of a <first_name>
element, we can refer to it as <first_name>[0]
- the first child of a <first_name>
element:
<first_name> ::= <name>
Here is a constraint that makes Fandango produce first names that end with x
:
$ fandango fuzz -f persons.fan -n 10 -c '<first_name>[0].endswith("x")'
Sfsx Dnxwk,058
Hnufx Tjlne,41
Yzx Hqzgsq,4266
Mx Yarr,27881
Vx Imb,79
Bhrcjx Smkmyl,16568
Zhyaphx Mibrqmuoh,1609676
Xtx Btxsqe,31024
Dttnopx Xw,28135555
Gvpx Er,5221476
Note
As in Python, you can use negative indexes to refer to the last elements.
<age>[-1]
, for instance, gives you the last child of an <age>
subtree.
Warning
While symbols act as strings in many contexts, this is where they differ.
To access the first character of a symbol <foo>
, you need to explicitly convert it to a string first, as in str(<foo>)[0]
.
Slices#
Fandango also allows you to use Python slice syntax to access multiple children at once.
<name>[n:m]
returns a new (unnamed) root which has <name>[n]
, <name>[n + 1]
, …, <name>[m - 1]
as children.
This is useful, for instance, if you want to compare several children against a string:
$ fandango fuzz -f persons-faker.fan -n 10 -c '<name>[0:2] == "Ch"'
Christopher Church,79
Christopher Chambers,1799
Christopher Chambers,53
Christopher Chang,968
Charles Chambers,05
Christopher Chambers,1788
Christine Christensen,5
Christopher Chambers,2260483086
Christopher Chambers,10
Christopher Chang,1788
Would one also be able to use <start>[0:2] == "Ch"
to obtain inputs that all start with "Ch"
?
Solution
No, this would not work. Remember that in derivation trees, indexes refer to children, not characters. So, according to the rule
<start> ::= <person_name> "," <age>
<start>[0]
is a <person_name>
, <start>[1]
is a ","
, and <start>[2]
is an <age>
.
Hence, <start>[0:2]
refers to <start>
itself, which cannot be "Ch"
.
Indeed, to have the string start with "Ch"
, you (again) need to convert <start>
into a string first, and then access its individual characters:
$ fandango fuzz -f persons-faker.fan -n 10 -c 'str(<start>)[0:2] == "Ch"'
Christopher Burns,45
Chad Austin,640
Christina Horton,31
Christopher Hoover,84
Chad Allen,053
Charles Alexander,31
Christine Paul,9
Chad Rivera,801
Christine Paul,6
Christopher Burns,9
Fandango supports the full Python slice semantics:
An omitted first index defaults to zero, so
<foo>[:2]
returns the first two children.An omitted second index defaults to the size of the string being sliced, so
<foo>[2:]
returns all children starting with<foo>[2]
.Both the first and the second index can be negative again.
Selecting Children#
Referring to children by number, as in <foo>[0]
, can be a bit cumbersome.
This is why in Fandango, you can also refer to elements by name.
The expression <foo>.<bar>
allows accessing elements <bar>
when they are a direct child of a symbol <foo>
.
This requires that <bar>
occurs in the grammar rule defining <foo>
:
<foo> ::= ...some expansion that has <bar>...
To refer to the <name>
element as a direct child of a <first_name>
element, you thus write <first_name>.<name>
.
This allows you to express the earlier constraint in a possibly more readable form:
$ fandango fuzz -f persons.fan -n 10 -c '<first_name>.<name>.endswith("x")'
Qaaox Aaz,69
Owdyqx Oism,75
Ex Epc,8876
Lwx Heflw,093
Tiawox Pi,1
Wmaoqx Wmi,2976
Orx Ba,39625
Nzx Gju,10950
Lwx Peflw,093
Ukx Ze,5906762
Note
You can only access nonterminal children this way; <person_name>." "
(the space in the <person_name>
) gives an error.
Selecting Descendants#
Often, you want to refer to elements in a particular context set by the enclosing element. This is why in Fandango, you can also refer to descendants.
The expression <foo>..<bar>
allows accessing elements <bar>
when they are a descendant of a symbol <foo>
.
<bar>
is a descendant of <foo>
if
<bar>
is a child of<foo>
; orone of
<foo>
’s children has<bar>
as a descendant.
If that sounds like a recursive definition, that is because it is.
A simpler way to think about <foo>..<bar>
may be “All <bar>
s that occur within <foo>
”.
Let us take a look at some rules in our persons.fan
example:
<first_name> ::= <name>
<last_name> ::= <name>
<name> ::= <ascii_uppercase_letter><ascii_lowercase_letter>+
<ascii_uppercase_letter> ::= "A" | "B" | "C" | ... | "Z"
To refer to all <ascii_uppercase_letter>
element as descendant of a <first_name>
element, you thus write <first_name>..<ascii_uppercase_letter>
.
Hence, to make all uppercase letters X
, but only as part of a first name, you may write
$ fandango fuzz -f persons.fan -n 10 -c '<first_name>..<ascii_uppercase_letter> == "X"'
Xv Oud,6
Xvwh Ekapxe,0
Xv Hklpln,72733
Xelx Yo,9
Xgq Jt,6
Xagkms Tzrer,40
Xsk Ku,7
Xmg Ngylx,808
Xyfd Tq,2312
Xch Ft,5959
Chains#
You can freely combine []
, .
, and ..
into chains that select subtrees.
What would the expression <start>[0].<last_name>..<ascii_lowercase_letter>
refer to, for instance?
Solution
This is easy:
<start>[0]
is the first element of start, hence a<person_name>
..<last_name>
refers to the child of type<last_name>
..<ascii_lowercase_letter>
refers to all descendants of type<ascii_lowercase_letter>
Let’s use this in a constraint:
$ fandango fuzz -f persons.fan -n 10 -c '<start>[0].<last_name>..<ascii_lowercase_letter> == "x"'
Edvdu Wxxx,181
Ww Vx,79945
Zkxs Hxx,34388
Oosvp Kxxx,905
Ti Mxxxx,903
Kvdqgj Hx,94
Niu Nxxxxx,9
Icpe Kxxxxx,49
Ydc Dxxx,71286
Do Dxx,42
Quantifiers#
By default, whenever you use a symbol <foo>
in a constraint, this constraint applies to all occurrences of <foo>
in the produced output string.
For your purposes, though, one such instance already may suffice.
This is why Fandango allows expressing quantification in constraints.
Star Expressions#
Error
The *
syntax is not operational yet.
In Fandango, you can prefix an element with *
to obtain a collection of all these elements within an individual string.
Hence, *<name>
is a collection of all <name>
elements within the generated string.
This syntax can be used in a variety of ways.
For instance, we can have a constraint check whether a particular element is in the collection:
"Pablo" in *<name>
This constraint evaluates to True if any of the values in *<name>
(= one of the two <name>
elements) is equal to "Pablo"
.
*
-expressions are mostly useful in quantifiers, which we discuss below.
Existential Quantification#
To express that within a particular scope, at least one instance of a symbol must satisfy a constraint, use a *
-expression in combination with any()
:
any(CONSTRAINT for ELEM in *SCOPE)
where
SCOPE
is a nonterminal (e.g.<age>
);ELEM
is a Python variable; andCONSTRAINT
is a constraint overELEM
Hence, the expression
any(n.startswith("A") for n in *<name>)
ensures there is at least one element <name>
that starts with an “A”:
Let us decompose this expression for a moment:
The expression
for n in *<name>
lets Python iterate over*<name>
(all<name>
objects within a person)…… and evaluate
n.startswith("A")
for each of them, resulting in a collection of Boolean values.The Python function
any(list)
returnsTrue
if at least one element inlist
is True.
So, what we get is existential quantification:
$ fandango fuzz -f persons.fan -n 10 -c 'any(n.startswith("A") for n in *<name>)'
Agez Yjj,93
Avblz Mxf,1540
Aya Ue,68946
Aqt Ft,30
Aar Hoj,398
Ayyfuovb Zokxpm,21957
Aoy Eqzjtg,7567511
Aea Ue,68946
Avblz Mxf,9675
Agez Ysrl,93
Universal Quantification#
Where there are existential quantifiers, there also are universal quantifiers.
Here we use the Python all()
function; all(list)
evaluates to True only if all elements in list
are True.
We use a *
-expression in combination with all()
:
all(CONSTRAINT for ELEM in *SCOPE)
Hence, the expression
all(c == "a" for c in *<first_name>..<ascii_lowercase_letter>)
ensures that all sub-elements <ascii_lowercase_letter>
in <first_name>
have the value “a”.
Again, let us decompose this expression:
The expression
for c in *<first_name>..<ascii_lowercase_letter>
lets Python iterate over all<ascii_lowercase_letter>
objects within<first_name>
…… and evaluate
c == "a"
for each of them, resulting in a collection of Boolean values.The Python function
all(list)
returnsTrue
if all elements inlist
are True.
So, what we get is universal quantification:
$ fandango fuzz -f persons.fan -n 10 -c 'all(c == "a" for c in *<first_name>..<ascii_lowercase_letter>)'
Ia Vs,218
Taa Cfwy,62147
Ga Uf,960
Oaaaa Bvsj,092
Kaaaaa Pcnb,2
Raaa Udaqol,539
Jaaaa Jauo,688
Caa Bnluh,3052
Xaaaaa Sjhn,487
Zaaa Raxyw,1001
By default, all symbols are universally quantified within <start>
, so a dedicated universal quantifier is only needed if you want to limit the scope to some sub-element.
This is what we do here with <first_name>..<ascii_lowercase_letter>
, limiting the scope to <first_name>
.
Given the default universal quantification, you can actually achieve the same effect as above without all()
and without a *
. How?
Solution
You can access all <ascii_lowercase_letter>
elements within <first_name>
directly:
$ fandango fuzz -f persons.fan -n 10 -c '<first_name>..<ascii_lowercase_letter> == "a"'