Accessing Input Elements#

When dealing with complex input formats, attaching constraints can be complex, as elements can occur multiple times in a generated input. Fandango offers a few mechanisms to disambiguate these, and to specify specific contexts in which to search for elements.

Derivation Trees#

So far, we have always assumed that Fandango generates strings from the grammar. Behind the scenes, though, Fandango creates a far richer data structure - a so-called derivation tree that maintains the structure of the string and allows accessing individual elements. Every time Fandango sees a grammar rule

<symbol> ::= ...

it generates a derivation tree whose root is <symbol> and whose children are the elements of the right-hand side of the rule.

Let’s have a look at our persons.fan spec:

<start> ::= <person_name> "," <age>
<person_name> ::= <first_name> " " <last_name>
<first_name> ::= <name>
<last_name> ::= <name>
<name> ::= <ascii_uppercase_letter><ascii_lowercase_letter>+
<age> ::= <digit>+

The <start> rule says

<start> ::= <person_name> "," <age>

Then, a resulting derivation tree for <start> looks like this:

_images/ce67d5b85467add2e5b5e73ee365d938cd469e7a75e638dd6fa08b9547ae2769.png

As Fandango expands more and more symbols, it expands the derivation tree accordingly. Since the grammar definition for <person_name> says

<person_name> ::= <first_name> " " <last_name>

the above derivation tree would be extended to

_images/4fa4968f75dd5a18575ee1c39c54f82eb03018bb238f5949af7dad5babf22b94.png

And if we next extend <age> and then <digit> based on their definitions

<age> ::= <digit>+

our tree gets to look like this:

_images/b1b0b00242a346fd4a244929b1376d5a28a8bf90ee454fb065ab79e380f39720.png

Repeating the process, it thus obtains a tree like this:

_images/3e0b8dbf999a2cf38f640b33de35c945c5f3f5fbe6201f798a4d304564937296.png

Note how the tree records the entire history of how it was created - how it was derived, actually.

To obtain a string from the tree, we traverse its children left-to-right, ignoring all nonterminal symbols (in <...>) and considering only the terminal symbols (in quotes). This is what we get for the above tree:

Ex Pltz,18

And this is the string Fandango produces. However, viewing the Fandango results as derivation trees allows us to access elements of the Fandango-produced strings and to express constraints on them.

Diagnosing Derivation Trees#

To examine the derivation trees that Fandango produces, use the --format=grammar output format. This produces the output in a grammar-like format, where children are indented under their respective parents. As an example, here is how to print a derivation tree from persons.fan:

$ fandango fuzz -f persons.fan -n 1 --format=grammar
<start> ::= <person_name> ',' <age>  # Position 0x0000 (0); 'Pl Seov,5150'
  <person_name> ::= <first_name> ' ' <last_name>  # Position 0x0001 (1); 'Pl Seov'
    <first_name> ::= <name>
      <name> ::= <ascii_uppercase_letter> <ascii_lowercase_letter>  # 'Pl'
        <ascii_uppercase_letter> ::= <_ascii_uppercase_letter>
          <_ascii_uppercase_letter> ::= 'P'  # Position 0x0002 (2)
        <ascii_lowercase_letter> ::= <_ascii_lowercase_letter>
          <_ascii_lowercase_letter> ::= 'l'  # Position 0x0003 (3)
    <last_name> ::= <name>
      <name> ::= <ascii_uppercase_letter> <ascii_lowercase_letter> <ascii_lowercase_letter> <ascii_lowercase_letter>  # 'Seov'
        <ascii_uppercase_letter> ::= <_ascii_uppercase_letter>
          <_ascii_uppercase_letter> ::= 'S'  # Position 0x0004 (4)
        <ascii_lowercase_letter> ::= <_ascii_lowercase_letter>
          <_ascii_lowercase_letter> ::= 'e'  # Position 0x0005 (5)
        <ascii_lowercase_letter> ::= <_ascii_lowercase_letter>
          <_ascii_lowercase_letter> ::= 'o'  # Position 0x0006 (6)
        <ascii_lowercase_letter> ::= <_ascii_lowercase_letter>
          <_ascii_lowercase_letter> ::= 'v'  # Position 0x0007 (7)
  <age> ::= <digit> <digit> <digit> <digit>  # '5150'
    <digit> ::= <_digit>
      <_digit> ::= '5'  # Position 0x0008 (8)
    <digit> ::= <_digit>
      <_digit> ::= '1'  # Position 0x0009 (9)
    <digit> ::= <_digit>
      <_digit> ::= '5'  # Position 0x000a (10)
    <digit> ::= <_digit>
      <_digit> ::= '0'  # Position 0x000b (11)

We see how the produced derivation tree consists of a <start> symbol, whose <first_name> and <last_name> children expand into <name> and letters; the <age> symbol expands into <digit> symbols.

The comments (after #) show the individual positions into the input, as well as the values of compound symbols.

What is the full string represented by the above derivation tree?

Tip

The --format=grammar option is great for debugging, especially binary formats.

Specifying Paths#

One effect of Fandango producing derivation trees rather than “just” strings is that we can define special operators that allow us to access subtrees (or sub-elements) of the produced strings - and express constraints on them. This is especially useful if we want constraints to apply only in specific contexts - say, as part of some element <a>, but not as part of an element <b>.

Accessing Children#

The expression <foo>[N] accesses the N-th child of <foo>, starting with zero.

If <foo> is defined in the grammar as

<foo> ::= <bar> ":" <baz>

then <foo>[0] returns the <bar> element, <foo>[1] returns ":", and <foo>[2] returns the <baz> element.

In our persons.fan derivation tree for Ex Pltz, for instance, <start>[0] would return the <person_name> element ("Ex Pltz"), and <start>[2] would return the <age> element (18).

We can use this to access elements in specific contexts. For instance, if we want to refer to a <name> element, but only if it is the child of a <first_name> element, we can refer to it as <first_name>[0] - the first child of a <first_name> element:

<first_name> ::= <name>

Here is a constraint that makes Fandango produce first names that end with x:

$ fandango fuzz -f persons.fan -n 10 -c '<first_name>[0].endswith("x")'
Frsx Rncu,1
Goctlx Nstb,483
Hx Qgchm,52
Ffsx Rncu,1
Jox Npfz,2564
Wtcx Fmjdet,8
Vdrx Xbarg,8343570
Kejx Nzdhn,39692
Idohx Yujavfu,88284
Vzjnhanx Uluxutg,36511933

Tip

As in Python, you can use negative indexes to refer to the last elements. <age>[-1], for instance, gives you the last child of an <age> subtree.

Important

While symbols act as strings in many contexts, this is where they differ. To access the first character of a symbol <foo>, you need to explicitly convert it to a string first, as in str(<foo>)[0].

Slices#

Fandango also allows you to use Python slice syntax to access multiple children at once. <name>[n:m] returns a new (unnamed) root which has <name>[n], <name>[n + 1], …, <name>[m - 1] as children. This is useful, for instance, if you want to compare several children against a string:

$ fandango fuzz -f persons-faker.fan -n 10 -c '<name>[0:2] == "Ch"'
Christopher Chen,30
Christopher Chen,268
Christopher Chen,269
Chelsea Chen,42
Christopher Chan,845371
Christopher Chen,228
Christopher Chen,868
Christopher Chen,663
Chelsea Chen,233
Christopher Chen,9400109215185281683315

Would one also be able to use <start>[0:2] == "Ch" to obtain inputs that all start with "Ch"?

Indeed, to have the string start with "Ch", you (again) need to convert <start> into a string first, and then access its individual characters:

$ fandango fuzz -f persons-faker.fan -n 10 -c 'str(<start>)[0:2] == "Ch"'
Charles Richard,5
Christopher Collins,66
Charles Hernandez,50
Christopher Collins,63
Charles Richard,88651
Christopher Higgins,235
Cheryl Osborne,67031
Christopher Glenn,0
Chad Johnson,4967
Christopher Rodriguez,791638

Fandango supports the full Python slice semantics:

  • An omitted first index defaults to zero, so <foo>[:2] returns the first two children.

  • An omitted second index defaults to the size of the string being sliced, so <foo>[2:] returns all children starting with <foo>[2].

  • Both the first and the second index can be negative again.

Selecting Children#

Referring to children by number, as in <foo>[0], can be a bit cumbersome. This is why in Fandango, you can also refer to elements by name.

The expression <foo>.<bar> allows accessing elements <bar> when they are a direct child of a symbol <foo>. This requires that <bar> occurs in the grammar rule defining <foo>:

<foo> ::= ...some expansion that has <bar>...

To refer to the <name> element as a direct child of a <first_name> element, you thus write <first_name>.<name>. This allows you to express the earlier constraint in a possibly more readable form:

$ fandango fuzz -f persons.fan -n 10 -c '<first_name>.<name>.endswith("x")'
Wqgzqx Awcx,657
Mjcklx Jzy,991
Vsx Nic,15006
Cbx Nszfth,8835
Tulrx Qf,58061
Gxeglx Phd,45
Mjcklx Jzy,3659
Tulrx Ul,58061
Wqgzqx Mwdjox,657
Efqheokx Htb,3

Note

You can only access nonterminal children this way; <person_name>." " (the space in the <person_name>) gives an error.

Selecting Descendants#

Often, you want to refer to elements in a particular context set by the enclosing element. This is why in Fandango, you can also refer to descendants.

The expression <foo>..<bar> allows accessing elements <bar> when they are a descendant of a symbol <foo>. <bar> is a descendant of <foo> if

  • <bar> is a child of <foo>; or

  • one of <foo>’s children has <bar> as a descendant.

If that sounds like a recursive definition, that is because it is. A simpler way to think about <foo>..<bar> may be “All <bar>s that occur within <foo>”.

Let us take a look at some rules in our persons.fan example:

<first_name> ::= <name>
<last_name> ::= <name>
<name> ::= <ascii_uppercase_letter><ascii_lowercase_letter>+
<ascii_uppercase_letter> ::= "A" | "B" | "C" | ... | "Z"

To refer to all <ascii_uppercase_letter> element as descendant of a <first_name> element, you thus write <first_name>..<ascii_uppercase_letter>.

Hence, to make all uppercase letters X, but only as part of a first name, you may write

$ fandango fuzz -f persons.fan -n 10 -c '<first_name>..<ascii_uppercase_letter> == "X"'
Xc Tb,23742
Xc Qoh,4
Xxmfjr Mkjds,14
Xyxbl Stbtyi,41
Xlhxv Bg,6222
Xvvghy Rasn,237
Xkqzif Vz,657
Xmg Hxrrri,0753
Xb Oyqrp,4
Xlx Ldagrg,3

Chains#

You can freely combine [], ., and .. into chains that select subtrees. What would the expression <start>[0].<last_name>..<ascii_lowercase_letter> refer to, for instance?

Let’s use this in a constraint:

$ fandango fuzz -f persons.fan -n 10 -c '<start>[0].<last_name>..<ascii_lowercase_letter> == "x"'
Htdyik Oxxx,60
Grcke Exxxx,5
Chzufi Fxx,74
Ejip Kxx,592
Cgrz Lx,1
Ytc Rxxx,9841
Lwgjo Exxx,288
Xj Yxx,10983
Dlsa Uxxxxx,638
Awjkpy Axxxx,03

Quantifiers#

By default, whenever you use a symbol <foo> in a constraint, this constraint applies to all occurrences of <foo> in the produced output string. For your purposes, though, one such instance already may suffice. This is why Fandango allows expressing quantification in constraints.

Star Expressions#

In Fandango, you can prefix an element with * to obtain a collection of all these elements within an individual string. Hence, *<name> is a collection of all <name> elements within the generated string. This syntax can be used in a variety of ways. For instance, we can have a constraint check whether a particular element is in the collection:

"Pablo" in *<name>

This constraint evaluates to True if any of the values in *<name> (= one of the two <name> elements) is equal to "Pablo". *-expressions are mostly useful in quantifiers, which we discuss below.

Existential Quantification#

To express that within a particular scope, at least one instance of a symbol must satisfy a constraint, use a *-expression in combination with any():

any(CONSTRAINT for ELEM in *SCOPE)

where

  • SCOPE is a nonterminal (e.g. <age>);

  • ELEM is a Python variable; and

  • CONSTRAINT is a constraint over ELEM

Hence, the expression

any(n.startswith("A") for n in *<name>)

ensures there is at least one element <name> that starts with an “A”:

Let us decompose this expression for a moment:

  • The expression for n in *<name> lets Python iterate over *<name> (all <name> objects within a person)…

  • … and evaluate n.startswith("A") for each of them, resulting in a collection of Boolean values.

  • The Python function any(list) returns True if at least one element in list is True.

So, what we get is existential quantification:

$ fandango fuzz -f persons.fan -n 10 -c 'any(n.startswith("A") for n in *<name>)'
Af Zjp,87604
Hbw Addy,2113
Cuehql Avrpi,26
Aadwu Iniv,11
Aotmh Ydxz,5501
Aj Mgwfw,2099
Hbw Addy,6113
Cuehql Avrpi,22
Ay Zjp,87604
Aadwu Yfj,11

Universal Quantification#

Where there are existential quantifiers, there also are universal quantifiers. Here we use the Python all() function; all(list) evaluates to True only if all elements in list are True.

We use a *-expression in combination with all():

all(CONSTRAINT for ELEM in *SCOPE)

Hence, the expression

all(c == "a" for c in *<first_name>..<ascii_lowercase_letter>)

ensures that all sub-elements <ascii_lowercase_letter> in <first_name> have the value “a”.

Again, let us decompose this expression:

  • The expression for c in *<first_name>..<ascii_lowercase_letter> lets Python iterate over all <ascii_lowercase_letter> objects within <first_name>

  • … and evaluate c == "a" for each of them, resulting in a collection of Boolean values.

  • The Python function all(list) returns True if all elements in list are True.

So, what we get is universal quantification:

$ fandango fuzz -f persons.fan -n 10 -c 'all(c == "a" for c in *<first_name>..<ascii_lowercase_letter>)'
Sa Dmpu,1459
Iaa Jhlp,90
Haa Ry,3
Raaaa Lel,37152
Gaa Imqqns,1900
Waaaaa Chdrj,30
Taa Pcp,911
Ta Ui,10746
Naaaa Epi,3618
Raaa Matx,212

By default, all symbols are universally quantified within <start>, so a dedicated universal quantifier is only needed if you want to limit the scope to some sub-element. This is what we do here with <first_name>..<ascii_lowercase_letter>, limiting the scope to <first_name>.

Given the default universal quantification, you can actually achieve the same effect as above without all() and without a *. How?