Accessing Input Elements#

When dealing with complex input formats, attaching constraints can be complex, as elements can occur multiple times in a generated input. Fandango offers a few mechanisms to disambiguate these, and to specify specific contexts in which to search for elements.

Derivation Trees#

So far, we have always assumed that Fandango generates strings from the grammar. Behind the scenes, though, Fandango creates a far richer data structure - a so-called derivation tree that maintains the structure of the string and allows accessing individual elements. Every time Fandango sees a grammar rule

<symbol> ::= ...

it generates a derivation tree whose root is <symbol> and whose children are the elements of the right-hand side of the rule.

Let’s have a look at our persons.fan spec:

<start> ::= <person_name> "," <age>
<person_name> ::= <first_name> " " <last_name>
<first_name> ::= <name>
<last_name> ::= <name>
<name> ::= <ascii_uppercase_letter><ascii_lowercase_letter>+
<age> ::= <digit>+

The <start> rule says

<start> ::= <person_name> "," <age>

Then, a resulting derivation tree for <start> looks like this:

_images/ce67d5b85467add2e5b5e73ee365d938cd469e7a75e638dd6fa08b9547ae2769.png

As Fandango expands more and more symbols, it expands the derivation tree accordingly. Since the grammar definition for <person_name> says

<person_name> ::= <first_name> " " <last_name>

the above derivation tree would be extended to

_images/4fa4968f75dd5a18575ee1c39c54f82eb03018bb238f5949af7dad5babf22b94.png

And if we next extend <age> and then <digit> based on their definitions

<age> ::= <digit>+

our tree gets to look like this:

_images/b1b0b00242a346fd4a244929b1376d5a28a8bf90ee454fb065ab79e380f39720.png

Repeating the process, it thus obtains a tree like this:

_images/3e0b8dbf999a2cf38f640b33de35c945c5f3f5fbe6201f798a4d304564937296.png

Note how the tree records the entire history of how it was created - how it was derived, actually.

To obtain a string from the tree, we traverse its children left-to-right, ignoring all nonterminal symbols (in <...>) and considering only the terminal symbols (in quotes). This is what we get for the above tree:

Ex Pltz,18

And this is the string Fandango produces. However, viewing the Fandango results as derivation trees allows us to access elements of the Fandango-produced strings and to express constraints on them.

Diagnosing Derivation Trees#

To examine the derivation trees that Fandango produces, use the --format=grammar output format. This produces the output in a grammar-like format, where children are indented under their respective parents. As an example, here is how to print a derivation tree from persons.fan:

$ fandango fuzz -f persons.fan -n 1 --format=grammar
<start> ::= <person_name> ',' <age>  # Position 0x0000 (0); 'Pl Seov,5150'
  <person_name> ::= <first_name> ' ' <last_name>  # Position 0x0001 (1); 'Pl Seov'
    <first_name> ::= <name>
      <name> ::= <ascii_uppercase_letter> <ascii_lowercase_letter>  # 'Pl'
        <ascii_uppercase_letter> ::= <_ascii_uppercase_letter>
          <_ascii_uppercase_letter> ::= 'P'  # Position 0x0002 (2)
        <ascii_lowercase_letter> ::= <_ascii_lowercase_letter>
          <_ascii_lowercase_letter> ::= 'l'  # Position 0x0003 (3)
    <last_name> ::= <name>
      <name> ::= <ascii_uppercase_letter> <ascii_lowercase_letter> <ascii_lowercase_letter> <ascii_lowercase_letter>  # 'Seov'
        <ascii_uppercase_letter> ::= <_ascii_uppercase_letter>
          <_ascii_uppercase_letter> ::= 'S'  # Position 0x0004 (4)
        <ascii_lowercase_letter> ::= <_ascii_lowercase_letter>
          <_ascii_lowercase_letter> ::= 'e'  # Position 0x0005 (5)
        <ascii_lowercase_letter> ::= <_ascii_lowercase_letter>
          <_ascii_lowercase_letter> ::= 'o'  # Position 0x0006 (6)
        <ascii_lowercase_letter> ::= <_ascii_lowercase_letter>
          <_ascii_lowercase_letter> ::= 'v'  # Position 0x0007 (7)
  <age> ::= <digit> <digit> <digit> <digit>  # '5150'
    <digit> ::= <_digit>
      <_digit> ::= '5'  # Position 0x0008 (8)
    <digit> ::= <_digit>
      <_digit> ::= '1'  # Position 0x0009 (9)
    <digit> ::= <_digit>
      <_digit> ::= '5'  # Position 0x000a (10)
    <digit> ::= <_digit>
      <_digit> ::= '0'  # Position 0x000b (11)

We see how the produced derivation tree consists of a <start> symbol, whose <first_name> and <last_name> children expand into <name> and letters; the <age> symbol expands into <digit> symbols.

The comments (after #) show the individual positions into the input, as well as the values of compound symbols.

What is the full string represented by the above derivation tree?

Tip

The --format=grammar option is great for debugging, especially binary formats.

Specifying Paths#

One effect of Fandango producing derivation trees rather than “just” strings is that we can define special operators that allow us to access subtrees (or sub-elements) of the produced strings - and express constraints on them. This is especially useful if we want constraints to apply only in specific contexts - say, as part of some element <a>, but not as part of an element <b>.

Accessing Children#

The expression <foo>[N] accesses the N-th child of <foo>, starting with zero.

If <foo> is defined in the grammar as

<foo> ::= <bar> ":" <baz>

then <foo>[0] returns the <bar> element, <foo>[1] returns ":", and <foo>[2] returns the <baz> element.

In our persons.fan derivation tree for Ex Pltz, for instance, <start>[0] would return the <person_name> element ("Ex Pltz"), and <start>[2] would return the <age> element (18).

We can use this to access elements in specific contexts. For instance, if we want to refer to a <name> element, but only if it is the child of a <first_name> element, we can refer to it as <first_name>[0] - the first child of a <first_name> element:

<first_name> ::= <name>

Here is a constraint that makes Fandango produce first names that end with x:

$ fandango fuzz -f persons.fan -n 10 -c '<first_name>[0].endswith("x")'
Igx Khr,2520
Hvhxx Yxwx,6546
Vvx Ttr,933
Beiplx Vpm,6
Ynx Ryxl,183
Oibox Cyym,6515
Buaenx Opy,8
Vx Ylposz,13551
Gwbnx Wh,95642
Dldrx Gbquv,603

Note

As in Python, you can use negative indexes to refer to the last elements. <age>[-1], for instance, gives you the last child of an <age> subtree.

Warning

While symbols act as strings in many contexts, this is where they differ. To access the first character of a symbol <foo>, you need to explicitly convert it to a string first, as in str(<foo>)[0].

Slices#

Fandango also allows you to use Python slice syntax to access multiple children at once. <name>[n:m] returns a new (unnamed) root which has <name>[n], <name>[n + 1], …, <name>[m - 1] as children. This is useful, for instance, if you want to compare several children against a string:

$ fandango fuzz -f persons-faker.fan -n 10 -c '<name>[0:2] == "Ch"'
Christine Chavez,95
Christopher Chavez,782
Christine Chavez,67079317
Christine Chavez,93
Christina Chavez,81524211
Christine Chavez,9748
Christine Chapman,36209
Christine Chavez,36209
Christopher Chavez,2791
Christine Chavez,6386

Would one also be able to use <start>[0:2] == "Ch" to obtain inputs that all start with "Ch"?

Indeed, to have the string start with "Ch", you (again) need to convert <start> into a string first, and then access its individual characters:

$ fandango fuzz -f persons-faker.fan -n 10 -c 'str(<start>)[0:2] == "Ch"'
Christina Ryan,80
Christopher Wilson,70
Charles Thomas,1
Chelsea Steele,67
Christina Ryan,3135
Christopher Lawrence,23
Charlene Gilbert,90
Christopher Miller,07
Christine Kelly,83
Christopher Lawrence,93

Fandango supports the full Python slice semantics:

  • An omitted first index defaults to zero, so <foo>[:2] returns the first two children.

  • An omitted second index defaults to the size of the string being sliced, so <foo>[2:] returns all children starting with <foo>[2].

  • Both the first and the second index can be negative again.

Selecting Children#

Referring to children by number, as in <foo>[0], can be a bit cumbersome. This is why in Fandango, you can also refer to elements by name.

The expression <foo>.<bar> allows accessing elements <bar> when they are a direct child of a symbol <foo>. This requires that <bar> occurs in the grammar rule defining <foo>:

<foo> ::= ...some expansion that has <bar>...

To refer to the <name> element as a direct child of a <first_name> element, you thus write <first_name>.<name>. This allows you to express the earlier constraint in a possibly more readable form:

$ fandango fuzz -f persons.fan -n 10 -c '<first_name>.<name>.endswith("x")'
Lbcx Qqcm,529
Ix Dbqpu,70345
Zx Mgdw,4722
Cmx Dfqa,42
Qgzcx Ytgv,7783
Nzrphx Yp,28944
Cmx Szr,42
Lbcx Xov,27713
Kjecax Vqee,583
Fljrx Vpvlt,284

Note

You can only access nonterminal children this way; <person_name>." " (the space in the <person_name>) gives an error.

Selecting Descendants#

Often, you want to refer to elements in a particular context set by the enclosing element. This is why in Fandango, you can also refer to descendants.

The expression <foo>..<bar> allows accessing elements <bar> when they are a descendant of a symbol <foo>. <bar> is a descendant of <foo> if

  • <bar> is a child of <foo>; or

  • one of <foo>’s children has <bar> as a descendant.

If that sounds like a recursive definition, that is because it is. A simpler way to think about <foo>..<bar> may be “All <bar>s that occur within <foo>”.

Let us take a look at some rules in our persons.fan example:

<first_name> ::= <name>
<last_name> ::= <name>
<name> ::= <ascii_uppercase_letter><ascii_lowercase_letter>+
<ascii_uppercase_letter> ::= "A" | "B" | "C" | ... | "Z"

To refer to all <ascii_uppercase_letter> element as descendant of a <first_name> element, you thus write <first_name>..<ascii_uppercase_letter>.

Hence, to make all uppercase letters X, but only as part of a first name, you may write

$ fandango fuzz -f persons.fan -n 10 -c '<first_name>..<ascii_uppercase_letter> == "X"'
Xf Uka,243
Xagb Wb,09004
Xxvm Oy,0977
Xl Nfyk,95
Xc Xpmzjv,17447
Xba Xvzx,30
Xqw Ybjumq,6524
Xgvbdp Trcvd,47
Xnuw Ncazdz,986
Xg Bwps,3752

Chains#

You can freely combine [], ., and .. into chains that select subtrees. What would the expression <start>[0].<last_name>..<ascii_lowercase_letter> refer to, for instance?

Let’s use this in a constraint:

$ fandango fuzz -f persons.fan -n 10 -c '<start>[0].<last_name>..<ascii_lowercase_letter> == "x"'
Xf Kx,88029
Dr Mxx,29
Qatn Txx,1
Rwkdc Wxxxxx,03
Enphit Txxxx,03
Coblx Zxx,3519
Cltox Sxxx,43
Nnnld Xxxxx,087
Oewli Rxxx,959
Gssg Vx,9

Quantifiers#

By default, whenever you use a symbol <foo> in a constraint, this constraint applies to all occurrences of <foo> in the produced output string. For your purposes, though, one such instance already may suffice. This is why Fandango allows expressing quantification in constraints.

Star Expressions#

Error

The * syntax is not operational yet.

In Fandango, you can prefix an element with * to obtain a collection of all these elements within an individual string. Hence, *<name> is a collection of all <name> elements within the generated string. This syntax can be used in a variety of ways. For instance, we can have a constraint check whether a particular element is in the collection:

"Pablo" in *<name>

This constraint evaluates to True if any of the values in *<name> (= one of the two <name> elements) is equal to "Pablo". *-expressions are mostly useful in quantifiers, which we discuss below.

Existential Quantification#

To express that within a particular scope, at least one instance of a symbol must satisfy a constraint, use a *-expression in combination with any():

any(CONSTRAINT for ELEM in *SCOPE)

where

  • SCOPE is a nonterminal (e.g. <age>);

  • ELEM is a Python variable; and

  • CONSTRAINT is a constraint over ELEM

Hence, the expression

any(n.startswith("A") for n in *<name>)

ensures there is at least one element <name> that starts with an “A”:

Let us decompose this expression for a moment:

  • The expression for n in *<name> lets Python iterate over *<name> (all <name> objects within a person)…

  • … and evaluate n.startswith("A") for each of them, resulting in a collection of Boolean values.

  • The Python function any(list) returns True if at least one element in list is True.

So, what we get is existential quantification:

$ fandango fuzz -f persons.fan -n 10 -c 'any(n.startswith("A") for n in *<name>)'
Aaczi Jptuqc,84
Atbbci Khulnw,8
Aa Ypi,1106
Aekfm Oeyzz,1
Aeklm Oeyzz,1
Atbbci Kuulnw,8
Amupozbrn Klviiney,0688481
Aarznrknwbp Ygbwj,241
Aubucqlrp Lo,0172788667
Al Faxtpoxl,45506

Universal Quantification#

Where there are existential quantifiers, there also are universal quantifiers. Here we use the Python all() function; all(list) evaluates to True only if all elements in list are True.

We use a *-expression in combination with all():

all(CONSTRAINT for ELEM in *SCOPE)

Hence, the expression

all(c == "a" for c in *<first_name>..<ascii_lowercase_letter>)

ensures that all sub-elements <ascii_lowercase_letter> in <first_name> have the value “a”.

Again, let us decompose this expression:

  • The expression for c in *<first_name>..<ascii_lowercase_letter> lets Python iterate over all <ascii_lowercase_letter> objects within <first_name>

  • … and evaluate c == "a" for each of them, resulting in a collection of Boolean values.

  • The Python function all(list) returns True if all elements in list are True.

So, what we get is universal quantification:

$ fandango fuzz -f persons.fan -n 10 -c 'all(c == "a" for c in *<first_name>..<ascii_lowercase_letter>)'
Aaaaaa Ewphn,14
Haaa Rvoa,34
Saaaaa Vaaf,548
Waaaa Jgzar,4
Eaa Fsi,91
Baaaa Ryfv,58
Qaa Axu,359
Baa Lqz,57849
Eaaaaa Uw,98671
Ca Exanmq,51

By default, all symbols are universally quantified within <start>, so a dedicated universal quantifier is only needed if you want to limit the scope to some sub-element. This is what we do here with <first_name>..<ascii_lowercase_letter>, limiting the scope to <first_name>.

Given the default universal quantification, you can actually achieve the same effect as above without all() and without a *. How?