Fandango Language Reference#

This chapter specifies the exact syntax (and semantics) of Fandango specifications (.fan files).

General Structure#

We specify the Fandango syntax in form of a Fandango specification.

A .fan Fandango specification file consists of

<start> ::= <fandango>
<fandango> ::= <statement>*
<statement> ::= <production> | <constraint> | <python_statement> | <newline> | <comment>

Whitespace#

Besides grammar productions, constraints, and code, a .fan file can contain newlines (<newline>) and whitespace (<_>).

<newline> ::= ('\r'? '\n' | '\r' | '\f') := '\n'
<_> ::= r'[ \t]*' := ' '

Physical and Logical Lines#

As in Python, one can join two physical lines into a logical by adding a backslash \ at the end of the first line.

Comments#

Comments are as in Python; they are introduced by a # character and continue until the end of the line. By convention, there are two spaces in front of the # and one space after.

<comment> ::= <_>{2} '#' <_> r'[^\r\n\f]'* <newline>

Note

The actual implementation allows a comment at any end of a line.

Grammars#

A grammar is a set of productions. Each production defines a nonterminal followed by ::= and a number of alternatives.

An optional generator can define a Python function to produce a value during fuzzing.

Productions end with a newline or a ; character.

<production> ::= (
    <nonterminal> <_> '::=' <_> <alternatives> 
    <generator>? <comment>? (';' | <newline>))

Nonterminals#

A nonterminal is a Python identifier enclosed in angle brackets <...>. It starts with a letter (regular expression \w) or an underscore (_), followed by more letters, digits (regular expression \d), or underscores.

<nonterminal> ::= '<' <name> '>'
<name> ::= r'(\w|_)(\w|\d|_)*'

Like Python, Fandango allows all Unicode letters and digits in identifiers.

Note

For portability, we recommend to use only ASCII letters az, AZ, digits 09, and underscores _ in identifiers.

Alternatives#

The alternatives part of a production rule defines possible expansions (<concatenation>) for a nonterminal, separated by |.

<alternatives> ::= <concatenation> (<_> '|' <_> <concatenation>)*

Concatenations#

A concatenation is a sequence of individual operators (typically symbols such as strings).

<concatenation> ::= <operator> (<_> <operator>)*

Symbols and Operators#

An operator is a symbol (<symbol>), followed by an optional repetition operator.

<operator> ::= <symbol> | <kleene> | <plus> | <option> | <repeat>

A symbol can be a nonterminal, a string or bytes literal, a number (for bits), a generator call, or (parenthesized) alternatives.

<symbol> ::= (
      <nonterminal>
    | <string_literal>
    | <bytes_literal>
    | <number>
    | <generator_call>
    | '(' <alternatives> ')'
    )

Repetitions#

Any symbol can be followed by a repetition specification. The syntax {N,M} stands for a number of repetitions from N to M. For example, <a>{3,5} will match from 3 to 5 <a> symbols.

Both N and M can be omitted:

  • Omitting N creates a lower bound of zero.

  • Omitting M creates an infinite upper bound (i.e, any number of repetitions).

  • The comma may not be omitted, as this would create confusion with {N} (see below).

Tip

In Fandango, the number of repetitions is limited. Use the --max-repetitions M flag to change the limit.

Fandango supports a number of abbreviations for repetitions:

  • The form {N} stands for {N, N} (exactly N repetitions)

  • The form * stands for {0,} (zero or more repetitions)

  • The form + stands for {1,} (one or more repetitions)

  • The form ? stands for {0,1} (an optional element)

<kleene> ::= <symbol> '*'
<plus>  ::= <symbol> '+'
<option> ::= <symbol> '?'
<repeat> ::= (
     <symbol> '{' <python_expression> '}'
   | <symbol> '{' <python_expression>? ',' <python_expression>? '}')

String Literals#

Fandango supports the full Python syntax for string literals:

  • Short strings are enclosed in single ('...') or double ("...") quotes

  • Long strings are enclosed in triple quotes ('''...''' and """...""")

  • One can use escape sequences (\n) to express special characters

Fandango interprets Python raw strings (using an r prefix, as in r'foo') as regular expressions. During parsing, these are matched against the input; during fuzzing, they are instantiated into a matching string.

<string_literal> ::= (
      r'[rR]'
    | r'[uU]'
    | r'[fF]'
    | r'[fF][rR]'
    | r'[rR][fF]')? ( <short_string> | <long_string>)
<short_string> ::= (
      "'" (<string_escape_seq> | r"[^\\\r\n\f']")* "'"
    | '"' (<string_escape_seq> | r'[^\\\r\n\f"]')* '"')
<string_escape_seq> ::= '\\' r'.' | '\\' <newline>
<long_string> ::= (
      "'''" <long_string_item>* "'''"
    | '"""' <long_string_item>* '"""')

<long_string_item> ::= <long_string_char> | <string_escape_seq>
<long_string_char> ::= r'[^\\]'

Byte Literals#

Byte literals (a string prefixed with b) are interpreted as in Python; the rules for strings apply as well.

  • If a grammar can produce bytes (or bits), the associated files will be read and written in binary mode, reading and writing bytes instead of (encoded) characters.

  • If the grammar contains bytes and strings, then strings will be written in UTF-8 encoding into the binary file.

  • See the section on binary files for more details.

<bytes_literal> ::= (
      r'[bB]'
    | r'[bB][rR]'
    | r'[rR][bB]') (<short_bytes> | <long_bytes>)
<short_bytes> ::= (
      "'" (<short_bytes_char_no_single_quote> | <bytes_escape_seq>)* "'"
    | '"' ( <short_bytes_char_no_double_quote> | <bytes_escape_seq> )* '"')
<bytes_escape_seq> ::= '\\' r'[\u0000-\u007F]'
<short_bytes_char_no_single_quote> ::= (
      r'[\u0000-\u0009]'
    | r'[\u000B-\u000C]'
    | r'[\u000E-\u0026]'
    | r'[\u0028-\u005B]'
    | r'[\u005D-\u007F]')

<short_bytes_char_no_double_quote> ::= (
      r'[\u0000-\u0009]'
    | r'[\u000B-\u000C]'
    | r'[\u000E-\u0021]'
    | r'[\u0023-\u005B]'
    | r'[\u005D-\u007F]')
<long_bytes> ::= (
      "'''" <long_bytes_item>* "'''"
    | '"""' <long_bytes_item>* '"""')

<long_bytes_item> ::= <long_bytes_char> | <bytes_escape_seq>
<long_bytes_char> ::= r'[\u0000-\u005B]' | r'[\u005D-\u007F]'

Numbers#

Grammars can contain numbers, which are interpreted as bits. While the Fandango grammar supports arbitrary numbers, only the number literals 0 and 1 are supported (possibly with repetitions).

<number> ::= <integer> | <float_number> | <imag_number>
<integer> ::= <decimal_integer> | <oct_integer> | <hex_integer> | <bin_integer>
<decimal_integer> ::= <non_zero_digit> <digit>* | '0'+
<non_zero_digit> ::= r'[1-9]'
<digit> ::= r'[0-9]'
<oct_integer> ::= '0' r'[oO]' <oct_digit>+
<oct_digit> ::= r'[0-7]'

<hex_integer> ::= '0' r'[xX]' <hex_digit>+
<hex_digit> ::= r'[0-9a-fA-F]'

<bin_integer> ::= '0' r'[bB]' <bin_digit>+
<bin_digit> ::= r'[01]'
<float_number> ::= <point_float> | <exponent_float>
<point_float> ::= <int_part>? <fraction> | <int_part> '.'
<exponent_float> ::= ( <int_part> | <point_float> ) <exponent>
<int_part> ::= <digit>+
<fraction> ::= '.' <digit>+
<exponent> ::= r'[eE]' r'[+-]'? <digit>+

<imag_number> ::= (<float_number> | <int_part>) r'[jJ]'

Generators#

A generator is an expression that is evaluated during fuzzing to produce a value, which is then parsed into the given nonterminal. See the section on generators for details. It is added at the end of a production rule, separated by :=.

<generator> ::= <_> ':=' <_> <python_expression>

Future Fandango versions will also support invoking a generator as if it were a symbol.

<generator_call> ::= (<name>
    | <generator_call> '.' <name>
    | <generator_call> '[' <python_slices> ']'
    | <generator_call> <python_genexp>
    | <generator_call> '(' <python_arguments>? ')')

Constraints#

A constraint is a Python expression that is to be satisfied during fuzzing and parsing. It is introduced by the keyword where.

Constraints typically contain symbols (<...>); these are allowed wherever values are allowed. The constraint has to hold for all values of the given symbols.

Symbols in constraints have a DerivationTree type; see the Derivation Tree Reference for details.

<constraint> ::= 'where' <_> <python_expression> <comment>? (';' | <newline>)

Selectors#

Symbols in constraints can take the following special forms:

  • <A>.<B>: The constraint has to hold for all elements <B> that are a direct child of <A>

  • <A>..<B>: The constraint has to hold for all elements <B> that are a direct or indirect child of <A>

For details, see the section on derivation trees.

<selector> ::= (
      <selection>
    | <selector> '.' <selection>
    | <selector> '..' <selection>)

The [...] operator allows accessing individual children of a derivation tree:

<selection> ::= <base_selection> | <base_selection> '[' <python_slices> ']'

These operators can be combined and parenthesized:

<base_selection> ::= <nonterminal> | '(' <selector> ')'

Python Code#

In a .fan file, anything that is neither a grammar production rule nor a constraint is interpreted as Python code, parsed as <statement> in the official Python grammar.

Also, in the above spec, any nonterminal in the form <python_NAME> (say, <python_expression>) refers to <NAME> (say, <expression>) in the official Python grammar.

For more details on Python syntax and semantics, consult the Python language reference.

The Full Spec#

You can access the above spec fandango.fan for reference.

fandango.fan is sufficient for parsing .fan input without Python expressions or code:

$ echo '<start> ::= "a" | "b" | "c"' | fandango parse -f fandango.fan -o -
<start> ::= "a" | "b" | "c"

To complete the grammar, fandango.fan provides placeholders for included Python elements:

<python_statement> ::= 'pass' <newline>
<python_slices> ::= '0:1'
<python_arguments> ::= '1'
<python_expression> ::= '1' | <selector>
<python_genexp> ::= '[for' <_> <name> <_> 'in' <_> <name> ':' <_> <python_expression> ']'

Hence, it is also possible to produce Fandango specs (with set Python code) using fandango.fan. Hence, Fandango can be fuzzed with itself:

$ fandango fuzz -f fandango.fan -n 1
  # 17
<ÎïKÍ7AÚgìådgxDSXû> ::= (<ÂuäMºqÞ> <o> | Z <0noÌ>){1} | <ùnäHÞmNahAD¼8bº8Và> <Õ_MÐSÕìqßn4ÚÌ0çSr>  # +
;

Note that such generated files satisfy the Fandango syntax, but not its semantics. For instance, one would have to add extra constraints such that all used nonterminals are defined.