Fandango Language Reference#
This chapter specifies the exact syntax (and semantics) of Fandango specifications (.fan
files).
General Structure#
We specify the Fandango syntax in form of a Fandango specification.
A .fan
Fandango specification file consists of
grammar productions (
<production>
)constraints (
<constraint>
)Python code (
<python_statement>
).
<start> ::= <fandango>
<fandango> ::= <statement>*
<statement> ::= <production> | <constraint> | <python_statement> | <newline> | <comment>
Whitespace#
Besides grammar productions, constraints, and code, a .fan
file can contain newlines (<newline>
) and whitespace (<_>
).
<newline> ::= ('\r'? '\n' | '\r' | '\f') := '\n'
<_> ::= r'[ \t]*' := ' '
Physical and Logical Lines#
As in Python, one can join two physical lines into a logical by adding a backslash \
at the end of the first line.
Grammars#
A grammar is a set of productions.
Each production defines a nonterminal followed by ::=
and a number of alternatives.
An optional generator can define a Python function to produce a value during fuzzing.
Productions end with a newline or a ;
character.
<production> ::= (
<nonterminal> <_> '::=' <_> <alternatives>
<generator>? <comment>? (';' | <newline>))
Nonterminals#
A nonterminal is a Python identifier enclosed in angle brackets <...>
.
It starts with a letter (regular expression \w
) or an underscore (_
), followed by more letters, digits (regular expression \d
), or underscores.
<nonterminal> ::= '<' <name> '>'
<name> ::= r'(\w|_)(\w|\d|_)*'
Like Python, Fandango allows all Unicode letters and digits in identifiers.
Note
For portability, we recommend to use only ASCII letters a
…z
, A
…Z
, digits 0
…9
, and underscores _
in identifiers.
Alternatives#
The alternatives part of a production rule defines possible expansions (<concatenation>
) for a nonterminal, separated by |
.
<alternatives> ::= <concatenation> (<_> '|' <_> <concatenation>)*
Concatenations#
A concatenation is a sequence of individual operators (typically symbols such as strings).
<concatenation> ::= <operator> (<_> <operator>)*
Symbols and Operators#
An operator is a symbol (<symbol>
), followed by an optional repetition operator.
<operator> ::= <symbol> | <kleene> | <plus> | <option> | <repeat>
A symbol can be a nonterminal, a string or bytes literal, a number (for bits), a generator call, or (parenthesized) alternatives.
<symbol> ::= (
<nonterminal>
| <string_literal>
| <bytes_literal>
| <number>
| <generator_call>
| '(' <alternatives> ')'
)
Repetitions#
Any symbol can be followed by a repetition specification.
The syntax {N,M}
stands for a number of repetitions from N
to M
.
For example, <a>{3,5}
will match from 3 to 5 <a>
symbols.
Both N
and M
can be omitted:
Omitting
N
creates a lower bound of zero.Omitting
M
creates an infinite upper bound (i.e, any number of repetitions).The comma may not be omitted, as this would create confusion with
{N}
(see below).
Tip
In Fandango, the number of repetitions is limited.
Use the --max-repetitions M
flag to change the limit.
Fandango supports a number of abbreviations for repetitions:
The form
{N}
stands for{N, N}
(exactlyN
repetitions)The form
*
stands for{0,}
(zero or more repetitions)The form
+
stands for{1,}
(one or more repetitions)The form
?
stands for{0,1}
(an optional element)
<kleene> ::= <symbol> '*'
<plus> ::= <symbol> '+'
<option> ::= <symbol> '?'
<repeat> ::= (
<symbol> '{' <python_expression> '}'
| <symbol> '{' <python_expression>? ',' <python_expression>? '}')
String Literals#
Fandango supports the full Python syntax for string literals:
Short strings are enclosed in single (
'...'
) or double ("..."
) quotesLong strings are enclosed in triple quotes (
'''...'''
and"""..."""
)One can use escape sequences (
\n
) to express special characters
Fandango interprets Python raw strings (using an r
prefix, as in r'foo'
) as regular expressions.
During parsing, these are matched against the input; during fuzzing, they are instantiated into a matching string.
<string_literal> ::= (
r'[rR]'
| r'[uU]'
| r'[fF]'
| r'[fF][rR]'
| r'[rR][fF]')? ( <short_string> | <long_string>)
<short_string> ::= (
"'" (<string_escape_seq> | r"[^\\\r\n\f']")* "'"
| '"' (<string_escape_seq> | r'[^\\\r\n\f"]')* '"')
<string_escape_seq> ::= '\\' r'.' | '\\' <newline>
<long_string> ::= (
"'''" <long_string_item>* "'''"
| '"""' <long_string_item>* '"""')
<long_string_item> ::= <long_string_char> | <string_escape_seq>
<long_string_char> ::= r'[^\\]'
Byte Literals#
Byte literals (a string prefixed with b
) are interpreted as in Python; the rules for strings apply as well.
If a grammar can produce bytes (or bits), the associated files will be read and written in
binary
mode, reading and writing bytes instead of (encoded) characters.If the grammar contains bytes and strings, then strings will be written in UTF-8 encoding into the binary file.
See the section on binary files for more details.
<bytes_literal> ::= (
r'[bB]'
| r'[bB][rR]'
| r'[rR][bB]') (<short_bytes> | <long_bytes>)
<short_bytes> ::= (
"'" (<short_bytes_char_no_single_quote> | <bytes_escape_seq>)* "'"
| '"' ( <short_bytes_char_no_double_quote> | <bytes_escape_seq> )* '"')
<bytes_escape_seq> ::= '\\' r'[\u0000-\u007F]'
<short_bytes_char_no_single_quote> ::= (
r'[\u0000-\u0009]'
| r'[\u000B-\u000C]'
| r'[\u000E-\u0026]'
| r'[\u0028-\u005B]'
| r'[\u005D-\u007F]')
<short_bytes_char_no_double_quote> ::= (
r'[\u0000-\u0009]'
| r'[\u000B-\u000C]'
| r'[\u000E-\u0021]'
| r'[\u0023-\u005B]'
| r'[\u005D-\u007F]')
<long_bytes> ::= (
"'''" <long_bytes_item>* "'''"
| '"""' <long_bytes_item>* '"""')
<long_bytes_item> ::= <long_bytes_char> | <bytes_escape_seq>
<long_bytes_char> ::= r'[\u0000-\u005B]' | r'[\u005D-\u007F]'
Numbers#
Grammars can contain numbers, which are interpreted as bits.
While the Fandango grammar supports arbitrary numbers, only the number literals 0
and 1
are supported (possibly with repetitions).
<number> ::= <integer> | <float_number> | <imag_number>
<integer> ::= <decimal_integer> | <oct_integer> | <hex_integer> | <bin_integer>
<decimal_integer> ::= <non_zero_digit> <digit>* | '0'+
<non_zero_digit> ::= r'[1-9]'
<digit> ::= r'[0-9]'
<oct_integer> ::= '0' r'[oO]' <oct_digit>+
<oct_digit> ::= r'[0-7]'
<hex_integer> ::= '0' r'[xX]' <hex_digit>+
<hex_digit> ::= r'[0-9a-fA-F]'
<bin_integer> ::= '0' r'[bB]' <bin_digit>+
<bin_digit> ::= r'[01]'
<float_number> ::= <point_float> | <exponent_float>
<point_float> ::= <int_part>? <fraction> | <int_part> '.'
<exponent_float> ::= ( <int_part> | <point_float> ) <exponent>
<int_part> ::= <digit>+
<fraction> ::= '.' <digit>+
<exponent> ::= r'[eE]' r'[+-]'? <digit>+
<imag_number> ::= (<float_number> | <int_part>) r'[jJ]'
Generators#
A generator is an expression that is evaluated during fuzzing to produce a value, which is then parsed into the given nonterminal.
See the section on generators for details.
It is added at the end of a production rule, separated by :=
.
<generator> ::= <_> ':=' <_> <python_expression>
Future Fandango versions will also support invoking a generator as if it were a symbol.
<generator_call> ::= (<name>
| <generator_call> '.' <name>
| <generator_call> '[' <python_slices> ']'
| <generator_call> <python_genexp>
| <generator_call> '(' <python_arguments>? ')')
Constraints#
A constraint is a Python expression that is to be satisfied during fuzzing and parsing.
It is introduced by the keyword where
.
Constraints typically contain symbols (<...>
); these are allowed wherever values are allowed.
The constraint has to hold for all values of the given symbols.
Symbols in constraints have a DerivationTree
type; see the Derivation Tree Reference for details.
<constraint> ::= 'where' <_> <python_expression> <comment>? (';' | <newline>)
Selectors#
Symbols in constraints can take the following special forms:
<A>.<B>
: The constraint has to hold for all elements<B>
that are a direct child of<A>
<A>..<B>
: The constraint has to hold for all elements<B>
that are a direct or indirect child of<A>
For details, see the section on derivation trees.
<selector> ::= (
<selection>
| <selector> '.' <selection>
| <selector> '..' <selection>)
The [...]
operator allows accessing individual children of a derivation tree:
<selection> ::= <base_selection> | <base_selection> '[' <python_slices> ']'
These operators can be combined and parenthesized:
<base_selection> ::= <nonterminal> | '(' <selector> ')'
Python Code#
In a .fan
file, anything that is neither a grammar production rule nor a constraint is interpreted as Python code, parsed as <statement>
in the official Python grammar.
Also, in the above spec, any nonterminal in the form <python_NAME>
(say, <python_expression>
) refers to <NAME>
(say, <expression>
) in the official Python grammar.
For more details on Python syntax and semantics, consult the Python language reference.
The Full Spec#
You can access the above spec fandango.fan
for reference.
fandango.fan
is sufficient for parsing .fan
input without Python expressions or code:
$ echo '<start> ::= "a" | "b" | "c"' | fandango parse -f fandango.fan -o -
<start> ::= "a" | "b" | "c"
To complete the grammar, fandango.fan
provides placeholders for included Python elements:
<python_statement> ::= 'pass' <newline>
<python_slices> ::= '0:1'
<python_arguments> ::= '1'
<python_expression> ::= '1' | <selector>
<python_genexp> ::= '[for' <_> <name> <_> 'in' <_> <name> ':' <_> <python_expression> ']'
Hence, it is also possible to produce Fandango specs (with set Python code) using fandango.fan
.
Hence, Fandango can be fuzzed with itself:
$ fandango fuzz -f fandango.fan -n 1
# 17
<ÎïKÍ7AÚgìådgxDSXû> ::= (<ÂuäMºqÞ> <o> | Z <0noÌ>){1} | <ùnäHÞmNahAD¼8bº8Và> <Õ_MÐSÕìqßn4ÚÌ0çSr> # +
;
Note that such generated files satisfy the Fandango syntax, but not its semantics. For instance, one would have to add extra constraints such that all used nonterminals are defined.
Comments#
Comments are as in Python; they are introduced by a
#
character and continue until the end of the line. By convention, there are two spaces in front of the#
and one space after.Note
The actual implementation allows a comment at any end of a line.