Complex Input Structures#
The context-free grammars that Fandango uses can specify very complex input formats. In particular, they allow specifying recursive inputs - that is, element types that can contain other elements of the same type again. In this chapter, we explore some typical patterns as they occur within grammars.
Recursive Inputs#
A textbook example of a recursive input formats is an arithmetic expression.
Consider an operation such as addition (+
).
Its operands can be numbers (3 + 5
), but can also be other expressions, as in 2 + (3 - 8)
.
In a grammar for arithmetic expressions, this relationship is expressed as a rule
<expr> ::= <term> " + " <expr>
which indicates that the right-hand side of the addition +
operator can be another expression.
This is an example of a recursive grammar rule - a rule where an expansion refers back to the defined symbol.
Let us add a definition for <term>
, too, defining it as a number:
<term> ::= <number>
<number> ::= <digit>+
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
What is it that <expr>
can now expand into?
Have a look at the above rule first, and then check out the solution.
Solution
Actually, the above rules mean that <expr>
would expand into an infinitely long string <term> + <term> + <term> + ...
What we have here is a case of infinite recursion – the string would keep on expanding forever.
Warning
At this time, Fandango does not detect infinite recursions; it keeps running until manually stopped.
In order to avoid infinite recursion, we need to provide a non-recursive alternative, as in:
<expr> ::= <term> " + " <expr> | <term>
With this rule, we can now store the above definitions in a .fan
file additions.fan and get Fandango running:
$ fandango fuzz -f additions.fan -n 10
1619 + 29 + 023
94887 + 79
95
19 + 4152
6 + 119 + 60 + 4711
028 + 0137
6466
150 + 059 + 23517 + 79 + 15
7387 + 288 + 83932 + 8
1788 + 77805
We see that the above rules yield nice (recursive) chains of additions.
Warning
For each recursion in the grammar, there must be a non-recursive alternative.
More Repetitions#
In our definition of <number>
, above, we used the +
operator to state that an element should be repeated:
<number> ::= <digit>+
Instead of using the +
operator, though, we can also use a recursive rule.
How would one do that?
Solution
Here’s an equivalent <number>
definition that comes without +
:
<number> ::= <digit> <number> | <digit>
Indeed, we can define a <number>
as a <digit>
that is followed by another “number”.
Hint
Using shorthands such as *
, +
, and ?
can make Fandango parsers more efficient.
Arithmetic Expressions#
Let us put all of the above together into a grammar for arithmetic expressions.
expr.fan defines additional operators (-
, *
, /
), unary +
and -
, as well as subexpressions in parentheses.
It also ensures the conventional order of operations, giving multiplication and division a higher rank than addition and subtraction.
Hence, 2 * 3 + 4
gets interpreted as (2 * 3) + 4
(10), not 2 * (3 + 4)
(14).
<start> ::= <expr>
<expr> ::= <term> " + " <expr> | <term> " - " <expr> | <term>
<term> ::= <term> " * " <factor> | <term> " / " <factor> | <factor>
<factor> ::= "+" <factor> | "-" <factor> | "(" <expr> ")" | <int>
<int> ::= <digit>+
These are the expressions we get from this grammar:
$ fandango fuzz -f expr.fan -n 10
8
-+++++3592 + (07) / 93 / 17 + 0
-(+-+7 + 84 / 77 * 6 * 7 - 5) + 61
35 - ++++-74 / 01 * 4 + 95
--(1505 - 31) / 25 * 65 * 08 + 72
51 + 99 / -4 / (22) * 15
91 * 57 / 78 * 95 / 74 / 46 / 80 / 72 * 60 / 83 * 59 / 67 * 99 / 2 + 96
(--++-(9 - 29) + 39) + 90
(-(17) * 8) / 5 / 04 * 14 / 93 * 03 + 19
-26004 + (07) * 11 / 0 / 44 - 19
We see that the resulting expressions can become quite complex. If we had an arithmetic evaluation to test – say, from a programming language - these would all make a good start.
There are two ways we can still improve the grammar, though.
First, many programming languages assign a special meaning to numbers starting with zero, so we’d like to get rid of these.
Second, we only have integers so far. How would one go to extend the grammar with floating-point numbers?
Try this out for yourself by extending the above grammar.
Solution
To get rid of numbers starting with 0
, we can introduce a <lead_digit>
:
<int> ::= <digit> | <lead_digit> <digits>
<lead_digit> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
Note that the option of having a single 0
as an <int>
is still there, as a <digit>
can expand into any digit.
To include floating-point numbers, we can add a <float>
element:
<float> ::= <int> "." <digits>
Feel free to further extend this - say, with an optional exponent, or by making the <int>
optional (.001
).
The resulting grammar expr-float.fan
produces all-valid numbers:
$ fandango fuzz -f expr-float.fan -n 10
(689 / 7) - 404.03 + 2
4107 / 607 * 2 + 9
((7 * 7 + 7) * 7 - 5) * 8 / 3 * 4 / 7 / 6 + 1
+--(191.45 * 6 / 5 / 4 + 4) * 9
880 + (9) * (2.11) + 4
((+(7 * 1 / 4 / 0 / 4 - 6)))
+1 / 6.2
(+4 / 92.97) * 8 * 6 * 0 * 6 + 5
75 / +910 * 1.89 / 9
(4483 * 770 / 0) - 6
With extra constraints, we can now have Fandango produce only expressions that satisfy further properties – say, evaluate to a value above 1000 in Python:
$ fandango fuzz -f expr-float.fan -n 10 -c 'eval(str(<start>)) > 1000'
748795 * 0.086 - 0
(7 * 6 * 4 - 8) * 6 * 2 * 8 / 4 * 3 / 5 * 9 / 7 - 1
+-52 * --+-120 + 6 - 4
4 * 84798 / +9 / 1 / 4 + 7
84716 - -+9 + 3 / 4 - 7
244602 - 155.10 * 6 / 2 * 9 + 9
++8128 * 182.86
+7369.79 + 1 / 7 * 7 * 5 + 1
+69602.217 * 6 * 7 * 4
(+43314 - 4) / 4 * 8 / 7 - 1
Note that some of these expressions raise divisions by zero errors:
ZeroDivisionError: float division by zero
In the next section, we’ll talk about accessing input elements in complex inputs, so we can impose further constraints on them.