Complex Input Structures

Complex Input Structures#

The context-free grammars that Fandango uses can specify very complex input formats. In particular, they allow specifying recursive inputs - that is, element types that can contain other elements of the same type again. In this chapter, we explore some typical patterns as they occur within grammars.

Recursive Inputs#

A textbook example of a recursive input formats is an arithmetic expression. Consider an operation such as addition (+). Its operands can be numbers (3 + 5), but can also be other expressions, as in 2 + (3 - 8). In a grammar for arithmetic expressions, this relationship is expressed as a rule

<expr> ::= <term> " + " <expr>

which indicates that the right-hand side of the addition + operator can be another expression. This is an example of a recursive grammar rule - a rule where an expansion refers back to the defined symbol.

Let us add a definition for <term>, too, defining it as a number:

<term> ::= <number>
<number>  ::= <digit>+
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

What is it that <expr> can now expand into? Have a look at the above rule first, and then check out the solution.

What we have here is a case of infinite recursion – the string would keep on expanding forever.

Warning

At this time, Fandango does not detect infinite recursions; it keeps running until manually stopped.

In order to avoid infinite recursion, we need to provide a non-recursive alternative, as in:

<expr> ::= <term> " + " <expr> | <term>

With this rule, we can now store the above definitions in a .fan file additions.fan and get Fandango running:

$ fandango fuzz -f additions.fan -n 10
1619 + 29 + 023
94887 + 79
95
19 + 4152
6 + 119 + 60 + 4711
028 + 0137
6466
150 + 059 + 23517 + 79 + 15
7387 + 288 + 83932 + 8
1788 + 77805

We see that the above rules yield nice (recursive) chains of additions.

Warning

For each recursion in the grammar, there must be a non-recursive alternative.

More Repetitions#

In our definition of <number>, above, we used the + operator to state that an element should be repeated:

<number> ::= <digit>+

Instead of using the + operator, though, we can also use a recursive rule. How would one do that?

Indeed, we can define a <number> as a <digit> that is followed by another “number”.

Hint

Using shorthands such as *, +, and ? can make Fandango parsers more efficient.

Arithmetic Expressions#

Let us put all of the above together into a grammar for arithmetic expressions. expr.fan defines additional operators (-, *, /), unary + and -, as well as subexpressions in parentheses. It also ensures the conventional order of operations, giving multiplication and division a higher rank than addition and subtraction. Hence, 2 * 3 + 4 gets interpreted as (2 * 3) + 4 (10), not 2 * (3 + 4) (14).

<start>   ::= <expr>
<expr>    ::= <term> " + " <expr> | <term> " - " <expr> | <term>
<term>    ::= <term> " * " <factor> | <term> " / " <factor> | <factor>
<factor>  ::= "+" <factor> | "-" <factor> | "(" <expr> ")" | <int>
<int>     ::= <digit>+

These are the expressions we get from this grammar:

$ fandango fuzz -f expr.fan -n 10
8
-+++++3592 + (07) / 93 / 17 + 0
-(+-+7 + 84 / 77 * 6 * 7 - 5) + 61
35 - ++++-74 / 01 * 4 + 95
--(1505 - 31) / 25 * 65 * 08 + 72
51 + 99 / -4 / (22) * 15
91 * 57 / 78 * 95 / 74 / 46 / 80 / 72 * 60 / 83 * 59 / 67 * 99 / 2 + 96
(--++-(9 - 29) + 39) + 90
(-(17) * 8) / 5 / 04 * 14 / 93 * 03 + 19
-26004 + (07) * 11 / 0 / 44 - 19

We see that the resulting expressions can become quite complex. If we had an arithmetic evaluation to test – say, from a programming language - these would all make a good start.

There are two ways we can still improve the grammar, though.

  1. First, many programming languages assign a special meaning to numbers starting with zero, so we’d like to get rid of these.

  2. Second, we only have integers so far. How would one go to extend the grammar with floating-point numbers?

Try this out for yourself by extending the above grammar.

The resulting grammar expr-float.fan produces all-valid numbers:

$ fandango fuzz -f expr-float.fan -n 10
(689 / 7) - 404.03 + 2
4107 / 607 * 2 + 9
((7 * 7 + 7) * 7 - 5) * 8 / 3 * 4 / 7 / 6 + 1
+--(191.45 * 6 / 5 / 4 + 4) * 9
880 + (9) * (2.11) + 4
((+(7 * 1 / 4 / 0 / 4 - 6)))
+1 / 6.2
(+4 / 92.97) * 8 * 6 * 0 * 6 + 5
75 / +910 * 1.89 / 9
(4483 * 770 / 0) - 6

With extra constraints, we can now have Fandango produce only expressions that satisfy further properties – say, evaluate to a value above 1000 in Python:

$ fandango fuzz -f expr-float.fan -n 10 -c 'eval(str(<start>)) > 1000'
748795 * 0.086 - 0
(7 * 6 * 4 - 8) * 6 * 2 * 8 / 4 * 3 / 5 * 9 / 7 - 1
+-52 * --+-120 + 6 - 4
4 * 84798 / +9 / 1 / 4 + 7
84716 - -+9 + 3 / 4 - 7
244602 - 155.10 * 6 / 2 * 9 + 9
++8128 * 182.86
+7369.79 + 1 / 7 * 7 * 5 + 1
+69602.217 * 6 * 7 * 4
(+43314 - 4) / 4 * 8 / 7 - 1

Note that some of these expressions raise divisions by zero errors:

ZeroDivisionError: float division by zero

In the next section, we’ll talk about accessing input elements in complex inputs, so we can impose further constraints on them.