Checking Outputs

Since Fandango makes use of specifications both to produce and to parse strings, it can actually combine both to

first send an input to a program under test; and
then parse its output to check if it produced the correct result.

For this purpose, Fandango provides a means to combine both input and output in a single specification, used by the Fandango talk command. Let us see how this works.

Under Construction

Checking outputs is currently in beta. Check out the list of open issues.

Interaction Testing#

So far, we have only considered two settings. In fuzzing, Fandango sends a synthesized input to the program under test:

        sequenceDiagram
    Fandango->>Program under Test: Some input

During parsing, Fandango accepts and processes outputs from the program under test:

        sequenceDiagram
    Program under Test-->>Fandango: Some output

What we want, though, is an interaction - a means to first send an input to the program, and then parse its output to check it:

        sequenceDiagram
    Fandango->>Program under Test: Some input
    Program under Test-->>Fandango: Some output

For this, we need to specify which parts of the interaction are supposed to be sent or received by which party in the interaction.

Simple Input/Output Testing#

Fandango offers a simple means to combine inputs and outputs in a single .fan specification. The key idea is to associate individual nonterminals with the party that is supposed to produce them. In Fandango, this is done by prefixing the nonterminal with a party name (an identifier), followed by a colon (:). Hence, a nonterminal <fandango:string> refers to a <string> element that would be produced by a party named fandango (whoever that might be).

Fandango conveniently defines two standard parties:

In refers to the standard input of the program under test; and
Out refers to the standard output of the program under test.

Hence, in a Fandango spec, <In:id> refers to an <id> element that is received (or input) by the program, and <Out:result> is a <result> element that is sent (or output) by the program.

Important

Remember that In and Out describe the interaction from the perspective of the program under test.

With this, we can already write a first specification.

The UNIX cat command accepts some input, and outputs this very input unchanged. In Fandango, this interaction can be described in a file cat.fan as follows:

<start> ::= <In:input> <Out:output>
<input> ::= <line>
<output> ::= <line>
<line> ::= r'.*\n'

In this specification,

<input> and <output> define the inputs and outputs of cat, respectively, as a <line>; and
<line> defines a regular expression standing for any sequence of characters followed by a newline.
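To see what the `<line>` pattern accepts, here is a minimal sketch that checks a few strings against the same regular expression. It assumes that the pattern behaves as it does in Python's `re` module; the helper name `is_line` is made up for this illustration.

```python
import re

# Same pattern as the <line> rule in cat.fan.
LINE = re.compile(r'.*\n')


def is_line(text: str) -> bool:
    """Return True if `text` as a whole matches the <line> pattern."""
    return LINE.fullmatch(text) is not None


print(is_line("hello\n"))   # True: characters followed by a newline
print(is_line("\n"))        # True: an empty line
print(is_line("hello"))     # False: the trailing newline is missing
print(is_line("a\nb\n"))    # False: '.' does not match the inner newline
```

A single `<line>` thus covers exactly one line of text, including its terminating newline.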

Let us use Fandango with this spec to test the cat program.

The Fandango `talk` command#

Fandango provides a talk command that allows testing interactions. Like fuzz and parse, it takes as argument a -f option, followed by a .fan file; however, this one must contain party specifications. The remainder of the command line is the program to be tested (possibly with arguments).

In our case, this is what the invocation of Fandango looks like:

$ fandango talk -f cat.fan cat

This command does not issue any outputs (all of them are being sent to cat), but here is what is happening behind the scenes:

Fandango produces an input matching <input> and sends it to the standard input of cat;
The cat command sends back the input via its output; and
Fandango receives and parses the cat output.
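The steps above can be sketched in plain Python. This is not how Fandango is implemented; it is only an illustration of one send-and-parse round using the standard `subprocess` module. The names `talk_once` and `ECHO` are hypothetical, and `ECHO` is a small Python stand-in for `cat` so that the sketch also runs on systems without a `cat` binary.

```python
import re
import subprocess
import sys

# The <line> pattern from cat.fan.
LINE = re.compile(r'.*\n')

# Stand-in for `cat` (an assumption for portability): copies stdin to stdout.
ECHO = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]


def talk_once(command: list[str], message: str, timeout: float = 5.0) -> str:
    """Send `message` to the program's stdin and return its stdout.

    Raises ValueError if the output does not parse as a single <line>.
    """
    completed = subprocess.run(
        command, input=message, capture_output=True,
        text=True, timeout=timeout, check=True,
    )
    if LINE.fullmatch(completed.stdout) is None:
        raise ValueError(f"unexpected output: {completed.stdout!r}")
    return completed.stdout


print(repr(talk_once(ECHO, "hello\n")))
```

Replacing `ECHO` with `["cat"]` would run the same round against the real `cat` command.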

We can also specify multiple interactions, as in

$ fandango talk -f cat.fan -n 10 cat

Now, each time, cat is started anew, as shown in this diagram:

        sequenceDiagram
    Fandango->>cat: "eroih&^%^32"
    cat-->>Fandango: "eroih&^%^32"
    Fandango->>cat2: "0[9481]^^^\n\n"
    cat2-->>Fandango: "0[9481]^^^\n\n"
    Fandango->>cat3: "ewifehfba"
    cat3-->>Fandango: "ewifehfba"

Note

Once a communication party is set for a nonterminal, it need not be repeated for its constituents. In the above example, we can define <input> as <line> without restating the In: prefix; from the first line, it is clear that <input> comes from In. Also, this allows multiple parties to share the same elements (such as <line>).

Oracles#

So far, our .fan specification has not really checked whether cat operates correctly. It does check the cat output against the <line> regular expression - but that is a “match-all” expression, meaning that any line is valid.

To check whether the cat output is correct, we must compare it against the input we sent and ensure that input and output are identical. For this, constraints are the ideal tool, as they allow us to reference arbitrary elements in the entire interaction. In our case, this simple constraint would suffice:

str(<input>) == str(<output>)

This constraint defines the full behavior of cat; it acts as an oracle that determines whether the behavior of the program under test is correct or not.

Let us add this constraint using a where clause to cat.fan, resulting in cat-oracle.fan:

<start> ::= <In:input> <Out:output>
<input> ::= <lines>
<output> ::= <lines>
<lines> ::= <line>+
<line> ::= r'.*\n'
where str(<input>) == str(<output>)

Again, we can run the test; if cat behaves correctly, nothing should happen.

$ fandango talk -f cat-oracle.fan cat

So far, we have mostly seen constraints as a precondition - that is, a condition that makes inputs valid in the first place. Our constraint here acts as a postcondition – that is, a condition that checks the output, possibly based on earlier input features.
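The role of a postcondition can be illustrated with a small Python sketch. It is independent of Fandango: programs under test are modeled as plain Python functions, inputs are generated by a hypothetical helper `random_lines` that follows the `<lines>` rule, and `failing_inputs` applies the oracle `input == output` after each run. All names here are assumptions made for this example.

```python
import random
import string
from typing import Callable


def random_lines(rng: random.Random, max_lines: int = 3, max_len: int = 8) -> str:
    """Produce a string matching <lines> ::= <line>+ (one or more lines)."""
    alphabet = string.ascii_letters + string.digits + " "
    lines = []
    for _ in range(rng.randint(1, max_lines)):
        length = rng.randint(0, max_len)
        lines.append("".join(rng.choice(alphabet) for _ in range(length)) + "\n")
    return "".join(lines)


def failing_inputs(program: Callable[[str], str], runs: int = 20, seed: int = 0) -> list[str]:
    """Return all generated inputs for which the postcondition is violated."""
    rng = random.Random(seed)
    failures = []
    for _ in range(runs):
        sent = random_lines(rng)
        received = program(sent)
        if sent != received:  # the oracle: output must equal input
            failures.append(sent)
    return failures


def good_cat(text: str) -> str:
    """Correct stand-in for cat."""
    return text


def appending_cat(text: str) -> str:
    """Deliberately faulty stand-in: appends an extra blank line."""
    return text + "\n"


print(len(failing_inputs(good_cat)))               # 0
print(len(failing_inputs(appending_cat, runs=5)))  # 5
```

Note that the output of `appending_cat` still matches `<lines>`; only the oracle detects that it is wrong.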

More Complex Interactions#

Let us now test a program whose interaction scheme is a bit more complex. The UNIX bc command accepts a line with an arithmetic expression, and then produces the result. It keeps on doing so until the input ends. To compute 2 + 2, we can enter

$ bc
>>> 2 + 2
4
>>> (Ctrl-D)

Here, >>> is the prompt of the bc program; it goes to stderr and is only produced in interactive settings, so we can ignore it. A typical interaction between Fandango and bc would thus look like this:

        sequenceDiagram
    Fandango->>bc: 2 + 2\n
    bc-->>Fandango: 4\n
    Fandango->>bc: 3 * 7\n
    bc-->>Fandango: 21\n
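Such a multi-round exchange with a single running process can be sketched in Python as follows. This only illustrates the protocol shown in the diagram, not Fandango's implementation. The function `converse` is hypothetical, and `CALCULATOR` is a small Python stand-in for `bc` (an assumption, so that the sketch runs without `bc` installed) that answers each input line with one output line.

```python
import subprocess
import sys

# Stand-in for bc: reads one expression per line and prints its value.
CALCULATOR = [
    sys.executable, "-u", "-c",
    "import sys\nfor line in sys.stdin:\n    print(eval(line), flush=True)",
]


def converse(command: list[str], messages: list[str]) -> list[str]:
    """Send each message (a full line) and read one reply line before the next.

    There is no timeout here: if the program does not answer, readline() blocks.
    """
    replies = []
    with subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    ) as process:
        for message in messages:
            process.stdin.write(message)
            process.stdin.flush()  # make sure the program sees the line now
            replies.append(process.stdout.readline())
    return replies


print(converse(CALCULATOR, ["2 + 2\n", "3 * 7\n"]))
```

Unlike the `cat` example, the process stays alive between messages, so each reply has to be read before the next input is sent.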

Let us define such an interaction with bc. Using our earlier expression grammar expr.fan, we can define it in a .fan spec bc.fan:

include('expr.fan')

<start> ::= <interaction>
<interaction> ::= <In:input> <Out:output>
<input> ::= <expr> '\n'
<output> ::= <int> '\n'

We see that the <input> now is an expression; and the expected <output> is an integer. This is how we can test bc:

$ fandango talk -f bc.fan bc

Our .fan spec checks that bc indeed produces integers, but it does not check whether the result is correct, too. How would one do this? (Hint: use the Python eval() function.)

Solution

Add a constraint that evaluates the expression (in Python) and compares it against the bc result.

where eval(str(<input>)) == int(<output>)

If we actually do this, we will find that there are a few differences between the way that Python and bc interpret expressions:

$ fandango talk -f bc.fan -n 1 -c 'eval(str(<input>)) == int(<output>)' bc

FandangoFailedError: (FandangoFailedError(...), 'Timed out while waiting for message from remote party. Expected message from party: Out')

To ensure complete testing, we need to

avoid + and - prefixes; these are not understood by bc;
avoid leading zeros in numbers; these are not permitted in Python;
allow small differences between floating point numbers, or restrict ourselves to integer operations.

Right now, we leave this as an exercise to the reader :-)
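As a possible starting point for this exercise, here is a sketch of the Python side of such an oracle, restricted to integer arithmetic. It rests on two assumptions: the expressions use only integers, `+`, `-`, `*`, `/`, and parentheses; and bc, at its default scale of 0, truncates division results toward zero (whereas Python's `//` rounds down). The function names are made up, and the sketch does not address which prefixes bc itself accepts.

```python
import ast
import re


def strip_leading_zeros(expr: str) -> str:
    """Rewrite integers such as 007 to 7, since Python rejects leading zeros."""
    return re.sub(r'(?<![\d.])0+(?=\d)', '', expr)


def trunc_div(a: int, b: int) -> int:
    """Integer division truncating toward zero (assumed bc behavior at scale 0)."""
    quotient = abs(a) // abs(b)  # raises ZeroDivisionError if b == 0
    return quotient if (a >= 0) == (b >= 0) else -quotient


def evaluate(node: ast.AST) -> int:
    """Evaluate an integer-only arithmetic syntax tree."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if (isinstance(node, ast.Constant) and isinstance(node.value, int)
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
        value = evaluate(node.operand)
        return -value if isinstance(node.op, ast.USub) else value
    if isinstance(node, ast.BinOp):
        left, right = evaluate(node.left), evaluate(node.right)
        if isinstance(node.op, ast.Add):
            return left + right
        if isinstance(node.op, ast.Sub):
            return left - right
        if isinstance(node.op, ast.Mult):
            return left * right
        if isinstance(node.op, ast.Div):
            return trunc_div(left, right)
    raise ValueError(f"unsupported expression element: {ast.dump(node)}")


def expected_bc_result(expr: str) -> int:
    """Compute the integer value we would expect bc to print for `expr`."""
    tree = ast.parse(strip_leading_zeros(expr.strip()), mode="eval")
    return evaluate(tree)


print(expected_bc_result("7 / 2"))      # 3
print(expected_bc_result("-7 / 2"))     # -3 (Python's -7 // 2 would be -4)
print(expected_bc_result("007 + 1\n"))  # 8
```

A constraint could then compare `expected_bc_result(str(<input>))` against `int(<output>)`, provided the function is defined in the .fan file; whether this suffices depends on the expressions that expr.fan actually produces.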

Testing Strategies#

If you find that checking results is complicated, welcome to the world of testing! Specifically, you have just encountered the oracle problem - the effort in specifying what a correct result should be. While Fandango makes it easy to produce inputs and to decompose outputs, the burden of specification is still on you.

Here are some established techniques to ease the oracle problem:

Compare against a different implementation. By using Python eval(), above, we already make our lives much easier. However, we could also compare against, say, a different bc implementation. This is called differential testing.
Compare against a different version. After having made a change to a program, we can check it against an older version to make sure there are no unexpected changes in behavior. This is called regression testing.
Compare the result of equivalent inputs. Send two inputs to a program that should produce the same result and check for differences. In the case of bc, for instance, any term <a> + <b> should yield the same result as <b> + <a>. This is called metamorphic testing.
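To make the last technique concrete, here is a minimal sketch of a metamorphic check for commutativity of addition. It is not a Fandango feature; the program under test is modeled as any Python function mapping an expression string to a result string, and all names (`commutativity_violations`, `python_calculator`, `broken_calculator`) are hypothetical.

```python
from typing import Callable

# A program under test, modeled as: expression string -> result string.
Evaluator = Callable[[str], str]


def commutativity_violations(
    evaluate: Evaluator, pairs: list[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Return all (a, b) for which 'a + b' and 'b + a' yield different results."""
    violations = []
    for a, b in pairs:
        if evaluate(f"{a} + {b}") != evaluate(f"{b} + {a}"):
            violations.append((a, b))
    return violations


def python_calculator(expr: str) -> str:
    """Stand-in for a correct program; eval() is used on fixed inputs only."""
    return str(eval(expr))


def broken_calculator(expr: str) -> str:
    """Deliberately faulty stand-in: returns the first operand unchanged."""
    return expr.split()[0]


PAIRS = [("2", "3"), ("4", "4"), ("10", "7")]
print(commutativity_violations(python_calculator, PAIRS))  # []
print(commutativity_violations(broken_calculator, PAIRS))  # [('2', '3'), ('10', '7')]
```

The check never needs to know the correct sum; it only compares two results of the same program. To apply it to bc, `evaluate` could wrap a call that sends the expression to bc and returns its output.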

Under Construction

Future versions of this tutorial will further detail these strategies and how to integrate them into Fandango.

Troubleshooting Interactions#

Since interactions are always being sent to some party, and since the party outputs are being processed by Fandango, it may not always be easy to track which data is being sent, and where.

However, you can also make use of interaction specs in the regular fuzz and parse commands. The special --party=PARTY option allows you to produce outputs or parse inputs for just one given party PARTY in the interaction. It does so by excluding all other parties from the interaction.

As an example, consider again our bc.fan example:

include('expr.fan')

<start> ::= <interaction>
<interaction> ::= <In:input> <Out:output>
<input> ::= <expr> '\n'
<output> ::= <int> '\n'

This is the effect of --party=In. See how the Out: part of the interaction has been excluded, also excluding <output> from production:

# Automatically generated from 'bc.fan'.
#
include('expr.fan')

<start> ::= <interaction>
<interaction> ::= <In:<input>
<input> ::= <expr> '\n'
<output> ::= <int> '\n'

This is the effect of --party=Out, excluding the In part, and consequently, <input>:

# Automatically generated from 'bc.fan'.
#
include('expr.fan')

<start> ::= <interaction>
<interaction> ::= <Out:<output>
<input> ::= <expr> '\n'
<output> ::= <int> '\n'

Typically, you provide such a --party option directly as part of some fuzz or parse command. To see what typical inputs to bc look like, use:

$ fandango fuzz -f bc.fan --party=In -n 10 --format=value

74 * 00279 * 488 * (-(-+5) * 2949 + +--96443 * +--0 + (16)) * +3 * -31 * -75 - +9

-+51222 / +3218

+95 / (+(+-46443 / -(+(+-60931 + 219 / 03 / 69 + 21) * 437) - --03) * +-0 / -97)

-19 * -208 + +0214 / (-153 - ++--(++---+5 + ++0) / +67 * -2381 / 190 + 69) + --68

-+(++((-+---(706 / 83 * 68) - (+52)) * +18 / 09 + -45) * +34 * -8 * +53 / 99) + -184

(++-+0 - (3 * (614 * +88703 * -+22) * -+55 - 478) * -6 * +4 * 838 - -89) / +8 - 300

-(-((++++-++-++1 + 9894 / +-+-(-61 / 8) * +0 / 59978)) / -+05) * -66 / -5 / 622 - +8

+-(-(-13039 * (+-(36 / 94 / 49 - +04) / +12 / +4 - 217)) / ++90 + 8) / +8 + 11 + 66

57 + 2 - ((-+4 - 4096 - -9929 * --(0 / +7554 + -60) + +--3) / (89) - +87) - ++13

74

Conversely, to see what typical outputs from bc would be expected, use:

$ fandango fuzz -f bc.fan --party=Out -n 10 --format=value