Data Generators and Fakers#

Often, you don’t want to generate totally random data; it suffices that some aspects of it are random. This naturally raises the question: Where can one get non-random, natural data from, and how can one integrate this into Fandango?

Augmenting Grammars with Data#

The straightforward solution would be to simply extend our grammar with more natural data. In order to obtain more natural first and last names in our ongoing names/age example, for instance, we could simply extend the persons.fan rule

<first_name> ::= <name>

to

<first_name> ::= <name> | "Alice" | "Bob" | "Eve" | "Pablo Diego José Francisco de Paula Juan Nepomuceno Cipriano de la Santísima Trinidad"

and extend the rule

<last_name> ::= <name>

to, say,

<last_name> ::= <name> | "Doe" | "Smith" | "Ruiz Picasso"

then we can have Fandango create names such as

Eve Wb,5491
Bob Tpldd,68
Cia Ruiz Picasso,39173
Alice Smith,7243
Pablo Diego José Francisco de Paula Juan Nepomuceno Cipriano de la Santísima Trinidad Ruiz Picasso,6
Bob Smith,20370
Bob Bfmm,00
Alice Smith,6519
Eve Smith,18201
Eve Tzzyn,2

Note that we still get a few “random” names; this comes as specified by our rules. By default, Fandango picks each alternative with equal likelihood, so there is a 20% chance for the first name and a 25% chance for the last name to be completely random.

Note

Future Fandango versions will have means to control these likelihoods.

Using Fakers#

Frequently, there already are data sources available that you’d like to reuse – and converting each of their elements into a grammar alternative is inconvenient. That is why Fandango allows you to specify a data source as part of the grammar - as a Python function that supplies the respective value. Let us illustrate this with an example.

The Python faker module is a great source of “natural” data, providing “fake” data for names, addresses, credit card numbers, and more.

Here’s an example of how to use it:

from faker import Faker
fake = Faker()
for i in range(10):
  print(fake.first_name())
Virginia
Richard
Misty
Diana
Daniel
Madison
Emily
William
Christopher
Penny

Have a look at the faker documentation to see all the fake data it can produce. The methods first_name() and last_name() are what we need. The idea is to extend the <first_name> and <last_name> rules such that they can draw on the faker functions. To do so, in Fandango, you can simply extend the grammar as follows:

<first_name> ::= <name> := fake.first_name()

The generator := EXPR assigns the value produced by the expression EXPR (in our case, fake.first_name()) to the symbol on the left-hand side of the rule (in our case, <first_name>).

Important

Whatever value the generator returns, it must be parseable by at least one of the alternatives in the rule. Our example works because <first_name> matches the format of fake.first_name().

Tip

If your generator returns a string, a “match-all” rule such as

<generated_string> ::= <char>* := generator()

will fit all possible string values returned by generator().

We can do the same for the last name, too; and then this is the full Fandango spec persons-faker.fan:

from faker import Faker
fake = Faker()

include('persons.fan')

<first_name> ::= <name> := fake.first_name()
<last_name> ::= <name> := fake.last_name()

Note

The Fandango include() function includes the Fandango definitions of the given file. This way, we need not repeat the definitions from persons.fan and only focus on the differences.

Note

Python code (from Python files) that you use in a generator (or in a constraint, for that matter) needs to be imported. Use the Python import features to do that.

Important

include(FILE) is for Fandango files, import MODULE is for Python modules.

This is what the output of the above spec looks like:

Jenny Novak,72
Lisa Gilbert,57
Carrie Castillo,29308
William Chapman,6
William Koch,08
Tina Stevenson,7
Joseph Carson,1978
Stephanie Vincent,5
Ryan Stuart,243
Carl Peterson,33

You see that all first and last names now stem from the Faker library.

Number Generators#

In the above output, the “age” fields are still very random, though. With generators, we can achieve much more natural distributions.

After importing the Python random module:

import random

we can make use of dozens of random number functions to use as generators. For instance, random.randint(A, B) return a random integer \(n\) such that \(A \le n \le B\) holds. To obtain a range of ages between 25 and 35, we can thus write:

<age> ::= <digit>+ := str(random.randint(25, 35));

Important

All Fandango generators must return strings or byte strings.

  • Use str(N) to convert a number N into a string

  • Use bytes([N]) to convert numbers N into bytes.

The resulting Fandango spec file produces the desired range of ages:

Dawn Davis,27
Timothy Tate,31
Kevin Campbell,34
Brandon Hill,32
James Jacobs,26
Jesse Love,27
Jennifer Arnold,26
Michelle Turner,28
Wendy Hill,28
Aaron Chambers,27

We can also create a Gaussian (normal) distribution this way:

<age> ::= <digit>+ := str(int(random.gauss(35)));

random.gauss() returns floating point numbers. However, the final value must fit the given symbol rules (in our case, <digit>+), so we convert the age into an integer (int()).

These are the ages we get this way:

Christopher Moore,34
Duane Blair,35
Kristin Brooks,35
Edward Greene,34
Angela Johnson,36
Earl Ellis,35
Alan Edwards,33
Rebecca Zavala,36
Mary Cooley,34
Todd Ware,34

In Statistical Distributions, we will introduce more ways to obtain specific distributions.

Generators and Random Productions#

In testing, you want to have a good balance between common and uncommon inputs:

  • Common inputs are important because they represent the typical usage, and you don’t want your program to fail there;

  • Uncommon inputs are important because they uncover bugs you may not find during alpha or beta testing, and thus avoid latent bugs (and vulnerabilities!) slipping into production.

We can easily achieve such a mix by adding rules such as

<first_name> ::= <name> | <natural_name>
<natural_name> ::= <name> := fake.first_name()

With this, both random names (<name>) and natural names (<natural_name>) will have a chance of 50% to be produced:

Christian Richards,5
Harry Rice,27
Ukonn Taylor,053
Brian Ellis,8
Vejbf Dean,2339
Fkhd Maxwell,3
Oya Atkins,89
Fpcu Williams,61105
Ikza Hall,09
Grgxkz Smith,1

Combining Generators and Constraints#

When using a generator, why does one still have to specify the format of the data, say <name>? This is so for two reasons:

  1. It allows the Fandango spec to be used for parsing existing data, and consequently, mutating it;

  2. It allows additional constraints to be applied on the generator result and its elements.

In our example, the latter can be used to further narrow down the set of names. If we want all last names to start with an S, for instance, we can invoke Fandango as

$ fandango fuzz -f persons-faker.fan -c '<last_name>.startswith("S")' -n 10

and we get

Jeffrey Shepherd,8044
Latoya Short,365
Charles Short,91
Melissa Shepherd,0
Sherri Shaw,76
Christopher Schmidt,62082
Frank Suarez,5823
Scott Strickland,4841
Gabriel Smith,7523
Kristy Shaw,7908

When to use Generators, and when Constraints#

One might assume that instead of a generator, one could also use a constraint to achieve the same effect. So, couldn’t one simply add a constraint that says

<first_name> == fake.first_name()

Unfortunately, this does not work.

The reason is that the faker returns a different value every time it is invoked, making it hard for Fandango to solve the constraint. Remember that Fandango solves constraints by applying mutations to a population, getting closer to the target with each iteration. If the target keeps on changing, the algorithm will lose guidance and will not progress towards the solution.

Likewise, in contrast to our example in Combining Generators and Constraints, one may think about using a constraint to set a limit to a number, say:

$ fandango fuzz -f persons-faker.fan -c '<last_name>.startswith("S")' -c 'int(<age>) >= 25 and int(<age>) <= 35' -n 10

This would work:

Vincent Smith,27
William Smith,27
Nicholas Small,27
Danielle Small,27
William Singleton,27
Riley Smith,27
Michelle Shepherd,27
James Sellers,27
^C
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.13.7/x64/bin/fandango", line 7, in <module>
    sys.exit(main())
             ~~~~^^
  File "/home/runner/work/fandango/fandango/src/fandango/cli/__init__.py", line 45, in main
    last_status = run(command, args)
  File "/home/runner/work/fandango/fandango/src/fandango/cli/commands.py", line 462, in run
    command(args)
    ~~~~~~~^^^^^^
  File "/home/runner/work/fandango/fandango/src/fandango/cli/commands.py", line 185, in fuzz_command
    population = fandango.fuzz(
        solution_callback=solutions_callback,
    ...<4 lines>...
        **settings,
    )
  File "/home/runner/work/fandango/fandango/src/fandango/api.py", line 388, in fuzz
    for s in generator:
             ^^^^^^^^^
  File "/home/runner/work/fandango/fandango/src/fandango/api.py", line 258, in generate_solutions
    yield from self.fandango.generate(max_generations=max_generations, mode=mode)
  File "/home/runner/work/fandango/fandango/src/fandango/evolution/algorithm.py", line 357, in generate
    yield from self._generate_simple(max_generations=max_generations)
  File "/home/runner/work/fandango/fandango/src/fandango/evolution/algorithm.py", line 405, in _generate_simple
    yield from self._perform_mutation(new_population)
  File "/home/runner/work/fandango/fandango/src/fandango/evolution/algorithm.py", line 290, in _perform_mutation
    mutated_individual = yield from self.mutation_method.mutate(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
    )
    ^
  File "/home/runner/work/fandango/fandango/src/fandango/evolution/mutation.py", line 76, in mutate
    ctx_tree = node_to_mutate.split_end()
  File "/home/runner/work/fandango/fandango/src/fandango/language/tree.py", line 679, in split_end
    inst = copy.deepcopy(self)
  File "/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/copy.py", line 144, in deepcopy
    y = copier(memo)
  File "/home/runner/work/fandango/fandango/src/fandango/language/tree.py", line 476, in __deepcopy__
    copied._parent = copy.deepcopy(self.parent, memo)
                     ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/copy.py", line 144, in deepcopy
    y = copier(memo)
  File "/home/runner/work/fandango/fandango/src/fandango/language/tree.py", line 476, in __deepcopy__
    copied._parent = copy.deepcopy(self.parent, memo)
                     ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/copy.py", line 144, in deepcopy
    y = copier(memo)
  File "/home/runner/work/fandango/fandango/src/fandango/language/tree.py", line 471, in __deepcopy__
    [copy.deepcopy(child, memo) for child in self._children]
     ~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/copy.py", line 144, in deepcopy
    y = copier(memo)
  File "/home/runner/work/fandango/fandango/src/fandango/language/tree.py", line 471, in __deepcopy__
    [copy.deepcopy(child, memo) for child in self._children]
     ~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/copy.py", line 144, in deepcopy
    y = copier(memo)
  File "/home/runner/work/fandango/fandango/src/fandango/language/tree.py", line 471, in __deepcopy__
    [copy.deepcopy(child, memo) for child in self._children]
     ~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/copy.py", line 144, in deepcopy
    y = copier(memo)
  File "/home/runner/work/fandango/fandango/src/fandango/language/tree.py", line 457, in __deepcopy__
    copied = DerivationTree(
        self.symbol,
    ...<5 lines>...
        origin_repetitions=list(self.origin_repetitions),
    )
  File "/home/runner/work/fandango/fandango/src/fandango/language/tree.py", line 138, in __init__
    if not isinstance(symbol, Symbol):
           ~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "<frozen abc>", line 119, in __instancecheck__
KeyboardInterrupt
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[9], line 2
      1 get_ipython().system('fandango fuzz -f persons-faker.fan -c \'<last_name>.startswith("S")\' -c \'int(<age>) >= 25 and int(<age>) <= 35\' -n 10 --validate --progress-bar=off')
----> 2 assert _exit_code == 0

AssertionError: 

But while the values will fit the constraint, they will not be randomly distributed. This is because Fandango treats and generates them as strings (= sequences of digits), ignoring thur semantics as numerical values. To obtain well-distributed numbers from the beginning, use a generator.

  1. If a value to be produced is random, it should be added via a generator.

  2. If a value to be produced is constant, it can go into a generator or a constraint.

  3. If a value to be produced must be part of a valid input, it should go into a constraint. (Constraints are checked during parsing and production.)