Bits and Bit Fields#
Some binary inputs use individual bits to specify contents.
For instance, you might have a flag
byte that holds multiple (bit) flags:
struct {
unsigned int italic: 1; // 1 bit
unsigned int bold: 1;
unsigned int underlined: 1;
unsigned int strikethrough: 1;
unsigned int brightness: 4; // 4 bits
} format_flags;
How does one represent such bit fields in a Fandango spec?
Representing Bits#
In Fandango, bits can be represented in Fandango using the special values 0
(for a zero bit) and 1
(for a non-zero bit).
Hence, you can define a <bit>
value as
<bit> ::= 0 | 1
With this, the above format_flag
byte would be specified as
<start> ::= <format_flag>
<format_flag> ::= <italic> <bold> <underlined> <strikethrough> <brightness>
<italic> ::= <bit>
<bold> ::= <bit>
<underlined> ::= <bit>
<strikethrough> ::= <bit>
<brightness> ::= <bit>{4}
<bit> ::= 0 | 1
A <format_flag>
symbol would thus always consist of these eight bits.
We can use the special option --format=bits
to view the output as a bit stream:
$ fandango fuzz --format=bits -f bits.fan -n 1 --start-symbol='<format_flag>'
fandango:WARNING: Symbol <start> defined, but not used
00110111
Note
The combination of --format=bits
and --start-symbol
is particularly useful to debug bit fields.
Internally, Fandango treats individual flags as integers, too.
Hence, we can also apply constraints to the individual flags.
For instance, we can profit from the fact that Python treats 0
as False and 1
as True:
$ fandango fuzz --format=bits -f bits.fan -n 10 -c '<italic> and <bold>'
11100010
11010101
11011110
11100100
11010110
11111111
11100110
11100001
11101100
11100101
Fandango strictly follows a âleft-to-rightâ order - that is, the order in which bits and bytes are specified in the grammar, the most significant bit is stored first.
Hence, we can also easily set the value of the entire brightness
field using a constraint:
$ fandango fuzz --format=bits -f bits.fan -n 1 -c '<brightness> == 0b1111'
00111111
Note
Fandango always strictly follows a âleft-to-rightâ order - that is, the order in which bits and bytes are specified in the grammar.
Of course, we can also give the number in decimal format:
$ fandango fuzz --format=bits -f bits.fan -n 1 -c '<brightness> == 15'
00111111
01001111
11111111
01101111
11101111
01111111
00101111
00001111
10101111
11001111
Note how the last four bits (the <brightness>
field) are always set to 1111
- the number 15.
Warning
When implementing a format, be sure to follow its conventions regarding
bit ordering (most or least significant bit first)
byte ordering (most or least significant byte first)
Parsing Bits#
Fandango also supports parsing inputs with bits.
This is what happens if we send a byte \xf0
(the upper four bits set) to the parser:
$ echo -n '\xf0' | fandango parse -f bits.fan -o - --format=bits
FandangoParseError: '<stdin>', position 0x0001 (1): mismatched input b'x'
FandangoParseError: 1 error(s) during parsing
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[6], line 2
1 get_ipython().system("echo -n '\\xf0' | fandango parse -f bits.fan -o - --format=bits --validate")
----> 2 assert _exit_code == 0
AssertionError:
We see that the input was properly parsed and decomposed into individual bits.
This is the resulting parse tree:
The grammar
format shows us that the values are properly assigned:
$ echo -n '\xf0' | fandango parse -f bits.fan -o - --format=grammar
Warning
To parse bits properly, they must come in multiples of eight.
Bits and Padding#
When generating binary inputs, you may need to adhere to specific lengths. Such lengths are often enforced by padding â that is, adding bits until the required length is achieved. For instance, let us assume you have a field consisting of some bits. However, the overall length of the field must be a multiple of eight to have it byte-aligned. For such padding, define the field as
<field> ::= <bits> <padding>
<padding> ::= 0*
combined with a constraint
where len(<field>) % 8 == 0
Note that applied on derivation trees, len()
always returns the number of child elements, not the string length; here, we use this to access the number of elements in <field>
.