Property testing and fuzzing · compiler development

Stream: compiler development

Topic: Property testing and fuzzing

Brendan Hansknecht (Jan 29 2024 at 00:23):

If any one is interested, I wrote up a short doc on property testing, fuzzing, and what might fit into roc in the future. Mostly an overview. Does not get into any sort of specific proposal with details around how it would be implemented:
https://docs.google.com/document/d/1FyAErWemFd24kFNhSGnqzlMQC73bgGONpDQviBpS-go/edit?usp=sharing

Brendan Hansknecht (Jan 29 2024 at 00:23):

Comments on the doc or question/comments here all welcome.

Kiryl Dziamura (Jan 29 2024 at 01:53):

Always been fascinated by property-based testing and contract-based programming but never had a chance to try them.

It was so exciting to find a local except in roc! I immediately imagined how well it all may work together, and then I encountered the fuzzer repo and understood that the prop-based testing was a matter of time.

Thanks for the document, can’t wait to read it!

Brendan Hansknecht (Jan 29 2024 at 03:41):

Related to this, but I remade roc-fuzz recently. It is totally a prototype right now, but it has all the core pieces for a proof of concept. If the linking wasn't such a pain, it could even be a decent platform. On macos it does seem to just work though. On linux, linking takes some magic to make c++ happy.

On macos you can directly do:

./scripts/bundle.sh
ROC_LINK_FLAGS="-lc++" roc run  --fuzz --optimize --prebuilt-platform examples/basic.roc

Will get an output like this:

Usage: fuzz-target [--help] [--version] {fuzz,minimize,raw,reduce-corpus,show}

Optional arguments:
  -h, --help     shows help message and exits
  -v, --version  prints version information and exits

Subcommands:
  fuzz          Run the fuzz target and attempt to find bugs.
  minimize      Attempt to minimize a test case to the smallest possible input.
  raw           Allows raw access to the underlying libFuzzer cli.
  reduce-corpus Reduces the corpus size by only keeping samples that increase code coverage.
  show          Show the crash or test case inputs.

All of those command just work and have reasonable defaults. They use libFuzzer to do the actual fuzzing and the data is generated the same way as cargo-fuzz with a very similar Arbitrary ability implementation. It would be possible to auto derive Arbitrary if this was added to the compiler and it could auto generate any roc type. Not 100% sure I have everything setup correctly, but the base shell and idea is displayed pretty well.

Richard Feldman (Jan 29 2024 at 04:01):

I like the idea of roc fuzz! :smiley:

Richard Feldman (Jan 29 2024 at 04:02):

incidentally I think we should do our own code coverage, not rely on llvm

Richard Feldman (Jan 29 2024 at 04:02):

we'll be able to do it way faster, and separately I'd like to have builtin code coverage tooling in some form anyway

Richard Feldman (Jan 29 2024 at 04:02):

(faster as in runtime performance)

Richard Feldman (Jan 29 2024 at 04:05):

one thing I'm curious about with the coverage-based generated inputs: what does the fuzzer specifically use the code coverage for?

Richard Feldman (Jan 29 2024 at 04:05):

like are the generated inputs a function of the coverage somehow?

Richard Feldman (Jan 29 2024 at 04:07):

as an aside, an interesting thing we could potentially do is to fuzz effects without actually running them - like automatically generate the data that comes back from the host after I/O "happened"

Brendan Hansknecht (Jan 29 2024 at 05:13):

incidentally I think we should do our own code coverage, not rely on llvm

Interesting, never really thought about this much. That said, for fuzzing specifically, we may still want to implement the llvm sanitizer coverage (it is different from the the source code based line coverage). This is what Go does to have capability with outside fuzz engines if someone wants to use them.

Brendan Hansknecht (Jan 29 2024 at 05:36):

what does the fuzzer specifically use the code coverage for?

So the fuzzing uses a form of code coverage called sanitizer coverage. For the flags that I turn on in llvm, here is essentially what we get:

Every single basic block in the entire code has a single byte counter. Every time the basic block is run the counter is incremented. (Correction: it is for every edge so it is from basic block to basic block pairs technically that get counted)
The program counter for each of these basic blocks is stored in a table as well (not 100% sure what this gets used for. I think it is mostly a unique key for each fuzzing locating that the fuzz engine can refer to and use to build structures to better understand the flow of the program. In LibFuzzer it looks like they use this to log when new function have been reached by fuzzing).
On every indirect call a function is first passed to the a special hook function that can record where the call is jumping to (implemented in the fuzz engine).
For every single cmp instruction and switch statement, the values being compared are passed to a special hook function (implemented by the fuzz engine). Also, the hook function specifically notes if the value is a constant or if 2 variables are being compared. This is great for getting values to test as inputs/parts of inputs.

Instead of simply being based on randomness, fuzzers are based on mutation of interesting inputs. Interesting inputs are inputs that explore new coverage paths.

In the case of go, they don't use most of the instrumentation. They just use the basic block counters (they probably will use more in the future, but they started simple). Their process in the roughest sense is:

Start with an input from the coordinator (old things from the corpus, seed data, or random if none exists yet)
Repeatedly mutate the input for x amount of tries.
If new coverage is found, pass the new input to the coordinator marking it as interesting (they also do minimization here).
On test failure, report to the user.
Loop to 1.

The corpus will be regularly minimized such that it only keeps examples that lead to unique coverage paths. Also the individual examples will be minimized to the smallest examples that hit the same coverage. This helps to keep the corpus small and targeted.

Overall, this leads to keeping around samples that explore more and more of the code while ignore the swaths of random examples that don't manage to explore anything interesting at all.

Brendan Hansknecht (Jan 29 2024 at 06:08):

Oh, interesting, they also have an option to record stack depth to help understand recursive functions. So many possible things to instrument.

Also, libFuzzer hooks into all of the memory copying and comparing functions so that it can use for example string comparison information to inform interesting values for fuzzing.

Richard Feldman (Jan 29 2024 at 12:29):

very cool! Yeah in principle it doesn't sound like a big change to add instructions like those to mono

Richard Feldman (Jan 29 2024 at 12:34):

I wonder how they deal with obsolete inputs, e.g. the data has been stored but the type of the outermost function has changed, so the stored inputs don't make sense anymore

Brendan Hansknecht (Jan 29 2024 at 15:43):

Go's file format has type info for each piece of data. Actually makes it a lot more human readable/writable. So they probably just let the user know of the type info being incorrect for their seeds/regression tests and just toss the rest of the corpus.

Brendan Hansknecht (Jan 29 2024 at 15:44):

Also, mono probably isn't low level enough (it doesn't encode very single basic block, just high level branches) and it misses all of the builtins.

Brendan Hansknecht (Jan 29 2024 at 16:16):

Anyway, long term, I'm sure we can add this somewhere in the compiler pipeline in a way that doesn't depend on llvm. The exception may be the builtins, but they probably can add this information separately at compiler build time, just serialize two version of the builtins (with/without llvm coverage).

Brendan Hansknecht (Jan 29 2024 at 22:55):

I was digging through some of the afl fuzzer docs and they give a good example of the power of fuzzing then looking at complex interfaces.

This is all looking at fuzzing the GNU patch utility. Used to apply diffs to a file.
All of these in similar amounts of execution cycles without any seed corpus:

Using Blind fuzzing L which is essentially non guided property based testing. It reached the 4 first bytes of a valid -c patch (265 edges of code coverage). So essentially useless in terms of what was tested cause it never generated a full valid header.
Using only Edge coverage based fuzzing. It managed to generate an entire valid patch chunk (2,070 edges of code coverage).
Using the full AFL model with all types of instrumentation, it was able generate 4 full valid patch chunks (2,597 edges of code coverage).

This is a single chunk patch -c example. Just as a reference

*** /tmp/helloWorld.roc 2024-01-29 14:46:43.445777000 -0800
--- examples/helloWorld.roc 2024-01-29 14:51:22.331617107 -0800
***************
*** 3,7 ****
--- 3,8 ----
      imports [pf.Stdout]
      provides [main] to pf

+ # A test change to show patch `-c` format
  main =
      Stdout.line "Hello, World!"

Obviously for unit tests and smaller functions, the blind fuzzer would do a better relative job.

Last updated: Jul 26 2025 at 12:14 UTC