Stream: compiler development

Topic: Property testing and fuzzing


view this post on Zulip Brendan Hansknecht (Jan 29 2024 at 00:23):

If any one is interested, I wrote up a short doc on property testing, fuzzing, and what might fit into roc in the future. Mostly an overview. Does not get into any sort of specific proposal with details around how it would be implemented:
https://docs.google.com/document/d/1FyAErWemFd24kFNhSGnqzlMQC73bgGONpDQviBpS-go/edit?usp=sharing

view this post on Zulip Brendan Hansknecht (Jan 29 2024 at 00:23):

Comments on the doc or question/comments here all welcome.

view this post on Zulip Kiryl Dziamura (Jan 29 2024 at 01:53):

Always been fascinated by property-based testing and contract-based programming but never had a chance to try them.

It was so exciting to find a local except in roc! I immediately imagined how well it all may work together, and then I encountered the fuzzer repo and understood that the prop-based testing was a matter of time.

Thanks for the document, can’t wait to read it!

view this post on Zulip Brendan Hansknecht (Jan 29 2024 at 03:41):

Related to this, but I remade roc-fuzz recently. It is totally a prototype right now, but it has all the core pieces for a proof of concept. If the linking wasn't such a pain, it could even be a decent platform. On macos it does seem to just work though. On linux, linking takes some magic to make c++ happy.

On macos you can directly do:

./scripts/bundle.sh
ROC_LINK_FLAGS="-lc++" roc run  --fuzz --optimize --prebuilt-platform examples/basic.roc

Will get an output like this:

Usage: fuzz-target [--help] [--version] {fuzz,minimize,raw,reduce-corpus,show}

Optional arguments:
  -h, --help     shows help message and exits
  -v, --version  prints version information and exits

Subcommands:
  fuzz          Run the fuzz target and attempt to find bugs.
  minimize      Attempt to minimize a test case to the smallest possible input.
  raw           Allows raw access to the underlying libFuzzer cli.
  reduce-corpus Reduces the corpus size by only keeping samples that increase code coverage.
  show          Show the crash or test case inputs.

All of those command just work and have reasonable defaults. They use libFuzzer to do the actual fuzzing and the data is generated the same way as cargo-fuzz with a very similar Arbitrary ability implementation. It would be possible to auto derive Arbitrary if this was added to the compiler and it could auto generate any roc type. Not 100% sure I have everything setup correctly, but the base shell and idea is displayed pretty well.

view this post on Zulip Richard Feldman (Jan 29 2024 at 04:01):

I like the idea of roc fuzz! :smiley:

view this post on Zulip Richard Feldman (Jan 29 2024 at 04:02):

incidentally I think we should do our own code coverage, not rely on llvm

view this post on Zulip Richard Feldman (Jan 29 2024 at 04:02):

we'll be able to do it way faster, and separately I'd like to have builtin code coverage tooling in some form anyway

view this post on Zulip Richard Feldman (Jan 29 2024 at 04:02):

(faster as in runtime performance)

view this post on Zulip Richard Feldman (Jan 29 2024 at 04:05):

one thing I'm curious about with the coverage-based generated inputs: what does the fuzzer specifically use the code coverage for?

view this post on Zulip Richard Feldman (Jan 29 2024 at 04:05):

like are the generated inputs a function of the coverage somehow?

view this post on Zulip Richard Feldman (Jan 29 2024 at 04:07):

as an aside, an interesting thing we could potentially do is to fuzz effects without actually running them - like automatically generate the data that comes back from the host after I/O "happened"

view this post on Zulip Brendan Hansknecht (Jan 29 2024 at 05:13):

incidentally I think we should do our own code coverage, not rely on llvm

Interesting, never really thought about this much. That said, for fuzzing specifically, we may still want to implement the llvm sanitizer coverage (it is different from the the source code based line coverage). This is what Go does to have capability with outside fuzz engines if someone wants to use them.

view this post on Zulip Brendan Hansknecht (Jan 29 2024 at 05:36):

what does the fuzzer specifically use the code coverage for?

So the fuzzing uses a form of code coverage called sanitizer coverage. For the flags that I turn on in llvm, here is essentially what we get:

  1. Every single basic block in the entire code has a single byte counter. Every time the basic block is run the counter is incremented. (Correction: it is for every edge so it is from basic block to basic block pairs technically that get counted)
  2. The program counter for each of these basic blocks is stored in a table as well (not 100% sure what this gets used for. I think it is mostly a unique key for each fuzzing locating that the fuzz engine can refer to and use to build structures to better understand the flow of the program. In LibFuzzer it looks like they use this to log when new function have been reached by fuzzing).
  3. On every indirect call a function is first passed to the a special hook function that can record where the call is jumping to (implemented in the fuzz engine).
  4. For every single cmp instruction and switch statement, the values being compared are passed to a special hook function (implemented by the fuzz engine). Also, the hook function specifically notes if the value is a constant or if 2 variables are being compared. This is great for getting values to test as inputs/parts of inputs.

Instead of simply being based on randomness, fuzzers are based on mutation of interesting inputs. Interesting inputs are inputs that explore new coverage paths.

In the case of go, they don't use most of the instrumentation. They just use the basic block counters (they probably will use more in the future, but they started simple). Their process in the roughest sense is:

  1. Start with an input from the coordinator (old things from the corpus, seed data, or random if none exists yet)
  2. Repeatedly mutate the input for x amount of tries.
  3. If new coverage is found, pass the new input to the coordinator marking it as interesting (they also do minimization here).
  4. On test failure, report to the user.
  5. Loop to 1.

The corpus will be regularly minimized such that it only keeps examples that lead to unique coverage paths. Also the individual examples will be minimized to the smallest examples that hit the same coverage. This helps to keep the corpus small and targeted.

Overall, this leads to keeping around samples that explore more and more of the code while ignore the swaths of random examples that don't manage to explore anything interesting at all.

view this post on Zulip Brendan Hansknecht (Jan 29 2024 at 06:08):

Oh, interesting, they also have an option to record stack depth to help understand recursive functions. So many possible things to instrument.

Also, libFuzzer hooks into all of the memory copying and comparing functions so that it can use for example string comparison information to inform interesting values for fuzzing.

view this post on Zulip Richard Feldman (Jan 29 2024 at 12:29):

very cool! Yeah in principle it doesn't sound like a big change to add instructions like those to mono

view this post on Zulip Richard Feldman (Jan 29 2024 at 12:34):

I wonder how they deal with obsolete inputs, e.g. the data has been stored but the type of the outermost function has changed, so the stored inputs don't make sense anymore

view this post on Zulip Brendan Hansknecht (Jan 29 2024 at 15:43):

Go's file format has type info for each piece of data. Actually makes it a lot more human readable/writable. So they probably just let the user know of the type info being incorrect for their seeds/regression tests and just toss the rest of the corpus.

view this post on Zulip Brendan Hansknecht (Jan 29 2024 at 15:44):

Also, mono probably isn't low level enough (it doesn't encode very single basic block, just high level branches) and it misses all of the builtins.

view this post on Zulip Brendan Hansknecht (Jan 29 2024 at 16:16):

Anyway, long term, I'm sure we can add this somewhere in the compiler pipeline in a way that doesn't depend on llvm. The exception may be the builtins, but they probably can add this information separately at compiler build time, just serialize two version of the builtins (with/without llvm coverage).

view this post on Zulip Brendan Hansknecht (Jan 29 2024 at 22:55):

I was digging through some of the afl fuzzer docs and they give a good example of the power of fuzzing then looking at complex interfaces.

This is all looking at fuzzing the GNU patch utility. Used to apply diffs to a file.
All of these in similar amounts of execution cycles without any seed corpus:

This is a single chunk patch -c example. Just as a reference

Obviously for unit tests and smaller functions, the blind fuzzer would do a better relative job.

view this post on Zulip Johan Lövgren (Aug 26 2025 at 19:09):

Hi! I just wanted to pop in here to say that I was partly inspired by the discussion around property based testing and fuzzing here on Zulip and on the podcast to make a project related to this. I pitched and supervised a master thesis at my workplace with the aim to trying combining fuzzing and PBT in an existing .NET application: https://uu.diva-portal.org/smash/record.jsf?pid=diva2%3A1981750&dswid=-8206

view this post on Zulip Johan Lövgren (Aug 26 2025 at 19:12):

It was a bit a bit of a hassle to get the fuzzing engine working in tandem with .NET. So I can only agree with you @Brendan Hansknecht that it is quite unfortunate that getting fuzzing going on an arbitrary project is quite difficult.

view this post on Zulip Johan Lövgren (Aug 26 2025 at 19:19):

Another feature/difficulty, which I personally find more interesting, is that fuzzing was great at illuminating implicit constraints. A perhaps naive hope on our side was that we could replace our handwritten generators with the fuzzing engine. This was a goal since generators are somewhat arduous to write and can inadvertently introduce our own assumptions about the data. But then of course if our code is written to e.g. work with signed integers but only works for positive integers, and that is not reflected in the type, then the fuzzing engine will quickly generate illegal input data. So a possible lesson is that fuzzing-driven PBT will require more thoughtful design of the datatypes to be fuzzed. Hopefully in a way that is not too limiting.

view this post on Zulip Johan Lövgren (Aug 26 2025 at 19:22):

Another inspiration was this paper, which highlights difficulties with PBT in practice. (I really love PBT so this project aimed to explore ways in making it more palatable.)

view this post on Zulip Brendan Hansknecht (Aug 27 2025 at 01:20):

That's all really cool. Will have to look through the links when I get a chance


Last updated: Sep 09 2025 at 12:16 UTC