If anyone is interested, I wrote up a short doc on property testing, fuzzing, and what might fit into Roc in the future. Mostly an overview. It does not get into any specific proposal with details around how it would be implemented:
https://docs.google.com/document/d/1FyAErWemFd24kFNhSGnqzlMQC73bgGONpDQviBpS-go/edit?usp=sharing
Comments on the doc or question/comments here all welcome.
Always been fascinated by property-based testing and contract-based programming but never had a chance to try them.
It was so exciting to find a local expect in Roc! I immediately imagined how well it could all work together, and then I encountered the fuzzer repo and realized that property-based testing was only a matter of time.
Thanks for the document, can’t wait to read it!
Related to this: I remade roc-fuzz recently. It is totally a prototype right now, but it has all the core pieces for a proof of concept. If the linking weren't such a pain, it could even be a decent platform. On macOS it does seem to just work, though. On Linux, linking takes some magic to make C++ happy.
On macos you can directly do:
./scripts/bundle.sh
ROC_LINK_FLAGS="-lc++" roc run --fuzz --optimize --prebuilt-platform examples/basic.roc
Will get an output like this:
Usage: fuzz-target [--help] [--version] {fuzz,minimize,raw,reduce-corpus,show}
Optional arguments:
-h, --help shows help message and exits
-v, --version prints version information and exits
Subcommands:
fuzz Run the fuzz target and attempt to find bugs.
minimize Attempt to minimize a test case to the smallest possible input.
raw Allows raw access to the underlying libFuzzer cli.
reduce-corpus Reduces the corpus size by only keeping samples that increase code coverage.
show Show the crash or test case inputs.
All of those commands just work and have reasonable defaults. They use libFuzzer to do the actual fuzzing, and the data is generated the same way as cargo-fuzz, with a very similar Arbitrary ability implementation. It would be possible to auto-derive Arbitrary if this was added to the compiler, and it could auto-generate any Roc type. Not 100% sure I have everything set up correctly, but the base shell and idea is displayed pretty well.
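To give a feel for the idea (not the actual roc-fuzz or cargo-fuzz code; all names here are hypothetical illustrations), an Arbitrary-style implementation is basically a decoder that consumes the raw byte stream the fuzz engine hands it and turns it into structured, typed values:

```python
# Sketch of an Arbitrary-style decoder: raw fuzzer bytes in, typed values out.
# These names are illustrative, not a real API.

class Unstructured:
    """Wraps the fuzzer's raw bytes and hands them out on demand."""
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk.ljust(n, b"\x00")  # pad with zeros when the stream runs dry

    def u8(self) -> int:
        return self.take(1)[0]

    def u32(self) -> int:
        return int.from_bytes(self.take(4), "little")

def arbitrary_list_of_u32(u: Unstructured, max_len: int = 8) -> list[int]:
    # The length is itself derived from the byte stream, so mutating the raw
    # bytes mutates the structure of the generated value too.
    length = u.u8() % (max_len + 1)
    return [u.u32() for _ in range(length)]

u = Unstructured(b"\x03" + b"\x2a\x00\x00\x00" * 3)
print(arbitrary_list_of_u32(u))  # [42, 42, 42]
```

Because every generated value is a pure function of the byte stream, the engine only ever has to mutate flat byte buffers, and an auto-derived decoder per type is what would let the compiler generate any Roc type.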
I like the idea of roc fuzz! :smiley:
incidentally I think we should do our own code coverage, not rely on llvm
we'll be able to do it way faster, and separately I'd like to have builtin code coverage tooling in some form anyway
(faster as in runtime performance)
one thing I'm curious about with the coverage-based generated inputs: what does the fuzzer specifically use the code coverage for?
like are the generated inputs a function of the coverage somehow?
as an aside, an interesting thing we could potentially do is to fuzz effects without actually running them - like automatically generate the data that comes back from the host after I/O "happened"
incidentally I think we should do our own code coverage, not rely on llvm
Interesting, never really thought about this much. That said, for fuzzing specifically, we may still want to implement the LLVM sanitizer coverage (it is different from the source-code-based line coverage). This is what Go does to have compatibility with outside fuzz engines if someone wants to use them.
what does the fuzzer specifically use the code coverage for?
So the fuzzing uses a form of code coverage called sanitizer coverage. For the flags that I turn on in LLVM, here is essentially what we get: for every cmp instruction and switch statement, the values being compared are passed to a special hook function (implemented by the fuzz engine). The hook function also specifically notes whether one of the values is a constant or whether two variables are being compared. This is great for getting values to test as inputs or parts of inputs.

Instead of simply being based on randomness, fuzzers are based on mutation of interesting inputs. Interesting inputs are inputs that explore new coverage paths.
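As a toy sketch of what an engine does with those cmp-hook values (a simplified illustration, not libFuzzer's real algorithm; all names here are made up): constants observed at comparison sites become mutation material, which is how magic headers get found far faster than by random byte flips.

```python
import random

MAGIC = b"ROC1"
seen_cmp_values = set()  # stands in for values reported by the cmp hook

def target(data: bytes) -> str:
    # Imagine the instrumented binary reporting both sides of this compare
    # to the engine via a hook; we simulate that with seen_cmp_values.
    seen_cmp_values.add(MAGIC)
    if data[:4] == MAGIC:
        return "header-accepted"
    return "rejected"

def mutate(sample: bytes) -> bytes:
    # Sometimes splice in a value observed at a comparison site instead of
    # flipping a random byte.
    if seen_cmp_values and random.random() < 0.5:
        token = random.choice(sorted(seen_cmp_values))
        return token + sample[len(token):]
    pos = random.randrange(len(sample))
    flipped = bytes([sample[pos] ^ 0xFF])
    return sample[:pos] + flipped + sample[pos + 1:]

random.seed(0)
sample = b"\x00\x00\x00\x00junk"
for _ in range(100):
    sample = mutate(sample)
    if target(sample) == "header-accepted":
        break
print(target(sample))
```

A pure random mutator would need on the order of 2^32 tries to hit the 4-byte header; with the comparison values fed back in, it lands almost immediately.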
In the case of Go, they don't use most of the instrumentation. They just use the basic block counters (they probably will use more in the future, but they started simple). Their process in the roughest sense is: pick an input from the corpus, mutate it, run it while recording the coverage counters, and if the mutated input reaches new coverage, add it back to the corpus as a new interesting input.
The corpus will be regularly minimized such that it only keeps examples that lead to unique coverage paths. Also the individual examples will be minimized to the smallest examples that hit the same coverage. This helps to keep the corpus small and targeted.
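The corpus-reduction step can be sketched like this (a toy illustration under my own assumptions, not Go's actual implementation: "coverage" here is just a set of branch IDs reported by a fake target):

```python
# Keep a sample only if it contributes coverage that no already-kept
# sample provides. All names here are hypothetical.

def run_with_coverage(data: bytes) -> frozenset[str]:
    # Stand-in for running the instrumented target and collecting counters.
    edges = {"entry"}
    if data.startswith(b"P"):
        edges.add("saw-magic")
        if len(data) > 4:
            edges.add("long-input")
    return frozenset(edges)

def reduce_corpus(corpus: list[bytes]) -> list[bytes]:
    kept: list[bytes] = []
    covered: set[str] = set()
    # Visit shorter samples first so each retained edge is hit by a small input.
    for sample in sorted(corpus, key=len):
        edges = run_with_coverage(sample)
        if edges - covered:  # contributes at least one new edge
            kept.append(sample)
            covered |= edges
    return kept

corpus = [b"aaaa", b"P", b"Pxxxxxxx", b"Pyyyyyyyy", b"bb"]
print(reduce_corpus(corpus))  # keeps only b"P" and b"Pxxxxxxx"
```

Five samples collapse to two that together hit every edge, which is the "small and targeted" property described above.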
Overall, this leads to keeping around samples that explore more and more of the code while ignoring the swaths of random examples that don't manage to explore anything interesting at all.
Oh, interesting, they also have an option to record stack depth to help understand recursive functions. So many possible things to instrument.
Also, libFuzzer hooks into all of the memory copying and comparing functions so that it can use, for example, string comparison information to inform interesting values for fuzzing.
very cool! Yeah in principle it doesn't sound like a big change to add instructions like those to mono
I wonder how they deal with obsolete inputs, e.g. the data has been stored but the type of the outermost function has changed, so the stored inputs don't make sense anymore
Go's file format has type info for each piece of data. Actually makes it a lot more human readable/writable. So they probably just let the user know of the type info being incorrect for their seeds/regression tests and just toss the rest of the corpus.
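For reference, a Go seed-corpus file looks roughly like this: a version header on the first line, then one typed value per line (the values here are just illustrative; the exact encoding is described in the go command's fuzzing documentation):

```
go test fuzz v1
string("hello")
int(42)
[]byte("\x00\x01")
```

Because each line carries its type, a mismatch against the fuzz target's current signature is easy to detect and report per file.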
Also, mono probably isn't low level enough (it doesn't encode every single basic block, just high-level branches), and it misses all of the builtins.
Anyway, long term, I'm sure we can add this somewhere in the compiler pipeline in a way that doesn't depend on LLVM. The exception may be the builtins, but they probably can add this information separately at compiler build time, just serialize two versions of the builtins (with/without LLVM coverage).
I was digging through some of the AFL fuzzer docs, and they give a good example of the power of fuzzing when looking at complex interfaces. This is all looking at fuzzing the GNU patch utility, which is used to apply diffs to a file.
All of these in similar amounts of execution cycles, without any seed corpus:

- Blind fuzzing, which is essentially non-guided property-based testing. It reached the first 4 bytes of a valid -c patch (265 edges of code coverage). So essentially useless in terms of what was tested, because it never generated a full valid header.
- Edge coverage based fuzzing. It managed to generate an entire valid patch chunk (2,070 edges of code coverage).
- AFL model with all types of instrumentation: it was able to generate 4 full valid patch chunks (2,597 edges of code coverage).

This is a single-chunk patch -c example, just as a reference.
Obviously for unit tests and smaller functions, the blind fuzzer would do a better relative job.
Last updated: Jul 06 2025 at 12:14 UTC