I wanted to spin up a discussion on fuzzing the new compiler specifically.
It sounds like for a while we will be using AFL++ as our fuzzer, and eventually we will swap to the Zig integrated fuzzer (but probably not until at least 0.15.0).
I'm trying to figure out what tooling will make fuzzing as seamless as possible. I know that it can be quite painful to manage corpora and enable anyone to fuzz.
I think it is really useful to keep around at least a minimized corpus to help exploration, but we probably want to keep that out of the repo to avoid eating up tons of space with random garbage files. So I'm thinking that we may just need to cache the corpus for CI.
Generally speaking, for each fuzzer, we need an input corpus (just basic starter examples). We can optionally add a dictionary (like a bunch of roc keywords and symbols or ast node names depending on the layer being fuzzed). It is probably good to take found crashes/regressions and add them to the input corpus such that they can be used as unit tests. I'm thinking that by default, CI would just run through the input corpus once to ensure there are no regressions.
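For the dictionary piece, AFL++ accepts a plain `name="value"` file passed via `-x`. A hedged sketch of what a tokenizer-layer dictionary might contain (these specific entries are illustrative, not an agreed-on list):
```
keyword_if="if"
keyword_when="when"
keyword_module="module"
sym_arrow="->"
sym_pipe="|>"
sym_open_curly="{"
sym_interpolation="${"
```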
Also, there is a chance that we can overlap the fuzzing input corpus with snapshot tests. Like we might be able to preprocess all the snapshot tests to turn them into the input corpus for the various fuzzers.
This is all just open preliminary thinking, but overall, I feel that we will likely want some scripts to manage AFL such that fuzzing is easy for anyone to run.
Currently fuzzing has a few different dependencies (like llvm). I think with the release of 0.14.0 there may be fewer required dependencies, but I'm not completely sure. So we probably will want to use nix to manage the dependencies.
Almost certainly the first really useful fuzzer will be the parser to formatter loop.
Anyway, I'm totally open to any and all ideas. I'm sure @Joshua Warner has some thoughts.
Oh also, when fuzzing, we definitely should use the GPA with leak checks enabled.
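For illustration, a minimal sketch of what that could look like in an AFL-style harness; the `zig_fuzz_test` export and its signature follow the pattern used by the harnesses later in this thread, but the details here are assumptions:
```zig
const std = @import("std");

// Minimal sketch of a fuzz entry point that runs every input through a
// GeneralPurposeAllocator and treats any leak as a fuzzing failure.
export fn zig_fuzz_test(buf: [*]u8, len: isize) void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer {
        // deinit() reports whether any allocation was never freed.
        if (gpa.deinit() == .leak) @panic("memory leak detected");
    }
    const allocator = gpa.allocator();
    const input = buf[0..@intCast(len)];

    // ... hand `input` and `allocator` to the code under test here ...
    _ = allocator;
    _ = input;
}
```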
I think fuzzing would want to at least guarantee we have none of these:
- `Problem.CompilerProblem`
Yeah, for many cases, we can only check those limited things. For some cases, like parse -> format -> parse -> format, we can check for equivalent formatted outputs both times.
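As a concrete sketch of that parse -> format -> parse -> format property (the module paths and function names here are assumptions, not the real API):
```zig
const std = @import("std");
const parse = @import("parse.zig"); // hypothetical module path
const fmt = @import("fmt.zig"); // hypothetical module path

// Property: formatting a file once and formatting the result again
// must produce identical output (formatting is a fixed point).
fn checkFormatIdempotent(allocator: std.mem.Allocator, source: []const u8) !void {
    var ast1 = try parse.parse(allocator, source);
    defer ast1.deinit();
    const formatted1 = try fmt.formatAst(allocator, ast1);
    defer allocator.free(formatted1);

    var ast2 = try parse.parse(allocator, formatted1);
    defer ast2.deinit();
    const formatted2 = try fmt.formatAst(allocator, ast2);
    defer allocator.free(formatted2);

    try std.testing.expectEqualStrings(formatted1, formatted2);
}
```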
Also, if we add any verifiers to IR (like the old compiler has check mono IR), we can use that as well. I'm hoping we add a number of these verifiers.
Oh, and if fuzzing is good enough at exploring (which it may not be), we could theoretically give it access to essentially a repl and have it fuzz the interpreter vs the llvm backend
I would also like to do things like run the compiled code, grab the output, and also run the interpreter and assert the output is the same
Yeah, that is what I was suggesting with my last comment. Probably would run the interpreter first. If the code is valid in the interpreter, run the backend (should always pass). Then run the code and assert equivalent output.
When fuzzing my languages (a JSON replacement, a one-liner expression language, an HTML templating language), by making sure to turn off `std.mem.eqlBytes_allowed` (Zig 0.14.0-dev; named `std.mem.backend_can_use_eql_bytes` in 0.13.0), I got amazing performance from the fuzzer even without any corpus or dictionary.
Once my tokenizer(s) had been tested enough, I then worked on a simple valid-syntax generator, so that the fuzzer would generate only valid HTML ASTs when flipping bytes, in order to reliably target the parser.
Here's what that looks like: https://github.com/kristoff-it/superhtml/blob/main/src/fuzz/astgen.zig
So the fuzzer writes `ccuc` and the executable turns that into:
```html
<div>
<div></div>
</div>
<div></div>
```
For a full-blown programming language, getting a valid source code generator up and running is a bit more involved, but Matthew Lugg has been working on one for Zig; you might take inspiration from his work once you want to start fuzzing deeper layers of the compiler.
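To make the byte-to-grammar idea concrete, here is a toy sketch (the byte semantics are invented for illustration and don't match SuperHTML's actual astgen): each input byte selects a grammar operation, so any random byte string renders to structurally valid markup:
```zig
const std = @import("std");

// Toy sketch: 'c' opens a child element, 'u' closes the current one.
// Unbalanced opens are closed at the end, so output is always well-formed.
fn renderFromBytes(writer: anytype, bytes: []const u8) !void {
    var depth: usize = 0;
    for (bytes) |b| {
        switch (b) {
            'c' => { // open a child element
                try writer.writeAll("<div>");
                depth += 1;
            },
            'u' => { // close the current element, moving up one level
                if (depth > 0) {
                    try writer.writeAll("</div>");
                    depth -= 1;
                }
            },
            else => {}, // bytes with no mapped operation are ignored
        }
    }
    while (depth > 0) : (depth -= 1) try writer.writeAll("</div>");
}

test "random-ish bytes always yield balanced divs" {
    var buf = std.ArrayList(u8).init(std.testing.allocator);
    defer buf.deinit();
    try renderFromBytes(buf.writer(), "ccuc");
    try std.testing.expectEqualStrings("<div><div></div><div></div></div>", buf.items);
}
```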
Awesome to know!
Why does tokenization have `StringBegin` but not `StringEnd`? Seems strange to generate: `[StringBegin, OpenCurly, Expr, CloseCurly, String]`. Also, is there a reason we don't encode the `$` in tokenization? Why is it `OpenCurly` and not `DollarCurly` or similar?
Context: trying to write a fuzzer for tokenization that tokenizes, prints in a bare-bones form, then tokenizes again and asserts they are the same.
Intermediate looks something like:
```
zzz [zzzz!] { zzz: zzzzzzzz "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" }
zzzzzz zzz.Zzzzzz
zzzz! = |_zzzz|
z = "~~~~~"
Zzzzzz.zzzz!("~~~~~~~${z}"")
```
cc @Joshua Warner
@Brendan Hansknecht yes, that does need to be changed a bit before it's unambiguous
The _intent_ is to generate a sequence of `StringBegin, <interpolation>, StringPart, <interpolation>, StringPart` (for example)
The ambiguity right now is the `OpenCurly`/`CloseCurly` delimiters need to be specific to interpolations - otherwise that's ambiguous with having a string followed by a curly brace (for whatever reason).
In terms of not having `StringEnd`, that's not needed for disambiguation (at least, as long as I fix up the interpolation thing above). You can have `StringBegin, <interpolation>, StringPart, StringBegin` - and that's unambiguously a string with an interpolation, followed by a second string without.
Joshua Warner said:
> The _intent_ is to generate a sequence of `StringBegin, <interpolation>, StringPart, <interpolation>, StringPart` (for example)
I guess I found the first bug (though not via fuzzing; it was found while setting up fuzzing)
It currently generates `StringBegin, <interpolation>, StringPart, <interpolation>, String`
have a fix on my branch
Ok, first real fuzz bug of the new compiler:
```
id:000000,sig:06,src:000000,time:423,execs:751,op:quick,pos:224
thread 36762112 panic: index out of bounds: index 226, len 225
/Users/bren077s/Projects/roc/src/check/parse/tokenize.zig:406:58: 0x1023efff3 in decodeUnicode (repro-tokenize)
    const utf8_char = std.unicode.utf8Decode(self.buf[self.pos .. self.pos + len]) catch {
                                                         ^
/Users/bren077s/Projects/roc/src/check/parse/tokenize.zig:1050:59: 0x1023ef4ef in tokenize (repro-tokenize)
            const info = self.cursor.decodeUnicode(b);
                                                  ^
```
Looks to be a buffer overflow due to assuming we have enough bytes left for a full unicode character
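A sketch of the kind of guard that avoids slicing past the buffer (the names mirror the trace above, but the surrounding types are assumed):
```zig
const std = @import("std");

// Sketch: validate that the full UTF-8 sequence fits in the remaining
// buffer before slicing, instead of assuming it does.
fn decodeUnicodeChecked(buf: []const u8, pos: usize, first_byte: u8) !u21 {
    const len = try std.unicode.utf8ByteSequenceLength(first_byte);
    if (pos + len > buf.len) return error.TruncatedUtf8; // the overflow case from the crash
    return std.unicode.utf8Decode(buf[pos..][0..len]);
}
```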
Yep, I have some pending tokenizer updates that I'm waiting on Anthony's PR to land prior to posting
Good find on the unicode thing (that one's new to me)!
Ah, and a second unicode bug: the call to `decodeUnicode` on line 754 will always fail. Its position is 1 past the current position of the byte buffer. Cause it is `.` followed by unicode instead of just unicode.
Also, PR for tokenizer fuzzer: https://github.com/roc-lang/roc/pull/7607
Fuzzer essentially instantly finds bugs, but that is fine for merging. The merge will just make sure the fuzzers keep compiling and don't bitrot. And we can use it to slowly burn down bugs.
Brendan Hansknecht said:
> `self.buf[self.pos .. self.pos + len]`

`self.buf[self.pos..][0..len]` - slicing by length
Ok, folded a few fixes into the PR
Still crashes really quick (likely the mock module I am generating from the tokens does not quite match what the tokenizer expects)
Looks like you're having a lot of fun Brendan :grinning_face_with_smiling_eyes:
I mean, it is kinda fun seeing weird tokenizer edge cases that fail fuzzing. That said, my print-and-then-retokenize definitely has some bugs (that, or the tokenizer has some assumptions and lost state info; maybe both)
What command should I be running to kick off fuzzing?
```
joshw@Joshuas-MacBook-Air-3 ~/s/g/r/roc (parser-zig-rewrite)> zig build -Dllvm -Dfuzz
joshw@Joshuas-MacBook-Air-3 ~/s/g/r/roc (parser-zig-rewrite)> ./zig-out/bin/fuzz-cli
[-] FATAL: forkserver is already up, but an instrumented dlopen() library loaded afterwards. You must AFL_PRELOAD such libraries to be able to fuzz them or LD_PRELOAD to run outside of afl-fuzz.
To ignore this set AFL_IGNORE_PROBLEMS=1 but this will lead to ambiguous coverage data.
In addition, you can set AFL_IGNORE_PROBLEMS_COVERAGE=1 to ignore the additional coverage instead (use with caution!).
fish: Job 1, './zig-out/bin/fuzz-cli' terminated by signal SIGABRT (Abort)
```
```
./zig-out/AFLplusplus/bin/afl-fuzz -i src/fuzz/tokenize-corpus/ -o /tmp/tokenize-out/ zig-out/bin/fuzz-tokenize
```
also, you don't need `-Dllvm`, but you do need a system install of llvm for afl++ to compile. Sadly, I was unable to get afl++ to compile with static llvm (might have to loop back to that at another point)
Oh also, even without afl and all that hassle, you can `zig build test -Dfuzz` and it will build a `repro-tokenize` executable. That executable can take data from stdin, or a file arg, and use it to reproduce directly. It also prints out a lot more info.
Hmm, having a little trouble adapting this to Anthony's changes. `zig build test` works, but not `zig build fuzz-tokenize`.
Does this make sense to you?
```
fuzz-tokenize
└─ install generated to repro-tokenize
   └─ zig build-exe repro-tokenize Debug native 1 errors
tokenize.zig:3:27: error: import of file outside module path: '../../collections/utils.zig'
const exitOnOom = @import("../../collections/utils.zig").exitOnOom;
                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
referenced by:
    tokenize: tokenize.zig:981:53
    zig_fuzz_test_inner: /Users/joshw/src/github.com/roc-lang/roc/src/fuzz/tokenize.zig:500:14
    remaining reference traces hidden; use '-freference-trace' to see all reference traces
```
Ah, yeah. It's this todo: https://github.com/roc-lang/roc/blob/33a2c663e00c9309624978913a4f9ade3e66113f/build.zig#L111-L113
Feel free to comment out the tokenizer test for now.
I'll fix it up in a bit
technically, moving all fuzz executables to `src/fuzz-tokenize.zig` and using relative imports would fix this. Otherwise, we need something like `src/lib.zig` and have all the fuzz executables go through that.
The issue is that if you directly import `src/check/parse/tokenize.zig` as your root module file, it is only allowed to import things from `src/check/parse/...`
Interesting. Confusingly, that isn't the first such import in that file. But I guess it must be processing things out of order or something
And here I was looking forward to fuzzing the tokenizer :P
Anyway, feel free to just disable it; not exactly sure when I will next have time to look at it, but I'll fix it up if no one else does first. Just gonna be flying tomorrow, so not sure of the timing.
Which test should I be commenting out?
https://github.com/roc-lang/roc/blob/33a2c663e00c9309624978913a4f9ade3e66113f/build.zig#L125-L138
Fix the fuzz tests: https://github.com/roc-lang/roc/pull/7612
Also adds a readme!
Looks like we had our first merge conflict. Something simple, tokenize fuzzer got fixed and merged at the same time the tokenize function got updated to have malformed nodes. So now the fuzzer doesn't handle reprinting any of the malformed nodes.
I don't currently have time to work on this. cc @Joshua Warner in case he has time. He probably can add printing for the malformed nodes quicker than anyone else cause he knows what they all are.
Otherwise, anyone can fix it by just handling a few extra cases in a switch statement. That, or for now, giving them empty handling just to unblock merging a PR.
https://github.com/roc-lang/roc/pull/7617
Right now this just bails out. The slightly better approach would be to copy the corresponding range in the input, and even better would be making sure we have enough information to trigger the same issue.
Yep. Sounds totally good
Also, @Joshua Warner, not sure it is the best use of time, but I think the fuzzer is now at the state where it finds reasonable tokenizer bugs. Some may be in the reprint, but I think a lot are due to minor mistakes around exact token types, such that reprinting in basic form leads to bugs.
Cool, will take a look in a bit
Great talk about TigerBeetle's fuzzing setup. Maybe there are some aspects we could adopt for Roc. https://www.hytradboi.com/2025/c222d11a-6f4d-4211-a243-f5b7fafc8d79-rocket-science-of-simulation-testing
Yep
Sounds like they are mostly solving the continuous fuzzing case.
Which may be less important for roc, but still useful to glean from
that was a great talk, it finally clicked for me the "level triggered" vs "edge triggered" thing. based on your comment Brendan I think that point might not have sunk in for you yet
I only quickly skimmed it cause today has been busy, need to give it a proper watch still.
I think my initial statement still mostly stands. It isn't that a level-triggered setup wouldn't be great. It is more that roc is still super early on, with limited resources, and fuzzing only run on local machines. Long term, I would love to have a similar setup, but currently fuzzing is local-only for roc and not tied to CI at all. That said, maybe grabbing a single machine and setting up what TigerBeetle did wouldn't be as much work as I expect. It just feels like a larger investment in infra than roc is ready for. Especially given you would prefer to notify folks asynchronously on fuzzing failures and don't want too much of a backlog to build up.
But I definitely might be making a mountain out of a molehill; they made it look relatively simple to orchestrate all of this.
> Just feels like a larger investment in infra than roc is ready for
I have a linux "server" sitting at home that I'm happy to leave running a fuzzer full-time
Not sure if that is the kind of infra you're referring to, but happy to offer that if it helps
Yeah, that is part of it
Then it is just extra ci flows, website or notifications for failures, and orchestration code.
> and don't want too much of a backlog to build up.
One of the suggestions of the talk was to not worry about keeping old failing seeds around indefinitely but only keep the N most recent seeds and rely on the fact that unresolved issues will be found again.
Yeah, sounds like most of the work is a minor database and a web frontend for that (along with a CI machine to run things)
So not too bad
Also, we use a corpus based method which has implications for multi machine setups, but I think roc can just do a single machine setup which would avoid much of that hassle.
Given we do coverage-guided fuzzing, I don't think we can have a single integer seed. I think our inputs will remain a blob of text. So not as portable. I guess we could minimize and base64 encode them to at least make them trivial to copy.
Yeah, maybe this is less work than I initially thought, maybe I'll try to hack something crazy simple together.
Has anyone managed to run our fuzzer on linux? I am trying to but keep hitting linking issues. Wondering if it may be distro/config specific.
I haven't tried tbh
Actually I did early on, but couldn't get it working either
Small PR that enabled me to get fuzzing working on linux (though sadly with system install AFL instead of zig compiled AFL): https://github.com/roc-lang/roc/pull/7651
Please ignore the absolute ugliness of this site: https://roc-lang.github.io/roc-compiler-fuzz/
This will be really nice, looking forward to building fuzzers and getting some high scores :grinning_face_with_smiling_eyes:
Wow awesome!
yooooo this is sweet!!!
Also, that site is happily taking contributions if anyone wants to make it pretty.
Aside, I really love minimized repros:
```
zig build repro-tokenize -- -b YW5k -v
```
This is just passing `and` to our tokenizer. Which breaks cause we assume that `and` is `&&`.
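For reference, the `-b` payload is just the input bytes base64-encoded; a quick sketch of round-tripping an input through Zig's std base64, which keeps crash inputs trivial to paste around:
```zig
const std = @import("std");

test "base64 round-trip of a minimized fuzz input" {
    const input = "and"; // encodes to YW5k, as in the repro command above
    var enc_buf: [8]u8 = undefined;
    const encoded = std.base64.standard.Encoder.encode(&enc_buf, input);
    try std.testing.expectEqualStrings("YW5k", encoded);

    var dec_buf: [8]u8 = undefined;
    const n = try std.base64.standard.Decoder.calcSizeForSlice(encoded);
    try std.base64.standard.Decoder.decode(dec_buf[0..n], encoded);
    try std.testing.expectEqualStrings(input, dec_buf[0..n]);
}
```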
CC: @Joshua Warner real tokenizer bugs likely will pop up at that site (though some are with the fuzz harness as well).
Now we just need more fuzzers to start exploring more of the code.
@Brendan Hansknecht -- how does the `src/fuzz-corpus/parse/grab_bag.roc` file work? Is that something that the fuzzer will use to start with... and then randomly modify?
Yeah, we are required to give the fuzzer at least one seed
That test case seemed reasonable to me
Also, currently parsing helloworld hangs and I didn't want to figure out fixing it. So I just used something different
Nice. Does it help being large with everything in there together? I wonder if it would be better to have multiple files with simpler syntax?
I was also wanting something similar for the snapshot tests... so I'm wondering if these things could/should be combined somehow.
But they are also quite different use cases so probably should be kept different.
The fuzzer does a really good job at exploring, especially cause I enabled `cmplog` in CI. It leads to llvm telling the fuzzer all values that are used in comparisons (strings, ints, etc). So the starting corpus isn't too important.
In CI, the fuzzer will cache the corpus and keep growing it.
Yeah, fuzzers theoretically could share with snapshot tests (probably a good idea to ensure valid inputs). Would just need a tool to generate a corpus from the snapshots. For example, the tokenizer and parser fuzzers both use `.roc` files as input. So a script to extract all the original source from the snapshot test cases would create an amazing starting corpus for those fuzzers.
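A hedged sketch of what such an extraction tool could look like (the snapshot layout and `extractSource` are assumptions; the content-hash naming matches what gets discussed further down):
```zig
const std = @import("std");

// Hypothetical: the real tool would slice the .roc source section out of a
// snapshot file; here the whole file stands in for it.
fn extractSource(snapshot: []const u8) ?[]const u8 {
    return snapshot;
}

pub fn dumpCorpus(allocator: std.mem.Allocator, snapshot_dir: []const u8, corpus_dir: []const u8) !void {
    var dir = try std.fs.cwd().openDir(snapshot_dir, .{ .iterate = true });
    defer dir.close();
    var out = try std.fs.cwd().openDir(corpus_dir, .{});
    defer out.close();

    var walker = try dir.walk(allocator);
    defer walker.deinit();
    while (try walker.next()) |entry| {
        if (entry.kind != .file) continue;
        const contents = try dir.readFileAlloc(allocator, entry.path, 1024 * 1024);
        defer allocator.free(contents);

        const source = extractSource(contents) orelse continue;
        // Name each corpus entry by content hash so duplicates collapse.
        var name_buf: [32]u8 = undefined;
        const name = try std.fmt.bufPrint(&name_buf, "{x}.roc", .{std.hash.Wyhash.hash(0, source)});
        try out.writeFile(.{ .sub_path = name, .data = source });
    }
}
```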
If later fuzzers start with sexpr IR, sharing would also be nice, but likely more complex.
Nice. It looks like we've already got a fairly solid testing framework emerging...
(depending if the fuzzers parse the sexpr or programmatically generate it)
I could definitely have the snapshot tool extract the roc source from each snapshot and dump it into a fuzzer corpus folder somewhere for simulation testing
I'm not sure our fuzzing is simulation testing cause nothing is simulated, but nonetheless it is great automatic bug finding.
> I could definitely have the snapshot tool extract the roc source from each snapshot and dump it into a fuzzer corpus folder somewhere
That would be great. If you just add a flag to the snapshot tool to dump all the `.roc` files into a folder, I'll integrate that into the fuzz CI.
I'll do that in my next PR
> nothing is simulated
We're simulating the source file that is being parsed...
haha, I guess. I've always associated simulation testing with fake networks and databases and disks and what not.
Hopefully we can move up the stack someday, and simulate more interesting things... like here's a (generated) expression that should evaluate to some expected value.
Will be interesting to see how far we can take it and what still explores well. At a minimum, we should be able to do repl style expressions.
but not sure how well the fuzzer will explore that (and also have to be careful about infinite loops)
Do you think it's ok for the snapshots to be copied in all at the same level... or would you want them to maintain whatever folder structure they had from the `snapshots/` folder? (When copying them into the corpus folder given as an argument to the snapshot tool.)
Same level is preferred
Should I give them a pseudo-random name... or basically keep whatever they came with
We should probably keep the name; otherwise as we modify the snapshots we won't know which one they match up to (and thus which one to edit).
If we used something like the hash of the content, then it'd be hard to know which fuzz corpus we should _remove_ because it's the old version of that input. Otherwise old inputs would keep piling up as fuzz corpus entries, and that's probably not super valuable.
I don't want to check these into the repo
Let's keep just the snapshots as the source of truth
For fuzzer, hash or random name is fine
That's at least my default opinion for this
Ahhh got it
In that case I'd do hash, but whatever is fine.
Hardcore test case: `zig build repro-parse -- -b ''`
Breaks the parser with an empty string (we just have an incorrect assert that assumes parsing will generate something).
As a general note, we have the ability to trigger the fuzzer targeting any branch/commit. So if you ever want a PR to get fuzzed, we can do that.
Looking at the snapshot tool, with the flag to copy source into our fuzz corpus. One issue with giving the files a `.roc` extension is that now git wants to commit them to our repo...
```
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	src/fuzz-corpus/cupgstzv.roc
	src/fuzz-corpus/hdddlnro.roc
	src/fuzz-corpus/kbdtxaol.roc
	src/fuzz-corpus/tmtpdcnw.roc
	src/fuzz-corpus/ufzxxvfu.roc
```
Do you think we should:
1. give them a different extension, `.snap` or something
2. gitignore `src/fuzz-corpus/`
3. gitignore it, and if we want to add something there we need to `git add` it manually

I'm leaning towards (3) -- so I'll run with that for now in my draft PR
Yeah, 3 sounds good
So it runs all the different fuzzers every 4 hours, but only if there's a new commit?
I've just noticed the scheduled runs have skipped the actual fuzz run.
Yeah, I messed something up. Should be fixed already
And yeah, runs every 4 hours for 30 minutes. Seemed like a reasonable cadence for now. Will keep fuzzing the same commit and expanding the corpus if no new commits exist.
Brendan Hansknecht said:
> Hardcore test case: `zig build repro-parse -- -b ''`
> Breaks the parser with an empty string (we just have an incorrect assert that assumes parsing will generate something).
Does it count as a real bugfix if I resolve this one? :sweat_smile:
As we say in the smash bros community, "we take those"
@Brendan Hansknecht -- how did you install LLVM on your mac?
Did you use brew?
I'm going to try the release from LLVM's github instead and see if I can get fuzzing working on my macos
I just used brew
Also, you don't need llvm and afl to run the repro
That repro should work for everyone with only zig as a dep
Yeah repro is fine... I was wanting to run the fuzzer
I couldn't get the brew version working.
I've tried with a downloaded LLVM... and providing the path but having trouble with that too
Hey got it working using brew... :tada:
Needed to add export PATH=$PATH:/opt/homebrew/opt/llvm@18/bin
to my zshrc so it could find brew's llvm-config
Got it running with
```
$ ./zig-out/AFLplusplus/bin/afl-fuzz -i src/fuzz-corpus/parse -o /tmp/fuzz-parse-out zig-out/bin/fuzz-parse
```
Screenshot 2025-03-07 at 17.23.53.png
My new commands to use the fuzzer with zig 0.14.0, thank you @Brendan Hansknecht for helping me work around my issues to get back to online fuzzing again:
```
# for some reason homebrew llvm isn't working, but system afl is...
brew install afl++
zig build -Dfuzz -Dsystem-afl
zig build snapshot -- --fuzz-corpus ./src/fuzz-corpus/
afl-fuzz -i ./src/fuzz-corpus/ -o /tmp/corpus zig-out/bin/fuzz-parse
```
My best guess is that there is a bug with this build script: https://github.com/allyourcodebase/AFLplusplus
As such, system afl is required instead of using zig built afl. Might just be some form of version overlap issue and not an issue with the actual build script. Need to do more testing at some point.
hmm, might just be that we need to wait for aflplusplus to update to zig-0.14_afl_4.31c
maybe afl 21c doesn't work with newer zig
I've got it down to a couple of seconds before we get a crash now... :tada: (on the parser)
New record for fuzz-parser - 2min, 23 sec before my first crash
Let me get these headers done and hopefully that goes up
Have you been posting all of the crashing inputs?
I've been making a new snapshot for each failure, which I think will help protect against regressions and seed future fuzzing efforts.
You can see all the ones I have so far in my PR https://github.com/roc-lang/roc/pull/7672
One fuzzing hang that comes up semi-often is having a ton of `{` brackets. When Roc reformats that, it adds essentially infinite spaces due to indentation. This fails fuzzing due to the fuzzer thinking it is a hang.
For example, one fuzz failure I saw recently had ~7k `{` brackets. When formatted, that led to 92,910,578 spaces being printed. It is unsurprising that that is too slow.
This leads me to a few questions. Do we really want `{` brackets to just nest infinitely deep and have ridiculously long lines? I get that a metric ton of `{` is contrived, but I think it is still best practice to consider and handle so we can maintain robust fuzzing.
i think more than 100 levels of indentation is overkill
let alone 7k+
i would just move indenting to a function that panics if it exceeds some limit
or to be more graceful, returns a parse error and a malformed node
Yeah, that is my thought, maybe after a certain level of nesting, we should just bail and return a parse error
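A minimal sketch of that idea (the limit and names here are placeholders, not decided values):
```zig
const std = @import("std");

// Placeholder cutoff; past this depth, report an error (which the parser
// could surface as a malformed node) instead of ever-longer indent runs.
const max_nesting_depth = 128;

fn writeIndent(writer: anytype, depth: usize) !void {
    if (depth > max_nesting_depth) return error.NestingTooDeep;
    // One tab per level keeps worst-case output length linear in depth.
    try writer.writeByteNTimes('\t', depth);
}
```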
i could add that to my current PR unless it was already merged
I would actually rather we handled this with a fuzzer change
pathological parsing cases like this can come up irl in generated code
where almost nobody notices, but then one person is totally blocked by it and has to try to make some complex workaround
what kind of change would you like to see? some sort of filter on the fuzz inputs?
so if the payoff is something like "everyone gets faster builds" (e.g. u16 line counts instead of u32) I'm ok with that, but if the only problem is the fuzzer itself, I'd rather address this by changing the fuzzer than by changing the parser
yeah something like that - I'm not sure what options would be best there!
Fair enough. Though if we format exceptionally nested code to be deeply indented, it won't be readable either.
a good thing to note is that if it's taking this level of complexity for the fuzzer to find a crash, we must be doing something pretty good
We have some basic crashes too, but I think we are overall doing well.
feel free to send basic crashes to me if they seem legit
Brendan Hansknecht said:
> Fair enough. Though if we format exceptionally nested code to be deeply indented, it won't be readable either.

To be more concrete here: formatting this code to indent just slows down parsing. So if this would be coming from generated code that no one is expected to read, we would just be making the experience worse by making the file way larger and way slower to parse.
That said, I do agree that given this is a contrived example, limiting the fuzzer is reasonable too. In the fuzzer, I could pre-scan for nesting depth and limit.
Aside, we decided on tabs as the canonical form, right? So the formatter should be changed to use tabs for indentation instead of spaces?
i didn't know that we made that decision but that should be easier
that cuts the number of characters per indented line by 4x on average
so in that worst case example that's 7k tabs instead of 28k spaces
That might actually be longer than max fuzzer input length.
So changing to tabs might actually remove the hangs
sweet
I think max is 8 or 16k
i can make that change tomorrow
should be just a few loc change
Looks like this is a wee bit harder than I expected. Zig multiline string literals don't allow literal tabs - they recommend a (IMHO) pretty insane system of postprocessing strings at comptime to replace some other sigil in the string with a tab
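For reference, a sketch of that comptime postprocessing workaround (the `>` sigil choice is arbitrary): Zig multiline string literals can't contain literal tabs, so a placeholder byte gets swapped for `\t` at comptime:
```zig
const std = @import("std");

// Swap every `sigil` byte in a comptime-known string for `replacement`,
// so test expectations can be written with a visible placeholder.
fn replaceSigil(comptime s: []const u8, comptime sigil: u8, comptime replacement: u8) *const [s.len]u8 {
    comptime {
        var buf: [s.len]u8 = undefined;
        for (s, 0..) |c, i| buf[i] = if (c == sigil) replacement else c;
        const out = buf;
        return &out;
    }
}

test "tab-indented expected output via sigil replacement" {
    const expected = replaceSigil(
        \\module
        \\>x = 1
    , '>', '\t');
    try std.testing.expectEqualStrings("module\n\tx = 1", expected);
}
```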
Also, there might be a tokenization error with tabs currently, but I'm not sure. (Update: there is not, it's just an issue with Zig.)
I'm pretty much going to need to move all parser tests that contain indentation out to snapshot files and make running snapshots more ergonomic during my development workflow
yikes
Anthony Bullard said:
> i think more than 100 levels of indentation is overkill
To me this makes me think of the way that Elm's compiler refuses to have tuples with more than 3 elements, and gracefully explains the rationale to the user.
It could probably be a selling point if Roc didn't allow for .. 10? 20? levels of nesting. I think nesting can always be managed with some refactoring. If the language itself would encourage this behavior from the early days, I think the global quality of Roc code would just increase.
I can also see some "management selling points" when you can say that a language has built-in mandatory opinions about code complexity :sweat_smile:
@Brendan Hansknecht Here's my PR on this https://github.com/roc-lang/roc/pull/7786
Addressed all review feedback
@Anthony Bullard just to show you the most commonly found fuzz failure. It is a variant of: `zig build repro-parse -- -b MApmb3I= -v`
Which in this case is:
```
0
for
```
Most of the failures hit this panic:
```
thread 26416288 panic: Should have gotten a valid pattern, pos=3 peek=EndOfFile
/Users/bren077s/Projects/roc/src/check/parse/Parser.zig:1266:24: 0x1050855c7 in parsePattern (repro-parse)
    std.debug.panic("Should have gotten a valid pattern, pos={d} peek={s}\n", .{ self.pos, @tagName(self.peek()) });
```
Probably an easy fix around EOF handling
cool i can find a fix for this repro quick
are we parsing this as a statement? expr? module?
NVM i can just read the code :wink:
https://github.com/roc-lang/roc/pull/7792 @Brendan Hansknecht
I realised I could wire up our coordinate module into the fuzzer really easily.
So I've been fuzzing the whole compiler pipeline (at least everything we have so far up to type checking)... and it's been really great so far.
In case this helps anyone... here is how I'm running the fuzzer for the `roc check` zig compiler pipeline.
```
brew install afl++
rm -rf /tmp/corpus/default/crashes
zig build -Dfuzz -Dsystem-afl
afl-fuzz -i ./src/fuzz-corpus/ -o /tmp/corpus zig-out/bin/fuzz-canonicalize
```
And this is what it looks like in my terminal...
Screenshot 2025-06-28 at 17.48.10.png
where do the crashes end up?
roc-lang.github.io/roc-compiler-fuzz
what I mean is like if I run it locally and it reports a number of crashes, how do I reproduce an individual crash so I can try to fix it?
I think you just have to run it passing in a file as the first arg: `zig run fuzz-canonicalize -- /tmp/corpus/default/crashes/...`
I think
Fuzzing can make such curious crashes at times:
```
0
pr000000e:{e:0}pr000000e={p:0r}
```
This leads to (either an infinite or near-infinite loop) in `check.check_types.unify.Unifier.gatherRecordFields`. I'm quite surprised this even makes it past parsing and to canonicalization.
`zig build repro-canonicalize -- -b MApwcjAwMDAwMGU6e2U6MH1wcjAwMDAwMGU9e3A6MHJ9 -v`
would like to see the snapshot for that failure
this reminds me i would like to have a META option to limit the stages run on a snapshot
that one looks like:
```
rec : { e : 0 }
rec = { p: 0r }
```
so I suspect it's getting typed as an error, and something about trying to gather up all the record fields in an erroneous record is the problem
in this case both the type annotation and the record expression are invalid, but not sure if that's required to repro
this is probably the sort of situation that breaks the old compiler when you try to do the "run anyway despite errors" thing, so it's pretty great to see the fuzzer turning it up! :grinning_face_with_smiling_eyes:
I'd love to see a count of how many runs of the fuzzer it takes to generate a file / statement (not expr) that is actually completely valid with no reports
Anthony Bullard said:
> I'd love to see a count of how many runs of the fuzzer it takes to generate a file / statement (not expr) that is actually completely valid with no reports

Just need to make an inverted fuzzer that only fails if everything goes successfully through the complete compiler stack
Regression tests generator lol
First tokenizer fuzz failure in a long long time: `zig build repro-tokenize -- -b Jyc= -v`
Seems to be related to new single quote changes. I think it is a bug on the formatting side technically rather than truly a tokenizer bug.
We now allow for empty single quote literals, which was not allowed before.
I'll take a look
Also, we are getting some fun canonicalize failures now, like: `zig build repro-canonicalize -- -b IiJ1PSc= -v`
It leads to a zig slice that is invalid:
```
thread 219880510 panic: start index 1 is larger than end index 0
/Users/bren077s/Projects/roc/src/check/canonicalize.zig:1269:42: 0x104a227af in canonicalize_expr (repro-canonicalize)
    const inner_text = token_text[1 .. token_text.len - 1];
```
Interesting, because there's a snapshot with an empty single quote. In such a case, `''` is of length 2, so the slice is `[1..1]`. Looks like a problem in the tokenizer. Likely it creates the token too soon.
Of note, for the tokenizer fuzzer, we try to generate a "canonical" version of each token. Then retokenize a second time
So that "canonical" version is probably wrong for single quotes now. It probably needs to be allowed to be empty
Oh, I think the for loop here just needs to be from 0..length: https://github.com/roc-lang/roc/blob/9a32c422f290713a312e18a96cb6f43c850aa4d0/src/check/parse/tokenize.zig#L1662-L1667
Or maybe `1..length-1`?
It should be length-1, right
Looks like this code generated only the open single quote, truncating the closing one. So `''` becomes `'`, thus the slice `[1..(1 - 1)]`.
https://github.com/roc-lang/roc/pull/7941
General question: it is fair to say that all files under 16KB should definitely complete `roc check` in under a second, right?
16KB is just an arbitrary number I set for fuzzing, and I bet the true number should be higher, but compiler perf wise, I assume we want to be able to `roc check` much, much faster than that.
For reference, Dict.roc is 60KB and is only 1776 lines.
Of course in the worst case fuzzing experience, it will find code that takes maximal time and generates a metric ton of errors. So it isn't truly representative.
Just thinking about fuzzer hangs and settings.
Also, how the heck does an input like this pass parsing and get to canonicalization? This feels pretty deeply wrong to me:
```
0]r={s=||{r={s=||{s={r=||{l={s=||{s={s={v={r={s={v=||{c00st=0t=c00st(0)c00st(0)t=c00st(0)
```
I get we want the compiler to be able to run as much as possible, but this has to fail parsing, right?
Hmm... I guess it does fail parsing, but we just keep going anyway:
```
[0]: check.parse.AST.Diagnostic{ .tag = check.parse.AST.Diagnostic.Tag.missing_header, .region = check.parse.AST.TokenizedRegion{ .start = 0, .end = 1 } }
[1]: check.parse.AST.Diagnostic{ .tag = check.parse.AST.Diagnostic.Tag.expr_unexpected_token, .region = check.parse.AST.TokenizedRegion{ .start = 55, .end = 56 } }
[2]: check.parse.AST.Diagnostic{ .tag = check.parse.AST.Diagnostic.Tag.expr_unexpected_token, .region = check.parse.AST.TokenizedRegion{ .start = 56, .end = 57 } }
```
Brendan Hansknecht said:
> Also, how the heck does an input like this pass parsing and get to canonicalization? This feels pretty deeply wrong to me:
> ```
> 0]r={s=||{r={s=||{s={r=||{l={s=||{s={s={v={r={s={v=||{c00st=0t=c00st(0)c00st(0)t=c00st(0)
> ```
> I get we want the compiler to be able to run as much as possible, but this has to fail parsing, right?
Maybe we want to support droid mode. The robots can plug in and skip all the human whitespace nonsense.
Brendan Hansknecht said:
> General question: it is fair to say that all files under 16KB should definitely complete `roc check` in under a second, right?

Hindley-Milner type inference has pathological asymptotic time complexity if you just keep nesting `let`s (or defs in our case), and relies on the fact that in practice people don't actually do that
but if a fuzzer did that, it would presumably get bad :smile:
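For a concrete picture of that pathology, a sketch in Roc-like syntax (illustrative only): each definition applies the previous one twice, so the inferred type roughly doubles in size per level, and n levels force the checker through an exponentially large type:
```
f0 = |x| (x, x)
f1 = |y| f0(f0(y))
f2 = |y| f1(f1(y))
f3 = |y| f2(f2(y))
# ... fN's inferred type contains on the order of 2^N pairs
```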
Brendan Hansknecht said:
> Also, how the heck does an input like this pass parsing and get to canonicalization? This feels pretty deeply wrong to me:
> ```
> 0]r={s=||{r={s=||{s={r=||{l={s=||{s={s={v={r={s={v=||{c00st=0t=c00st(0)c00st(0)t=c00st(0)
> ```
> I get we want the compiler to be able to run as much as possible, but this has to fail parsing, right?
I think the right answer here is that parsing should generate a ton of error nodes, but then when we proceed to canonicalization, it finds essentially no work to do because it's all error nodes, so canonicalization and type-checking end up being no-ops
so you get the same outcome as if we "stopped at parsing" except:
> but if a fuzzer did that, it would presumably get bad

Makes sense, we'll see. The fuzzer just optimizes for new exploration, so it may be unlikely, but not really sure.
> I think the right answer here is that parsing should generate a ton of error nodes, but then when we proceed to canonicalization, it finds essentially no work to do because it's all error nodes, so canonicalization and type-checking end up being no-ops

If I understand what is happening, the parser generates a mostly valid tree by automatically adding a bunch of `}`s at the end. Can then runs with tons of recursive lambda and expression checks. Can is very slow. The end result of Can is mostly a bunch of unused variable and duplicate definition complaints.
One thing that a lot of fuzzers count as new coverage is loop counts (maybe recursion counts?) going over some threshold
Yeah, bucketed loop counts is new coverage
so it isn't down to the individual iteration, but it does count overall
I predict as soon as any low hanging fruit is cleared out, it'll start finding things like that (unless we dissuade it somehow!)
Would it be reasonable to put some limit on that let recursion and start erroring after that?
Yeah, I'm sure when it comes up we can work around it. That said, right now there are lots of hangs with can in general (though "hang" is a pretty loose definition: the example above is considered a hang on the CI machine (old/weak cpu), but only takes 250ms on my M1 mac). That said, something that short taking 250ms is almost certainly a perf bug.
So it's probably worthwhile currently to consider it a failure.
agreed!
At least that is my thought
Just want to make sure that what we get currently that is considered a hang is useful. I think it is, but thought it would be worth double checking.
And yeah, we still have tons of low hanging crashes in both parse and can