For testing in the new compiler, I want to take a few lessons from the current snapshot testing system in test_syntax, and make a similar system applicable across the rest of the compiler passes. I also want to take more inspiration from llvm lit
tests and the zig compiler tests.
My thinking is each test should be in a single file, with the output of passes (e.g. the ast) simply concatenated on the end of each test. You'd have the test source, then ####
(or some other separator), then the formatted version of that source, then a separator, then a pretty-printed form of the AST, then the canonicalized IR, etc. This is in contrast to the current snapshot tests, where the formatted code and the ast are in separate files from the input. I also want to include any problems directly in the file. This should hopefully minimize needing to swap back and forth between files.
In contrast to the current test_syntax
tests, which heavily exercise the parser and formatter but generally aren't semantically valid, I'd like to focus on having most tests make it all the way through the compiler. In cases where we don't want this, we can include a system of directives at the top of the test to control which parts of the compiler we run, and which passes we include the output of.
Also in contrast to the current test_syntax
system, adding a new test shouldn't require recompiling. This necessarily means these tests will use a custom test harness - probably just a separate binary compiled from a different main file in the source tree.
Inspired by the zig compiler, we can have a syntax for specifying the contents of multiple test files as part of the same test case.
Here's a quick example:
#### Mode: expression # <-- this directive will cause us to parse and compile the input as if it was a repl input
"test".length()
#### Formatted: (same)
#### AST:
(pnc_call (str "test") (ident "length")())
#### Can IR:
<tbd>
#### etc etc
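To make the single-file format concrete, here's a rough sketch (in Zig, with invented names — this is not the real harness) of how a harness might split such a file into its ####-delimited sections for comparison against fresh compiler output:

```zig
const std = @import("std");

// Hypothetical sketch of one piece of the test harness: split a snapshot
// file into "####"-delimited sections, yielding (header, body) pairs.
const Section = struct {
    header: []const u8, // e.g. "AST:" or "Mode: expression"
    body: []const u8,
};

fn splitSections(allocator: std.mem.Allocator, file_contents: []const u8) ![]Section {
    var sections = std.ArrayList(Section).init(allocator);
    errdefer sections.deinit();

    var it = std.mem.splitSequence(u8, file_contents, "\n#### ");
    // Everything before the first separator is the raw test source.
    _ = it.next();
    while (it.next()) |chunk| {
        const newline = std.mem.indexOfScalar(u8, chunk, '\n') orelse chunk.len;
        try sections.append(.{
            .header = chunk[0..newline],
            .body = if (newline < chunk.len) chunk[newline + 1 ..] else "",
        });
    }
    return sections.toOwnedSlice();
}
```

The harness would then rerun the relevant passes on the source section and either diff or rewrite each output section in place.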
Thoughts?
I think it is also important to be able to start partway down the stack. I should be able to make a small Can IR and simply run the resolve step, with nothing else. Or, theoretically, run the interpreter.
That does imply we need not just a "debug dump" functionality for each IR, but also the ability to _parse_ that IR
Agreed that's valuable tho
I think this is one of the best features of MLIR.
Really helps with testing during development
That said, can get messy at scale.
I don't think we necessarily need this for every IR, but at least the IRs around complex passes.
Oftentimes, that makes a much more concise and understandable test than starting all the way at the top-level IR.
That said, I am simply used to doing that for pass-level unit tests at this point. The same could be done in pure zig if generating IR is simple enough
I think in the rust compiler generating IR is so painful that it makes unit testing passes less likely to happen and more painful
If generating a zig ast is trivial, we could consider doing that for pass level tests instead of snapshots.
What do you mean by "generating a zig ast"?
Oh, one other thought:
With lit
and FileCheck
, you can choose to ignore 90% of an IR and just check something very specific. Sometimes that is really useful for making tests more understandable (it focuses on what is expected to change instead of printing the entire IR). Not sure that is the right choice a lot of the time though. While it reduces verbosity, it can often miss bugs, so :shrug:
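For reference, a FileCheck test asserts only on the lines its CHECK directives name and ignores everything else the pass prints. A hypothetical example (the function and pass here are illustrative LLVM, not anything from roc):

```llvm
; RUN: opt -passes=instcombine -S %s | FileCheck %s
define i32 @add_zero(i32 %x) {
; CHECK-LABEL: @add_zero(
; CHECK-NEXT:    ret i32 %x
  %sum = add i32 %x, 0
  ret i32 %sum
}
```

Everything the pass emits that isn't matched by a CHECK line is simply ignored, which is exactly the selective-checking tradeoff described above.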
Ahhh I should read up on that
What do you mean by "generating a zig ast"?
If generating a Parse/Can/Resolve/etc IR ast directly in zig code is easy enough, then maybe that is enough for unit testing. If it is verbose or painful at all to generate said IR, I definitely think we want parsing at all levels.
Ahhh. My somewhat naive prediction is that'll end up being more verbose than will be ergonomic for doing lots of unit tests.
Maybe there are zig tricks I don't know tho?
Yeah, that is my exact same thought and question
So I bet we will want a parser for the various IRs
It will make them way more testable
As a side note, we totally could just directly use lit and FileCheck, assuming we have a flag or special executable that simply prints out the IRs. (not saying it is worth it, just noting)
and this is FileCheck, really simple: https://llvm.org/docs/CommandGuide/FileCheck.html
I had just assumed it would be a folder for each test, with a different file containing each IR. But I guess a single file may be easier?
I’m not active on these forums (only lurking) but I’d like to break the silence and share some advice. If you start creating IR tests then it will make refactoring much harder because you will have to change an insane amount of (test) code every time you change the IR or try to rearrange passes. Unless you feel that you can see the future and it is not a real issue, I really really recommend not doing this and instead focus solely on end to end tests. An end to end test suite is much more valuable than a set of IR tests. This would help with alternative compiler implementations or various large scale refactors where otherwise you’d have to throw out or migrate a lot of test code.
Yep, they will change pretty often - and that’s why an important part of this will be automated tooling to update the checked in snapshots as necessary. (I didn’t mention that above, but it’s already part of the test_syntax snapshots)
If you start creating IR tests then it will make refactoring much harder because you will have to change an insane amount of (test) code every time you change the IR or try to rearrange passes.
I pretty strongly disagree with this overall premise. While end to end tests are fundamental for a compiler like this, even a handful of mediocre pass-level tests is often quite valuable. E2E tests are often painful to debug. If something can be caught by a pass-level test, it saves tons of time. On top of that, testing the core pillars of various transformations can help make them a lot more robust (bonus if you can fuzz an arbitrary pass). Yes, it is more work, but it is super important work to make a robust compiler, and I have seen this kind of work save tons of time.
Unless you feel that you can see the future and it is not a real issue, I really really recommend not doing this and instead focus solely on end to end tests.
This is a second rewrite of the compiler so we do have very strong understanding of the passes and ordering. I'm sure we'll shift some around, but that's ok.
An end to end test suite is much more valuable than a set of IR tests. This would help with alternative compiler implementations or various large scale refactors where otherwise you’d have to throw out or migrate a lot of test code.
The same can always be said of unit tests. It doesn't matter if it is a compiler, webserver, app, game, etc. Yet, the industry has for the most part realized that unit tests lead to better software despite needing to be thrown away or rewritten when software changes.
I think trying to focus solely on end to end tests would be a big mistake and a way to fundamentally make our compiler more brittle and harder to debug.
As I mentioned above, after working with mlir for years, one of its pieces of magic is simply that every IR can be printed and parsed. This enables spinning up tons of pass-level unit tests. It makes way more robust software that is more observable and debuggable.
Aside for @Joshua Warner: another reason to enable pass-by-pass snapshot tests is that most of the functionality can be tested without seeing the monster updates of full-compiler snapshot tests. So adding a feature to the parser won't affect the can IR to resolve IR snapshot tests at all. But adding said feature if you only have the e2e tests will affect all tests. Just makes it harder to validate what changed. (Still need the E2E tests, but maybe not as many)
One thing about forcing tests to start as real roc code that at least sounds good on the surface to me: it forces us to never have ambiguity about "yeah, but that state of the IR is not something the previous pass would ever generate, so why are we trying to handle this". If there's a test, then it's unambiguously possible to hit that part of the state space. Combine that with good fuzzing that hopefully reaches fairly deep into the compiler (far from proven!), and you can maybe have the best of both worlds.
Yeah, I think that is great for larger examples. I think for smaller features and simple algorithm tests, the snippets are much nicer.
That said, the gold standard is for every IR to have a verifier for what is valid or not
And if every ir has a parser and verifier, you can start fuzzing at any point in the compiler and stop at any point. Which should enable much better fuzzing exploration
Not to mention the fuzzer can break if any invariant is broken at any level
Though obviously correct-by-construction IRs are the best.
Not sure how well this can apply to your use case but it works perfectly well for me in Zine: https://kristoff.it/blog/dead-simple-snapshot-testing/ (bonus mention of 2 snapshot testing frameworks for Zig in the post)
Oh nice, that looks like a great strategy!
I started looking at function lifting, and it's going to be much easier to write tests for the build stages like that once we have IR dumping and parsing in place. Did anyone have specific ideas about how this should be implemented? Is the idea to manually write a sexpr printer and parser for each IR?
I think at least a printer
If we implement the phases roughly in order, we can just have Roc source go in, and each S-expr printer come out, we glue everything together, and check that the file has the expected content
If we want to be able to work on the later stages of the compiler before everything else is implemented I think we'll need to write parsers also. Manually writing the IR for the previous stage seems prohibitively tedious
Yeah, I was running into the exact same thing when starting to look at function solving. For that stage though, it uses the function lift IR. @Isaac Van Doren any way I can help out in writing the parsers/printers?
I'd say it would be helpful to take the IR in src/build/lift_functions/IR.zig
and have a go at building a pretty printer / S-expression thing for it
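As a rough sketch of what that could look like (the node shape here is invented, not the real lift_functions IR), an S-expression printer can just recurse over the tree, writing `(tag child…)`:

```zig
const std = @import("std");

// Hypothetical IR node, standing in for the real lift_functions IR.
const Node = struct {
    tag: []const u8,
    children: []const Node = &.{},
};

// Write a node as an S-expression: (tag child1 child2 ...)
fn writeSexpr(writer: anytype, node: Node) !void {
    try writer.print("({s}", .{node.tag});
    for (node.children) |child| {
        try writer.writeByte(' ');
        try writeSexpr(writer, child);
    }
    try writer.writeByte(')');
}
```

A matching parser would walk `(`, atom, and `)` tokens to rebuild the tree, which is the part that makes round-tripping the IR (and therefore starting tests mid-pipeline) possible.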
I have come up with a method for making snapshots in https://github.com/roc-lang/roc/pull/7608 which is very preliminary... but I figure we are in an exploratory phase, so may as well just try things.
I've found Claude is able to write slabs of code at once, and I can iterate pretty quickly with ideas.
Not sure this is a good way to do it, but in my struct for the UnificationTable I added a field and then a debugLog fn which writes to a file.
debug_capture: ?std.fs.File.Writer = null,

pub fn debugLog(self: *const UnificationTable, comptime fmt: []const u8, args: anytype) void {
    if (builtin.is_test and self.debug_capture != null) {
        self.debug_capture.?.print(fmt, args) catch unreachable;
    }
}
Then I use it in tests like this.
// create a snapshot file
const file = try std.fs.cwd().createFile("src/types/snapshots/unification_table_descriptor_modification.txt", .{});
defer file.close();
var table = try UnificationTable.init(std.testing.allocator, 4);
defer table.deinit();
// give the file to our struct
table.debug_capture = file.writer();
// write something to the file
table.debugLog("INFO: Modify descriptor change Mark from {} to {}\n", .{ Mark.NONE, Mark.OCCURS });
// internal methods to the UnificationTable also use
// self.debugLog to write to the file...
That's an interesting pattern! If this works for the typecheck tests, we can try applying it elsewhere
I'm sure there is a better way to do it... not sure if zig has interfaces or anything yet. Still a zig noob
Jared Ramirez said:
Yeah, I was running into the exact same thing when starting to look at function solving. For that stage though, it uses the function lift IR. Isaac Van Doren any way I can help out in writing the parsers/printers?
I'm not sure if there's a great way for us to work concurrently on this, but I'd be happy to pair!
I think it would be ideal to store the snapshots as multiline strings directly in the test source. Then you can see exactly what each test is doing in one place without having to deal with any extra files. We can use the @src
builtin to accomplish this. Described in this blog post https://tigerbeetle.com/blog/2024-05-14-snapshot-testing-for-the-masses/
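A minimal sketch of the idea (this is not TigerBeetle's actual API, just an illustration): the test carries its expected text inline, and @src() records the file and line so an "update mode" tool could rewrite the literal in place when output changes.

```zig
const std = @import("std");

// Hypothetical inline-snapshot helper: compares actual output against the
// expected text stored right in the test source. On mismatch, it reports
// the source location recorded by @src(), which is what an updater tool
// would use to find and rewrite the stale literal.
fn expectSnapshot(loc: std.builtin.SourceLocation, expected: []const u8, actual: []const u8) !void {
    std.testing.expectEqualStrings(expected, actual) catch |err| {
        std.debug.print("snapshot mismatch at {s}:{d}\n", .{ loc.file, loc.line });
        return err;
    };
}

test "inline snapshot" {
    try expectSnapshot(@src(), "(str \"test\")", "(str \"test\")");
}
```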
I don't think that will scale for roc
We have way way too many cases and they work best as individual files
Otherwise, parse.zig would be 10000 lines of snapshots of the ast.
So I think the git and file based solution is definitely the way to go
Especially since roc files are naturally files anyway
I do like @Joshua Warner's idea to put section headers in a single file so you can scroll through the various IRs, instead of each IR being its own file
Yeah that's fair
Last updated: Jul 06 2025 at 12:14 UTC