Stream: compiler development

Topic: zig compiler - snapshot testing


view this post on Zulip Joshua Warner (Feb 06 2025 at 06:40):

For testing in the new compiler, I want to take a few lessons from the current snapshot testing system in test_syntax and make a similar system applicable across the rest of the compiler passes. I also want to take more inspiration from llvm lit tests and the zig compiler tests.

My thinking is each test should be in a single file, with the output of passes (e.g. the ast) simply concatenated on the end of each test. You'd have the test source, then #### (or some other separator), then the formatted version of that source, then a separator, then a pretty-printed form of the AST, then the canonicalized ir, etc. This is in contrast to the current snapshot tests, where the formatted code and the ast are in separate files from the input. I also want to include any problems directly in the file. This should hopefully minimize needing to swap back and forth between files.

In contrast to the current test_syntax tests, which heavily test the parser and formatter but generally aren't semantically valid, I'd like to focus more on having most tests make it all the way thru the compiler. In cases where we don't want this, we can include a system of directives at the top of the test to control which parts of the compiler we run and which passes we include the output of.

Also in contrast to the current test_syntax system, adding a new test shouldn't require recompiling. This necessarily means these tests will use a custom test harness - probably just a separate binary compiled from a different main file in the source tree.

Inspired by the zig compiler, we can have a syntax for specifying the contents of multiple test files as part of the same test case.

Here's a quick example:

#### Mode: expression # <-- this directive will cause us to parse and compile the input as if it was a repl input
"test".length()
#### Formatted: (same)
#### AST:
(pnc_call (str "test") (ident "length")())
#### Can IR:
<tbd>
#### etc etc
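
A minimal sketch of how a harness might pull one section out of a file in the format above. This is not the actual implementation: the names `separator`, `sectionBody`, and `isHeaderFor` are hypothetical, and it assumes every header line has the shape `#### <Name>:` with the body on the following lines.

```zig
const std = @import("std");

// Assumed separator; the proposal says "#### (or some other separator)".
const separator = "#### ";

/// Returns true if `line` is a header of the form `#### <name>:...`.
fn isHeaderFor(line: []const u8, name: []const u8) bool {
    if (!std.mem.startsWith(u8, line, separator)) return false;
    const rest = line[separator.len..];
    return std.mem.startsWith(u8, rest, name) and
        rest.len > name.len and rest[name.len] == ':';
}

/// Returns the lines that follow the `#### <name>:` header, up to the next
/// header or the end of the file, or null if there is no such section.
fn sectionBody(snapshot: []const u8, name: []const u8) ?[]const u8 {
    var lines = std.mem.splitScalar(u8, snapshot, '\n');
    while (lines.next()) |line| {
        if (!isHeaderFor(line, name)) continue;
        const rest = lines.rest();
        // The body ends where the next separator line begins.
        const end: usize = if (std.mem.startsWith(u8, rest, separator))
            0
        else
            std.mem.indexOf(u8, rest, "\n" ++ separator) orelse rest.len;
        return std.mem.trim(u8, rest[0..end], "\n");
    }
    return null;
}

test "extract the AST section" {
    const example =
        \\#### Mode: expression
        \\"test".length()
        \\#### AST:
        \\(pnc_call (str "test") (ident "length")())
    ;
    const ast = sectionBody(example, "AST") orelse return error.MissingSection;
    try std.testing.expectEqualStrings("(pnc_call (str \"test\") (ident \"length\")())", ast);
}
```

A header with an inline value and no body, such as `#### Formatted: (same)`, yields an empty slice here; a real harness would presumably read the inline value as well.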

Thoughts?

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 06:44):

I think it is also important to be able to start part way down the stack. I should be able to make a small Can IR and simply run the resolve step with nothing else. Or run the interpreter theoretically.

view this post on Zulip Joshua Warner (Feb 06 2025 at 06:45):

That does imply we have not just a "debug dump" functionality for each IR, but also the ability to _parse_ that IR

view this post on Zulip Joshua Warner (Feb 06 2025 at 06:45):

Agreed that's valuable tho

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 06:46):

I think this is one of the best features of MLIR.

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 06:46):

Really helps with testing during development

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 06:46):

That said, can get messy at scale.

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 06:46):

I don't think we necessarily need this for every IR, but at least the IRs around complex passes.

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 06:47):

Often times, that makes a much more concise and understandable test than starting all the way at the top level IR.

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 06:48):

That said, I am simply used to doing that for pass level unit tests at this point. The same could be done in pure zig if generating IR is simple enough

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 06:49):

I think in the rust compiler generating IR is so painful that it makes unit testing passes less likely to happen and more painful

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 06:49):

If generating a zig ast is trivial, we could consider doing that for pass level tests instead of snapshots.

view this post on Zulip Joshua Warner (Feb 06 2025 at 06:51):

What do you mean by "generating a zig ast"?

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 06:52):

Oh, one other thought:

With lit and FileCheck, you can choose to ignore 90% of an ir and just check something very specific. Sometimes that is really useful for making tests more understandable (focuses on what is expected to change instead of printing the entire IR). Not sure that is the right choice a lot of the time though. While it reduces verbosity, it can often miss bugs, so :shrug:
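
To make the "check only something specific" idea concrete, here is a rough sketch of an in-order substring check. It is only in the spirit of FileCheck's CHECK lines and is not how FileCheck itself is implemented; `matchesInOrder` and the sample IR text are made up for illustration.

```zig
const std = @import("std");

/// Returns true if every pattern in `checks` occurs in `output`, in order.
/// Anything between the matches is ignored.
fn matchesInOrder(output: []const u8, checks: []const []const u8) bool {
    var pos: usize = 0;
    for (checks) |check| {
        // Search only in the text after the previous match.
        const found = std.mem.indexOfPos(u8, output, pos, check) orelse return false;
        pos = found + check.len;
    }
    return true;
}

test "only the interesting parts of the dump are checked" {
    // Hypothetical pretty-printed IR; the real format is not specified here.
    const dump =
        \\(module
        \\  (def x (num 1))
        \\  (def y (call f x)))
    ;
    try std.testing.expect(matchesInOrder(dump, &[_][]const u8{ "(def x", "(call f" }));
    // The same patterns in the wrong order do not match.
    try std.testing.expect(!matchesInOrder(dump, &[_][]const u8{ "(call f", "(def x" }));
}
```

The trade-off mentioned above shows up directly: anything not named in `checks` can change without the test noticing.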

view this post on Zulip Joshua Warner (Feb 06 2025 at 06:52):

Ahhh I should read up on that

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 06:53):

What do you mean by "generating a zig ast"?

If generating a Parse/Can/Resolve/etc IR ast directly in zig code is easy enough, then maybe that is enough for unit testing. If it is verbose or painful at all to generate said IR, I definitely think we want parsing at all levels.

view this post on Zulip Joshua Warner (Feb 06 2025 at 06:54):

Ahhh. My somewhat naive prediction is that'll end up being more verbose than will be ergonomic for doing lots of unit tests.

view this post on Zulip Joshua Warner (Feb 06 2025 at 06:54):

Maybe there are zig tricks I don't know tho?

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 06:54):

Yeah, that is my exact same thought and question

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 06:54):

So I bet we will want a parser for the various IRs

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 06:54):

It will make them way more testable

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 06:55):

As a side note, we totally could just directly use lit and FileCheck assuming we have a flag or special executable that simply prints out the IRs. (not saying it is worth it, just noting)

and this is filecheck, really simple: https://llvm.org/docs/CommandGuide/FileCheck.html

view this post on Zulip Luke Boswell (Feb 06 2025 at 08:26):

I had just assumed it would be a folder for each test, with a different file containing each IR. But I guess a single file may be easier?

view this post on Zulip László Benedek (Feb 06 2025 at 14:10):

I’m not active on these forums (only lurking) but I’d like to break the silence and share some advice. If you start creating IR tests then it will make refactoring much harder because you will have to change an insane amount of (test) code every time you change the IR or try to rearrange passes. Unless you feel that you can see the future and it is not a real issue, I really really recommend not doing this and instead focus solely on end to end tests. An end to end test suite is much more valuable than a set of IR tests. This would help with alternative compiler implementations or various large scale refactors where otherwise you’d have to throw out or migrate a lot of test code.

view this post on Zulip Joshua Warner (Feb 06 2025 at 15:33):

Yep, they will change pretty often - and that’s why an important part of this will be automated tooling to update the checked in snapshots as necessary. (I didn’t mention that above, but it’s already part of the test_syntax snapshots)

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 16:44):

If you start creating IR tests then it will make refactoring much harder because you will have to change an insane amount of (test) code every time you change the IR or try to rearrange passes.

I pretty strongly disagree with this overall premise. While end to end tests are fundamental for a compiler like this, even a handful of mediocre pass level tests often are quite valuable. E2E tests are often painful to debug. If something can be caught by a pass level test, it saves tons of time. On top of that, testing the core pillars of various transformations can help make them a lot more robust (bonus if you can fuzz an arbitrary pass). Yes, it is more work, but it is super important work to make a robust compiler and I have seen this kind of work save tons of time.

Unless you feel that you can see the future and it is not a real issue, I really really recommend not doing this and instead focus solely on end to end tests.

This is a second rewrite of the compiler so we do have very strong understanding of the passes and ordering. I'm sure we'll shift some around, but that's ok.

An end to end test suite is much more valuable than a set of IR tests. This would help with alternative compiler implementations or various large scale refactors where otherwise you’d have to throw out or migrate a lot of test code.

The same can always be said of unit tests. It doesn't matter if it is a compiler, webserver, app, game, etc. Yet, the industry has for the most part realized that unit tests lead to better software despite needing to be thrown away or rewritten when software changes.

I think trying to focus solely on end to end tests would be a big mistake and a way to fundamentally make our compiler more brittle and harder to debug.


As I mentioned above, after working with MLIR for years, one of its pieces of magic is simply that every ir can be printed and parsed. This enables spinning up tons of pass level unit tests. It makes for way more robust software that is more observable and debuggable.

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 16:48):

Aside for @Joshua Warner: another reason to enable pass by pass snapshot tests is because most of the functionality can be tested without seeing the monster updates of full compiler snapshot tests. So adding a feature to the parser won't affect the can ir to resolve ir snapshot tests at all. But adding said feature if you only have these e2e tests will affect all tests. Just makes it harder to validate what changed. (Still need the E2E tests, but maybe not as many)

view this post on Zulip Joshua Warner (Feb 06 2025 at 21:18):

One thing about forcing tests to start as real roc code that at least sounds good on the surface to me is that it forces us to never have ambiguity about, "yeah but that state of the IR is not something the previous pass would ever generate so why are we trying to handle this". If there's a test, then it's unambiguously possible to hit that part of the state space. Combine that with good fuzzing that hopefully reaches fairly deep into the compiler (far from proven!), and you can maybe have the best of both worlds.

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 22:17):

Yeah, I think that is great for larger examples. I think for smaller features and simple algorithm tests, the snippets are much nicer.

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 22:18):

That said, the gold standard is for every ir to have a verifier for what is valid or not

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 22:18):

And if every ir has a parser and verifier, you can start fuzzing at any point in the compiler and stop at any point. Which should enable much better fuzzing exploration

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 22:19):

Not to mention the fuzzer can break if any invariant is broken at any IR level

view this post on Zulip Brendan Hansknecht (Feb 06 2025 at 22:19):

Though obviously correct by construction IRs are the best.

view this post on Zulip Loris Cro (Feb 10 2025 at 15:42):

Not sure how well this can apply to your use case but it works perfectly well for me in Zine: https://kristoff.it/blog/dead-simple-snapshot-testing/ (bonus mention of 2 snapshot testing frameworks for Zig in the post)

view this post on Zulip Joshua Warner (Feb 10 2025 at 15:49):

Oh nice, that looks like a great strategy!

view this post on Zulip Isaac Van Doren (Feb 14 2025 at 19:11):

I started looking at function lifting, and it's going to be much easier to write tests for the build stages like that once we have IR dumping and parsing in place. Did anyone have specific ideas about how this should be implemented? Is the idea to manually write a sexpr printer and parser for each IR?

view this post on Zulip Sam Mohr (Feb 14 2025 at 19:12):

I think at least a printer

view this post on Zulip Sam Mohr (Feb 14 2025 at 19:16):

If we implement the phases roughly in order, we can just have Roc source go in, and each S-expr printer come out, we glue everything together, and check that the file has the expected content

view this post on Zulip Isaac Van Doren (Feb 14 2025 at 19:26):

If we want to be able to work on the later stages of the compiler before everything else is implemented I think we'll need to write parsers also. Manually writing the IR for the previous stage seems prohibitively tedious

view this post on Zulip Jared Ramirez (Feb 15 2025 at 05:37):

Yeah, I was running into the exact same thing when starting to look at function solving. For that stage though, it uses the function lift IR. @Isaac Van Doren any way I can help out in writing the parsers/printers?

view this post on Zulip Luke Boswell (Feb 15 2025 at 05:43):

I'd say it would be helpful to take the IR in src/build/lift_functions/IR.zig and have a go at building a pretty printer / S-expression thing for it

view this post on Zulip Luke Boswell (Feb 15 2025 at 05:44):

I have come up with a method for making snapshots in https://github.com/roc-lang/roc/pull/7608 which is very preliminary... but I figure we are in an exploratory phase so may as well just try things.

view this post on Zulip Luke Boswell (Feb 15 2025 at 05:44):

I've found Claude is able to write slabs of code at once, and I can iterate pretty quickly with ideas.

view this post on Zulip Luke Boswell (Feb 15 2025 at 05:50):

Not sure this is a good way to do it, but in my struct for the UnificationTable I added a field and then a debugLog fn which writes to a file.

debug_capture: ?std.fs.File.Writer = null,

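pub fn debugLog(self: *const UnificationTable, comptime fmt: []const u8, args: anytype) void {
    // Only capture output in test builds (assumes `const builtin = @import("builtin");` at file scope).
    if (!builtin.is_test) return;
    // Unwrap the optional writer instead of checking for null and then using `.?`.
    if (self.debug_capture) |writer| {
        writer.print(fmt, args) catch unreachable;
    }
}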

Then I use it in tests like this.

// create a snapshot file
const file = try std.fs.cwd().createFile("src/types/snapshots/unification_table_descriptor_modification.txt", .{});
defer file.close();

var table = try UnificationTable.init(std.testing.allocator, 4);
defer table.deinit();

// give the file to our struct
table.debug_capture = file.writer();

// write something to the file
table.debugLog("INFO: Modify descriptor change Mark from {} to {}\n", .{ Mark.NONE, Mark.OCCURS });

// internal methods to the UnificationTable also use
// self.debugLog to write to the file...

view this post on Zulip Sam Mohr (Feb 15 2025 at 05:57):

That's an interesting pattern! If this works for the typecheck tests, we can try applying it elsewhere

view this post on Zulip Luke Boswell (Feb 15 2025 at 06:00):

I'm sure there is a better way to do it... not sure if zig has interfaces or anything yet. Still a zig noob

view this post on Zulip Isaac Van Doren (Feb 15 2025 at 21:08):

Jared Ramirez said:

Yeah, I was running into the exact same thing when starting to look at function solving. For that stage though, it uses the function lift IR. Isaac Van Doren any way I can help out in writing the parsers/printers?

I'm not sure if there's a great way for us to work concurrently on this, but I'd be happy to pair!

view this post on Zulip Isaac Van Doren (Feb 15 2025 at 21:09):

I think it would be ideal to store the snapshots as multiline strings directly in the test source. Then you can see exactly what each test is doing in one place without having to deal with any extra files. We can use the @src builtin to accomplish this. Described in this blog post https://tigerbeetle.com/blog/2024-05-14-snapshot-testing-for-the-masses/

view this post on Zulip Brendan Hansknecht (Feb 15 2025 at 21:22):

I don't think that will scale for roc

view this post on Zulip Brendan Hansknecht (Feb 15 2025 at 21:22):

We have way way too many cases and they work best as individual files

view this post on Zulip Brendan Hansknecht (Feb 15 2025 at 21:23):

Otherwise, parse.zig would be 10000 lines of snapshots of the ast.

view this post on Zulip Brendan Hansknecht (Feb 15 2025 at 21:23):

So I think the git and file based solution is definitely the way to go

view this post on Zulip Brendan Hansknecht (Feb 15 2025 at 21:24):

Especially since roc files are naturally files anyway

view this post on Zulip Brendan Hansknecht (Feb 15 2025 at 21:24):

I do like @Joshua Warner's idea to put section headers in a single file so you can scroll through the various IRs, instead of each IR being its own file

view this post on Zulip Isaac Van Doren (Feb 15 2025 at 21:34):

Yeah that's fair

view this post on Zulip Joshua Warner (Jul 21 2025 at 03:02):

Looks like snapshot EXPECTED handling is now broken in a couple different ways:

  1. In several snapshots, we have lines in PROBLEMS that begin with # (comments extracted from roc code in a snippet) - and the extractSection logic doesn't handle that properly. Indeed, I think we'll need to adjust the format a bit if we want that to work properly in all cases.
  2. The repl snapshots now re-use the EXPECTED section to mean something different, but zig build update-expected isn't aware of that and (un)helpfully puts the extracted problems into the EXPECTED section, thereby breaking the tests.
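
To illustrate the first problem, here is a sketch of a stricter header check. It is an assumption-heavy illustration, not the real extractSection logic: it assumes section headers are lines of the exact form `# NAME`, and `known_sections` and `isSectionHeader` are hypothetical names.

```zig
const std = @import("std");

// Only the section names mentioned in this thread; the real list is not shown here.
const known_sections = [_][]const u8{ "EXPECTED", "PROBLEMS" };

/// Treats a line as a section header only if it is exactly `# <known name>`,
/// so that an ordinary `#` comment inside a section body is not mistaken for one.
fn isSectionHeader(line: []const u8) bool {
    if (!std.mem.startsWith(u8, line, "# ")) return false;
    const name = line[2..];
    for (known_sections) |section| {
        if (std.mem.eql(u8, name, section)) return true;
    }
    return false;
}

test "comments are not headers" {
    try std.testing.expect(isSectionHeader("# PROBLEMS"));
    try std.testing.expect(!isSectionHeader("# a comment extracted from the roc snippet"));
}
```

Even this stays ambiguous if a snippet contains a comment that reads exactly `# PROBLEMS`, which is consistent with the point that the format itself may need adjusting.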

view this post on Zulip Joshua Warner (Jul 21 2025 at 03:03):

@Richard Feldman IIRC you were wanting to keep the update-expected functionality separate from snapshot.zig - can you say more about that?

view this post on Zulip Joshua Warner (Jul 21 2025 at 03:03):

I think neither of these bugs would have happened if the update-expected logic (and the verification test) were better integrated with the snapshot code.

view this post on Zulip Richard Feldman (Jul 21 2025 at 03:04):

ah damn

view this post on Zulip Richard Feldman (Jul 21 2025 at 03:04):

I didn't think of update-expected

view this post on Zulip Richard Feldman (Jul 21 2025 at 03:06):

so for repl snapshots, what I really want to confirm hasn't regressed is the repl output as a whole, which includes both diagnostic reports and the output values

view this post on Zulip Richard Feldman (Jul 21 2025 at 03:06):

I don't really care about the distinction between like "here are the problems that were reported" and "here was the repl output" because at the end of the day I'm going to see both in the repl anyway, so it seemed reasonable to combine them

view this post on Zulip Richard Feldman (Jul 21 2025 at 03:07):

(although actually I'm not sure if the problems are included in that repl snapshot output yet)

view this post on Zulip Joshua Warner (Jul 21 2025 at 03:07):

They're not

view this post on Zulip Richard Feldman (Jul 21 2025 at 03:07):

so it seemed like at the time having that replace EXPECTED made sense

view this post on Zulip Richard Feldman (Jul 21 2025 at 03:07):

but maybe they should just have their own separate snapshot format?

view this post on Zulip Joshua Warner (Jul 21 2025 at 03:08):

Why not have an OUTPUT section?

view this post on Zulip Richard Feldman (Jul 21 2025 at 03:08):

only for repl snapshots?

view this post on Zulip Richard Feldman (Jul 21 2025 at 03:08):

that would work - would it be clear that that's the expected output?

view this post on Zulip Joshua Warner (Jul 21 2025 at 03:09):

only for repl snapshots?

For the moment; but in the future I don't know why we couldn't/wouldn't expand that to exprs and full file snapshots

view this post on Zulip Joshua Warner (Jul 21 2025 at 03:10):

would it be clear that that's the expected output

We can just _not_ update it unless a flag is passed

view this post on Zulip Joshua Warner (Jul 21 2025 at 03:10):

--update-output or something

view this post on Zulip Joshua Warner (Jul 21 2025 at 03:10):

And if that flag isn't passed, we can print a warning about some outputs having changed

view this post on Zulip Joshua Warner (Jul 21 2025 at 03:11):

(Or even error out I suppose)

view this post on Zulip Joshua Warner (Jul 21 2025 at 03:12):

Side note: for the expected problems, I kinda want to switch to a "lit"-style test, with problems noted in comments

view this post on Zulip Joshua Warner (Jul 21 2025 at 03:13):

e.g.

foo(bar) # EXPECTED: IDENTIFIER NOT DEFINED

(or whatever the error message would be)
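
A small sketch of how such an inline annotation could be pulled off a source line. The `# EXPECTED:` marker comes from the example above; the function name `expectedProblem` is hypothetical, and the sketch ignores the case where the marker text appears inside a string literal.

```zig
const std = @import("std");

const marker = "# EXPECTED:";

/// Returns the problem named after `# EXPECTED:` on this line,
/// or null if the line carries no annotation.
fn expectedProblem(line: []const u8) ?[]const u8 {
    const start = std.mem.indexOf(u8, line, marker) orelse return null;
    return std.mem.trim(u8, line[start + marker.len ..], " \t");
}

test "annotation is read from the trailing comment" {
    const problem = expectedProblem("foo(bar) # EXPECTED: IDENTIFIER NOT DEFINED") orelse
        return error.MissingAnnotation;
    try std.testing.expectEqualStrings("IDENTIFIER NOT DEFINED", problem);
    try std.testing.expect(expectedProblem("foo(bar) # just a comment") == null);
}
```

A harness could then compare the annotation on each line against the problems actually reported for that line.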

view this post on Zulip Richard Feldman (Jul 21 2025 at 03:30):

gotcha, the output changes sound fine to me! :+1:

view this post on Zulip Joshua Warner (Jul 21 2025 at 18:01):

Made those changes as part of a somewhat more significant reorganization of the logic here: https://github.com/roc-lang/roc/pull/8089

view this post on Zulip Joshua Warner (Jul 21 2025 at 18:02):

The other stuff going on there is centralizing all the snapshot logic to go thru the same snapshot generation functions to hopefully prevent this sort of issue cropping up again in the future

view this post on Zulip Joshua Warner (Jul 21 2025 at 18:02):

Now instead of zig build update-expected, you do zig build snapshot -- --update-expected. There's also --check-expected, --update-output, and --check-output.

view this post on Zulip Joshua Warner (Jul 21 2025 at 18:07):

There's still a test that does the validation of snapshots (without doing updates), and that now lives in snapshot.zig, calling the same snapshot generation/collection code but disabling the output.

view this post on Zulip Joshua Warner (Jul 21 2025 at 18:11):

I'm not fully satisfied with the current workflow/interface; suggestions welcome!


Last updated: Jul 26 2025 at 12:14 UTC