zig compiler - snapshot testing · compiler development

For testing in the new compiler, I want to take a few lessons from the current snapshot testing system in test_syntax and make a similar system applicable across the rest of the compiler passes. I also want to take more inspiration from LLVM lit tests and the Zig compiler tests.

My thinking is that each test should be a single file, with the output of each pass (e.g. the AST) simply concatenated onto the end. You'd have the test source, then #### (or some other separator), then the formatted version of that source, then a separator, then a pretty-printed form of the AST, then the canonicalized IR, etc. This is in contrast to the current snapshot tests, where the formatted code and the AST live in separate files from the input. I also want to include any problems directly in the file. This should hopefully minimize the need to swap back and forth between files.

In contrast to the current test_syntax tests, which heavily test the parser and formatter but generally aren't semantically valid, I'd like most tests to make it all the way thru the compiler. Where we don't want this, we can put a system of directives at the top of the test to control which parts of the compiler we run and which passes' output we include.

Also in contrast to the current test_syntax system, adding a new test shouldn't require recompiling. This necessarily means these tests will use a custom test harness - probably just a separate binary compiled from a different main file in the source tree.

Inspired by the zig compiler, we can have a syntax for specifying the contents of multiple test files as part of the same test case.

#### Mode: expression # <-- this directive will cause us to parse and compile the input as if it was a repl input
"test".length()
#### Formatted: (same)
#### AST:
(pnc_call (str "test") (ident "length")())
#### Can IR:
<tbd>
#### etc etc
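Here is a minimal sketch of how a harness might split such a file into sections. It assumes three things. First, every section starts on a line beginning with `#### `. Second, the header text before the first `:` is the section name. Third, the rest of the header, minus a trailing `# comment`, is an inline value such as `expression` or `(same)`. Under this layout, the test source is simply the body of the leading `Mode` section. `Section`, `SectionIterator`, and `findHeader` are illustrative names, not code from the Roc repository.

```zig
const std = @import("std");

/// One `#### Name: value` section of a snapshot file (hypothetical layout).
pub const Section = struct {
    name: []const u8, // e.g. "Mode", "Formatted", "AST"
    inline_value: []const u8, // text after the colon, minus any trailing `# comment`
    body: []const u8, // all lines after the header, up to the next header
};

const marker = "#### ";

/// Index of the first line at or after `from` (assumed to be a line start)
/// that begins with the section marker.
fn findHeader(src: []const u8, from: usize) ?usize {
    var i = from;
    while (i < src.len) {
        if (std.mem.startsWith(u8, src[i..], marker)) return i;
        const nl = std.mem.indexOfScalarPos(u8, src, i, '\n') orelse return null;
        i = nl + 1;
    }
    return null;
}

/// Yields sections in file order; anything before the first header is skipped.
pub const SectionIterator = struct {
    src: []const u8,
    pos: usize = 0,

    pub fn next(self: *SectionIterator) ?Section {
        const start = findHeader(self.src, self.pos) orelse return null;
        const header_end = std.mem.indexOfScalarPos(u8, self.src, start, '\n') orelse self.src.len;
        const header = self.src[start + marker.len .. header_end];
        const body_start = @min(header_end + 1, self.src.len);
        const body_end = findHeader(self.src, body_start) orelse self.src.len;
        self.pos = body_end;

        // "Mode: expression # comment" -> name "Mode", value "expression".
        var name = header;
        var value: []const u8 = "";
        if (std.mem.indexOfScalar(u8, header, ':')) |colon| {
            name = header[0..colon];
            value = header[colon + 1 ..];
            if (std.mem.indexOf(u8, value, " #")) |hash| value = value[0..hash];
        }
        return Section{
            .name = std.mem.trim(u8, name, " \t\r"),
            .inline_value = std.mem.trim(u8, value, " \t\r"),
            .body = self.src[body_start..body_end],
        };
    }
};
```

A real harness would then dispatch on `name` (for example, honoring `Mode` before running any passes) and compare each generated section against its `body`.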

Brendan Hansknecht (Feb 06 2025 at 06:44):

I think it is also important to be able to start part way down the stack. I should be able to write a small Can IR and simply run the resolve step with nothing else, or, theoretically, run the interpreter.
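One way a directive could express "start part way down the stack" is as a first/last pair over an ordered list of passes. The sketch below assumes the input IR for `first` is parsed from the test file. The pass names and `passRange` are hypothetical and only follow the IRs mentioned in this thread.

```zig
const std = @import("std");

/// Hypothetical pass list in pipeline order.
pub const Pass = enum { parse, canonicalize, resolve, interpret };

/// The passes a test runs when it starts at `first` (whose input IR is read
/// from the test file) and stops after `last`.
pub fn passRange(first: Pass, last: Pass) error{InvalidRange}![]const Pass {
    const all = std.enums.values(Pass);
    const lo: usize = @intFromEnum(first);
    const hi: usize = @intFromEnum(last);
    if (lo > hi) return error.InvalidRange;
    return all[lo .. hi + 1];
}
```

For the example above, `passRange(.resolve, .resolve)` would mean "read a hand-written Can IR and run only resolve".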

Joshua Warner (Feb 06 2025 at 06:45):

That does imply we have not just a "debug dump" functionality for each IR, but also the ability to _parse_ that IR

Brendan Hansknecht (Feb 06 2025 at 06:46):

I don't think we necessarily need this for every IR, but at least the IRs around complex passes.

Brendan Hansknecht (Feb 06 2025 at 06:47):

Oftentimes, that makes for a much more concise and understandable test than starting all the way at the top-level IR.

Brendan Hansknecht (Feb 06 2025 at 06:48):

That said, I am simply used to doing that for pass-level unit tests at this point. The same could be done in pure Zig if generating IR is simple enough.

Brendan Hansknecht (Feb 06 2025 at 06:49):

I think in the rust compiler generating IR is so painful that it makes unit testing passes less likely to happen and more painful

Brendan Hansknecht (Feb 06 2025 at 06:49):

If generating a zig ast is trivial, we could consider doing that for pass level tests instead of snapshots.

Brendan Hansknecht (Feb 06 2025 at 06:52):

With lit and FileCheck, you can choose to ignore 90% of an IR and just check something very specific. Sometimes that is really useful for making tests more understandable (it focuses on what is expected to change instead of printing the entire IR). Not sure that is the right choice a lot of the time, though. While it reduces verbosity, it can often miss bugs, so :shrug:
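To illustrate the "check only what matters" style, here is a simplified, line-based take on FileCheck's `CHECK:` directive. Each pattern must appear as a substring on a line after the previous pattern's match, and every other line is ignored. Real FileCheck supports many more directives and regex patterns. `checkInOrder` is a hypothetical helper, not part of any existing harness.

```zig
const std = @import("std");

/// Every pattern must occur, in order, each on a later line than the previous
/// match; everything else in `output` is ignored. Returns the index of the
/// first pattern that could not be matched, or null if all matched.
pub fn checkInOrder(output: []const u8, patterns: []const []const u8) ?usize {
    var lines = std.mem.splitScalar(u8, output, '\n');
    for (patterns, 0..) |pattern, i| {
        const found = while (lines.next()) |line| {
            if (std.mem.indexOf(u8, line, pattern) != null) break true;
        } else false;
        if (!found) return i;
    }
    return null;
}
```

This also shows the trade-off mentioned above: any change to the unchecked lines goes unnoticed.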

Brendan Hansknecht (Feb 06 2025 at 06:53):

If generating a Parse/Can/Resolve/etc. IR directly in Zig code is easy enough, then maybe that is enough for unit testing. If it is verbose or at all painful to generate said IR, I definitely think we want parsing at all levels.

Joshua Warner (Feb 06 2025 at 06:54):

Ahhh. My somewhat naive prediction is that that'll end up being too verbose to be ergonomic for doing lots of unit tests.

Brendan Hansknecht (Feb 06 2025 at 06:55):

As a side note, we totally could just use lit and FileCheck directly, assuming we have a flag or special executable that simply prints out the IRs (not saying it is worth it, just noting).

Luke Boswell (Feb 06 2025 at 08:26):

I had just assumed it would be a folder for each test, with a different file containing each IR. But I guess a single file may be easier?

László Benedek (Feb 06 2025 at 14:10):

I’m not active on these forums (only lurking), but I’d like to break the silence and share some advice. If you start creating IR tests, refactoring will become much harder, because you will have to change an insane amount of (test) code every time you change the IR or try to rearrange passes. Unless you feel that you can see the future and it is not a real issue, I really, really recommend not doing this and instead focusing solely on end-to-end tests. An end-to-end test suite is much more valuable than a set of IR tests. It would help with alternative compiler implementations or various large-scale refactors, where otherwise you’d have to throw out or migrate a lot of test code.

Joshua Warner (Feb 06 2025 at 15:33):

Yep, they will change pretty often - and that’s why an important part of this will be automated tooling to update the checked in snapshots as necessary. (I didn’t mention that above, but it’s already part of the test_syntax snapshots)

Brendan Hansknecht (Feb 06 2025 at 16:44):

I pretty strongly disagree with this overall premise. While end-to-end tests are fundamental for a compiler like this, even a handful of mediocre pass-level tests are often quite valuable. E2E tests are often painful to debug. If something can be caught by a pass-level test, it saves tons of time. On top of that, testing the core pillars of various transformations can help make them a lot more robust (bonus if you can fuzz an arbitrary pass). Yes, it is more work, but it is super important work for making a robust compiler, and I have seen this kind of work save tons of time.

This is a second rewrite of the compiler so we do have very strong understanding of the passes and ordering. I'm sure we'll shift some around, but that's ok.

The same can always be said of unit tests. It doesn't matter if it is a compiler, webserver, app, game, etc. Yet, the industry has for the most part realized that unit tests lead to better software despite needing to be thrown away or rewritten when software changes.

I think trying to focus solely on end to end tests would be a big mistake and a way to fundamentally make our compiler more brittle and harder to debug.

As I mentioned above, having worked with MLIR for years, I find one of its pieces of magic is simply that every IR can be printed and parsed. This enables spinning up tons of pass-level unit tests. It makes for much more robust software that is more observable and debuggable.

Brendan Hansknecht (Feb 06 2025 at 16:48):

Aside for @Joshua Warner: another reason to enable pass-by-pass snapshot tests is that most of the functionality can be tested without seeing the monster updates of full-compiler snapshot tests. Adding a feature to the parser won't affect the Can IR to Resolve IR snapshot tests at all, but if you only have E2E tests, adding that feature will affect all of them. That just makes it harder to validate what changed. (We still need the E2E tests, but maybe not as many.)

Joshua Warner (Feb 06 2025 at 21:18):

One thing about forcing tests to start as real Roc code that at least sounds good on the surface to me is that it forces us to never have ambiguity about "yeah, but that state of the IR is not something the previous pass would ever generate, so why are we trying to handle this?" If there's a test, then it's unambiguously possible to hit that part of the state space. Combine that with good fuzzing that hopefully reaches fairly deep into the compiler (far from proven!), and you can maybe have the best of both worlds.

Brendan Hansknecht (Feb 06 2025 at 22:17):

Yeah, I think that is great for larger examples. I think for smaller features and simple algorithm tests, the snippets are much nicer.

Brendan Hansknecht (Feb 06 2025 at 22:18):

That said, the gold standard is for every IR to have a verifier for what is valid or not.

Brendan Hansknecht (Feb 06 2025 at 22:18):

And if every IR has a parser and verifier, you can start fuzzing at any point in the compiler and stop at any point, which should enable much better fuzzing exploration.
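To make "a verifier for what is valid or not" concrete, here is a sketch over a hypothetical toy IR. The IR is a flat instruction list where operands refer to earlier instructions by index, and it ends in exactly one `ret`. `Inst`, `VerifyError`, and `verify` are illustrative only. A fuzzer could run each parsed or mutated IR through a check like this before feeding it to a pass, and again on the pass's output.

```zig
const std = @import("std");

/// Hypothetical toy IR: operands name earlier instructions by index.
pub const Inst = union(enum) {
    int: i64,
    add: [2]u32,
    ret: u32,
};

pub const VerifyError = error{ UseBeforeDef, CodeAfterReturn, MissingReturn };

/// Rejects IR that no well-behaved pass should produce, so later passes
/// can rely on these invariants.
pub fn verify(insts: []const Inst) VerifyError!void {
    for (insts, 0..) |inst, i| {
        switch (inst) {
            .int => {},
            .add => |ops| {
                for (ops) |op| {
                    // Operands must refer to strictly earlier instructions.
                    if (op >= i) return error.UseBeforeDef;
                }
            },
            .ret => |op| {
                if (op >= i) return error.UseBeforeDef;
                // `ret` must be the last instruction.
                if (i != insts.len - 1) return error.CodeAfterReturn;
            },
        }
    }
    if (insts.len == 0 or insts[insts.len - 1] != .ret) return error.MissingReturn;
}
```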

Loris Cro (Feb 10 2025 at 15:42):

Not sure how well this can apply to your use case but it works perfectly well for me in Zine: https://kristoff.it/blog/dead-simple-snapshot-testing/ (bonus mention of 2 snapshot testing frameworks for Zig in the post)

Isaac Van Doren (Feb 14 2025 at 19:11):

I started looking at function lifting, and it's going to be much easier to write tests for the build stages like that once we have IR dumping and parsing in place. Did anyone have specific ideas about how this should be implemented? Is the idea to manually write a sexpr printer and parser for each IR?
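Here is a sketch of the printer half of a hand-written S-expression dumper, over a hypothetical toy expression type rather than any real Roc IR. It builds the output with `std.fmt.bufPrint` into a caller-supplied buffer. A matching parser would read the same shape back. String escaping is deliberately omitted. `Expr` and `printSExpr` are illustrative names.

```zig
const std = @import("std");

/// Hypothetical toy expression tree standing in for a real IR.
pub const Expr = union(enum) {
    str: []const u8,
    ident: []const u8,
    call: struct { func: *const Expr, args: []const Expr },
};

/// Writes the S-expression form of `e` into `buf[pos..]` and returns the new
/// end position. Strings are printed verbatim (no escaping).
pub fn printSExpr(e: Expr, buf: []u8, pos: usize) error{NoSpaceLeft}!usize {
    return switch (e) {
        .str => |s| pos + (try std.fmt.bufPrint(buf[pos..], "(str \"{s}\")", .{s})).len,
        .ident => |s| pos + (try std.fmt.bufPrint(buf[pos..], "(ident \"{s}\")", .{s})).len,
        .call => |c| blk: {
            var p = pos + (try std.fmt.bufPrint(buf[pos..], "(call ", .{})).len;
            p = try printSExpr(c.func.*, buf, p);
            for (c.args) |arg| {
                p += (try std.fmt.bufPrint(buf[p..], " ", .{})).len;
                p = try printSExpr(arg, buf, p);
            }
            break :blk p + (try std.fmt.bufPrint(buf[p..], ")", .{})).len;
        },
    };
}
```

Keeping one node per parenthesized form makes the text easy to diff in a snapshot and straightforward to parse back.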

Sam Mohr (Feb 14 2025 at 19:16):

If we implement the phases roughly in order, we can just have Roc source go in, and each S-expr printer come out, we glue everything together, and check that the file has the expected content

Isaac Van Doren (Feb 14 2025 at 19:26):

If we want to be able to work on the later stages of the compiler before everything else is implemented I think we'll need to write parsers also. Manually writing the IR for the previous stage seems prohibitively tedious

Jared Ramirez (Feb 15 2025 at 05:37):

Yeah, I was running into the exact same thing when starting to look at function solving. For that stage though, it uses the function lift IR. @Isaac Van Doren any way I can help out in writing the parsers/printers?

Luke Boswell (Feb 15 2025 at 05:43):

I'd say it would be helpful to take the IR in src/build/lift_functions/IR.zig and have a go at building a pretty printer / S-expression thing for it

Luke Boswell (Feb 15 2025 at 05:44):

I have come up with a method for making snapshots in https://github.com/roc-lang/roc/pull/7608, which is very preliminary... but I figure we are in an exploratory phase, so we may as well just try things.

Luke Boswell (Feb 15 2025 at 05:44):

I've found Claude is able to write slabs of code at once, and I can iterate pretty quickly with ideas.

Luke Boswell (Feb 15 2025 at 05:50):

Not sure this is a good way to do it, but in my struct for the UnificationTable I added a field and then a debugLog fn which writes to a file.

// at the top of the file
const std = @import("std");
const builtin = @import("builtin");

// field on the UnificationTable struct: an optional writer that receives debug output
debug_capture: ?std.fs.File.Writer = null,

/// Writes a formatted message to the capture file, but only in test builds
/// and only when a writer has been attached.
pub fn debugLog(self: *const UnificationTable, comptime fmt: []const u8, args: anytype) void {
    if (builtin.is_test) {
        if (self.debug_capture) |writer| {
            // debug output is best-effort; an I/O error shouldn't crash the test
            writer.print(fmt, args) catch {};
        }
    }
}

// in a test: create a snapshot file
const file = try std.fs.cwd().createFile("src/types/snapshots/unification_table_descriptor_modification.txt", .{});
defer file.close();

var table = try UnificationTable.init(std.testing.allocator, 4);
defer table.deinit();

// give the file to our struct
table.debug_capture = file.writer();

// write something to the file
table.debugLog("INFO: Modify descriptor change Mark from {} to {}\n", .{ Mark.NONE, Mark.OCCURS });

// internal methods to the UnificationTable also use
// self.debugLog to write to the file...

Sam Mohr (Feb 15 2025 at 05:57):

That's an interesting pattern! If this works for the typecheck tests, we can try applying it elsewhere

Luke Boswell (Feb 15 2025 at 06:00):

I'm sure there is a better way to do it... not sure if zig has interfaces or anything yet. Still a zig noob

Isaac Van Doren (Feb 15 2025 at 21:08):

I'm not sure if there's a great way for us to work concurrently on this, but I'd be happy to pair!

Isaac Van Doren (Feb 15 2025 at 21:09):

I think it would be ideal to store the snapshots as multiline strings directly in the test source. Then you can see exactly what each test is doing in one place without having to deal with any extra files. We can use the @src builtin to accomplish this. Described in this blog post https://tigerbeetle.com/blog/2024-05-14-snapshot-testing-for-the-masses/
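Here is a minimal sketch of the inline-snapshot idea. The expected text sits next to the test as a Zig multiline string literal. On a mismatch, `@src()` is used only to report where that snapshot lives, so it can be updated by hand. The linked TigerBeetle post goes further and rewrites the source automatically; that part is not reproduced here. `expectSnapshot` is a hypothetical helper.

```zig
const std = @import("std");

/// Compares `actual` with an inline snapshot; on mismatch, prints the
/// snapshot's location (from `@src()`) and the actual text, then fails.
pub fn expectSnapshot(loc: std.builtin.SourceLocation, expected: []const u8, actual: []const u8) !void {
    if (std.mem.eql(u8, expected, actual)) return;
    std.debug.print("snapshot mismatch at {s}:{d}\nactual:\n{s}\n", .{ loc.file, loc.line, actual });
    return error.SnapshotMismatch;
}
```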

Brendan Hansknecht (Feb 15 2025 at 21:24):

I do like @Joshua Warner's idea to put section headers in a single file so you can scroll through the various IRs, instead of each IR being its own file.

Joshua Warner (Jul 21 2025 at 03:03):

@Richard Feldman IIRC you were wanting to keep the update-expected functionality separate from snapshot.zig - can you say more about that?

Joshua Warner (Jul 21 2025 at 03:03):

I think neither of these bugs would have happened if the update-expected logic (and the verification test) were better integrated with the snapshot code.

Richard Feldman (Jul 21 2025 at 03:06):

so for repl snapshots, what I really want to confirm hasn't regressed is the repl output as a whole, which includes both diagnostic reports and the output values

Richard Feldman (Jul 21 2025 at 03:06):

I don't really care about the distinction between like "here are the problems that were reported" and "here was the repl output" because at the end of the day I'm going to see both in the repl anyway, so it seemed reasonable to combine them

Richard Feldman (Jul 21 2025 at 03:07):

(although actually I'm not sure if the problems are included in that repl snapshot output right yet)

Joshua Warner (Jul 21 2025 at 03:09):

For the moment; but in the future I don't know why we couldn't/wouldn't expand that to exprs and full file snapshots

Joshua Warner (Jul 21 2025 at 03:10):

And if that flag isn't passed, we can print a warning about some outputs having changed

Joshua Warner (Jul 21 2025 at 03:12):

Side note: for the expected problems, I kinda want to switch to a "lit"-style test, with problems noted in comments

Joshua Warner (Jul 21 2025 at 03:13):

foo(bar) # EXPECTED: IDENTIFIER NOT DEFINED
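Here is a sketch of how a harness could collect such annotations, assuming the `# EXPECTED: PROBLEM TITLE` form shown above. The collected annotations would then be compared against the problems the compiler actually reports. The scan is naive: a `#` inside a string literal would confuse it. `Expectation` and `ExpectationIterator` are hypothetical names.

```zig
const std = @import("std");

/// Hypothetical lit-style annotation: `code # EXPECTED: PROBLEM TITLE`.
pub const Expectation = struct {
    line: usize, // 1-based source line
    problem: []const u8,
};

const tag = "# EXPECTED:";

/// Iterates over EXPECTED annotations in a source file, line by line.
pub const ExpectationIterator = struct {
    lines: std.mem.SplitIterator(u8, .scalar),
    line_no: usize = 0,

    pub fn init(src: []const u8) ExpectationIterator {
        return .{ .lines = std.mem.splitScalar(u8, src, '\n') };
    }

    pub fn next(self: *ExpectationIterator) ?Expectation {
        while (self.lines.next()) |line| {
            self.line_no += 1;
            if (std.mem.indexOf(u8, line, tag)) |at| {
                return Expectation{
                    .line = self.line_no,
                    .problem = std.mem.trim(u8, line[at + tag.len ..], " \t\r"),
                };
            }
        }
        return null;
    }
};
```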

Joshua Warner (Jul 21 2025 at 18:02):

The other stuff going on there is centralizing all the snapshot logic to go thru the same snapshot generation functions to hopefully prevent this sort of issue cropping up again in the future

Joshua Warner (Jul 21 2025 at 18:02):

Now instead of zig build update-expected, you do zig build snapshot -- --update-expected. There's also --check-expected, --update-output, and --check-output.
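Here is a rough sketch of how those four flags could map onto a per-section decision. The flag names come from the message above. The `Mode`/`Options` types, `parseFlags`, and `reconcile` are hypothetical and are not taken from snapshot.zig. The "warn when no flag is given" branch follows the earlier suggestion in this thread; it is not necessarily what the current tool does.

```zig
const std = @import("std");

/// How one class of snapshot content is treated on this run (hypothetical).
pub const Mode = enum { check, update };

pub const Options = struct {
    expected: ?Mode = null, // --check-expected / --update-expected
    output: ?Mode = null, // --check-output / --update-output
};

pub fn parseFlags(args: []const []const u8) error{UnknownFlag}!Options {
    var opts = Options{};
    for (args) |arg| {
        if (std.mem.eql(u8, arg, "--update-expected")) {
            opts.expected = .update;
        } else if (std.mem.eql(u8, arg, "--check-expected")) {
            opts.expected = .check;
        } else if (std.mem.eql(u8, arg, "--update-output")) {
            opts.output = .update;
        } else if (std.mem.eql(u8, arg, "--check-output")) {
            opts.output = .check;
        } else {
            return error.UnknownFlag;
        }
    }
    return opts;
}

pub const Outcome = enum { unchanged, rewrite, fail, warn };

/// Decide what to do with one section whose checked-in text is `expected`.
pub fn reconcile(mode: ?Mode, expected: []const u8, actual: []const u8) Outcome {
    if (std.mem.eql(u8, expected, actual)) return .unchanged;
    const m = mode orelse return .warn; // no flag given: only warn
    return switch (m) {
        .update => .rewrite,
        .check => .fail,
    };
}
```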

Joshua Warner (Jul 21 2025 at 18:07):

There's still a test that does the validation of snapshots (without doing updates), and that now lives in snapshot.zig, calling the same snapshot generation/collection code but disabling the output.

Joshua Warner (Jul 21 2025 at 18:11):

I'm not fully satisfied with the current workflow/interface; suggestions welcome!
