Stream: compiler development

Topic: zig compiler - snapshot testing


Joshua Warner (Feb 06 2025 at 06:40):

For testing in the new compiler, I want to take a few lessons from the current snapshot testing system in test_syntax and make a similar system applicable across the rest of the compiler passes. I also want to take more inspiration from LLVM lit tests and the Zig compiler tests.

My thinking is each test should be in a single file, with the output of passes (e.g. the AST) simply concatenated on the end of each test. You'd have the test source, then #### (or some other separator), then the formatted version of that source, then a separator, then a pretty-printed form of the AST, then the canonicalized IR, etc. This is in contrast to the current snapshot tests, where the formatted code and the AST are in separate files from the input. I also want to include any problems directly in the file. This should hopefully minimize needing to swap back and forth between files.

In contrast to the current test_syntax tests, which heavily test the parser and formatter but generally aren't semantically valid, I'd like to focus more on having most tests make it all the way thru the compiler. In cases where we don't want this, we can include a system of directives at the top of the test to control which parts of the compiler we run and which passes we include the output of.

Also in contrast to the current test_syntax system, adding a new test shouldn't require recompiling. This necessarily means these tests will use a custom test harness - probably just a separate binary compiled from a different main file in the source tree.

Inspired by the zig compiler, we can have a syntax for specifying the contents of multiple test files as part of the same test case.

Here's a quick example:

#### Mode: expression # <-- this directive will cause us to parse and compile the input as if it was a repl input
"test".length()
#### Formatted: (same)
#### AST:
(pnc_call (str "test") (ident "length")())
#### Can IR:
<tbd>
#### etc etc

Thoughts?
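
As a rough illustration of the single-file layout sketched above, here is a small Python reader for it. This is not the project's harness (which would be Zig): the function name parse_snapshot, the rule that a header is "#### Name: inline value", and the choice to treat the test source as the body of the Mode section are all assumptions made for the example.

```python
def parse_snapshot(text):
    """Split a snapshot file into (name, inline_value, body) sections.

    Assumed format: a line starting with "#### " opens a section; text after
    the first ":" on that line is an inline value; following lines are the body.
    """
    sections = []
    for line in text.splitlines():
        if line.startswith("#### "):
            header = line[len("#### "):]
            name, _, inline = header.partition(":")
            sections.append([name.strip(), inline.strip(), []])
        elif sections:
            sections[-1][2].append(line)
        # Lines before the first separator are ignored in this sketch.
    return [(name, inline, "\n".join(body)) for name, inline, body in sections]


example = '''#### Mode: expression
"test".length()
#### Formatted: (same)
#### AST:
(pnc_call (str "test") (ident "length")())
'''

sections = parse_snapshot(example)
```

With a reader like this, a harness could re-run each pass on the source and compare its output with the matching section body.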

Brendan Hansknecht (Feb 06 2025 at 06:44):

I think it is also important to be able to start part way down the stack. I should be able to make a small Can IR and simply run the resolve step with nothing else. Or run the interpreter theoretically.

Joshua Warner (Feb 06 2025 at 06:45):

That does imply we have not just a "debug dump" functionality for each IR, but also the ability to _parse_ that IR

Joshua Warner (Feb 06 2025 at 06:45):

Agreed that's valuable tho

Brendan Hansknecht (Feb 06 2025 at 06:46):

I think this is one of the best features of MLIR.

Brendan Hansknecht (Feb 06 2025 at 06:46):

Really helps with testing during development

Brendan Hansknecht (Feb 06 2025 at 06:46):

That said, can get messy at scale.

Brendan Hansknecht (Feb 06 2025 at 06:46):

I don't think we necessarily need this for every IR, but at least the IRs around complex passes.

Brendan Hansknecht (Feb 06 2025 at 06:47):

Oftentimes, that makes a much more concise and understandable test than starting all the way at the top-level IR.

Brendan Hansknecht (Feb 06 2025 at 06:48):

That said, I am simply used to doing that for pass-level unit tests at this point. The same could be done in pure Zig if generating IR is simple enough.

Brendan Hansknecht (Feb 06 2025 at 06:49):

I think in the rust compiler generating IR is so painful that it makes unit testing passes less likely to happen and more painful

Brendan Hansknecht (Feb 06 2025 at 06:49):

If generating a zig ast is trivial, we could consider doing that for pass level tests instead of snapshots.

Joshua Warner (Feb 06 2025 at 06:51):

What do you mean by "generating a zig ast"?

Brendan Hansknecht (Feb 06 2025 at 06:52):

Oh, one other thought:

With lit and FileCheck, you can choose to ignore 90% of an IR and just check something very specific. Sometimes that is really useful for making tests more understandable (it focuses on what is expected to change instead of printing the entire IR). Not sure that is the right choice a lot of the time though. While it reduces verbosity, it can often miss bugs, so :shrug:
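
A rough illustration of the partial-matching idea described above, written in Python rather than as real lit/FileCheck usage. The helper name check_in_order and the IR text are made up for the example. It only checks that a few chosen substrings appear in order and ignores everything else, which also shows the downside: a change on an unchecked line goes unnoticed.

```python
def check_in_order(output, patterns):
    """Return True if every pattern occurs in `output`, each after the previous match."""
    pos = 0
    for pattern in patterns:
        found = output.find(pattern, pos)
        if found == -1:
            return False
        pos = found + len(pattern)
    return True


# Made-up IR dump; only the two lines we care about are checked.
ir_dump = """\
(def main
  (let x (int 1))
  (let y (int 2))
  (call add x y))
"""

focused_ok = check_in_order(ir_dump, ["(let x", "(call add"])
wrong_order = check_in_order(ir_dump, ["(call add", "(let x"])
# Changing an unchecked line does not fail the focused check.
still_ok = check_in_order(ir_dump.replace("(int 2)", "(int 3)"), ["(let x", "(call add"])
```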

Joshua Warner (Feb 06 2025 at 06:52):

Ahhh I should read up on that

Brendan Hansknecht (Feb 06 2025 at 06:53):

Joshua Warner said:

What do you mean by "generating a zig ast"?

If generating a Parse/Can/Resolve/etc IR ast directly in zig code is easy enough, then maybe that is enough for unit testing. If it is verbose or painful at all to generate said IR, I definitely think we want parsing at all levels.

Joshua Warner (Feb 06 2025 at 06:54):

Ahhh. My somewhat naive prediction is that'll end up being more verbose than will be ergonomic for doing lots of unit tests.

Joshua Warner (Feb 06 2025 at 06:54):

Maybe there are zig tricks I don't know tho?

Brendan Hansknecht (Feb 06 2025 at 06:54):

Yeah, that is my exact same thought and question

Brendan Hansknecht (Feb 06 2025 at 06:54):

So I bet we will want a parser for the various IRs

Brendan Hansknecht (Feb 06 2025 at 06:54):

It will make them way more testable

Brendan Hansknecht (Feb 06 2025 at 06:55):

As a side note, we totally could just directly use lit and FileCheck, assuming we have a flag or special executable that simply prints out the IRs. (Not saying it is worth it, just noting.)

And this is FileCheck, really simple: https://llvm.org/docs/CommandGuide/FileCheck.html

Luke Boswell (Feb 06 2025 at 08:26):

I had just assumed it would be a folder for each test, with a different file containing each IR. But I guess a single file may be easier?

László Benedek (Feb 06 2025 at 14:10):

I’m not active on these forums (only lurking) but I’d like to break the silence and share some advice. If you start creating IR tests then it will make refactoring much harder because you will have to change an insane amount of (test) code every time you change the IR or try to rearrange passes. Unless you feel that you can see the future and it is not a real issue, I really really recommend not doing this and instead focus solely on end to end tests. An end to end test suite is much more valuable than a set of IR tests. This would help with alternative compiler implementations or various large scale refactors where otherwise you’d have to throw out or migrate a lot of test code.

Joshua Warner (Feb 06 2025 at 15:33):

Yep, they will change pretty often - and that’s why an important part of this will be automated tooling to update the checked in snapshots as necessary. (I didn’t mention that above, but it’s already part of the test_syntax snapshots)
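
A minimal sketch of the "compare, or update on request" behavior that such tooling usually boils down to, in Python for illustration. The function name check_snapshot, the update flag, and the choice to silently record a missing snapshot are assumptions for the example, not a description of how the test_syntax tooling works.

```python
import tempfile
from pathlib import Path


def check_snapshot(path, actual, update=False):
    """Compare `actual` with the snapshot at `path`; rewrite it when `update` is set.

    Returns True when the snapshot matches or was just (re)written.
    """
    path = Path(path)
    if update or not path.exists():
        path.write_text(actual)
        return True
    return path.read_text() == actual


# Usage: the first run records, later runs compare, update mode re-records.
with tempfile.TemporaryDirectory() as tmp:
    snap = Path(tmp) / "example.snap"
    first = check_snapshot(snap, '(str "test")\n')
    unchanged = check_snapshot(snap, '(str "test")\n')
    changed = check_snapshot(snap, '(str "other")\n')
    updated = check_snapshot(snap, '(str "other")\n', update=True)
    after_update = check_snapshot(snap, '(str "other")\n')
```

After an update run, the changes show up as an ordinary diff in version control, which is where they get reviewed.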

Brendan Hansknecht (Feb 06 2025 at 16:44):

László Benedek said:

If you start creating IR tests then it will make refactoring much harder because you will have to change an insane amount of (test) code every time you change the IR or try to rearrange passes.

I pretty strongly disagree with this overall premise. While end-to-end tests are fundamental for a compiler like this, even a handful of mediocre pass-level tests are often quite valuable. E2E tests are often painful to debug. If something can be caught by a pass-level test, it saves tons of time. On top of that, testing the core pillars of various transformations can help make them a lot more robust (bonus if you can fuzz an arbitrary pass). Yes, it is more work, but it is super important work to make a robust compiler, and I have seen this kind of work save tons of time.

László Benedek said:

Unless you feel that you can see the future and it is not a real issue, I really really recommend not doing this and instead focus solely on end to end tests.

This is a second rewrite of the compiler, so we do have a very strong understanding of the passes and ordering. I'm sure we'll shift some around, but that's ok.

László Benedek said:

An end to end test suite is much more valuable than a set of IR tests. This would help with alternative compiler implementations or various large scale refactors where otherwise you’d have to throw out or migrate a lot of test code.

The same can always be said of unit tests. It doesn't matter if it is a compiler, webserver, app, game, etc. Yet, the industry has for the most part realized that unit tests lead to better software despite needing to be thrown away or rewritten when software changes.

I think trying to focus solely on end-to-end tests would be a big mistake and a way to fundamentally make our compiler more brittle and harder to debug.

As I mentioned above, after working with MLIR for years, one of its pieces of magic is simply that every IR can be printed and parsed. This enables spinning up tons of pass-level unit tests. It makes for way more robust software that is more observable and debuggable.

Brendan Hansknecht (Feb 06 2025 at 16:48):

Aside for @Joshua Warner: another reason to enable pass-by-pass snapshot tests is that most of the functionality can be tested without seeing the monster updates of full compiler snapshot tests. So adding a feature to the parser won't affect the Can IR to Resolve IR snapshot tests at all. But adding said feature if you only have E2E tests will affect all tests. That just makes it harder to validate what changed. (Still need the E2E tests, but maybe not as many.)

Joshua Warner (Feb 06 2025 at 21:18):

One thing about forcing tests to start as real Roc code that at least sounds good on the surface to me is that it forces us to never have ambiguity about, "yeah but that state of the IR is not something the previous pass would ever generate, so why are we trying to handle this". If there's a test, then it's unambiguously possible to hit that part of the state space. Combine that with good fuzzing that hopefully reaches fairly deep into the compiler (far from proven!), and you can maybe have the best of both worlds.

Brendan Hansknecht (Feb 06 2025 at 22:17):

Yeah, I think that is great for larger examples. I think for smaller features and simple algorithm tests, the snippets are much nicer.

Brendan Hansknecht (Feb 06 2025 at 22:18):

That said, the gold standard is for every IR to have a verifier for what is valid or not.

Brendan Hansknecht (Feb 06 2025 at 22:18):

And if every IR has a parser and verifier, you can start fuzzing at any point in the compiler and stop at any point, which should enable much better fuzzing exploration.

Brendan Hansknecht (Feb 06 2025 at 22:19):

Not to mention the fuzzer can break if any invariant is broken at any IR level.

Brendan Hansknecht (Feb 06 2025 at 22:19):

Though obviously correct-by-construction IRs are the best.

Loris Cro (Feb 10 2025 at 15:42):

Not sure how well this can apply to your use case but it works perfectly well for me in Zine: https://kristoff.it/blog/dead-simple-snapshot-testing/ (bonus mention of 2 snapshot testing frameworks for Zig in the post)

Joshua Warner (Feb 10 2025 at 15:49):

Oh nice, that looks like a great strategy!

Isaac Van Doren (Feb 14 2025 at 19:11):

I started looking at function lifting, and it's going to be much easier to write tests for the build stages like that once we have IR dumping and parsing in place. Did anyone have specific ideas about how this should be implemented? Is the idea to manually write a sexpr printer and parser for each IR?

Sam Mohr (Feb 14 2025 at 19:12):

I think at least a printer

Sam Mohr (Feb 14 2025 at 19:16):

If we implement the phases roughly in order, we can just have Roc source go in, and each S-expr printer come out, we glue everything together, and check that the file has the expected content

Isaac Van Doren (Feb 14 2025 at 19:26):

If we want to be able to work on the later stages of the compiler before everything else is implemented I think we'll need to write parsers also. Manually writing the IR for the previous stage seems prohibitively tedious
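
To make the printer-plus-parser idea concrete, here is a small round-trip sketch in Python. It models an IR node as a nested list of strings, which is an assumption for the example and not the shape of any Roc IR; the names to_sexpr and parse_sexpr are hypothetical, and atoms are assumed to contain no whitespace or parentheses (so string literals with spaces would need a real tokenizer).

```python
def to_sexpr(node):
    """Print a nested list of atoms as an S-expression."""
    if isinstance(node, list):
        return "(" + " ".join(to_sexpr(child) for child in node) + ")"
    return node


def parse_sexpr(text):
    """Parse a single S-expression back into nested lists of atoms."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing ")"
            return items
        return token

    return read()


# Made-up IR fragment used to check that printing and parsing round-trip.
tree = ["lambda", ["args", "x"], ["call", "f", "x"]]
printed = to_sexpr(tree)
reparsed = parse_sexpr(printed)
```

A test for a later stage could then start from hand-written S-expression text, parse it into that stage's input, run the pass, and print the result for the snapshot.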

Jared Ramirez (Feb 15 2025 at 05:37):

Yeah, I was running into the exact same thing when starting to look at function solving. For that stage though, it uses the function lift IR. @Isaac Van Doren any way I can help out in writing the parsers/printers?

Luke Boswell (Feb 15 2025 at 05:43):

I'd say it would be helpful to take the IR in src/build/lift_functions/IR.zig and have a go at building a pretty printer / S-expression thing for it

Luke Boswell (Feb 15 2025 at 05:44):

I have come up with a method for making snapshots in https://github.com/roc-lang/roc/pull/7608 which is very preliminary... but I figure we are in an exploratory phase so may as well just try things.

Luke Boswell (Feb 15 2025 at 05:44):

I've found Claude is able to write slabs of code at once, and I can iterate pretty quickly with ideas.

Luke Boswell (Feb 15 2025 at 05:50):

Not sure this is a good way to do it, but in my struct for the UnificationTable I added a field and then a debugLog fn which writes to a file.

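// Field on UnificationTable; assumes `const builtin = @import("builtin");` at file scope.
debug_capture: ?std.fs.File.Writer = null,

// Writes to the capture file, but only in test builds and only when a capture is set.
pub fn debugLog(self: *const UnificationTable, comptime fmt: []const u8, args: anytype) void {
    if (!builtin.is_test) return;
    if (self.debug_capture) |writer| {
        writer.print(fmt, args) catch unreachable;
    }
}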

Then I use it in tests like this.

// create a snapshot file
const file = try std.fs.cwd().createFile("src/types/snapshots/unification_table_descriptor_modification.txt", .{});
defer file.close();

var table = try UnificationTable.init(std.testing.allocator, 4);
defer table.deinit();

// give the file to our struct
table.debug_capture = file.writer();

// write something to the file
table.debugLog("INFO: Modify descriptor change Mark from {} to {}\n", .{ Mark.NONE, Mark.OCCURS });

// internal methods to the UnificationTable also use
// self.debugLog to write to the file...

Sam Mohr (Feb 15 2025 at 05:57):

That's an interesting pattern! If this works for the typecheck tests, we can try applying it elsewhere

Luke Boswell (Feb 15 2025 at 06:00):

I'm sure there is a better way to do it... not sure if zig has interfaces or anything yet. Still a zig noob

Isaac Van Doren (Feb 15 2025 at 21:08):

Jared Ramirez said:

Yeah, I was running into the exact same thing when starting to look at function solving. For that stage though, it uses the function lift IR. Isaac Van Doren any way I can help out in writing the parsers/printers?

I'm not sure if there's a great way for us to work concurrently on this, but I'd be happy to pair!

Isaac Van Doren (Feb 15 2025 at 21:09):

I think it would be ideal to store the snapshots as multiline strings directly in the test source. Then you can see exactly what each test is doing in one place without having to deal with any extra files. We can use the @src builtin to accomplish this. Described in this blog post https://tigerbeetle.com/blog/2024-05-14-snapshot-testing-for-the-masses/

Brendan Hansknecht (Feb 15 2025 at 21:22):

I don't think that will scale for roc

Brendan Hansknecht (Feb 15 2025 at 21:22):

We have way way too many cases and they work best as individual files

Brendan Hansknecht (Feb 15 2025 at 21:23):

Otherwise, parse.zig would be 10000 lines of snapshots of the ast.

Brendan Hansknecht (Feb 15 2025 at 21:23):

So I think the git and file based solution is definitely the way to go

Brendan Hansknecht (Feb 15 2025 at 21:24):

Especially since roc files are naturally files anyway

Brendan Hansknecht (Feb 15 2025 at 21:24):

I do like @Joshua Warner's idea to put section headers in a single file so you can scroll through the various IRs, instead of each IR being its own file.

Isaac Van Doren (Feb 15 2025 at 21:34):

Yeah that's fair


Last updated: Jul 06 2025 at 12:14 UTC