Long story short, Richard proposed moving to Zig while we're rewriting so much of the compiler
And we all seemed to agree
Ok cool
So next weekend, I'll have a PR ready for us to look over the IRs for a Zig compiler
Written in Zig in our existing repo
@Joshua Warner I was hoping we could talk about what we would want the parser to look like in a new Zig world. It's a good opportunity for us to implement a lot of the things we've talked about behind the scenes
We'll all be opening said PR in Zed and collaboratively shaping it how we think things should look, and then maybe starting to implement something pretty soon!
I'm busy vacationing out of town this weekend, but don't let my plan to set up these IRs next weekend stop anyone from doing something sooner, ahead of next weekend's meeting
For the parser and ast, I’d start by at least loosely following the design of the parser and ast for zig itself
Probably wouldn't be a bad choice :-)
Let's talk sometime in the next week if you have some time. Nothing formal, just over Zulip async
@Sam Mohr I think someone should set our intention and add a top-level directory for the Zig project
I'll do that ASAP, not sure when the "possible" part is
to Ayaz's point about debuggability and struct-of-arrays, as far as I can tell Zig's MultiArrayList
has the same ergonomics as a normal Rust Vec
except it's automatically struct-of-arrays behind the scenes
so the (very serious) debuggability/ergonomics downside we've had of doing SoA in Rust may just be a non-issue in Zig
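for reference, a minimal sketch of what using it looks like (Token here is a hypothetical type, not our actual IR):

```zig
const std = @import("std");

// Hypothetical Token type; MultiArrayList stores each field in its own
// contiguous array, but the API still reads like an array of structs.
const Token = struct {
    tag: enum { ident, int, string },
    offset: u32,
};

test "MultiArrayList is SoA behind the scenes" {
    const gpa = std.testing.allocator;

    var tokens: std.MultiArrayList(Token) = .{};
    defer tokens.deinit(gpa);

    try tokens.append(gpa, .{ .tag = .ident, .offset = 0 });
    try tokens.append(gpa, .{ .tag = .int, .offset = 5 });

    // one length, one allocation; per-field slices come from items()
    const offsets = tokens.items(.offset);
    try std.testing.expectEqual(@as(u32, 5), offsets[1]);
}
```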
also @Andrew Kelley mentioned to me, and said it was ok to share:
if your dev loop ever gets longer than like 1 second let me know, I bet we can keep it under that number
here's the llvm bitcode builder: https://github.com/ziglang/zig/blob/4de2b1ea65e6b54cedfe56268a8bf8e9446addb0/src/codegen/llvm/Builder.zig
https://github.com/ziglang/zig/blob/4de2b1ea65e6b54cedfe56268a8bf8e9446addb0/src/codegen/llvm.zig#L9
Also, with this rewrite, I assume we'll add in some of the platform changes. Specifically, I'm thinking it would be good to design from the beginning to assume that platforms pass in a gigantic struct of function pointers. That struct would contain all effects and all roc_
special functions, like for allocation. It would be passed down the entire call stack in roc. I think that will fix some linking problems and be very important for eventually wiring up a roc interpreter.
Not to mention finally fix our llvm c-abi woes. I wonder if we can share some code for c abi with zig instead of doing it manually.
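Hypothetically, the struct could look something like this (names and signatures are purely illustrative, not an agreed-on design):

```zig
// Hypothetical sketch: one struct of function pointers, provided by the
// platform, threaded down the entire roc call stack.
const RocHost = struct {
    // the roc_ special functions
    roc_alloc: *const fn (size: usize, alignment: u32) ?*anyopaque,
    roc_realloc: *const fn (ptr: *anyopaque, new_size: usize, old_size: usize, alignment: u32) ?*anyopaque,
    roc_dealloc: *const fn (ptr: *anyopaque, alignment: u32) void,

    // one entry per effect the platform exposes
    stdout_write: *const fn (bytes: []const u8) void,
};
```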
Yeah Richard's cooking a whole plan for fixing llvm by cooperating with the zig guys.
Also, reminder to future llvm backend writer (maybe myself). Make sure to follow this to help make llvm happy: https://llvm.org/docs/Frontend/PerformanceTips.html
I'm definitely keen to scope out the platform related things we want to fix too. Like we can leave the shared memory stuff behind and just go straight to the new test provided by the host.
what is the issue with llvm?
Honestly the biggest annoyance with llvm is that it doesn't deal with calling conventions. Second biggest annoyance is that it expects the ir in a specific form to optimize well (like all allocas in the entry block)
That is at least thinking from the code gen side
Not a huge deal to do it manually, but would be nice to just share with the zig compiler since they have already had to solve some of these annoyances
We currently have a lot of c-abi calling convention bugs in our llvm backend
This is probably a dumb question - or will be construed as such - but is there a world where it's better and faster to compile Roc to Zig?
Andrew shared this - https://github.com/ziglang/zig/blob/master/test/c_abi/cfuncs.c - it's not the actual implementation, but it's test cases written in Zig
Anthony Bullard said:
This is probably a dumb question - or will be construed as such - but is there a world where it's better and faster to compile Roc to Zig?
it's definitely strictly slower, plus then we'd have to bundle an entire Zig compiler inside roc
:sweat_smile:
Well, I meant would there be a simplification to our compiler that would / could justify that dependency
compiling to C is more common, but yeah it's definitely slower
I don't think compiling to c is that much simpler than compiling to llvm
it would be from the ABI perspective specifically
but in other ways probably harder
Yep, that is the main gain. You don't have to implement c abi.
not sure if it would be a net win overall when you put the two together, not to mention what it would do to our build complexity :sweat_smile:
I thought with Zig there are higher-level constructs that would make a lot of things simpler
But just act like I never asked the question :rofl:
Roc primitives are pretty simple and we can already write builtins in zig instead of raw llvm, so I don't think otherwise using zig is a big difference
Andrew said:
https://github.com/ziglang/zig/blob/4de2b1ea65e6b54cedfe56268a8bf8e9446addb0/src/codegen/llvm.zig#L5423-L5722
it lowers from Zig IR ("AIR") to LLVM IR in this logic. So the best thing this codebase has to offer is perhaps the opportunity to port that ParamTypeIterator abstraction, which handles many calling conventions in addition to the C one
see also this: https://github.com/ziglang/zig/blob/4de2b1ea65e6b54cedfe56268a8bf8e9446addb0/lib/std/Target.zig#L3326-L3391
Like we already have the ability to dish out to zig for anything complex that we don't want to write in raw llvm ir.
Yeah, I bet if we can use some of zigs backend for llvm as a library, it will solve the hardest parts of working with llvm. That will be awesome
yeah I feel like if we:
...we should end up with something that works as well as Zig's does, which is to say - very well! :smiley:
Yeah, like Joshua is saying, one advantage of Zig is that the Zig compiler is maybe approachable enough that if nothing else there could be some code sharing - or at least some patterns that we can pick up.
Andrew did add a caveat:
note that this isn't enough, you still have to follow certain undocumented rules about what LLVM types and attributes to use in parameters and return value
yeah plus Andrew is keen to help us out, so we're not just poking around Zig's code base in the dark with a flashlight
By "sharing" I mean mostly copy/paste/iterate
Richard Feldman said:
Andrew did add a caveat:
note that this isn't enough, you still have to follow certain undocumented rules about what LLVM types and attributes to use in parameters and return value
Yep, makes sense. But should ease some pain still.
Whoa, I was not expecting this, nice!!
I cannot believe how much has been changing in Roc land this last year :big_smile::star:
Tried translating an old tokenizer experiment from rust to zig: https://github.com/joshuawarner32/roc/blob/c80a3ba0e453abf495619a1abf8ed28c9e6ff909/zig/tokenize/src/main.zig
This reminds me. For zig's tokenizer, didn't they remove the tag? Instead they just load the offset and reparse the token (cause the token is essentially the tag). Not saying we need to do that, but I'm reminded of the DOD talk on the new zig irs and such.
Other random cool technique. For chomping whitespace and trivial things, some sort of SWAR mask may enable easily going 8 bytes at a time for essentially the same cost.
For chomping whitespace and trivial things, some sort of SWAR mask may enable easily going 8 bytes at a time for essentially the same cost.
Roc actually already does this! (I wrote that part :P)
https://github.com/roc-lang/roc/blob/main/crates/compiler/parse/src/blankspace.rs#L263
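for the curious, here's a minimal sketch of the SWAR idea in Zig - it only handles plain 0x20 spaces, whereas the linked Rust code handles the general case:

```zig
const std = @import("std");

fn firstNonSpace(chunk: u64) usize {
    const spaces: u64 = 0x2020202020202020;
    const ones: u64 = 0x0101010101010101;
    const highs: u64 = 0x8080808080808080;

    const x = chunk ^ spaces; // bytes equal to ' ' become zero
    // per-byte "is nonzero" mask: the high bit of each byte of `mask` is
    // set exactly when the corresponding byte of `x` is nonzero
    const mask = (x | ((x | highs) -% ones)) & highs;
    if (mask == 0) return 8; // the whole chunk is spaces
    return @ctz(mask) / 8; // index of the first non-space byte (little-endian)
}

test "skip leading spaces 8 bytes at a time" {
    const bytes: [8]u8 = "   abc  ".*;
    const chunk = std.mem.readInt(u64, &bytes, .little);
    try std.testing.expectEqual(@as(usize, 3), firstNonSpace(chunk));
}
```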
Time and time again, it seems the Roc parser is just Joshua turtles all the way down
How do people feel about scoping in a bit more than just functions, values, and strings to start with? If we want to allow for dev parallelism, we want to give people tasks that don't require as much coordination. I think once the compiler is working all the way through with just strings, adding subsequent features will happen where a single change affects the whole pipeline. Seems like we can push that off to start with by making the stages bigger silos for now
Currently, we need:
Maybe adding in if-else or records could be something else that's not overly complex?
Also, a naive but parallelism friendly single-Env replacement would be to get rid of Env as a concept and to save everything in a stage's IR for a given module. When compiling a module for stage N, we compile its dependencies first and pass their N+1 IRs as args for the module that depends on them. For example, if module Foo imports module Bar, then during type specialization, we'd type specialize Bar first, and then pass the TypeSpecIR for Bar as an arg to specialize_types(typecheck_ir_for_module: TypeCheckIR, type_spec_ir_per_dep_modules: Map<ModuleID, TypeSpecIR>)
It would be better to load in the relevant data into the main module's IR to avoid having to look in two places for deps, but we should be able to make that improvement later
I'm gonna run with that approach for the Zig skeleton unless someone has a better but still simple idea
it's probably about the same if we sketch things out but don't have all of them implemented end to end right away
I think the main thing is that we start with a simple skeleton and don't try to bring everything in at the outset :big_smile:
Agreed
just a small handful seems fine
Tangentially, does anyone have opinions on whether typechecking would look how it does today or would we do it structurally differently if we rewrote it (as is the plan). Somewhat an @Ayaz Hafiz question I presume. I think I know what the IR will look like for most stages, but typechecking is like the one hole in my mental model
i would say rewrite it if it makes it simpler for you
the important thing is not whether it's fine rn imo
it's can you build on top of it
i would maybe write a small unification-style typechecker in a toy language to build an intuition for how it works if you all don't have a solid one already, you can pluck a parser for some random language off the internet and write a checker for it
That's a good idea
I think the unification strategy makes sense to me (roughly), though writing an impl myself should help a good deal
The thing that feels messy to me is adding type constraints right now, and I'm wondering if there's a way to reduce the vibe of "hopefully we add all the right constraints"
It feels easy in the current system to miss or add too many constraints
And I don't know a good strategy to help with that besides maybe grouping behaviors like "constrain a function type" and "constraint a record" and pulling them into separate functions that take a struct for their args, meaning it's harder to misalign on what constraints to add?
Idk
maybe get rid of constraint gen
just unify when you need to
it's fine
Oh??
Do you know of a language that does that?
I'll just read their compiler
I’ll follow along with you Sam, unification is where I went off the rails with my language
brother
yes, plenty of languages keep constraints inline while typechecking
eg
fn infer(..) -> type_variable {
    ...
    case Apply(f, x) {
        t_f_in_infer = make_var()
        t_f_out_infer = make_var()
        t_f_infer = make_var(TFn(t_f_in_infer, t_f_out_infer))
        t_f = infer(f)
        t_x = infer(x)
        unify(t_f, t_f_infer)
        unify(t_x, t_f_in_infer)
        t_f_out_infer
    }
}
Okay, cool
I'll mess with it
If unify(t_f, t_f_infer) fails, emit an error about f not being a function. If unify(t_x, t_f_in_infer) fails, an error about the argument type being wrong
Makes sense!
one thing that took me a while to understand is the "occurs check"
as I understand it (Ayaz, feel free to correct or elaborate on any of this!) the basic problem it's trying to solve is that if you have an infinitely recursive type, you don't want the type-checker to get stuck in an infinite loop hopping from one type to another trying to infer their types, but since they depend on each other, it just never terminates
so the "occurs check" is to say "hey, does this particular type variable occur anywhere in this other type?" (because if so, we have an infinite loop!)
so it just descends into that other type and tries to find any instances of the requested variable - as soon as it finds one, it returns
so the occurs check can't possibly get stuck in a loop
so there are certain places in unification where we're like "ok before proceeding, stop and do an occurs check to make sure that proceeding won't get stuck"
(and then bail out and report an error about it being an infinite type if we detect one)
I'm not sure if there are other use cases where it comes up, but my understanding is that this is its main job
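a minimal sketch of the check, using a toy type representation (the real thing walks a unification table rather than pointers):

```zig
const TypeVar = u32;

const Type = union(enum) {
    variable: TypeVar,
    function: struct { arg: *const Type, ret: *const Type },
};

/// Does `v` occur anywhere inside `t`? If it does, unifying `v` with `t`
/// would tie the knot and create an infinite type.
fn occurs(v: TypeVar, t: *const Type) bool {
    return switch (t.*) {
        .variable => |w| w == v,
        .function => |f| occurs(v, f.arg) or occurs(v, f.ret),
    };
}
```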
also, there's the concept of rank
Folkert and I spent a lot of time figuring out how that one works in Elm's compiler, and what we concluded was:
it would be rad to actually understand for real how it's supposed to work :laughing:
there's also a thing called pools, and I have zero clue what they do
The explanation for ranks is linked here in the compiler: https://github.com/roc-lang/roc/blob/670d2550603b9bb29ab238b6ed6180778a5d107d/crates/compiler/solve/src/solve.rs#L78
http://okmij.org/ftp/ML/generalization.html#levels
I'll try to re-read it before next week
The rank tracks the number of let-bindings a variable is "under". Top-level definitions have rank 1. A let in a top-level definition gets rank 2, and so on.
This is a better explanation than mine from above: "The way it works is that it's for like...defs, and you sort of increase it when...decreasing def depth, or something? Well, but then with function arguments it's...well, and then pattern matching it's....anyway, it's definitely something."
yeah, so the tricky part is taking that general idea and figuring out exactly which situations need it, and then making sure to pair the increments and decrements correctly (e.g. decrement the current rank the appropriate number of times when exiting a particular scope)
Yeah, wondering if there's a good way to encode it into our approach so we can't mess it up
Gonna take some brain power for that, probably
This is a let in the ML understanding, so every successive def is a new let (since in ML there are only expressions)
That’s why SML and OCaml use let x = … in
because the next let is a child expression
yeah we already convert to that representation
as in, we start with a flat list of defs but then convert them to nested during canonicalization
in other words:
a = 1
b = a + 1
c = b + 1
c + 1
...canonicalizes to the equivalent of:
a = 1
(b = a + 1
    (c = b + 1
        (c + 1)))
I forget what the reason for this is - maybe because they each have their own scopes?
Not sure what the original reason was, but it does make the IR simpler. "either it's an expression, or a single def followed by an expression."
there's also a correctness aspect to it
I just know that no ML-derived type system really works without it
we originally didn't do this, and when we switched to it, it fixed bugs
yeah
I don't remember the exact reason though, this would have been like 2019 :sweat_smile:
that said, I don't think we actually need to rewrite the IR into that form if we don't want to
but we do need to pass the info to the type-checker as if it were in that form, if that makes sense
I think it has to do with how a new var has to be scoped: it has a lifetime that has to then be resolved and rolled up into its parent
yeah like the "add this to the scope, increment the rank" and "remove it from the scope and decrement the rank" stuff needs to happen as if it were in that nested form, regardless of whether it is in the IR
Yes, I want to move parsing to a simpler IR that's flatter, but can (canonicalization) will have to continue to do what it does today
amusingly, I think in the someday-future when we start doing adjacency nodes (not in this version of the compiler!) it might end up being the same haha
like there might not end up being a distinction at the IR level between nested and not-nested adjacent defs
Richard Feldman said:
as in, we start with a flat list of defs but then convert them to nested during canonicalization
I'm really surprised this makes any difference. Feels like you could just walk the dense list and derive all of the same information.
yeah that's what I mean about the canonical IR being decoupled from this, but the type checker needing to be told it's in the nested form even if it's not
speaking of general principles, we talked about this a bit on the call, but I definitely think we should put a stronger value on human-readable code in tests
Luke is in alignment with you there, and I think also Joshua. I'm confident we'll make that happen!
e.g. parsing tests consistently start with Roc code as inputs, and I think that should also be true of the other stages of the compiler
and as far as outputs go, transforming them into something more human-readable than raw IR nodes also seems like a good idea
and also fuzzing right out the gates :big_smile:
I think fuzzing would have caught a lot of those Rank mistakes earlier
and missing occurs checks
Yes, each IR should have a human readable representation that is divorced from the specific language's default formatting
Also, not sure how helpful this would be, but for mlir, it is useful to be able to write tests in terms of human readable ir and single pass transformations. That said, it doesn't always work to break down at that level. But can make IRs and passes a lot more debuggable if you can do tests like that easily.
a thought: I wonder if it would be helpful to start making an early API and test suite for editor commands
like even if we don't have the implementations right away, just having the pieces in there as part of the skeleton so we can think about which operations have which inputs and outputs
so that can inform how everything fits together
instead of trying to implement it after the fact on top of a process that was designed with only cli builds in mind
I’m assuming you are talking about the Roc Editor, not LSP Code Actions?
I think it's the same consideration regardless
Structural editing would be the difference, I think
(I wasn’t around when that was a live topic of discussion so I don’t know the first thing about the former)
Otherwise should be basically the same
like "the user has asked for this information and provided this input"
I don't think structural editing is as good an idea as I used to :sweat_smile:
but also I don't think it makes that much of a difference
incidentally, I know that Rust and C# were both designed to be editor-first compilers, but they're both unacceptably slow to me, so I'm not saying we should do what they did
just that we should keep these use cases in mind early
a good example of what I'm talking about:
so it would be good to make sure that once compilation is done, we can do all of that without it being a big pain :big_smile:
Yeah, that can be pretty agnostic. Good idea
batch compiler and language server compilers (query compilers) are totally different use cases. i wouldn't try to fit both in one
regarding list of statements vs expressions - they are isomorphic. just nested expressions are a lot easier in practice to work with. you can get the type of a whole expression at one node vs looking at the last item in the statement list
i wouldn't try to fit both in one
The roslyn C# compiler is built exactly like that, and it seems to have worked out reasonably well at least? Typescript too, I think?
Would be interesting to talk to someone on one of those teams to get their take
yes but have two different modes
Ayaz Hafiz said:
regarding list of statements vs expressions - they are isomorphic. just nested expressions are a lot easier in practice to work with. you can get the type of a whole expression at one node vs looking at the last item in the statement list
I wouldn’t try to use statements past parsing
regarding ranks - this is purely a performance optimization. it's not required. to build an intutition i think Mimram's Program=Proof is the best resource; see section 4.4.4 specifically starting at page 209. He explains the inference mechanism in the most succinct and understandable way i've seen. https://www.lix.polytechnique.fr/Labo/Samuel.Mimram/teaching/INF551/course.pdf
Just trying to minimize the size of the parse AST
And make it more fault tolerant for error reporting
Ayaz Hafiz said:
regarding ranks - this is purely a performance optimization. it's not required.
right, but if you have them and get them wrong, it seems to break correctness :sweat_smile:
yes because if you get it wrong you either over generalize or under generalize
the alternative is to generalize based on the kind of expression
Ayaz Hafiz said:
batch compiler and language server compilers (query compilers) are totally different use cases. i wouldn't try to fit both in one
do you think we should make two completely different compilers? :thinking:
occurs check - this is also not required in a language with infinite types, it is typically employed only to detect recursion and stop it. The current implementation uses it to insert "recursion points" that mark where the compiler should box a value at runtime. this is also unnecessary. Normal unification works just as well as long as you add a check to see that if you have already tried to unify two types A and B, if A and B are seen again, you short circuit the unification (hopefully it's clear why this is correct but i can explain if helpful). Then during code generation you determine the recursion point separately.
no, i would just start with the batch compiler use case. because programs are smaller queries are going to be fast even over the batch. then the query use case can be solved later
ahh right, I forgot about detecting recursion!
the "have we seen this already?" check sounds simpler in that we could just implement it once, as opposed to having to call occurs in a bunch of different places, right?
in principle you only need to run occurs after generalization so i dont think it makes a difference in terms of number of places it's called
but i would push it down until its actually needed
you mean in code gen?
yes
seems reasonable! :+1:
it'll be nice that we won't need to deal with optional record fields in the type checker
that will really simplify the record logic
oh yeah, we should remember to handle as whenever we're doing pattern matches and imports and such
right now it's implemented pretty inconsistently :sweat_smile:
Do we have a strong direction around what patterns we will use to get around Zig's lack of interfaces? _I_ know plenty of ways to deal with it, but as I'm sitting here looking at our current parsing code, we use a number of traits. The most obvious being Parser.
And I'd like to at least know that I'm going down a road that is going to be well-received by others here
Why is parser an interface and not just a concrete type with methods?
It's a pattern from FP languages for recursive descent
I'm actually thinking about explicitly NOT trying to copy that pattern
yeah
I don't think we need it
I have a structure for parsers (in procedural languages) that I'm pretty familiar with
I'm going to see how far it can take me
And then I'll present it
Also, as for uses of traits/interfaces in general. I think the zig equivalent of a trait is a comptime returned struct with different function implementations based on the concrete implementation. And interfaces are the same thing but runtime function pointers.
Can you show an example of what you are talking about? Because a Zig trait equivalent is very verbose usually
Unless you want to do some very tedious things in comptime and have all implementations live in one file
But a Zig wizard I am not
I think the recursive descent parser design Josh has in mind wouldn't want this anyway
Yeah, I guess for what I am thinking it requires central registration for traits, but they could still be implemented in various files.
Oh, looks like you can do it without central registration in zig: https://github.com/permutationlock/zimpl?tab=readme-ov-file#example
That said, I highly doubt we'll need this in the compiler. Almost everything is static concrete types or can be designed to be so.
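For reference, a minimal sketch of the two patterns from above (countTokens and TokenSink are made-up names for illustration):

```zig
// "Trait" via comptime: any T with a next() method works; the check
// happens at compile time and each instantiation is static dispatch.
fn countTokens(tokenizer: anytype) usize {
    var n: usize = 0;
    while (tokenizer.next()) |_| n += 1;
    return n;
}

// "Interface" via runtime function pointers, the same shape as
// std.mem.Allocator: an opaque context plus function pointers.
const TokenSink = struct {
    ctx: *anyopaque,
    emitFn: *const fn (ctx: *anyopaque, token: u32) void,

    fn emit(self: TokenSink, token: u32) void {
        self.emitFn(self.ctx, token);
    }
};
```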
Fixed some bugs in the tokenizer I pushed earlier: https://github.com/joshuawarner32/roc/blob/zig-tokenizer/zig/tokenize/src/main.zig
Now it:
Did you make it handle snake_case identifiers and import statements in the header?
It does not currently handle snake case; I think those'll probably be tokenized incorrectly at present
Not hard to add tho
Yeah I did that on my local copy
There's not a whole lot it needs to do in order to handle import statements, with the one exception of making a token for the import keyword (which needs to be done)
it looks like there's already enough here to try out zig's builtin fuzzing on! :smiley:
Indeed!
There is one minor syntax tweak I'd like to make, which I think was discussed elsewhere, and that's changing up the ext syntax from {a: Str}ext to (IIRC?) {a: Str, ..ext} - or something similar
that's it exactly! :100:
Sweet
@Joshua Warner What are your thoughts on the discussion above?
Awesome. The old syntax requires some bending over backwards that I'm going to happily rip out of the zig tokenizer.
Re: replacing the trait-laden system we have in Rust with a more straight-forward parser struct with just a ton of methods on it working on some sort of input/state struct?
Yeah, that's exactly my thinking.
Straight-forward recursive descent
Sweet. I'll continue exploring down this path then
Using your tokenizer output as input
Maybe we should have a branch on the main roc repo we can collaborate on?
Thank you for saving me a couple of hours of work with that!
Probably
Right now I don't have a ton of time so I'm just poking around - remembering how to write Zig, and exploring the design space
If you're interested in design space exploration, here's an old prototype I wrote: https://github.com/joshuawarner32/roc/blob/cypress/crates/compiler/cypress/src/parse.rs
Cool, I'll check it out
This is a design I've used (with some small tweaks) in parsers in Dart, Go, and Zig: https://gist.github.com/gamebox/947afc1fb18cae753be2d25a7ee777dd
Of course the AST-level methods would differ a lot
Yep, that looks like roughly the pattern I typically use.
btw Josh, check this out: https://github.com/ziglang/zig/pull/21257#issuecomment-2336865183
Your prototype above is pretty INTERESTING!
(on the thing I posted above) I'm not super gung-ho on that direction at the moment, and it's probably not a thing we want to try to quickly prototype with, but it is somewhat closer to zig's parser, and also has some perf advantages. (IIRC, it was ~2x faster than the current roc parser, prior to any attempts at micro optimization) But it's harder to read/develop.
Yeah, it looks very DOD style
Seems like something Andrew K would enjoy
But micro-optimizing the parser perf doesn't make a ton of sense given how little compile time is usually devoted to it
_IF_ it makes the parser harder to maintain or debug
It was like 30% win, 70% loss for debuggability
The fact that the parser state was easily serializable and inspectable at any point made it really easy to visualize what's going on.
That's great. I'll try to read through it tonight when I have more time
But at the same time it's a very _weird_ pattern. The combination of the rigid linear layout of the AST, roc's somewhat weird grammar, and trying to do it "stackless" made it all a lot more of a headache than it really needed to be.
the thing I linked to makes it possible to do a really direct state machine style
where you're basically like "if I encounter this condition, GOTO this other place"
except instead of actual gotos, they're all labeled
Sorry Richard, I don't have any experience with that kind of parser (I'm a one-trick pony with RD), do you have an example to share?
(I can imagine a simple example, but at the scale of a full PL grammar it's hard)
Actually it reminds me of the grammar that's made by Nearley
it's what simdjson uses for the parsing step, except they use actual goto :laughing:
(I don't have a link handy and I'm on mobile)
Which is a tool I was forced into using for making the Formula language in our product (in JS)
I'll take a look at it
I think they do it for performance, but it seems like it would be super straightforward to read and understand
It's been a while since I read about simdjson
don't look at the lexer :laughing:
I think complicated state machines can be very hard to understand because of their non-linear flow
But I am open minded!
oh btw, I think we should restrict line lengths to u16 and total number of lines per file to u16
so regions can be more compact
Think of it as a while loop with a switch statement in it over the current state of the state machine. It really isn't anything special.
Just more optimized than that
I'm fine saying you can't have a line that's more than 65k characters long, and also that you can't have a .roc file that's more than 65k lines long
I guess we’ll cross the “Why can’t you compile my 80k LOC module” bridge when we get there
In some ways you could say we’ve made it if we do
oh that's easy: "Because .roc files aren't allowed to be that long. You have to split it up."
I'm so excited for when this rewrite leads to having proper debug info in the llvm backend
:thinking: I wonder how debugging in an interpreter backend could work
step debugging I mean
I don't think it could possibly be compatible with gdb/lldb unless I'm missing something
I think it would need to do its own thing
Richard Feldman said:
oh btw, I think we should restrict line lengths to u16 and total number of lines per file to u16
Maybe even u8 line lengths?
Probably need to look into pdb more. Probably has things to learn from
Never mind I guess that wouldn’t help because of alignment
yeah unless we did SoA regions or something :sweat_smile:
Did zig end up going with a raw u32 offset? Then they recalculated line and col info if it is needed? Not sure that is a better choice. It is more flexible though.
hm actually in Zig it might be pretty easy to do a packed struct of a u32 split into a u20 and u12
oh yeah I think they did
I wonder about that for editor tooling, but maybe it's fine?
like the find all references example
Cause you only need line and col for warnings and errors....I guess you also need it for expect and dbg messages...
Hmm, don't you need it for every node in llvm for debug info?
oh yeah probably
And yeah, I think something like u20/u12 sounds amazing. Like pick a reasonable split that allows for longer files at the cost of shorter lines
Using offset only is pretty typical
You can just pass the input to the reporter and have it calculate the line:col positions on the fly
My parser above does that
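That on-the-fly calculation is tiny - a sketch, assuming a plain byte-offset position:

```zig
// Minimal sketch: recover line:col from a byte offset on demand. Errors
// are rare relative to tokens, so a linear scan at report time is cheap.
fn lineCol(src: []const u8, offset: usize) struct { line: u32, col: u32 } {
    var line: u32 = 0;
    var col: u32 = 0;
    for (src[0..offset]) |byte| {
        if (byte == '\n') {
            line += 1;
            col = 0;
        } else {
            col += 1;
        }
    }
    return .{ .line = line, .col = col };
}
```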
u32 split into a u20 and u12 would allow individual lines to be 4096 bytes and then each file could have up to 1M lines
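in Zig that split is just a packed struct - 4 bytes total, no padding:

```zig
// Sketch of the proposed split; field order puts `col` in the low bits.
const Pos = packed struct(u32) {
    col: u12, // byte within the line; lines capped at 4096 bytes
    line: u20, // lines per file capped at ~1M
};
```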
Joshua Warner said:
Tried translating an old tokenizer experiment from rust to zig: https://github.com/joshuawarner32/roc/blob/c80a3ba0e453abf495619a1abf8ed28c9e6ff909/zig/tokenize/src/main.zig
saw this last night and made this: https://clbin.com/m5TF7
when you have 2+ ArrayList with the same length, that's a great case for MultiArrayList which is equivalent but will have only 1 length field for all of them, as well as manage exactly 1 allocation for all the arrays
Richard Feldman said:
so the (very serious) debuggability/ergonomics downside we've had of doing SoA in Rust may just be a non-issue in Zig
with the llvm backend it's still a pain in the ass to debug MultiArrayList, however, with our self-hosted backends (currently x86_64 only, aarch64 to follow next) we have custom dwarf + lldb fork that recognizes zig std lib types and makes debugging really nice. same thing goes for hash maps
this project may be interesting to watch long term since jacobly (author of above lldb fork as well as zig's x86_64 backend) has taken an interest in it
Anthony Bullard said:
You can just pass the input to the reporter and have it calculate the line:col positions on the fly
Yeah, would just require calculating all line and col offsets when generating debug info. So not sure it is that much of a gain.
I know this is a silly metric, but I hope with the new compiler rewrite I'll feel confident figuring out any part of the compiler. Currently, parser is manageable but pretty opaque to me. Some of type checking is fine, but a lot is hard to follow fully. Obviously don't really know where to start with our more complex type related passes. Then I can understand all the backends, but I think some of that is due to lots of exposure rather than them being clean or simple.
Brendan Hansknecht said:
Anthony Bullard said:
You can just pass the input to the reporter and have it calculate the line:col positions on the fly
Yeah, would just require calculating all line and col offsets when generating debug info. So not sure it is that much of a gain.
For a normal use case you are reporting multiple orders of magnitude fewer problems than you are parsing AST nodes
Richard Feldman said:
it looks like there's already enough here to try out zig's builtin fuzzing on! :smiley:
Fuzzing is very alpha right now, I recommend to join zig development if you want to play with it. I mean really alpha like I have a branch with everything rewritten that I haven't merged yet. I expect that to become beta quality during the 0.15.0 release cycle. I will say though having it integrated into the compiler tool chain and unit tests is going to be a killer feature. I even have a web UI where you can watch the fuzzer find new code paths in realtime
I’ve used it in three parser projects and it was still fast enough
Richard Feldman said:
I'm fine saying you can't have a line that's more than 65k characters long, and also that you can't have a .roc file that's more than 65k lines long
would you believe me if I told you we had a hand-written, hand-maintained file in the zig compiler codebase that uses 78% of this quota :grinning:
Andrew Kelley said:
Richard Feldman said:
it looks like there's already enough here to try out zig's builtin fuzzing on! :smiley:
Fuzzing is very alpha right now, I recommend to join zig development if you want to play with it. I mean really alpha like I have a branch with everything rewritten that I haven't merged yet. I expect that to become beta quality during the 0.15.0 release cycle. I will say though having it integrated into the compiler tool chain and unit tests is going to be a killer feature. I even have a web UI where you can watch the fuzzer find new code paths in realtime
I assume it is easy to plug zig into existing c++ fuzzers currently? Like libfuzzer and afl? Just need to compile zig with some extra llvm flags for coverage info, right?
Glad to see everyone hyped about this. I'm going to move this discussion into our compiler development channel to recover the contributor coord thread. :smiley:
292 messages were moved here from #contributing > contributor coordination meeting - Feb 2024 #2 by Luke Boswell.
Joshua Warner said:
Maybe we should have a branch on the main roc repo we can collaborate on?
I think this would be a good idea.
We discussed all the zig code should live under src/, so I think just starting with that structure is ok.
Also for the parser stage - tokenizer specifically.. I think it would live somewhere like /src/check/parse/tokenize/...
based on the presentation we were organising everything according to its function, and nesting from the top-down, phases->stages->etc
One thing I'm really unsure about because I haven't researched it yet, is if we want or need to organise all of the zig modules into separate zig "packages" or if that is just unnecessary.
One of the goals is for it to be easy for tooling to reach in and use the code for multiple purposes.
Do we need to define these boundaries ahead of time and separate them into packages? or are zig modules easy to refer to, like rust crates?
I think we should use a separate folder but not a separate branch
better to just keep merging stuff into main
since we're not going to cause merge conflicts with any existing PRs anyway :big_smile:
To clarify this /src/check/parse/tokenize/..., in my mind this is a structure like;
check - phase
parse - stage
tokenize - function
I like to structure things based on their function or behaviour, for at least the top one or two levels. But I appreciate that this can be a controversial topic.
Yeah, trying to organize roughly based on pass ordering and IR sounds like a great idea. Should help avoid dependency messes.
Though tokenize may just be tokenize.zig in this case. It only needs to be a folder if it is so complex that it needs many files
Brendan Hansknecht said:
Andrew Kelley said:
Richard Feldman said:
it looks like there's already enough here to try out zig's builtin fuzzing on! :smiley:
Fuzzing is very alpha right now, I recommend to join zig development if you want to play with it. I mean really alpha like I have a branch with everything rewritten that I haven't merged yet. I expect that to become beta quality during the 0.15.0 release cycle. I will say though having it integrated into the compiler tool chain and unit tests is going to be a killer feature. I even have a web UI where you can watch the fuzzer find new code paths in realtime
I assume it is easy to plug zig into existing c++ fuzzers currently? Like libfuzzer and afl? Just need to compile zig with some extra llvm flags for coverage info, right?
Yeah that all works, I recently tried out that use case and fixed a few codegen things to make it a smooth experience. I think Loris has a nice example somewhere, let me grab a link and make sure it still builds
Useful talks to watch if you want more context on DOD and how things are done in the zig compiler. Likely will help with thinking about how to model data structures in the new compiler:
Andrew Kelley Practical Data Oriented Design (DoD)
Data-Oriented Design Revisited: Type Safety in the Zig Compiler - Matthew Lugg
second talk especially has zig compiler ir specific designs and example code.
Richard Feldman said:
oh btw, I think we should restrict line lengths to u16 and total number of lines per file to u16
We currently use u32's to indicate positions in the syntax tree at least (and I _think_ elsewhere). Just byte index into the original string. That's both (1) the same size as 2 u16's, and (2) more flexible - i.e. you can have 17k lines just fine, so long as they're normal length lines.
I have definitely worked in projects with one or more files with 16k+ lines. While I don't want to encourage that, it's also a small enough limit that it doesn't feel worth it to enforce.
Seems really easy to go over 16k lines with codegen
re:
https://github.com/kristoff-it/zig-afl-kit/
if you give this a spin, I recommend to have a chat with Loris if you run into any issues. I didn't try using AFL yet other than reading the source code since I'm more interested in advancing the integrated fuzzing implementation
@Andrew Kelley Applied your suggestions - thanks!
FWIW, the reason for adding an extra entry to self.output.offsets was that it did essentially the same job an Eof token would be doing. In the parser that corresponded to this, having that there avoided some branches - but I didn't need the actual eof token itself, so I never added it. :man_shrugging: Went ahead and replaced that with a proper EndOfFile token.
Also added the import keyword, removed the NoSpace weirdness, and it now supports snake_case idents
ah yeah I figured it was something like that
Tokenizer PR here: https://github.com/roc-lang/roc/pull/7569
another thing you might have fun playing with is labeled switch continue. In the case that the value used is comptime known, it's a direct branch to the other case. This is a common optimization trick used by emulators for example. When we switched our tokenizer to use it, we observed a 13% perf improvement
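a sketch of the shape, with hypothetical states (this is the feature from the PR linked above):

```zig
const std = @import("std");

const State = enum { start, ident, done };

// Each `continue :state .x` with a comptime-known operand compiles to a
// direct branch to that case, rather than re-dispatching through the switch.
fn identLen(src: []const u8) usize {
    var i: usize = 0;
    state: switch (State.start) {
        .start => {
            if (i < src.len and (std.ascii.isAlphabetic(src[i]) or src[i] == '_')) {
                continue :state .ident;
            }
            continue :state .done;
        },
        .ident => {
            while (i < src.len and (std.ascii.isAlphanumeric(src[i]) or src[i] == '_')) i += 1;
            continue :state .done;
        },
        .done => {},
    }
    return i;
}
```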
Joshua Warner said:
Tokenizer PR here: https://github.com/roc-lang/roc/pull/7569
IT BEGINS
(btw, please please rip my zig code to shreds; this is a useful learning opportunity)
re: labeled switch continue
Ohhhh Richard posted that above, but I didn't get the connection initially. I'll check it out!
it's also just like really satisfying to use for some reason, idk if it's just me
Joshua Warner said:
(btw, please please rip my zig code to shreds; this is a useful learning opportunity)
ok, you got it :smile: I trust you to tell me if it becomes too much
I'll be curious to hear how the perf measures up old code vs new code, although I'm sure the tokenizer is insignificant compared to the rest of the pipeline
General request:
As we spin up this new compiler, can we try to add file and function doc comments to everything (or at least everything exposed)? They can be super simple, but I think if we start from the core with a culture of documentation it will be very useful. I think we will be much much more likely to document the compiler flow if we separate stages of the compiler into different files and we add a high level comment about the goal of each file and roughly what it is meant to do.
Beyond Rust compilation speedups, which are irrelevant given Zig's compile times, I don't think there's a good reason to split our compiler into separate libraries. Folders seem sufficient. To that end, I don't think we should consolidate the IRs into a single place; that moves the data types away from the logic that operates on them. I'm gonna put the IRs in their respective stage folders, probably in <phase>/<stage>/ir.zig
I agree with that
Totally agree
Fun random thing I just realized. Cause the interpreter will be written in zig, it will be able to just call the zig builtins without any sort of ffi complexities.
Yeah that was one of the big motivating factors Richard mentioned. :smiley:
How does one organize a large multi-file zig project?
Do you make intermediate libraries that reference other libs? Or is it better to do one big source tree that builds into a single exe?
One big source tree with direct file imports, from what I have seen
I've started putting a simple shell for the cli together... not sure if it's something we'll want to keep. But something to parse the cli args and build a binary from. Based pretty heavily off the zig compiler -- mostly as a learning exercise, but I plan to make it a PR so we have something.
I'm not sure the exact standard, but I would assume something like:
build.zig
build.zig.zon
main.zig
<phase>.zig
<phase>/<stage>.zig
<phase>/<stage>/something.zig
Where zig files are only allowed to import from files at the same level or from a level deeper if the folder name matches the zig file name. That enables clean separation of various parts of the compiler with clear interfaces
And how does that work with tests? One test target from main.zig? Or multiple?
And if a <stage> is small enough, or just getting started, it may just be in one file. In that case it would just be <phase>/<stage>.zig and there would be no <phase>/<stage>/something.zig
I'm not sure the best way to do tests. One natural way (but may not scale well) is to put tests at the bottom of the file. That way I can just do zig test <phase>/<stage>.zig to run all tests for a specific stage.
But it might be cleaner to make a separate test directory. Based on experience with rust, you want a reasonable number of test executables/targets. If you have too many, it wastes a crap ton of time linking them all. If you have too few, you have to compile way too much code for small changes.
cc @Andrew Kelley in case he has any good tips on zig project organization.
I have this right now:
Screenshot 2025-02-02 at 19.15.34.png
Having trouble importing the tokenizer from the parser:
check/parse/parse.zig:2:27: error: import of file outside module path: '../tokenizer/tokenizer.zig'
const tokenizer = @import("../tokenizer/tokenizer.zig");
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
I think it should be:
check/parse.zig
check/tokenize.zig
If parse needs to be split over multiple files, it would be:
check/parse.zig
check/parse/subpart1.zig
check/parse/subpart2.zig
Ahhhhh cool
Being a beginner is fun :sweat_smile:
I'm definitely still a beginner in terms of expected zig project organization. Though I have worked on a few zig projects at this point, I have never touched anything large or dealt with best practices around organization
This is just some stuff I have gleaned from looking at the zig compiler itself, but I really have not studied it thoroughly
File names fall into two categories: types and namespaces. If the file (implicitly a struct) has top level fields, it should be named like any other struct with fields using TitleCase. Otherwise, it should use snake_case. Directory names should be snake_case.
https://ziglang.org/documentation/0.13.0/#Names
Also, the pattern I described follows what the zig standard library does for reference
Also, the TitleCase, just for reference, is used to define a type in a single file. For example, instead of defining Cursor in tokenize.zig you could instead put it in tokenize/Cursor.zig. In this case, you would literally copy all of the lines within struct { ... } as the contents of Cursor.zig.
Anyway, for structure purpose, I think that we should put unit tests at the bottom of the relevant file. For our larger integration tests, we probably want them to be standalone executables that read the snapshots and what not.
Also, just to explain one more design point, if we have:
check/parse.zig
check/parse/ir.zig
check/canonicalize.zig
and canonicalize needs access to parse/ir.zig, it would do:
const parse = @import("parse.zig");
const ir = parse.ir;
and parse would need to export the ir for can to import
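e.g. something like this near the top of check/parse.zig:

```zig
// re-export the parse IR so sibling stages can reach it via parse.ir
pub const ir = @import("parse/ir.zig");
```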
Here's my WIP for the cli https://github.com/roc-lang/roc/pull/7571
If others are ok with it, I'm heavily interested in working on the interpreter. Of course, it depends on getting everything wired through type checking before it can be worked on. I feel like it is the highest level thing that I can really help with (both initial implementation and eventual optimizations). I'm sure I'll need help finding the correct design around type info and such, but sounds like a fun project. (also, I'm sure more than one person can work on the interpreter if others are interested).
Hopefully will be able to start by working only with self contained examples that don't need platforms then can expand to passing in pointers for allocators and effects. Will be interesting eventually getting it working in wasm too.
Otherwise, hopefully I can help with reviews and some high level zig organizational stuff.
Luke Boswell said:
Here's my WIP for the cli https://github.com/roc-lang/roc/pull/7571
What order do we want to land things in? Someone has to put the base zig config and organization together. Currently both this and @Joshua Warner's PR are setting up the zig build and whatnot.
I'm still cleaning mine up from the feedback. If Josh's merges first I can merge any updates and adapt it.
It sounds like you're keen to start on things. Also feel free to push to my branch. I'm not precious about what I have so far... and probably won't be doing much for the next 18hrs
It sounds like you're keen to start on things
I guess so, though not like I have that much time at the end of today.
Yeah, I'm so hyped about the zig stuff... I just want to work on this instead of my actual work. :sweat_smile:
Also, as a general note, I feel like for some things I might just make cleanup PRs instead of adding PR comments. It just feels like cleaning up some code will show the changes I think would be nice more easily than adding a ton of comments. On top of that, they aren't pressing, so I don't think they are worth tons of iteration on PRs.
Sounds good. I like the feedback because I'm learning. But appreciate its a lot more work.
At a minimum, I'll make sure to add original authors as reviewers.
Yep, definitely better to just do cleanup PRs for now.
@Joshua Warner don't the build.zig and build.zig.zon usually go in the folder outside of src?
I think we could just leave those in the root of the roc repo and they won't interfere with anything
Luke Boswell said:
Here's my WIP for the cli https://github.com/roc-lang/roc/pull/7571
Ok... I've fixed @Brendan Hansknecht's suggestions. I think we could just merge this and then we've got a basic structure to start working with.
Anyone available to review?
I approved you @Luke Boswell
Brendan Hansknecht said:
cc Andrew Kelley in case he has any good tips on zig project organization.
my main suggestion is to take advantage of namespace nesting, try to follow these namespace-related suggestions, and then once any given struct accumulates "too much" code inside of it, extract it to a separate file.
in my experience one generally ends up with weird/wrong organization when trying to guess ahead of time which files to put stuff in, but a nice organization arises when you prioritize the Fully Qualified Namespace Names having precise, accurate, non-redundant names
https://github.com/roc-lang/roc/pull/7573
zig build test -- tests to run are specified in src/test.zig
Also .. I was surprised at first that just zig build test didn't report anything unless there is a failure. So I also "install" the test binary so it can be run separately.
10:40:31 ~/Documents/GitHub/roc zig-cli $ zig build test
10:40:35 ~/Documents/GitHub/roc zig-cli $ zig build && ./zig-out/bin/test
All 4 tests passed.
I don't know if it's controversial... but for now you have to specify roc run, and the run subcommand isn't optional. It just greatly simplifies the option parsing logic. I figure we could improve later if we still want that.
Didn't we want to remove roc run? https://github.com/roc-lang/roc/issues/6637
Yes... well, removing roc dev and roc run in favour of just roc was the plan... but now I think it makes sense just to have roc run to simplify option parsing. So I haven't added a roc dev or anything, so we're still aligned with that new CLI workflow
The options I'm parsing are based on that Issue and not the current implementation
--opt size
--opt none
etc
Okay, that's good. I think we shouldn't do roc run even if it's simpler, though. Making roc script.roc work is what makes shebangs work
I've ignored --backend dev etc.. because it sounds like we will only have the llvm backend.
So roc run would use the interpreter dev loop, and roc build would use llvm.
Sam Mohr said:
Okay, that's good. I think we shouldn't do roc run even if it's simpler, though. Making roc script.roc work is what makes shebangs work
I agree.. I'll have another crack at it in a follow up PR.
It was just challenging because we still support parsing options before providing the path/to/app.roc file to run, and we're not sure if we have invalid options or an invalid roc file path being passed. I'm sure there is a way to do it.
Made a PR to start figuring out fuzzing: https://github.com/roc-lang/roc/pull/7574
We discussed all the zig code should live under src/, so I think just starting with that structure is ok.
Can we move src, build.zig and build.zig.zon to a new separate folder (zig or zigcompiler...)? That way I can put a flake.nix in that folder instead of adding a flake-zig.nix at the root level that also requires custom nix develop flags to point to that flake etc.
We definitely shouldn't move src somewhere else because that will mean we have to move like a hundred files in 3 months back to the root
But I think just the build files would be okay
Doesn't zig expect the layout to be:
src
build.zig
Sam Mohr said:
We definitely shouldn't move src somewhere else because that will mean we have to move like a hundred files in 3 months back to the root
Do we plan to add ~hundred files to the root currently?
Seems like if we use git mv to move the folder we can avoid most of the typical downsides
There will be many files in src/
Moving files isn't the real problem, it's moving files with 10 PRs against those files
I volunteer to update all those PRs. I'd really like a separate folder for the new compiler. We want to make tools like this platform that don't belong in src so they'll end up at the root level, that will lead to flake dependency issues. I would also need to add a bunch of exceptions for when the new compiler CI should run vs old compiler CI because there is no clean folder separation.
I don't feel that strongly about it. If you think its really worth this effort, then go for it
:+1: I'll leave some time for others to weigh in on the folder issue if they want to
It's just a folder move. People can update PRs for a folder move. Though I would probably rather move the rust compiler and leave the new zig compiler at root.
That'd be way better IMO
Anyone have opinions on how to handle allocation failures for lists during Roc compilation? I don't know what recovery mechanism there is, so it feels like we'd want to just fail compilation, since it can only mean out of memory. I don't think we should expose handling allocation failures because of this.
Not sure how to avoid threading this logic throughout the entire service without doing something like "if OOM, std.process.exit(1)", though that's probably acceptable with a reasonable error message.
I wrote this function and I'm calling it on allocation results:
fn exit_on_oom(alloc_result: std.mem.Allocator.Error!void) void {
    alloc_result catch {
        const oom_message =
            \\I ran out of memory! I can't do anything to recover, so I'm exiting.
            \\Try reducing memory usage on your machine and then running again.
        ;
        std.debug.print(oom_message, .{});
        std.process.exit(1);
    };
}
I would make the function just be what goes in catch
So it reads alloc(...) catch exit_on_oom at the callsite
That reads better, yeah
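a sketch of that shape - since OOM is the only possible error here, the handler can take no arguments and never return:

```zig
const std = @import("std");

fn exitOnOom() noreturn {
    const oom_message =
        \\I ran out of memory! I can't do anything to recover, so I'm exiting.
        \\Try reducing memory usage on your machine and then running again.
    ;
    std.debug.print(oom_message, .{});
    std.process.exit(1);
}

// usage at the callsite:
// list.append(item) catch exitOnOom();
```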
Brendan Hansknecht said:
Made a PR to start figuring out fuzzing: https://github.com/roc-lang/roc/pull/7574
neat, did it work? i.e. does it build and link and basically function?
It's broken in nix, but otherwise yes.
I'm assuming that is an issue with our nix config though. Seems that afl clang is somehow pulling in a version of llvm from nix that is wrong or something along those lines.
gotcha. well, that's certainly something we can compete on with an integrated fuzzer :) I'm looking forward to picking that project back up after the release is cut
Oh, actually, I might have one minor other bug. I think there are some wrong linker flags that lead it to "fail" on recompile due to the linker printing to stderr. But it actually succeeds.
Andrew Kelley said:
gotcha. well, that's certainly something we can compete on with an integrated fuzzer :) I'm looking forward to picking that project back up after the release is cut
We will be very happy to switch over as it gets more robust.
I think that go did an amazing job with integrated fuzzing and I'm sure zig can do something at least as good.
Hey @Andrew Kelley other random but related question, would you advise that we follow zig master instead of specific stable versions?
I know a lot of zig projects seem to do that (though not sure how common it is nowadays and best practices). I would assume tracking master would get us things like faster compile times and integrated fuzzing sooner.
normally I would advise sticking to a release but we're like 2 weeks out from 0.14.0 so master branch is probably your best bet (the download page will show you the most recent master branch commit that passed all CI checks)
Brendan Hansknecht said:
We will be very happy to switch over as it gets more robust.
there's a fun demo video in this PR that gives you a little taste of what I'm cooking up: https://github.com/ziglang/zig/pull/20958
Very cool
I rebased https://github.com/roc-lang/roc/pull/7569 on top of the new top-level build organization and did some minimal integration work, mostly just useful for testing at this point. In particular, I do zig build run -- check crates/compiler/test_syntax/tests/snapshots/pass/ to verify that the tokenizer will run over our existing tests without spitting out errors.
@Sam Mohr I'm thinking we should avoid putting too much time into translating the rust to zig for roc/src/check/parse/ir.zig specifically. It sounds like @Anthony Bullard has been working on this.
Yep, I'm basically entirely ignoring parse. The stuff in the draft PR was initial work that I don't plan to finish
Yep I’m working
Do you have your stuff pushed anywhere? I'd be interested in collaborating. Might have some time tonight.
I can push something later
It’s in a chaotic state now after our convo
np!
@Sam Mohr what is the state of landing the core structure you have been hacking on?
Luke and I are about to call, you're welcome to join. I think I could make a couple things a bit more consistent and then just push anyway, and we could keep cleaning up even after merging something into the main repo.
Choir got cancelled tonight so I've got lots of time for that today.
https://github.com/roc-lang/roc/pull/7580
I want to make the build/ IRs all follow a cleaned-up structure, and then we can probably just push to prioritize visibility over finishedness
I like using Zed's realtime collab features. I can leave all kinds of mess behind in Sam's PR :sweat_smile:
Oh cool would love to join
I might be able to in about an hour or so if you are still on
Odds are good, just ping us
Hah well figured out why you didn't have my syntax.zig @Sam Mohr ; because I never committed it :man_facepalming:
And that's why we need CI!
Big shrug, we'll get there soon
@Sam Mohr I figured out the build flag, it isn't --verbose, it is --summary all. Prints out a summary of all the steps zig ran, if they were cached, and more clearly shows how many tests passed if they ran at all.
Thanks for the heads up
I found this interesting while creating the type-safe layer on top of the parser AST:
var store = try ast.NodeStore.initWithCapacity(allocator, estimated_node_count);
const expr_index: ast.NodeStore.ExprIndex = try store.addExpr(.{ .int = .{
    .token = 1,
    .region = .{ .start = 0, .end = 0 },
} });
_ = try store.addStatement(.{
    .expr = .{
        .expr = expr_index,
        .region = .{ .start = 0, .end = 0 },
    },
});
const blah: ast.NodeStore.StatementIndex = expr_index;
This compiles successfully. Here are the definitions of ExprIndex and StatementIndex:
pub const StatementIndex = struct { u32 };
pub const ExprIndex = struct { u32 };
They are what I thought were nominal types, but are treated as structural here. @Sam Mohr you might want to take note.
Changing the implementation of the index types to the following makes the types work fine:
pub const StatementIndex = struct { statement: u32 };
pub const ExprIndex = struct { expr: u32 };
I'll try the enum thing then. There are a few more options
I'm happy with what I got
I think just having a field name resolves it, you should check though.
Here is what usage of the type safe layer looks like:
var store = try ast.NodeStore.initWithCapacity(allocator, estimated_node_count);
const expr_index = try store.addExpr(.{ .int = .{
    .token = 1,
    .region = .{ .start = 0, .end = 0 },
} });
const statement_index = try store.addStatement(.{
    .expr = .{
        .expr = expr_index,
        .region = .{ .start = 0, .end = 0 },
    },
});
const file_index = try store.addFile(.{
    .statements = &[_]ast.NodeStore.StatementIndex{statement_index},
    .region = .{ .start = 0, .end = 0 },
});
const file = try store.getFile(file_index);
std.debug.print("File: {any}", .{file});
prints:
File: lib_ast.NodeStore.File{ .statements = { lib_ast.NodeStore.StatementIndex{ .statement = 1 } }, .region = lib_ast.Region{ .start = 0, .end = 0 } }
Yeah, I think the current recommended way in zig is an enum.
The old version was using tuples. (Not sure if those are nominal in zig.) Now you switched to structs with unique field names.
Well an enum unifies things into one type of index
We are using these to explicitly get different types of indices
They are basically typed “pointers”
But since they are offsets into an array they can be smaller and offer much better memory locality
Yeah, that is exactly what the zig compiler uses enums for now. Each one of your structs would become an unbounded nominal enum. Each enum would contain all values from 0 to the max u32.
const Parse.Id = enum(u32) { _ };
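To make that concrete, a tiny sketch with hypothetical names, showing that two such enums refuse to coerce into each other:
pub const ExprIdx = enum(u32) { _ };
pub const StatementIdx = enum(u32) { _ };

test "distinct index enums are nominal" {
    const e: ExprIdx = @enumFromInt(1);
    // The next line would be a compile error: expected 'StatementIdx', found 'ExprIdx'.
    // const s: StatementIdx = e;
    _ = e;
}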
btw I wanted to note a general thing that @Andrew Kelley pointed out to me when we were talking about arenas:
a downside of using plain bump-allocated arenas for all memory allocation is that resizes basically never happen in-place
like if we push an element and it exceeds capacity, it's not really going to be able to grow in-place because something else will have almost always used the next slot in the arena, so it has to allocate a new array in the arena and copy the existing elements over
something to think about, at any rate!
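A sketch of a test that demonstrates the copy-on-grow behavior with std.heap.ArenaAllocator (the sizes are arbitrary):
const std = @import("std");

test "arena growth usually copies" {
    var arena = std.heap.ArenaAllocator.init(std.testing.allocator);
    defer arena.deinit();
    const alloc = arena.allocator();

    var list = std.ArrayList(u8).init(alloc);
    try list.append(1);
    _ = try alloc.create(u64); // another allocation bumps the arena past the list's block
    const old_ptr = list.items.ptr;
    try list.appendNTimes(0, 1024); // growth can't happen in place: new block + memcpy
    try std.testing.expect(list.items.ptr != old_ptr);
}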
Are there any data structures that give us in-place memory reuse for free, or is that something we'll have to cook ourselves?
I think we should do a single fixed allocation at boot
Like a single mmap at the beginning to create the backing memory for our allocator
I think it’s FixedBufferAllocator?
How does that fix the problem?
You don’t have to use an Arena
And worry about pointer invalidation on resize
Fixed buffer is an arena last I checked
Ok, but if it’s initialized with a fixed capacity it never resizes
Also, pointer invalidation isn't a problem cause we use indices. The only problem is the growth.
That’s what the fixed means
It isn't the buffer that is resizing. It is the array lists allocated within the buffer
Oh I totally misread Richard’s comment
My apologies
No worries
Also, mmap is still technically a valid solution. We could mmap enough space to store a max sized array list for every array (like one per IR). It would be a crazy huge mmap, but given the memory is virtual it should be ok (though I think some systems still complain if you make too big of an mmap)
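For what it's worth, a rough sketch of that big-reservation idea on a POSIX target (the std.posix.mmap details here are from memory and shift between Zig versions, so treat them as assumptions to verify):
const std = @import("std");

// Reserve `len` bytes of virtual address space; physical pages are only
// faulted in as they're touched. Flag/field names are assumptions.
fn reserveBacking(len: usize) ![]align(std.mem.page_size) u8 {
    return std.posix.mmap(
        null,
        len,
        std.posix.PROT.READ | std.posix.PROT.WRITE,
        .{ .TYPE = .PRIVATE, .ANONYMOUS = true },
        -1, // no backing file
        0,
    );
}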
I think if we init our array lists with a very pessimistic capacity we can avoid this (using some empirical heuristic)
Richard Feldman said:
btw I wanted to note a general thing that Andrew Kelley pointed out to me when we were talking about arenas:
a downside of using plain bump-allocated arenas for all memory allocation is that resizes basically never happen in-place
Did he mention what zig does?
TigerBeetle uses the technique you are talking about to run a high availability database
But they use a holistic coding style to make that work well
(In zig)
Did you meet Joran at SYCL @Brendan Hansknecht ?
9A8A65F4-4659-4C81-A8DB-0CC6CEAB2A71.jpg
From their style guide
So you would have in effect fixed arrays and just maintain stack allocated cursors on them
Really reduces the number of trys in your code!
How do we determine the size of said fixed array? What if we try to compile lots of big files? The question seems so obvious that I'm hesitant to ask it
I wouldn’t say that this is necessarily the right choice for a compiler
But just explaining the design space available
To Brendan's point, I'd be interested in what Zig's own memory management strategy is
Because that one seems to work well with this DoD architecture we seem to be embracing
I know they make some assumptions, and use appendAssumeCapacity a lot from what I read
The only allocation error is OOM, which we can't/shouldn't do anything about. The current approach to minimize trys is to std.process.exit(1) on OOM, as is done for you in SafeList: https://github.com/roc-lang/roc/blob/a3cff5aff7a1cf9584eac6926fba03257d48391f/src/collections/safe_list.zig#L43
You want me to catch every append and call exit?
(name pending, something like TypedIndexList would be better)
Ok, I think those should both have an initWithCapacity constructor
Agreed
But yes, I think we should catch exit_on_oom on every allocation
Ok I’d like to hear what others think
But it should be a very rare and extraordinary case regardless
If we use these “initialize with a constant max size” for these lists, we could just use appendAssumeCapacity and get largely the same behavior
Sorry not constant, but fixed relative to some attribute of the file
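In code, that reserve-up-front pattern might look something like this sketch (the capacity heuristic is made up):
const std = @import("std");

fn tokenizeSketch(gpa: std.mem.Allocator, src: []const u8) !std.ArrayList(u8) {
    // Made-up heuristic: at most one token byte per source byte.
    var tokens = try std.ArrayList(u8).initCapacity(gpa, src.len);
    for (src) |byte| {
        // No `try` needed in the hot loop; capacity was reserved up front.
        tokens.appendAssumeCapacity(byte);
    }
    return tokens;
}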
Anthony Bullard said:
Did you meet Joran at SYCL Brendan Hansknecht ?
Yes
Anthony Bullard said:
TigerBeetle uses the technique you are talking about to run a high availability database
Not quite
TigerBeetle knows their exact limits and allocates to those limits up front
That would be more equivalent to recognizing how much memory a machine has and then allocating it into N different arrays of the max size possible for each of the IRs.
Of course you could decide to use only 2 arrays and keep reusing the memory, but it is a static limit
I was talking about allocating a terabyte (arbitrary big number) of memory for each array. Way more than the system has. Then keeping track of the current size. The allocation never actually grows, but every once in a while, it page faults to get the OS to get more real memory.
Brendan Hansknecht said:
Did he mention what zig does?
a crap ton of ArrayList and AutoArrayHashMap
tokenizer - append tokens to a multi array list
parser - append nodes to a multi array list
zir - append instructions to a multi array list
air - append instructions to a multi array list
mir - append instructions to a multi array list
machine code - append bytes to an array list
intern pool - basically an array hash map
one factor to consider is portability - if your code uses these simple data structures rather than depending on the OS feature of mmap, then you can e.g. run your code in the browser
fun fact, when you're browsing zig's autodocs, you're actually downloading literally a tarball of unmodified zig std lib source code plus a wasm file, and nothing else, and then you're running the actual tokenizer and parser in wasm
Brendan Hansknecht said:
Yeah, I think the current recommended way in zig is an enum.
one nice outcome of using enums is that you can name special values. for instance the trick where you use max int to store "none":
pub const OptionalString = enum(u32) {
    none = std.math.maxInt(u32),
    _,

    pub fn unwrap(i: OptionalString) ?String {
        if (i == .none) return null;
        return @enumFromInt(@intFromEnum(i));
    }

    pub fn slice(index: OptionalString, wasm: *const Wasm) ?[:0]const u8 {
        return (index.unwrap() orelse return null).slice(wasm);
    }
};

pub const String = enum(u32) {
    _,

    pub fn slice(index: String, wasm: *const Wasm) [:0]const u8 {
        const start_slice = wasm.string_bytes.items[@intFromEnum(index)..];
        return start_slice[0..mem.indexOfScalar(u8, start_slice, 0).? :0];
    }

    pub fn toOptional(i: String) OptionalString {
        const result: OptionalString = @enumFromInt(@intFromEnum(i));
        assert(result != .none);
        return result;
    }
};
maybe you reserve index 0 to mean empty string, and index 1 to be the string "roc", well you can just put that into the enum then
@Andrew Kelley what would you recommend in terms of allocators when it comes to MultiArrayLists that we're going to build up (potentially involving resizes) with unknown up-front length, but which use indices over pointers for everything, and which we want to serialize straight to disk later (and deserialize straight into an arena)?
an allocator that supports resizing, i.e. not an arena allocator
you can memcpy your array list data to disk independently of the allocator used
since Roc links libc, the libc allocator is going to be your best bet probably for the "gpa" use case, i.e. when you need resizing
lately I've come around to either naming my allocator parameter gpa or arena to indicate the intended usage pattern
gpa - is that the kitchen sink, and arena is a dedicated purpose built thing we care about storing?
idea is that with gpa you need to avoid leaking by remembering to free, and with arena you can fire and forget
trying to do storage serialization at the allocator layer is not something I've experimented with. if you go that route, have fun and I have no advice for you
@Andrew Kelley - I wanted to ask about zig ld. Would that be reasonable for us to embed in the roc cli binary to use for linking our prebuilt platform hosts and our roc apps together?
Something like zig ld app.o libhost.a
that's actually not a command the compiler supports at the moment, but I think I understand what you're asking - you want to reuse the linker code (not LLD) right?
to understand the use case a bit more, my understanding is that you already depend on llvm libraries in the roc binary - what makes zig's linker code a more attractive option than embedding LLD inside the roc binary, having roc expose roc lld and then making roc invoke itself as a subprocess?
I guess I'm wondering if there is a simpler way. I thought zig ld was written in zig and so using that might be a better option than wrangling llvm's lld.
Another option I was thinking was having roc distribute a version of lld that we could download from a release and store in cache.
We have been talking about using a similar approach to Zig with producing llvm bc, so I guess I assumed that meant we didn't need llvm anymore. But I guess @Richard Feldman has mentioned he would like it all in one binary... so we'll still need clang or something.
yeah we have a macho linker, an elf linker, and a wasm linker all written in zig, however, they do not yet have feature parity or performance parity with LLD
actually I haven't measured the wasm linker perf vs lld since I rewrote it, that might have changed
zig's linkers are also "surgical" if you will, they are designed to be tightly coupled with the frontend and to prioritize zig code updates rather than the use case of only linking objects together
This is for producing a final binary, so as long as the linker isn't too slow (or worse than llvm) it's probably ok. But I'm not sure.
I think the tight coupling aspect will make them not a great fit for directly code sharing with roc compiler
Ok, it sounds like we're back to plan A, embedding llvm things.
have a look yourself and see what you think: https://github.com/ziglang/zig/blob/master/src/link/Wasm.zig
Thank you, but I'm probably not familiar enough with zig or linking to be able to read that just yet. :sweat_smile:
it's certainly adaptable to your use case, but then we'd be maintaining two separate implementations. which, hey, maybe that's not so bad. porting code can be fun and fairly quick
but yeah I think embedding LLD is a more immediate solution to your problem, and does not come with too many downsides since you already depend on LLVM libraries
Would you recommend using a library, or downloading a binary and caching that to use?
^^ I hope that question makes sense...
for shipping a linker?
Yeah, for both compiling llvm bc and also for linking the final executable
Andrew Kelley said:
to understand the use case a bit more, my understanding is that you already depend on llvm libraries in the roc binary - what makes zig's linker code a more attractive option than embedding LLD inside the roc binary, having roc expose roc lld and then making roc invoke itself as a subprocess?
if you do :point_up: instead, then you can ship only 1 binary (roc) that has those other abilities built in
this is what zig does today - basically you copy paste main.cpp from lld into your project and rename main to lld_main
actually I don't think you have to do that since lld exposes library functions
we do it for clang tho
Ok, I'll check out the zig code and follow that.
https://github.com/ziglang/zig/blob/b0ed602d5d9358128471588f00a073f2545809fa/src/main.zig#L301-L306
Andrew Kelley said:
tokenizer - append tokens to a multi array list
parser - append nodes to a multi array list
zir - append instructions to a multi array list
air - append instructions to a multi array list
mir - append instructions to a multi array list
machine code - append bytes to an array list
Just to clarify, these all use gpa in your case.
Arena is for temporary non list stuff that never needs to grow
right
Luke Boswell said:
Yeah, for both compiling llvm bc and also for linking the final executable
doesn't roc already support this via the llvm library APIs? specifically for compiling bitcode, not linking
I'm not sure how that works. We currently use the inkwell crate and I assume that bundles in the llvm libraries. But maybe there is some other magic happening there.
Our tokenizer and parser are doing exactly this, and right now using the heap allocator but could easily switch to libc. I think if we make some upfront assumptions about needed capacity, resizing won't hit us too hard
Yep
Also, I would probably use gpa and an arena for now
Currently we don't link libc. And we can generally re-evaluate at any point
Easy to switch an allocator
you don't link libc, but you're planning to link llvm right? that means you will be linking libc
or do you have plans to keep llvm libraries out of the roc binary?
That's true
So I guess we could just go straight to the c allocator
I thought we had main compiling. From what I can tell, most of it is just broken code with invalid imports and non-existent variables
lazy compilation made it not obvious
I'm working on stuff right now
I don't think it is lazy. It is just that nothing is imported by main.zig or test.zig. So they literally aren't in the tree at all
Yes, lazy compilation is imprecise
It's what you said
I'm not sure how best to fix the issue besides importing all modules in main followed by a _ = imported_module for each of them
Of course, paired with fixing whatever broken imports and such
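Something like this sketch in main.zig, with hypothetical file paths:
comptime {
    // Force these modules into the analysis tree even though nothing
    // references them yet; the paths here are hypothetical.
    _ = @import("coordinate.zig");
    _ = @import("check/parse.zig");
}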
Yeah, that is a fine start (also, I think just importing the coordinate into main imports everything or nearly everything)
I'm gonna try to get my scope cleanup work done first for canonicalization, and then I can try to fix these issues, and then make a PR
Sounds good.
I think I'll wait for the tree to be working and then switch into doing some various minor changes and cleanups
You might as well wait, I've been touching a lot of minor stuff today
yeah, sounds good
Also, exceptionally minor PR to update the zig-afl-kit dependency: https://github.com/roc-lang/roc/pull/7584
It does seem that Zig does not perform full semantic analysis on dead code.
More work on minor details, started work on scope checking for canonicalization to make sure it fits with everything else, added "coordinate.zig" import to "main.zig" for typechecking: https://github.com/roc-lang/roc/pull/7585
Not complete, just a wash of code
Got everything I need to _blast_ through the parser and formatter on top of a type-safe NodeStore
If you want to see what it would look like to manually construct an AST (which we should almost never do)
any time you have something like &[_]T{...}, you can shortcut with &.{...}, as long as there is a result type on the LHS
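For example:
const std = @import("std");

test "anonymous list literal shortcut" {
    // Both forms are equivalent; the second works because the result
    // type ([]const u32) is known on the left-hand side.
    const explicit: []const u32 = &[_]u32{ 1, 2, 3 };
    const inferred: []const u32 = &.{ 1, 2, 3 };
    try std.testing.expectEqualSlices(u32, explicit, inferred);
}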
Sweet! I don’t know why I didn’t try that
@Sam Mohr, not sure this is useful to you now, but we could add a step in CI to do find src -name "*.zig" | xargs -n1 zig ast-check. It would ensure that all of the zig files that aren't in the tree are at least internally valid. That said, I would assume that pretty quickly everything will be in the tree, so this may not matter.
I have time right now, so just trying to figure out if I can do something useful.
I think it's fine right now. I think what would be helpful is seeing if you can skeleton out what the interpreter and/or the LLVM backend would need for inputs
I'm working on setting up the coordinate.zig file right now so that everything is actually glued together
I've at least tentatively set up scope checking logic already for canonicalization (not plugged in), meaning I'm pretty confident we'll be okay with the new ModuleIdent approach (formerly Symbol) for referencing idents that Richard proposed
If you can skeleton out what the interpreter
I feel like that may be hard to do without a Can IR. I plan to start by having this as a tree walking interpreter that just directly walks Can (maybe 1 level removed from can to do some type checking first with the concrete input types)
LLVM backend
This I definitely should look into more. Need to figure out Zig's library to build llvm bitcode and also statically linking to llvm in general (which might be pretty hard)
Once I finish the package stuff, I'll see if I can get the CanIR at least mostly correct
Another super minor PR: https://github.com/roc-lang/roc/pull/7588
but a bit bigger this time
Approved, auto-merge enabled
Your PR is also the first time I've seen a greentext in Roc's ecosystem, nice
Brendan Hansknecht said:
Sam Mohr, not sure this is useful to you now, but we could add a step in CI to do find src -name "*.zig" | xargs -n1 zig ast-check. It would ensure that all of the zig files that aren't in the tree are at least internally valid. That said, I would assume that pretty quickly everything will be in the tree, so this may not matter.
zig fmt src --check --ast-check
We should add that to CI once we get all our source cleaned up. Thanks!
To add --ast-check. Fails currently, but I want to make sure we don't forget about it later: https://github.com/roc-lang/roc/pull/7589
Correctly get the github ci filter working: https://github.com/roc-lang/roc/pull/7591
new gpa! https://ziglang.org/devlog/2025/#2025-02-07
That's cool. Though isn't the libc allocator just b tier? Like the good allocators are like mimalloc and tcmalloc?
Not trying to downplay anything, huge for the zig default to at least match C (while also having options to detect for memory leaks and such). Just noting that there are a lot of allocators out there and I don't think libc has been state of the art for a long time.
Also, I'm really curious how it compares to the jdz zig allocator: https://github.com/joadnacer/jdz_allocator
Also, does this mean it is in time for 0.14.0?
Brendan Hansknecht said:
Correctly get the github ci filter working: https://github.com/roc-lang/roc/pull/7591
Or not...try 2: https://github.com/roc-lang/roc/pull/7592
Yay! only zig ci running: https://github.com/roc-lang/roc/pull/7590
Also above PR just adds a script to enable reproing fuzzing failures.
Brendan Hansknecht said:
Yay! only zig ci running: https://github.com/roc-lang/roc/pull/7590
I was testing it too :) https://github.com/roc-lang/roc/pull/7593
CI is now ~3 minutes... why is Windows so much slower than all the rest of CI?
To be fair 1m21s was for downloading to our cache for the Windows Zig install
But still
To be fair 1m21s was for downloading to our cache for the Windows Zig install
So on rerun, this should be skipped? let me test
Ok, a bit better
btw if you don't mind using https://github.com/mlugg/setup-zig instead of goto-bus-stop (which is now unmaintained) it will use mirrors and proper caching. should be faster for you and less bandwidth for ziglang.org
Yeah, we can definitely switch over
PR to switch: https://github.com/roc-lang/roc/pull/7594
Ok, I have the first draft of the parser able to parse AND format the canonical Hello World. I need to rebase it on the current structure and then probably make a few more changes to harmonize it with the rest of the project
The rest of the parser should be able to go _VERY_ fast from here
Exciting. I'm definitely interested in helping get a fuzzer setup once it lands
I'll have a PR up by tomorrow evening
By that I mean, if no one does it first, I'll add a parser to formatter fuzzer following the current format after your PR lands
I made a lot of mistakes and did a few redesigns. I also feel like I type like molasses with Zig, guess I'm just getting used to the syntax (and cursing every time I forget the . in front of an anonymous struct)
I know the feeling
But I'm very happy with this current design, even if there are some tools (i.e., helpers) that need to be built to make the actual parsing code more quickly scannable - but I didn't want to abstract too early in the game.
A good policy
Also, I'm going to be upfront - this initial version will have a LOT of panics in it (where problems should be reported, and unimplemented code paths). I do not intend to commit the PR that way, but I wanted to talk about how to handle problems effectively (and Malformed) before I replace them in the PR review.
any plans to do snapshot testing for the parser? I think sometimes people call them golden tests? basically just like debug print the ast to a file and compare against it on the next run
#compiler development > zig compiler - snapshot testing
@panic("TODO")
ftw
@Andrew Kelley This is the way
I'm going to put up my PR. But first, should I try linking it into main.zig at all (even in its very partially implemented state, and even though there is no further analysis being done)?
Fire away please! https://github.com/roc-lang/roc/pull/7597
And yes, I'm going to be looking at:
If you don't link it to main, make sure it at least passes zig fmt --check --ast-check
Oh, just realized that the tree is valid now. Can we merge https://github.com/roc-lang/roc/pull/7589 to protect it more from future breaks?
Done
I was waiting to see if anyone wanted to claim the fun one... but it seems to be still open for the taking. :smiley:
I'd like to take responsibility for the type-checker.
Me and my mate Claude have some reading to do on HM type inference... but we have a nice implementation to follow.
Hell yes, please take the hard work
Wait.. you said type-checking was the easy one
Talk to @Lucas Rosa, he was also interested because of his experience
It's not bad once you understand what's happening
I'd recommend reading through the current solve and unify crates in the Rust code
@Luke Boswell go for it, you have my support, just remember that constraint solving part, it's important for the quality of error messages :)
I also have access to OG lang PhDs that we can ask questions to, like Kent from chez scheme, also I could ask philip wadler questions probably, not sure how responsive he is but I share a slack with him.
just ping me and I can pretty much jump online to pair program about it at any time. the basics of HM type inference are straight forward and well understood for decades. definitely make sure you review solve and unify, those are the main show for this. a little bit of can reading wouldn't hurt either
Thank you.
I will be watching with keen interest Luke
And I can also tell you what NOT to do :rofl:
it's not unlike an interpreter, you walk the tree, assign type vars to things, primitives and annotations are like the terminal points from which everything resolves upwards
the only things you might not see in intro material is the extensible records stuff (basically row polymorphism if I'm not messing up terms)
and the tags which are like ocaml polymorphic variants (assuming that hasn't changed)
I also recently did the exhaustive matrix check thingy for matches so that's still fresh in my head
My plan is to approach it from both sides at once... spend roughly 50% of my time learning the theory, and the other following the rust impl.
let me find you a paper for exhaustiveness, I'm pretty sure it's where elm got its algo from, I lifted it from elm myself as well
I think this is it
https://www.cambridge.org/core/services/aop-cambridge-core/content/view/3165B75113781E2431E3856972940347/S0956796807006223a.pdf/warnings-for-pattern-matching.pdf
also available here in web form on some subdomain on INRIA's website
http://moscova.inria.fr/~maranget/papers/warn/index.html
here is the elm implementation, also a remarkable reference because the code is some of the cleanest haskell code I've ever seen
https://github.com/elm/compiler/blob/master/compiler/src/Nitpick/PatternMatches.hs
along with my very ugly rust port of the exact algo
https://github.com/aiken-lang/aiken/blob/main/crates/aiken-lang/src/tipo/exhaustive.rs
The existing Roc implementation works very well, I think we should basically try to transcribe it in Zig terms
It was also lifted from elm, so I assumed he would look at the roc impl but I wanted to link him more sources
of note, the "demanded" enum for records isn't necessary anymore due to optional record fields going away
ah cool, good to know
optional record fields going away
single tear. I'll miss default value function arguments
Yeah, what replaces that? Is there another way to do it?
I've used them extensively in plume for example
Terse builders
That thing i did for Weaver... let me grab a link
Ahk.. I remember now. The builder pattern
https://gist.github.com/smores56/dc7b37f73114df11d28cd6a148987dea#file-weaver-builders-roc
I noticed that yesterday, pipe is replaced with .? not a bad idea at all
That's static dispatch
oh I see, I went back and actually paid attention to the proposal
There it is! Pass the value in front of the dot to that function as its first argument, and we're done. (If the Result module did not define a top-level mapErr, or if we couldn't have accessed it in this module because it wasn't exposed, we'd get a compile-time error.)
sorry, still catching up to the most recent design
"In this design, the API is literally all there is to consider—exactly as it should be!" - some chad probably
"[-2, 0, 2].map(.abs().sub(1))" omg this is insane, didn't realize this could look so clean
I still think default valued args are super useful, but optionals and builder patterns are okish alternatives.
Yeah, I guess the hypothesis we are testing is can we get away without them. One more language feature we don't need to think about -- i.e. simplifies roc
yep
random other notes I just thought of (or really, Loris reminded me of).
Debug assertions are our friends. Please feel free to add std.debug.assert to your code. They will all be ripped out in the final release binaries, but they are great anchors for fuzzing and for testing in general.
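e.g. a small sketch of the kind of invariant anchor meant here (names made up):
const std = @import("std");

// In Debug (and fuzzing) builds this traps on violation; in ReleaseFast
// and ReleaseSmall builds it compiles away entirely.
fn tokenAt(tokens: []const u8, idx: usize) u8 {
    std.debug.assert(idx < tokens.len); // fuzzing will trip this on bad indices
    return tokens[idx];
}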
https://github.com/kristoff-it/zig-lsp-kit
I figure most saw this already, but just in case. this could be useful
Adding llvm feels so painful. RIP binary size (will be great when we add an interpreter only config).
Debug: 840K -> 114M
ReleaseFast strip: 72k -> 9M
This is all without actually using any llvm. This is simply linking to a c++ file and statically compiled llvm, not actually using anything yet.
A necessary cost, it seems
I'm really surprised that release fast with strip grows by so much when we literally don't use any of it. That said, it will grow by much more than that once we start using things.
but yeah, statically compiled release builds of llvm and lld together are 201MB. So if we reference absolutely everything our compiler would be that big.
Which is why zig is like 270MB in size (note, zig also packages clang which we won't)
It would be really nice to go the same route as zig and take LLVM out of the compiler binary. I know we want everything to be a single binary, but could we instead have some tooling in the cli which makes it easy to install LLVM or setup the right thing if it's not already available in the system.
Do you mean the same route as cargo?
Just spitballing here -- I'm thinking it might be good to map out the intended use cases or some scenarios of people using roc and figure out when we think we need a fully statically linked thing that includes LLVM, and are there other ways we could make that seamless.
Yeah, theoretically we could emit a .bc file or run the interpreter (super slim). Then we could orchestrate llvm to actually compile. On first run, it would just download our llvm compiler to the cache.
That said, I think for most users, this just means instead of downloading llvm now, they download it in a few days when they make an optimized build
So not really that much of a gain in my mind.
Just hidden on first compilation
It also means upgrading roc... may use the same LLVM compiler and switching between roc versions may not need to download it again -- just use the one we saved in the roc cache
Oh, yeah, that is fair.
less often that you need to update the llvm bundle potentially
Roc compiler without LLVM bundled might be a couple of MBs (maybe even smaller?)... so it would be almost free to have every version in the roc cache.
In fact, assuming we generate an old version of bitcode, you potentially could lazily update much later. Like just use llvm 18 for a few years then jump to llvm 22.
Unless there is a significant performance gain from upgrading to 22 (for example) which we can easily measure... then we may not even need to.
I have zero anything to base that on though
Long term this is definitely something we should explore.
I was thinking short-medium term... it would help us with the development process to be able to manage multiple versions seamlessly -- the roc cli upgrades and downgrades itself
I think simply generating bitcode files already will give us a lot of flexibility. Still will depend on a few c apis from llvm, but they are the much more stable high level apis.
What is the size impact if we only have LLD linked?
should be around 10MB
Maybe less
Seems like bundling LLVM as well will be easier to get right for now. Would we be okay with starting with that and then pulling LLVM out later?
I think we should prioritize features that get us a working compiler first, and a fast one later
But if it's gonna be a fun project to get this working, then by all means
Also, to begin with, for the interpreter path, we will still need llvm.
Cause we need to generate the shims
probably could manually generate those at some point if we want.
Could the shim be prebuilt?
hmm... possibly could be. I guess it is static to the platform. So the platform could provide an extra shim object file per platform.
Like roc serialises ResolveIR, the interpreter shim is prebuilt and parses that before it starts doing its interpreter thing...
But I guess this doesn't work because it needs to know what shape to satisfy the platform host's API.
It's worth trying.
Oh yeah.... that was the idea -- the platform provides a prebuilt interpreter
I remember we discussed that
Not a prebuilt interpreter, just a prebuilt shim object file
Hmm, though the platform doesn't know the file locations of the final app, which is needed for the shim to know what to load.
Cause the shim essentially needs to provide the interpreter with the app main.roc path, the name of the entrypoint being called, a return pointer, and a list of argument pointers. The interpreter can then learn the types of each pointer by reading the roc source code.
Anyway, I plan to get the interpreter working with shim and such before starting the llvm backend. So that will give us more concrete ideas here.
For now, I am just adding the wiring for llvm, I will leave it commented out in the build script to avoid the giant binaries until we have an llvm backend.
we should consider having the llvm backend only in the build if you enable a flag
bc the majority of compiler development won't need it and will be faster without it
and then we could instead build in something which just panics every time you call it saying "this compiler wasn't built with llvm, please rebuild with that env var set to use this feature"
the majority of compiler development won't need it
I wonder how well we can work on backend passes without llvm
We just have to validate their output
We can do that for at least simple cases with unit testing/fuzzing
But yes, the LLVM codegen side will be trickier
I mean theoretically we will even be able to generate .bc files without llvm
So it really is just for the final exe
Oh, I thought by "I wonder how well" you were implying "I'm not confident we can do a good job"
I agree
I guess we can run all snapshot tests without llvm, so that is a nice anchor.
like we can generate refcounting ir and lower ir.
Roc successfully cross compiling to all major targets with llvm (as static as possible): https://github.com/roc-lang/roc/actions/runs/13321631348/job/37207263318?pr=7603
yoooooooooooo
for what it's worth, zig built in ReleaseSmall mode without llvm is 12M
and I think it's interesting to note that includes a full x86 backend, C backend, llvm (bitcode) backend, wasm backend, riscv64 backend, elf linker, coff linker, wasm linker. the only thing it lacks compared to llvm is a few more targets and optimization passes
4 messages were moved from this topic to #compiler development > zig compiler - fuzzing by Brendan Hansknecht.
Does anyone have any recommendations for debug printing with zig?
I want to debug print in tests.. but only if the test fails
std.log.debug and std.log.info do not show up in zig test output
did you set the log level?
pub const std_options: std.Options = .{
    .log_level = .debug,
};
Is that in the module I'm testing or the tests.zig file?
I think once in test.zig should work. It should be a global flag to my understanding
Otherwise, maybe just std.debug.print after checking for failure?
I'm not really sure best practice here
Yeah, me either. I've been researching and trying different things. But haven't found a reasonable solution yet.
Basically... I have all these test scenarios for unification etc... and they can print out useful debug info for each step. I only want to see that for the tests that fail though.
Maybe I should be thinking about snapshots at this point instead though
Hey... I got something working. Builds a snapshot test file using debug prints.
Just in case you didn't already read this: https://kristoff.it/blog/dead-simple-snapshot-testing/
Would it be a bad idea to give a Type Variable a comptime optional "name" to help with debug printing?
I'd like to have pretty greek letters for my snapshot tests instead of having random integers everywhere
Why not convert integers to characters on print for the snapshot test?
Ooh, that's a nicer idea
I've almost got something working.. I might just keep going with this thought before trying that
I went ahead and rebased @Anthony Bullard 's parser PR and fixed some bugs: https://github.com/roc-lang/roc/pull/7609
(would like to get that landed soon so we can iterate on it)
I'm happy to approve to unblock after the tests are fixed
Very puzzled at the current failure
Is it possible that CI box has run out of disk space or something?
That is what it looks like, which is pretty confusing cause it is a GitHub runner. Should have a blank slate every time
We use our own runners, right?
Not here I don't think. I think these are vanilla GitHub runners
But maybe that is wrong
@Joshua Warner can you rebase on latest main and run again. I just merged something that changed test steps a bit. Not saying it will fix anything, but might give more info.
No change, looks like
Really confusing that cross compiling with -Dllvm works, but somehow regular compiling does not....
I guess it could be related to -Dllvm combined with -Dfuzz for some reason.
Removing -Dllvm seems to work
Maybe it'd be better to remove -Dfuzz?
nah, -Dllvm doesn't actually do anything useful currently
Still very puzzled as to what's going on
Could be out of disk space, but should have 14GB, so a bit surprising
llvm is only like 300MB or so
even if it gets duplicated for every executable, it should be nowhere near 14GB
Also, why would it break on this PR specifically...hmm
Hmm; added a call to df -h and of course now it works
You still don't have -Dllvm
Oh whoops
Probably not disk space:
/dev/root 72G 46G 26G 64% /
is it only failing on ubuntu-24.04?
Yep :/
Can we try ubuntu-22.04 just to see?
I have no immediate ideas, probably would need to pull up a linux machine and test.
It works on ubuntu-22.04 :shrug:
Send it and worry about it later.
Might be some weird GitHub CI bug
Luke Boswell said:
I want to debug print in tests.. but only if the test fails
oh yeah I've been meaning to make what you want be the default thing
tracking issue: https://github.com/ziglang/zig/issues/5738
A message was moved from this topic to #compiler development > zig compiler - builtins by Luke Boswell.
Good watch on some zig patterns like comptime interfaces. Discussed a handful of useful ideas: https://youtu.be/l_qY2p0OH9A?t=1920&si=gzHTwuGTYRWoDQlS
Implemented some functionality in the old formatter to support migrating to the new braces syntax: https://github.com/roc-lang/roc/pull/7619
There are almost certainly bugs hiding in the formatting here, since I'm not really doing any verification on it. I'd like to assert that the new parser parses all of these inputs (or we add more "NotSupported" errors as appropriate).
@Anthony Bullard that PR also contains *.migrated.roc files, which should hopefully serve as useful test inputs for the new parser.
I wanted to understand the plan to wire up the compiler stages. I've put together this simplified diagram that I think captures some of the key features - with the help of Sam.
Roc Compiler Stages - Environments.pdf
Some key points to summarise
I thought we don't have cyclic dependencies
So no module sets?
Though maybe type variables or static dispatch change this?
We didn't used to have them, but we'd like to see if we can make cyclic dependencies work
The reason is because you might have two custom types that depend on each other (we expect this for DB constructs) and they both need to define a to_str
Oh, interesting. Are there restrictions?
They can't do that in the same file
Currently, the plan is to have them give a warning about compilation speed if it's not necessary
And "necessary" means the modules have custom types that depend on each other
yeah, basically the idea is that if you have two nominal tag unions that are mutually recursive, and you also want them to have static dispatch (which could come up when modeling database tables, for example) that's currently impossible
so if we do this, then all the modules in the cycle basically get type-checked as one big unit
which is worse for both concurrency and caching granularity
so it's kind of a perf footgun unless you're specifically using it to resolve the thing that's currently impossible
e.g. if you're not careful, one wrong import can accidentally make your entire project into one big cycle that can't be parallelized at all anymore
so the warning would be basically "you have a cycle involving modules that don't expose mutually recursive types, so you should break it up!"
that way you get equivalent perf to today
because the only cycles are between modules that have mutually recursive types, which is also true today (where you have to just literally put them in the same module instead)
as a bonus, it's also nice for "always report, never block"
in that if you find yourself in a spot where you import something and it causes a cycle, you aren't blocked - you can do that, and still run your program
and maybe confirm whether you want to keep that architecture before cleaning up the cycle later
What if you end up being unsure how to break up the cycle and want to just leave it be. Will you be stuck with roc reporting failures in CI?
That's the current plan. You can always remove it by putting the cycle in a single file, but even more, since the cycle will be immediately obvious on creation, this will be an annoyance that shows up early in at least most cases
I'd rather we not allow cyclic imports to prevent this from even being an issue, but the aforementioned custom types scenario doesn't seem to have a better solution
Yeah, I just find it strange partially allowing it
Sounds painful potentially
btw can I get Roc compiler developers' opinions about this? https://github.com/ziglang/zig/pull/22137
do you prefer status quo names or the names proposed in this PR?
I think the old wording is definitely clearer. I don't have to guess at all. That said, you only have to learn once, so the new wording is fine overall.
I wish ensureReserved worked, I think it's nicer than reserveUnused. I guess appendReserved works.
Not familiar with the zig std, so I don't know if there are inconsistencies, but the original wording communicates intent better. To me, even appendAssumeCapacity is cleaner than appendReserved, though not by much.
Thanks!
@Joshua Warner @Anthony Bullard
Could you please make a single unit test or something that produces a minimal parse.IR. Like are we at the point where the IR can represent this?
module [name]
name = "Luke"
This would be really helpful for working on Can. We're just not sure how to work with the ParseIR rn.
Just the output -- hardcoded is ok if we don't have the parser implementation this far yet.
I'm trying to make something like this... but definitely not right
test "Example Can IR" {
// Imagine we received a parse.IR representing the following roc module
const source =
\\module [name]
\\
\\name = "Luke"
;
const parse_ir = parse.IR{
.source = source,
.tokens = parse.TokenizedBuffer.init(std.testing.allocator),
.store = parse.NodeStore.init(std.testing.allocator),
.errors = .{},
};
var can_ir = IR.init(std.testing.allocator);
// We called "canonicalize" and
canonicalize(&can_ir, &parse_ir, std.testing.allocator);
}
I think it'd be nice to have these fields added to the parse.IR:
header: NodeStore.Header
statements: std.ArrayList(NodeStore.Statement)
Or something like those, as that would unblock canonicalization
For now, I'm just pretending those exist and using dummy values
something I always wished we had in the Rust code base was having every single test suite start with normal Roc source code as the input
I think to do that, we'd need test helpers for each step that build on the previous step's helper
I presume you're thinking of something like what @Agus Zubiaga started setting up for the test_compile crate: https://github.com/roc-lang/roc/blob/26f9416929aa0cd52ca732fc533b4a94a690de04/crates/test_compile/src/help_constrain.rs#L29
Richard Feldman said:
something I always wished we had in the Rust code base was having every single test suite start with normal Roc source code as the input
I would like this.
Im currently just trying to get my head around the IRs in a really minimal sense. Not suggesting we make unit tests like this.
Richard Feldman said:
something I always wished we had in the Rust code base was having every single test suite start with normal Roc source code as the input
I think many tests can be that way, but it also can make things more brittle to changes higher up the stack. Also can make it harder to write some of the low level optimization steps.
That said..I think that is what the super snapshot test framework is for
I really think we need to get the base of snapshots working with parse and can. Then slowly work it down the stack
Brendan Hansknecht said:
Richard Feldman said:
something I always wished we had in the Rust code base was having every single test suite start with normal Roc source code as the input
I think many tests can be that way, but it also can make things more brittle to changes higher up the stack.
I agree, but I think it's worth it :big_smile:
like I spent a lot of time in tests trying to build IRs from scratch and then taking even more time to try to figure out if they were correct
also I know there were tests I wouldn't write just because setting up the IRs was too tricky, but starting from scratch it wouldn't have been
so I think the brittleness to upstream changes is worth it
@Sam Mohr the reason I'd like not to just directly have a header/statements is I want this to be able to directly parse individual expressions both for testing and for repl evaluation
Richard Feldman said:
like I spent a lot of time in tests trying to build IRs from scratch
Yeah, this is a fundamental problem
I assume if every ir can be printed and parsed, it will reduce this pain a lot
I'm not saying all tests should start from source code btw, but rather that each stage should have at least some tests that are that way
Yeah, I think we want a comprehensive library of snapshot tests and they should run through every single ir
and that means we're set up to do them, and can reach for that whenever we want
100%
For the immediate short term, while we figure out what is and isn't in each IR / Env -- I'm hoping we can make a couple of simple hardcoded examples for the purpose of seeing what the IR actually looks like and how we might use it.
I'm just having trouble piecing everything together and getting familiar with the representations like SoA etc
Before we have spent much time on implementing things, we might pick up on fundamental design/arch issues, and it's easier to change course now.
Like are we allowing cyclic imports or not... that seems kind of helpful to have a rough plan for now. <-- just an example of a discussion that has spun out of recent efforts to build out the API
Richard Feldman said:
yeah, basically the idea is that if you have two nominal tag unions that are mutually recursive, and you also want them to have static dispatch (which could come up when modeling database tables, for example) that's currently impossible
so I thought about this some more, and I think it's better if we continue to disallow cyclic imports and recommend this workaround if anyone actually needs mutually recursive nominal types with static dispatch:
- put the two mutually recursive types (A and B) in the same module, let's call it AandB.roc
- make A.roc and B.roc modules, each of which imports AandB.roc only
- A.roc and B.roc each expose a nominal type which wraps the appropriate mutually recursive type inside AandB.roc, and exposes all the desired methods on those
- A.roc and B.roc can each provide a to_inner function which returns the underlying nominal type from AandB. Static dispatch won't be available on this structure, but that's okay because the wrapper does.
the tradeoffs of this compared to allowing cyclic imports and doing a warning:
So if you want to be able to call some external use_special_method : a -> Str where a.special_method() -> Str on A or B, you can't?
I'm not sure how bad that limitation is
Especially for the auto-genned methods like to_str
and encode
I probably should have chosen better names
Yeah, they're tripping me up :sweat_smile:
let's say A.roc exposes ExternalA and that's a wrapper around InternalA from AandB.roc
so InternalA is mutually recursive with InternalB
InternalA and InternalB are both in AandB.roc
and ExternalA wraps InternalA and imports AandB.roc, but that's it
so you can do static dispatch on ExternalA (which just operates on its wrapped InternalA for you)
and then you can call to_inner : ExternalA -> InternalA if you want to pattern match on it
(or maybe some things would make ExternalA opaque and not expose a to_inner - that's fine too)
Yeah, okay, that's a fine limitation
so you can't dispatch directly on InternalA, but that's fine because you can static dispatch on the ExternalA wrapper around it instead
yeah, it's definitely an ergonomic downside
but you can still do everything
As long as its only ergonomics
We force the same tradeoff elsewhere, e.g. unicode stuff
It's good that Roc is opinionated
yeah I mean if you have a bunch of these that are all mutually recursive, you could end up with some gigantic QueriesInner.roc or whatever
but what I realized is that if we allow cyclic imports, the compiler still ends up effectively dealing with that file - except it's even more work for the compiler, because first it has to staple together several cyclically imported modules into one gigantic module
Roc is a high-level language because we want to do stuff for you automatically you'd do anyway, right?
so on the one hand, arguably the ergonomics are better for the programmer because the modules are smaller, but on the other hand, there's something I do appreciate about like "hey these things are all tangled together into one big chunk" not being hidden from the programmer
So I'm more convinced by the "make it awkward to force better organization" argument
well, it really comes down to whether there are use cases where a ton of mutually recursive types is actually the best way to write the code
Do we allow any recursion at all outside of custom unions?
I am somewhat skeptical that those use cases really exist, but I also don't think I've really explored the space of representing the outputs of database queries that involve joins, nested queries, etc. using Roc's type system
Sam Mohr said:
Do we allow any recursion at all outside of custom unions?
not anymore
What about Tree a : [Leaf a, Cons (List (Tree a))]
although we do want to allow it within lists
yeah we never supported that, but always should have :sweat_smile:
lol yep
same with sets and dictionaries
Anything list-backed, yep (or Zig-list backed, distinct from written in a Roc-native list)
we used to support it in structural tag unions, but decided to stop because of https://github.com/roc-lang/rfcs/pull/1
So then, I'm gonna finish up the coordinate.zig work I was doing to implement this, but get rid of ModuleSet and move towards sequential module ID assignment for post-cache work. Sound good?
This doesn't feel like we need a big discussion
sounds good!
awesome
I'm really glad we've gone this direction.
I still really think we should support this without requiring custom tags if possible:
Tree a : { data: a, children: List (Tree a) }
Just recursion through list. Cause that covers dict and set as well
That seems to be what we are agreeing on above
Oh, you still wrapped it in a tag
Oh, good point
Hmm
I want it completely tag free
Should be fine
List always breaks recursions.
Yep
Luke Boswell said:
Joshua Warner Anthony Bullard
Could you please make a single unit test or something that produces a minimal parse.IR. Like are we at the point where the IR can represent this?
module [name]
name = "Luke"
This would be really helpful for working on Can. We're just not sure how to work with the ParseIR rn.
If you need this right now, I can add parsing a plain module header to my PR
Everything else should work
@Sam Mohr header and statements do exist. If you get a file, it has this shape:
pub const File = struct {
    header: HeaderIdx,
    statements: []const StatementIdx,
    region: Region,
};
You just need to run ast.store.getFile() to get it
Another passing note, I am at the point where I want to start lazily creating the Parse IR display format, because I think it'll be useful for debugging for me
So I get:
(file (app (main!) "pf" ".../platform.roc" ()) (
(import "Stdout" "pf")
(decl (ident "main!") (
(expr (apply (ident "line!" "Stdout") ((string "Hello, world!")))))))
Maybe with the region as a (START END) sexpr after the tag.
@Anthony Bullard We were discussing this in #compiler development > zig compiler - IR serde . The relevant part of that topic is that all IRs would have a way to get turned into S-expression nodes, which then would be serializable to string. Similarly (but less relevant to you), we would have 1 S-expression parser that would take in a string and turn that into a node, which then could be translated into any of the IRs. That way the pretty printing and parsing could be in 1 place.
Okay, we can start running with that for canonicalization. The remaining thing is alignment of Region structs. Maybe we can just use the parse.IR.Region (source) everywhere and expect that to update when necessary? @Anthony Bullard would you think that a bad idea?
I think it's actually worth trying to use a parser node id as a region
That's a single u32, which is nice
What if a region is for a function body?
Then use the node that corresponds specifically to the body :)
Okay, yeah
I think that could work
All we'd need to do is ensure that all diagnostics we'd want to show would be traceable to such a parser node
But for the same source code and the same Roc compiler version, that's free
I'd love to make an alias so that we aren't just passing around Parser.NodeStore.Idx or whatever the actual thing is called
Not sure what to call it
maybe
pub const ParseRegion = struct {
    node_id: Parser.NodeStore.Idx,
};
Joshua Warner said:
I think it's actually worth trying to use a parser node id as a region
Then for caching, we have to serialize both the parse and can ir?
My thinking is to not write that out, but rather re-generate it on demand if we ever need to emit errors for that file.
(I'm a little unsure of what this means about things like debug info, where resolving line:col info is in the hot path)
I kinda want to say we should not emit traditional debug info unless asked, and by default we should have some faster-to-generate thing.
I think it is reasonable to say that optimized llvm builds (which likely will be the default llvm builds) don't emit debug info or emit very limited debug info.
yeah the interpreter seems like it would reduce demand for debuginfo a lot
Of course need to be able to opt in for llvm optimized builds with full debug info
eventually yeah, but not a hard requirement for 0.1.0 I don't think
And I think dev llvm builds should have full debug info (but the interpreter will make those rarer)
Richard Feldman said:
eventually yeah, but not a hard requirement for 0.1.0 I don't think
I think I might try to add in full debug info on first build of the llvm backend. I think it is the kind of work that can be painful to add later.
awesome!
Sorry, why a single parser/writer for all IR display formats? I don't know how well those will work together....
But I'm happy to see it if it does!
My (perhaps unfounded) assumption is that there will be some useful commonality to extract there, but I could be wrong.
In particular I was thinking about things like pretty-printing the s-expr, which if you want to have a reasonably dense format that doesn't take up a bunch of vertical space, is non-trivial
I'm ambivalent as to whether that takes the form of a bunch of utilities that are used adhoc by different IRs, or a system where you convert the IR to/from some s-expr nodes which are then processed
I've made a zig library to help parse and generate S-expression. Just wanted to let people know that I've started on this. Haven't wired it up with any of our actual IR's yet.
@Norbert Hajagos and I will polish it later today and I'll make a PR to share it.
Here's a snippet of what I've got so far
/// Represents a token in an S-expression.
///
/// This type is comptime generic over two types: `T` for identifiers and `V` for values.
pub fn Token(comptime T: type, comptime V: type) type {
    return union(enum) {
        ident: T,
        value: V,
        lparen,
        rparen,
    };
}
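To make that concrete, here's a hypothetical instantiation (the u32 and Literal types are stand-ins, not the real API):
// A u32 interned-identifier index for T, a small literal union for V.
const Literal = union(enum) { int: i64, string: []const u8 };
const MyToken = Token(u32, Literal);

// Tokens for "(foo 1)" might then look like:
const toks = [_]MyToken{
    .lparen,
    .{ .ident = 42 }, // interned id for "foo"
    .{ .value = .{ .int = 1 } },
    .rparen,
};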
Do we want to use an arg parsing library for the CLI or roll it ourselves? I remember hearing talk of wanting to use very few third party libraries.
roll it ourselves!
as I believe @Andrew Kelley has said, "just write a parser for cli args"
We currently have one I wrote that is super simple
https://github.com/roc-lang/roc/blob/7fc2a08e2811fed7207ab5035f680bbf697d232f/src/cli.zig#L54
Probably could do with some love though
Okay sweet. I'm going to take a look at wiring up the formatter in the CLI and wanted to confirm the approach
Thank you Isaac! I'd love to get that so I can start writing files straight-up and running the formatter on it
Some string tokenizing + parsing updates: https://github.com/roc-lang/roc/pull/7632
Type Annotation and Declaration parsing and formatting: https://github.com/roc-lang/roc/pull/7633
I got most of Type.Store (the new Subs) working today. I'm gonna work on unifying primitives tomorrow!
haven't disappeared, have been reading almost every message. just letting people cook, lots of chefs already :D
honestly, the speed at which this is moving needs to be studied, extremely impressive. y'all are making what seems like a gargantuan effort look like a weekend project
Definitely moving well, but still tons to do. That said, I do agree that it might be hard to fully parallelize to more folks right now. Mostly things at the top of the stack are moving right now.
yea it's all good, I'll sneak something in eventually, I'm not here for personal glory :D
I'm trying to update my SExpr PR so I can land what I have, and am running into some issues with memory leaks/issues in the formatting tests.
One somewhat related question... should the SafeMultiList hold onto the allocator? why do we do that?
pub fn SafeMultiList(comptime T: type) type {
    return struct {
        items: std.MultiArrayList(T),
        allocator: std.mem.Allocator,
        // ...
I'm looking at this error trace from a test and it points to the store.nodes.deinit() as the source of the problem.
thread 415308 panic: integer overflow
/opt/homebrew/Cellar/zig/0.13.0/lib/zig/std/multi_array_list.zig:540:31: 0x103032c37 in capacityInBytes (test)
return elem_bytes * capacity;
^
/opt/homebrew/Cellar/zig/0.13.0/lib/zig/std/multi_array_list.zig:544:49: 0x103032c7b in allocatedBytes (test)
return self.bytes[0..capacityInBytes(self.capacity)];
^
/opt/homebrew/Cellar/zig/0.13.0/lib/zig/std/multi_array_list.zig:177:41: 0x10303122f in deinit (test)
gpa.free(self.allocatedBytes());
^
/Users/luke/Documents/GitHub/roc/src/collections/safe_list.zig:132:30: 0x102ff531b in deinit (test)
self.items.deinit(self.allocator);
^
/Users/luke/Documents/GitHub/roc/src/check/parse/IR.zig:475:27: 0x102fa3ea3 in deinit (test)
store.nodes.deinit();
My suspicion is that maybe we are passing it a different allocator somehow... and so it's trying to free memory that isn't there or something.
If anyone would like to take a look I pushed a commit for the above error https://github.com/roc-lang/roc/pull/7629/commits/6f641a7ee9a3e1c850643c44ef988774b62babc6
Luke Boswell said:
My suspicion is that maybe we are passing it a different allocator somehow... and so it's trying to free memory that isn't there or something.
The error is an overflow when calculating the capacity. That should be before any sort of allocator interactions. Probably would be good to print out the capacity and element size in bytes before that call to see what they are.
Luke Boswell said:
One somewhat related question... should the SafeMultiList hold onto the allocator? why do we do that?
pub fn SafeMultiList(comptime T: type) type { return struct { items: std.MultiArrayList(T), allocator: std.mem.Allocator,
MultiArrayList takes the allocator as a parameter on each call that allocates, so we shouldn't need to store it either.
One possibility is that we deallocate out of order and that leads to reading a garbage capacity from freed memory. That or the equivalent but via a stack allocation
I should be able to pull this later today and take a deeper look
If you could that would be helpful. I'm very lost staring at these errors in the zig stdlib
I feel like building the SExpr I'm bumping into issues that we're not aware of just because I'm wiring things up for the first time.
@Luke Boswell I think I have everything you need in PR comments
me trying to resist making SExpr-based jokes in this channel
Luke Boswell said:
I'm looking at this error trace from a test and it points to the
store.nodes.deinit()
as the source of the problem.thread 415308 panic: integer overflow /opt/homebrew/Cellar/zig/0.13.0/lib/zig/std/multi_array_list.zig:540:31: 0x103032c37 in capacityInBytes (test) return elem_bytes * capacity; ^
If this happens in debug mode, it's often because one of the operands is undefined. In zig, when you assign something to undefined (for example by freeing memory), the bytes are memset to 0xaa. This has some nice properties, including the fact that if you multiply by an integer even as small as 2, you get overflow.
in other words, I would expect that stack trace if you called safe_list deinit() twice
fear not, for this is checked illegal behavior, which is deterministic and straightforward to debug
Yeah, was great for catching the double free. Just not the clearest error message
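For anyone hitting this later, a minimal repro of that failure mode (assuming I've understood it right):
const std = @import("std");

test "double deinit is checked illegal behavior" {
    var list = std.MultiArrayList(struct { x: u32 }){};
    try list.append(std.testing.allocator, .{ .x = 1 });
    list.deinit(std.testing.allocator);
    // deinit sets the struct to undefined, which Debug builds memset to
    // 0xaa; a second deinit would read a garbage capacity and panic with
    // "integer overflow" in capacityInBytes, exactly as in the trace above:
    // list.deinit(std.testing.allocator);
}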
What are general thoughts on always using Unmanaged data structures?
Unmanaged just means that they do not store a pointer to an allocator. Instead, the allocator must be passed in for all functions that might allocate/deallocate.
Fundamentally, I don't think it is a big change. Just a minor api change. It likely isn't too important of a change, but avoids storing lots of extra copies of pointers to the allocator. Instead we just store one copy of the pointer to the allocator and pass it down the stack.
Seems to fit nicely with how we are designing datastructures.
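Concretely, the difference is just where the allocator lives (a sketch against the 0.13/0.14 stdlib):
const std = @import("std");

test "managed vs unmanaged API" {
    const gpa = std.testing.allocator;

    // Managed: the container stores a copy of the allocator.
    var managed = std.ArrayList(u32).init(gpa);
    defer managed.deinit();
    try managed.append(1);

    // Unmanaged: no stored allocator; every allocating call takes one,
    // so nested data structures don't carry N redundant allocator copies.
    var unmanaged = std.ArrayListUnmanaged(u32){};
    defer unmanaged.deinit(gpa);
    try unmanaged.append(gpa, 2);
}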
@Brendan Hansknecht I think I've got all the coordinate code set up in this PR, more or less: https://github.com/roc-lang/roc/pull/7625
question for you on code structure: I was trying to figure out how best to pass around ownership of the different stage IRs and I realized that they are all in (Multi)ArrayLists anyway, so I changed from having each IR get init'ed and returned to each IR being created in a MultiArrayList as undefined, and then their init functions get a pointer to that undefined IR that they init in-place (source and source). This seemed like a simple way to keep the reference depth to one, but maybe there's a better pattern. What do you think about this strategy? Am I not explaining this well enough?
I'm not sure I quite follow. What is the advantage of
items.appendAssumeCapacity(.{
    .package_idx = can_irs.getPackageIdx(work_idx),
    .module_idx = can_irs.getModuleIdx(work_idx),
    .work = undefined,
});
init_work_with_env(&items.items(.work)[index], &can_irs.getWork(work_idx).env, gpa);
over (or whatever the equivalent would be):
items.appendAssumeCapacity(.{
    .package_idx = can_irs.getPackageIdx(work_idx),
    .module_idx = can_irs.getModuleIdx(work_idx),
    .work = Work.init_with_env(&can_irs.getWork(work_idx).env, gpa),
});
Because the second one is pass by value, so more copying is happening
You're talking about the returned value from Work.init_with_env, which is being placed into the larger struct as the .work field?
Yes
Not sure if that's a big cost, or if LLVM will optimize that away
I wouldn't worry about that. Should be optimized away by llvm. Also, should not be anywhere near the hot path.
large returned structs by default are turned into pointer args
Great, that was the most underlying question
Question re https://github.com/roc-lang/roc/pull/7664#discussion_r1980864364 @Anthony Bullard and @Joshua Warner
I was wanting to clarify, should I be slicing into the source bytes or getting this information from the interner?
Is the plan for the untagged union to go away eventually and everything will be interned?
I haven't really paid attention to the interning progress we've made, I know that I'm using IR.resolve in the formatter
The untagged union won’t go away - eg for things that we definitely don’t need to intern (braces, symbols, etc)
My intent is that strings/number/etc will all be interned
I assume if we have any sort of parse errors, we should not format the ast, right? We should print the parse errors and early exit.
I thought the plan was to do a best effort, and generate a compiler error.
Are you specifically talking about the cli formatter?
Yeah, cli specifically
Ahk, that makes sense then.
The only thing I can think of is maybe there is something like an LSP that would want different behaviour. But that's a different tool.
Eventually I want the formatter to format everything _but_ the part of the input that has the error (e.g. maybe the nearest outer statement/decl) - and then copy the source text from the input verbatim, for the section with the error - the only possible difference being indenting or dedenting the entire block of text, if appropriate.
... but for now I think we should just make the formatter bail out on parser OR tokenizer errors.
That makes sense and will be cool when it works
I think we need to keep the source input around throughout all stages that can report errors in the source
At some point we could manifest the errors if we need to.
Note to all, on latest main, the zig version is now 0.14.0
If you have an issue compiling after updating zig, you may need to delete some caches (.zig-cache and/or ~/.cache/zig/, maybe also zig-out).
Missed this on my first read of the zig updates, but zig plans to make the Unmanaged containers their default containers. So in zig 0.15.0, std.ArrayList will be what was previously std.ArrayListUnmanaged. So I guess switching over to the unmanaged variants makes even more sense.
Embracing "Unmanaged"-Style Containers
Just wanted to highlight something with our Parser diagnostics. See https://github.com/roc-lang/roc/pull/7672#discussion_r1986224249
I'm thinking we probably want the Parser problems to be pushed into the ModuleEnv so they outlive the parsing stage of the compiler, and later stages that see a malformed node can still reference those errors.
What do people think?
That was always the plan in my eyes, but I do remember the Problem.Parse variant being removed by someone else
So there may be a reason I don't know about why we don't want them in ModuleEnv
But I don't know it
Yeah, I suspect we just haven't ever gotten that far before. Now that we've got more of the coordinate and other stages set up a bit, we can find and sort out these things.
Sounds good to me. Avoids manually managing the lifetime
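A sketch of the shape being proposed here (all names hypothetical):
const std = @import("std");

// Diagnostics live on the ModuleEnv, which outlives individual stages,
// so later stages that encounter a malformed node can still reference
// the original parse problem.
const Problem = union(enum) {
    parse: struct { region: u32, tag: []const u8 },
    // later stages would append their own variants here
};

const ModuleEnv = struct {
    problems: std.ArrayListUnmanaged(Problem) = .{},

    fn deinit(self: *ModuleEnv, gpa: std.mem.Allocator) void {
        self.problems.deinit(gpa);
    }
};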
I'm happy to attempt this change. Though I'll wait for when @Anthony Bullard is next online as he may have ideas or want to work on this.
Where did we land on || -- is that meant to be parsed as an or, or are we only accepting the keyword?
I'm just working on an ambiguous fuzz failure and this is the current problem.
I'm pretty sure both || and && should parse and then format to or and and
This is the fuzz issue
~~~META
description=fuzz crash
~~~SOURCE
||1
That parses the first time as a lambda, then formats as || (without the space), then the second time around it parses as an or
Here's the latest snapshot output for that... including the tokens
~~~META
description=fuzz crash
~~~SOURCE
||1
~~~FORMATTED
|| 1
~~~TOKENS
OpBar,OpBar,Int,EndOfFile
~~~PARSE
(file
(malformed 'missing_header')
(lambda
(args)
(int '1')))
One "fix" I found that works, was to format the lambda with no args with a space, e.g. | |
ah yeah, there is ambiguity now. Not sure the plan on that
Would it look terrible if it formatted as |_|?
I think that would suggest an ignored arg
For the sake of moving past this crash, I think I'll format using a space for now. It's a hack but we can come back to it.
so the idea we settled on was that:
- the keywords are or and and
- as a matter of convenience, we always have the formatter rewrite && to and, and in the situations where you wrote || and it can unambiguously detect that or would have worked (but 0-arg lambda would not), then it can also rewrite that to or for you. But there was at least one situation where either could work, and so in those situations we have to assume you meant lambda (which is why we have to go with or as the keyword)
Ok, that makes sense
We are _not_ parsing || or && any longer. With the 0.1.0-line compiler we are hard moving to or and and
Richard Feldman said:
- as a matter of convenience, we always have the formatter rewrite && to and, and in the situations where you wrote || and it can unambiguously detect that or would have worked (but 0-arg lambda would not), then it can also rewrite that to or for you. But there was at least one situation where either could work, and so in those situations we have to assume you meant lambda (which is why we have to go with or as the keyword)
We can work to do this eventually, but I'm not going to focus on this now.
My biggest concern is landing my current PR, then finishing all header parsing, then figuring out and implementing where ... in type annotations, and then making malformed work well (the current situation has a number of issues)
Anthony Bullard said:
We are _not_ parsing || or && any longer. With the 0.1.0-line compiler we are hard moving to or and and
I don't think this is correct. I think the tokenizer unifies them.
So we likely need to delete support from the tokenizer if this is what we want.
If that is the case, then yes the tokenizer needs an update
I think || at least needs to be given its own special token (not unified with or at that level)
We need context from the parser in order to distinguish cases where || would mean or from uses as a no-args closure
The parser could match on two consecutive single-bar tokens instead of doing it in the tokenizer.
Though not sure the tradeoffs there
Like if in an expression where it could see and or or, if the parser sees two ampersand or two bar tokens, it could consider them and/or
I don’t want two bars with white space in between to ever accidentally be treated as an pipe || or. So I think this definitely needs some kind of token representation that’s unique from that.
I see. Yeah, forgot about whitespace
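Presumably the tokenizer's adjacency check would look something like this (token names made up):
const Tag = enum { OpBar, OpDoubleBar };

// Only bars with no whitespace between them become OpDoubleBar, so
// `| |` can never be accidentally treated as an or-like operator; the
// parser then decides from context whether OpDoubleBar means `or` or
// the start of a 0-arg lambda.
fn barToken(src: []const u8, i: usize) Tag {
    if (i + 1 < src.len and src[i + 1] == '|') return .OpDoubleBar;
    return .OpBar;
}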
Interesting reply to our roc post on switching to zig. Specifically about compile times:
https://x.com/zack_overflow/status/1899953945990357081?t=oaGhkuoKLPnMyAsvl_zUxg&s=19
I agree with your reply :big_smile:
I'll happily take "massive feedback loop improvements for large code bases are mid-development but haven't landed yet" over "there are no plans for massive feedback loop improvements at any point in the future"
Even worse, no possible plans, IMO
The WASM runtime for proc macros seems pretty promising
And parallelism was only recently introduced in nightly for pre-LLVM stages within the same crate
But Rust as a language isn't designed in a way that can be compiled quickly
This coming from one of Rust's biggest fanboys
I think y'all should be able to try out this workflow already (the thing the bun guy said they can't try out because of usingnamespace).
has anyone tried it? can you report success or trouble with it?
note: specifically the "no-bin" thing (read the linked section to see how to set it up)
Does no-bin have to be a flag? Currently we have a check build step, but zig seems to ignore it and still build everything.
might be a bit of an awkward build system API thing here to work around. it needs to pass -fno-emit-bin to the compiler, which I believe it does based on whether or not exe.getEmittedBin() is called, which is called by installArtifact
If I comment out all installArtifact calls and remove the test step, we get super fast compiles.
If I only comment out installArtifact but leave in the test step, for some reason every other build is super fast (presumably a bug of some sort, cause I change the file every time).
interesting. well you are welcome to file bugs against this functionality (codegen disabled). mlugg has been pretty diligent about fixing such bugs
Despite changing the same amount of code between every run, every other call fully rebuilds the tests and takes 2 seconds.
with test step
that's a good bug report assuming that you've managed to make the zig test command pass -fno-emit-bin
and assuming that gets fixed soon, that should be a ~25ms recompile cycle for you while working on a refactor. hope that gives you a sense of where things are headed :)
Is there any equivalent config to -fno-emit-bin for addTest followed by addRunArtifact?
From a quick skim of the options, I don't see anything available there
mm I think addTest is equivalent to addExecutable. so if you don't try to run the test, I think it will pass -fno-emit-bin. you can verify the CLI commands with --verbose
btw another thing I'm doing in this release cycle is separating out the build runner process from the application's configure script (build.zig), so that you don't have to wait for the growing number of build system features to compile every time you change your build script. this is relevant as the fuzzer UI becomes more sophisticated
Hmm, so yeah, for some reason every other save rebuilds the test binary. And it is missing -fno-emit-bin.
/Users/bren077s/vendor/zig-0.14.0/zig test -ODebug -Mroot=/Users/bren077s/Projects/roc/src/test.zig -lc --cache-dir /Users/bren077s/Projects/roc/.zig-cache --global-cache-dir /Users/bren077s/.cache/zig --name test --zig-lib-dir /Users/bren077s/vendor/zig-0.14.0/lib/ -fincremental --listen=-
That said, given I am not calling zig build test, I am a bit surprised this is happening at all.
I guess just because the test step exists which runs the test binary, anything that interacts with the test binary (even if it doesn't run the binary) will lead to the binary being generated. I made our check step depend on addTest, just to make sure all our tests compile.
I'm sure the build system API could be improved to make this better without you having to think so hard about it
Hmm, yeah, making 2 copies of the test step fixes the issue: one that is used for zig build test and actually runs the tests, and one that is used for zig build check but is never run.
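For anyone following along, the arrangement looks roughly like this in build.zig (paths and names approximate):
const std = @import("std");

pub fn build(b: *std.Build) void {
    // One test artifact is wired to `zig build test` and actually runs,
    // so it must emit a binary.
    const tests = b.addTest(.{ .root_source_file = b.path("src/test.zig") });
    const run_tests = b.addRunArtifact(tests);
    b.step("test", "Run all tests").dependOn(&run_tests.step);

    // A second, identical artifact backs `zig build check` but is never
    // run, so zig can pass -fno-emit-bin and skip codegen entirely for
    // fast error-only feedback.
    const check_tests = b.addTest(.{ .root_source_file = b.path("src/test.zig") });
    b.step("check", "Check that tests compile").dependOn(&check_tests.step);
}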
I mean, once the "codegen doesn't work yet" caveat is lifted, for instance, this will Just Work
but anyway, I hope you will find these workarounds worth it for the 0.14.x release of zig - they sure helped me out a ton when working on big things
Anyway, thanks for the tip, definitely will clean up our build.zig to make incremental builds work.
np! report bugs :)
one day we'll find all our own bugs with fuzzing but that day has not yet arrived
I've just checked out the new incremental stuff @Brendan Hansknecht landed for roc's zig compiler and configured ZLS too.
It's very fast. :firebird: :racecar: :rocket:
(edit) it's hard to find the right emoji to really convey the feeling
The "official" emoji for fast Zig stuff is :zap: (:zap:
)
Thank you Loris :zap:
ZLS does support --watch so you should also be able to enjoy basically the best of both worlds (in editor diagnostics, fast feedback) if you have the correct setup https://github.com/zigtools/zls/pull/2096
forgot to add: :zap:
What's the next thing that ought to be worked on at this point?
There are a few candidates
Moving to a keyboard
There are a few paths for us to take to get to an MVP
The first stage we have been talking about was functions and strings
_nods_
There were a couple of "more difficult to do after" steps I thought important to implement, like blocks and imports
But those aren't necessary
So if we want to finish up with imports, then implementing the basics of import resolution would be a good next step
That's part of can? Or is that in a later phase?
That's the stage directly after canonicalization
resolve_imports.zig, presumably
yep
In particular, can wants to treat imported data as "probably" present, and imported from a module with the given name. resolve_imports.zig would go through those imports (and also ingested files a la import "file.txt" as data : List(U8)) and match them to files in the filesystem or create an error
So it'll need to trigger parsing (and can) for those other files, right?
What's the mechanism for that?
That has already been implemented in the coordinate.zig foundational work
The coordinate.zig file looks at all files (Roc and non-Roc) in all referenced packages
And registers them in a big list
Then resolve_imports.zig would look in the *Package.Store and see if it can find a file with the right name for the imported module
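i.e. something roughly like this per import (every type and name here is a hypothetical stand-in):
const std = @import("std");

const Import = struct { module_name: []const u8 };
const Resolution = union(enum) { found: u32, module_not_found: []const u8 };

// Match an import recorded during can against the package's known module
// files, producing either a module index or a problem to report.
fn resolveImport(modules: *const std.StringHashMap(u32), import: Import) Resolution {
    if (modules.get(import.module_name)) |idx| return .{ .found = idx };
    return .{ .module_not_found = import.module_name };
}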
Is that the discoverModulesStartingFromEntry thing?
I'd look at ModuleGraph.zig for whoever implements this, because that already forms a dependency graph and sorts the modules in reverse dependency order for compilation of post-cache stages, AKA import resolution and beyond
Aha great
And remind me, are we guaranteed to have a DAG? Or might there be import cycles?
We are banning import cycles
If we find an import cycle, we exit early:
https://github.com/roc-lang/roc/blob/2dbbdf3e1f6b80f7fade826b27e6a4703dd24357/src/coordinate.zig#L111
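That check is essentially DFS cycle detection over the module dependency graph; generically it looks like this (not the actual coordinate.zig code):
// A back edge found during depth-first traversal of the dependency
// graph means there is an import cycle.
fn hasCycle(deps: []const []const u32, node: u32, visiting: []bool, done: []bool) bool {
    if (done[node]) return false;
    if (visiting[node]) return true; // back edge => import cycle
    visiting[node] = true;
    for (deps[node]) |dep| {
        if (hasCycle(deps, dep, visiting, done)) return true;
    }
    visiting[node] = false;
    done[node] = true;
    return false;
}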
So this is one potential task
I thought we needed import cycles for mutually recursive custom types.
We could just require that mutual recursion stays within one module, no?
Or are you saying you also want to allow functions of the same name on each type?
Well, what about Foo and Bar both having to_str methods
Ah, fair
Yeah... we should be able to support those awkwardly without allowing people to write less performantly-compiling code
I'll link the message from Richard's brain blast on the subject
Yeah, I think in that one special case, we have to consider those modules as a super unit essentially
Richard Feldman said:
Richard Feldman said:
yeah, basically the idea is that if you have two nominal tag unions that are mutually recursive, and you also want them to have static dispatch (which could come up when modeling database tables, for example) that's currently impossible
so I thought about this some more, and I think it's better if we continue to disallow cyclic imports and recommend this workaround if anyone actually needs mutually recursive nominal types with static dispatch:
- define the two mutually recursive nominal types (let's call them A and B) in the same module, let's call it AandB.roc
- create separate A.roc and B.roc modules, each of which imports AandB.roc only
- both A.roc and B.roc expose a nominal type which wraps the appropriate mutually recursive type inside AandB.roc, and expose all the desired methods on those
- if the underlying nominal type needs to be exposed too (e.g. because the tags are needed for matching), then A.roc and B.roc can each provide a to_inner function which returns the underlying nominal type from AandB. Static dispatch won't be available on this structure, but that's okay because the wrapper provides it.
From earlier in this channel:
Interesting
ah yeah, forgot about that
This sorta creates a one-type-per-file requirement
So yeah, painful, but no recursion
This sorta creates a one-type-per-file requirement
Custom types do that in general. This is more a side effect.
Yeah, true
I guess what I was poking at is maybe static dispatch methods ought to be definable within a smaller scope than a whole module
Like if you could define types in submodules and re-export them from the parent (real file) module
like Rust's mod{} blocks
Yeah, that has been mentioned a few times. I think it is currently in the lets wait and see in practice state
Yep yep, makes sense
Cause it could be dealt with via submodules or via methods explicitly bound to an object.
So @Joshua Warner, in order of my estimation of importance, there are a few candidates. Is there something in particular you want to hear about?
If you're interested in typechecking, I'd reach out to Agus and see if you can grab it from him, he seems very busy with work at the moment and hasn't been pushing to his PR since its creation.
But there are technically multiple people that are interested in working on it, so that one will happen eventually
I have fairly limited time, so whatever I pick up probably needs to be possible to put into some relatively bite-sized chunks
Looks like snapshots can't have multiple test files right now, and thus can't usefully hit code in resolve_imports
Actually it looks like snapshots don't currently cover anything after parsing (no canonicalization, for example)
Yes, that would be something helpful to figure out on its own, and working on one stage at a time (or part of a stage at a time) should be modular enough
I'm thinking of tackling these things next: instead of a single ~~~SOURCE section, you'd have ~~~SOURCE:foo.roc and ~~~SOURCE:bar.roc sections, and so on for any of the following sections which are per-module. Thoughts?
That would be great!