Are all of those IRs separate data structures? Many (especially in the build phase) feel like they should be the same IR, just with multiple passes running on it.
Also, I feel like adding in reference counting is likely too late in the pipeline. The interpreter will need to run reference counting. So likely we will want refcounting right after the check phase and to have the interpreter run on the refcounted IR, but maybe I am missing something with the dependencies here.
I also definitely think we should think more about the build phase pass ordering. What would lead to the simplest implementation?
I think the interpreter would do its own runtime reference counting
separate system
Sure, but it still needs to know when to check reference counts, right? Like it needs to know when calling a function that the function requires incrementing the reference count of the 3rd arg passed in.
Though I guess that probably can still be figured out at runtime. Hmm.
That stage has a big question mark around it. Sam mentioned he was really unsure about what it does or where it should go.
I see it as a few different pieces:
also, I'm definitely keen to help port and work on a lot of this backend stuff and our new wiring to LLVM. I feel like I have a solid handle on many things after specialization (though I have not actually looked into the algorithm for drop specialization).
The nice thing with most of these passes is that they are optional. So in the list above, you would start by porting just 1 and 5. Later, 2, 3, and 4 can be added in.
Brendan Hansknecht said:
Though I guess that probably can still be figured out at runtime. Hmm.
yeah, for example Python does automatic reference counting fully at runtime
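As a concrete illustration, here is a minimal Rust sketch of the fully-at-runtime approach (the `Value` type and `call` helper are hypothetical, not the actual interpreter design): heap values carry their own refcount, so every copy and drop adjusts it at runtime, and no compile-time inc/dec insertion is needed.

```rust
use std::rc::Rc;

// Hypothetical interpreter value type. Heap values like lists are wrapped in
// Rc, so cloning/dropping adjusts the reference count at runtime.
#[derive(Clone)]
enum Value {
    Int(i64),
    List(Rc<Vec<Value>>),
}

fn call(f: fn(Value) -> Value, arg: &Value) -> Value {
    // Passing the argument by value clones it, which bumps the refcount for
    // heap values and is a plain copy for ints.
    f(arg.clone())
}

fn main() {
    let xs = Value::List(Rc::new(vec![Value::Int(1), Value::Int(2)]));
    // The closure is non-capturing, so it coerces to a plain fn pointer.
    let len = call(
        |v| match v {
            Value::List(items) => Value::Int(items.len() as i64),
            other => other,
        },
        &xs,
    );
    if let Value::Int(n) = len {
        println!("len = {n}");
    }
}
```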
As a note, I realized I missed reset reuse, which is also refcounting-related. Lots of optimization passes to eventually add, but none that I think are fundamentally needed.
Oh, as another note talking of IRs: sharing IRs between multiple passes means you can turn off passes or implement new passes without changing API boundaries significantly.
You're right. However, it's easier to ensure correctness when we have the shape of the IRs constrained to what each phase can actually use and actually produce. Less of a need to say "I shouldn't have a function node here, so save a compiler problem to a big list."
It's prioritizing correctness over performance
It really won't be that hard to change later
I think adding 10 to 20 clones of the IR for all the phases is still a mistake either way. And I don't think it helps correctness in many cases.
As a first note, for any pass that is optional, avoiding a change to a new IR really helps with testing correctness. It makes it easy to toggle any pass off, which is huge for debugging correctness.
I'm pretty sure @Joshua Warner was running into multiple "this should have been desugared" errors while getting fuzzing up to speed
We can assume testing will catch these things, but I'd always rather trust types than tests
On top of that, some passes, like lifting lambdas, really should not change the IR. It will be easier to move a few nodes around than to make an entire new IR.
And Ayaz keeps reiterating that IR translation costs basically nothing compared to algorithmic costs like specialization
Sure, but even ignoring the compile time cost, I think it hurts correctness for many phases (especially optimization phases that are optional)
I don't understand how correctness could be hurt
Cause you lose the ability to turn off a pass. It is now always required due to creating a new IR.
I don't think any of our passes besides maybe refcounting and maaaaybeee lower_ir could be turned off
reset reuse, drop specialization, borrow inference, morphic (if we ever re-add it), any other future optimization
Oh, those, sure
Those aren't in the preliminary design
The 9 in the preliminary design are all required
Everything past that could get away with reuse
Ah, ok, then quite possibly it is fine.
What are the differences in the middle IRs? Like lift functions?
The 9 I'm thinking about are:
I'll edit that message to talk about structural differences
yes, please do, I would like to understand
done
Very terse descriptions, let me know what needs explaining
The only one we might consider is solve functions, but that's playing with fire a little bit
resolve imports (reused for typechecking) (any "delayed import" values are resolved to the actual values)
Why is this an entire new IR and not just making a new dictionary of resolved imports or editing all import nodes and then banning the old type of import node? What does a new IR help with?
just one example
As a relative percentage, like 90%+ of the IR is shared
yeah, so why make a new IR instead of just making a verifier that checks the old node type is gone/banned? That is exceptionally easy to fuzz/test for correctness
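For example, a post-pass verifier over the shared IR could be as small as this sketch (the node and field names are made up for illustration, not the real IR); it's cheap to run in debug builds and under the fuzzer:

```rust
// Hypothetical shared-IR node type; variant and field names are illustrative only.
enum Node {
    DelayedImport { module: String, name: String },
    ResolvedValue { id: u32 },
    Call { func: u32, args: Vec<u32> },
}

/// After the resolve-imports pass, no DelayedImport nodes may remain in the shared IR.
fn verify_imports_resolved(nodes: &[Node]) -> Result<(), String> {
    for (idx, node) in nodes.iter().enumerate() {
        if let Node::DelayedImport { module, name } = node {
            return Err(format!(
                "node {idx}: unresolved import {module}.{name} after resolve-imports"
            ));
        }
    }
    Ok(())
}
```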
Yeah, some passes seem to just return subsets of the previous IRs instead of changing the fundamental shape of them
:shrug: it's a question of trust. Not that big a deal either way
For context, I am not against spinning up new IRs. I have worked with MLIR professionally, and it is designed to make creating new IRs super easy. In practice, making too many IRs always sucks. Verifiers and progressively adding more details generally work a lot better if IRs are similar.
it's a question of trust
Verifiers can remove the need for trust.
The feedback on verifiers is far from instantaneous tho
e.g. you have to find a failing input first, which is not always obvious
If you have different IR types, the compiler will yell at you immediately
I agree that it gives some compile time guarantees around typing which can be nice.
I still am not sold it is worth 9 IRs, many of which are exceptionally similar. As I mentioned above, from working with MLIR, too many dialects end up being more painful to work with. I think there is a balance where some passes should share, but others should be split. It depends on the amount of difference between IRs.
Anyway, not a hill I will die on. I think 9 exceptionally similar IRs will just make a lot of extra work and maintenance burden for us with little gain.
I wonder if there are ways we can catch these at compile time without the entire new IR. Not sure.
Maybe storing the parts that are different out of band?
Not sure if that helps correctness, though, we would have to look at the specifics
Ah yeah, that would work. Then, because the old out-of-band thing is no longer around, you are guaranteed to have transitioned.
Yeah, exactly
Though I guess you could still have a stale reference to the old out-of-band data potentially.
There could be debug-mode checks for that I think
Yeah, just like verifiers on the same IR for classes of nodes being gone.
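A rough sketch of the out-of-band idea, with hypothetical names: the pass produces a side table, and the next stage takes that table as a required input, so it simply can't run before the transition has happened, and a debug assert can catch stale lookups.

```rust
use std::collections::HashMap;

// Hypothetical IDs and side table; the shared IR nodes themselves are untouched.
type NodeId = u32;
type ValueId = u32;

/// Produced by the resolve-imports pass and stored out of band.
struct ResolvedImports {
    by_node: HashMap<NodeId, ValueId>,
}

/// The next stage takes the side table as a required argument, so it cannot
/// run before imports are resolved.
fn resolved_target(node: NodeId, imports: &ResolvedImports) -> ValueId {
    *imports
        .by_node
        .get(&node)
        .expect("import node has no out-of-band resolution")
}
```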
Also, one simple example of maintenance pain: if you add a new IR node, you now have to add it to and wire it through 9 dialects instead of 3. Any larger refactor also becomes multiple times bigger if you have more IRs.
You lose the ability to ever reorder passes without significant hassle (may not matter for Roc if we truly know the pipeline, but it has been a big pain in practice for work I have done before)
If passes have fundamentally different requirements and guarantees on input and output (e.g. it expects a specific node to have been desugared), then it's already not re-orderable
Some passes are not re-orderable. That is true. But that is not true of all passes
And that is where verifiers come in.
I would contend that it is true of all passes that should have separate IRs
(and vice versa)
If you add a new IR node, you now have to add it to and wire it through 9 dialects instead of 3.
This "wire things around" kind of update is the sort of thing I expect LLM-driven tooling to get much better at very quickly.
I think that is a bad metric for what is a maintainable and fun to work on compiler.
Fair :shrug:
@Sam Mohr is specialize where we make N copies of each function?
If so, we may want to order passes as:
Note: for 2, I think at a minimum we can do the refcount insertion and borrow inference. We may not be able to do other optimizations like reset reuse.
How can we do refcounting for a function that uses either a list or an int? We'd have to optionally do it
If you're doing this level of reordering, we need an expert to agree on it, aka @Ayaz Hafiz. I'm pretty sure we need to make everything first-order before we start lowering
Refcounting is the same. Int refcounting will just turn to a noop when finalized. So we would insert refcounts for everything concrete that needs refcounting and anything dynamic that might need refcounting.
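One way that could look, purely as a sketch with hypothetical `RcOp`/`Layout` names (not the planned design): insert refcount ops generically, then drop the ones whose layout turns out not to be refcounted once types are concrete.

```rust
// Hypothetical layouts and refcount ops; names are illustrative only.
#[derive(Clone, Copy)]
enum Layout {
    Int,
    List,
}

#[derive(Clone, Copy)]
enum RcOp {
    Inc(u32), // increment the refcount of a variable
    Dec(u32), // decrement the refcount of a variable
}

impl Layout {
    fn is_refcounted(self) -> bool {
        matches!(self, Layout::List)
    }
}

/// Once layouts are concrete, keep only the refcount ops that still matter;
/// ops on non-refcounted values (e.g. ints) simply disappear.
fn finalize_rc_ops(ops: &[RcOp], layout_of: impl Fn(u32) -> Layout) -> Vec<RcOp> {
    ops.iter()
        .copied()
        .filter(|op| {
            let var = match op {
                RcOp::Inc(v) | RcOp::Dec(v) => *v,
            };
            layout_of(var).is_refcounted()
        })
        .collect()
}
```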
Why not just do refcounting once at the end?
I see your goal
Which is to specialize later to avoid doing the same work we need to do later on more copies
Just trying to move as much stuff as possible before specialization to avoid repeating it n times
Yeah, exactly
I'm just assuming @Ayaz Hafiz would have suggested that was an option if we could
But maybe I'm putting words in his mouth
Also, I assume specializing a dense IR will be faster than specializing a recursive IR, thus moving lower_ir before specialization. I think for the most part this shouldn't hurt complexity/correctness much. Just a different order.
But fundamentally the same work
Actually, I don't know how lowering to statements would work with child functions not having been lifted yet
I'm not sure I follow; isn't lifting functions much earlier in the pipeline?
Can we spin these off into separate discussion threads, and keep the contributor coord meeting thread scoped to planning/scheduling discussion? It's going to be hard to find things later.
It's after type specialization
I'm on mobile, could someone else do it?
@Sam Mohr can you explain the difference between type and function specialization? I assumed function specialization is what leads to N copies of each function, but maybe that is wrong?
No, it's different
Type specialization takes a copy of every top level value and every top level function and makes a concretely-typed copy for each specific set of types it should have at runtime
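As a rough analogy in Rust (not the actual implementation), type specialization is essentially monomorphization: one generic definition becomes one concretely-typed copy per set of types it's actually used at.

```rust
// A generic source-level function...
fn first<T: Clone>(items: &[T]) -> Option<T> {
    items.first().cloned()
}

// ...conceptually becomes one concretely-typed copy per set of types it is
// actually used at, roughly:
fn first_i64(items: &[i64]) -> Option<i64> {
    items.first().copied()
}

fn first_str(items: &[String]) -> Option<String> {
    items.first().cloned()
}

fn main() {
    // The call sites are what drive which copies get generated.
    assert_eq!(first(&[1i64, 2, 3]), Some(1));
    assert_eq!(first_i64(&[1, 2, 3]), Some(1));
    assert_eq!(first_str(&["a".to_string()]), Some("a".to_string()));
}
```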
Function specialization is https://github.com/roc-lang/rfcs/blob/ayaz/compile-with-lambda-sets/0102-compiling-lambda-sets.md#function_specialize
oh....
Which is a way to convert higher order functions to functions we call using a tag union to represent different possible arguments
It's kinda like, if not exactly, defunctionalization
Confusing name, I agree
and type specialization has to come before all of these:
- lift functions (lambdas should no longer exist in the code)
- solve functions (all higher-order function arguments need to be generic)
- specialize functions (no more higher-order functions anywhere)
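For anyone unfamiliar, here's a classic defunctionalization sketch (illustrative only; the RFC's actual scheme may differ in its details): higher-order arguments become tags in a union, and calls go through one dispatch function.

```rust
// Higher-order version would take an arbitrary function value:
// fn apply_all(f: impl Fn(i64) -> i64, xs: &[i64]) -> Vec<i64> { ... }

// Defunctionalized version: every possible function argument becomes a tag,
// with its captures stored in the tag, and calls go through `dispatch`.
enum Func {
    AddN(i64), // |x| x + n
    Negate,    // |x| -x
}

fn dispatch(f: &Func, x: i64) -> i64 {
    match f {
        Func::AddN(n) => x + n,
        Func::Negate => -x,
    }
}

fn apply_all(f: &Func, xs: &[i64]) -> Vec<i64> {
    xs.iter().map(|&x| dispatch(f, x)).collect()
}

fn main() {
    assert_eq!(apply_all(&Func::AddN(2), &[1, 2, 3]), vec![3, 4, 5]);
    assert_eq!(apply_all(&Func::Negate, &[1, 2]), vec![-1, -2]);
}
```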
We could maybe call it defunctionalize to make the name distinct
It doesn't have to, but it's gonna be pretty hard to do those correctly otherwise I think
ack
That section of the compiler, even if we combine those phases, will be hard to reorder
Yeah, then the suggested order sounds fine
Though it'd be nice to avoid doing more work for the phases after them because of the type specialization
Yeah, would be nice if we could share more. Cause, for example, if you have 10 specializations of the same function that uses lists, they all refcount the same way.
But unless that is easy to do, I 100% agree with correctness first.
Brendan Hansknecht said:
Also, one simple example of maintenance pain: if you add a new IR node, you now have to add it to and wire it through 9 dialects instead of 3. Any larger refactor also becomes multiple times bigger if you have more IRs.
Not sure if we'll come to an agreement, but it feels like we're not on the same page, so I'll ask: if we can't really reorder from the suggested order, then are you still against the IRs being different?
One big benefit of different IRs is that if we make sure that calculating stage N takes the stage N+1 IRs for the dep modules, then we have a type guarantee that the deps must have been compiled first
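That guarantee can be encoded directly in the stage signatures, e.g. (a sketch with hypothetical type names):

```rust
// Hypothetical per-stage IR types; the names are illustrative only.
struct CanonicalIr;
struct TypedIr;

/// Typechecking a module requires the already-typechecked IR of its deps,
/// so the signature itself guarantees deps were compiled through this stage first.
fn typecheck(module: CanonicalIr, deps: &[TypedIr]) -> TypedIr {
    let _ = (module, deps);
    TypedIr
}
```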
I think it's my lizard brain speaking, but I see fuzzing as a high probability test for correctness, and types as true guarantees
I always prefer types where possible, but I'm definitely always a high-level thinker for modeling problems
my heuristic for this would be:
all optimizations over the so called mono IR, refcounting etc are probably same IR
cost of creating a new IR is a bad reason not to create an IR IMO
the IR creation being redundant and obfuscating the actual transformation is a good reason not to create an IR, which is the case for refcounting and optimizations
Yeah, that sounds pretty reasonable overall as a metric. It's up to the stage author to decide if invariants are simple enough to reuse the same IR. In my mind, many of the "it's the same IR minus one specific thing" cases are pretty simple invariants, but I won't be writing those passes, so whoever writes the passes can decide.
@Ayaz Hafiz do you think any work can be hoisted above type specialization to avoid repeating N times? Just curious if there is anything that is flexible that we should try to move now instead of later when the pipeline is more ironed down.
what is the worry about what might be repeated N times?
It is N times faster to do work once than N times. So anything that is reasonable to move before specialization probably should be.
i see
i probably wouldn't think about this too deeply right now
I don't want to think too deeply about it, but if there is anything we think would be easy to move now without a concern for correctness, I think we should move it. I think later it will be much harder to fix these kinds of issues.
if there's something obvious that can be lifted to an earlier pass, it will be obvious and easy to lift, i think
for example, function lifting can kind of be done anytime after typechecking
but that is one of those kind of obvious/trivial things
i think it's worth embracing that all of this will have to be rewritten at the limit anyway
if you want the fastest compiler there are some parts of this architecture that will be blockers
Ayaz Hafiz said:
i think it's worth embracing that all of this will have to be rewritten at the limit anyway
I don't want to embrace that
I think that will lead us to an unnecessarily and noticeably slow compiler for years and years
I think that's what most compilers do, and I think we can do better without going overboard like we did before
I do think it's fine to have different IRs for each pass (when it would be valuable)
but I think eventually once things have settled down we might explore combining some of them as an incremental change
eventually as in "someday" not like right after 0.1.0 or anything :sweat_smile:
and I think that can be done more incrementally than rewriting the whole compiler
i don't know. i don't think you can have both worlds. the current implementation is closer to what the limit needs to look like. but there's a reason a rewrite is being discussed.
I'm saying I think we should prioritize correctness, simplicity, maintainability, and debuggability for this rewrite, but I think "embrace that it's all gonna be rewritten and don't worry about it" goes too far
i'm not saying this approach will be too slow or anything like that. it will be totally fine for the vast majority of programs because the limiting factors for any non-trivial program are going to be algorithmic, not number of passes or whatever.
well, i'm just saying that the fastest possible approach is not this approach
for sure!
I just think we shouldn't be all-or-nothing about it haha
before the goal was "fastest possible" and that got us into trouble
I don't think we should aim for "fastest possible" for this rewrite
but I also don't think we should aim for "only algorithmic perf matters" because I don't think that will turn out to be correct
I mean maybe I'll be wrong!
i think it's true. the only cases i can think of in the current implementation where it was a problem were because of algorithms. but maybe i'm missing cases you have in mind?
I agree with that, but "the current thing that's heavily optimized for perf only has algorithmic trouble" does not imply "if we have one that's not optimized at all for perf it will also only have algorithmic trouble"
that would imply that only algorithmic performance optimizations matter to compilers as a category, which I would be extremely skeptical of haha
i agree
i think the difference is in limiting factors. it's far more likely that algorithmic passes are the limiting factor, because Roc is doing a lot of work to lower a high-level language to a low-level one, than just rewriting stuff to a different form. But at the limit you have to fuse both, and this approach doesn't really work for that, I think.
but also like i think they should be two different planes
the concerns about the algorithmic passes should be entirely disjoint from the implementation of how data is stored
I guess a really big question mark here is whether the "dev backend can be an interpreter and it's fine" hypothesis proves correct in practice
that way you have a nice interface boundary between the two and can prefer to optimize one or the other
because if that works out, then in dev builds we stop after type-checking
and only `--optimize` builds do any of this expensive stuff
if that kind of boundary can be done correctly, then you don't have to worry about getting both right at the same time
which get run way less frequently, and the downside of a slower feedback loop is much less (unless you're doing performance optimization, in which case it's painful if they're slow)
I think it will work out fine, taking e.g. Python or OCaml as prior
OCaml is interpreted? :astonished:
both of which don't have sophisticated interpreters (or at least didn't until the past few years)
ocaml has many modes
TIL!
yeah the specific scenario I'm concerned about (because I don't have any idea how likely it is to come up in practice) is something like:
"I'm making something like a game that I run with `--optimize`; it plays great, but my feedback loop is slow because I have to rebuild with `--optimize` to try anything out."
Brendan pointed out that our current dev backend is so unoptimized that it might not actually be enough faster than an interpreter to be noticeable
and also, we can potentially JIT up the interpreter if :point_up: becomes a problem in practice
which also would still skip all the steps after type-checking for dev builds
i think this is the kind of thing where it's probably just worth punting until it comes up
if the implementation is simple, then it's not hard to refactor to make this work or introduce complexity for specific usage patterns
and maybe it never comes up
"this" being the IRs after type-checking?
well just the compiler in general
just a note I remembered: today, we have automatic dead code elimination of unused constants (e.g. I use a package but I only use some of its exposed operations, and nothing I actually use involves certain top-level constants the package defines) because we compile them to thunks which automatically get DCE'd
but in the new compiler where we're instead evaluating constants at compile-time, we need to make sure that we don't include constants that have been compiled down to literals unless they're actually referenced somewhere!
(same goes for string literals etc in general)
Depends on what section granularity we use
You can also put each constant in its own section, just like functions.
true, although it'll improve build speed if we only flag things as used when we actually use them, and then only include them if so
(compared to adding everything and then having the linker remove them again in a later pass)
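A sketch of the "flag as used" approach (hypothetical names, not the actual compiler's data structures): classic worklist reachability from the entry points over a symbol dependency graph, so anything unreferenced is simply never emitted.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Hypothetical symbol graph: each top-level def lists the defs it references.
type Symbol = &'static str;

/// Worklist reachability from the roots (e.g. `main` plus exposed defs);
/// anything not reached is never emitted, so there is nothing for the linker
/// to strip later.
fn live_symbols(deps: &HashMap<Symbol, Vec<Symbol>>, roots: &[Symbol]) -> HashSet<Symbol> {
    let mut live: HashSet<Symbol> = roots.iter().copied().collect();
    let mut queue: VecDeque<Symbol> = roots.iter().copied().collect();
    while let Some(sym) = queue.pop_front() {
        for &dep in deps.get(sym).into_iter().flatten() {
            if live.insert(dep) {
                queue.push_back(dep);
            }
        }
    }
    live
}
```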
Same for functions preferably
I think we could use specialization to do a liveness check even on functions that are already concrete.
we've always done it for functions
Oh, never realized
I think we can have type specialization handle this