approach to optimization · contributing

Anton (Feb 01 2025 at 13:46):

Before we start rewriting the compiler in Zig, I'd like us to agree on our approach to optimizing the compiler's run speed. I feel we currently differ substantially on where compiler run speed falls on the priority ladder.

One important note: the consensus here seemed to be that we want to write small compiler passes. With the current compiler, I believe we've planned things for maximum execution speed because it's too hard to change them later. With small passes this does not seem to apply; a pass should be much easier to rewrite for execution speed if it turns out to be a serious bottleneck.

I would personally always prioritize correctness, maintainability, simplicity, debuggability, and error message quality over run speed. I think we should not allow any optimization that complicates the code in a pass that is nowhere close to being the bottleneck pass. Any optimization that does meet this standard should be proven beneficial through benchmarks before it is merged.

What do you think?
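To make "nowhere close to being the bottleneck pass" concrete, here is a rough sketch of per-pass timing in Rust (the language of the current compiler). The names `time_passes` and `bottleneck` are made up for illustration, and a single timed run is only a first hint; a real benchmark would repeat runs on representative inputs.

```rust
use std::time::{Duration, Instant};

/// Runs each named pass once and records how long it took.
/// (Hypothetical helper; passes are modeled as plain closures.)
pub fn time_passes(
    passes: Vec<(&'static str, Box<dyn FnMut()>)>,
) -> Vec<(&'static str, Duration)> {
    let mut timings = Vec::new();
    for (name, mut pass) in passes {
        let start = Instant::now();
        pass();
        timings.push((name, start.elapsed()));
    }
    timings
}

/// Name of the slowest pass, i.e. the current bottleneck candidate.
/// Returns None when there are no timings.
pub fn bottleneck(timings: &[(&'static str, Duration)]) -> Option<&'static str> {
    timings
        .iter()
        .max_by_key(|(_, elapsed)| *elapsed)
        .map(|(name, _)| *name)
}
```

With numbers like these in hand, an optimization in a pass that accounts for a small share of total time is easy to turn down.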

Anthony Bullard (Feb 01 2025 at 13:55):

I think you just, for each pass, try to make the fastest possible pass implementation you can that is simple to understand, easy to debug, and a joy to maintain.

So for instance, arena allocation makes sense, is easy to reason about, and provides great performance. In some cases SoAs (struct-of-arrays layouts) can have the same qualities (even if they take some adjustment for those not familiar with the pattern). But we just avoid performance optimization patterns that sacrifice those other qualities.
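For readers not familiar with the SoA pattern, a minimal sketch in Rust follows. The `Tokens`, `TokenId`, and `TokenKind` names are assumptions for illustration, not types from the Roc compiler. Each field gets its own column, and everything is freed together when the container is dropped, which gives it an arena-like lifetime.

```rust
/// Typed index into `Tokens` (hypothetical name for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TokenId(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TokenKind {
    Ident,
    Number,
    Plus,
}

/// Struct-of-arrays storage: one vector per field instead of a Vec of structs.
/// All tokens are released at once when `Tokens` is dropped.
#[derive(Default)]
pub struct Tokens {
    kinds: Vec<TokenKind>,
    starts: Vec<u32>,
    lens: Vec<u32>,
}

impl Tokens {
    /// Appends one token by pushing to every column; returns its index.
    pub fn push(&mut self, kind: TokenKind, start: u32, len: u32) -> TokenId {
        let id = TokenId(self.kinds.len() as u32);
        self.kinds.push(kind);
        self.starts.push(start);
        self.lens.push(len);
        id
    }

    pub fn kind(&self, id: TokenId) -> TokenKind {
        self.kinds[id.0 as usize]
    }

    /// Half-open byte range (start, end) of the token.
    pub fn span(&self, id: TokenId) -> (u32, u32) {
        let i = id.0 as usize;
        (self.starts[i], self.starts[i] + self.lens[i])
    }

    /// A pass that only needs kinds scans a single dense column.
    pub fn count_kind(&self, kind: TokenKind) -> usize {
        self.kinds.iter().filter(|k| **k == kind).count()
    }
}
```

The accessor methods keep call sites close to what a plain array of structs would look like, which is most of what keeps the pattern debuggable.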

Richard Feldman (Feb 01 2025 at 14:17):

some thoughts on passes:

All else being equal, fewer passes is faster. Minimizing the number of passes has been a goal in the Rust compiler, and it has made things harder to get right and to debug. I don't think we should push for a small number of passes in the rewrite. (Someday we may decide to combine passes, but if so, we should do that well after having a reliable implementation.)
More passes is slower, but it's not necessarily easier to understand. I also don't think we should push for a large number of passes in the rewrite.
Instead, I think we should be focusing on the goals of simplicity, debuggability, and ease of getting things right. However many passes that results in, that's how many passes we have.

Richard Feldman (Feb 01 2025 at 14:24):

some thoughts on parallelization:

The current compiler is parallel across multiple stages of compilation, but the parallel loading code is very complex and doesn't have a noticeable payoff because all Roc code bases today are so small. So if it weren't parallel, I doubt anyone would notice.
As we discovered, we need a single-threaded build pipeline anyway for wasm (we had to introduce one for the web repl), so this is needed regardless.
As such, I don't think the new compiler should be parallel for 0.1.0 - just run everything in one thread.
That said, although there's a long history of compilers being rewritten, there's also a long history of multi-year "make the compiler parallel" efforts in compilers which were built without any thought to parallelism up front.
I think we should architect the dependencies between the various stages of the compiler in a way where parallelism is possible without another rewrite even if we aren't running things in parallel up front.
This means thinking about dependencies between modules, between stages, etc. Which steps could be run in parallel, and which couldn't? I think we should organize our data in such a way that we could put the operations into a scheduler and have them run in parallel.
I do think this will make things take a bit longer and be a bit more complex. But I think the cost-benefit is massively skewed on this; trying to retrofit for parallelism in the future will be a massive project compared to if we plan for it now, even if we aren't actually running things in parallel yet.
Concretely, this means I don't think we should eagerly combine multiple modules into one. I think we should keep them separate (and be able to process each of them independently) until close to the end of the compiler pipeline, so that in the future that work can be done in parallel.
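One way to picture "put the operations into a scheduler" is the rough Rust sketch below. It assumes module dependencies are available as a simple name-to-dependencies map; the function name `schedule_waves` and the module names in the usage are hypothetical. It groups modules into waves where each wave depends only on earlier waves, so a single-threaded build can walk the waves in order today, and a parallel build could later process each wave concurrently.

```rust
use std::collections::BTreeMap;

/// Groups modules into "waves": every module in a wave depends only on
/// modules in earlier waves. Returns None if the graph has a cycle or a
/// dependency that is not itself a key in `deps`.
pub fn schedule_waves<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
) -> Option<Vec<Vec<String>>> {
    let mut remaining = deps.clone();
    let mut done: Vec<&'a str> = Vec::new();
    let mut waves: Vec<Vec<String>> = Vec::new();

    while !remaining.is_empty() {
        // A module is ready once all of its dependencies are done.
        // (Linear `contains` is fine for a sketch; a set would scale better.)
        let ready: Vec<&'a str> = remaining
            .iter()
            .filter(|(_, ds)| ds.iter().all(|d| done.contains(d)))
            .map(|(name, _)| *name)
            .collect();

        if ready.is_empty() {
            return None; // nothing can make progress: cycle or unknown module
        }
        for name in &ready {
            remaining.remove(name);
        }
        done.extend(ready.iter().copied());
        waves.push(ready.iter().map(|s| s.to_string()).collect());
    }
    Some(waves)
}
```

For a hypothetical app where `Main` imports `Parser` and `Types`, and both import `Tokens`, this yields three waves: `Tokens`, then `Parser` and `Types` together, then `Main`. Keeping modules separate until late in the pipeline is what makes the middle wave possible.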

Richard Feldman (Feb 01 2025 at 14:48):

(these are coming in a sporadic order because we just started potty training today and I'm writing them in between cleaning up accidents :joy:)

Richard Feldman (Feb 01 2025 at 15:34):

some thoughts on memory allocation and data structures:

Correctness needs to take precedence over performance. Right now we have a fast and buggy compiler. After this rewrite, we need to end up with a reliable compiler. If it's less fast than the current one, that's okay.
It's been a foundational goal to end up with a very fast compiler. In the same way that I think we should organize the code to be parallelizable in the future, I think we should also organize the code in a way where we can do certain optimizations in the future.
I think we should intern strings (like we do today) and try using indices into arenas (the way the Zig compiler does) by default, but not really try to optimize anything beyond that overall style.
There may be some things that are quick and don't really affect debugging, such as storing some info in a few bits of identifiers. As I understand it, Zig makes this really easy and straightforward, and I don't think we need to go out of our way to avoid using easy and straightforward things that make performance better.
An example of the type of performance optimization I don't think we should do right now is adjacency in IR nodes. (Where we don't store the ID of the next node inside the current node, but rather just know based on its node type that the next node will be the next node in the array.) If everything is already done using indices into arrays, that's something we can do incrementally in the future.
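As a rough illustration of the baseline style described above (interned strings plus indices into arenas, with no adjacency trick), here is a small Rust sketch. `Interner`, `StrId`, `NodeId`, `Node`, and `Ir` are assumed names for this example only and do not reflect the actual Roc or Zig compiler data structures.

```rust
use std::collections::HashMap;

/// Handle to an interned string; comparing two handles is a u32 compare.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct StrId(u32);

#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, StrId>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing id for `s`, or stores it and returns a new id.
    pub fn intern(&mut self, s: &str) -> StrId {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = StrId(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), id);
        id
    }

    pub fn resolve(&self, id: StrId) -> &str {
        &self.strings[id.0 as usize]
    }
}

/// Index of a node inside `Ir::nodes`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(u32);

#[derive(Debug, PartialEq)]
pub enum Node {
    Var(StrId),
    /// Children are stored as explicit indices. An adjacency optimization
    /// would drop one of these and rely on array position instead.
    Call { func: NodeId, arg: NodeId },
}

#[derive(Default)]
pub struct Ir {
    nodes: Vec<Node>,
}

impl Ir {
    pub fn add(&mut self, node: Node) -> NodeId {
        let id = NodeId(self.nodes.len() as u32);
        self.nodes.push(node);
        id
    }

    pub fn get(&self, id: NodeId) -> &Node {
        &self.nodes[id.0 as usize]
    }
}
```

Because every reference is already an index into an array, switching a node type to an adjacency-based layout later would only change how that node's children are looked up, not the overall shape of the IR.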

Brendan Hansknecht (Feb 01 2025 at 16:19):

One of the really nice things about working with MLIR over the years is that I think it gets a clear priority correct. First figure out your IR (that is your fundamental interface). Then write passes that operate over IRs. If constraints change enough, consider adding a new IR and switching to it. If not, it is fine to add more details to the same IR or save side-band info.

Any stage can theoretically be a cut point for writing the IR to disk and caching. On top of that, MLIR can auto-parallelize to some extent, though that mostly depends on passes running patterns on individual IR nodes instead of larger groups (which isn't always practical).

Brendan Hansknecht (Feb 01 2025 at 16:20):

I think we fundamentally need to focus on our IRs, our interfaces between different stacks of the compiler, and not writing ourselves into a box that we can't easily get out of.

Brendan Hansknecht (Feb 01 2025 at 16:21):

We have some ideas about where the North Star is for the compiler. Our work should be aligned with the idea of reaching the North Star. We want to write the cleanest thing possible that allows for easily taking steps towards optimal.

Richard Feldman (Feb 01 2025 at 16:22):

yeah, and then later if we want to merge some of them together to improve performance, we can do that incrementally and after we've already gotten things correct

Brendan Hansknecht (Feb 01 2025 at 16:22):

So it isn't strictly about simple. It is about correct now, but also easy to improve upon

Sam Mohr (Feb 01 2025 at 16:23):

Totally agreed. You weren't there for the meeting, but that's what I wrote in Rust for the post-typechecking part of the compiler: a set of IRs for each of the stages. By next Saturday, my hope is to have the IRs for all of the stages of the compiler (not just the build part like before) translated to Zig so we can collaborate on it and agree on what this should all look like.

Sam Mohr (Feb 01 2025 at 16:24):

If anyone wants to work on that on their own in parallel, go for it, it's valuable to have different people try to come up with these IRs separately so we can see what the good ideas we agree on are.

Brendan Hansknecht (Feb 01 2025 at 16:24):

So in my mind the two most important pieces outside of correctness are making the IR and passes have a CPU/memory-friendly core, along with making splits in places that will help enable improvements in the future (parallelism, caching, etc.).

Richard Feldman (Feb 01 2025 at 16:25):

that's almost exactly what I advocated for in the meeting :smiley:

Richard Feldman (Feb 01 2025 at 16:25):

focus on the boundaries

Richard Feldman (Feb 01 2025 at 16:25):

and dependencies

Brendan Hansknecht (Feb 01 2025 at 16:27):

Yeah, I guess I just add CPU/memory-friendly core design on top.

Then all the correctness, debuggability, simplicity, etc. without too much concern for the rest of the performance.

Brendan Hansknecht (Feb 01 2025 at 16:27):

But yeah making good bones first.

Agus Zubiaga (Feb 02 2025 at 00:30):

Richard Feldman said:

There may be some things that are quick and don't really affect debugging, such as storing some info in a few bits of identifiers.

Yeah, I think this can even help debugging because you can gather more information from just the value without having to look it up in a side table. For Purity Inference, I reserved one bit of Symbols as a !-prefixed flag and it greatly simplified the implementation overall.
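A minimal Rust sketch of the "few bits of an identifier" idea follows. It assumes a 32-bit symbol whose top bit marks an effectful (`!`) name; the `Symbol` layout and method names here are hypothetical, not the actual Purity Inference implementation.

```rust
/// 32-bit symbol: the low 31 bits are an index, the top bit is a flag.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Symbol(u32);

impl Symbol {
    /// Top bit reserved as the "effectful" flag (assumed layout).
    const EFFECTFUL_BIT: u32 = 1 << 31;

    /// Panics if `index` does not fit in the 31 bits left for it.
    pub fn new(index: u32, effectful: bool) -> Symbol {
        assert!(index < Self::EFFECTFUL_BIT, "index must fit in 31 bits");
        if effectful {
            Symbol(index | Self::EFFECTFUL_BIT)
        } else {
            Symbol(index)
        }
    }

    /// Answered from the value itself, with no side-table lookup.
    pub fn is_effectful(self) -> bool {
        (self.0 & Self::EFFECTFUL_BIT) != 0
    }

    /// The index with the flag bit masked off.
    pub fn index(self) -> u32 {
        self.0 & !Self::EFFECTFUL_BIT
    }
}
```

The debugging benefit mentioned above shows up when printing a symbol: the flag travels with the value, so there is nothing else to look up.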

Last updated: Jul 26 2025 at 12:14 UTC

Stream: contributing

Topic: approach to optimization

Anton (Feb 01 2025 at 13:46):
