Hey folks, I just opened a PR to remove an optimisation from the Wasm dev backend but wanted to get feedback on the idea here too.
While Brendan was working on the Zig 0.11 PR, the last issue was a tricky one in Wasm, and I commented that it seemed to be in the most complex part of the Wasm backend. I was actually a bit afraid to touch it.
But that code is mainly concerned with a size optimisation, which we could do without.
So the PR is to delete the optimisation and all of the book-keeping code we need for it.
The diff is +320 -910, which is a nice simplification.
In theory this makes the generated code more bloated. But it deletes complex code that I'm afraid of. And I have not actually been able to measure the bloat!
Disk space for all test_gen binaries with optimisation: 759,592 bytes
Disk space for all test_gen binaries without optimisation: 759,688 bytes
So that's just 96 bytes.
This is weirdly small though.
There's an example in the PR body for 1 + 2 + 3, where the optimisation removes 10 instructions, at 2 bytes each, which is 20 bytes. And that's just 1 test out of 1250.
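To put the two measurements side by side, here is a small arithmetic sketch. The function names are made up for illustration; the only inputs are the numbers quoted in this thread.

```rust
/// Extra bytes produced when the optimisation is switched off.
/// (Hypothetical helper, not part of the Roc code base.)
fn size_delta(with_opt: u64, without_opt: u64) -> u64 {
    without_opt.saturating_sub(with_opt)
}

/// Bytes saved by removing `count` instructions of `bytes_each` bytes.
fn bytes_saved(count: u64, bytes_each: u64) -> u64 {
    count * bytes_each
}

/// Relative size increase, as a percentage of the optimised size.
fn percent_increase(with_opt: u64, without_opt: u64) -> f64 {
    size_delta(with_opt, without_opt) as f64 / with_opt as f64 * 100.0
}
```

With the figures above, `size_delta(759_592, 759_688)` is 96 and `bytes_saved(10, 2)` is 20, so the total measured difference is worth fewer than five tests like the 1 + 2 + 3 example, which is why it looks surprisingly small.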
Some technical background:
Each Wasm function can store local variables in one of 3 places:
We need to choose one of these 3 places to store each Symbol in the monomorphic IR.
Currently on the main branch, we use all 3 in various situations.
The PR removes (1) as an option.
It turns out to be more complicated than you'd expect to track symbols in the stack machine. Wasm has structured control flow with nested block scopes. So there are instructions like loop rather than just "jump" or "goto". You can't access a value in a higher scope.
After this PR we no longer track what's stored in the stack machine.
Instead we do a much more direct translation of the mono IR.
We translate each let in the IR to:
let the value stack machine is empty.
The drawback is that we are not taking advantage of the stack machine, so we get inefficient code.
The advantage is that we are more directly translating the mono IR so our Rust code is much simpler and easier to maintain.
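A rough sketch of what such a direct translation could look like for 1 + 2 + 3, next to a hand-written stack-only version. Everything here is hypothetical and simplified: the `Instr`, `Expr` and `Let` types are invented for illustration and are not the backend's real types, the numbers are assumed to be i64, and the rule used is "compute the value, then `local.set` it, so the value stack is empty after each let".

```rust
/// A tiny subset of Wasm instructions, enough for this illustration.
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    I64Const(i64),
    I64Add,
    LocalGet(u32),
    LocalSet(u32),
}

impl Instr {
    /// Encoded size in bytes. Only valid for the small operands used here:
    /// constants and local indices that fit in a single LEB128 byte.
    fn size(&self) -> usize {
        match self {
            Instr::I64Add => 1, // opcode only
            _ => 2,             // opcode + one operand byte
        }
    }
}

/// Hypothetical mini IR: operands of `Add` are symbols (local indices).
enum Expr {
    Literal(i64),
    Add(u32, u32),
}

struct Let {
    symbol: u32,
    expr: Expr,
}

/// Direct translation: every let ends with `local.set`, so nothing is left
/// on the value stack. The final `local.get` returns the result symbol.
fn translate_direct(lets: &[Let], result: u32) -> Vec<Instr> {
    let mut code = Vec::new();
    for l in lets {
        match &l.expr {
            Expr::Literal(n) => code.push(Instr::I64Const(*n)),
            Expr::Add(a, b) => {
                code.push(Instr::LocalGet(*a));
                code.push(Instr::LocalGet(*b));
                code.push(Instr::I64Add);
            }
        }
        code.push(Instr::LocalSet(l.symbol));
    }
    code.push(Instr::LocalGet(result));
    code
}

/// 1 + 2 + 3 written as a chain of lets (assumed shape, not real mono IR).
fn one_plus_two_plus_three() -> Vec<Let> {
    vec![
        Let { symbol: 0, expr: Expr::Literal(1) },
        Let { symbol: 1, expr: Expr::Literal(2) },
        Let { symbol: 2, expr: Expr::Literal(3) },
        Let { symbol: 3, expr: Expr::Add(0, 1) },
        Let { symbol: 4, expr: Expr::Add(3, 2) },
    ]
}

/// What code that keeps every value on the stack machine could look like.
fn stack_only() -> Vec<Instr> {
    vec![
        Instr::I64Const(1),
        Instr::I64Const(2),
        Instr::I64Add,
        Instr::I64Const(3),
        Instr::I64Add,
    ]
}

/// Total encoded size of a sequence of instructions.
fn total_size(code: &[Instr]) -> usize {
    code.iter().map(Instr::size).sum()
}
```

Under these assumptions, `translate_direct(&one_plus_two_plus_three(), 4)` gives 15 instructions (28 bytes) and `stack_only()` gives 5 instructions (8 bytes). The difference is the 10 `local.get`/`local.set` instructions at 2 bytes each, though the real example in the PR body may be laid out differently.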
There doesn't seem to be much speed advantage, at least in the test_gen tests.
The total runtime for cargo test-gen-wasm goes from 7.428s to 7.412s (based on hyperfine with 10 runs).
So the main argument is simplicity/maintainability.
Frankly, I don't want to be the only one who is able to make changes in this code. I've had less time for Roc in the past year and I'm not sure how my availability is going to fluctuate in the future. So getting rid of something very complex that we don't really need seems like a good idea.
If we decide that we need the size optimisation, it would be much simpler to implement it after generating the instructions for a function. We could probably combine it into an existing pass. I think it would be a lot less code and easier to understand/maintain/debug.
Sounds like a wise decision!
Agreed! :smiley:
Thanks @Brian Carroll, I think that's a really good call!
Last updated: Jul 06 2025 at 12:14 UTC