I'd like to talk about our Zig 16 upgrade plans... specifically this PR https://github.com/roc-lang/roc/pull/9341
I've been working on that and have rebased it against main a few times. I'd take a punt and estimate it was more than 90% done as of the last update. I've had zig build minici passing locally on all three of Windows, macOS, and Linux multiple times, but never got a full CI run green.
Before today I was aiming for after the MIR changes landed (which they have now) ... but I'm now starting to think we should hold off on zig 16 upgrade for a while longer.
There are at least a few critical PRs in the pipeline that I could see an argument for landing first.
Today I started planning the upgrade for roc-wasm4 platform and realised we never landed the WASM Surgical Linking PR. The surgical linker is necessary so roc build can produce a wasm module (without adding another linker wasm-ld or some other linking approach). I can't remember why I paused work on that but I think it was blocked for some reason.
Also we are close to landing the LLVM backend.
So I feel like if we wait a few more weeks (rough estimate) we could be in a position where we have more of the platforms (including wasm targets) and the associated test apps and examples in a working state to help identify regressions in Roc.
I guess the alternative approach is to pause work in progress and prioritise landing the Zig 16 upgrade. The other PRs in flight are smaller, so they'll be easier to upgrade.
Anyone have any thoughts or advice on how to navigate this?
We had a lot of memory bugs when upgrading last time, so waiting seems like the best approach.
I'd actually kinda prefer to just land it sooner
because the more stale it gets, the longer it'll take to land
and right now is kind of a nice time to have regressions on the new compiler bc nobody is relying on it yet :smile:
now that LLVM has landed, I'm fine pausing landing things until Zig 16 is in
especially because my Codex rate limits reset tomorrow :joy:
Ok, let's hold off merging anything into main for a little. I'll try and have zig 16 ready this weekend.
I've made good progress so far and pushed a few commits. I've updated Linux and that was passing minici; I've now switched across to Windows and am working through issues there. Back to zig build test passing... now onto the full zig build minici before I switch to macOS and see if there are any issues there too.
@Anton or @Richard Feldman would you mind looking at the CI issues? I can look at them tomorrow when I get up, but I'm guessing it's a few minor git workflow configuration things
I will :)
I have fixes locally for the typos and tracy issues, I spent a bunch of time looking at the best way to fix all the llvm: FAIL 'LinkFailed', claude is working on implementing our latest plan now :)
I will continue tomorrow.
Are you able to push any fixes you have? I can continue with it soon.
Anton said:
I have fixes locally for the typos and tracy issues, I spent a bunch of time looking at the best way to fix all the
llvm: FAIL 'LinkFailed', claude is working on implementing our latest plan now :)
As a second line of effort here -- I'm going to try rebasing my surgical linker branch onto zig-16. I found and fixed some flaky LLVM build issues there which I think may be related to this. So I assume it will port across and may help with zig 16 CI.
This looks relevant... switching to our embedded LLD instead of using cc
@Richard Feldman this looks like a root cause for our very slow LLVM tests too
oh yeh we shouldn't be doing that haha
we should be using lld on all targets for this
not just macOS
Does anyone here run Nix and can help with the Zig 16 branch? We need to bump the flake and regenerate the lock.
The zig-16 branch is fully migrated to Zig 0.16 everywhere except the nix CI leg. build.zig:2539/:2554 use the 0.16-only std.Io.Dir API, but src/flake.nix:47 still pins zig = pkgs.zig_0_15, so the nix shell runs the build with 0.15 → compile error.
Two parts:
1. src/flake.nix:47: pkgs.zig_0_15 → pkgs.zig_0_16
2. Regenerate src/flake.lock: the locked nixpkgs is from 2025-10-15, which predates the 0.16.0 release and has no zig_0_16. Bump it:
   cd src && nix flake update nixpkgs
3. Then commit both src/flake.nix and src/flake.lock.
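To catch this kind of mismatch before CI does, something like the following could be run as a quick pre-flight check. It is only a sketch: the helper names `pinned_zig_attr` and `pin_matches` are made up for illustration, and it assumes the pin appears in flake.nix as a `pkgs.zig_X_Y` attribute, as in the line quoted above.

```python
import re
from typing import Optional


def pinned_zig_attr(flake_text: str) -> Optional[str]:
    """Return the first pkgs.zig_X_Y attribute name found in the text, or None."""
    match = re.search(r"pkgs\.(zig_\d+_\d+)", flake_text)
    return match.group(1) if match else None


def pin_matches(flake_text: str, wanted: str = "zig_0_16") -> bool:
    """True when the flake pins the Zig attribute the build expects."""
    return pinned_zig_attr(flake_text) == wanted


# Example inputs mirroring the before/after state of the pin.
stale = "zig = pkgs.zig_0_15;"
fixed = "zig = pkgs.zig_0_16;"

print(pin_matches(stale))
print(pin_matches(fixed))
```

In practice the text would come from reading src/flake.nix; the check only looks at the attribute name and says nothing about whether the locked nixpkgs actually provides it.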
I spoke with Richard about making LLVM backend in eval tests opt-in and switching that on for only one CI workflow. I've added that in my surgical linker branch (which I've rebased onto zig-16) and it significantly speeds up CI.
@Luke Boswell https://github.com/roc-lang/roc/pull/9494 hope that helps!
Thank you @Niclas Ahden
Thank YOU for pushing this forward! :bow: :smiley:
Luke Boswell said:
I spoke with Richard about making LLVM backend in eval tests opt-in and switching that on for only one CI workflow. I've added that in my surgical linker branch (which I've rebased onto zig-16) and it significantly speeds up CI.
Do we still have enough other tests that run with llvm on all operating systems?
Doing something similar to PR#9473 may also speed up those eval tests a lot
Anton said:
Do we still have enough other tests that run with llvm on all operating systems?
I think running the LLVM eval tests on a single machine is acceptable for this specific narrow case.
The eval tests run the common compiler pipeline down to LIR, and the non-LLVM eval backends still run on the OS matrix. That means OS-specific bugs during LIR lowering will be caught there.
The LLVM-specific eval tests are focussed on testing LIR -> LLVM bitcode. In MonoLlvmCodeGen.zig, there isn't much OS-specific lowering. So I think running the whole LLVM eval suite on every OS would be overkill and it's also very expensive.
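As a rough illustration of the opt-in idea (not the actual build.zig change), the gating could look like this. The backend names and the `EVAL_LLVM` variable are hypothetical placeholders; the real option lives in the Zig build and may be spelled quite differently.

```python
import os

# Hypothetical names for the non-LLVM eval backends that always run.
DEFAULT_BACKENDS = ["interpreter", "dev"]


def eval_backends(env=None):
    """Return the eval backends to run; LLVM is included only when opted in."""
    env = os.environ if env is None else env
    backends = list(DEFAULT_BACKENDS)
    # Hypothetical switch: only the one CI workflow that opts in would set it.
    if env.get("EVAL_LLVM") == "1":
        backends.append("llvm")
    return backends


print(eval_backends({}))
print(eval_backends({"EVAL_LLVM": "1"}))
```

The point is just that the OS matrix keeps running the default set, and a single workflow sets the switch to add the expensive LLVM suite.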
yeah if there's any difference it's probably an LLVM bug :sweat_smile:
but we could, just to be extra safe, start doing like "after every main merge, kick this off and tell us if it breaks" more thorough runs
Interesting idea you gave me... what if, for individual PRs, you opted into the specific CI or tests that are relevant for your PR? Then after the PR merges we kick off a larger "run all the things" job (which is much slower, and so may actually be running against multiple PRs that landed since the last run).
I'm not sure if something like that is even possible with GH actions... just a thought though
yeah I don't think it is unfortunately
I think it is possible but it requires a non-trivial amount of custom scaffolding.
In the future I would also like to set up some CI stats tracking so that we can identify which workflows fail very rarely and just run them once a day.
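A minimal sketch of what that stats tracking might compute, assuming we already have a list of (workflow name, passed) records from somewhere. The function names, the 1% threshold, and the minimum-run count are all assumptions for illustration, not anything we have agreed on.

```python
from collections import defaultdict


def workflow_counts(runs):
    """Map each workflow name to [total runs, failed runs]."""
    counts = defaultdict(lambda: [0, 0])
    for name, passed in runs:
        counts[name][0] += 1
        if not passed:
            counts[name][1] += 1
    return counts


def daily_candidates(runs, max_failure_rate=0.01, min_runs=50):
    """Workflows with enough history that fail rarely enough to run once a day."""
    candidates = []
    for name, (total, failed) in workflow_counts(runs).items():
        if total >= min_runs and failed / total <= max_failure_rate:
            candidates.append(name)
    return sorted(candidates)


# Made-up history: one stable workflow, one flaky one, one with too few runs.
history = (
    [("nix", True)] * 100
    + [("windows", True)] * 90
    + [("windows", False)] * 10
    + [("new-workflow", True)] * 3
)

print(daily_candidates(history))
```

The minimum-run guard is there so a brand-new workflow with three green runs is not demoted to daily before we know how it behaves.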
I'm gonna merge this tonight even though we still have a CI failure on Windows (might be a flake? I'm not sure) and also one on Nix (should be a quick fix for anyone with a nix machine - just need to run a command to regenerate some hashes).
there are a bunch of PRs stacked up that I want to start landing, and I don't think it's worth continuing to block those on 0.16 when it's working aside from those two issues! :smile:
ok this is merged now! A few ci steps will fail for now, but that's an ok tradeoff to accept I think :smile:
Last updated: Jun 16 2026 at 16:19 UTC