I managed to learn just enough nix to update the flakes and get a dev shell set up with the correct dependencies for llvm17 and zig12. I pushed it to the llvm17-zig12 branch.
Actually... maybe not. I'm close though
Should we go straight for llvm18 and zig13?
They are both out, right?
Yeah, I looked at that. The only thing I wasn't 100% on was inkwell
If we pin to tag 0.4.0 then it looks like that includes 17, but 18 is still sitting in a CI branch and I'm guessing not ready
inkwell github says it supports llvm 18
I know it's in the README, but I couldn't make cargo happy. It was saying it didn't exist
Ah, I see, that is master. 0.4.0 is only to llvm 17
I'm currently fumbling around trying to understand rust's llvm-sys, and why it needs a very specific binary to check the llvm version. That binary isn't provided in the newer llvmPackages_18.clangUseLLVM
Yeah, looks like they have everything updated for 18, but haven't cut a release yet
Honestly, I would just pin the release to a git commit for now, update to llvm18 and avoid the need for two separate updates.
Or fork and tag in a fork. I know we've done this before, not sure the exact mechanism though.
I know the zig upgrade to 12/13 was quite painless before the recent refcount changes. I can motor through the bulk of that, but there's some low level stuff I can't fix up. One example was we are using the overflow flag from a builtin (I think) and it's no longer available in the later versions.
I spent a bit of time trying to update our builtins for the roc-wasm4 stuff, and that was the point I got stuck and deferred for a later day.
I figure, first stop is just getting our nix dependencies happy and working from there.
I don't have any experience with this kind of upgrade... just learning as I go
Can we just track master maybe, and as we get closer we can pin to a specific tag, or maybe a later release?
From the llvm-sys crate docs:
llvm-sys requires a copy of llvm-config corresponding to the desired version of LLVM to build: llvm-config allows it to probe what libraries need to be linked and what compiler options are required.
Binary distributions of LLVM (including the official release packages) generally do not include a copy of llvm-config, making them unsuited to use for building programs with llvm-sys. Known exceptions (that do include a copy of llvm-config) include:
* Official Debian/Ubuntu packages from apt.llvm.org
* Arch Linux's llvm package
If a suitable binary package is not available for your platform, compiling from source is usually the best option. See Compiling LLVM in this document for details.
How would we work around this if nix doesn't provide that binary?
Can we just track master maybe, and as we get closer we can pin to a specific tag, or maybe a later release?
Yeah, I think that's fine. At a minimum it's fine for local testing
How would we work around this if nix doesn't provide that binary?
Should be part of the llvm nix package I believe
nvm, I was using the wrong nix package and output. It's llvmPackages_18.libllvm.dev not llvmPackages_18.clangUseLLVM.out
Great, now I'm through to actual roc code issues.
There isn't a nix package for zig 13 yet. But I'll just leave it as 12 for now as they are very similar, and it should be really easy to upgrade when there is a release.
Ah, then I guess you have to use llvm17. I think our version of llvm has to match zig's... though it may be fine if our version of llvm is newer than zig's. Cause llvm probably can load old llvm ir.
Also pushed to a new branch that is less confusing https://github.com/roc-lang/roc/tree/upgrade-llvm-zig
I forgot about that
Ok, based on my 2 minutes research... We have zig producing LLVM bitcode. Zig 12 produces LLVM 17.0.6, so I think we should keep them paired just in case.
Cause llvm probably can load old llvm ir.
does this help? https://releases.llvm.org/18.1.0/docs/ReleaseNotes.html#changes-to-the-llvm-ir
I just searched all of the builtins bitcode .ll files and none of these are used. So maybe we should be fine.
Ok, so back to a happy place. :tada:
I've got to the following point;
Now I'm up to cargo build crashing at the builtins, which is because of the obvious changes between zig versions.
@John Murray -- it's been a while, but wondering if you would be able to skim through my flake changes and let me know if there is anything I've probably broken?? https://github.com/roc-lang/roc/tree/upgrade-llvm-zig
zig 13 is on the latest nix unstable which we can easily use by updating to a more recent commit here. It's just called zig, not zig_0_13. I'd prefer to use that over the overlay because it's less complex.
Ok, sounds good.
What is the difference between a release like 24.05 and unstable? Is it like tracking main but we can still pin to a specific commit? With the intention to switch to a release later when it's available.
Is it like tracking main but we can still pin to a specific commit?
Yes, exactly. I think the main benefit of using e.g. 24.05 is better caching and build times.
@Anton -- I had a crack at switching back to using the unstable branch. I had a few issues though. For some reason that llvm_18 package doesn't include the same folder structure. It doesn't include a /lib, and it's missing lld. I've just left it using zig-overlay for now.
It has multiple outputs, I would try llvm_18.dev or llvm_18.lib
Quick update on this.
To get the compiler compiling, all we needed to do was comment out a few calls to different LLVM optimisation passes, as these are no longer exposed in the new Inkwell API. I ran the gen-llvm tests and had only 1 failure. Our examples seem to run fine. But I think we should investigate what impact the API change has, and whether we need to do something differently to turn optimisations on, or if they're just on by default now.
The next issue with this change is that zig has changed its cli arguments, and the --mod and --deps arguments are no longer available. We used these to pass in the zig builtins "glue" code for each of the platform's hosts.
Thanks to @Ryan Barth who helped me come up with a really simple way to work around this issue. I had been exploring using zig packages, and build scripts and all kinds of things but thankfully these aren't necessary.
We can simply use a bash script and copy the zig src files into the platform directories. There doesn't need to be any big changes, and this also plays nicely with the refactor host PR. We can gitignore the copied files, and when the script re-runs it will overwrite these.
So I'm just working on making this script and having it called once before the tests are run. Then we should be able to run the full test suite and see if anything breaks.
Ok, so I think I have all of the fiddly and tedious bits out of the way. We've upgraded the test platforms, zig builtins etc.
I've left the CI parts to work through with Anton later, as we may want to do things differently -- but that should be really straightforward I think.
I'm only seeing 1 test failure when running gen-test-llvm, num_to_str_f32 -- I'm thinking this will be a simple fix, but haven't really looked at the IR or anything yet. (not that I know what I'm looking for :sweat_smile: )
---- gen_num::num_to_str_f32 stdout ----
thread 'gen_num::num_to_str_f32' panicked at crates/compiler/test_gen/src/helpers/llvm.rs:575:13:
assertion `left == right` failed: LLVM test failed
left: "340282350000000000000000000000000000000"
right: "340282346638528860000000000000000000000"
failures:
gen_num::num_to_str_f32
test result: FAILED. 1293 passed; 1 failed; 17 ignored; 0 measured; 0 filtered out; finished in 612.91s
The only other tests that are failing now are in roc_glue which all look to be related to the same issue. I think it's related to zig builtin linking -- maybe zig changed the way it does things, or something about removing dead code maybe.
$ ./target/debug/roc build crates/glue/tests/fixtures/basic-record/app.roc
🔨 Rebuilding platform...
Undefined symbols for architecture arm64:
"_roc_getppid", referenced from:
_roc_builtins.utils.expect_failed_start_shared_file in roc_appNDMFxA.o
_roc_builtins.utils.read_env_shared_buffer in roc_appNDMFxA.o
"_roc_mmap", referenced from:
_roc_builtins.utils.expect_failed_start_shared_file in roc_appNDMFxA.o
_roc_builtins.utils.read_env_shared_buffer in roc_appNDMFxA.o
"_roc_shm_open", referenced from:
_roc_builtins.utils.expect_failed_start_shared_file in roc_appNDMFxA.o
_roc_builtins.utils.read_env_shared_buffer in roc_appNDMFxA.o
ld: symbol(s) not found for architecture arm64
crates/glue/tests/fixtures/basic-record/app: is already signed
thread 'main' panicked at crates/compiler/build/src/program.rs:1031:17:
not yet implemented: gracefully handle `ld` (or `zig` in the case of wasm with --optimize) returning exit code Some(1)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Also, I would appreciate assistance from anyone with rust or LLVM knowledge to look at the changes in crates/compiler/gen_llvm/src/llvm/build.rs and help me investigate what the changes mean.
I think this is the relevant part of the release notes.
- The legacy optimization pipeline (PassManagerBuilder.h) has been removed. See the new pass manager docs for how to use the new pass manager APIs.
I bet that zig changed their float printing algorithm and how it rounds
So I bet the test case is incorrect now.
What a guess. Looks like you're correct :smile:
$ zig build run
All your 340282346638528860000000000000000000000 are belong to us.
Run `zig build test` to run the tests.
(nix:nix-shell-env) 192-168-1-103:zig-test-13 luke$ zig build-exe src/main.zig && ./main
All your 340282350000000000000000000000000000000 are belong to us.
I work with floats a lot in ml....that just looked correct to be the minimum length float that still would parse back to the original value
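A minimal Rust sketch of the "minimum length float that still parses back" idea. It relies on Rust's `Display` for floats producing the shortest digits that round-trip, and compares printing `f32::MAX` directly against printing it after widening to `f64`. The helper names are made up for illustration, and this says nothing about how Zig implements its formatting; it only shows that the two strings from the failing test are consistent with these two ways of printing the same value.

```rust
/// Shortest decimal string that parses back to exactly the same f32.
/// Rust's `Display` for floats has this round-trip property.
fn shortest_f32(x: f32) -> String {
    format!("{}", x)
}

/// The same value, widened to f64 before printing. More digits are needed
/// because f64 has much closer neighbours to tell apart.
fn widened_f32(x: f32) -> String {
    format!("{}", x as f64)
}

/// True if `s` parses back to exactly `x`.
fn round_trips(s: &str, x: f32) -> bool {
    s.parse::<f32>().map(|y| y == x).unwrap_or(false)
}
```

Both `shortest_f32(f32::MAX)` and `widened_f32(f32::MAX)` parse back to `f32::MAX`; the first simply stops at the fewest significant digits that still identify the value.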
Also, I would appreciate assistance from anyone with rust or LLVM knowledge to look at the changes
I would try to find other rust projects that use inkwell and that have already performed this update to the new pass manager.
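For context on what "the new pass manager" looks like from the caller's side: LLVM's new pass manager takes a textual pipeline (the same syntax as `opt -passes=...`), and as far as I can tell inkwell exposes that through `Module::run_passes` on recent LLVM versions. The sketch below only builds such a pipeline string. `OptLevel` and `pipeline` are hypothetical names for illustration, not anything from the roc codebase or the PR.

```rust
/// Optimisation levels as spelled in new-pass-manager textual pipelines.
#[derive(Clone, Copy)]
enum OptLevel {
    O0,
    O1,
    O2,
    O3,
}

/// Build a pipeline string such as "default<O3>,globaldce".
/// `extra_passes` are appended after the default pipeline, in order.
fn pipeline(level: OptLevel, extra_passes: &[&str]) -> String {
    let default = match level {
        OptLevel::O0 => "default<O0>",
        OptLevel::O1 => "default<O1>",
        OptLevel::O2 => "default<O2>",
        OptLevel::O3 => "default<O3>",
    };
    let mut parts = vec![default.to_string()];
    parts.extend(extra_passes.iter().map(|p| p.to_string()));
    parts.join(",")
}
```

A string like `pipeline(OptLevel::O3, &["globaldce"])` is the kind of value that would be handed to the pass-running API, assuming the module and target machine are already set up.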
Thanks to @Folkert de Vries we're down to one issue blocking a run at CI now I think -- just the "Undefined symbols for architecture arm64" thing above which I haven't looked into yet.
We got the optimisation passes back (or at least hacked in) and working for llvm, and fixed the gen_num::num_to_str_f32 test.
Ok, 100% tests are :check: passing on my mac :tada:
I might give CI a run just to see if everything passes there too.
Still need to clean up the LLVM stuff, mostly ripping out the unused things, and also fixing up the pointer types which are giving us a warning now -- but should be pretty straightforward.
We are still using the zig and zls overlays... I couldn't get the nixpkgs thing working when I tried. I might defer changing that back to someone who knows what they are doing.
Also, there is documentation and various references to the old LLVM 16 which need to be updated
Ok, CI fails with this. Which I think means I've done something wrong in nix/builder.nix
$ nix build
warning: Git tree '/Users/luke/Documents/GitHub/roc' is dirty
warning: input 'rust-overlay' has an override for a non-existent input 'flake-utils'
error: builder for '/nix/store/gi3j8vs8qq54zy5vv0zji6h2l6cdrqsv-roc_cli-0.0.1.drv' failed with exit code 101;
last 25 log lines:
> Compiling roc_bitcode_bc v0.0.1 (/private/tmp/nix-build-roc_cli-0.0.1.drv-0/source/crates/compiler/builtins/bitcode/bc)
> Compiling packed_struct v0.10.1
> Compiling inkwell v0.4.0 (https://github.com/TheDan64/inkwell?rev=89e06af#89e06af9)
> error: No suitable version of LLVM was found system-wide or pointed
> to by LLVM_SYS_180_PREFIX.
>
nix develop works for me, so it's just a nix setup thing.
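A rough Rust sketch for debugging the "No suitable version of LLVM was found" error above: check that the prefix named by LLVM_SYS_180_PREFIX actually contains an llvm-config that reports major version 18. The assumption that llvm-config sits at `<prefix>/bin/llvm-config` matches my understanding of how llvm-sys searches, but treat it as an assumption. The function names are made up for illustration.

```rust
use std::env;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Where llvm-config is assumed to live under a given LLVM prefix.
fn llvm_config_in(prefix: &Path) -> PathBuf {
    prefix.join("bin").join("llvm-config")
}

/// Extract the major version from `llvm-config --version` output, e.g. "18.1.8".
fn major_version(version_output: &str) -> Option<u32> {
    version_output.trim().split('.').next()?.parse().ok()
}

/// Check that LLVM_SYS_180_PREFIX points at an LLVM 18 install with llvm-config.
fn check_llvm_18() -> Result<PathBuf, String> {
    let prefix = env::var("LLVM_SYS_180_PREFIX")
        .map_err(|_| "LLVM_SYS_180_PREFIX is not set".to_string())?;
    let tool = llvm_config_in(Path::new(&prefix));
    let out = Command::new(&tool)
        .arg("--version")
        .output()
        .map_err(|e| format!("could not run {}: {}", tool.display(), e))?;
    let text = String::from_utf8_lossy(&out.stdout);
    match major_version(&text) {
        Some(18) => Ok(tool),
        other => Err(format!("expected LLVM 18, got {:?}", other)),
    }
}
```

Calling `check_llvm_18()` inside the failing build environment would show whether the variable is unset, points at an output without llvm-config, or points at the wrong LLVM version.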
I think I (actually Claude) found a solution.
Awesome thing about the llvm 18 upgrade. Nice perf gain:
Summary
./cc-fluxsort ran
1.20 ± 0.02 times faster than ./roc-builtinsort-new
1.36 ± 0.02 times faster than ./cc-quadsort
1.64 ± 0.02 times faster than ./roc-builtinsort-old
Not quite up to speed with raw c++ on m1 mac, but now in the same ballpark.
I think we also just run more passes now maybe? anyway that is really nice
Update with this PR.
It's passing all tests on macos in the nix shell for me locally. I've kicked off CI to see if it passes there too.
Our CI machines don't have ZIG 13 and LLVM 18 installed, so I expect the non-nix machines to fail.
I've cleaned up all of the LLVM issues, and updated the stray references to LLVM 16 -- there's some work here to upgrade the CI machines and build scripts. I don't think I can make progress on this part without Anton.
There is one blocking issue on linux that needs further investigation. Unfortunately, I don't think I can make much progress to investigate this. It looks to be related to absolute relocations in the surgical linker. @Brendan Hansknecht and I thought we fixed it by adding an llvm globaldce pass, but it's still an issue.
___________
The roc command:
"/home/lb-dev/Documents/Github/roc/target/release/roc --max-threads=1 /home/lb-dev/Documents/Github/roc/examples/helloWorld.roc --"
had unexpected stderr:
The surgical linker currently has issue #3609 and would fail linking your app.
Please use `--linker=legacy` to avoid the issue for now.
___________
TLDR - we're really close, one linux surgical linker issue and then just CI admin to go. :fingers_crossed:
New linux specific issue
thread 'cli_run::expects_dev_and_test' panicked at crates/cli/tests/cli_run.rs:153:13:
___________
The roc command:
"/home/lb-dev/Documents/Github/roc/target/debug/roc dev --max-threads=1 /home/lb-dev/Documents/Github/roc/crates/cli/tests/expects/expects.roc --"
had unexpected stderr:
🔨 Rebuilding platform...
thread 220007 panic: cast causes pointer to be null
/nix/store/5yk32f31879lfsnyv0yhl0af0v2dz9dz-zig-0.13.0/lib/zig/std/start.zig:497:40: 0x55555560ee46 in main (host)
return callMainWithArgs(@as(usize, @intCast(c_argc)), @as([*][*:0]u8, @ptrCast(c_argv)), envp);
^
???:?:?: 0x7ffff7df214d in ??? (libc.so.6)
Unwind information for `libc.so.6:0x7ffff7df214d` was not available, trace may be incomplete
___________
I think I found the issue -- hard to know until I get the changes across to my linux machine and run a full test suite. But basically zig is stricter now with exporting symbols I think -- and the ABI of those symbols. So you can't have a host with signature pub fn main() !void { any longer, it needs to be pub export fn main() u8 to link correctly with the roc side.
I've got most of the CI tests passing on macos/linux now.
I'm down to the wasm repl tests and we've got a bunch of errors like
thread '<unnamed>' panicked at crates/repl_wasm/src/repl.rs:265:86:
called `Result::unwrap()` on an `Err` value: ParseError { offset: 469912, message: "Unknown relocation type 0x b" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
I haven't figured out where this is coming from.
My best guess is that zig 13 behaves differently -- and the way we are using zig to build the test platform, get wasi-libc, and link things together needs to be investigated.
I've tested the cli repl manually and it looks like there is no issue. I haven't figured out how to spool up a wasm repl to test that manually.
I can repro the test failures locally in nix using bash crates/repl_test/test_wasm.sh.
Lots of progress :tada: I'll try to install the CI dependencies today
zig 13 and llvm 18 have been installed on all machines
Run curl.exe -f -L -O -H "Authorization: token ***" https://github.com/roc-lang/llvm-package-windows/releases/download/v18.1.8/LLVM-18.1.8-win64.7z
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (22) The requested URL returned error: 404
@Anton -- when you get some time, can you please load LLVM 18 for windows onto roc-lang/llvm-package-windows.
I commented all the jobs out in CI manager (GH workflows) except the nix-linux-x64 and nix-macos-apple-silicon jobs. I just want to see if we can get this PR to pass all the tests on those first and add the others back once we know it's working correctly.
I've spent some time looking at this PR today. I merged main, and ran through the tests and pushed to see how it runs against CI.
My conclusion is that we should wait until after the build-host PR lands before trying to land this. There are issues that affect both, and fixing them over there first, and then merging will be much simpler.
can you please load LLVM 18 for windows onto roc-lang/llvm-package-windows
Just merged main, and ran a full test using nix on macos and linux with both passing.
$ nix develop
$ cargo test --release
Ok, merged main in... let's see how CI fares. It's looking pretty good :smiley:
We talked about making a zig package in https://roc.zulipchat.com/#narrow/channel/304902-show-and-tell/topic/Advent.20of.20Code.20platform/near/481816535 @Jasper Woudenberg
I think this is a great idea, though we can probably live with my workaround for the purpose of landing this change. Essentially, I've made a rust crate that copies the zig glue files into each of the test platforms once per test run. It works, but it would be nice to use the zig ecosystem properly and clean up these zig test platforms (and also improve the zig platform dev experience).
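A minimal sketch of what a "copy the glue files into each test platform" step could look like in Rust. This is not the actual crate from the PR: the function name, the `glue` subdirectory, and the choice to copy only `.zig` files are all assumptions made for illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copy every `.zig` file from `glue_dir` into `<platform>/glue/` for each
/// platform directory, overwriting copies left by an earlier run.
/// Returns the number of files written.
fn copy_glue(glue_dir: &Path, platforms: &[&Path]) -> io::Result<usize> {
    let mut written = 0;
    for platform in platforms {
        // Hypothetical destination; the copies would be gitignored.
        let dest = platform.join("glue");
        fs::create_dir_all(&dest)?;
        for entry in fs::read_dir(glue_dir)? {
            let path = entry?.path();
            if path.extension().map_or(false, |ext| ext == "zig") {
                if let Some(name) = path.file_name() {
                    // fs::copy replaces the destination file if it exists.
                    fs::copy(&path, dest.join(name))?;
                    written += 1;
                }
            }
        }
    }
    Ok(written)
}
```

Because the copy overwrites on every run, re-running it before the test suite keeps the platform copies in sync with the source glue files without any cache logic.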
Everything is :check: for me locally on my mac.
I think all that's left to land this is some code review, minor cleanup, and fixing the GH actions (making sure the zig 13, llvm deps are correct etc).
I'll check it out :)
Ok, I've got this PR :check: on both apple silicon macos, and x64 linux running in Nix. I think we are on the home stretch. :smiley:
@Brendan Hansknecht we need to merge the latest changes.
I haven't got around to investigating why the nix CI's are failing tests now. I thought they were good.
I'm expecting the only failure that should remain is CI manager / start-ubuntu-x86-64-tests / test zig, rust, wasm... that looks to be failing with this which needs investigation.
+ zig test -target wasm32-wasi-musl -O ReleaseFast src/main.zig --test-cmd ../../../../target/release/roc_wasm_interp --test-cmd-bin
thread 'main' panicked at crates/wasm_interp/src/wasi.rs:142:26:
not yet implemented: WASI fd_fdstat_get([I32(2), I32(16777168)])
I'm still messing around with basic-cli and basic-webserver, just wanted to give you an update
Did we leave in some sort of debug printing that might call fd_fdstat_get?
Accidentally compiling it into the zig bitcode for wasm?
Oh, also looks like we have special cased stdout for fd_fdstat_get, but don't have anything for stderr. So if it isn't an accidental extra debug print, looks easy to add the functionality.
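A hedged sketch of handling fd_fdstat_get for both stdout and stderr. This is not the code in crates/wasm_interp; the function shape and the plain byte-slice memory are assumptions for illustration. The constants and the 24-byte fdstat layout follow my reading of the WASI preview1 spec (filetype at offset 0, flags at 2, base rights at 8, inheriting rights at 16).

```rust
const ERRNO_SUCCESS: u16 = 0;
const ERRNO_BADF: u16 = 8;
const ERRNO_FAULT: u16 = 21;
const FILETYPE_CHARACTER_DEVICE: u8 = 2;
const RIGHTS_FD_WRITE: u64 = 1 << 6;
const FDSTAT_SIZE: usize = 24;

/// Write a WASI `fdstat` for stdout (1) or stderr (2) into `memory` at `ptr`.
/// Returns a WASI errno; any other fd is reported as a bad descriptor here.
fn fd_fdstat_get(memory: &mut [u8], fd: u32, ptr: usize) -> u16 {
    if fd != 1 && fd != 2 {
        return ERRNO_BADF;
    }
    let end = match ptr.checked_add(FDSTAT_SIZE) {
        Some(end) => end,
        None => return ERRNO_FAULT,
    };
    let buf = match memory.get_mut(ptr..end) {
        Some(buf) => buf,
        None => return ERRNO_FAULT, // struct would not fit in memory
    };
    buf.fill(0); // flags, padding, and inheriting rights stay zero
    buf[0] = FILETYPE_CHARACTER_DEVICE;
    buf[8..16].copy_from_slice(&RIGHTS_FD_WRITE.to_le_bytes());
    ERRNO_SUCCESS
}
```

With something along these lines, the call from the panic message (fd 2 plus a pointer into linear memory) would get the same answer as stdout instead of hitting the unimplemented branch.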
don't see any debug prints. Let me try to push a fix
Ok. Pushed a handful of changes. I'm hopeful
Benchmarks are failing in a way that doesn't make sense to me: https://github.com/roc-lang/roc/actions/runs/12288409183/job/34292167154?pr=6921
It is failing in a function that no longer exists on the branch somehow:
cannot find `glue.zig`. Check the source code in find_zig_glue_path() to show all the paths I tried.
Location: crates/compiler/build/src/link.rs:95:5
oh, it's in bench-folder-main
...that makes sense
So now :fingers_crossed:
@Anton any idea for https://github.com/roc-lang/roc/actions/runs/12288516921/job/34292459513?pr=6921
zig build ir -Drelease=true failed with:
ir
+- install generated to builtins-host.ll
+- zig build-obj builtins-host ReleaseFast native-macos failure
error: error: CacheUnavailable
looks to only happen for nix macos on CI. Note, zig cache dir is now .zig-cache
Normally I just restart the runs when this happens and it fixes itself
I've seen it multiple times in a row now, but hopefully goes away
Ah that's a pain
Yeah, seems to be hit 100% of the time on the mac builder during nix-build
Also, I'm hopeful that's soon to be the last error...but hitting relocation issues in wasm and benchmarking issue due to glue.zig being removed. I think I have a fix for both, but hard to say.
I have been poking at this for a while and am not having luck.
nix-build on apple silicon is failing with: https://github.com/roc-lang/roc/actions/runs/12290625251/job/34298069385?pr=6921
zig build ir -Drelease=true failed with:
ir
+- install generated to builtins-host.ll
+- zig build-obj builtins-host ReleaseFast native-macos failure
error: error: CacheUnavailable
I can't repro locally. Probably related to https://github.com/ziglang/zig/issues/20501
the benchmark job is failing due to main being unable to find glue.zig: https://github.com/roc-lang/roc/actions/runs/12290625251/job/34298069754?pr=6921
I get that we removed the file, but there are multiple search paths in find glue and I'm pretty sure it should be finding the file.
Lastly, wasm_test_platform.wasm is broken with the new version of zig building it. I have no idea why, but it is now emitting new types of relocations we don't support. I tried toggling PIC related flags and thought I had figured it out, but it is still broken. I can repro this one locally, but have not found a fix.
https://github.com/roc-lang/roc/actions/runs/12290625251/job/34298070201?pr=6921
Overall, feels close, but it is hard to tell if these issues will be minor flag changes or major drags to figure out.
I think this will fix the benchmarking issue. It needs to land in main such that the main reference loaded for benchmarking is correct:
https://github.com/roc-lang/roc/pull/7349
So ended up not fully fixing the benchmarking issue. Realized that the current failure is due to all benchmarks running in the context of the new nix flake. This means that all benchmarks are trying to run with zig 0.13.0. Given main is only at zig 0.11.0 this breaks. I don't think this is worth fixing now though (just think it is too much hassle for something that only matter every zig upgrade). So I think once the other two issues are fixed, we should land this PR with the benchmark failing.
@Anton when you get the chance can you look at the macos failure with the unavailable cache? I feel like you will be most likely to figure out how to workaround that.
I'm digging into gen-wasm issue now. :fingers_crossed:
I think gen wasm should be fixed now
So if we can fix the mac cache issue, I think we can land
Figured out the mac cache issue :tada:
That said, turns out that wasm isn't fully fixed. Still has 1 test failing after the previous fix.
Which test?
linking_with_dce
reverting the platform change fixes it
So :fingers_crossed: that this next ci run passes everything but benchmarking
I thought I had to change that for zig 13, maybe that was only required in another platform -- or a specific OS/Arch
I mean... I think it should be required
Actually, I guess not. Cause zig is controlling the main. There is no c main needed, right?
So I think for anything wasi, this should be fine
I guess for anything called into by js, it is probably required to do things differently
I suspect it may be something like we build wasm_linking_test_host.zig using zig cli into an object and not a static library.
in this case, we build it into an exe
so that is probably why main() !void is allowed
Yeah, something like that. It's building into an object file with build-obj, and then into an exe using build-exe in two steps. I suspect where we build a static library zig doesn't like the !void
Digging into that stuff was hella confusing, the combination of different options, exporting vs not exporting, static library vs executable, zig 0.11 vs zig 0.13.
makes sense. Expects to follow c-abi cause static lib is not expected to have a main
All the wasm tests pass for me locally.
I'm going to read through all the changes and see if we've missed anything obvious
I haven't found anything blocking...
there was one TODO left where we verify the LLVM IR, should we measure the impact on performance?
we need to verify/test some of the building from source README changes, like is the llvm-18 package available on fedora etc
The sort -> insertion change in crates/compiler/builtins/benchmark-dec.zig may have broken things, but I guess we can fix this in a follow up and also resolve the issue preventing the benchmark CI runner from completing.
Also, just making notes here to summarise as that PR is so massive my browser struggles to find things
We did it :tada: :smiley:
Screenshot 2024-12-13 at 15.09.49.png
there was one TODO left where we verify the LLVM IR, should we measure the impact on performance?
I think it is fine to leave in the todo for now. I added more color in the PR.
let's merge this!
Screenshot 2024-12-13 at 15.11.18.png
:tada:
Looks like it is time to brush off my static Roc branch
Don't need to ask me twice :smiley:
Thank you @Brendan Hansknecht @Anton @Sam Mohr @Folkert de Vries for all the work to land this.
For anyone interested, this change ;
We also may have broken some things or made a typo in the building from source documentation, so if you are able to do that (especially any non-nix users on different os/arch's) and can report any success/failures, that would be really helpful.
Also apologies in advance @Anton if our scripts for making a release are broken at all, I'm pretty sure they're all good, but I couldn't fully check them. I figure we can give it a test run with the prerelease for the new PI basic-cli
That static linking could be really nice for weird deployments!
I'm thankful to the people that did real work on this and made me look like I did something :sweat_smile:
Moral support counts :blush:
Was the inline expect change part of this PR? Or how is it possible to statically link a roc binary with an inline expect?
inline expects are not emitted in output object files or libraries
They are only emitted when running with roc main.roc or roc dev main.roc
Luke Boswell said:
We also may have broken some things or made a typo in the building from source documentation, so if you are able to do that (especially any non-nix users on different os/arch's) and can report any success/failures, that would be really helpful.
Linux x86_64 checking in (Arch BTW). Successfully compiled roc with llvm 18.1.8 and zig 0.13.0 :sign_of_the_horns: . Got a mostly passing test suite. It hung with
test cli_tests::test_platform_simple_zig::module_params_multiline_pattern has been running for over 60 seconds
I also had a few
test cli_tests::test_platform_basic_cli::combine_tasks_with_record_builder ... ignored, broken when running in nix CI, TODO replace with a zig test platform
despite not using nix.
Last updated: Jul 06 2025 at 12:14 UTC