Folkert introduced me to the AArch64 backend work. Last night, after about 5 hours of staring at assembly in gdb, I found the two-character change I needed to fix 114 tests :sweat_smile:
https://github.com/roc-lang/roc/pull/5886
It was actually a 3-line change originally, but today Folkert mentioned I could simplify it
Nice work finding that.
Down to 36 test failures. We're getting close :smile:
https://github.com/roc-lang/roc/pull/5896
So excited to use this. Will be a nice speedup for silicon macs! :smiley:
a good number of them should be tests that should panic. I'm working on the setjmp/longjmp logic that we need for that
then it's the long tail of subtle segfaults, probably
we got it down to
Summary [ 506.665s] 901 tests run: 870 passed (2 slow), 31 failed, 7 skipped
I'm down to one failing test on the pi. it seems related to some other tests that still fail on the M2. I've nailed it down to
test gen_list::list_map2_different_lengths has been running for over 60 seconds
==56630==
==56630== Process terminating with default action of signal 11 (SIGSEGV)
==56630== Bad permissions for mapped region at address 0x4D17000
==56630== at 0x7B5FE7C: str.RocStr.reallocateFresh (in /tmp/.tmpqbDNId/app.so.1.0)
it's very inconsistent
==59022== Invalid write of size 8
==59022== at 0x7B48494: #UserApp_main_2777513159257922988 (in /tmp/.tmpBdWYSq/app.so.1.0)
==59022== Address 0x30 is not stack'd, malloc'd or (recently) free'd
==59022==
==59022==
==59022== Process terminating with default action of signal 11 (SIGSEGV)
==59022== Access not within mapped region at address 0x30
==59022== at 0x7B48494: #UserApp_main_2777513159257922988 (in /tmp/.tmpBdWYSq/app.so.1.0)
alright, well it's late so I should stop, but I got this down to
// Run both with and without lazy literal optimization.
{
    assert_evals_to!($src, $expected, $ty, $transform, $leak, false);
}
{
    assert_evals_to!($src, $expected, $ty, $transform, $leak, true);
}
the top one works, the bottom one does not. I suspect it's (big?) string constants? @Brendan Hansknecht any ideas here?
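A minimal sketch of the pattern in that macro body, assuming a hypothetical `eval_with` function in place of the real test harness (neither it nor `assert_both_modes!` exists in the Roc repo). The same source is evaluated once per value of the flag, so a failure in only one mode points at how that mode handles literals.

```rust
// Hypothetical stand-in for the real harness: "evaluates" `src` and returns a result.
// The `lazy_literals` flag is the only thing that differs between the two runs.
fn eval_with(src: &str, lazy_literals: bool) -> usize {
    // Both modes are expected to agree; this toy version ignores the flag
    // and simply returns the length of the source string.
    let _ = lazy_literals;
    src.len()
}

// Run the same assertion with the flag off and then on, reporting which mode failed.
macro_rules! assert_both_modes {
    ($src:expr, $expected:expr) => {{
        for lazy in [false, true] {
            assert_eq!(eval_with($src, lazy), $expected, "lazy_literals = {}", lazy);
        }
    }};
}
```

Calling `assert_both_modes!("ggg", 3);` exercises both runs; the failure message names the mode that disagreed.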
also @Luke Boswell this just looks a lot like the windows issues too
Nice work, I can have a look later and see if I can find anything.
so what is happening here is that we write the end of a string "ggg" into memory, then write its total length 28 into memory, and then copy it to some other place. The problem is that the x16 register contains NULL, so moving data there obviously fails.
0x7ff78a8468 <#UserApp_main_2777513159257922988+408> mov x17, #0x67 // #103
0x7ff78a846c <#UserApp_main_2777513159257922988+412> sturb w17, [x8, #25]
0x7ff78a8470 <#UserApp_main_2777513159257922988+416> mov x17, #0x67 // #103
0x7ff78a8474 <#UserApp_main_2777513159257922988+420> sturb w17, [x8, #26]
0x7ff78a8478 <#UserApp_main_2777513159257922988+424> mov x17, #0x67 // #103
0x7ff78a847c <#UserApp_main_2777513159257922988+428> sturb w17, [x8, #27]
0x7ff78a8480 <#UserApp_main_2777513159257922988+432> stur x8, [x29, #-104]
0x7ff78a8484 <#UserApp_main_2777513159257922988+436> mov x17, #0x1c // #28
0x7ff78a8488 <#UserApp_main_2777513159257922988+440> stur x17, [x29, #-96]
0x7ff78a848c <#UserApp_main_2777513159257922988+444> stur x17, [x29, #-88]
0x7ff78a8490 <#UserApp_main_2777513159257922988+448> ldur x8, [x29, #-104]
>0x7ff78a8494 <#UserApp_main_2777513159257922988+452> stur x8, [x16, #48]
0x7ff78a8498 <#UserApp_main_2777513159257922988+456> ldur x8, [x29, #-96]
0x7ff78a849c <#UserApp_main_2777513159257922988+460> stur x8, [x16, #56]
0x7ff78a84a0 <#UserApp_main_2777513159257922988+464> ldur x8, [x29, #-88]
0x7ff78a84a4 <#UserApp_main_2777513159257922988+468> stur x8, [x16, #64]
0x7ff78a84a8 <#UserApp_main_2777513159257922988+472> stur x16, [x29, #-128]
this could be a general register allocation bug, or maybe it's something specific to the literals
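To make the faulting address concrete, here is a small sketch of the three 8-byte stores in the listing (names are illustrative, not from the Roc codebase). The offsets are copied from the `stur x8, [x16, #48]`, `#56` and `#64` instructions; that the three words are the string's pointer, length and capacity is an assumption. With a NULL base register, the first store targets 0x30, which matches the address in the valgrind report.

```rust
// Offsets taken from the `stur x8, [x16, #48]`, `#56`, `#64` stores in the listing.
const FIELD_OFFSETS: [u64; 3] = [48, 56, 64];

// Effective address of each store for a given base register value.
// With base == 0 (x16 holding NULL) the first address is 48 == 0x30.
fn store_addresses(base: u64) -> [u64; 3] {
    FIELD_OFFSETS.map(|off| base + off)
}
```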
got it (2 bugs actually), so now
------------
Summary [ 816.378s] 902 tests run: 902 passed (1 slow), 7 skipped
it works on my machine!
wooooo!!!
does that mean it can be activated for the repl? :heart_eyes:
(on aarch64)
almost? this fails on CI
------------
Summary [ 142.098s] 314 tests run: 313 passed, 1 failed, 595 skipped
SIGABRT [ 1.484s] test_gen::test_gen gen_tags::recursive_tag_id_in_allocation_eq
error: test run failed
but it works on the pi so needs some debugging on an M1/M2 still I think
nvm it does trigger these valgrind warnings
test gen_tags::recursive_tag_id_in_allocation_eq has been running for over 60 seconds
==73198== Thread 2 gen_tags::recur:
==73198== Invalid write of size 8
==73198== at 0x7747ED0: #UserApp_x_8352987475006511248 (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198== Address 0x69cb0a0 is 0 bytes after a block of size 16 alloc'd
==73198== at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198== by 0x775E8EB: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198==
==73198== Invalid read of size 1
==73198== at 0x7748278: #UserApp_#help0_Eq_InLayout(25)_1 (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198== Address 0x69cb0a0 is 0 bytes after a block of size 16 alloc'd
==73198== at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198== by 0x775E8EB: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198==
==73198== Invalid read of size 1
==73198== at 0x774827C: #UserApp_#help0_Eq_InLayout(25)_1 (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198== Address 0x69d63a0 is 0 bytes after a block of size 16 alloc'd
==73198== at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198== by 0x775E8EB: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198==
==73198== Invalid read of size 1
==73198== at 0x77484A4: #UserApp_#help1_Dec_InLayout(25)_1 (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198== Address 0x69d63a0 is 0 bytes after a block of size 16 alloc'd
==73198== at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198== by 0x775E8EB: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198==
==73198== Invalid write of size 8
==73198== at 0x77481F0: #UserApp_y_8352987475006511248 (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198== Address 0x6a10b60 is 0 bytes after a block of size 16 alloc'd
==73198== at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198== by 0x775E8EB: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198==
==73198== Invalid read of size 1
==73198== at 0x7748268: #UserApp_#help0_Eq_InLayout(25)_1 (in /tmp/.tmpU2tUPy/app.so.1.0)
==73198== Address 0x505b090 is 0 bytes after a block of size 16 alloc'd
==73198== at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198== by 0x775E91B: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpU2tUPy/app.so.1.0)
==73198==
==73198== Invalid read of size 1
==73198== at 0x774826C: #UserApp_#help0_Eq_InLayout(25)_1 (in /tmp/.tmpU2tUPy/app.so.1.0)
==73198== Address 0x50c8170 is 0 bytes after a block of size 16 alloc'd
==73198== at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198== by 0x775E91B: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpU2tUPy/app.so.1.0)
==73198==
==73198== Invalid read of size 1
==73198== at 0x77484D4: #UserApp_#help1_Dec_InLayout(25)_1 (in /tmp/.tmpU2tUPy/app.so.1.0)
==73198== Address 0x50c8170 is 0 bytes after a block of size 16 alloc'd
==73198== at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198== by 0x775E91B: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpU2tUPy/app.so.1.0)
==73198==
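On reading these reports: "0 bytes after a block of size 16" places the access at offset 16 of a 16-byte allocation, so it starts exactly one byte range past the end. A small sketch of that bounds arithmetic (the helper name is hypothetical and not part of valgrind or Roc):

```rust
// True when an access of `size` bytes starting at `offset` stays inside
// a heap block that is `block_len` bytes long.
fn access_in_bounds(block_len: usize, offset: usize, size: usize) -> bool {
    // checked_add guards against overflow for very large offsets.
    offset.checked_add(size).map_or(false, |end| end <= block_len)
}
```

For the reports above, `access_in_bounds(16, 16, 8)` and `access_in_bounds(16, 16, 1)` are both false, while an 8-byte write at offset 8 would have been fine.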
fixed it (turns out it was an issue on x86 too). I expect https://github.com/roc-lang/roc/pull/5839 to pass now
also I can run the repl on the pi no problem, which based on the source code will use the dev backend already
#[cfg(not(target_os = "linux"))]
let (lib, main_fn_name, subs, layout_interner) =
mono_module_to_dylib_llvm(&arena, target, loaded, opt_level)
.expect("we produce a valid Dylib");
#[cfg(target_os = "linux")]
let (lib, main_fn_name, subs, layout_interner) =
mono_module_to_dylib_asm(&arena, target, loaded, opt_level)
.expect("we produce a valid Dylib");
so someone on an M1/M2 can just try that I guess
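Note that the snippet as pasted selects the backend by operating system, not by CPU architecture. A reduced sketch of the same split, using plain strings instead of the real `mono_module_to_dylib_*` functions (the function name `repl_backend` is made up for illustration):

```rust
// Illustrative only: mirrors the cfg split in the pasted snippet.
// On Linux the assembly (dev) backend is chosen, everywhere else LLVM.
fn repl_backend() -> &'static str {
    if cfg!(target_os = "linux") {
        "dev backend (asm)"
    } else {
        "llvm backend"
    }
}
```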
Yeah, works on m1
The REPL running on AArch64 dev backend (M1)! :tada:
CleanShot-2023-10-09-at-14.44.472x.png
ci passed on the PR now
With my M2 mac on beed1e3d6ea093f3cbc6281b04c655449ff88109
I get the following for cargo nextest-gen-dev --no-fail-fast
Summary [ 574.116s] 902 tests run: 900 passed (2 slow), 2 failed, 7 skipped
FAIL [ 0.164s] test_gen::test_gen gen_list::list_ends_with_empty
FAIL [ 0.163s] test_gen::test_gen gen_list::list_starts_with_empty
So close !! :octopus:
Actually... this might be a nix issue on my part, I'll re-run
Looks like we still have a couple to go
a hunch, in crates/compiler/gen_dev/src/generic64/mod.rs
can you uncomment the and instruction (the commented-out and_reg64_reg64_reg64 line) here
fn build_not(&mut self, dst: &Symbol, src: &Symbol, arg_layout: &InLayout<'a>) {
    match self.interner().get_repr(*arg_layout) {
        LayoutRepr::BOOL => {
            let dst_reg = self.storage_manager.claim_general_reg(&mut self.buf, dst);
            let src_reg = self.storage_manager.load_to_general_reg(&mut self.buf, src);
            ASM::mov_reg64_imm64(&mut self.buf, dst_reg, 1);
            ASM::xor_reg64_reg64_reg64(&mut self.buf, src_reg, src_reg, dst_reg);
            // we may need to mask out other bits in the end? but a boolean should be 0 or 1.
            // if that invariant is upheld, this mask should not be required
            // ASM::and_reg64_reg64_reg64(&mut self.buf, src_reg, src_reg, dst_reg);
            ASM::mov_reg64_reg64(&mut self.buf, dst_reg, src_reg);
        }
        x => todo!("Not: layout, {:?}", x),
    }
}
valgrind is ok with it at least so I'm not sure what else to do
admin@raspberrypi ~/roc (aarch-records)> valgrind --track-origins=yes target/debug/deps/test_gen-f2c946b9d248f5c7 list_ends_with_empty
==89204== Memcheck, a memory error detector
==89204== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==89204== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==89204== Command: target/debug/deps/test_gen-f2c946b9d248f5c7 list_ends_with_empty
==89204==
running 1 test
test gen_list::list_ends_with_empty has been running for over 60 seconds
test gen_list::list_ends_with_empty ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 785 filtered out; finished in 342.49s
==89204==
==89204== HEAP SUMMARY:
==89204== in use at exit: 3,746,224 bytes in 17,229 blocks
==89204== total heap usage: 437,310 allocs, 420,081 frees, 264,614,202 bytes allocated
==89204==
==89204== LEAK SUMMARY:
==89204== definitely lost: 3,271,420 bytes in 15,096 blocks
==89204== indirectly lost: 182,040 bytes in 21 blocks
==89204== possibly lost: 119,592 bytes in 7 blocks
==89204== still reachable: 173,172 bytes in 2,105 blocks
==89204== suppressed: 0 bytes in 0 blocks
==89204== Rerun with --leak-check=full to see details of leaked memory
==89204==
==89204== For lists of detected and suppressed errors, rerun with: -s
==89204== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
but it may be that we need to mask the boolean, which uncommenting that line should do. (I'm assuming the result of the test is an empty list?)
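A sketch of what the mask changes, written on plain integers rather than registers (function names are illustrative). XOR with 1 is a correct boolean not only if the input is exactly 0 or 1; if the register carries stray upper bits, they survive the XOR, and the extra and with 1 clears them.

```rust
// XOR with 1 flips the lowest bit. Correct as `not` when the input is exactly 0 or 1.
fn not_unmasked(src: u64) -> u64 {
    src ^ 1
}

// Same operation followed by `& 1`, which is what the commented-out `and` would add
// (dst_reg holds the constant 1 at that point). Any stray upper bits are cleared.
fn not_masked(src: u64) -> u64 {
    (src ^ 1) & 1
}
```

For clean inputs both agree: 0 maps to 1 and 1 maps to 0. For a dirty value such as 0x100, the unmasked version yields 0x101 while the masked version yields 1.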
I can look at this again later this evening around
I'm pretty busy today so I probably won't have time to debug this properly, but at least on my machine this fails at link time:
--- STDERR: test_gen::test_gen gen_list::list_ends_with_empty ---
ld: invalid r_symbolnum=32 in '/private/tmp/nix-shell.yttg98/.tmpnHbSn5/app.o'
a thought: there might be some sort of "enable more debug info" environment variable for the linker that could help here
Will try that. I'll post an objdump too, there's something weird with a reloc.
https://gist.github.com/agu-z/554090ab7c5ed5c6e83bd8b170af7e74#file-list_ends_with_empty-s-L582-L583
That looks interesting to me. That symbol (_#UserApp_#help0_Eq_InLayout(VOID)_1) only shows up in the reloc and not as a label.
Hm. It looks like we also have it for Linux, though:
100: 94000000 bl 0 <#UserApp_#help0_Eq_InLayout(23)_1>
100: R_AARCH64_CALL26 #UserApp_#help0_Eq_InLayout(VOID)_1
ld with -v just printed some more irrelevant info
well-spotted! indeed, we don't generate that function. On linux that is apparently fine because the function call is never reached
but macos does not let you get away with that (arguably the better approach)
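The check that the macOS linker effectively enforces can be sketched as a set difference: every relocation target must also appear as a defined label. This is only an illustration on symbol names as strings; the function below is hypothetical and is not how the Roc object writer or ld is implemented.

```rust
use std::collections::HashSet;

// Returns relocation targets that have no matching label, in first-seen order
// and without duplicates.
fn undefined_targets<'a>(labels: &[&'a str], reloc_targets: &[&'a str]) -> Vec<&'a str> {
    let defined: HashSet<&str> = labels.iter().copied().collect();
    let mut seen = HashSet::new();
    reloc_targets
        .iter()
        .copied()
        // Keep a target only if it is undefined and has not been reported yet.
        .filter(|t| !defined.contains(t) && seen.insert(*t))
        .collect()
}
```

Fed the labels and relocations from the objdump above, the only name returned would be `#UserApp_#help0_Eq_InLayout(VOID)_1`, the function that was never generated.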
Summary [ 46.667s] 136 tests run: 136 passed, 2 skipped
REPL tests using aarch64 dev backend (using some changes that Folkert will add soon)
https://github.com/roc-lang/roc/pull/5897 enables all tests on apple silicon, and should pass
It did! :partying_face:
Summary [ 309.074s] 903 tests run: 903 passed, 6 skipped
Folkert de Vries said:
but macos does not let you get away with that (arguably the better approach)
Agreed. What a bad error message, though. These lower-level macOS tools leave much to be desired :upside_down:
Last updated: Jul 06 2025 at 12:14 UTC