Stream: compiler development

Topic: AArch64 dev backend


view this post on Zulip Agus Zubiaga (Oct 04 2023 at 18:41):

Folkert introduced me to the AArch64 backend work. Last night, after about 5 hours of staring at assembly in gdb, I found the two-character change I needed to fix 114 tests :sweat_smile:

https://github.com/roc-lang/roc/pull/5886

view this post on Zulip Agus Zubiaga (Oct 04 2023 at 18:42):

It was actually a 3-line change originally, but today Folkert mentioned I could simplify it

view this post on Zulip Luke Boswell (Oct 04 2023 at 19:56):

Nice work finding that.

view this post on Zulip Agus Zubiaga (Oct 08 2023 at 04:07):

Down to 36 test failures. We're getting close :smile:

https://github.com/roc-lang/roc/pull/5896

view this post on Zulip Luke Boswell (Oct 08 2023 at 04:14):

So excited to use this. Will be a nice speedup for Apple silicon Macs! :smiley:

view this post on Zulip Folkert de Vries (Oct 08 2023 at 07:46):

a good number of them are probably tests that are expected to panic. I'm working on the setjmp/longjmp logic that we need for that
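
As a rough illustration of what "tests that should panic" need from the harness: the runner has to regain control after the code under test aborts. The sketch below is not the dev backend's setjmp/longjmp implementation. It swaps in Rust's std::panic::catch_unwind as a stand-in for that control transfer, and run_expecting_panic is a hypothetical helper name.

```rust
use std::panic;

/// Hypothetical helper: runs `test_body` and reports whether it panicked.
/// `catch_unwind` plays the role that setjmp/longjmp would play for
/// generated code: control returns to the harness instead of the process dying.
fn run_expecting_panic<F>(test_body: F) -> bool
where
    F: FnOnce() + panic::UnwindSafe,
{
    panic::catch_unwind(test_body).is_err()
}
```

A test that is expected to panic would pass when this returns true and fail when the body returns normally.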

view this post on Zulip Folkert de Vries (Oct 08 2023 at 07:46):

then it's the long tail of subtle segfaults, probably

view this post on Zulip Folkert de Vries (Oct 08 2023 at 12:16):

we got it down to

Summary [ 506.665s] 901 tests run: 870 passed (2 slow), 31 failed, 7 skipped

view this post on Zulip Folkert de Vries (Oct 08 2023 at 20:45):

I'm down to one failing test on the pi. it seems related to some other tests that still fail on the M2. I've nailed it down to

test gen_list::list_map2_different_lengths has been running for over 60 seconds
==56630==
==56630== Process terminating with default action of signal 11 (SIGSEGV)
==56630==  Bad permissions for mapped region at address 0x4D17000
==56630==    at 0x7B5FE7C: str.RocStr.reallocateFresh (in /tmp/.tmpqbDNId/app.so.1.0)

view this post on Zulip Folkert de Vries (Oct 08 2023 at 21:27):

it's very inconsistent

==59022== Invalid write of size 8
==59022==    at 0x7B48494: #UserApp_main_2777513159257922988 (in /tmp/.tmpBdWYSq/app.so.1.0)
==59022==  Address 0x30 is not stack'd, malloc'd or (recently) free'd
==59022==
==59022==
==59022== Process terminating with default action of signal 11 (SIGSEGV)
==59022==  Access not within mapped region at address 0x30
==59022==    at 0x7B48494: #UserApp_main_2777513159257922988 (in /tmp/.tmpBdWYSq/app.so.1.0)

view this post on Zulip Folkert de Vries (Oct 08 2023 at 21:57):

alright, well it's late so I should stop but I got this down to

        // Run both with and without lazy literal optimization.
        {
            assert_evals_to!($src, $expected, $ty, $transform, $leak, false);
        }
        {
            assert_evals_to!($src, $expected, $ty, $transform, $leak, true);
        }

the top one works, the bottom one does not. I suspect it's (big?) string constants? @Brendan Hansknecht any ideas here?

also @Luke Boswell this just looks a lot like the windows issues too

view this post on Zulip Luke Boswell (Oct 08 2023 at 22:01):

Nice work, I can have a look later and see if I can find anything.

view this post on Zulip Folkert de Vries (Oct 08 2023 at 22:08):

so what is happening here is that we write the end of a string "ggg" into memory, then write its total length 28 into memory, and then copy it to some other place. The problem is that the x16 register contains NULL, so the store through it at offset 48 (address 0x30, matching the valgrind report) fails.

   0x7ff78a8468 <#UserApp_main_2777513159257922988+408>    mov     x17, #0x67                      // #103
   0x7ff78a846c <#UserApp_main_2777513159257922988+412>    sturb   w17, [x8, #25]
   0x7ff78a8470 <#UserApp_main_2777513159257922988+416>    mov     x17, #0x67                      // #103
   0x7ff78a8474 <#UserApp_main_2777513159257922988+420>    sturb   w17, [x8, #26]
   0x7ff78a8478 <#UserApp_main_2777513159257922988+424>    mov     x17, #0x67                      // #103
   0x7ff78a847c <#UserApp_main_2777513159257922988+428>    sturb   w17, [x8, #27]
   0x7ff78a8480 <#UserApp_main_2777513159257922988+432>    stur    x8, [x29, #-104]
   0x7ff78a8484 <#UserApp_main_2777513159257922988+436>    mov     x17, #0x1c                      // #28
   0x7ff78a8488 <#UserApp_main_2777513159257922988+440>    stur    x17, [x29, #-96]
   0x7ff78a848c <#UserApp_main_2777513159257922988+444>    stur    x17, [x29, #-88]
   0x7ff78a8490 <#UserApp_main_2777513159257922988+448>    ldur    x8, [x29, #-104]
  >0x7ff78a8494 <#UserApp_main_2777513159257922988+452>    stur    x8, [x16, #48]
   0x7ff78a8498 <#UserApp_main_2777513159257922988+456>    ldur    x8, [x29, #-96]
   0x7ff78a849c <#UserApp_main_2777513159257922988+460>    stur    x8, [x16, #56]
   0x7ff78a84a0 <#UserApp_main_2777513159257922988+464>    ldur    x8, [x29, #-88]
   0x7ff78a84a4 <#UserApp_main_2777513159257922988+468>    stur    x8, [x16, #64]
   0x7ff78a84a8 <#UserApp_main_2777513159257922988+472>    stur    x16, [x29, #-128]

this could be a general register allocation bug, or maybe it's something specific to the literals
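
A small sketch of the address arithmetic behind that crash, assuming only what the disassembly shows: stur computes its target as the base register plus a signed immediate. The function name effective_address is made up for illustration.

```rust
/// Effective address of an AArch64 `stur xN, [base, #offset]` store:
/// the base register value plus the signed immediate offset.
fn effective_address(base: u64, offset: i64) -> u64 {
    // A negative offset becomes a two's-complement u64, so wrapping_add
    // gives the same result as signed addition on the register value.
    base.wrapping_add(offset as u64)
}
```

With x16 = 0, the three stores at offsets 48, 56 and 64 target 0x30, 0x38 and 0x40. The first of these is the "Address 0x30" that valgrind reports.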

view this post on Zulip Folkert de Vries (Oct 09 2023 at 13:25):

got it (2 bugs actually), so now

------------
     Summary [ 816.378s] 902 tests run: 902 passed (1 slow), 7 skipped

it works on my machine!

view this post on Zulip Richard Feldman (Oct 09 2023 at 14:08):

wooooo!!!

view this post on Zulip Richard Feldman (Oct 09 2023 at 14:08):

does that mean it can be activated for the repl? :heart_eyes:

view this post on Zulip Richard Feldman (Oct 09 2023 at 14:08):

(on aarch64)

view this post on Zulip Folkert de Vries (Oct 09 2023 at 14:49):

almost? this fails on CI

------------
     Summary [ 142.098s] 314 tests run: 313 passed, 1 failed, 595 skipped
     SIGABRT [   1.484s] test_gen::test_gen gen_tags::recursive_tag_id_in_allocation_eq
error: test run failed

but it works on the pi so needs some debugging on an M1/M2 still I think

view this post on Zulip Folkert de Vries (Oct 09 2023 at 15:12):

nvm it does trigger these valgrind warnings

test gen_tags::recursive_tag_id_in_allocation_eq has been running for over 60 seconds
==73198== Thread 2 gen_tags::recur:
==73198== Invalid write of size 8
==73198==    at 0x7747ED0: #UserApp_x_8352987475006511248 (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198==  Address 0x69cb0a0 is 0 bytes after a block of size 16 alloc'd
==73198==    at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198==    by 0x775E8EB: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198==
==73198== Invalid read of size 1
==73198==    at 0x7748278: #UserApp_#help0_Eq_InLayout(25)_1 (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198==  Address 0x69cb0a0 is 0 bytes after a block of size 16 alloc'd
==73198==    at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198==    by 0x775E8EB: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198==
==73198== Invalid read of size 1
==73198==    at 0x774827C: #UserApp_#help0_Eq_InLayout(25)_1 (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198==  Address 0x69d63a0 is 0 bytes after a block of size 16 alloc'd
==73198==    at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198==    by 0x775E8EB: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198==
==73198== Invalid read of size 1
==73198==    at 0x77484A4: #UserApp_#help1_Dec_InLayout(25)_1 (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198==  Address 0x69d63a0 is 0 bytes after a block of size 16 alloc'd
==73198==    at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198==    by 0x775E8EB: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198==
==73198== Invalid write of size 8
==73198==    at 0x77481F0: #UserApp_y_8352987475006511248 (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198==  Address 0x6a10b60 is 0 bytes after a block of size 16 alloc'd
==73198==    at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198==    by 0x775E8EB: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpuIFUPW/app.so.1.0)
==73198==


==73198== Invalid read of size 1
==73198==    at 0x7748268: #UserApp_#help0_Eq_InLayout(25)_1 (in /tmp/.tmpU2tUPy/app.so.1.0)
==73198==  Address 0x505b090 is 0 bytes after a block of size 16 alloc'd
==73198==    at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198==    by 0x775E91B: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpU2tUPy/app.so.1.0)
==73198==
==73198== Invalid read of size 1
==73198==    at 0x774826C: #UserApp_#help0_Eq_InLayout(25)_1 (in /tmp/.tmpU2tUPy/app.so.1.0)
==73198==  Address 0x50c8170 is 0 bytes after a block of size 16 alloc'd
==73198==    at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198==    by 0x775E91B: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpU2tUPy/app.so.1.0)
==73198==
==73198== Invalid read of size 1
==73198==    at 0x77484D4: #UserApp_#help1_Dec_InLayout(25)_1 (in /tmp/.tmpU2tUPy/app.so.1.0)
==73198==  Address 0x50c8170 is 0 bytes after a block of size 16 alloc'd
==73198==    at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==73198==    by 0x775E91B: roc_builtins.utils.allocate_with_refcount (in /tmp/.tmpU2tUPy/app.so.1.0)
==73198==
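
One way to read these reports, as a sketch rather than a diagnosis of the actual bug: "0 bytes after a block of size 16" means the access starts at offset 16, the first byte past the allocation. The helper access_in_bounds below is hypothetical and only models that bounds check.

```rust
/// Returns true when an access of `size` bytes starting at `offset` from the
/// beginning of a heap block stays inside a block of `block_size` bytes.
fn access_in_bounds(block_size: usize, offset: usize, size: usize) -> bool {
    match offset.checked_add(size) {
        Some(end) => end <= block_size,
        // Arithmetic overflow can never describe a valid in-bounds access.
        None => false,
    }
}
```

For a 16-byte block, an 8-byte write at offset 8 is fine, while the 8-byte write and the 1-byte reads at offset 16 seen above are out of bounds. That pattern would be consistent with the code treating the allocation as one word larger than it is.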

view this post on Zulip Folkert de Vries (Oct 09 2023 at 16:35):

fixed it (turns out it was an issue on x86 too). I expect https://github.com/roc-lang/roc/pull/5839 to pass now

view this post on Zulip Folkert de Vries (Oct 09 2023 at 16:50):

also I can run the repl on the pi no problem, which based on the source code will use the dev backend already

    #[cfg(not(target_os = "linux"))]
    let (lib, main_fn_name, subs, layout_interner) =
        mono_module_to_dylib_llvm(&arena, target, loaded, opt_level)
            .expect("we produce a valid Dylib");

    #[cfg(target_os = "linux")]
    let (lib, main_fn_name, subs, layout_interner) =
        mono_module_to_dylib_asm(&arena, target, loaded, opt_level)
            .expect("we produce a valid Dylib");

view this post on Zulip Folkert de Vries (Oct 09 2023 at 16:51):

so someone on an M1/M2 can just try that I guess

view this post on Zulip Brendan Hansknecht (Oct 09 2023 at 17:25):

Yeah, works on m1

view this post on Zulip Agus Zubiaga (Oct 09 2023 at 17:45):

The REPL running on AArch64 dev backend (M1)! :tada:

[screenshot: CleanShot-2023-10-09-at-14.44.472x.png]

view this post on Zulip Folkert de Vries (Oct 09 2023 at 18:12):

ci passed on the PR now

view this post on Zulip Luke Boswell (Oct 09 2023 at 20:15):

With my M2 mac on beed1e3d6ea093f3cbc6281b04c655449ff88109 I get the following for cargo nextest-gen-dev --no-fail-fast

Summary [ 574.116s] 902 tests run: 900 passed (2 slow), 2 failed, 7 skipped
        FAIL [   0.164s] test_gen::test_gen gen_list::list_ends_with_empty
        FAIL [   0.163s] test_gen::test_gen gen_list::list_starts_with_empty

So close !! :octopus:

view this post on Zulip Luke Boswell (Oct 09 2023 at 20:17):

Actually... this might be a nix issue on my part, I'll re-run. Looks like we still have a couple to go

view this post on Zulip Folkert de Vries (Oct 09 2023 at 21:42):

a hunch: in crates/compiler/gen_dev/src/generic64/mod.rs, can you uncomment the and (the ASM::and_reg64_reg64_reg64 line) here

    fn build_not(&mut self, dst: &Symbol, src: &Symbol, arg_layout: &InLayout<'a>) {
        match self.interner().get_repr(*arg_layout) {
            LayoutRepr::BOOL => {
                let dst_reg = self.storage_manager.claim_general_reg(&mut self.buf, dst);
                let src_reg = self.storage_manager.load_to_general_reg(&mut self.buf, src);

                ASM::mov_reg64_imm64(&mut self.buf, dst_reg, 1);
                ASM::xor_reg64_reg64_reg64(&mut self.buf, src_reg, src_reg, dst_reg);

                // we may need to mask out other bits in the end? but a boolean should be 0 or 1.
                // if that invariant is upheld, this mask should not be required
                // ASM::and_reg64_reg64_reg64(&mut self.buf, src_reg, src_reg, dst_reg);

                ASM::mov_reg64_reg64(&mut self.buf, dst_reg, src_reg);
            }
            x => todo!("Not: layout, {:?}", x),
        }
    }

view this post on Zulip Folkert de Vries (Oct 09 2023 at 21:44):

valgrind is ok with it at least so I'm not sure what else to do

view this post on Zulip Folkert de Vries (Oct 09 2023 at 21:44):

admin@raspberrypi ~/roc (aarch-records)> valgrind --track-origins=yes target/debug/deps/test_gen-f2c946b9d248f5c7 list_ends_with_empty
==89204== Memcheck, a memory error detector
==89204== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==89204== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==89204== Command: target/debug/deps/test_gen-f2c946b9d248f5c7 list_ends_with_empty
==89204==

running 1 test
test gen_list::list_ends_with_empty has been running for over 60 seconds
test gen_list::list_ends_with_empty ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 785 filtered out; finished in 342.49s

==89204==
==89204== HEAP SUMMARY:
==89204==     in use at exit: 3,746,224 bytes in 17,229 blocks
==89204==   total heap usage: 437,310 allocs, 420,081 frees, 264,614,202 bytes allocated
==89204==
==89204== LEAK SUMMARY:
==89204==    definitely lost: 3,271,420 bytes in 15,096 blocks
==89204==    indirectly lost: 182,040 bytes in 21 blocks
==89204==      possibly lost: 119,592 bytes in 7 blocks
==89204==    still reachable: 173,172 bytes in 2,105 blocks
==89204==         suppressed: 0 bytes in 0 blocks
==89204== Rerun with --leak-check=full to see details of leaked memory
==89204==
==89204== For lists of detected and suppressed errors, rerun with: -s
==89204== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

view this post on Zulip Folkert de Vries (Oct 09 2023 at 21:45):

but it may be that we need to mask the boolean, which uncommenting that line should do. (I'm assuming the result of the test is an empty list?)
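
A sketch of why the mask could matter, modeling the emitted sequence on plain u64 values rather than real registers. The function names are invented for illustration. The only assumption is the one stated in the code comment above: a boolean should be 0 or 1, and the mask is needed only if that invariant is broken.

```rust
/// Models the current sequence: dst = 1; src = src ^ dst; dst = src.
fn not_without_mask(src: u64) -> u64 {
    src ^ 1
}

/// Models the same sequence with the extra `and` against the 1 held in dst,
/// which keeps only the lowest bit of the result.
fn not_with_mask(src: u64) -> u64 {
    (src ^ 1) & 1
}
```

For clean inputs both agree: 0 becomes 1 and 1 becomes 0. If the source register carries stray upper bits, for example 0x100, the unmasked version yields 0x101, which is not a valid boolean, while the masked version yields 1.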

view this post on Zulip Luke Boswell (Oct 09 2023 at 21:54):

I can look at this again later this evening around

view this post on Zulip Agus Zubiaga (Oct 09 2023 at 22:12):

I'm pretty busy today so I probably won't have time to debug this properly, but at least on my machine this fails at link time:

--- STDERR:              test_gen::test_gen gen_list::list_ends_with_empty ---
ld: invalid r_symbolnum=32 in '/private/tmp/nix-shell.yttg98/.tmpnHbSn5/app.o'

view this post on Zulip Folkert de Vries (Oct 09 2023 at 22:15):

a thought: there might be some sort of "enable more debug info" environment variable for the linker that could help here

view this post on Zulip Agus Zubiaga (Oct 09 2023 at 22:18):

Will try that. I'll post an objdump too, there's something weird with a reloc.

view this post on Zulip Agus Zubiaga (Oct 09 2023 at 22:24):

https://gist.github.com/agu-z/554090ab7c5ed5c6e83bd8b170af7e74#file-list_ends_with_empty-s-L582-L583

view this post on Zulip Agus Zubiaga (Oct 09 2023 at 22:25):

That looks interesting to me. That symbol (_#UserApp_#help0_Eq_InLayout(VOID)_1) only shows up in the reloc and not as a label.

view this post on Zulip Agus Zubiaga (Oct 09 2023 at 22:48):

Hm. It looks like we also have it for Linux, though:

 100:   94000000    bl  0 <#UserApp_#help0_Eq_InLayout(23)_1>
            100: R_AARCH64_CALL26   #UserApp_#help0_Eq_InLayout(VOID)_1

view this post on Zulip Agus Zubiaga (Oct 09 2023 at 22:52):

ld with -v just printed some more irrelevant info

view this post on Zulip Folkert de Vries (Oct 10 2023 at 12:01):

well-spotted! indeed, we don't generate that function. On linux that is apparently fine because the function call is never reached

view this post on Zulip Folkert de Vries (Oct 10 2023 at 12:01):

but macos does not let you get away with that (arguably the better approach)
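
The check that the macOS linker effectively enforces can be sketched as follows: every symbol named by a relocation must also be defined somewhere. This is a toy model over string names, not how ld or the dev backend represents symbols, and undefined_reloc_targets is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Returns the relocation targets that have no matching defined symbol,
/// in the order they appear and without duplicates.
fn undefined_reloc_targets(defined: &[&str], reloc_targets: &[&str]) -> Vec<String> {
    let defined: HashSet<&str> = defined.iter().copied().collect();
    let mut seen: HashSet<&str> = HashSet::new();
    let mut missing = Vec::new();
    for &target in reloc_targets {
        // `insert` returns false for a target we already reported.
        if !defined.contains(target) && seen.insert(target) {
            missing.push(target.to_string());
        }
    }
    missing
}
```

Run over the object file from the gist, a check like this would flag #UserApp_#help0_Eq_InLayout(VOID)_1, the symbol that appears in a reloc but never as a label.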

view this post on Zulip Luke Boswell (Oct 10 2023 at 12:08):

Summary [ 46.667s] 136 tests run: 136 passed, 2 skipped

REPL tests using aarch64 dev backend (using some changes that Folkert will add soon)

view this post on Zulip Folkert de Vries (Oct 10 2023 at 12:30):

https://github.com/roc-lang/roc/pull/5897 enables all tests on apple silicon, and should pass

view this post on Zulip Agus Zubiaga (Oct 10 2023 at 14:49):

It did! :partying_face:

Summary [ 309.074s] 903 tests run: 903 passed, 6 skipped

view this post on Zulip Agus Zubiaga (Oct 10 2023 at 14:51):

Folkert de Vries said:

but macos does not let you get away with that (arguably the better approach)

Agreed. What a bad error message, though. These lower-level macOS tools leave much to be desired :upside_down:


Last updated: Jul 06 2025 at 12:14 UTC