Stream: compiler development

Topic: accessing zig builtins from zig compiler


view this post on Zulip Richard Feldman (May 20 2025 at 02:20):

right now all the zig builtins live in crates/ which makes them inaccessible to the zig compiler...should we move them to src/builtins or something?

view this post on Zulip Brendan Hansknecht (May 20 2025 at 02:20):

I would just fork and split

view this post on Zulip Brendan Hansknecht (May 20 2025 at 02:21):

They have to change anyway due to zig version and new API with explicit function pointers passed in
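
A minimal sketch of what "explicit function pointers passed in" could look like for a builtin. The names `HostOps`, `duplicate`, and `BumpHost` are hypothetical and only illustrate the shape; they are not the actual Roc builtins API.

```zig
const std = @import("std");

/// Hypothetical bundle of host-provided functions, passed to builtins
/// explicitly rather than resolved as global symbols at link time.
const HostOps = struct {
    ctx: *anyopaque,
    alloc: *const fn (ctx: *anyopaque, len: usize) ?[*]u8,
    dealloc: *const fn (ctx: *anyopaque, ptr: [*]u8, len: usize) void,
};

/// Illustrative builtin: copies `bytes` into memory obtained from the host.
/// Returns null if the host allocator fails.
fn duplicate(ops: HostOps, bytes: []const u8) ?[]u8 {
    const raw = ops.alloc(ops.ctx, bytes.len) orelse return null;
    const out = raw[0..bytes.len];
    @memcpy(out, bytes);
    return out;
}

/// Toy host used only for this sketch: a fixed buffer with bump allocation.
const BumpHost = struct {
    buffer: [64]u8 = undefined,
    used: usize = 0,
    frees: usize = 0,

    fn alloc(ctx: *anyopaque, len: usize) ?[*]u8 {
        const self: *BumpHost = @ptrCast(@alignCast(ctx));
        if (len > self.buffer.len - self.used) return null;
        const start = self.used;
        self.used += len;
        return self.buffer[start..].ptr;
    }

    fn dealloc(ctx: *anyopaque, ptr: [*]u8, len: usize) void {
        const self: *BumpHost = @ptrCast(@alignCast(ctx));
        _ = ptr;
        _ = len;
        // A bump allocator never reuses memory; just count the call.
        self.frees += 1;
    }

    fn hostOps(self: *BumpHost) HostOps {
        return .{ .ctx = self, .alloc = &alloc, .dealloc = &dealloc };
    }
};

pub fn main() void {
    var host = BumpHost{};
    const ops = host.hostOps();
    const copy = duplicate(ops, "hello") orelse return;
    std.debug.print("{s} ({d} bytes from host)\n", .{ copy, host.used });
    ops.dealloc(ops.ctx, copy.ptr, copy.len);
}
```

Because the builtin only sees `HostOps`, the same code can in principle be called from an interpreter, a test harness, or compiled output, each supplying its own functions.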

view this post on Zulip Brendan Hansknecht (May 20 2025 at 02:22):

And yeah, src/builtins sounds like a good dir for it

view this post on Zulip Richard Feldman (May 20 2025 at 02:28):

cool, sounds good! :thumbs_up:

view this post on Zulip Anthony Bullard (May 20 2025 at 02:44):

it'll also need quite a rewrite with all of the new syntax

view this post on Zulip Luke Boswell (May 20 2025 at 02:51):

Is this something I can help with? Do we have something minimal like Bool or maybe Str we need?

view this post on Zulip Richard Feldman (May 20 2025 at 02:55):

sure! Here's a WIP PR with just that change: https://github.com/roc-lang/roc/pull/7802

view this post on Zulip Richard Feldman (May 20 2025 at 02:55):

if you want to try to get it updated and tests passing, that would be awesome! :smiley:

view this post on Zulip Richard Feldman (May 20 2025 at 03:02):

Str would be enough for hello world, but probably at that point you've gotten all the hard parts anyway :sweat_smile:

view this post on Zulip Luke Boswell (May 20 2025 at 03:13):

We're not doing the shared memory buffer thing anymore with expect right?

view this post on Zulip Luke Boswell (May 20 2025 at 03:14):

You've copied the builtins verbatim and we can cut out things we don't need right?

view this post on Zulip Richard Feldman (May 20 2025 at 03:16):

yes and yes!

view this post on Zulip Richard Feldman (May 20 2025 at 03:17):

expect will just work the same way dbg does, where the host exposes a function for "this gets called whenever an inline expect fails, and it's up to the host to decide what to do with that"
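
A rough sketch of that idea, assuming a hypothetical `HostCallbacks` struct with `dbg` and `expect_failed` entries (the names and signatures are illustrative, not the real host ABI). The generated code only reports the failure; what happens next is the host's decision. Here the toy host just records it and lets execution continue.

```zig
const std = @import("std");

/// Hypothetical host callbacks; field names are illustrative only.
const HostCallbacks = struct {
    ctx: *anyopaque,
    dbg: *const fn (ctx: *anyopaque, msg: []const u8) void,
    expect_failed: *const fn (ctx: *anyopaque, msg: []const u8) void,
};

/// Stand-in for compiled Roc code that contains a dbg and an inline expect.
fn addChecked(host: HostCallbacks, a: i64, b: i64) i64 {
    host.dbg(host.ctx, "adding");
    const sum = a + b;
    // The inline expect only notifies the host when the condition fails.
    if (sum < 0) host.expect_failed(host.ctx, "expect sum >= 0");
    return sum;
}

/// Toy host for this sketch: counts calls and remembers the last failure.
const RecordingHost = struct {
    dbg_count: usize = 0,
    failed_count: usize = 0,
    last_failure: []const u8 = "",

    fn onDbg(ctx: *anyopaque, msg: []const u8) void {
        const self: *RecordingHost = @ptrCast(@alignCast(ctx));
        _ = msg;
        self.dbg_count += 1;
    }

    fn onExpectFailed(ctx: *anyopaque, msg: []const u8) void {
        const self: *RecordingHost = @ptrCast(@alignCast(ctx));
        self.failed_count += 1;
        self.last_failure = msg;
    }

    fn callbacks(self: *RecordingHost) HostCallbacks {
        return .{ .ctx = self, .dbg = &onDbg, .expect_failed = &onExpectFailed };
    }
};

pub fn main() void {
    var host = RecordingHost{};
    const result = addChecked(host.callbacks(), 1, -5);
    std.debug.print("result {d}, failed expects: {d} ({s})\n", .{
        result, host.failed_count, host.last_failure,
    });
}
```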

view this post on Zulip Luke Boswell (May 20 2025 at 06:46):

I've got the tests passing now... but it's not really being used anywhere yet so it's hard to know if it's all ok.

Are we still planning on compiling the builtins to bitcode and then linking with our code gen somehow? I'm interested to know what's next from here... would this PR be good to merge like this and we'll start using it sometime, somehow?

view this post on Zulip Richard Feldman (May 20 2025 at 11:35):

yeah we'll still need to compile them to bitcode so LLVM can import them

view this post on Zulip Richard Feldman (May 20 2025 at 11:35):

but for now, the interpreter needs them in a general "just import them and start using them directly" sense

view this post on Zulip Richard Feldman (May 20 2025 at 13:14):

very cool! @Luke Boswell looks like CI is still failing on a bunch of missing doc comments :smile:

view this post on Zulip Luke Boswell (May 20 2025 at 22:27):

I think this typos check is getting a bit carried away...

error: `numer` should be `number`
  --> ./src/builtins/dec.zig:797:56
    |
797 |         sr = 1 + N_UDWORD_BITS + denom_leading_zeros - numer_hi_leading_zeros;
    |                                                        ^^^^^
    |

view this post on Zulip Luke Boswell (May 20 2025 at 22:30):

Or another example from CI failure

error: `fo` should be `of`, `for`, `do`, `go`, `to`
  --> ./src/builtins/str.zig:1198:34
     |
1198 |     const fo = RocStr.fromSlice("fo");
     |                                  ^^
     |

view this post on Zulip Luke Boswell (May 20 2025 at 22:30):

There must be a config we're missing somewhere

view this post on Zulip Luke Boswell (May 21 2025 at 00:19):

What is unfortunate about this PR, is that they are all new files... so it's hard to see the individual changes from the originals.

view this post on Zulip Luke Boswell (May 21 2025 at 00:35):

@Richard Feldman @Brendan Hansknecht I have found an issue I think with our builtins. We're now running the zig tests on Windows ARM.

The test is being run on Windows with ARM64 architecture, but the inline assembly code in main.zig is written for x86_64 architecture (using registers like %rcx, %rdx, etc. which are specific to x86_64).

I'm thinking of asking Claude for help making an implementation, or maybe stubbing it out for ARM. I'm not really sure here. Another option is to ignore the tests on Windows ARM... :thinking:

view this post on Zulip Richard Feldman (May 21 2025 at 00:36):

what are we using that for again? setjmp/longjmp?

view this post on Zulip Richard Feldman (May 21 2025 at 00:37):

and yeah I think Claude 3.7 Sonnet has a reasonable chance of translating it if it's pretty small, although at that point we do need to be pretty confident that our tests are exercising it sufficiently :sweat_smile:

view this post on Zulip Luke Boswell (May 21 2025 at 00:38):

Yes, and yeah it's tiny

view this post on Zulip Luke Boswell (May 21 2025 at 00:38):

comptime {
    if (builtin.os.tag == .windows and builtin.target.cpu.arch == .x86_64) {
        asm (
            \\.global windows_longjmp;
            \\windows_longjmp:
            \\  movq 0x00(%rcx), %rdx
            \\  movq 0x08(%rcx), %rbx
            \\  # note 0x10 is not used yet!
            \\  movq 0x18(%rcx), %rbp
            \\  movq 0x20(%rcx), %rsi
            \\  movq 0x28(%rcx), %rdi
            \\  movq 0x30(%rcx), %r12
            \\  movq 0x38(%rcx), %r13
            \\  movq 0x40(%rcx), %r14
            \\  movq 0x48(%rcx), %r15
            \\
            \\  # restore stack pointer
            \\  movq 0x10(%rcx), %rsp
            \\
            \\  # load jmp address
            \\  movq 0x50(%rcx), %r8
            \\
            \\  # set up return value
            \\  movq %rbx, %rax
            \\
            \\  movdqu 0x60(%rcx), %xmm6
            \\  movdqu 0x70(%rcx), %xmm7
            \\  movdqu 0x80(%rcx), %xmm8
            \\  movdqu 0x90(%rcx), %xmm9
            \\  movdqu 0xa0(%rcx), %xmm10
            \\  movdqu 0xb0(%rcx), %xmm11
            \\  movdqu 0xc0(%rcx), %xmm12
            \\  movdqu 0xd0(%rcx), %xmm13
            \\  movdqu 0xe0(%rcx), %xmm14
            \\  movdqu 0xf0(%rcx), %xmm15
            \\
            \\  jmp *%r8
            \\
            \\.global windows_setjmp;
            \\windows_setjmp:
            \\  movq %rdx, 0x00(%rcx)
            \\  movq %rbx, 0x08(%rcx)
            \\  # note 0x10 is not used yet!
            \\  movq %rbp, 0x18(%rcx)
            \\  movq %rsi, 0x20(%rcx)
            \\  movq %rdi, 0x28(%rcx)
            \\  movq %r12, 0x30(%rcx)
            \\  movq %r13, 0x38(%rcx)
            \\  movq %r14, 0x40(%rcx)
            \\  movq %r15, 0x48(%rcx)
            \\
            \\  # the stack location right after the windows_setjmp call
            \\  leaq 0x08(%rsp), %r8
            \\  movq %r8, 0x10(%rcx)
            \\
            \\  movq (%rsp), %r8
            \\  movq %r8, 0x50(%rcx)
            \\
            \\  movdqu %xmm6,  0x60(%rcx)
            \\  movdqu %xmm7,  0x70(%rcx)
            \\  movdqu %xmm8,  0x80(%rcx)
            \\  movdqu %xmm9,  0x90(%rcx)
            \\  movdqu %xmm10, 0xa0(%rcx)
            \\  movdqu %xmm11, 0xb0(%rcx)
            \\  movdqu %xmm12, 0xc0(%rcx)
            \\  movdqu %xmm13, 0xd0(%rcx)
            \\  movdqu %xmm14, 0xe0(%rcx)
            \\  movdqu %xmm15, 0xf0(%rcx)
            \\
            \\  xorl %eax, %eax
            \\  ret
            \\
        );
    }
}

view this post on Zulip Richard Feldman (May 21 2025 at 00:46):

worth a try I guess!

view this post on Zulip Richard Feldman (May 21 2025 at 00:46):

could also ask a new Claude chat afterwards to review the generated code, look for problems etc

view this post on Zulip Luke Boswell (May 21 2025 at 01:36):

No real luck unfortunately... I've got both ARM and x86 Windows machines, I might compile something and objdump that to see what zig does for setjmp and longjmp and use that asm as inspiration for our implementation.

I've added a test for setjmp and longjmp that seems to be working well on all the non-Windows machines, but fails on Windows... so I'm also guessing our existing Windows impl may not be right either.

Should be able to look at this sometime later in the week.

view this post on Zulip Brendan Hansknecht (May 21 2025 at 01:58):

I feel like we shouldn't need those anymore

view this post on Zulip Luke Boswell (May 21 2025 at 02:03):

That sounds much easier... I'm not sure where we used it tbh

view this post on Zulip Richard Feldman (May 21 2025 at 02:42):

if I remember right, we needed them so that if a crash happens in a roc test test, we can turn that into a failed test rather than taking out the entire roc test run

view this post on Zulip Richard Feldman (May 21 2025 at 02:43):

not sure how else we'd handle that scenario other than something really heavyweight like spawning separate OS threads or processes :sweat_smile:

view this post on Zulip Brendan Hansknecht (May 21 2025 at 02:45):

Sure, but it can be in the platform instead of in the builtins

view this post on Zulip Brendan Hansknecht (May 21 2025 at 02:46):

Or we can do other hacks when roc_crash is called.

view this post on Zulip Richard Feldman (May 21 2025 at 02:47):

hm, does that help us? :thinking:

view this post on Zulip Richard Feldman (May 21 2025 at 02:48):

I guess the compiler itself can use libc's setjmp and longjmp maybe

view this post on Zulip Brendan Hansknecht (May 21 2025 at 02:49):

Yeah

view this post on Zulip Brendan Hansknecht (May 21 2025 at 02:50):

Or if the crash is always put at the end of a test, we might be able to set a global and then just return from the crash....but that depends on a lot of things.

view this post on Zulip Brendan Hansknecht (May 21 2025 at 02:50):

And maybe would break things

view this post on Zulip Brendan Hansknecht (May 21 2025 at 02:51):

Oh wait, we'll be running the interpreter for most tests, right? So like probably can just have it set state and then return.
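
A minimal sketch of "set state and then return", using a made-up three-instruction stack machine rather than the real interpreter. `Op`, `Outcome`, and `run` are hypothetical names. The point is only that a crash becomes an ordinary return value, so no setjmp/longjmp is needed to get back to the test runner.

```zig
const std = @import("std");

/// Toy instruction set, invented for this sketch.
const Op = union(enum) {
    push: i64,
    add,
    crash: []const u8,
};

/// Result of running a program: either a value or a recorded crash.
const Outcome = union(enum) {
    value: i64,
    crashed: []const u8,
};

/// Runs a well-formed program (no bounds checking, to keep the sketch short).
fn run(program: []const Op) Outcome {
    var stack: [16]i64 = undefined;
    var len: usize = 0;
    for (program) |op| {
        switch (op) {
            .push => |n| {
                stack[len] = n;
                len += 1;
            },
            .add => {
                stack[len - 2] += stack[len - 1];
                len -= 1;
            },
            // A crash just stops the loop and hands the message back.
            .crash => |msg| return .{ .crashed = msg },
        }
    }
    return .{ .value = stack[len - 1] };
}

pub fn main() void {
    const program = [_]Op{ .{ .push = 1 }, .{ .crash = "boom" }, .{ .push = 2 } };
    switch (run(&program)) {
        .value => |v| std.debug.print("ok: {d}\n", .{v}),
        .crashed => |msg| std.debug.print("test failed: {s}\n", .{msg}),
    }
}
```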

view this post on Zulip Brendan Hansknecht (May 21 2025 at 02:51):

Idk...just ideas

view this post on Zulip Brendan Hansknecht (May 21 2025 at 02:51):

I'm just not a fan of having that code in the builtins

view this post on Zulip Luke Boswell (May 21 2025 at 03:09):

The roc test use case is really fuzzy for me. Would it be possible to step through it in a few dot points?

^^ is this even remotely close?

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:13):

Oh wait, I was thinking of our internal unit tests, not roc test

view this post on Zulip Luke Boswell (May 21 2025 at 03:13):

In my imagining here.. the roc_crashed implementation just returns from the process with a non-zero exit code

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:13):

For roc test, I 100% think we should use subprocesses

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:17):

2 main reasons:

  1. It allows us to have parallelism configs
  2. It means we can clean up memory leaks from crashes and not have cascading memory failures.

view this post on Zulip Luke Boswell (May 21 2025 at 03:19):

@Brendan Hansknecht you're not worried about spawning processes being heavyweight which I think Richard was referring to?

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:20):

For tests, not really. Tests run on the human scale. Humans are slow.

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:21):

And you don't need a process per test, you can group things.

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:21):

So it will be amortized

view this post on Zulip Luke Boswell (May 21 2025 at 03:24):

Also... the primary dev loop can probably use the interpreter for running tests. So I can imagine roc test uses that, and roc test --opt full compiles a test executable using llvm and then runs the tests.

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:25):

Probably compiles a shared library for the llvm case, but yeah

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:26):

Oh, actually, for roc, we probably don't even need a full process, I think we can just use threads. They just need separate memory allocators. A gpa per thread.

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:26):

Cause roc can't segfault or crash except via roc_crash and roc_expect_failed and we can handle those cleanly.

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:27):

Only need a process if you are afraid of real segfaults and crashes (which correct roc code should never do and in tests, we can make memory allocation failures be treated like crashes)

view this post on Zulip Luke Boswell (May 21 2025 at 03:28):

It's beautiful...

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:28):

Only exception is stack overflows.....oh....I guess we need processes to handle stack overflows

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:28):

Cause those will kill a full process

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:28):

And can still exist in valid roc code

view this post on Zulip Luke Boswell (May 21 2025 at 03:29):

Didn't we plan to leave those... and just have a timer or interrupt to check if it's still going

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:29):

Not infinite loops, but stack overflows which I think kill the full process....though maybe they just kill a single thread....need to test

view this post on Zulip Luke Boswell (May 21 2025 at 03:31):

So you fork one child process, which then runs everything in threads. If any of them stack overflow you can recover and report it.

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:31):

Yeah, stack overflow kills full processes, so we need process isolation to catch them

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:32):

And yeah we could do that, but then you don't know which test led to the overflow (also, I guess the interpreter can catch what would be stack overflows and report them directly)

view this post on Zulip Luke Boswell (May 21 2025 at 03:33):

If you had a write only log each time you started/completed a test or ran the tests one at a time or something you could maybe report which one is the issue (or narrow it down)
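
A small sketch of how such a log could be read back after the child process dies, assuming tests run one at a time within the process. `Event` and `findUnfinished` are hypothetical names, and the log is modeled as an in-memory slice rather than a file.

```zig
const std = @import("std");

/// One append-only log entry, written before and after each test.
const Event = union(enum) {
    started: []const u8,
    finished: []const u8,
};

/// Returns the test that was started but never finished, if any.
/// Assumes tests are run sequentially, so at most one can be pending.
fn findUnfinished(log: []const Event) ?[]const u8 {
    var pending: ?[]const u8 = null;
    for (log) |event| {
        switch (event) {
            .started => |name| pending = name,
            .finished => |name| {
                if (pending) |p| {
                    if (std.mem.eql(u8, p, name)) pending = null;
                }
            },
        }
    }
    return pending;
}

pub fn main() void {
    // Simulated log left behind by a process that died during "deep recursion".
    const log = [_]Event{
        .{ .started = "parses empty list" },
        .{ .finished = "parses empty list" },
        .{ .started = "deep recursion" },
    };
    if (findUnfinished(&log)) |name| {
        std.debug.print("process died while running: {s}\n", .{name});
    }
}
```

With tests running concurrently in threads, several tests could be pending at once, so the log would only narrow the culprit down to that set.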

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:34):

:nod-yes:

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:34):

Frankly, I'm really not too worried about the cost of processes. So I would vote for starting simple first.

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:35):

Just use process isolation for catching issues and improve with thread strategies later if it actually has issues.

view this post on Zulip Brendan Hansknecht (May 21 2025 at 03:36):

Spawning a process generally takes a few milliseconds. And the single process can then run many tests, only needing to relaunch on a stack overflow.

view this post on Zulip Luke Boswell (May 21 2025 at 03:37):

Well we've come a long way around... but I'm reasonably convinced we don't need setjmp and longjmp in the builtins anymore. We can add it back easily enough if that changes. Simplest way forward to land this PR and the builtins is just to remove it.

view this post on Zulip Luke Boswell (May 21 2025 at 03:38):

I like the idea of the platform (or test implementation) handling everything through our nice clean host ABI interface instead

view this post on Zulip Anton (May 21 2025 at 09:13):

Luke Boswell said:

There must be a config we're missing somewhere

It's typos.toml at the root of the repo.

view this post on Zulip Luke Boswell (May 21 2025 at 10:11):

Yeah I played with that a bit but ended up just renaming things

view this post on Zulip Richard Feldman (May 21 2025 at 11:53):

Brendan Hansknecht said:

And yeah we could do that, but then you don't know which test led to the overflow (also, I guess the interpreter can catch what would be stack overflows and report them directly)

our greenthreads approach can do that!

view this post on Zulip Richard Feldman (May 21 2025 at 11:54):

we'd have a global segfault handler, but it would be able to tell from the address of the segfault which guard page (on which greenthread) caused the fault, which in turn would tell us which test overflowed its stack
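
The address lookup described above can be sketched as plain arithmetic. This assumes a 4096-byte page, stacks that grow downward with the guard page at the lowest address of each mapping, and a hypothetical `GreenStack` record per test. On POSIX systems a handler installed with `SA_SIGINFO` can read the faulting address from `siginfo_t.si_addr`; that part is left out here.

```zig
const std = @import("std");

/// Assumed page size for this sketch.
const page_size: usize = 4096;

/// Hypothetical bookkeeping for one green thread's stack.
const GreenStack = struct {
    test_name: []const u8,
    /// Lowest address of the mapping, which is where the guard page starts.
    guard_base: usize,
};

/// Maps a faulting address to the test whose guard page contains it.
/// Returns null if the address is not inside any known guard page.
fn findOverflowedTest(stacks: []const GreenStack, fault_addr: usize) ?[]const u8 {
    for (stacks) |s| {
        if (fault_addr >= s.guard_base and fault_addr < s.guard_base + page_size) {
            return s.test_name;
        }
    }
    return null;
}

pub fn main() void {
    // Made-up addresses, purely for illustration.
    const stacks = [_]GreenStack{
        .{ .test_name = "list tests", .guard_base = 0x10000 },
        .{ .test_name = "recursion tests", .guard_base = 0x20000 },
    };
    if (findOverflowedTest(&stacks, 0x20010)) |name| {
        std.debug.print("stack overflow in: {s}\n", .{name});
    }
}
```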

view this post on Zulip Richard Feldman (May 21 2025 at 11:54):

I think it would be really great to pursue that approach, because it's something we know we want to get working anyway for Roc hosts

view this post on Zulip Richard Feldman (May 21 2025 at 11:54):

and this is a way we could unblock getting it really working and robust right now

view this post on Zulip Richard Feldman (May 21 2025 at 12:00):

because we could set up tests in our code base where we:

view this post on Zulip Richard Feldman (May 21 2025 at 12:02):

at that point, we can have different dylib source code files which simulate different conditions in a test, e.g.

and then we can make tests which verify that we're doing things like capturing dbg and failed expects, recovering from crash and stack overflows, etc.

view this post on Zulip Richard Feldman (May 21 2025 at 12:02):

and since they're green threads, we definitely wouldn't need setjmp/longjmp because all we'd do when one of them stack overflows or crashes is to just have them yield

view this post on Zulip Richard Feldman (May 21 2025 at 12:03):

and then the caller can say "cool, all the memory in the arena we gave them is now garbage and we can reset the arena for another test, and all the stack memory we gave them in the greenthread is also garbage and can be reused for another greenthread too"

view this post on Zulip Richard Feldman (May 21 2025 at 12:03):

in other words, the cleanup is just "those resources are available for something else now"

view this post on Zulip Brendan Hansknecht (May 21 2025 at 14:50):

I don't think we should use arenas for tests

view this post on Zulip Brendan Hansknecht (May 21 2025 at 14:50):

But otherwise, yeah, agree

view this post on Zulip Brendan Hansknecht (May 21 2025 at 14:52):

We should not use arenas cause we don't know what code the user is testing and if it will actually run on arenas
If the tests lead to tons of reallocations, they might oom with arenas, but work fine with a standard allocator. I think we likely want a standard allocator per thread.

view this post on Zulip Brendan Hansknecht (May 21 2025 at 15:00):

That said, I do think the test platform could be a lot simpler than a full green thread platform, cause it only has compute heavy work and no async requests to wait on. So it really just needs a process per core and a single fake stack per process to avoid stack overflows. Then it would just run the tests' compute in a hot loop.

view this post on Zulip Brendan Hansknecht (May 21 2025 at 15:01):

But we could still build out a full green thread base if we prefer, just to really push and build out the primitives.

view this post on Zulip Brendan Hansknecht (May 21 2025 at 15:07):

Oh, other big opportunity. Tests have no inputs, so can we make the interpreter able to just run them? No dylib or linking at all, and smarter debug info around things like stack overflows and maybe eventually other things?

view this post on Zulip Richard Feldman (May 21 2025 at 20:05):

yeah I think so!

view this post on Zulip Richard Feldman (May 21 2025 at 20:07):

and yeah I agree about greenthreads not being strictly a requirement here, but it seems to me like:

view this post on Zulip Brendan Hansknecht (May 22 2025 at 00:08):

Yep

view this post on Zulip Richard Feldman (May 22 2025 at 00:18):

@Brendan Hansknecht do you have any interest in trying that out, based on the greenthread code you've already done? I think you're the foremost expert in greenthreads right now! :smiley:

view this post on Zulip Brendan Hansknecht (May 22 2025 at 00:31):

Yes, though not sure I have the time currently

view this post on Zulip Richard Feldman (May 22 2025 at 00:37):

cool cool!

view this post on Zulip Luke Boswell (May 22 2025 at 04:14):

Does the green threads thing work for Windows?

view this post on Zulip Luke Boswell (May 22 2025 at 04:14):

I'm just wondering if it's similar i.e. with dlopen and a dynamic library

view this post on Zulip Luke Boswell (May 22 2025 at 04:14):

Maybe we need to start with a subprocess anyway

view this post on Zulip Brendan Hansknecht (May 22 2025 at 04:49):

Green threads should work on windows the same as Linux

view this post on Zulip Brendan Hansknecht (May 22 2025 at 04:49):

Only place they won't work is wasm

view this post on Zulip Brendan Hansknecht (May 22 2025 at 04:49):

All they are is a chunk of memory for a call stack and a tiny bit of assembly to jump into it

view this post on Zulip Richard Feldman (May 22 2025 at 11:07):

I remember looking into wasm and concluding it was possible but required getting JS involved behind the scenes

view this post on Zulip Joshua Warner (May 26 2025 at 15:39):

You can compile green threads into a state machine, similar to what happens with rust futures or Quasar from the Java world

view this post on Zulip Brendan Hansknecht (May 26 2025 at 15:43):

I feel like that would just be for cooperative green threads

view this post on Zulip Brendan Hansknecht (May 26 2025 at 15:43):

But don't actually know for sure

view this post on Zulip Joshua Warner (May 26 2025 at 15:44):

By cooperative, do you mean non-preemptable?

view this post on Zulip Brendan Hansknecht (May 26 2025 at 15:44):

Yes

view this post on Zulip Joshua Warner (May 26 2025 at 15:45):

The compiler can always do things like insert yields at the start of recursive functions (and loops), so if you want it to be preemptable, it can be made so.

view this post on Zulip Brendan Hansknecht (May 26 2025 at 15:47):

Yeah, but that tends not to work well in practice. I mean it isn't terrible or anything, but after years of trying to avoid it, Go added preemption. Either you have too much overhead or you risk getting stuck in a single rogue thread (even if not fully stuck, you can often get locked into latency-ruining, unfair CPU splits)

view this post on Zulip Joshua Warner (May 26 2025 at 15:50):

Yeah there's definitely a bunch of overhead in adding yields all over the place - but my point was just that it's possible to "compile-in" pre-emptability; it doesn't actually have to be a native property of the system you're running on

view this post on Zulip Joshua Warner (May 26 2025 at 15:53):

That can be made lower overhead by not yielding if it hasn't been over some configurable amount of "time" (e.g. a count of candidate yields passed, or an approximate instruction count, etc)
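
A minimal sketch of the "count of candidate yields passed" variant. `YieldBudget` and `sumTo` are hypothetical names; the loop stands in for compiled code with a yield check inserted at the top of each iteration, and the actual context switch is replaced by a counter.

```zig
const std = @import("std");

/// Only yields once every `interval` candidate yield points.
const YieldBudget = struct {
    interval: u32,
    remaining: u32,

    fn init(interval: u32) YieldBudget {
        std.debug.assert(interval > 0);
        return .{ .interval = interval, .remaining = interval };
    }

    /// Called at every candidate yield point. Cheap in the common case:
    /// one decrement and one comparison.
    fn shouldYield(self: *YieldBudget) bool {
        self.remaining -= 1;
        if (self.remaining != 0) return false;
        self.remaining = self.interval;
        return true;
    }
};

/// Stand-in for a compiled loop with an inserted yield check.
fn sumTo(n: u64, budget: *YieldBudget, yields: *u32) u64 {
    var total: u64 = 0;
    var i: u64 = 1;
    while (i <= n) : (i += 1) {
        // A real runtime would switch to another green thread here.
        if (budget.shouldYield()) yields.* += 1;
        total += i;
    }
    return total;
}

pub fn main() void {
    var budget = YieldBudget.init(1000);
    var yields: u32 = 0;
    const total = sumTo(10_000, &budget, &yields);
    std.debug.print("total {d}, yielded {d} times\n", .{ total, yields });
}
```

A larger interval lowers the per-iteration cost but lengthens the worst-case wait before another thread gets to run, which is the trade-off discussed above.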

view this post on Zulip Brendan Hansknecht (May 26 2025 at 16:27):

Yep


Last updated: Jul 06 2025 at 12:14 UTC