right now all the zig builtins live in crates/ which makes them inaccessible to the zig compiler... should we move them to src/builtins or something?
I would just fork and split
They have to change anyway due to zig version and new API with explicit function pointers passed in
And yeah, src/builtins sounds like a good dir for it
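For context, "explicit function pointers passed in" means something shaped roughly like this (field names are invented here; the real definitions live in the PR):

```zig
// hypothetical shape only, to illustrate the new calling convention:
// the host hands the builtins a table of function pointers explicitly
pub const RocOps = extern struct {
    roc_alloc: *const fn (size: usize, alignment: u32) callconv(.C) ?*anyopaque,
    roc_dealloc: *const fn (ptr: *anyopaque, alignment: u32) callconv(.C) void,
};
```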
cool, sounds good! :thumbs_up:
it'll also need quite a rewrite with all of the new syntax
Is this something I can help with? Do we have something minimal like Bool or maybe Str we need?
sure! Here's a WIP PR with just that change: https://github.com/roc-lang/roc/pull/7802
if you want to try to get it updated and tests passing, that would be awesome! :smiley:
`Str` would be enough for hello world, but probably at that point you've gotten all the hard parts anyway :sweat_smile:
We're not doing the shared memory buffer thing anymore with `expect`, right?
You've copied the builtins verbatim and we can cut out things we don't need, right?
yes and yes!
`expect` will just work the same way `dbg` does, where the host exposes a function for "this gets called whenever an inline `expect` fails, and it's up to the host to decide what to do with that"
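For illustration, a host hook for that could look roughly like this (the signature is a guess, not the real ABI):

```zig
const std = @import("std");

// the compiled Roc code calls this whenever an inline `expect` fails;
// what happens next is entirely up to the host
export fn roc_expect_failed(src_ptr: [*]const u8, src_len: usize) callconv(.C) void {
    // e.g. print it, record it for `roc test`, abort, etc.
    std.debug.print("inline expect failed: {s}\n", .{src_ptr[0..src_len]});
}
```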
I've got the tests passing now... but it's not really being used anywhere yet so it's hard to know if it's all ok.
Are we still planning on compiling the builtins to bitcode and then linking with our code gen somehow? I'm interested to know what's next from here... would this PR be good to merge as-is, so we can start using it sometime, somehow?
yeah we'll still need to compile them to bitcode so LLVM can import them
but for now, the interpreter needs them in a general "just import them and start using them directly" sense
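For illustration, a minimal sketch of what "just import them directly" could look like (the import path and `RocStr.len` are assumptions; only `fromSlice` appears elsewhere in this thread):

```zig
const std = @import("std");
// assumed path once the builtins land in src/builtins
const str = @import("builtins/str.zig");

test "call a builtin directly" {
    const s = str.RocStr.fromSlice("hello");
    try std.testing.expectEqual(@as(usize, 5), s.len());
}
```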
very cool! @Luke Boswell looks like CI is still failing on a bunch of missing doc comments :smile:
I think this `typos` check is getting a bit carried away...
```
error: `numer` should be `number`
  --> ./src/builtins/dec.zig:797:56
    |
797 | sr = 1 + N_UDWORD_BITS + denom_leading_zeros - numer_hi_leading_zeros;
    |                                                 ^^^^^
    |
```
Or another example from CI failure
```
error: `fo` should be `of`, `for`, `do`, `go`, `to`
  --> ./src/builtins/str.zig:1198:34
     |
1198 | const fo = RocStr.fromSlice("fo");
     |                              ^^
     |
```
There must be a config we're missing somewhere
What is unfortunate about this PR is that they are all new files... so it's hard to see the individual changes from the originals.
@Richard Feldman @Brendan Hansknecht I think I've found an issue with our builtins. We're now running the zig tests on Windows ARM.
The test is being run on Windows with ARM64 architecture, but the inline assembly code in `main.zig` is written for x86_64 (using registers like `%rcx`, `%rdx`, etc., which are specific to x86_64).
I'm thinking of asking Claude for help making an implementation, or maybe stubbing it out for ARM. I'm not really sure here. Another option is to ignore the tests on Windows ARM... :thinking:
what are we using that for again? setjmp/longjmp?
and yeah I think Claude 3.7 Sonnet has a reasonable chance of translating it if it's pretty small, although at that point we do need to be pretty confident that our tests are exercising it sufficiently :sweat_smile:
Yes, and yeah it's tiny
```zig
comptime {
if (builtin.os.tag == .windows and builtin.target.cpu.arch == .x86_64) {
asm (
\\.global windows_longjmp;
\\windows_longjmp:
\\ movq 0x00(%rcx), %rdx
\\ movq 0x08(%rcx), %rbx
\\ # note 0x10 is not used yet!
\\ movq 0x18(%rcx), %rbp
\\ movq 0x20(%rcx), %rsi
\\ movq 0x28(%rcx), %rdi
\\ movq 0x30(%rcx), %r12
\\ movq 0x38(%rcx), %r13
\\ movq 0x40(%rcx), %r14
\\ movq 0x48(%rcx), %r15
\\
\\ # restore stack pointer
\\ movq 0x10(%rcx), %rsp
\\
\\ # load jmp address
\\ movq 0x50(%rcx), %r8
\\
\\ # set up return value
\\ movq %rbx, %rax
\\
\\ movdqu 0x60(%rcx), %xmm6
\\ movdqu 0x70(%rcx), %xmm7
\\ movdqu 0x80(%rcx), %xmm8
\\ movdqu 0x90(%rcx), %xmm9
\\ movdqu 0xa0(%rcx), %xmm10
\\ movdqu 0xb0(%rcx), %xmm11
\\ movdqu 0xc0(%rcx), %xmm12
\\ movdqu 0xd0(%rcx), %xmm13
\\ movdqu 0xe0(%rcx), %xmm14
\\ movdqu 0xf0(%rcx), %xmm15
\\
\\ jmp *%r8
\\
\\.global windows_setjmp;
\\windows_setjmp:
\\ movq %rdx, 0x00(%rcx)
\\ movq %rbx, 0x08(%rcx)
\\ # note 0x10 is not used yet!
\\ movq %rbp, 0x18(%rcx)
\\ movq %rsi, 0x20(%rcx)
\\ movq %rdi, 0x28(%rcx)
\\ movq %r12, 0x30(%rcx)
\\ movq %r13, 0x38(%rcx)
\\ movq %r14, 0x40(%rcx)
\\ movq %r15, 0x48(%rcx)
\\
\\ # the stack location right after the windows_setjmp call
\\ leaq 0x08(%rsp), %r8
\\ movq %r8, 0x10(%rcx)
\\
\\ movq (%rsp), %r8
\\ movq %r8, 0x50(%rcx)
\\
\\ movdqu %xmm6, 0x60(%rcx)
\\ movdqu %xmm7, 0x70(%rcx)
\\ movdqu %xmm8, 0x80(%rcx)
\\ movdqu %xmm9, 0x90(%rcx)
\\ movdqu %xmm10, 0xa0(%rcx)
\\ movdqu %xmm11, 0xb0(%rcx)
\\ movdqu %xmm12, 0xc0(%rcx)
\\ movdqu %xmm13, 0xd0(%rcx)
\\ movdqu %xmm14, 0xe0(%rcx)
\\ movdqu %xmm15, 0xf0(%rcx)
\\
\\ xorl %eax, %eax
\\ ret
\\
);
}
}
```
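For reference, a guess at the Zig-side declarations for those routines (the 256-byte buffer size is inferred from the highest offset the asm touches, 0xf0 plus 16 bytes; the exact signatures and return-value contract aren't clear from the asm alone):

```zig
// guessed declarations; names match the .global labels above
pub const JmpBuf = extern struct { slots: [32]u64 }; // 256 bytes

pub extern fn windows_setjmp(buf: *JmpBuf) c_int;
pub extern fn windows_longjmp(buf: *JmpBuf) noreturn;
```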
worth a try I guess!
could also ask a new Claude chat afterwards to review the generated code, look for problems etc
No real luck unfortunately... I've got ARM and x86 Windows machines; I might compile something and objdump it to see what zig does for setjmp and longjmp, and use that asm as inspiration for our implementation.
I've added a test for setjmp and longjmp that seems to be working well on all the non-Windows machines, but fails on Windows... so I'm also guessing our existing Windows impl may not be right either.
Should be able to look at this sometime later in the week.
I feel like we shouldn't need those anymore
That sounds much easier... I'm not sure where we used it tbh
if I remember right, we needed them so that if a `crash` happens in a `roc test` test, we can turn that into a failed test rather than taking out the entire `roc test` run
not sure how else we'd handle that scenario other than something really heavyweight like spawning separate OS threads or processes :sweat_smile:
Sure, but it can be in the platform instead of in the builtins
Or we can do other hacks when `roc_crash` is called.
hm, does that help us? :thinking:
I guess the compiler itself can use libc's `setjmp` and `longjmp`, maybe
Yeah
Or if the crash is always put at the end of a test, we might be able to set a global and then just return from the crash... but that depends on a lot of things.
And maybe would break things
Oh wait, we'll be running the interpreter for most tests, right? So it probably can just set state and then return.
Idk...just ideas
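A very rough sketch of that "set a global and return" idea (everything here is illustrative):

```zig
// instead of unwinding, record the crash; the test runner checks
// `pending_crash` after the test body returns
pub var pending_crash: ?[]const u8 = null;

export fn roc_crash(msg_ptr: [*]const u8, msg_len: usize) callconv(.C) void {
    pending_crash = msg_ptr[0..msg_len];
}
```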
I'm just not a fan of having that code in the builtins
The `roc test` use case is really fuzzy for me. Would it be possible to step through it in a few dot points?
`roc test` from the cli ^^ is this even remotely close?
Oh wait, I was thinking of our internal unit tests, not `roc test`
In my imagining here... the `roc_crashed` implementation just returns from the process with a non-zero exit code
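A hedged sketch of that (the exact signature is a guess):

```zig
const std = @import("std");

// report the crash message, then end the process with a non-zero exit code
export fn roc_crashed(msg_ptr: [*]const u8, msg_len: usize) callconv(.C) noreturn {
    std.io.getStdErr().writer().print("crash: {s}\n", .{msg_ptr[0..msg_len]}) catch {};
    std.process.exit(1);
}
```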
For `roc test`, I 100% think we should use subprocesses. 2 main reasons:
@Brendan Hansknecht you're not worried about spawning processes being heavyweight, which I think Richard was referring to?
For tests, not really. Tests run on the human scale. Humans are slow.
And you don't need a process per test, you can group things.
So it will be amortized
Also... the primary dev loop can probably use the interpreter for running tests. So I can imagine `roc test` uses that, and `roc test --opt full` compiles a test executable using llvm and then runs the tests.
Probably compiles a shared library for the llvm case, but yeah
Oh, actually, for roc, we probably don't even need a full process; I think we can just use threads. They just need separate memory allocators. A GPA per thread.
Cause roc can't segfault or crash except via `roc_crash` and `roc_expect_failed`, and we can handle those cleanly.
Only need a process if you are afraid of real segfaults and crashes (which correct roc code should never do and in tests, we can make memory allocation failures be treated like crashes)
It's beautiful...
Only exception is stack overflows... oh... I guess we need processes to handle stack overflows
Cause those will kill a full process
And can still exist in valid roc code
Didn't we plan to leave those... and just have a timer or interrupt to check if it's still going
Not infinite loops, but stack overflows which I think kill the full process....though maybe they just kill a single thread....need to test
So you fork one child process, which then runs everything in threads. If any of them stack overflow you can recover and report it.
Yeah, stack overflow kills full processes, so we need process isolation to catch them
And yeah we could do that, but then you don't know which test led to the overflow (also, I guess the interpreter can catch what would be stack overflows and report them directly)
If you had an append-only log each time you started/completed a test, or ran the tests one at a time or something, you could maybe report which one is the issue (or narrow it down)
Frankly, I'm really not too worried about the cost of processes. So I would vote for starting simple first.
Just use process isolation for catching issues and improve with thread strategies later if it actually has issues.
Spawning a process generally takes a few milliseconds. And a single process can then run many tests, only needing to relaunch after a stack overflow.
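A rough sketch of that batching idea, assuming Zig's `std.process.Child` API:

```zig
const std = @import("std");

// run a batch of tests in a child process; an abnormal exit likely means
// a stack overflow, so report it and relaunch a fresh batch
fn runTestBatch(allocator: std.mem.Allocator, exe: []const u8) !void {
    var child = std.process.Child.init(&.{ exe, "--batch" }, allocator);
    const term = try child.spawnAndWait();
    switch (term) {
        .Exited => |code| {
            if (code != 0) std.debug.print("test batch failed with code {d}\n", .{code});
        },
        else => std.debug.print("test batch died abnormally; relaunching\n", .{}),
    }
}
```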
Well we've come a long way around... but I'm reasonably convinced we don't need setjmp and longjmp in the builtins anymore. We can add it back easily enough if that changes. Simplest way forward to land this PR and the builtins is just to remove it.
I like the idea of the platform (or test implementation) handling everything through our nice clean host ABI interface instead
Luke Boswell said:
> There must be a config we're missing somewhere
It's typos.toml at the root of the repo.
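For reference, the kind of override it supports looks roughly like this (hedged; check the typos docs for exact keys):

```toml
# accept these as intentional identifiers rather than typos
[default.extend-words]
numer = "numer"
fo = "fo"
```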
Yeah I played with that a bit but ended up just renaming things
Brendan Hansknecht said:
> And yeah we could do that, but then you don't know which test led to the overflow (also, I guess the interpreter can catch what would be stack overflows and report them directly)
our greenthreads approach can do that!
we'd have a global segfault handler, but it would be able to tell from the address of the segfault which guard page (on which greenthread) caused the fault, which in turn would tell us which test overflowed its stack
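A rough sketch of that address-to-greenthread lookup (all names invented):

```zig
// each greenthread's stack has a guard page; a fault address inside one
// of those ranges identifies the overflowing greenthread (and thus the test)
const GreenThread = struct {
    guard_lo: usize, // start of this thread's guard page
    guard_hi: usize, // one past the end
};

fn threadForFault(threads: []const GreenThread, fault_addr: usize) ?usize {
    for (threads, 0..) |t, i| {
        if (fault_addr >= t.guard_lo and fault_addr < t.guard_hi) return i;
    }
    return null;
}
```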
I think it would be really great to pursue that approach, because it's something we know we want to get working anyway for Roc hosts
and this is a way we could unblock getting it really working and robust right now
because we could set up tests in our code base where we:
- compile a dylib that uses the `host_abi.zig` type (e.g. varargs of pointers)
- `dlopen` that dylib

at that point, we can have different dylib source code files which simulate different conditions in a test, e.g. `crash`, `dbg`, `expect` failed

and then we can make tests which verify that we're doing things like capturing `dbg`s and failed `expect`s, recovering from `crash` and stack overflows, etc. (rough `dlopen` sketch below)
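A rough sketch of the `dlopen` step using Zig's `std.DynLib` (the `run_test` symbol name is invented):

```zig
const std = @import("std");

// open one of the per-scenario dylibs and invoke its test entrypoint
fn runDylibTest(path: []const u8) !void {
    var lib = try std.DynLib.open(path);
    defer lib.close();
    const run = lib.lookup(*const fn () callconv(.C) void, "run_test") orelse
        return error.MissingSymbol;
    run();
}
```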
and since they're green threads, we definitely wouldn't need setjmp/longjmp because all we'd do when one of them stack overflows or crashes is to just have them yield
and then the caller can say "cool, all the memory in the arena we gave them is now garbage and we can reset the arena for another test, and all the stack memory we gave them in the greenthread is also garbage and can be reused for another greenthread too"
in other words, the cleanup is just "those resources are available for something else now"
I don't think we should use arenas for tests
But otherwise, yeah, agree
We should not use arenas cause we don't know what code the user is testing and if it will actually run on arenas
If the tests lead to tons of reallocations, they might oom with arenas, but work fine with a standard allocator. I think we likely want a standard allocator per thread.
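A minimal sketch of the per-thread allocator idea (Zig 0.13-era API; the allocator type was renamed in later releases):

```zig
const std = @import("std");

fn testWorker() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit(); // also reports leaks for this thread's tests

    // run this thread's share of the tests with its own allocator
    const allocator = gpa.allocator();
    const buf = try allocator.alloc(u8, 64);
    defer allocator.free(buf);
}
```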
That said, I do think the test platform could be a lot simpler than a full green thread platform.. cause it only has compute heavy work and no async requests to wait on. So it really just needs a process per core and a single fake stack per process to avoid stack overflows. Then would just hot loop perform compute for tests.
But we could still build out a full green thread base if we prefer, just to really push and build out the primitives.
Oh, other big opportunity. Tests have no inputs, can we make the interpreter able to just run them. No dylib or linking at all, and smarter debug info around things like stack overflows and maybe eventually other things?
yeah I think so!
and yeah I agree about greenthreads not being strictly a requirement here, but it seems to me like:
Yep
@Brendan Hansknecht do you have any interest in trying that out, based on the greenthread code you've already done? I think you're the foremost expert in greenthreads right now! :smiley:
Yes, though not sure I have the time currently
cool cool!
Does the green threads thing work on Windows?
I'm just wondering if it's similar i.e. with dlopen and a dynamic library
Maybe we need to start with a subprocess anyway
Green threads should work on windows the same as Linux
Only place they won't work is wasm
All they are is a chunk of memory for a call stack and a tiny bit of assembly to jump into it
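In other words, roughly (a hedged sketch of the data involved):

```zig
// a green thread boils down to this; the context-switch asm just saves
// the current stack pointer and jumps into `resume_sp`
const GreenStack = struct {
    memory: []align(16) u8, // the chunk of memory used as a call stack
    resume_sp: usize,       // stack pointer the switch asm jumps to
};
```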
I remember looking into wasm and concluding it was possible but required getting JS involved behind the scenes
You can compile green threads into a state machine, similar to what happens with rust futures or Quasar from the Java world
I feel like that would just be for cooperative green threads
But don't actually know for sure
By cooperative, do you mean non-preemptable?
Yes
The compiler can always do things like insert yields at the start of recursive functions (and loops), so if you want it to be preemptable, it can be made so.
Yeah, but that tends not to work well in practice. I mean it isn't terrible or anything, but after years of trying to avoid it, go added preemption. Either you have too much overhead or you risk getting stuck in a single rogue thread (even if not fully stuck, you can often end up with major latency problems and unfair CPU splits)
Yeah there's definitely a bunch of overhead in adding yields all over the place - but my point was just that it's possible to "compile-in" pre-emptability; it doesn't actually have to be a native property of the system you're running on
That can be made lower overhead by not yielding if it hasn't been over some configurable amount of "time" (e.g. a count of candidate yields passed, or an approximate instruction count, etc)
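A hedged sketch of that throttled-yield idea (all names invented):

```zig
// one counter per OS thread; decremented at every candidate yield point
// the compiler inserted, but only actually yielding every N candidates
threadlocal var fuel: u32 = 10_000;

fn maybeYield() void {
    fuel -= 1;
    if (fuel == 0) {
        fuel = 10_000;
        yieldToScheduler(); // assumed scheduler hook
    }
}

fn yieldToScheduler() void {}
```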
Yep