Hey folks! Over the last couple of weeks, I have been looking at the macho surgical linker, on and off. I haven't yet got anywhere near it actually doing a full surgical link on arm64 macOS, but I have done some refactors and cleanups to the macho linker. My question now is: would it be something that you would like to see upstreamed in its current form? I will abstain from submitting a PR just yet, but you can have a look at my WIP here -> https://github.com/roc-lang/roc/compare/main...kubkon:roc:macho-surgery?expand=1
Oh, and a fun fact, I did manage to get Roc as a Rust project to link with the bold linker (used to be zld a while back) ;-)
So awesome to hear you've been looking at this. It will be really nice to have surgical linking for macos some day.
What would be easiest for you? Are you looking for feedback on your work so far? Is there anything we could do to help you with this?
I've tried to poke around at the surgical linker, but it's like black magic to me. I've always asked @Brendan Hansknecht to help guide me through getting things set up for the surgical linker from the platform development side of things. I'm sure he will be interested in your work here. :smiley:
Let me flip the question and ask what is better for you and the project? I would prefer to submit small incremental PRs so that they are easier to review and other folks can contribute if they feel like it too. Also, I have only limited time to work on this which would probably align itself best with doing small incremental bits at a time.
Yeah, I think smaller incremental PRs are always a good approach.
Nice, so lemme try and clean the changeset up a little bit (I have some of my own debugging inserts floating around which probably should not make it into the repo) and submit a PR and go through the review process. Does that sound good?
Yeah, happy to accept anything. The project is really just a port of the elf surgical linker and definitely is not working currently. Anything to push it towards working or to clean it up is highly welcome.
Alright then, PR up and ready for review https://github.com/roc-lang/roc/pull/7424 I have highlighted bits I was unsure of in comments.
@Brendan Hansknecht what if, instead of having an env var for overriding the supported(..) output (in particular, making the Target::MacArm64 arm return true for development purposes), that function returned an enum, such that supported(..) -> SupportLevel where enum SupportLevel { Full, Dev, None }, and SupportLevel::Full would correspond to the current true, SupportLevel::None to false, and SupportLevel::Dev would allow enabling the surgical linker with a warning message that it is a work-in-progress, so don't expect much, or something like that.
SupportLevel::Dev could be SupportLevel::Wip or something
Sure. As long as it falls back on the legacy linker when the files are available that sounds fine.
Lemme cook up a PR and discuss it there over something more concrete then.
PR is up https://github.com/roc-lang/roc/pull/7433
Pushed two more commits that I think solve the immediate issue. The tl;dr is: if no user --linker flag was specified and the linker support level is below SupportLevel::Full (i.e., SupportLevel::Wip or SupportLevel::None) we fall back to the legacy linker by default. If --linker=surgical was specified and the support level is SupportLevel::Wip then we will warn the user and use the surgical linker.
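For anyone following along, here is a minimal sketch of the selection rule described above. Only SupportLevel, its variants, Target::MacArm64 and the two stated rules come from the discussion; the LinkType enum, the other Target variants, the per-target levels, the choose_linker helper and the behaviour for the cases the chat does not mention (for example --linker=surgical on an unsupported target) are assumptions made up for illustration and may not match the actual PR.

```rust
/// How far along surgical-linker support is for a target (names from the chat).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SupportLevel {
    Full,
    Wip,
    None,
}

/// Hypothetical stand-in for the compiler's target type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Target {
    LinuxX64,
    MacArm64,
    Other,
}

/// Hypothetical representation of the `--linker` choice.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LinkType {
    Surgical,
    Legacy,
}

/// Assumed mapping, for illustration only.
fn supported(target: Target) -> SupportLevel {
    match target {
        Target::LinuxX64 => SupportLevel::Full,
        Target::MacArm64 => SupportLevel::Wip,
        Target::Other => SupportLevel::None,
    }
}

/// Returns the linker to use plus an optional warning for the user.
/// `requested` is `None` when no `--linker` flag was given.
fn choose_linker(
    requested: Option<LinkType>,
    level: SupportLevel,
) -> (LinkType, Option<&'static str>) {
    match (requested, level) {
        // No flag: surgical only when fully supported, otherwise legacy.
        (None, SupportLevel::Full) => (LinkType::Surgical, None),
        (None, _) => (LinkType::Legacy, None),
        (Some(LinkType::Legacy), _) => (LinkType::Legacy, None),
        (Some(LinkType::Surgical), SupportLevel::Full) => (LinkType::Surgical, None),
        // Explicit opt-in on a work-in-progress target: warn, then proceed.
        (Some(LinkType::Surgical), SupportLevel::Wip) => (
            LinkType::Surgical,
            Some("surgical linker is a work-in-progress on this target"),
        ),
        // Assumption: an unsupported target falls back with a warning.
        (Some(LinkType::Surgical), SupportLevel::None) => (
            LinkType::Legacy,
            Some("surgical linker is not supported on this target"),
        ),
    }
}
```

With this shape, choose_linker(None, supported(Target::MacArm64)) picks the legacy linker silently, while passing Some(LinkType::Surgical) picks the surgical linker and returns a warning.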
@Jakub Konka I had to revert PR#7424 (in PR#7435), you can reproduce the test failures with cargo test --release -p roc_cli cli_tests.
The PR got merged even though CI failed because we've been having a lot of flaky failures on macos lately and this appeared to be a typical flaky failure and so the merge was forced due to human error.
No probs, I will resubmit the changes with #7433 which actually fixes macos tests after my PR got merged.
@Anton Revert of revert with test fixes up in https://github.com/roc-lang/roc/pull/7436
I noticed that signed commits are required in the repo - I will make sure to sign my commits from now on. Apologies for not having done that till now!
No worries. Happens to everyone.
I just realized this is a surgical linker for Mach-O, not an in-some-way brawnier version of the surgical linker
brawny == macho? Bit of a stretch but maybe? ;-)
jokes aside, it's my fault, I just tend to drop the hyphen
I never type the hyphen either. Also, it is a more macho problem. Mach-o is simply more complex than elf.
another PR up and ready for review - all commits signed this time! https://github.com/roc-lang/roc/pull/7441
So I was toying with the idea of using gimli-rs/object's provided LoadCommandIterator to iterate over the load commands, but given how the preprocess step really wants very low-level control over what to copy over, I think that's a bad idea. Nevertheless, if anyone's interested in what that entails, here's where it would lead: kubkon:roc:use-macho-iterator
I'll definitely have to take a look. Also, is gimli fully functional for macho now? I feel like when I first started the macho surgical linker it was still missing a lot of macho.
It seems like it is. I am still unclear if it's worth using though (for parsing) rather than sticking to load_struct_in_place with some helpers sprinkled around. gimli uses Cow<..> under the hood for reading, which means it should be zero-cost; however, since the surgical linker is a lot about manipulation at a byte level, it may not be worth it. Dunno, I cannot make up my mind... yet :grinning:
Anyhow, I'd be curious to learn what you think about it!
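To make the trade-off a bit more concrete, here is a rough sketch of what walking the load commands by hand looks like when you want byte-level control. It uses only the standard library and the well-known 64-bit Mach-O layout (a 32-byte mach_header_64 with ncmds at offset 16 and sizeofcmds at offset 20, followed by commands that each start with cmd and cmdsize as u32). It assumes a little-endian, 64-bit, non-fat file; the helper names are made up and this is not the code in the Roc repo or in the object crate.

```rust
/// Size of `mach_header_64`; load commands start right after it.
const MACH_HEADER_64_SIZE: usize = 32;

/// Reads a little-endian u32 at `offset`, or `None` if out of bounds.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let b = bytes.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns `(cmd, file_offset, cmdsize)` for every load command,
/// or `None` if the header or any command is malformed.
fn load_commands(file: &[u8]) -> Option<Vec<(u32, usize, usize)>> {
    let ncmds = read_u32_le(file, 16)? as usize;
    let sizeofcmds = read_u32_le(file, 20)? as usize;
    let end = MACH_HEADER_64_SIZE.checked_add(sizeofcmds)?;
    if end > file.len() {
        return None;
    }

    let mut offset = MACH_HEADER_64_SIZE;
    let mut commands = Vec::new();
    for _ in 0..ncmds {
        let cmd = read_u32_le(file, offset)?;
        let cmdsize = read_u32_le(file, offset + 4)? as usize;
        // Every command is at least its own 8-byte prefix and must stay
        // inside the region announced by `sizeofcmds`.
        if cmdsize < 8 || offset.checked_add(cmdsize)? > end {
            return None;
        }
        commands.push((cmd, offset, cmdsize));
        offset += cmdsize;
    }
    Some(commands)
}
```

Because the result carries raw file offsets, a preprocess step could copy or patch each command's bytes directly, which is the kind of control that is harder to keep when going through a higher-level iterator.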
I'm trying to grasp the surgical linker pipeline in the compiler, and I don't seem to understand what the purpose of libapp.dylib actually is. So far I see that we emit it in spawn_surgical_host_build_thread where we:
- […] libapp.dylib
- […] libapp.dylib
- […] and symbols it exports
While here I've also noticed that the number of dynamic system dependencies of libapp.dylib and the rebuilt host are different (ignoring the host's dependence on libapp.dylib), which is surprising as somehow I thought they should match, but I am probably not understanding the build process well yet.
Does this diagram help at all? https://github.com/roc-lang/basic-cli/blob/main/basic-cli-build-steps.png
My understanding is that libapp.dylib is all the roc stuff gets compiled to a shared library.
Then it gets linked to the host, so the host executable is expecting to be dynamically linked to a shared library.
Then the "preprocess" host step takes the compiled host executable (which is dynamically linking the roc app), and does some surgical magic to it so we can swap out those parts later with a different roc app (that still uses the same platform/API).
This is my layman's understanding. @Brendan Hansknecht is definitely the expert on how it works on a technical level.
Luke Boswell said:
My understanding is that libapp.dylib is all the roc stuff gets compiled to a shared library.
"all the roc stuff" is that a dummy app or similar written in roc?
Thanks @Luke Boswell! I had a look at that diagram but it still left some questions unanswered. Perhaps if we go over an example from basic-cli? Say I am trying to run examples/echo.roc. In it, I specify platform/main.roc which corresponds to the cli platform, correct? So there are 3 components at play here: user program (echo.roc) <-> platform (cli) <-> host (??).
The first node (user program) is expected to be pluggable by the user as far as I understand, but what does it compile to? An object file or an executable that links against platform + host, or something else?
Next, platform <-> host are prebuilt. Am I correct here? And this is where host initially is linking dynamically against libapp.dylib which really is libplatform.dylib. We then surgically merge host with libapp.dylib to emit platform as a dynamically linked executable (on macOS, since we always link at the very least against libSystem.dylib). How am I doing so far?
Thanks for the chat @Jakub Konka, let me know if I can help in any other way.
Thank you! It was very helpful!
Ok, so I'm sure we mix up our own wording sometimes, but here are some details.
Firstly, naming. A roc executable is generated from 2 parts, the platform (basic-cli) and the application (echo.roc + any packages it uses). The platform is made out of two parts, the host (the part written in another language, rust for basic cli) and the roc API.
From a compilation and linking perspective, it is probably best to think of the split instead as host (again, the part in another language like rust or zig) and everything else (which is all written in roc)
The goal with the surgical linker was to get as close to a completed executable as possible without compiling anything in a .roc file. Then to simply hack in the roc part of the code.
The closest thing to a completed executable is a dynamically linked executable where the host is completely compiled, and it attaches to a shared library that contains all the roc code.
That is what we have the host compile into.
We have it link to a dummy shared library that just has all the correct headings to look like a correct roc app + platform for apis. We also could just emit a shared library containing a real application.
Oh, also, to clarify one of your questions above: all .roc files compile into a single object file.
Ok, so the process is essentially:
generating the dummy shared lib was required to get host languages to generate a complete executable
That's excellent, thank you @Brendan Hansknecht and @Luke Boswell. I think the bit I was missing was the fact that the host is dynamically linking against a dynamic library so that the linker synthesises all relevant bits, which the surgical linker can then substitute with static references to a Roc app. As discussed with Luke already, this implies that any Roc app that is compiled into the host must not depend on any additional libc/framework symbols. This however does not match what I've seen so far for MachO, where the app does contain references to _setjmp and _longjmp. I haven't investigated why or where that happens but will post it here once I find out more.
I thought they were only used for tests and not regular roc....hmm
And a lot of the other functions we force the platform to implement like memcpy, malloc, etc
I will try digging in. It definitely feels like a bug somewhere now that I know how it's supposed to work under the hood.
Looks like genuine roc (glue or builtins?) symbols missing from the platform
Screenshot 2025-01-06 at 08.49.52.png
oh ok, these seemingly come from crates/compiler/builtins/bitcode/src/main.zig
to my inexperienced eyes it seems to match @Brendan Hansknecht's intuition - _setjmp and _longjmp are referenced by __roc_force_setjmp and __roc_force_longjmp, which seem to be forced into a translation unit (when using llvm) in this loop: https://github.com/roc-lang/roc/blob/main/crates/compiler/gen_llvm/src/llvm/build.rs#L1056-L1083
yep, confirmed, commenting out __roc_force_setjmp and __roc_force_longjmp from the must_keep list does not emit refs to setjmp and longjmp
Screenshot 2025-01-06 at 09.13.35.png
so the remaining question is why this happens in the first place especially if it is assumed they should only be kept for test binaries (or similar)
on re-reading the comments tho, are they really tests only? it seems they are special functions that should resolve to an LLVM intrinsic but for some reason they do not, at least not on aarch64-macos
I haven't checked what happens on x86_64-linux - could anyone confirm for me if setjmp is present in echo.o built with roc build --no-link echo.roc? Perhaps it gets lowered to an actual in-place implementation by llvm, whereas on aarch64 (maybe only macos) to a libc call?
I am not sure what Brendan meant by "tests", but presumably building tests in a roc app will require a platform with some extended functionality which provides PLT entries for those symbols (on arm64 macos), so perhaps when not building tests we could either:
1) check if we are indeed building a test app and check for __roc_force_setjmp only then, or
2) always link apps with -undefined dynamic_lookup
The latter carries the risk of not erroring out on undefined symbols that were meant to be resolved at link time though.
@Folkert de Vries may be able to answer these questions
Yeah, I have a few notes:
- […] roc test ... (which doesn't actually link to a platform at all).
- […] aarch64 ...no idea why.
Looking into this a bit more, I think a lot of this is actually outdated infrastructure
nowadays panicking in roc calls the platform exposed roc_panic.
It used to be that panicking in roc was done via setjmp and longjmp
Hmm...I'm not actually sure if setjmp and longjmp are needed in roc anymore at all. That said, they appear in a lot of locations due to old infra that I think can get ripped out.
That said, I still think some of our tests use setjmp and longjmp. So unwinding this may be a bit complex...
I don't think they should be needed unless we decide to make roc_panic automatically get translated into the original function call returning a Result to the host
that said, I know that for nea Folkert has also implemented architecture-specific setjmp and longjmp using assembly that doesn't require a libc dependency, so that could be an option if we decide to do that
either way, I don't think we should need to link them
Yeah, I think they are just still used in many places across the compiler. Like all the gen-tests use them. Also roc test may use them, but not 100% sure.
i think roc test uses it for expects
and they are used for gen tests to catch errors
hm, why would inline expects need to longjmp? :face_with_raised_eyebrow:
they don't halt the test, they just record that the test should fail, report the failure, and keep going
and top-level expects should just run the code to completion
For roc test, I think it is used to return back to main execution on a panic.
So run top level expect, It fails due to hitting a panic. Long jump to code to print the failure and continue to the next top level expect
For gen_test, I think we should be able to implement roc_panic in "platform". And it can use setjmp and longjmp. Avoiding needing it in the builtins.
ahh that makes sense
So here's my understanding of the situation based on the discussion above:
- […] setjmp and longjmp is aarch64-specific
- […] __roc_force_setjmp and __roc_force_longjmp which are Roc's builtins used exclusively for tests - in particular, roc_panic in tests is implemented using them
- […] aarch64-linux surgical linker too but this one is not implemented yet
- […] __roc_force_setjmp and __roc_force_longjmp are builtins/compiler-rt symbols for the Roc compiler and Roc has a very specialised compiler pipeline, compiler-rt is always inlined in the sense that it is never linked in as a static archive but rather always part of the emitted app.o object file - this then makes emitting those symbols conditionally depending if roc test was called non-trivial to implement)
Since bullet point 3 is going to happen sooner or later, I am thinking of simply special-casing those two externs (setjmp and longjmp) in the linker and simply ignoring them. If nothing apart from __roc_force_setjmp and __roc_force_longjmp is referencing them, then it should not impact linking in any way. This way, once bullet point 3 lands, updating the linker amounts to removing this special casing and nothing else. Is this a viable temporary workaround?
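Roughly, the special-casing described above could look like the sketch below: when collecting the app's undefined symbols, skip the two known externs and keep everything else. The constant, the function name and the sample symbol list are hypothetical and only illustrate the idea; the real change is in the PR linked below. The leading underscore is the usual Mach-O spelling of C symbols, matching the _setjmp and _longjmp references mentioned earlier.

```rust
/// Undefined symbols the workaround would skip instead of resolving.
/// Hypothetical constant; Mach-O prefixes C symbols with an underscore.
const IGNORED_UNDEFINED: [&str; 2] = ["_setjmp", "_longjmp"];

/// Keeps only the undefined symbols that still need to be resolved
/// against the host, preserving their original order.
fn undefined_symbols_to_resolve<'a>(undefined: &[&'a str]) -> Vec<&'a str> {
    undefined
        .iter()
        .copied()
        .filter(|name| !IGNORED_UNDEFINED.iter().any(|ignored| *ignored == *name))
        .collect()
}
```

Keeping the ignore list in one constant means that dropping the workaround later is a one-line removal, which matches the intent of treating this as temporary.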
Proposed change to the macho linker https://github.com/roc-lang/roc/pull/7474
That sounds good
I don't think there is any need to shift loadable segments in file offsets/memory when preprocessing the host. Instead, it should be fine to rely on the linker to emit enough padding between the end of the load commands and the start of the first section to insert the required Roc load commands. Then during the surgical link we would put the segments after the last existing loadable segment but before the __LINKEDIT segment - the __LINKEDIT segment always has to come last so there's nothing we can do about that anyhow.
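As a back-of-the-envelope illustration of the padding argument, the sketch below computes how much room sits between the end of the load commands and the first section, and whether a new segment command fits there. The struct sizes are the standard 64-bit Mach-O ones (32 bytes for mach_header_64, 72 for segment_command_64, 80 per section_64); the function names and the sample numbers in the usage note are assumptions for illustration only, not values taken from the PR.

```rust
/// Standard 64-bit Mach-O struct sizes, in bytes.
const MACH_HEADER_64_SIZE: u64 = 32;
const SEGMENT_COMMAND_64_SIZE: u64 = 72;
const SECTION_64_SIZE: u64 = 80;

/// Size of one LC_SEGMENT_64 command carrying `nsects` section headers.
fn segment_command_size(nsects: u64) -> u64 {
    SEGMENT_COMMAND_64_SIZE + nsects * SECTION_64_SIZE
}

/// Free bytes between the end of the existing load commands and the file
/// offset of the first section. Returns 0 if the inputs are inconsistent.
fn header_padding(sizeofcmds: u64, first_section_offset: u64) -> u64 {
    first_section_offset.saturating_sub(MACH_HEADER_64_SIZE + sizeofcmds)
}

/// True if `new_cmds_size` bytes of extra load commands fit in the padding,
/// so nothing after the header has to be shifted.
fn fits_in_padding(sizeofcmds: u64, first_section_offset: u64, new_cmds_size: u64) -> bool {
    new_cmds_size <= header_padding(sizeofcmds, first_section_offset)
}
```

For example, with 1000 bytes of existing load commands and a first section at file offset 0x4000, a segment command with two sections (232 bytes) fits easily; if the first section started at offset 1040 instead, only 8 bytes would be free and it would not fit.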
Proposed changes 7499
I wonder if elf has lower alignment constraints for this. I think with elf there was regularly not enough space (thus all the shifting)
As a note, we may also need a few more sections eventually. .data and potentially both of .tbss and .tdata
Yeah, very possibly. For MachO it is so common to want to add new load commands that there is a linker flag for just that: -headerpad size.
Yeah, I expect more sections, but for the time being let's do something simple(r) and get it working first.
Last updated: Jul 06 2025 at 12:14 UTC