Stream: compiler development

Topic: surgical macho linker


view this post on Zulip Jakub Konka (Dec 28 2024 at 09:13):

Hey folks! Over the last couple of weeks, I have been looking at the macho surgical linker, on and off. I haven't yet got anywhere near it actually doing a full surgical link on arm64 macOS but I have done some refactors and cleanups to the macho linker. My question now is, would it be something that you would like to see upstreamed in its current form? I will abstain from submitting a PR just yet, however you can have a look at my WIP here -> https://github.com/roc-lang/roc/compare/main...kubkon:roc:macho-surgery?expand=1

view this post on Zulip Jakub Konka (Dec 28 2024 at 09:14):

Oh, and a fun fact, I did manage to get Roc as a Rust project to link with bold linker (used to be zld a while back) ;-)

view this post on Zulip Luke Boswell (Dec 28 2024 at 09:53):

So awesome to hear you've been looking at this. It will be really nice to have surgical linking for macos some day.

What would be easiest for you? Are you looking for feedback on your work so far? Is there anything we could do to help you with this?

I've tried to poke around at the surgical linker, but it's like black magic to me. I've always asked @Brendan Hansknecht to help guide me through getting things set up for the surgical linker from the platform development side of things. I'm sure he will be interested in your work here. :smiley:

view this post on Zulip Jakub Konka (Dec 28 2024 at 10:27):

Let me flip the question and ask what is better for you and the project? I would prefer to submit small incremental PRs so that they are easier to review and other folks can contribute if they feel like it too. Also, I have only limited time to work on this which would probably align itself best with doing small incremental bits at a time.

view this post on Zulip Luke Boswell (Dec 28 2024 at 10:50):

Yeah, I think smaller incremental PRs are always a good approach.

view this post on Zulip Jakub Konka (Dec 28 2024 at 10:58):

Nice, so lemme try and clean the changeset up a little bit (I have some of my own debugging inserts floating around which probably should not make it into the repo) and submit a PR and go through the review process. Does that sound good?

view this post on Zulip Brendan Hansknecht (Dec 28 2024 at 16:46):

Yeah, happy to accept anything. The project is really just a port of the elf surgical linker and definitely is not working currently. Anything to push it towards working or to clean it up is highly welcome.

view this post on Zulip Jakub Konka (Dec 28 2024 at 20:54):

Alright then, PR up and ready for review https://github.com/roc-lang/roc/pull/7424 I have highlighted bits I was unsure of in comments.

view this post on Zulip Jakub Konka (Dec 29 2024 at 21:43):

@Brendan Hansknecht what if, instead of having an env var for overriding the supported(..) output (in particular making the Target::MacArm64 arm return true for development purposes), that function returned an enum, i.e. supported(..) -> SupportLevel where enum SupportLevel { Full, Dev, None }? SupportLevel::Full would correspond to the current true, SupportLevel::None to false, and SupportLevel::Dev would allow enabling the surgical linker with a warning message that it is a work-in-progress, so don't expect much, or something like that.

view this post on Zulip Jakub Konka (Dec 29 2024 at 21:52):

SupportLevel::Dev could be SupportLevel::Wip or something

view this post on Zulip Brendan Hansknecht (Dec 29 2024 at 22:16):

Sure. As long as it falls back on the legacy linker when the files are available that sounds fine.

view this post on Zulip Jakub Konka (Dec 29 2024 at 22:23):

Lemme cook up a PR and discuss it there over something more concrete then.

view this post on Zulip Jakub Konka (Dec 29 2024 at 22:30):

PR is up https://github.com/roc-lang/roc/pull/7433

view this post on Zulip Jakub Konka (Dec 30 2024 at 09:16):

Pushed two more commits that I think solve the immediate issue. The tl;dr is: if no user --linker flag was specified and linker support level is below SupportLevel::Full (i.e., SupportLevel::Wip or SupportLevel::None) we fall back to the legacy linker by default. If --linker=surgical was specified and SupportLevel::Wip then we will warn the user and use the surgical linker.
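A minimal sketch of the fallback rule described above, for readers who want to see it as code. The `SupportLevel` variants come from the discussion; `LinkType`, `Choice`, and `choose_linker` are hypothetical names, not the ones used in the PR. Two cases are assumptions the thread does not spell out: that `SupportLevel::Full` with no flag picks the surgical linker, and that `--linker=surgical` on an unsupported target falls back to legacy with a warning.

```rust
/// How far along surgical linking is for a target (variant names from the thread).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SupportLevel {
    Full,
    Wip,
    None,
}

/// Linker kinds; a hypothetical stand-in for whatever `--linker` parses into.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LinkType {
    Legacy,
    Surgical,
}

/// The linker to use plus an optional warning for the user (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct Choice {
    pub linker: LinkType,
    pub warning: Option<&'static str>,
}

/// `requested` is `None` when the user passed no `--linker` flag.
pub fn choose_linker(requested: Option<LinkType>, level: SupportLevel) -> Choice {
    match (requested, level) {
        // Assumption: full support means surgical is the default.
        (None, SupportLevel::Full) | (Some(LinkType::Surgical), SupportLevel::Full) => Choice {
            linker: LinkType::Surgical,
            warning: None,
        },
        // Explicit opt-in on a work-in-progress target: warn, then go surgical.
        (Some(LinkType::Surgical), SupportLevel::Wip) => Choice {
            linker: LinkType::Surgical,
            warning: Some("surgical linker is a work-in-progress on this target"),
        },
        // Assumption: the thread does not cover this case; fall back with a warning.
        (Some(LinkType::Surgical), SupportLevel::None) => Choice {
            linker: LinkType::Legacy,
            warning: Some("surgical linker is not supported on this target"),
        },
        // No flag and support below Full, or legacy requested explicitly.
        (None, _) | (Some(LinkType::Legacy), _) => Choice {
            linker: LinkType::Legacy,
            warning: None,
        },
    }
}
```

For example, `choose_linker(None, SupportLevel::Wip)` yields the legacy linker without a warning, while `choose_linker(Some(LinkType::Surgical), SupportLevel::Wip)` yields the surgical linker with one.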

view this post on Zulip Anton (Dec 30 2024 at 13:38):

@Jakub Konka I had to revert PR#7424 (in PR#7435), you can reproduce the test failures with cargo test --release -p roc_cli cli_tests.

The PR got merged even though CI failed because we've been having a lot of flaky failures on macos lately and this appeared to be a typical flaky failure and so the merge was forced due to human error.

view this post on Zulip Jakub Konka (Dec 30 2024 at 14:37):

Anton said:

Jakub Konka I had to revert PR#7424 (in PR#7435), you can reproduce the test failures with cargo test --release -p roc_cli cli_tests.

The PR got merged even though CI failed because we've been having a lot of flaky failures on macos lately and this appeared to be a typical flaky failure and so the merge was forced due to human error.

No probs, I will resubmit the changes with #7433 which actually fixes macos tests after my PR got merged.

view this post on Zulip Jakub Konka (Dec 30 2024 at 15:34):

@Anton Revert of revert with test fixes up in https://github.com/roc-lang/roc/pull/7436

view this post on Zulip Jakub Konka (Dec 30 2024 at 16:32):

I noticed that signed commits are required in the repo - I will make sure to sign my commits from now on. Apologies for not having done that till now!

view this post on Zulip Brendan Hansknecht (Dec 30 2024 at 16:35):

No worries. Happens to everyone.

view this post on Zulip Anthony Bullard (Dec 30 2024 at 19:21):

I just realized this is a Surgical linker for Mach-O, not an in-some-way brawnier version of the surgical linker

view this post on Zulip Jakub Konka (Dec 30 2024 at 19:22):

Anthony Bullard said:

I just realized this is a Surgical linker for Mach-O, not an in-some-way brawnier version of the surgical linker

brawny == macho? Bit of a stretch but maybe? ;-)

view this post on Zulip Jakub Konka (Dec 30 2024 at 19:23):

jokes aside, it's my fault, I just tend to drop the hyphen

view this post on Zulip Brendan Hansknecht (Dec 30 2024 at 19:24):

I never type the hyphen either. Also, it is a more macho problem. Mach-o is simply more complex than elf.

view this post on Zulip Jakub Konka (Dec 30 2024 at 22:47):

another PR up and ready for review - all commits signed this time! https://github.com/roc-lang/roc/pull/7441

view this post on Zulip Jakub Konka (Jan 02 2025 at 21:57):

So I was toying with the idea of using gimli-rs/object's provided LoadCommandIterator to iterate over the load commands but given how the preprocess step really wants very low-level control over what to copy over, I think that's a bad idea. Nevertheless, if anyone's interested what that entails, here's where it would lead: kubkon:roc:use-macho-iterator

view this post on Zulip Brendan Hansknecht (Jan 02 2025 at 22:19):

I'll definitely have to take a look. Also, is gimli fully functional for macho now? I feel like when I first started the macho surgical linker it was still missing a lot of macho.

view this post on Zulip Jakub Konka (Jan 02 2025 at 22:35):

It seems like it is. I am still unclear if it's worth using though (for parsing) rather than sticking to load_struct_in_place with some helpers sprinkled around. gimli uses Cow<..> under-the-hood for reading which means it should be zero-cost, however since surgical linker is a lot about manipulation at a byte level, it may be not worth it. Dunno, I cannot make up my mind... yet :grinning:
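To make the "byte level" option concrete, here is a rough sketch of walking Mach-O load commands by hand, without gimli-rs/object. It assumes a 64-bit little-endian file and relies only on the standard layout: `mach_header_64` is 32 bytes with `ncmds` at offset 16, and every load command starts with a `u32` `cmd` followed by a `u32` `cmdsize`. The names `RawLoadCommand` and `load_commands` are hypothetical and are not taken from the Roc code base.

```rust
/// Size of `mach_header_64` in bytes (eight 32-bit fields).
const MACH_HEADER_64_SIZE: usize = 32;
/// Byte offset of the `ncmds` field inside `mach_header_64`.
const NCMDS_OFFSET: usize = 16;

/// A borrowed view of one load command, including its `cmd`/`cmdsize` prefix.
#[derive(Debug, PartialEq, Eq)]
pub struct RawLoadCommand<'a> {
    pub cmd: u32,
    /// Offset of this command from the start of the file.
    pub offset: usize,
    pub bytes: &'a [u8],
}

/// Reads a little-endian u32, returning `None` if it would run past the buffer.
fn read_u32_le(data: &[u8], offset: usize) -> Option<u32> {
    let b = data.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Collects all load commands, or `None` if the file is truncated or malformed.
pub fn load_commands(file: &[u8]) -> Option<Vec<RawLoadCommand<'_>>> {
    let ncmds = read_u32_le(file, NCMDS_OFFSET)?;
    let mut offset = MACH_HEADER_64_SIZE;
    let mut commands = Vec::new();
    for _ in 0..ncmds {
        let cmd = read_u32_le(file, offset)?;
        let cmdsize = read_u32_le(file, offset.checked_add(4)?)? as usize;
        // A command must at least hold its own `cmd` and `cmdsize` fields.
        if cmdsize < 8 {
            return None;
        }
        let end = offset.checked_add(cmdsize)?;
        let bytes = file.get(offset..end)?;
        commands.push(RawLoadCommand { cmd, offset, bytes });
        offset = end;
    }
    Some(commands)
}
```

Because each entry keeps its file offset and raw bytes, a preprocess step built this way can decide command by command what to copy over verbatim and what to rewrite.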

view this post on Zulip Jakub Konka (Jan 02 2025 at 22:37):

Anyhow, I'd be curious to learn what you think about it!

view this post on Zulip Jakub Konka (Jan 05 2025 at 07:34):

I'm trying to grasp the surgical linker pipeline in the compiler, and I don't seem to understand what the purpose of libapp.dylib actually is. So far I see that we emit it in spawn_surgical_host_build_thread where we:

  1. emit libapp.dylib
  2. rebuild host (presumably somehow related to step 1., but dunno how exactly just yet)
  3. preprocess host from 2. by stripping dynamic refs to libapp.dylib and symbols it exports

Could anyone shed some more light on what the flow of data here is, and how the steps are interconnected at the compiler/link level?

view this post on Zulip Jakub Konka (Jan 05 2025 at 07:35):

While here I've also noticed that the number of dynamic system dependencies of libapp.dylib and rebuilt host are different (ignoring host dependence on libapp.dylib) which is surprising as somehow I thought they should match but I am probably not understanding the build process well yet.

view this post on Zulip Luke Boswell (Jan 05 2025 at 07:50):

Does this diagram help at all? https://github.com/roc-lang/basic-cli/blob/main/basic-cli-build-steps.png

view this post on Zulip Luke Boswell (Jan 05 2025 at 07:51):

My understanding is that libapp.dylib is all the roc stuff gets compiled to a shared library.

view this post on Zulip Luke Boswell (Jan 05 2025 at 07:51):

Then it gets linked to the host, so the host executable is expecting to be dynamically linked to a shared library.

view this post on Zulip Luke Boswell (Jan 05 2025 at 07:53):

Then the "preprocess" host step takes the compiled host executable (which is dynamically linking the roc app), and does some surgical magic to it so we can swap out those parts later with a different roc app (that still uses the same platform/API).

view this post on Zulip Luke Boswell (Jan 05 2025 at 07:54):

This is my layman's understanding. @Brendan Hansknecht is definitely the expert on how it works on a technical level.

view this post on Zulip Jakub Konka (Jan 05 2025 at 08:46):

Luke Boswell said:

My understanding is that libapp.dylib is all the roc stuff gets compiled to a shared library.

"all the roc stuff" is that a dummy app or similar written in roc?

view this post on Zulip Jakub Konka (Jan 05 2025 at 08:53):

Thanks @Luke Boswell ! I had a look at that diagram but it still left some questions unanswered. Perhaps if we go over an example from basic-cli? Say I am trying to run examples/echo.roc. In it, I specify platform/main.roc which corresponds to the cli platform, correct? So there are 3 components at play here: user program (echo.roc) <-> platform (cli) <-> host (??).

The first node (user program) is expected to be pluggable by the user as far as I understand, but what does it compile to? An object file or an executable that links against platform + host, or something else?

Next, platform <-> host are prebuilt. Am I correct here? And this is where host initially is linking dynamically against libapp.dylib which really is libplatform.dylib. We then surgically merge host with libapp.dylib to emit platform as a dynamically linked executable (on macOS, since we always link at the very least against libSystem.dylib). How am I doing so far?

view this post on Zulip Luke Boswell (Jan 05 2025 at 09:25):

Thanks for the chat @Jakub Konka, let me know if I can help in any other way.

view this post on Zulip Jakub Konka (Jan 05 2025 at 14:40):

Luke Boswell said:

Thanks for the chat Jakub Konka, let me know if I can help in any other way.

Thank you! It was very helpful!

view this post on Zulip Brendan Hansknecht (Jan 05 2025 at 15:31):

Ok, so I'm sure we mix up our own wording sometimes, but here are some details.

Firstly, naming. A roc executable is generated from 2 parts, the platform (basic-cli) and the application (echo.roc + any packages it uses). The platform is made out of two parts, the host (the part written in another language, rust for basic cli) and the roc API.

view this post on Zulip Brendan Hansknecht (Jan 05 2025 at 15:32):

From a compilation and linking perspective, it is probably best to think of the split instead as host (again, the part in another language like rust or zig) and everything else (which is all written in roc)

view this post on Zulip Brendan Hansknecht (Jan 05 2025 at 15:32):

The goal with the surgical linker was to get as close to a completed executable as possible without compiling anything in a .roc file. Then to simply hack in the roc part of the code.

view this post on Zulip Brendan Hansknecht (Jan 05 2025 at 15:33):

The closest thing to a completed executable is a dynamically linked executable where the host is completely compiled, and it attaches to a shared library that contains all the roc code.

view this post on Zulip Brendan Hansknecht (Jan 05 2025 at 15:34):

That is what we have the host compile into

view this post on Zulip Brendan Hansknecht (Jan 05 2025 at 15:36):

We have it link to a dummy shared library that just has all the correct headings to look like a correct roc app + platform for apis. We also could just emit a shared library containing a real application.

view this post on Zulip Brendan Hansknecht (Jan 05 2025 at 15:40):

Oh, also, to clarify, one of your questions above, all .roc files compile into a single object file.

view this post on Zulip Brendan Hansknecht (Jan 05 2025 at 15:42):

Ok, so the process is essentially:

  1. generate a dummy dynamic library (could also just generate a shared library from a real app)
  2. create a host that dynamically links to the library above
  3. Preprocess the host by ripping out all the dynamic parts and recording useful info for completing linking quickly
  4. compile the roc code to an object file
  5. merge that with the preprocessed host fixing up everything
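The steps above can be pictured with a deliberately simplified model. This is only a sketch: real preprocessing rewrites load commands, symbol tables, and relocations, whereas here a host is just a list of dynamic imports and "patching" is a map from slot to address. All names (`Import`, `PreprocessedHost`, `preprocess`, `surgical_link`) are hypothetical.

```rust
use std::collections::HashMap;

/// One dynamic import of the host: `symbol` is expected from `dylib`, and
/// `slot` stands for the place in the host that must end up pointing at it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Import {
    pub symbol: String,
    pub dylib: String,
    pub slot: usize,
}

/// Result of step 3: the dynamic parts for the app are gone, but we remember
/// which slots must be fixed up once the real app object exists.
#[derive(Debug, Default)]
pub struct PreprocessedHost {
    pub system_imports: Vec<Import>,
    pub app_slots: HashMap<String, usize>,
}

/// Step 3: strip imports from the dummy app library and record their slots.
pub fn preprocess(host_imports: Vec<Import>, app_dylib: &str) -> PreprocessedHost {
    let mut pre = PreprocessedHost::default();
    for import in host_imports {
        if import.dylib == app_dylib {
            pre.app_slots.insert(import.symbol, import.slot);
        } else {
            pre.system_imports.push(import);
        }
    }
    pre
}

/// Step 5: resolve every recorded slot against the symbols defined by the
/// compiled app object (step 4). Returns slot -> address patches.
pub fn surgical_link(
    pre: &PreprocessedHost,
    app_symbols: &HashMap<String, u64>,
) -> Result<HashMap<usize, u64>, String> {
    let mut patches = HashMap::new();
    for (symbol, slot) in &pre.app_slots {
        let address = app_symbols
            .get(symbol)
            .ok_or_else(|| format!("app object does not define {}", symbol))?;
        patches.insert(*slot, *address);
    }
    Ok(patches)
}
```

The expensive work (building the host and figuring out where the app plugs in) happens once in `preprocess`, so each later app build only pays for `surgical_link`.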

view this post on Zulip Brendan Hansknecht (Jan 05 2025 at 15:43):

generating the dummy shared lib was required to get host languages to generate a complete executable

view this post on Zulip Jakub Konka (Jan 05 2025 at 21:00):

That's excellent, thank you @Brendan Hansknecht and @Luke Boswell! I think the bit I was missing was the fact that the host is dynamically linking against a dynamic library so that the linker synthesises all relevant bits which the surgical linker can then substitute with static references to a Roc app. As discussed with Luke already, this implies that any Roc app that is compiled into the host must not depend on any additional libc/framework symbols. This however does not match what I've seen so far for MachO where the app does contain references to _setjmp and _longjmp. I haven't investigated why or where that happens but will post it here once I find out more.

view this post on Zulip Brendan Hansknecht (Jan 05 2025 at 23:37):

I thought they were only used for tests and not regular roc....hmm

view this post on Zulip Brendan Hansknecht (Jan 05 2025 at 23:38):

And a lot of the other functions we force the platform to implement like memcpy, malloc, etc

view this post on Zulip Jakub Konka (Jan 06 2025 at 06:23):

I will try digging in. It definitely feels like a bug somewhere now that I know how's it supposed to work under-the-hood.

view this post on Zulip Jakub Konka (Jan 06 2025 at 07:52):

Looks like genuine roc (glue or builtins?) symbols missing from the platform
Screenshot 2025-01-06 at 08.49.52.png

view this post on Zulip Jakub Konka (Jan 06 2025 at 08:00):

oh ok, these seemingly come from crates/compiler/builtins/bitcode/src/main.zig

view this post on Zulip Jakub Konka (Jan 06 2025 at 08:12):

to my inexperienced eyes it seems to match @Brendan Hansknecht's intuition - _setjmp and _longjmp are referenced by __roc_force_setjmp and __roc_force_longjmp, which seem to be forced into a translation unit (when using llvm) in this loop: https://github.com/roc-lang/roc/blob/main/crates/compiler/gen_llvm/src/llvm/build.rs#L1056-L1083

view this post on Zulip Jakub Konka (Jan 06 2025 at 08:13):

yep, confirmed, commenting out __roc_force_setjmp and __roc_force_longjmp from the must_keep list does not emit refs to setjmp and longjmp

view this post on Zulip Jakub Konka (Jan 06 2025 at 08:13):

Screenshot 2025-01-06 at 09.13.35.png

view this post on Zulip Jakub Konka (Jan 06 2025 at 08:14):

so the remaining question is why this happens in the first place especially if it is assumed they should only be kept for test binaries (or similar)

view this post on Zulip Jakub Konka (Jan 06 2025 at 08:16):

on re-reading the comments tho, are they really tests only? it seems they are special functions that should resolve to an LLVM intrinsic but for some reason they do not, at least not on aarch64-macos

view this post on Zulip Jakub Konka (Jan 06 2025 at 08:29):

I haven't checked what happens on x86_64-linux - could anyone confirm for me if setjmp is present in echo.o built with roc build --no-link echo.roc? Perhaps it gets lowered to an actual in-place implementation by llvm, whereas on aarch64 (maybe only macos) to a libc call?

view this post on Zulip Jakub Konka (Jan 06 2025 at 08:33):

I am not sure what Brendan meant by "tests" but presumably building tests in a roc app will require a platform with some extended functionality which provides PLT entries for those symbols (on arm64 macos), so perhaps when not building tests we could either:
1) check if we are indeed building a test app and check for __roc_force_setjmp only then, or
2) always link apps with -undefined dynamic_lookup

view this post on Zulip Jakub Konka (Jan 06 2025 at 08:34):

The latter carries the risk of not erroring out on undefined symbols that were meant to be resolved at link time though.

view this post on Zulip Luke Boswell (Jan 06 2025 at 08:34):

@Folkert de Vries may be able to answer these questions

view this post on Zulip Brendan Hansknecht (Jan 06 2025 at 17:03):

Yeah, I have a few notes:

  1. when I said only used in tests, I meant that it is only used when calling roc test ... (which doesn't actually link to a platform at all).
  2. It looks like we only emit setjmp and longjmp with the builtins for aarch64...no idea why.
  3. The must keep is supposed to stop them from getting dead code eliminated too early. I thought llvm still removed them later if they were unused.

view this post on Zulip Brendan Hansknecht (Jan 06 2025 at 17:10):

Looking into this a bit more, I think a lot of this is actually outdated infrastructure

view this post on Zulip Brendan Hansknecht (Jan 06 2025 at 17:10):

nowadays panicking in roc calls the platform exposed roc_panic.

view this post on Zulip Brendan Hansknecht (Jan 06 2025 at 17:10):

It used to be that panicking in roc was done via setjmp longjmp

view this post on Zulip Brendan Hansknecht (Jan 06 2025 at 17:15):

Hmm...I'm not actually sure if setjmp and longjmp are needed in roc anymore at all. That said, it appears in a lot of locations due to old infra that I think can get ripped out.

view this post on Zulip Brendan Hansknecht (Jan 06 2025 at 17:23):

That said, I still think some of our tests use setjmp longjmp. So unwinding this may be a bit complex...

view this post on Zulip Richard Feldman (Jan 06 2025 at 17:25):

I don't think they should be needed unless we decide to make roc_panic automatically get translated into the original function call returning a Result to the host

view this post on Zulip Richard Feldman (Jan 06 2025 at 17:26):

that said, I know that for nea Folkert has also implemented architecture-specific setjmp and longjmp using assembly that doesn't require a libc dependency, so that could be an option if we decide to do that

view this post on Zulip Richard Feldman (Jan 06 2025 at 17:26):

either way, I don't think we should need to link them

view this post on Zulip Brendan Hansknecht (Jan 06 2025 at 17:35):

Yeah, I think they are just still used in many places across the compiler. Like all the gen-tests use them. Also roc test may use them, but not 100% sure.

view this post on Zulip Ayaz Hafiz (Jan 06 2025 at 17:38):

i think roc test uses it for expects

view this post on Zulip Ayaz Hafiz (Jan 06 2025 at 17:38):

and they are used for gen tests to catch errors

view this post on Zulip Richard Feldman (Jan 06 2025 at 17:48):

hm, why would inline expects need to longjmp? :face_with_raised_eyebrow:

view this post on Zulip Richard Feldman (Jan 06 2025 at 17:48):

they don't halt the test, they just record that the test should fail, report the failure, and keep going

view this post on Zulip Richard Feldman (Jan 06 2025 at 17:48):

and top-level expects should just run the code to completion

view this post on Zulip Brendan Hansknecht (Jan 06 2025 at 18:06):

For roc test, I think it is used to return back to main execution on a panic.

So: run a top-level expect, it fails due to hitting a panic, then long jump to code that prints the failure and continues to the next top-level expect.

view this post on Zulip Brendan Hansknecht (Jan 06 2025 at 18:07):

For gen_test, I think we should be able to implement roc_panic in "platform". And it can use setjmp and longjmp. Avoiding needing it in the builtins.

view this post on Zulip Richard Feldman (Jan 06 2025 at 18:15):

ahh that makes sense

view this post on Zulip Jakub Konka (Jan 06 2025 at 21:08):

So here's my understanding of the situation based on the discussion above:

view this post on Zulip Jakub Konka (Jan 06 2025 at 21:11):

Since bullet point 3 is going to happen sooner or later, I am thinking of simply special-casing those two externs (setjmp and longjmp) in the linker and simply ignoring them. If nothing apart from __roc_force_setjmp and __roc_force_longjmp is referencing them, then it should not impact linking in any way. This way, once bullet point 3 lands, updating the linker amounts to removing this special casing and nothing else. Is this a viable temporary workaround?
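A sketch of what such special-casing could look like, assuming the linker has the app's undefined symbols and the host's exported symbols available as plain name lists. The function name and signature are hypothetical and not taken from the PR; only the two symbol names come from the discussion.

```rust
/// Undefined symbols the app is allowed to carry without a definition.
/// Temporary: meant to be deleted once the builtins stop referencing them.
const IGNORED_UNDEFINED: [&str; 2] = ["_setjmp", "_longjmp"];

/// Returns the app's undefined symbols that the host cannot satisfy,
/// skipping the special-cased ones. An empty result means linking can proceed.
pub fn unresolved_app_symbols<'a>(
    app_undefined: &[&'a str],
    host_exports: &[&str],
) -> Vec<&'a str> {
    app_undefined
        .iter()
        .copied()
        .filter(|name| !IGNORED_UNDEFINED.contains(name))
        .filter(|name| !host_exports.contains(name))
        .collect()
}
```

Keeping the ignored names in one constant means that removing the workaround later amounts to deleting the constant and the first `filter`.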

view this post on Zulip Jakub Konka (Jan 06 2025 at 21:59):

Proposed change to the macho linker https://github.com/roc-lang/roc/pull/7474

view this post on Zulip Brendan Hansknecht (Jan 06 2025 at 22:37):

That sounds good

view this post on Zulip Jakub Konka (Jan 10 2025 at 21:49):

I don't think there is any need to shift loadable segments in file offsets/memory when preprocessing the host. Instead, it should be fine to rely on the linker to emit enough padding between the end of load commands and the start of the first section to insert the required Roc load commands. Then during surgical link we would put the segments after the last existing loadable segment but before the __LINKEDIT segment - the __LINKEDIT segment always has to come last so there's nothing we can do about that anyhow.
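The padding argument comes down to simple arithmetic, sketched below. The struct sizes are the standard 64-bit Mach-O ones (`mach_header_64` is 32 bytes, `segment_command_64` is 72 bytes, `section_64` is 80 bytes); the function names are hypothetical, and the placement helper assumes the existing __LINKEDIT file offset is already page-aligned.

```rust
pub const MACH_HEADER_64_SIZE: u64 = 32;
pub const SEGMENT_COMMAND_64_SIZE: u64 = 72;
pub const SECTION_64_SIZE: u64 = 80;

/// Free bytes between the end of the load commands and the first section.
pub fn header_padding(sizeofcmds: u64, first_section_offset: u64) -> u64 {
    first_section_offset.saturating_sub(MACH_HEADER_64_SIZE + sizeofcmds)
}

/// Size of one LC_SEGMENT_64 load command that describes `nsects` sections.
pub fn segment_command_size(nsects: u64) -> u64 {
    SEGMENT_COMMAND_64_SIZE + nsects * SECTION_64_SIZE
}

/// True if a new segment command with `nsects` sections fits in the padding.
pub fn fits_in_padding(sizeofcmds: u64, first_section_offset: u64, nsects: u64) -> bool {
    segment_command_size(nsects) <= header_padding(sizeofcmds, first_section_offset)
}

/// Rounds `value` up to the next multiple of `align` (which must be non-zero).
pub fn align_up(value: u64, align: u64) -> u64 {
    (value + align - 1) / align * align
}

/// Places a new segment where __LINKEDIT currently starts and pushes
/// __LINKEDIT behind it. Returns (new segment offset, new __LINKEDIT offset).
pub fn place_before_linkedit(linkedit_fileoff: u64, new_size: u64, page_size: u64) -> (u64, u64) {
    let new_linkedit = align_up(linkedit_fileoff + new_size, page_size);
    (linkedit_fileoff, new_linkedit)
}
```

With 16 KiB pages, a 0x100-byte segment placed at a __LINKEDIT offset of 0x8000 moves __LINKEDIT to 0xC000, and no existing loadable segment has to shift.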

Proposed changes 7499

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 21:51):

I wonder if elf has lower alignment constraints for this. I think with elf there was regularly not enough space (thus all the shifting)

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 21:52):

As a note, we may also need a few more sections eventually. .data and potentially both of .tbss and .tdata

view this post on Zulip Jakub Konka (Jan 10 2025 at 21:52):

Yeah, very possibly. For MachO, wanting to add new load commands is so common that there is a linker flag for just that: -headerpad size.

view this post on Zulip Jakub Konka (Jan 10 2025 at 21:53):

Yeah, I expect more sections, but for the time being let's do something simple(r) and get it working first.


Last updated: Jul 06 2025 at 12:14 UTC