Stream: ideas

Topic: enabling dead effect elimination


view this post on Zulip Richard Feldman (Nov 23 2023 at 12:33):

I thought of a future way we could reduce the size of binaries like hello world on basic-cli:

view this post on Zulip Richard Feldman (Nov 23 2023 at 12:33):

the basic idea is that we have roc generate a function which runs the effect interpreter switch for the host.

So for example, we say:

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 16:12):

That is no different than what we have today with the legacy linker from a function garbage collection standpoint.

view this post on Zulip Richard Feldman (Nov 23 2023 at 16:13):

sure, but it would work with the effect interpreter

view this post on Zulip Richard Feldman (Nov 23 2023 at 16:13):

which the current effect interpreter design would not :big_smile:

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 16:15):

There are a few core missing pieces:

  1. This isn't a solution that will work in a world of all surgical linking
  2. It is up to the host compiler to generate linking information in a way that lets us clean up the dead functions and their dependencies (this is why the legacy linker, despite theoretically being able to do this, does not produce super small binaries)

view this post on Zulip Brian Carroll (Nov 23 2023 at 16:15):

Oh wow that's a great idea :light_bulb:

view this post on Zulip Richard Feldman (Nov 23 2023 at 16:16):

I'm assuming someday we could make the surgical linker do this DCE

view this post on Zulip Richard Feldman (Nov 23 2023 at 16:16):

doesn't seem that tricky, right? Write down all the symbols we define, go through and see which ones actually get used; all the ones that never get used are dead and can be eliminated

view this post on Zulip Richard Feldman (Nov 23 2023 at 16:17):

(maybe we'd only want to do that in --optimize)

view this post on Zulip Brian Carroll (Nov 23 2023 at 16:17):

And the transitive calls

view this post on Zulip Richard Feldman (Nov 23 2023 at 16:18):

:thinking: transitive calls?

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 16:21):

I don't think it would work. Generally that information is gone in the final executable. So even though we know the locations of a few symbols, we don't really have the information to remove them (at a minimum, it is very hard to shift anything around because all relocations in the exe are already resolved). On top of that, in any system where we are having the host expose symbols to a shared library, we are already doing tricks just to get the host to keep the symbols around. So they will all look like they are being used by the host itself anyway.

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 16:26):

Also, if roc builds the state machine, suddenly you lose the ability to make it an async rust state machine which is one of the main gains of the new system.

view this post on Zulip Brian Carroll (Nov 23 2023 at 16:28):

I implemented this for Wasm. I trace the full call graph of what is used and eliminate everything else.
You have to do the full call graph because functions call other functions. Starting with what you want to keep ends up being more efficient I think.
The tricky part is indirect calls. If you have any of those, it's tricky to be confident whether they're called or not.
And I believe we are planning to use those for closures.
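A minimal sketch of the call-graph tracing described above, assuming the call graph is already available as a plain dictionary. The function names and the `indirect_targets` parameter are hypothetical and only illustrate the idea; this is not the actual Wasm backend code. It starts from the functions to keep, follows direct calls transitively, and conservatively keeps anything that might be reached through an indirect call.

```python
from collections import deque


def live_functions(call_graph, roots, indirect_targets=()):
    """Return the set of functions reachable from `roots`.

    call_graph: dict mapping a function name to its direct callees.
    indirect_targets: functions that may be called indirectly (for example,
    closures). We cannot prove they are unused, so they are always kept.
    """
    live = set()
    queue = deque(list(roots) + list(indirect_targets))
    while queue:
        fn = queue.popleft()
        if fn in live:
            continue
        live.add(fn)
        # Follow transitive calls: everything a live function calls is live.
        queue.extend(call_graph.get(fn, ()))
    return live


# Hypothetical call graph, for illustration only.
call_graph = {
    "main": ["stdout_line"],
    "stdout_line": ["write"],
    "http_get": ["tls_connect", "write"],
    "tls_connect": [],
    "write": [],
    "closure_body": ["http_get"],
}

live = live_functions(call_graph, roots=["main"])
dead = set(call_graph) - live  # candidates for elimination
```

If `closure_body` is passed in `indirect_targets`, it and everything it calls must be kept, which is why indirect calls limit how much can be removed.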

view this post on Zulip Richard Feldman (Nov 23 2023 at 16:32):

Brendan Hansknecht said:

Also, if roc builds the state machine, suddenly you lose the ability to make it an async rust state machine which is one of the main gains of the new system.

I think that part can work fine - we can have roc expect the host-implemented effect function to receive a callback Roc closure and a pointer to the loop state (which the switch function would have received from the host, and can pass along), so it knows what callback to run once the async effect is done

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 16:34):

You can't call an async function in a function called by roc code

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 16:35):

Rust really needs the whole stack back down to the root to build it correctly

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 16:35):

So rust has to own the switching function

view this post on Zulip Richard Feldman (Nov 23 2023 at 16:35):

oh you mean literally using Rust's async keyword

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 16:35):

At least that is what I remember from really trying to make it work before with the current effect system.

view this post on Zulip Richard Feldman (Nov 23 2023 at 16:36):

as opposed to like directly using io_uring in a host that happens to be implemented in Rust

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 16:36):

Being able to use hyper with nonblocking async http requests for example

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 16:37):

Even if you can call into io_uring, you would have to make a blocking call to io_uring in the rust effect. There is no way to let something else run on the same thread like you get with async rust

view this post on Zulip Richard Feldman (Nov 23 2023 at 17:01):

hm, I wonder if there's some way we could make the DCE happen

view this post on Zulip Richard Feldman (Nov 23 2023 at 17:02):

it feels like if we have the right information in the host and give the right app usage information to the surgical linker, then with their powers combined it should be possible

view this post on Zulip Richard Feldman (Nov 23 2023 at 17:07):

like an alternative idea (that sounds kind of ridiculous, but potentially possible) is to do some partial interpreting of the actual instructions in the host, based on knowledge of what discriminants mainForHost could possibly return, and then eliminating branches which follow from jumps on comparisons that we know will never pass

view this post on Zulip Richard Feldman (Nov 23 2023 at 17:08):

that wouldn't solve the "these symbols look like they're used for other reasons" problems, but if those functions are only called from eliminated branches, then their usages go to zero and they can be eliminated
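A rough sketch of the usage-counting idea, under heavy simplification: the host's switch is modeled as a dictionary from discriminant to handler name, rather than as real machine instructions. All names here (`prune_dispatch`, `other_uses`, the handler names) are hypothetical. Branches for discriminants the app can never return are dropped, and a handler is reported dead only if no kept branch and no other call site still refers to it.

```python
from collections import Counter


def prune_dispatch(dispatch, possible, other_uses):
    """Prune switch branches and find handlers whose usages drop to zero.

    dispatch: dict mapping discriminant -> handler name (the host's switch).
    possible: set of discriminants the app can actually return.
    other_uses: dict mapping handler name -> number of call sites outside
        the switch (an assumption; a real tool would have to discover these).
    """
    # Keep only branches whose comparison could ever pass.
    kept = {tag: handler for tag, handler in dispatch.items() if tag in possible}

    # Count remaining usages: call sites outside the switch plus kept branches.
    uses = Counter(other_uses)
    uses.update(kept.values())

    dead = {handler for handler in dispatch.values() if uses[handler] == 0}
    return kept, dead


# Hypothetical host switch, for illustration only.
dispatch = {0: "stdout_line", 1: "file_read", 2: "http_send"}
kept, dead = prune_dispatch(
    dispatch,
    possible={0},                  # the app only ever prints a line
    other_uses={"file_read": 1},   # e.g. host startup also reads a file
)
```

Here `file_read` survives even though its branch is pruned, because it is still used elsewhere; only `http_send` has no remaining usages.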

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 17:09):

Two thoughts:

  1. If we write our own full linker to do the linking and preprocessing of the host, it definitely could generate everything we need and cache stuff (though it would still depend on the host compiler splitting every single function out into its own section so we can gc it)
  2. Maybe we could make features be other shared libraries. Then surgically link everything....nvm, that won't work...we won't be able to surgically link them cause they will have dependencies that the surgical linker won't be able to handle.

view this post on Zulip Richard Feldman (Nov 23 2023 at 17:11):

maybe instead of writing our own full linker we could fork existing ones like lld (especially if we'd only be using them in --optimize)

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 17:11):

That said, even if we have every single function in its own section, without relocation information, we have to disassemble the entire application and understand all indirect calls to have any hope of being able to remove code. On top of that, any resolved relocation will likely be a pain to move, so we will hit a wall in terms of trying to actually move around the code to shrink the binary.

view this post on Zulip Richard Feldman (Nov 23 2023 at 17:12):

I don't think there would be any indirect calls in this part of the program

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 17:12):

Yeah, I don't think we can do this with the surgical linker. I think it really needs to be done with a standard linker at the cost of link time performance.

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 17:13):

Richard Feldman said:

I don't think there would be any indirect calls in this part of the program

We have to get the full transitive list of calls and dependencies. For example, in the hyper case, we want to remove as much of tokio, the async stack, web stuff, tls, etc. as possible (of course we can't remove all of it)

view this post on Zulip Richard Feldman (Nov 23 2023 at 17:14):

hm yeah that's a good point

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 17:14):

If we can't tell if a function is called by some random indirect call elsewhere in the app, we can't remove it.

view this post on Zulip Richard Feldman (Nov 23 2023 at 17:14):

even if you never do http, tokio has to spin up

view this post on Zulip Richard Feldman (Nov 23 2023 at 17:14):

because that's structurally part of the host program no matter what

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 17:14):

yeah

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 17:20):

I think if people want binary optimization that cares about a few megabytes, they have a number of options without us dealing with this:

  1. fork the platform and just cut some dependencies (platforms could even have feature flags to enable this first class without forking, just need building from source)
  2. Use a slimmer platform or build an optimized platform for your application, maybe not in rust, to save on bytes.
  3. Let the host compiler control the final build. Just have roc emit a static library or object file that the host compiler can run gc-sections on. If you are really extreme, give roc a mode to emit only the tag variants that are used by an application and then regenerate glue in an informative way, or specify feature flags to the host platform to enable the host to automatically cut out sections of its code.

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 17:21):

Like today, we could have a small version of basic cli if we just make basic_cli_no_web or basic_cli_ureq that forces always blocking io and doesn't pull in an async runtime. (apparently ureq isn't really smaller than tokio + hyper for this), or probably basic_cli_zig

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 17:25):

given feature flags are a thing in rust, it wouldn't be hard to cut multiple versions of basic cli that clip off the largest effects if wanted but still guarantee the exact same api for roc. Can even have a friendly panic message:

basic_cli_no_web does not support web requests, but you called an http effect.
Please switch to `basic_cli_with_web` if you want to use http effects
"link to basic_cli_with_web"

view this post on Zulip Brendan Hansknecht (Nov 23 2023 at 17:27):

Of course, that doesn't have to be at runtime, basic-cli-no-web could cut a release that fully removes all things http even from the roc side and then when someone asks about http calls, we could point them to the other platform.


Last updated: Jun 16 2026 at 16:19 UTC