add an `init` function to `roc` code · ideas

I want to keep this proposal very small in scope. It is just meant to deal with some linking pains that roc currently causes. Anything like static global initialization is out of scope. That is a possibility for the future, but it is ignored for now.

The fundamental reason for this change is that roc_alloc, roc_panic, etc. exist. In other words, it is because roc depends on calling into the host. There are also all of the roc_fx_* functions, but those will go away when we switch to effect interpreters.

When it comes to shared libraries, it is not normal for a shared library to call into a host. As such, when a host is compiled, the toolchain assumes this is impossible and dead-code-eliminates (DCE) a lot of things. To work around the problem in basic cli, we have this monstrosity. Recently with @Luke Boswell, I was trying to do the same in the roc compiler for some glue changes. We ended up in a painful situation where we get weird compiler crashes.

example crash

thread 'main' panicked at 'misaligned pointer dereference: address must be a multiple of 0x8 but is 0xf', /Users/bren077s/Projects/roc/crates/compiler/gen_llvm/src/run_roc.rs:41:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'panic in a function that cannot unwind', library/core/src/panicking.rs:126:5
stack backtrace:
   0:        0x107890788 - std::backtrace_rs::backtrace::libunwind::trace::h35c35bd6a1eea1a3
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:        0x107890788 - std::backtrace_rs::backtrace::trace_unsynchronized::h6e6a37fdc09ec2da
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:        0x107890788 - std::sys_common::backtrace::_print_fmt::h6ae2889d87a644d1
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:65:5
   3:        0x107890788 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hb75be2413dd8ceb3
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:44:22
   4:        0x1078b2580 - core::fmt::rt::Argument::fmt::hfc9103857ff63de2
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/fmt/rt.rs:138:9
   5:        0x1078b2580 - core::fmt::write::h95d50546e769656f
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/fmt/mod.rs:1094:21
   6:        0x10788c0a4 - std::io::Write::write_fmt::hce019ca594763835
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/io/mod.rs:1713:15
   7:        0x1078905dc - std::sys_common::backtrace::_print::h15dce0f07dfee3db
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:47:5
   8:        0x1078905dc - std::sys_common::backtrace::print::hae7307dcada41b2a
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:34:9
   9:        0x107891ab4 - std::panicking::default_hook::{{closure}}::h30c41986d637ef23
  10:        0x1078918bc - std::panicking::default_hook::h81d03189ef2e7e78
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:288:9
  11:        0x107891f24 - std::panicking::rust_panic_with_hook::he8360f4d28da55fc
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:705:13
  12:        0x107891df4 - std::panicking::begin_panic_handler::{{closure}}::he9e07d605072520f
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:595:13
  13:        0x107890b68 - std::sys_common::backtrace::__rust_end_short_backtrace::he03bf52f9c1eb73a
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:151:18
  14:        0x107891ba4 - rust_begin_unwind
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:593:5
  15:        0x10790f408 - core::panicking::panic_nounwind_fmt::h6a635b966d82551e
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/panicking.rs:96:14
  16:        0x10790f484 - core::panicking::panic_nounwind::h241eff56ec55ba7a
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/panicking.rs:126:5
  17:        0x10790f578 - core::panicking::panic_cannot_unwind::h911ff5cd61f62d63
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/panicking.rs:188:5
  18:        0x1042bcf54 - from<roc_std::RocResult<roc_std::roc_list::RocList<roc_glue::roc_type::File>, roc_std::roc_str::RocStr>>
                               at /Users/bren077s/Projects/roc/crates/compiler/gen_llvm/src/run_roc.rs:37:5
  19:        0x1042ed660 - generate
                               at /Users/bren077s/Projects/roc/crates/glue/src/load.rs:158:31
  20:        0x104206350 - main
                               at /Users/bren077s/Projects/roc/crates/cli/src/main.rs:116:17
  21:        0x1042057e4 - call_once<fn() -> core::result::Result<(), std::io::error::Error>, ()>
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ops/function.rs:250:5
  22:        0x104204ca0 - __rust_begin_short_backtrace<fn() -> core::result::Result<(), std::io::error::Error>, core::result::Result<(), std::io::error::Error>>
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:135:18
  23:        0x10420a48c - {closure#0}<core::result::Result<(), std::io::error::Error>>
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:166:18
  24:        0x1078847fc - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h0689b9cc840db667
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ops/function.rs:284:13
  25:        0x1078847fc - std::panicking::try::do_call::h8d21b0c0c04af112
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:500:40
  26:        0x1078847fc - std::panicking::try::h618481d45c1b815c
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:464:19
  27:        0x1078847fc - std::panic::catch_unwind::hbdeff70f3984ee7b
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panic.rs:142:14
  28:        0x1078847fc - std::rt::lang_start_internal::{{closure}}::haa4994ba13a3cd15
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:148:48
  29:        0x1078847fc - std::panicking::try::do_call::h39b55541875d339a
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:500:40
  30:        0x1078847fc - std::panicking::try::h93ac0a218f84acad
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:464:19
  31:        0x1078847fc - std::panic::catch_unwind::h07a4f62359dfd8f0
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panic.rs:142:14
  32:        0x1078847fc - std::rt::lang_start_internal::hdd06e3566639fc5b
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:148:20
  33:        0x10420a464 - lang_start<core::result::Result<(), std::io::error::Error>>
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:165:17
  34:        0x1042090c4 - _main
thread caused non-unwinding panic. aborting.
[1]    82053 abort      cargo run glue crates/glue/src/RustGlue.roc /tmp/glue

It is harder to shoehorn these symbols into the roc compiler because it is a much larger and more complex code base. Not to mention, Rust is a lot more finicky around these features when trying to emit a static binary, e.g. via musl.

All in all, this is essentially not a supported feature and I wouldn't be surprised if it totally fails with certain security settings, on specific operating systems, or after Rust updates.

To fix this pain, I think that roc should never call into the host except via function pointers. My suggestion is that we expose a roc_init function that takes all of these pointers as input and sets some globals on the roc side. Roc applications will no longer directly depend on the host functions. Instead, every call into the host will go through function pointers (and in the future through returning commands to the effect interpreter).
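As a rough illustration of the roc_init idea, here is a minimal C sketch. The RocHostFns struct, its field names, the simplified signatures, and roc_main_sketch are all assumptions made up for this example and are not the actual Roc ABI. The only point is that the Roc side stores the pointers in a global once and never refers to a host symbol by name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table of host callbacks. Names and signatures are
   illustrative only, not the real Roc ABI. */
typedef struct {
    void *(*alloc)(size_t size, unsigned int alignment);
    void (*dealloc)(void *ptr, unsigned int alignment);
    void (*panic)(const char *msg);
} RocHostFns;

/* "Roc side": a global that roc_init fills in exactly once. */
static RocHostFns g_host;

void roc_init(const RocHostFns *fns) {
    assert(fns && fns->alloc && fns->dealloc && fns->panic);
    g_host = *fns;
}

/* Stand-in for compiled Roc code. It reaches the host only through
   the stored pointers, so nothing here needs a linked host symbol. */
int roc_main_sketch(void) {
    int *cell = (int *)g_host.alloc(sizeof *cell, (unsigned int)_Alignof(int));
    if (!cell) {
        g_host.panic("out of memory");
        return -1;
    }
    *cell = 42;
    int result = *cell;
    g_host.dealloc(cell, (unsigned int)_Alignof(int));
    return result;
}

/* "Host side": ordinary functions, handed over by address. */
static size_t host_alloc_calls = 0;

static void *host_alloc(size_t size, unsigned int alignment) {
    (void)alignment; /* malloc alignment is sufficient for this sketch */
    host_alloc_calls++;
    return malloc(size);
}

static void host_dealloc(void *ptr, unsigned int alignment) {
    (void)alignment;
    free(ptr);
}

static void host_panic(const char *msg) {
    (void)msg;
    abort();
}

/* The host initializes Roc once, then calls into it. */
int host_run(void) {
    RocHostFns fns = { host_alloc, host_dealloc, host_panic };
    roc_init(&fns);
    return roc_main_sketch();
}
```

Because the host functions are only ever referenced by address inside the host itself, the host's compiler and linker can see that they are used, which is the property the proposal is after.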

My only real concern with this idea is that I am not sure if/how it works with wasm.

Brendan Hansknecht (Jan 14 2024 at 02:04):

I need to double check, but I am pretty sure that stripping a host binary also breaks the ability of a shared lib to depend on functions in the host.

Richard Feldman (Jan 14 2024 at 02:07):

Richard Feldman (Jan 14 2024 at 02:08):

you just pass in the struct of function pointers when you call the Roc function from the host, no linking or init or anything like that involved
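A minimal C sketch of this variant, where the host hands the struct of function pointers to Roc on every call instead of relying on linking or a one-time init. RocOps, roc_greet, host_greeting_length, and the simplified signatures are hypothetical names chosen for illustration, not the real Roc interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct of host callbacks, passed on each call. */
typedef struct {
    void *(*alloc)(size_t size, unsigned int alignment);
    void (*dealloc)(void *ptr, unsigned int alignment);
} RocOps;

/* Stand-in for a Roc entry point: the ops table is an explicit
   argument, so the Roc side holds no global state at all. */
char *roc_greet(const RocOps *ops, const char *name) {
    static const char prefix[] = "Hello, ";
    size_t prefix_len = sizeof prefix - 1;
    size_t name_len = strlen(name);
    char *out = (char *)ops->alloc(prefix_len + name_len + 1, 1);
    if (!out) {
        return NULL;
    }
    memcpy(out, prefix, prefix_len);
    memcpy(out + prefix_len, name, name_len + 1); /* copies the NUL too */
    return out;
}

/* Host callbacks backed by malloc/free. */
void *host_alloc(size_t size, unsigned int alignment) {
    (void)alignment;
    return malloc(size);
}

void host_dealloc(void *ptr, unsigned int alignment) {
    (void)alignment;
    free(ptr);
}

/* Host usage: build the table, pass it along with the call, and
   release the result through the same table. Returns 0 on failure. */
size_t host_greeting_length(const char *name) {
    RocOps ops = { host_alloc, host_dealloc };
    char *greeting = roc_greet(&ops, name);
    if (!greeting) {
        return 0;
    }
    size_t len = strlen(greeting);
    ops.dealloc(greeting, 1);
    return len;
}
```

The trade-off discussed in the thread shows up directly: every Roc function that might allocate needs the extra pointer argument, but there is nothing to initialize and nothing for the linker to resolve.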

Richard Feldman (Jan 14 2024 at 02:08):

Brendan Hansknecht (Jan 14 2024 at 02:08):

Sure, that is fine too. It is theoretically slightly more costly, but shouldn't matter in practice.

Richard Feldman (Jan 14 2024 at 02:09):

Brendan Hansknecht (Jan 14 2024 at 02:09):

Brendan Hansknecht (Jan 14 2024 at 02:10):

But in general effects should be much slower than the cost of the function call. And it is just a pointer that always consumes a register passed through all roc functions.

Richard Feldman (Jan 14 2024 at 02:10):

true, although (A) that'll change after effect interpreters and (B) if it's being passed around as a pointer to a struct anyway, probably doesn't matter haha

Richard Feldman (Jan 14 2024 at 02:11):

Brendan Hansknecht (Jan 14 2024 at 02:12):

Brendan Hansknecht (Jan 14 2024 at 02:24):

Richard Feldman (Jan 14 2024 at 02:31):

Brendan Hansknecht (Jan 14 2024 at 02:34):

As a note, as I am looking into this more, I think we would need to add a void * context parameter to roc_alloc and friends. The allocator struct we pass into roc would also contain the void *context.

Brendan Hansknecht (Jan 14 2024 at 02:34):

Brendan Hansknecht (Jan 14 2024 at 02:35):

Otherwise, the Rust host would still be stuck figuring out how to use globals to distinguish async threads, just so that roc_alloc picks the right arena.

Brendan Hansknecht (Jan 14 2024 at 02:36):

Because you can't pass a pointer to a Rust closure that uses the arena as roc_alloc

Brendan Hansknecht (Jan 14 2024 at 02:38):

but anyway, that is the standard way to build this kind of api in c, so no big deal.
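The context-parameter pattern might look roughly like this in C. The Arena and RocAllocator types, the field layout, and roc_box_int are assumptions for the sake of the example. The thread only says that roc_alloc and friends would gain a void * context and that the allocator struct would carry it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A trivial bump arena owned by the host (hypothetical). */
typedef struct {
    unsigned char *buf;
    size_t cap;
    size_t used;
} Arena;

/* Hypothetical allocator struct: callbacks plus the opaque context
   that is handed back to them on every call. */
typedef struct {
    void *ctx;
    void *(*alloc)(void *ctx, size_t size, unsigned int alignment);
    void (*dealloc)(void *ctx, void *ptr, unsigned int alignment);
} RocAllocator;

/* Bump-allocate from the arena found in ctx. Alignment is assumed
   to be a power of two. */
void *arena_alloc(void *ctx, size_t size, unsigned int alignment) {
    Arena *a = (Arena *)ctx;
    uintptr_t base = (uintptr_t)a->buf;
    uintptr_t mask = (uintptr_t)alignment - 1;
    uintptr_t addr = (base + a->used + mask) & ~mask;
    size_t start = (size_t)(addr - base);
    if (start > a->cap || size > a->cap - start) {
        return NULL; /* arena exhausted */
    }
    a->used = start + size;
    return a->buf + start;
}

/* Arenas free everything at once, so per-pointer dealloc is a no-op. */
void arena_dealloc(void *ctx, void *ptr, unsigned int alignment) {
    (void)ctx;
    (void)ptr;
    (void)alignment;
}

/* Stand-in for Roc code: it forwards the context untouched, so the
   host decides which arena a given call uses. */
int *roc_box_int(const RocAllocator *al, int value) {
    int *p = (int *)al->alloc(al->ctx, sizeof *p, (unsigned int)_Alignof(int));
    if (p) {
        *p = value;
    }
    return p;
}
```

With this shape, two concurrent tasks can each pass a RocAllocator whose ctx points at their own Arena, and no globals or thread-locals are needed to tell them apart.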

Brendan Hansknecht (Jan 14 2024 at 02:38):

Brendan Hansknecht (Jan 14 2024 at 06:38):

Brendan Hansknecht (Jan 14 2024 at 06:39):

Richard Feldman (Jan 14 2024 at 11:31):

Brian Carroll (Jan 14 2024 at 13:05):

I don't see any concerns here for Wasm. If there are any specific questions let me know.

Romain Lepert (Jan 14 2024 at 14:03):

Can I ask what effect interpreters are and where they have been discussed / presented?

Brendan Hansknecht (Jan 14 2024 at 15:44):

My only concern/unknown: can a JS host pass a function pointer into roc wasm code?

Though now that I think of it, there is normally a layer of indirection. So it would be a zig wasm host passing a function pointer into roc, that must work, right?

Brian Carroll (Jan 14 2024 at 17:32):

The Wasm module defines a set of named JS imports with expected type signatures, and they all get a function index. That function index is what you use to call it either directly, or in this case, indirectly. The index is how Wasm actually implements a "function pointer".

Brian Carroll (Jan 14 2024 at 17:33):

But also as you mentioned, in our JS hosts we often add a Zig layer in between the JS and the Roc.

Brendan Hansknecht (Jan 14 2024 at 18:02):

Oskar Hahn (May 01 2024 at 09:31):

@Brian Carroll I am not sure if this will work with wasm. If you have a zig layer, then there will be no problem. But it would be nice if you could build a Wasm module with Roc without zig.

If I understand it correctly, you have to do this in wasm with a Table. But you cannot store JavaScript functions in a table. Only functions exported by Wasm can be stored in a table.

You can pass a data structure as described here to Wasm if you replace the functions with Table indexes. But I don't see how you can fill the Table with your functions.

Brian Carroll (May 01 2024 at 09:48):

1) When you "create a function pointer" in languages like C or Zig or Rust, and compile it to Wasm, the compiler "puts the function index into a table". And then if the C/Zig code "calls the function through a pointer" it compiles to the call_indirect instruction, which takes the table index of that function. This is just how function pointers are implemented in the Wasm instruction set. There is no concept of "tables" in C or Zig or JS or any other high-level language. It is a low-level WebAssembly concept.

2) If your C/Zig program declares an extern function, that will compile to an "import" in the Wasm module. And imports do have function indices, just like any other Wasm function. So you can put them into the table. If you want to express that in C/Zig you just take the address of that extern function like &my_imported_function.

3) Roc is _deliberately_ not capable of expressing low level concepts like function pointers because it is a higher level language. You are going to need some lower level language that compiles to Wasm because otherwise you just cannot express the concept of a function pointer.

Doing it without Zig, but with C or Rust or something instead, is totally possible and "just a matter of doing the work".
But doing it without a systems-level language... I'm not sure. You definitely can't do function pointers. Maybe you could write a memory allocator in JS. It would definitely be hard. It might not be possible without breaking some core principle of Roc somewhere.

Oskar Hahn (May 01 2024 at 11:00):

How hard it is to write a memory allocator in JS might depend on your use case. I found it quite easy to write an arena memory allocator in JS.

The proposal was created so that Roc does not have to call into the host, which fixes linking problems with shared libraries. The first idea was to solve this with an init function. The later idea was to pass the allocator around. This is a nice idea. It makes it possible to build arena allocators without thread-locals. But it comes at the cost that it is no longer possible to build wasm modules without a systems-level language.

Could you consider whether there are other ways to solve these problems that still make it possible to compile Roc to a Wasm module without a host language?

To solve the arena problem, you could just pass around a Context. This would just be a pointer to something.

For the initial problem, Roc could provide two solutions: either the current way, where the host exports the functions, or an init function that passes in function pointers. Of course, this only works if the compiler or linker can detect whether the host exports the functions. I don't know if this is possible.

But maybe there are other ways to solve this. I think there are a lot of interesting things you could do with Roc compiled to Wasm that would get harder.

Stream: ideas

Topic: add an `init` function to `roc` code

Brendan Hansknecht (Jan 14 2024 at 01:57):
