Stream: ideas

Topic: module params Task-based ffi


view this post on Zulip Richard Feldman (Jul 31 2024 at 01:30):

Brendan Hansknecht said:

Personally, if a user opts into it, I would prefer to just allow roc to load a shared library as a platform extension.

some random thoughts on this topic

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:32):

supposing I'm an application author and I want to use BLAS/LAPACK, or sqlite, or something like that in my CLI app or web server...it seems fine for me to be able to say "hey I'm going to opt into bringing this into my code base, even though it's not coming from the platform and could potentially have security problems etc"

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:34):

and really, at that point, if I'm just sort of trusting the dependency to "follow the rules" regarding memory (e.g. no copying like we talked about in #ideas > wasm Task-based ffi?) then I guess it's about the same to allow pure functions in FFI

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:35):

in the sense that if I'm trusting this third-party code to run unrestricted in my own process, and then trusting the pointers it gives back, the consequences of what can go wrong are essentially unlimited, so "it said it was pure but it actually wasn't" is kind of a drop in the bucket at that point

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:38):

obviously for something like this, platforms would have to be able to say that this sort of thing is or isn't allowed, e.g. so that safe-script can actually continue to be safe :stuck_out_tongue:

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:38):

and plugins and wasm can still work, etc. etc.

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:38):

one thing I really dislike about this area is that although it opens up some new and potentially positive user experiences, it will absolutely introduce new categories of negative user experiences

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:39):

like today, if I get a segfault in my roc app, there are a grand total of two possible explanations:

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:39):

same with weird memory corruption issues or anything like that

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:39):

in this world, I now have N potential explanations, where N is the number of dependencies I have on third-party FFI code

and the more ergonomic it is to use those dependencies, the more they'll get used, the more those bugs will come up, and the harder it will be for people to track them down

view this post on Zulip Luke Boswell (Jul 31 2024 at 01:40):

And the less incentive to write nice Roc packages

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:41):

right, that too

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:41):

another problem with dylibs specifically is that the default distribution story is absolutely rancid

view this post on Zulip Luke Boswell (Jul 31 2024 at 01:41):

I think there is a nice balance with WASM. It's a great escape hatch for, I have a lot of expensive legacy code... I can now wrap it and use it. But I still get to work with the nice Roc ecosystem

view this post on Zulip Luke Boswell (Jul 31 2024 at 01:42):

And to do that with WASM, you need a shim, which requires some API design and thought into how to do it well

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:42):

I still remember earlier in my career when I'd install a package and see linker errors when my program started up and having absolutely no idea how to solve it...so I'd hunt down some cursory explanation on the internet, and get it working on my computer, then it wouldn't work on my teammate's computer, then it wouldn't work on our server etc.

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:43):

and then after getting all of that working, eventually there would be a server upgrade and it would stop working again

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:43):

I just really do not want to...I'll use the word inflict because that's how I feel about it...that user experience on end users

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:43):

obviously there are some great uses for dylibs in terms of loading plugins and so forth

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:44):

but just for general like "I want access to this code that's written in another language" the user experience is so hostile

view this post on Zulip Luke Boswell (Jul 31 2024 at 01:44):

Roc's goal is to be high level... arbitrary FFI explodes the complexity out in other ways too. Does a library/package author now need to support all current and future roc targets? compiling to macos-aarch64, linux-aarch64 ...

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 01:44):

I think it honestly might be best as an escape hatch that is not allowed in the package ecosystem.

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 01:45):

Like you can go manually download the roc-tf-shim, compile it to a shared library and extend basic-cli with it, but you will never get any sort of shared library dependent code from the package ecosystem.

view this post on Zulip Luke Boswell (Jul 31 2024 at 01:46):

At that point, why not just fork the platform?

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:46):

that's an option, but it creates an incentive to end up with READMEs like "you can't install this as a normal Roc package, so to install it, here's what you do: copy these .roc files, and then install this dylib..."

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 01:46):

Like we need friction to the level of a user writing performance (or dependency) critical code that they might use once or twice.

view this post on Zulip Luke Boswell (Jul 31 2024 at 01:46):

If you need high performance and that level of flexibility, you can always make a custom platform

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 01:46):

why not just fork the platform?

I think it is about the need for extensibility in various languages without forking

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 01:47):

And a normal user could do

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 01:47):

Like it basically allows for opt-in platform composability at higher cost to setup and friction.

view this post on Zulip Luke Boswell (Jul 31 2024 at 01:48):

I'm confused though... I thought you suggested we can have these, but they can't be made into a package.

Or do you mean, they are explicitly not permitted on the centralised index?

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 01:48):

Cause as a library author, I would love to make roc-ml that wraps some machine learning framework and anyone can use with any platform if they really need ml.

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 01:49):

they are explicitly not permitted on the centralised index?

Yeah. Can't be on the central index and roc will never just load one. Always requires more specific opt in and setup.

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 01:49):

So a user will be 100% clear what they are getting into

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:49):

Brendan Hansknecht said:

they are explicitly not permitted on the centralised index?

Yeah. Can't be on the central index and roc will never just load one. Always requires more specific opt in and setup.

fwiw I know from Elm that if you do this, someone will build a competing centralized index that allows this

view this post on Zulip Luke Boswell (Jul 31 2024 at 01:49):

In a future world where WASM gets super powers like SIMD etc... could you make a roc-ml using WASM?

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:50):

I think wasm already supports simd

view this post on Zulip Luke Boswell (Jul 31 2024 at 01:52):

Oh I see. You are talking about just wrapping an existing framework. It would have to already support WASM

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 01:55):

Also talking about doing something effectful. An ml library will at a minimum use the gpu if one is available (arguably effectful, but for roc, definitely an effect).

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:56):

setting aside other considerations for a moment, how would allocations work in that world?

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:56):

like let's say my platform is nea

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:57):

I bring in something that uses malloc for all of its allocations

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:57):

how does that work?

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 01:57):

It uses malloc and nea has fewer guarantees.

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:57):

haha I think it means nea doesn't work anymore :sweat_smile:

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 01:57):

Why?

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 01:58):

Just set nea not to use 100% of ram and malloc can live in the last x%

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 01:58):

Or it means that nea doesn't support these extensions. Which is also fine

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:59):

yeah the whole point of nea is that each request gets a fixed amount of memory, and OOM can't affect the others etc.

view this post on Zulip Richard Feldman (Jul 31 2024 at 01:59):

yeah it probably wouldn't be able to support that

view this post on Zulip Luke Boswell (Jul 31 2024 at 02:00):

Doesn't WASM have the same problem though? Or would it be using an interpreter that mallocs from the platform just the same.

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:00):

Yeah, I don't think this needs to work with every platform. As was mentioned before, any platform should be able to opt out.

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:01):

Doesn't WASM have the same problem though?

No, we can force wasm through roc alloc. Or allocate a static buffer for it. That said, it might jump the minimum memory footprint of each request.

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:02):

at this point though, what would be the advantage of making something more first-class compared to the status quo where #ideas > Shared Library FFI Packages is already possible?

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:08):

Having roc generate the tasks and types for calls would be huge for reducing errors. Theoretically the shims could even get glue support the same as platforms. Also, it unties it from platforms which is nice. Oh, and libffi is a pain to work with, having a static api known at compile time is simply way nicer.

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:10):

what if we made glue work for any module, not just platform modules?

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:10):

like if you pass it an ordinary interface module, it works on whatever's exported

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:12):

It would have to generate different code for ffi modules unless roc is dispatching the effects or somehow enriching the types the platform sends over.

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:13):

Cause everything has extra indirection with ffi

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:13):

but the glue script itself could take care of that, right?

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:13):

at least with runtime ffi

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:13):

as long as it has access to the roc types like normal

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:13):

Just noting it is different from generic platform glue if using #ideas > Shared Library FFI Packages. If we supported it directly, it would be the same as normal platform glue.

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:15):

hmm... This is probably gonna fall apart with effect interpreters. Cause you would need ffi calls to interact with the host state machine. Preferably to be async somehow. Otherwise, they just block the async effect interpreter.

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:16):

yeah that's part of why I like the idea of having this be more of a userspace thing that platform authors can implement (since they already can implement whatever they want)

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:16):

like for example they can choose to open things in a subprocess with shared memory

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:17):

to avoid blocking their own process's threads

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:17):

or offer 2 different primitives, one of which runs in the current process and the other of which is in a separate process, etc.

view this post on Zulip Luke Boswell (Jul 31 2024 at 02:18):

If you're loading an arbitrary dylib, it could make a syscall though too right? spawn children etc

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:18):

yeah, that's expected.

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:18):

The goal here is to be able to load a dll that can run compute on the gpu for example.

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:19):

so then a generic library could exist that enables most platforms to be able to access gpu compute

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:19):

That is at least my simple motivating example.

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:20):

yeah, the relevant part to me is that this is not a new language feature or anything

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:20):

it's just a thing that all platforms can innately do because they're written in languages that can open dylibs :big_smile:

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:21):

I think it's a very relevant distinction if it's a first-class thing in the language compared to something that a given platform can choose to offer, or not, in userspace

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:22):

I think the biggest pain is that dlls are much nicer to interact with if you know the api at compile time. Like even if we are using libffi, it would be preferable to send a roc type spec to the platform so it can set up the call properly without extra boxing or anything

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:22):

for example, the fact that there are versions that are varying degrees of safe

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:22):

it's already innately the case that you have to trust your platform

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:22):

because they can do arbitrary code execution

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:23):

Brendan Hansknecht said:

I think the biggest pain is that dlls are much nicer to interact with if you know the api at compile time. Like even if we are using libffi, it would be preferable to send a roc type spec to the platform so it can setup the call properly without extra boxing or anything

but can't glue do that already?

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:24):

No, the platform is precompiled. It won't get ffi glue to understand the types.

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:26):

hm, ok so what would a language feature version of this look like?

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:26):

like let's say you want to wrap a ml library in a dylib and make it available to any application author whose platform offers support for running dylibs without safety checks

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:26):

what would the language feature be that facilitates that?

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:28):

So still with a platform specific set of ffi primitives and trying to add as little to the language as possible, but not make it painful to write the ml dylib

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:31):

hm, so where would the foreign types be specified?

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:31):

I think what would be wanted would be:
The ability to write a hosted module for dylibs that can do glue generation and can create a type spec for each generated function to pass to the platform.

I think the rest could be orchestrated in userland.

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:32):

Oh, also would want a way to pass something dynamic to the platform that roc would guarantee matches the type spec.

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:32):

Need to think about this more, but roughly something like that.

view this post on Zulip Richard Feldman (Jul 31 2024 at 02:36):

hm, yeah would be helpful to see a concrete design I think!

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:37):

I'll think about it for sure.

view this post on Zulip Brendan Hansknecht (Jul 31 2024 at 02:38):

There's a chance it could be done with the new encode and allowing glue to generate for a hosted file directly.

view this post on Zulip Matthieu Pizenberg (Jul 31 2024 at 16:38):

Brendan Hansknecht said:

I think it honestly might be best as an escape hatch that is not allowed in the package ecosystem.

this very much

view this post on Zulip Brendan Hansknecht (Aug 01 2024 at 03:07):

Ok, so I have been looking at libffi more and thinking about this.

Note: I am thinking from an effect interpreter world view.

Base State

  1. Today, Roc platforms can add ffi based effects (even if they are bad ergonomics/bug prone). As such, via module params, any package in the ecosystem could theoretically depend on ffi. That said, it requires double opt in to use (1, the platform has to support ffi primitives. 2, the user must explicitly pass the ffi primitives to a package).
  2. Writing ffi packages at best will have the same ergonomics as building a platform. In many setups, they may have worse ergonomics.
  3. The platform will not know anything about the ffi until runtime. It won't know shared library path, it won't know the names of functions to call, it won't know the arguments/return types of the functions. As such, libffi is really the only option.
  4. The Roc app can know all of that information at build time. It would be a big shame to make that untyped/manually/stringly typed instead of using knowledge that needs to be specified in roc anyway (like the effect apis). My prototype in #ideas > Shared Library FFI Packages is essentially untyped. Just sends a box every time and has no way to check types at all.
  5. Even without ffi, forking a platform is an option. It is just a lot less flexible. I think some tools fit ffi very well. A simple example is accelerated math packages that run on cpu/gpu (blas, lapack, pytorch, etc). The packages could be wanted by many different platforms, but don't really fit into any specific platform. It highly depends on my app (and hardware) if I need high performance math acceleration for my basic-cli app. Or basic-webserver. Or sdl2 game. Maybe I want to use basic-cli but do physics calculations.
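
The double opt-in from point 1 can be sketched roughly as follows. This is a Python model rather than Roc, and `Platform`, `make_math_package`, and the `"libblas"` / `"ddot"` strings are hypothetical names used only for illustration. The stand-in primitive just records the request instead of loading anything.

```python
from typing import Callable, Optional, Tuple

# Hypothetical shape of an FFI primitive: (library, symbol, args) -> result.
FfiCall = Callable[[str, str, Tuple], object]


class Platform:
    """Hypothetical platform that exposes an FFI primitive only if it opts in."""

    def __init__(self, allow_ffi: bool) -> None:
        # Opt-in 1: a sandboxed platform can simply not offer the primitive.
        self.ffi_call: Optional[FfiCall] = self._ffi_call if allow_ffi else None

    def _ffi_call(self, library: str, symbol: str, args: Tuple) -> object:
        # Stand-in for a real dynamic call: echo back what was requested.
        return (library, symbol, args)


def make_math_package(ffi_call: FfiCall):
    """Hypothetical package: it can only reach FFI that the app hands to it."""

    def dot(xs, ys):
        return ffi_call("libblas", "ddot", (tuple(xs), tuple(ys)))

    return dot


# Opt-in 2: the app explicitly passes the platform's primitive to the package.
platform = Platform(allow_ffi=True)
dot = make_math_package(platform.ffi_call)
request = dot([1.0, 2.0], [3.0, 4.0])
```

A platform constructed with `allow_ffi=False` has no primitive to hand over, so a package like this cannot be wired up on it at all.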

Primitives

As mentioned above, libffi is really the only option for runtime configured ffi. For a good experience, we need to generate their primitives or something that can easily be translated into their primitives.

Fundamentally, libffi has two primitives:

  1. ffi_type for specifying argument and return types.
  2. void ** cause it doesn't know any of the argument types. (essentially a list of pointers to the arguments)

ffi_type is essentially an enum for primitive types. For more complex types, it is a nested structure where complex types reference the primitive types that build them up. This preferably is generated once per ffi function and then reused from that point forward.

The void ** for the args and a void * for the return type are needed every single call.
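
To make the split between the two primitives concrete, here is a minimal sketch using Python's `ctypes` (which is itself implemented on top of libffi). It does not call libffi directly: `pack_args`, `host_call`, and `scale` are hypothetical helpers, and `host_call` only imitates what a host would do with a type description plus an array of untyped pointers.

```python
import ctypes

# Type description, built once per foreign function (the role ffi_type plays).
ARG_TYPES = (ctypes.c_int64, ctypes.c_double)
RET_TYPE = ctypes.c_double


def pack_args(values):
    """Build typed storage for each argument plus a void*[] pointing at it."""
    storage = [ty(v) for ty, v in zip(ARG_TYPES, values)]
    argv = (ctypes.c_void_p * len(storage))(
        *[ctypes.addressof(slot) for slot in storage]
    )
    return storage, argv  # storage must stay alive as long as argv is used


def host_call(fn, arg_types, ret_type, argv, ret_addr):
    """Stand-in for the host side: it sees only the type spec and raw pointers."""
    args = [ty.from_address(argv[i]).value for i, ty in enumerate(arg_types)]
    ret_type.from_address(ret_addr).value = fn(*args)


def scale(n, x):
    """Plays the foreign function; a real host would call into a shared library."""
    return n * x


storage, argv = pack_args((3, 1.5))
ret = RET_TYPE()  # return slot, supplied by the caller on every call
host_call(scale, ARG_TYPES, RET_TYPE, argv, ctypes.addressof(ret))
result = ret.value
```

The type description is a constant that can be reused, while the pointer array and the return slot have to be rebuilt for each call, which matches the per-function versus per-call split described above.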

None of these primitives can be generated directly in roc. So they must either be constructed at runtime by some sort of tagged structure passed to the host, be generated by some form of glue (only viable for ffi_type), or generated by the compiler itself.

Gold Standard

What would the gold standard be? Not saying this should be what roc does, but I generally want to explore what would be the best experience while still constraining to going through the host for all effects. That way, for example, an async host can choose to run ffi calls on a blocking thread (I currently don't have any idea how we could make it work with async).

I think the best experience would be:

  1. Roc has a special hosted-ffi module type that supports glue. The Glue generates the roc types, the effect function prototypes and the ffi_type constant for the function (this could alternatively be generated directly in the roc binary when loading an ffi function, but it would have to be created by the compiler, not roc code).
  2. hosted-ffi does not generate Task. Instead it generates FfiTask. FfiTask can be passed to the platform. When it is passed to the platform, it is passed as a tuple of args, a slot for return type, and a void ** pointing to each individual arg in the tuple. The platform can pass that directly off to libffi and is free from any sort of complex type wrangling. Since the platform is in control, it can spawn it in a subprocess, thread, or just run it directly. After completion of ffi, control is returned to roc and roc will deal with cleaning up the args and moving the result type to the correct place.
  3. This one is more complex, orchestration, but I'm still talking about gold here. Roc would manage a small bit of state automatically for the app. When the user makes an ffi call, a shared library has to be loaded and a symbol or 2 have to be loaded from the shared library. It is really inconvenient to need to pass an opaque around representing a function. It is also inefficient to constantly be going from string to symbol. So in the perfect world, roc would in the background spawn a task on the first call to an ffi function to load the correct shared library/symbol. After it is loaded, roc would keep a reference to that and use it for every new ffi call.

Note on 3: With something like stored where a function can load and store from the platform via a unique key, roc wouldn't need to be involved otherwise. That said, for stored to work, we need a way for an individual function to keep state of unique keys. I think that ffi state could easily become a big hassle to manage without some sort of builtin state management.
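
The load-once-then-reuse behavior from point 3 could look roughly like the sketch below. It is a Python model under stated assumptions: `SymbolCache`, `fake_loader`, and the `"libdemo"` / `"add"` names are hypothetical, and the loader is a stand-in for whatever a real host would use to open a shared library and look up a symbol.

```python
from typing import Callable, Dict, Tuple


class SymbolCache:
    """Hypothetical managed state: load each symbol once, then reuse it."""

    def __init__(self, load: Callable[[str, str], Callable]) -> None:
        self._load = load  # a real host would open the library and find the symbol
        self._symbols: Dict[Tuple[str, str], Callable] = {}
        self.loads = 0

    def call(self, library: str, symbol: str, *args):
        key = (library, symbol)  # the unique key for this ffi function
        fn = self._symbols.get(key)
        if fn is None:
            # First call only: resolve the string names to a callable handle.
            fn = self._load(library, symbol)
            self._symbols[key] = fn
            self.loads += 1
        return fn(*args)


def fake_loader(library: str, symbol: str) -> Callable:
    """Stand-in loader so the sketch runs without any real shared library."""
    table = {("libdemo", "add"): lambda a, b: a + b}
    return table[(library, symbol)]


cache = SymbolCache(fake_loader)
first = cache.call("libdemo", "add", 1, 2)
second = cache.call("libdemo", "add", 10, 20)
```

Only the first call pays the string-to-symbol cost. The open question from the note remains where this cache lives: in state roc manages itself, or in platform storage reached through a unique key.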

Silver, Bronze, Copper, etc

Obviously, we don't have to match gold, but I think it gives a good idea of the pieces required to make this nice. In the worst case, we have what I made for #ideas > Shared Library FFI Packages: everything is always boxed, the user app has to manage all ffi state, the ffi package author has to write their own glue, and no types are guaranteed to be set correctly.

We can pick some or all of the things above to make this picture nicer. I really think 1 and 2 would be huge. 1 makes it much more type safe and allows for easy ffi package dev. 2 enables fast and efficient ffi without a bunch of extra platform complexity. It also shouldn't be too hard to generate a tuple and some pointers. For 3, I'm sure there must be a smart way to manage state. I mean we have to solve fast state management in general for things like webservers. Maybe the state is pipelined around in a smart way. Maybe we allow for generation of unique keys and stored such that state can be local. Either way, I think some sort of solution will emerge.


Last updated: Jun 16 2026 at 16:19 UTC