As far as I understand, today the only way to link foreign interfaces into your Roc program is by doing so through a Host. And, any interface exposed by a Host has to be an Effect. Would Roc be interested in/already have plans for more general FFI bindings, so that e.g. even libraries can link to foreign interfaces?
The motivation here is that there is a plethora of high-quality software for general use (e.g. BLAS/LAPACK for numerical math) that would be great for reuse in Roc (e.g. in a machine learning kernel). Unlike hosts, foreign interfaces like BLAS/LAPACK aren't meant to be entry points to execute code, they just happen to be robust and battle-tested libraries.
In my mind there are two issues with permitting FFI bindings that conflict with Roc's goals:
I don't know how to get around these issues other than to (1) trust people will do the right thing or (2) only permit FFI for libraries that have undergone some centralized Roc-community auditing process. I don't really like the latter one because it's unclear what the community will look like in 1, 2, 5, etc. years. In any case I think that more general FFI would be a good idea; does anyone have thoughts?
It would take quite a bit of design and implementation work, but I think we could technically enable libraries to depend on ffi only via effects. Essentially the same way we do it with platforms, but they would link to a static library instead of a full binary.
Though in the case of numeric libraries like BLAS, that may make interfacing with them a big pain. Every call to them would have to be an effect/task.
Yeah I feel like wrapping it in an effect is suboptimal, both in terms of ergonomics and performance
Maybe that just means we need to focus on tooling and auditing. Being able to say, "this library calls into 8 ffi methods. You only potentially call into 3 of them. Here they are so you can double check they don't break pure functional guarantees". Kinda like how unsafe sticks out in rust.
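A rough sketch of what such an audit report could compute: given a call graph, list the FFI symbols reachable from the code you actually call, next to everything the library links. The call graph, the module names and the "ffi:" prefix are all made up for illustration (the symbols just borrow LAPACK routine names); nothing here reflects existing Roc tooling.

```python
from collections import deque

# Hypothetical call graph: each function maps to the functions it calls.
# Entries prefixed with "ffi:" stand in for foreign functions.
CALL_GRAPH = {
    "app.main": ["linalg.solve"],
    "linalg.solve": ["linalg.factor", "ffi:dgetrs"],
    "linalg.factor": ["ffi:dgetrf"],
    "linalg.eigen": ["ffi:dgeev"],
    "linalg.svd": ["ffi:dgesvd"],
}

FFI_PREFIX = "ffi:"


def reachable_ffi(graph, entry):
    """Return the sorted FFI symbols reachable from `entry` (breadth-first)."""
    seen = {entry}
    queue = deque([entry])
    found = set()
    while queue:
        fn = queue.popleft()
        for callee in graph.get(fn, []):
            if callee.startswith(FFI_PREFIX):
                found.add(callee[len(FFI_PREFIX):])
            elif callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return sorted(found)


def all_ffi(graph):
    """Return every FFI symbol mentioned anywhere in the graph, sorted."""
    return sorted(
        callee[len(FFI_PREFIX):]
        for callees in graph.values()
        for callee in callees
        if callee.startswith(FFI_PREFIX)
    )
```

Comparing `all_ffi(CALL_GRAPH)` with `reachable_ffi(CALL_GRAPH, "app.main")` gives the "the library links N, you can reach M of them" summary. It only tells you where to look, it doesn't prove anything about purity.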
Would it be possible to use wasm for such cases?
I'm not sure I follow. What are you suggesting?
Compile the lib you want to use to wasm.
And then compile it to native as part of the compilation process. (e.g. wasmer also uses llvm to do that afaik)
i'm not so firm in wasm stuff, but if wasi is missing, wouldn't you be limited to pure functions?
this way you could trust that everything is pure without having to check everything.
but like i said, i'm not sure about that...
Yeah, that makes sense. It'd only be relevant in a wasm-runtime context though, and we probably don't want to bundle a WASM executor with such builds.
I guess you could use wasm2c or something and then recompile, but that's likely to break for the projects this would be worth doing for
Also to do this you'd need the original source code, which we likely have but is something to think about
In that case though, I guess you could not link libc, etc (or link a "pure" version) and see how far that gets you since that would limit the interfaces you have to access the outside world
If you're importing WASI into your .wasm bundle then there are no purity guarantees because any function could be calling any of those WASI system calls to do any side effect.
Oh right sorry you said if WASI is missing
without wasi, could there be mutation of values that the roc side could see?
I'd think so
It's still a different concept from functional purity though. The Wasm module has its own memory so it could store things and return a different value from the same function every time
for the same input
unlike pure functions
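To make that concrete, here is a toy Python stand-in for a module that has no imports at all but does have its own private memory. `SandboxedModule` and `scale` are invented names for illustration, not a real Wasm API.

```python
class SandboxedModule:
    """Stand-in for a Wasm module: no imports, no I/O, only private memory."""

    def __init__(self):
        # Private "linear memory"; the caller never sees it directly.
        self._memory = bytearray(4)

    def scale(self, x):
        # Read a hidden call counter from the module's own memory...
        calls = int.from_bytes(self._memory, "little")
        # ...bump it, and let it leak into the result.
        self._memory[:] = (calls + 1).to_bytes(4, "little")
        return x * 2 + calls
```

Calling `scale(10)` twice on the same instance gives 20 and then 21: no access to the outside world, but still not a function in the mathematical sense.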
Brian Carroll said:
It's still a different concept from functional purity though. The Wasm module has its own memory so it could store things and return a different value from the same function every time
ah, you are right.
i was so focused on "limit the access to the file system/network/..., because e.g. editor might execute such a package on the fly" train of thought, that i totally forgot about memory.
because having access to arbitrary I/O breaks the guarantee that something is safe to run because the platform is safe (laid out e.g. in the "The Edges of Cutting-Edge Languages" talk.)
Ayaz Hafiz said:
Also to do this you'd need the original source code, which we likely have but is something to think about
The idea was to ship the already built wasm as part of the package. this way you wouldn't need the original source.
Sure, but that might not be possible for any external lib, for example some shared object last compiled 10 years ago
I'm becoming more convinced that it's infeasible to guarantee purity from an external interface. We might need to rely on trust like Haskell does, which is maybe okay? And the wasm idea is also a great heuristic/partial guarantee
I'm becoming more convinced that it's infeasible to guarantee purity from external interface
yeah I know someone who worked on a project that tried to do this, and ended up concluding it was impossible :big_smile:
that said, given how fast Roc can be at runtime, I think a promising direction is to use a similar strategy to generate Roc code from other languages
e.g. for a pure function, you could probably write a Rust -> Roc converter, or maybe even a wasm -> Roc converter
and as long as it actually was pure, it would be translatable
at that point, you'd have Roc code which would fit naturally into the whole ecosystem, there'd be no UB, surprise type mismatches or side effects, etc.
it might not be the most beautiful Roc code, but that wouldn't be the point anyway - the point would be to have the desired compile-time and runtime characteristics - and you could of course incrementally refactor it to be nicer if you wanted to actually maintain it
That's a good point. I would just be worried about the cost of developing such a tool. For example, say I want to get bindings to LAPACK, a linear algebra library. Indeed we should expect that most, if not all, of its API consists of functions in the mathematical sense. LAPACK though (at least this implementation) is a 600K line Fortran code base (!), and I would imagine that writing a Fortran->Roc converter that works faithfully for this codebase, and maintaining it, would take a tremendous amount of effort. Moreover you're unlikely to get a lot of benefit out of the converter, because despite how wonderful Fortran is for mathematics and scientific computing, there are not a lot of "standard" libraries that one would like to port over, except maybe LAPACK and friends.
apparently someone got LAPACK building from Fortran to wasm! https://github.com/pyodide/pyodide/issues/184#issuecomment-616948624
so maybe a wasm -> roc converter might cover a lot of ground :smiley:
wow, that’s really impressive. if it can be done with no JS or wasi dependency, which it probably can, that might be the way to go!
Anyway maybe we should try out a couple different options and see what works and what doesn’t. I’ll try putting together integrations with BLAS/LAPACK and Tensorflow, both via bindings and the wasm idea, in early to mid March
here's an interesting idea for a design: what if we gave the Roc compiler a way to directly import compiled wasm functions, but we restricted the permitted types of the functions that could be imported, and "sandboxed" the memory reads and writes of the compiled wasm somehow? (e.g. do something similar to what the OS does with virtual memory at the process level, such that the function can't access memory outside of what it allocated itself)
if the type restrictions included that they couldn't receive or return pointers, then all math functions would still work, and they could do internal mutation, but it wouldn't be observable
hmmm, but they could still break referential transparency, e.g. by allocating memory and looking at the garbage memory to get pseudorandom values :thinking:
so maybe the restriction would have to be that they couldn't do anything with heap memory at all, just math operations haha
that sounds a lot less useful than the "compile it to Roc and if the Roc compiler accepts it, we know it's not going to break anything" design :big_smile:
:thinking: although maybe we could swap all malloc calls for calloc equivalents? (Or I guess roc_calloc)
I think then everything would be referentially transparent at the function boundary?
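A small simulation of the garbage-memory problem and the calloc fix, in Python. `Arena` is a toy bump allocator invented for this sketch, not how roc_alloc or any real allocator works; it only shows that zeroing on allocation closes this particular channel.

```python
class Arena:
    """Toy bump allocator over a fixed buffer; reset() plays the role of free()."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.top = 0

    def malloc(self, n):
        start = self.top
        self.top += n
        return start  # contents are whatever an earlier call left behind

    def calloc(self, n):
        start = self.malloc(n)
        self.buf[start:start + n] = bytes(n)  # zero the block before handing it out
        return start

    def reset(self):
        self.top = 0  # release everything, but do not clear the bytes


def peek_uninitialized(arena, alloc, value):
    """Allocate scratch space, read it before writing, then store `value` in it."""
    p = alloc(2)
    leftover = arena.buf[p + 1]  # garbage under malloc, always 0 under calloc
    arena.buf[p] = value & 0xFF
    arena.buf[p + 1] = value & 0xFF
    arena.reset()
    return leftover
```

With `arena.malloc` the second call with the same argument sees what the first call wrote, so the result changes between calls; with `arena.calloc` it is 0 every time. Other channels (returned pointers, globals) are untouched by this.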
The interface could also return stack-local memory or an arbitrary pointer anywhere into memory. Though both of these are easily statically detectable
This is a problem we already have today, no - in that your host can expose a function that “can do whatever it wants”. It’s hidden behind an effect, but if I have an “Effect {} -> I64”, then once I can access the function inside the effect, there’s no guarantee of its consistency!
Going back to limiting IO and friends, in my mind the ideal solution would be to allocate a cgroup with the correct partitions, chuck the binaries/shared objects in there and let it run - then you don’t need to verify any code or do any transformation, just let the kernel verify it for you. But this falls apart for many reasons ):
once I can access the function inside the effect, there’s no guarantee of its consistency
importantly though, there's no way outside the host to access the function inside the effect :big_smile:
well for limiting IO we could allow only a specific known set of external calls, and replace all others with panics
With the roc linker we could theoretically ban a lot of things in FFI, but probably wouldn't be able to make any true guarantees.
The ffi library can't call malloc or any other external functions. It can't have a data section, only an rodata section. It can't link to any dynamic libraries, etc.
can't call malloc or any other external functions
we could replace calls to malloc with calls to roc_alloc, and so on for memcpy etc - since Roc code already depends on those!
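A sketch of that combination of rewrites and restrictions as a link-time plan: known allocator and memory symbols get redirected, an explicit allowlist is kept as-is, and everything else becomes a panic stub. Only `roc_alloc` comes from this thread; `roc_realloc`, `roc_dealloc` and `roc_memcpy` as rewrite targets, and the allowlist contents, are assumptions for illustration.

```python
# Assumed rewrite table; only malloc -> roc_alloc is taken from the discussion.
REWRITES = {
    "malloc": "roc_alloc",
    "realloc": "roc_realloc",
    "free": "roc_dealloc",
    "memcpy": "roc_memcpy",
}

# Hypothetical allowlist of external symbols considered harmless.
ALLOWED = {"sqrt", "fabs"}


def plan_imports(imports, rewrites=REWRITES, allowed=ALLOWED):
    """Decide what to do with each undefined symbol of an FFI object file."""
    plan = {}
    for name in imports:
        if name in rewrites:
            plan[name] = ("rewrite", rewrites[name])
        elif name in allowed:
            plan[name] = ("keep", name)
        else:
            # Anything unknown is replaced by a stub that panics when called.
            plan[name] = ("panic", None)
    return plan
```

For example, `plan_imports(["malloc", "sqrt", "fopen"])` rewrites the first, keeps the second, and turns `fopen` into a panic. This limits what the library can reach, it doesn't by itself guarantee referential transparency.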
I guess the question really boils down to: is it possible to use a combination of restrictions and rewrites to end up with FFI that maintains referential transparency?
I'd be surprised if something like a linear algebra library (for example) really needed to use the data section
and if the library really needed to do I/O or something, there's of course Task for that
linking to dynamic libraries wouldn't work for sure :thumbs_up:
Is there a concern that FFI would lead to package APIs that don't work well with Roc due to them being wrappers around packages from other languages? In other words, they don't violate referential transparency or have unmanaged side effects, but they are clunky to use because the API was not written with Roc in mind.
for sure - I think that's a very real risk
but at the same time I also think LAPACK is an interesting motivating use case to explore
and it's entirely possible that the result of the exploration is "this isn't a good idea after all" but I do want to talk it through!
I'm not familiar with numerical libraries but is it maybe possible to construct the operators in a way similar to effects? So you kind of define the formula and then pass it over so there are fewer calls across the boundary?
kinda like how in tensorflow you build up a computation graph and then pass that to c++ to be evaluated?
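Roughly what that could look like, sketched in Python rather than Roc: the formula is built as plain data, and the whole tree is handed to the backend in a single call. `Backend` stands in for the foreign library and just counts how often the boundary is crossed; every name here is made up for the sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Var, Add, Mul]


class Backend:
    """Stand-in for the foreign side; counts calls across the boundary."""

    def __init__(self):
        self.crossings = 0

    def run(self, expr, env):
        self.crossings += 1  # one crossing per whole formula, not per operator
        return self._eval(expr, env)

    def _eval(self, expr, env):
        if isinstance(expr, Const):
            return expr.value
        if isinstance(expr, Var):
            return env[expr.name]
        if isinstance(expr, Add):
            return self._eval(expr.left, env) + self._eval(expr.right, env)
        if isinstance(expr, Mul):
            return self._eval(expr.left, env) * self._eval(expr.right, env)
        raise TypeError(f"unknown node: {expr!r}")


# a * x + 1, described as data before anything is evaluated
formula = Add(Mul(Var("a"), Var("x")), Const(1.0))
```

`Backend().run(formula, {"a": 2.0, "x": 3.0})` evaluates the multiply and the add in one crossing instead of two.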
Richard Feldman said:
we could replace calls to malloc with calls to roc_alloc, and so on for memcpy etc - since Roc code already depends on those!
I was more talking about that as a way to try and limit possible side effects, but I guess if they don't have a data section and we block linking global data, then calling malloc is safe because it won't lead to side effects that roc can see. I was more worried about malloc being used to store state, which would break being pure.
Folkert de Vries said:
kinda like how in tensorflow you build up a computation graph and then pass that to c++ to be evaluated?
That is also one of the biggest gripes that people have with tensorflow. Static graphs are a pain when it comes to control flow and anything dynamic. That's the reason why tf2 is eager by default.
Not sure how this would present in Roc, but I assume it would make interaction much more painful.
interesting - I don't have any experience with tensorflow :big_smile:
I mean at some level there are going to be inevitable tradeoffs around some combination of ergonomics and what's allowed
maintaining referential transparency is super important, and if it's true that there's no way to guarantee some arbitrary wasm code (let's say) is referentially transparent - which I believe is impossible in the general case - then the question becomes whether we can find a set of restrictions and transformations which guarantee that the referential transparency invariant is preserved, while also still letting enough programs through to be useful!
Yeah, I think the only way to really guarantee that would be to treat libraries just like platforms and wrap everything with effects.
I don't believe Wasm will get us referential transparency. All the function has to do is use a global variable, and what's to stop it?
Ok in case I'm being too negative, let me try to argue against myself for a minute!
We'd have to take a Wasm module and delete the Globals section, and give it no imported functions from the host, and reset its memory after every call into an exported function.
Actually we'd have to allow at least one global for the stack pointer.
I'm not sure how we guarantee that's what it's being used for though!
Resetting the memory every time definitely sounds like a performance problem.
Haha, sounds like it is easier to do this in an elf file. We can block globals, function calls, writeable data sections, and the stack is implicit.
So we wouldn't need to wipe any memory every run
Though it is still easily worked around:
These changes would just likely make it more obvious if the code you are based on is doing anything that breaks referential transparency.
Was thinking about FFI a bit more recently. The main thought was that I want to use sqlite from basic-cli and basic-webserver.
This is currently impossible without adding sqlite specifically to the platforms. Unlike Postgres, there is no clean primitive like tcp for sqlite.
Just trying to think if there is any good solution to this. The only primitive that I think this could be built off of in a sharable manner is generic c FFI with dynamically loaded libraries. That said, it would not be very clean and it would pass around a lot of nested generic tag representations of data due to needing runtime type info. This probably could be done today, but would require a lot of platform twiddling.
Would setting up a platform to do generic c FFI via shared libraries be useful? Is there a better solution to this kind of stuff? Maybe the best solution for wanting access to an arbitrary c library is just to fork a platform and add a bunch of effects.
Aside: sqlite is also often statically linked. It'd be great to have a solution to using sqlite and similar that doesn't necessarily depend on dynamic linking
For it to be statically linked, I am pretty certain it would have to always be built into the platform. So it would be a platform by platform decision of what to include
That's a good point, though we could also potentially have some mechanism, perhaps provided by Roc itself rather than the platform, to generate the same interface whether statically or dynamically linked. For example, some tooling option or configuration to indicate which libs should be statically linked yet transparently accessible via ffi. Alternatively, some mechanism to provide generated modules which wrap around some C lib, with a load task that no-ops when statically linked
Technically we could once task is a builtin, but I'm not sure how it would mix with the effect interpreter that the platform controls and runs. It definitely wouldn't mix with the surgical linker.
If we had packages that are allowed to statically link in FFI (essentially packages that can create new task variants), we are stuck dealing with final linking which is very brittle and has many problems. On top of that, somehow the effect interpreter on the platform has to seamlessly integrate with these new tasks. Imagine a platform built on an async executor that suddenly is blocking on sqlite freezing the entire thing.
This could be quite elegantly designed once module parameters are available. A platform could provide (or not) a function for doing arbitrary C FFI as part of its regular platform API. Then one could build a library that implements SQLite on top of that. This library would be available for all platforms that provide the arbitrary C FFI function, and not available for the ones that don't (browser, plugin APIs...). It requires platforms a manual opt in, but it should be short and easy to implement, and a lot of the code could be moved to platform independent libraries that build on top of this one platform function.
I wouldn't say easy to implement, but definitely doable. Requires dynamically loading the FFI dependency. Mapping all types to roc types and dealing with lifetimes of exposed values. Also, need to use libffi for dynamically loading unknown functions at runtime.
For example, probably want to treat an sqlite prepared query as an opaque pointer when passed into roc. Want to be able to do whatever chain of calls to prepare a query with any arbitrary combination of arguments. Also, potentially want file io to be dealt with by the platform such that the platform can pass a file handle or opaque path correctly to FFI functions. Not to mention a taco stream or other more complex types.
But yeah, once it is working cleanly, should be a matter of exposing module param primitives and ensuring shared libraries exist to be loaded for FFI.
An async platform can wrap any FFI calls in synchronous threads if they are worried about long running FFI calls that might do synchronous io blocking the async runtime. Though always doing this would be quite pessimistic for short FFI calls.
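A minimal illustration of that tradeoff using Python's asyncio as a stand-in for an async platform. `blocking_ffi_call` is a hypothetical placeholder that just sleeps, and `asyncio.to_thread` needs Python 3.9 or newer. The heartbeat task counts how often the event loop gets to run while the call is in flight.

```python
import asyncio
import time


def blocking_ffi_call(seconds):
    """Placeholder for a synchronous FFI call (e.g. a slow query)."""
    time.sleep(seconds)
    return "done"


async def ticks_during_call(offload):
    """Return how many heartbeat ticks happen while the FFI call runs."""
    ticks = 0

    async def heartbeat():
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    task = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # let the heartbeat run its first tick
    before = ticks

    if offload:
        # Run the call on a worker thread; the event loop keeps going.
        await asyncio.to_thread(blocking_ffi_call, 0.2)
    else:
        # Run the call inline; nothing else on the loop can make progress.
        blocking_ffi_call(0.2)

    during = ticks - before
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return during
```

`asyncio.run(ticks_during_call(False))` reports no ticks during the call, while the offloaded variant keeps ticking. The thread hop has its own cost, which is why always doing it would be pessimistic for short calls.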
Is taco stream a typo? :taco:
Haha....tcp stream
Last updated: Jun 16 2026 at 16:19 UTC