wasm Task-based ffi? · ideas · Zulip Chat Archive

Richard Feldman (Jul 31 2024 at 00:17):

something that just occurred to me: I can't think of a reason why Roc couldn't support a Task-based WebAssembly FFI :thinking:

Richard Feldman (Jul 31 2024 at 00:18):

for example (just making things up here) we could have a module type called wasm and in its header it specifies a .wasm file which it wraps

Brendan Hansknecht (Jul 31 2024 at 00:22):

Richard Feldman (Jul 31 2024 at 00:23):

basically as a way to take platform-agnostic code that's written in another language and call it from Roc applications without having to get the platform involved, or use dylibs

Richard Feldman (Jul 31 2024 at 00:23):

Richard Feldman (Jul 31 2024 at 00:24):

Luke Boswell (Jul 31 2024 at 00:25):

So is the external library we are wanting compiled and packaged into a WASM library?

Richard Feldman (Jul 31 2024 at 00:25):

Richard Feldman (Jul 31 2024 at 00:26):

Luke Boswell (Jul 31 2024 at 00:26):

And then platforms can receive Tasks from roc saying "load this 'someCbutNowWasm.wasm' library, and call X passing Y"

Richard Feldman (Jul 31 2024 at 00:26):

Richard Feldman (Jul 31 2024 at 00:27):

Luke Boswell (Jul 31 2024 at 00:27):

Richard Feldman (Jul 31 2024 at 00:27):

Luke Boswell (Jul 31 2024 at 00:27):

If we are just passing standard Roc Types back and forth across the host boundary, and these are translated into types WASM understands

Richard Feldman (Jul 31 2024 at 00:28):

yeah I'm just thinking about what a malicious actor could do in the package ecosystem

Luke Boswell (Jul 31 2024 at 00:28):

Or I guess you could have WASM enabled hosts, that support roc packages which includes WASM binaries

Richard Feldman (Jul 31 2024 at 00:29):

like if it's all .roc files, there are certain exploits that aren't possible, so the question becomes - if there are now .wasm files too, is there some way we can maintain that guarantee?

Richard Feldman (Jul 31 2024 at 00:29):

that you can install and run any roc package and it can't access other parts of the process's memory space, for example

Luke Boswell (Jul 31 2024 at 00:29):

The host/platform still controls everything at that boundary... so unless things can bust out of WASM runtimes I don't see how this could be an issue

Luke Boswell (Jul 31 2024 at 00:30):

Brendan Hansknecht (Jul 31 2024 at 00:33):

Oh, this is for adding tasks to a package while trying to avoid adding general ffi to roc

Richard Feldman (Jul 31 2024 at 00:33):

Richard Feldman (Jul 31 2024 at 00:34):

I assume they'd need to be tasks, but maybe that's not a correct assumption either :laughing:

Brendan Hansknecht (Jul 31 2024 at 00:34):

So would enable someone to write some C and compile it to wasm and then call it wasm

Richard Feldman (Jul 31 2024 at 00:34):

Brendan Hansknecht (Jul 31 2024 at 00:35):

Richard Feldman (Jul 31 2024 at 00:35):

kinda - I think something like WIT where the wasm file says "here are the operations I require"

Richard Feldman (Jul 31 2024 at 00:36):

e.g. if a wasm file says "I need to be able to write to a file" then that operation can be provided in the normal Roc way (module params) just like anything else
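
A rough way to picture the "here are the operations I require" idea is as a capability check at instantiation time. The sketch below is a plain Python model, not Roc or wasm: `SandboxedModule`, `MissingImport`, and `save_report` are hypothetical names invented for illustration, and the dictionary of callables stands in for whatever module params would supply.

```python
class MissingImport(Exception):
    """Raised when the caller does not supply an operation the module declared."""


class SandboxedModule:
    """Hypothetical model of a freestanding module that lists its required imports."""

    def __init__(self, required_imports, exports):
        self.required_imports = set(required_imports)
        self.exports = exports  # name -> function(granted_imports, *args)

    def instantiate(self, provided):
        missing = self.required_imports - set(provided)
        if missing:
            raise MissingImport(sorted(missing))
        # Only the declared imports are passed through; anything extra is dropped,
        # so the module cannot reach operations it never asked for.
        granted = {name: provided[name] for name in self.required_imports}
        return {
            name: (lambda *args, fn=fn: fn(granted, *args))
            for name, fn in self.exports.items()
        }


def save_report(imports, text):
    # The module body can only touch what it was granted.
    imports["write_file"]("report.txt", text)
    return len(text)
```

In this model the caller decides what `write_file` actually does, which mirrors the point that the operation is provided from the outside rather than discovered natively by the module.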

Brendan Hansknecht (Jul 31 2024 at 00:37):

If they can do arbitrary system ffi via wasi (even if explicitly specified via wit), why limit to wasm?

Richard Feldman (Jul 31 2024 at 00:37):

Brendan Hansknecht (Jul 31 2024 at 00:37):

Feels like a case where we should just allow wrapping a native dynamic library as tasks without any interaction with the platform.

Richard Feldman (Jul 31 2024 at 00:38):

Brendan Hansknecht (Jul 31 2024 at 00:38):

Richard Feldman (Jul 31 2024 at 00:38):

Brendan Hansknecht (Jul 31 2024 at 00:38):

Richard Feldman (Jul 31 2024 at 00:39):

like the type of wasm you could run in a browser, where you have to provide it with everything and it doesn't know how to do anything natively

Richard Feldman (Jul 31 2024 at 00:39):

Brendan Hansknecht (Jul 31 2024 at 00:39):

Ok yeah, that makes more sense. Was very confused by the mention of wit where you can do package wasi:filesystem;.

Richard Feldman (Jul 31 2024 at 00:40):

Luke Boswell (Jul 31 2024 at 00:40):

The package author includes a WIT file describing the interface for their WASM module, and we may be able to use that to generate the interface on the Roc side?

Richard Feldman (Jul 31 2024 at 00:40):

Brendan Hansknecht (Jul 31 2024 at 00:41):

Also, yes, would still need to be tasks. Cause wasm can use globals among other things that could be very unsafe in roc.

Richard Feldman (Jul 31 2024 at 00:41):

Richard Feldman (Jul 31 2024 at 00:42):

but I could imagine a scenario where you basically call the wasm function in its own isolated sandbox (e.g. give it its own memory arena and don't let it see anything else) and then don't maintain any state in between invocations

Richard Feldman (Jul 31 2024 at 00:43):

and then we could either do that, or maintain state in between calls, depending on whether you specified to call it as a Task or not
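
The difference between the two calling modes can be sketched with a toy model. This is Python standing in for a wasm instance, under the assumption that "isolated" means a fresh instance with zeroed memory per call; `Instance`, `call_isolated`, and `StatefulModule` are hypothetical names.

```python
class Instance:
    """Stand-in for one instantiated wasm module with its own private memory."""

    def __init__(self):
        self.memory = bytearray(16)  # private arena, zeroed on creation

    def bump(self, amount):
        # Reads and writes a "global" stored at offset 0 of its own memory.
        self.memory[0] = (self.memory[0] + amount) % 256
        return self.memory[0]


def call_isolated(amount):
    # Fresh instance per call: no state survives between invocations,
    # so the same input always gives the same output.
    return Instance().bump(amount)


class StatefulModule:
    """One long-lived instance: results depend on call history."""

    def __init__(self):
        self._instance = Instance()

    def bump(self, amount):
        return self._instance.bump(amount)
```

Here `call_isolated(3)` gives the same answer every time, while repeated `StatefulModule.bump(3)` calls return different values, which is the behavior that would force the stateful mode to be exposed as a Task.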

Brendan Hansknecht (Jul 31 2024 at 00:43):

Extending platforms with wasm modules that get called by roc should be doable.
By making the wasm freestanding, it won't have any access to ffi.
Will need to use Task due to being able to hold onto state.
Calling into it will of course have all of the memory copying gripes. That said, memory returned from wasm should be possible to directly reference.
We could even use a wasm interpreter that jits/compiles to native to get really solid perf.

Richard Feldman (Jul 31 2024 at 00:44):

so it just ends up in the binary and there's no wasm runtime in the compiled binary

Richard Feldman (Jul 31 2024 at 00:44):

Richard Feldman (Jul 31 2024 at 00:45):

I believe Binaryen loads .wasm files into LLVM IR in order to run LLVM optimization passes on it and then output another .wasm file

Brendan Hansknecht (Jul 31 2024 at 00:45):

Yeah, should be doable. So during roc compilation time would compile the wasm and setup the memory restrictions and such.

Richard Feldman (Jul 31 2024 at 00:45):

but if that's possible, then it's also possible to load it into LLVM IR and then emit machine instructions

Richard Feldman (Jul 31 2024 at 00:45):

Brendan Hansknecht (Jul 31 2024 at 00:46):

Just leaves an open question if it becomes less safe/has an easier time escaping the interpreter (cause it is compiled away)

Brendan Hansknecht (Jul 31 2024 at 00:46):

Richard Feldman (Jul 31 2024 at 00:48):

I think passing data in should be straightforward theoretically (it all has to be copied, which is unfortunate, but I don't see a way around that)

Richard Feldman (Jul 31 2024 at 00:49):

I guess theoretically it could be possible to do some Morphic-esque analysis of like "this value is only ever going to be passed into wasm" and then do the roc_alloc equivalent directly into the memory wasm will be given access to, but I dunno about that :laughing:

Richard Feldman (Jul 31 2024 at 00:49):

Brendan Hansknecht (Jul 31 2024 at 00:50):

Richard Feldman (Jul 31 2024 at 00:50):

otherwise the next time you call into wasm it could have stored a pointer into what it gave back last time

Brendan Hansknecht (Jul 31 2024 at 00:50):

Oh, to stop wasm from storing a version of a list and then mutating it in place.

Richard Feldman (Jul 31 2024 at 00:50):

Luke Boswell (Jul 31 2024 at 00:51):

attempt 1

.wit "world" inside a roc package describes the interface for the bundled WASM binary

world simple-world {
    // Importing an addition function from the host environment
    import add: func(a: s32, b: s32) -> s32;

    // Exporting a multiplication function from the WASM module
    export multiply: func(a: s32, b: s32) -> s32;
}

module SimplePackage {
    # this is a module parameter that is required to instantiate this package
    # I'm not sure if we have types in the syntax for module params
    add : { a : I32, b : I32 } -> Task I32 *
} [
    multiply,
]

multiply : { a : I32, b : I32 } -> Task I32 *

Brendan Hansknecht (Jul 31 2024 at 00:51):

Personally, if a user opts into it, I would prefer to just allow roc to load a shared library as a platform extension.

Brendan Hansknecht (Jul 31 2024 at 00:52):

Anyway, but yeah for wasm that all sounds good. Lots of copies, but otherwise just a task based interface and it should be fine.

Brendan Hansknecht (Jul 31 2024 at 00:52):

Brendan Hansknecht (Jul 31 2024 at 00:53):

Like a list of strings would need to copy the list and every string into and out of wasm
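
To make the cost concrete, here is a minimal sketch of what copying a list of strings into and out of a linear memory could look like. The layout (a little-endian u32 count, then a u32 length plus UTF-8 bytes per string) is an assumption chosen for illustration, not something Roc or wasm prescribes, and `copy_in` / `copy_out` are hypothetical helpers.

```python
import struct


def copy_in(strings, memory, offset=0):
    """Serialize a list of strings into `memory`; return the number of bytes written."""
    start = offset
    struct.pack_into("<I", memory, offset, len(strings))
    offset += 4
    for s in strings:
        data = s.encode("utf-8")
        struct.pack_into("<I", memory, offset, len(data))
        offset += 4
        memory[offset:offset + len(data)] = data  # every string's bytes are copied
        offset += len(data)
    return offset - start


def copy_out(memory, offset=0):
    """Rebuild the list of strings from `memory`, copying every string back out."""
    (count,) = struct.unpack_from("<I", memory, offset)
    offset += 4
    result = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", memory, offset)
        offset += 4
        result.append(bytes(memory[offset:offset + length]).decode("utf-8"))
        offset += length
    return result
```

Both directions touch every byte of every string, so the work grows with the total size of the data rather than with the number of calls.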

Richard Feldman (Jul 31 2024 at 00:53):

Brendan Hansknecht (Jul 31 2024 at 00:53):

Richard Feldman (Jul 31 2024 at 00:53):

so a potentially very valuable use of this could be math functions that don't do heap things anyway

Richard Feldman (Jul 31 2024 at 00:54):

Brendan Hansknecht (Jul 31 2024 at 00:54):

eh, most math functions that matter are for multidimensional arrays. So lots of data

Richard Feldman (Jul 31 2024 at 00:54):

Brendan Hansknecht (Jul 31 2024 at 00:56):

I mean I guess for some game programming stuff it would help. But for most stuff blas is used for, they tend to be at least medium sized. So all the copies could really hurt.

Like I don't think it would work for a generic blas wrapper. But it would probably work if you made a full blas simulation function with many operations and exposed it as one effect.

Richard Feldman (Jul 31 2024 at 00:56):

Luke Boswell (Jul 31 2024 at 00:57):

Why do we need to copy the data? The host is still in control of the information which is sandboxed inside the WASM runtime

Brendan Hansknecht (Jul 31 2024 at 00:57):

I think the copies would get too expensive if you have 2 copies for every single matrix add, multiply, etc.

Brendan Hansknecht (Jul 31 2024 at 00:58):

Have to copy out to stop wasm from holding a reference and mutating it later, such that roc sees random changes to data that is supposed to be constant.
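
The hazard can be shown with a small Python model, where a `memoryview` plays the role of a direct reference into sandbox memory. `Sandbox`, `produce`, and `later_call` are hypothetical names for illustration only.

```python
class Sandbox:
    """Stand-in for a wasm instance that owns its memory and keeps it between calls."""

    def __init__(self):
        self.memory = bytearray(8)

    def produce(self):
        self.memory[0:3] = b"abc"
        # Returns a view into sandbox memory, not a copy.
        return memoryview(self.memory)[0:3]

    def later_call(self):
        # On a later call the module overwrites memory it handed out earlier.
        self.memory[0] = ord("X")


sandbox = Sandbox()
shared = sandbox.produce()   # direct reference into sandbox memory
copied = bytes(shared)       # defensive copy taken at the boundary
sandbox.later_call()
```

After `later_call`, the shared view reads `b"Xbc"` while the copy still reads `b"abc"`, which is the kind of silent change to supposedly constant data that the copy is meant to rule out.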

Luke Boswell (Jul 31 2024 at 00:59):

Can you allocate the RocList into an arena, pass that into WASM to modify and then when WASM returns you know it cannot do anything more so it's safe to pass back to roc

Brendan Hansknecht (Jul 31 2024 at 01:00):

Richard Feldman (Jul 31 2024 at 01:00):

I don't know enough about wasm's memory model to be sure if this would work, but maybe in the wasm interop wrapper you could opt into some restrictions that gain performance without sacrificing security, specifically:

Brendan Hansknecht (Jul 31 2024 at 01:00):

Brendan Hansknecht (Jul 31 2024 at 01:01):

Yeah, I think wasm without globals and some extra checks could go a long way. Still definitely wouldn't be safe, but we could at a minimum just do a copy for uniqueness before handing off to wasm.

Richard Feldman (Jul 31 2024 at 01:03):

yeah a relevant question is - given the security requirements, what use cases are left that would be useful in practice?

Richard Feldman (Jul 31 2024 at 01:03):

I guess a possible answer in general is "a thing that at least works, and then in the future it can be rewritten in Roc to be faster because it doesn't have the security overhead"

Luke Boswell (Jul 31 2024 at 01:05):

Maybe if you have a big function that is written in C or something and it's been verified or is trusted and you don't want to rewrite it.

Brendan Hansknecht (Jul 31 2024 at 01:05):

I think the real issue is that it likely would be hard to use existing libraries (especially if no globals/state). So you would be writing raw c/rust/zig for the wasm. Definitely could be used to speed up some computations, but a much bigger lift to create a library for it. Probably can't just import blas/lapack/eigen/tf/etc and build for wasm with no globals and a thin type shim for roc lists.

Luke Boswell (Jul 31 2024 at 01:06):

WASM is already pretty restricted crossing the host boundary. So I wonder if the copy in and out is really that bad?

Brendan Hansknecht (Jul 31 2024 at 01:06):

So I would label it as a potential gain, but personally, I would turn to raw ffi in a platform with a shared library calling into blas before I would use something like this. But I definitely could be really biased.

Luke Boswell (Jul 31 2024 at 01:07):

Noting the massive potential boost to the ecosystem from being able to use code that is written in any language that compiles to WASM

Brendan Hansknecht (Jul 31 2024 at 01:08):

Really depends on use case and how small of a chunk of code each function is. As I mentioned above, calling into a large function that will take a lot of time anyway is probably fine. Calling into wasm for individual ops probably is too costly.

And I think for this to be really nice, you would want to call into it for each individual op.

Luke Boswell (Jul 31 2024 at 01:08):

Brendan Hansknecht (Jul 31 2024 at 01:09):

Like I would want to expose the roc-wasm-matrix library that has all of the matrix operations super fast in wasm. Then the end user can make individual calls to add and sub and matmul. But that would be 2 copies for every matrix add.

Luke Boswell (Jul 31 2024 at 01:09):

But you could now have a Task to load data into WASM, and then the calls could be instructions to operate on that data, and another to eventually get the data back out right?

Luke Boswell (Jul 31 2024 at 01:10):

Brendan Hansknecht (Jul 31 2024 at 01:13):

So wasm returns a handle back to roc and roc works with the handle until it needs the data back out.

Brendan Hansknecht (Jul 31 2024 at 01:13):

Brendan Hansknecht (Jul 31 2024 at 01:14):

So 100% has to be task cause we have to allow wasm to hold state, but as long as we delay returning state, it should be safe from scary mutation and mostly copy free.

Brendan Hansknecht (Jul 31 2024 at 01:18):

x = WasmMatrix.createMatrix! someNumList [12, 22]
y = WasmMatrix.createMatrix! someNumList2 [22, 36]

x = WasmMatrix.mulF32! x 7.2
x = WasmMatrix.subInt! x 3
z = WasmMatrix.matmul! x y
out = WasmMatrix.extractMatrix! z
...
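
A handle-based interface like the one above can be modeled in a few lines to see where the copies actually happen. This is a Python sketch under assumptions: `WasmMatrixModel` is a hypothetical stand-in for the wasm side, flat lists of floats stand in for matrices, and every operation returns a new handle rather than mutating its input.

```python
class WasmMatrixModel:
    """Hypothetical stand-in for a wasm module that keeps matrices in its own memory."""

    def __init__(self):
        self._store = {}   # handle -> values living "inside" the sandbox
        self._next = 0
        self.copies = 0    # number of times data crossed the boundary

    def _put(self, values):
        handle = self._next
        self._next += 1
        self._store[handle] = values
        return handle

    def create_matrix(self, values):
        self.copies += 1               # copy in
        return self._put(list(values))

    def scale(self, handle, factor):
        # Works entirely inside the sandbox: no boundary copy.
        return self._put([v * factor for v in self._store[handle]])

    def add(self, a, b):
        # Also copy free from the caller's point of view.
        return self._put([x + y for x, y in zip(self._store[a], self._store[b])])

    def extract(self, handle):
        self.copies += 1               # copy out
        return list(self._store[handle])
```

With two inputs, a `scale`, an `add`, and one `extract`, this model records three boundary copies in total instead of two per operation. Returning fresh handles is a design choice made here to keep earlier handles valid; an in-place variant would save sandbox memory at the cost of the aliasing surprises discussed in the thread.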

Brendan Hansknecht (Jul 31 2024 at 01:18):

Brendan Hansknecht (Jul 31 2024 at 01:19):

Also, probably still will really confuse users depending on if it is in place or not:

x = WasmMatrix.createMatrix! someNumList [12, 22]
x2 = WasmMatrix.mulF32! x 7.2

Luke Boswell (Jul 31 2024 at 01:23):

I might be misunderstanding here. But the mental model I had in my head, is that on the Roc side it's just Tasks and an abstract/opaque interface.

The host exposes some standard set of calls to roc to work with WASM modules, and roc uses those to instruct the host what to call and what arguments to provide etc.

So the package WASM binary is dynamically linked/loaded by the host. It can be cached in .cache/roc along with the other .roc files when roc is running an app... or for building an executable it is expected to be available in a sub directory like /wasm-packages or in a path from an environment variable at runtime.

Brendan Hansknecht (Jul 31 2024 at 01:26):

The idea is that roc will compile it into the binary at roc app compilation time. Nothing done on the platform side.

Luke Boswell (Jul 31 2024 at 01:26):

Luke Boswell (Jul 31 2024 at 01:27):

Brendan Hansknecht (Jul 31 2024 at 01:27):

A package would include a task module and a .wasm file. Roc would hopefully compile the wasm to sandboxed native code. Could also just embed the entire .wasm file into the binary with a wasm interpreter as well.

Luke Boswell (Jul 31 2024 at 01:28):

Lol, imagine a roc app compiled to WASM, that is using a WASM package that is running inside a WASM interpreter

Luke Boswell (Jul 31 2024 at 01:28):

Richard Feldman (Jul 31 2024 at 01:29):

presumably if we know that the compilation target is wasm, we could make use of that knowledge to avoid silliness :big_smile:

Luke Boswell (Jul 31 2024 at 01:30):

I wonder how transitive dependencies would be handled? Like, we would only need to have one WASM binary per package, and the whole app will only use one version for each package.
