I wanted to write this out to make sure we are all on the same page. This is especially important because I know we want to change some of the platform calling conventions to make things easier to work with. This does not need to all be done at once, but I think some of it is important to do from the start.
First, and probably the most important (it may affect some of the IRs): all functions that roc generates will now have an implicit arg. The arg will be a constant reference to a record that contains all allocation-related functions. This will make it much easier for platforms to control allocations with arenas and whatnot. One piece I am not sure of with this design, do lambdas capture this record or do they take it as an argument? There may be some weird edge cases here that platforms need to be careful around.
Note: Due to switching to static libraries instead of surgical linking, all other effects will stay the same as they are today. They will not need to be passed in on each call. They will just be linked as normal.
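As a rough sketch (all names here are hypothetical, not a settled API), the allocator record might look something like this from the host's side:

```c
/* Hypothetical sketch of the allocation record every generated Roc function
 * would receive as an implicit argument. Names and exact fields are
 * illustrative only. */
#include <stddef.h>

typedef struct RocAlloc {
    void *(*alloc)(void *ctx, size_t size, size_t alignment);
    void *(*realloc)(void *ctx, void *ptr, size_t old_size, size_t new_size, size_t alignment);
    void  (*dealloc)(void *ctx, void *ptr, size_t alignment);
    void  *ctx; /* e.g. a pointer to the host's arena */
} RocAlloc;

/* A generated Roc entrypoint would then take a constant reference to it: */
/* void roc__main_for_host(const RocAlloc *alloc, RetType *ret, ...); */
```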
Second, we want to change the host and effect function allowed types. Essentially, the host must box all returned lambdas and type variables. Type variables are allowed to be passed to the host, but they are simply opaque boxes to the host (like Box model, which the host would see as Box {}). This removes any sort of variable-sized data being passed to the host.
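From the host's perspective, such a boxed value is just an opaque pointer it holds onto and hands back to Roc. A hypothetical host-side view might be:

```c
/* Hypothetical: how a host might see a Roc function that returns a boxed
 * model (Box model on the Roc side, Box {} as far as the host knows). */
typedef struct RocBox RocBox; /* opaque; only Roc knows the layout */

/* host calls this to get a boxed model back, then passes it into later calls */
/* void roc__init_for_host(const RocAlloc *alloc, RocBox **ret_model);              */
/* void roc__update_for_host(const RocAlloc *alloc, RocBox **ret_model,
                             RocBox *const *model);                                 */
```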
Third, and this is a longer term goal, we want to generate cffi functions that the host can use to interact with all roc primitives. These can be gc'd if the host doesn't use them, but we should generate functions for all types exposed to the host to make them easier to interact with. This will fundamentally work with glue to make interacting with roc types way easier. Instead of repeating the same logic that is rather complex in N different glue scripts, we just have to wrap a couple of functions and tell the host how many bytes a type is.
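To make the idea concrete, here is a sketch of what such generated C-FFI helpers might look like (names are made up; the real set would be derived from the types the app exposes):

```c
/* Hypothetical generated helpers for a Roc List U64 exposed to the host.
 * The host never touches the list layout directly; it goes through these. */
#include <stddef.h>
#include <stdint.h>

typedef struct RocAlloc   RocAlloc;   /* the allocator record sketched earlier */
typedef struct RocListU64 RocListU64; /* opaque to the host */

size_t   roc_list_u64_len(const RocListU64 *list);
uint64_t roc_list_u64_get(const RocListU64 *list, size_t index);
void     roc_list_u64_decref(const RocAlloc *alloc, RocListU64 *list);
```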
Brendan Hansknecht said:
One piece I am not sure of with this design, do lambdas capture this record or do they take it as an argument? There may be some weird edge cases here that platforms need to be careful around.
I don't think they should capture. if I'm the host and I run a chunk of Roc code, I always want the whole thing to run using the allocators I specified at the call site. Capturing them would remove that invariant, which definitely sounds undesirable! :big_smile:
but in general that all sounds right to me!
The hard part is that the lambda may capture variables that were allocated with the allocator that created the lambda. Maybe types need to store the allocator they were created with, but I was hoping to avoid that.
Also, if types store the allocator, should they migrate to a new allocator if they grow in a different roc function that was given a different allocator?
I think it's best to let the host figure that out
like if the host is going to store returned closures and then use them again later, it's up to the host to make sure they're being used with the same allocator again
or at least with a compatible one, e.g. if the new one is told to deallocate an address it doesn't recognize, it knows to ask the previous allocator to deallocate it
Ok, yeah that sounds fine. I guess worst case the host makes a linked list of allocators. I wonder if this will make it hard to clean up old arenas. Probably depends a lot on context. Worst case the arena would last until all lambdas are resolved
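A rough C sketch of the "compatible allocator" idea: an arena whose dealloc recognizes its own addresses (by range) and forwards anything else to the previous allocator in the chain. This is only an illustration of the chaining, not a proposed API.

```c
/* Illustrative only: an arena that forwards unknown frees to a parent
 * allocator, so values allocated under an older allocator can still be
 * released correctly. */
#include <stdint.h>
#include <stddef.h>

typedef struct ChainedArena {
    uint8_t *buf;
    size_t   used;
    size_t   cap;
    struct ChainedArena *parent; /* previous allocator in the chain, may be NULL */
} ChainedArena;

static void *arena_alloc(ChainedArena *a, size_t size, size_t align) {
    /* align must be a power of two; real code would check and grow */
    size_t start = (a->used + (align - 1)) & ~(align - 1);
    if (start + size > a->cap) return NULL;
    a->used = start + size;
    return a->buf + start;
}

static int arena_owns(const ChainedArena *a, const void *ptr) {
    const uint8_t *p = ptr;
    return p >= a->buf && p < a->buf + a->cap;
}

static void arena_dealloc(ChainedArena *a, void *ptr) {
    if (ptr == NULL || arena_owns(a, ptr)) {
        return; /* arena memory is released all at once when the arena is reset */
    }
    if (a->parent != NULL) {
        arena_dealloc(a->parent, ptr); /* ask the previous allocator */
    }
}
```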
I wanted to mention it in this thread too:
we talked about simplifying the ABI by saying the host always passes exactly 3 pointers to the compiled Roc function:
this means we don't need to deal with C ABI at all
Yeah, that would free us from c call conv, but not from c abi in general (well, I guess glue just has to use c abi layouts to match roc's layout, so it does mostly avoid c abi too)
Also, I think n args is fine.
but yeah, each arg is passed by pointer to avoid abi for the most part
That said, for the interpreter, it would take that and map it to having all args in a list and also a list of types for each arg.
I think if we did one pointer per arg, there would become (correct) folklore that it's better for perf if you just put everything in one arg
so I feel like we should just make that be how it works directly
it's better for perf if you just put everything in one arg
Why would that be better perf?
if I'm passing four u16s, passing the four pointers will use way more memory than the four u16s themselves
I don't think it would be consistently better or worse. The cost of putting them in one arg is making a big stack allocation and copying everything over. If in the platform they are all separate data, it is probably faster to pass them as separate pointers and avoid any copying of data.
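For illustration, the two shapes being compared (hypothetical names; neither is the decided convention):

```c
#include <stdint.h>

/* Option A: one pointer per argument (no repacking needed on the host side). */
/* void roc__add_for_host(const RocAlloc *alloc, uint64_t *ret,
                          const uint16_t *a, const uint16_t *b,
                          const uint16_t *c, const uint16_t *d);               */

/* Option B: everything packed into a single args struct (one pointer, but the
   host must first copy its values into this layout). */
typedef struct AddArgs {
    uint16_t a, b, c, d;
} AddArgs;
/* void roc__add_for_host(const RocAlloc *alloc, uint64_t *ret, const AddArgs *args); */
```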
fair
ok yeah maybe that's better just to make one less rule to think about
Honestly, I don't think the platform to roc boundary will be in the hot loop generally speaking. If roc is doing that little, you probably don't want to dish out to roc at all. So I think any abi is probably fine.
Is it helpful to think about what would be nice for Go, or Swift, or other languages to work with?
I feel like C or Zig is easy to do anything... but if we stray too far from convention it may make it difficult for those other languages to call into roc
nah
don't think this should matter for them
they have to know how to pass pointers to things regardless
Also -- not sure if we've forgotten about it. But should we discuss hot-reloading or the ideas around that?
yeah! prob in its own thread?
Yeah, I'm not sure what to say though... other than "hey, anyone thought about hot-reloading?"
Also, just to note, we are defining two different specs here: one for libroc and one for standard platform->roc calls. The LibRoc spec additionally takes a TypeSpec (this tells it the types of every arg) on top of the standard pieces (the record of function pointers with the allocator functions and roc_load, etc.). The shim would map between those two formats. A standard host would only implement "Platform -> Roc standard FFI". A host directly consuming libroc would implement "LibRoc".
I'm not understanding the benefit of the typespec thing
The interpreter will run solely using tagged data. You didn't want recursively tagged data like would be traditional if we made a RocObject that the interpreter used. Without that, we need a type spec so the interpreter can understand the underlying data.
that's true inside the running interpreter, yes
As a simple example, if the interpreter calls List.len, it needs to know the element type of the list. This is required so it can decrement the refcount of the elements and free them.
Yes, and if a platform is able to dynamically build up args to call into the interpreter, we want to make sure the args passed in are what the roc function expects
Otherwise, we may blindly use the arg as the wrong type due to only trusting the roc source code and things would go very wrong.
the thing I'm missing is that the caller still has to get the ABI right
like if I give you a pointer to some bytes, and then I also give you a thing that says "hey the pointer to the bytes has this type"
and the type I'm giving you at runtime is always going to be the same type as the type you've statically declared you're expecting
I think it doesn't matter much for the shim use case. The shim gets rid of this type safety anyway by using the static API matching what llvm will use
then the scenario where this helps is:
For direct use of lib roc, more dynamic use cases should be possible
but in all cases we know statically what the expected type is
like there's no value of main.roc I can pass to it where I don't statically know what types are expected
I guess the scenario could be that I gave it the wrong main.roc maybe?
Sure, I guess then libroc at least needs to expose a function to get the type spec and list of exposed functions in main.roc.
That anchors to main.roc as the source of truth and trusts the platform to follow the spec read from main.roc. This is more thinking about future libroc use cases than the current shim plans.
Because a platform could dynamically load a main.roc file and do something different depending on what is exposed by the main.roc file. A simple example would be supporting plugin versioning: load main.roc, and depending on the returned function API, you know the plugin version to run with.
Anyway, assuming the platform -> roc part looks fine, let's move the rest of this discussion over to the libroc thread. I think that dynamic use case is what needs to decide the API for it
I guess if the typespec is like a hash of the types, that's probably very quick?
as in, if it's just for validation
or perhaps it's a hash for quick validation plus an expanded version for more helpful error messages if the types disagree
I imagined it as a tag union of a spec defining the type, but I really haven't thought through it in detail. For the full libroc use case, it needs to be enough information that the interpreter can call a dynamic function with type variables.
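Purely as a sketch of the "tag union describing the type" idea (none of these names exist yet), a TypeSpec might be a recursive tagged structure along these lines:

```c
/* Hypothetical TypeSpec: a recursive tagged description of a Roc type, enough
 * for the interpreter to walk values it receives from the host. */
#include <stddef.h>

typedef enum TypeTag {
    TYPE_U64,
    TYPE_STR,
    TYPE_LIST,    /* payload: element type */
    TYPE_RECORD,  /* payload: field types */
    TYPE_BOX,     /* payload: inner type */
} TypeTag;

typedef struct TypeSpec {
    TypeTag tag;
    const struct TypeSpec *elem;    /* for TYPE_LIST / TYPE_BOX */
    const struct TypeSpec *fields;  /* for TYPE_RECORD */
    size_t field_count;             /* for TYPE_RECORD */
} TypeSpec;

/* e.g. List U64 would be:
   TypeSpec u64  = { .tag = TYPE_U64 };
   TypeSpec list = { .tag = TYPE_LIST, .elem = &u64 };
*/
```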
I think maybe something worth establishing (because I'm not sure if we're on the same page about it) is that the only benefit of passing a type spec is that it would allow for an extra runtime check which could either give an error or do nothing
like it wouldn't allow any useful amount of introspection, or performance (would be a slight perf downside but negligible if we pass a hash)
wouldn't be necessary for correctness, etc.
and a totally reasonable alternative design would be to just not do a typespec at all, and everything would work exactly the same way except that in the specific case where you have a correct typespec for what you're passing but that typespec doesn't line up with what main.roc expects, the typespec would have let you get a runtime error instead of UB
does that all sound right?
The interpreter will just end up creating the type spec anyway if the platform doesn't. It will be required internally to run the interpreter. Let me put up an example.
oh for sure!
no disagreement there
I'm just talking about the specific question of whether the host should construct its own type spec and pass it to libroc
In the default use case, the shim constructs the type spec
but of note, I'm saying the interpreter has to create its own typespec in either case
oh wait, are you thinking it doesn't do validation at the boundary?
like it just accepts whatever it was given as truth, and then starts interpreting on that, and if that results in a runtime type mismatch at some point, so be it?
Ok, I feel like this just got mixed up due to talking about libroc and about standard roc calling.
Let me try to anchor it.
Normal platforms only see a single interface. That interface is:
Platform -> Roc standard FFI
- A pointer to write the return data to
- A record of function pointers (only allocator functions and roc_load)
- N pointers, one for each arg
That is all they see period. Anything libroc is an implementation detail and not exposed to the platform.
The shim library will deal with whatever is required to map from that interface to libroc.
Are we in agreement with this interface?
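A hedged sketch of what that interface could look like for one exported function (names and the exact record contents are assumptions, not a finalized ABI):

```c
/* Hypothetical host-side declaration for an app exposing main : Str -> I64
   through the "Platform -> Roc standard FFI" described above. */
#include <stdint.h>

typedef struct RocFns RocFns; /* the record of function pointers (allocators + roc_load) */
typedef struct RocStr RocStr; /* opaque here; real hosts would use glue for the layout */

/* 1) pointer to write the return into, 2) the function-pointer record,
   3) one pointer per argument */
void roc__main_for_host(int64_t *ret, const RocFns *fns, const RocStr *arg0);
```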
I agree that's the standard interface for the host in a platform + application that's compiled into a single binary
but I don't think that libroc needs to necessarily be in any way different from that interface
we can choose to have it be different, but it's optional
in other words, libroc can expose a function which is exactly the same interface as :point_up: except for 3 extra arguments:
only allocator functions and roc_load
plus things like a "here's a dbg thing for your information" callback and an "expect failed, here's some information about that" callback, and then I just give it the arguments exactly as normal
Look at the last message in #compiler development > zig compiler - libroc exploration and let's move the conversation there. That is why libroc needs a different interface.
separate thought about calling roc in general
we have the rule that only Boxed closures can be sent to the host
and in the interpreter, all closures are boxed
I guess that means we have an extra layer of boxing on them
A record of function pointers (only allocator functions and roc_load)
edit: I think this just needs to be the allocator functions. The rest can be linked in like normal effects.
Edit to my edit: given we want to support roc code as a shared library, we actually do want to require passing in all effects, period. Forgot about this use case.
Otherwise you get into all of the rdynamic pain and into brittle symbol code that may not even work consistently cross-platform.
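To illustrate the "pass in all effects" variant (effect names invented for the example, just showing the shape):

```c
/* Hypothetical: when Roc is built as a shared library, the record grows to
   carry every effect the platform provides, so nothing relies on dynamic
   symbol resolution (no -rdynamic tricks). */
#include <stddef.h>
#include <stdint.h>

typedef struct RocHostFns {
    /* allocation */
    void *(*alloc)(void *ctx, size_t size, size_t alignment);
    void  (*dealloc)(void *ctx, void *ptr, size_t alignment);
    /* effects (names are made up) */
    void    (*stdout_write)(void *ctx, const uint8_t *bytes, size_t len);
    int64_t (*now_millis)(void *ctx);
    void  *ctx;
} RocHostFns;
```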
Can glue (in future) generate the full type for our struct that needs to be passed in? -- with all the allocators, and entry-points, and effects etc
So if I'm writing a platform in zig or rust for example, it's pretty hard to pass in the wrong things.
Just a record of function pointers. Sounds doable.
I'm thinking of stubbing out roc glue to take a fake glue script, and generate some zig/rust/go/c (not sure which) glue for a test platform using the new calling-roc shape.
I would keep everything in zig for now.
And sure, though I'm not sure how valuable having the cart this far before the horse is.
For glue, maybe we should just focus on the roc side
We are going to the effort of engineering this thing top-down... it's all cart before horse
It needs to be expanded to support describing all of the effects and such
Luke Boswell said:
We are going to the effort of engineering this thing top-down... it's all cart before horse
I think this is useful for broad strokes, but it often is just wrong for specific details that depend on an unknown implementation. So I would focus more on concrete API than on stubbing all the interfaces loosely
I think a lot of these pieces will be easier to get right once we have a slim slice of the compiler from parser down to interpreter
But don't let me stop your stubbing if you think it is useful
You have outlined a specific API above for calling roc. I am proposing making a test platform that uses this new convention, and stubbing out the roc side of things -- to see it all working together, even though our roc side hasn't been implemented yet.
I'm also particularly interested in validating the fully-embedded roc use cases, which is more covered in that libroc exploration thread.
For fully embedded roc, I don't think this API is required. Or even recommended. You have direct access to roc internals, so you want a different API. I tried to mention the two APIs above though.
My other reason is I was hoping to figure out how to get our bundled lld linking thing working, so roc build produces an executable.
@Brendan Hansknecht and I have been cooking... we've put together a working prototype of roc build with an example platform, and a fake roc app.o object file.
All the files are sitting on this branch. We will probably migrate into the roc repo -- now that it's less of a hack. I've included some explanation and notes in the README in case that helps.
https://github.com/lukewilliamboswell/roc-platform-template-zig/tree/calling-roc
If you want to try it out...
$ zig build-obj host/app.zig
$ zig build-lib host/main.zig
$ zig build-exe app.o libmain.a
$ ./app
info: Running Roc APP
Hello
There are a few things different from the current way we do things, and we've tried to implement the ideal based on our previous design discussions.
Now would be a good time to discuss the API, and if there are ways we can improve things.
I wouldn't advise looking closely at host/app.zig -- it's a real hack and some scary internal things that would never be exposed to the general public.
It's better to look at platform/app.roc, which is the example, and pretend the compiler generated our app.o object file and linked it with the prebuilt-host libmain.a all behind the scenes.
Two immediate pain points I noticed (carry-overs from the old API):
One silly thing I noticed:
Always using pointers means that a number of functions with trivial args or return types have ABIs that look really poor. Look at the string functions especially.
The pain points I've run into are from trying to write a Zig platform and using a Zig allocator for Roc memory allocation. For Zig platform code it'd be really nice if the roc allocation API were ziggier, it saves quite a bit of work! I guess that comes at the cost of writing platforms in other languages, though it does seem like it might be easier to implement malloc on top of a zig allocator than the other way around.
Yeah, I think we will always be anchoring to cffi. Looking at the new example, apart from passing the size to dealloc, I'm not sure we could make it ziggier.
We can't give it a comptime alignment because cffi can't do comptime.
You can directly use an arena or other zig allocator now.
We give enough info for most of the calls....free is the only pain point I think.
And I currently don't know how to solve free except by making every list and string allocation 4 to 8 bytes larger to store the size.
Instead with the example platform, we end up making all allocations 4 to 8 bytes larger (including recursive tags and boxes).
Note, with arena allocation, this extra size would go away.
When using an allocator that implicitly tracks the size, allocations are not made any larger at all.
I wonder if we can somehow get at the allocator's internal size information and avoid this explicit tracking
Ah, thanks for explaining, that makes a lot of sense.
I guess it's not so bad, we can write a Zig->Roc allocator helper once and then anyone writing a Zig platform can use that. Maybe it'd be the kind of thing to include in Luke's platform examples repo.
Yeah, I think we could easily make something that wraps a zig allocator to make it track size. And as mentioned above for arenas and for allocators that already internally track size (if you can get access to the internals), you can avoid adding the size to the allocation.
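A minimal C sketch of the size-tracking wrapper idea (the Zig version linked below does the equivalent): each allocation is padded with a small header that records the size, so dealloc can recover it. It ignores over-alignment concerns for brevity.

```c
/* Illustrative size-prefix wrapper: every allocation is enlarged by a header
   that stores the requested size, so a dealloc that only receives a pointer
   can still recover the size. Not production code. */
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

#define HEADER_SIZE sizeof(uint64_t)

static void *tracked_alloc(size_t size) {
    uint8_t *raw = malloc(HEADER_SIZE + size);
    if (raw == NULL) return NULL;
    uint64_t size64 = (uint64_t)size;
    memcpy(raw, &size64, sizeof size64); /* stash the size in the header */
    return raw + HEADER_SIZE;            /* hand out the payload pointer */
}

static void tracked_dealloc(void *ptr) {
    if (ptr == NULL) return;
    uint8_t *raw = (uint8_t *)ptr - HEADER_SIZE;
    uint64_t size64;
    memcpy(&size64, raw, sizeof size64); /* recover the size if needed */
    (void)size64;
    free(raw);
}
```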
I have written one here:
https://github.com/ostcar/roc-aoc-platform/blob/main/host/RocAllocator.zig
It probably can be improved
over the years we've gone back and forth regarding whether it would be possible for Roc to pass the total allocation size to dealloc automatically
I think whether or not it was possible depended on some seamless slice implementation details?
and I thought we ended up in a place where we actually did always have the info on the Roc side and could pass it to dealloc, but maybe I'm misremembering :sweat_smile:
Yeah, I always end up thinking we can. Then I realize that we store the number of live elements on the heap for refcounted seamless slices. That is not the size. The size is the full capacity, which a seamless slice has no way to get.
ahhh right
yeah I remember thinking this changed when we changed the representation of how that's stored in memory
but I think we actually ended up with a representation where we don't know it
Luke Boswell said:
Brendan Hansknecht and I have been cooking... we've put together a working prototype of roc build with an example platform, and a fake roc app.o object file. All the files are sitting on this branch. We will probably migrate into the roc repo -- now that it's less of a hack. I've included some explanation and notes in the README in case that helps.
https://github.com/lukewilliamboswell/roc-platform-template-zig/tree/calling-roc
If you want to try it out...
$ zig build-obj host/app.zig
$ zig build-lib host/main.zig
$ zig build-exe app.o libmain.a
$ ./app
info: Running Roc APP
Hello
There are a few things different from the current way we do things, and we've tried to implement the ideal based on our previous design discussions.
Now would be a good time to discuss the API, and if there are ways we can improve things.
I'm looking at Richard's PR https://github.com/roc-lang/roc/pull/7795 and thinking about this work we did. I wonder if our design has evolved much from this and if it's worth updating this again so we have a test platform ready to go.
I think it is mostly the same, but yeah, some minor differences
@Brendan Hansknecht -- I started writing this as a comment in that other thread... :smiley:
We're fast approaching the point where I'd like to pick your brains on how to actually implement this interpreter shim thing. We're really close to having single module/expression evaluation working in a simple form.
What comes next is fuzzy for me, particularly with the platform integration.
I'm of the opinion that the platform supplies a host executable that has embedded a roc compiler/interpreter (libroc) which can be used to load, compile, and execute .roc code and call relevant platform-provided IO/effects.
Running an app using the roc cli, i.e. roc my_demo_app.roc, roc is then calling this platform-supplied executable and passing through the arguments (my_demo_app.roc etc) for the interpreter to run.
Alternatively, I think the intended design is that the roc cli compiles the app into a canonical IR (including comp-time eval etc) then produces a static library (with the builtin bitcode included) that represents the app.o part.
From a platform's perspective they could be calling a fully compiled (and LLVM-optimised) app.o, however internally it is using an interpreter at runtime to evaluate the code when the host calls into roc.
Yeah, it will be interesting making the shim and getting this all working together
I have ideas, but will be interesting to see if it works how I expect when we implement it in practice