I started thinking about how we could make roc glue for Ruby.
It would build on some recent experience at work of making an FFI between Ruby and Haskell.
Just want to post some of my thoughts here so that people can comment.
Ruby is implemented in C, and they provide a whole library of C functions for FFI that expose some of the internals of the interpreter. For example there's one called rb_ary_new that creates a new Array and rb_ary_push that pushes a value into it. There are also things like rb_define_module, rb_define_class, rb_define_method, etc.
When your extension is loaded into the Ruby runtime, it runs an initialisation function where you define all of your modules and classes.
What I realised immediately is that before we can even generate any glue, we first need to implement things like List and Str from the Roc standard library, similar to crates/roc_std in the repo. Probably a Ruby Roc module containing Roc::List, Roc::Dict and so on.
I started playing around with some C code for that. I'm not sure where to put it in the directory structure but I guess we can figure that out.
One design decision is how to deal with generics when the host language is dynamically typed.
I see two possible approaches:
(1) Generate monomorphic classes like ListI64 and ListStr depending on what your app needs.
(2) A single List could contain some metadata about the type of its elements and how to convert them to and from Ruby.
So (1) does most of the hard work at glue generation time and (2) does more at runtime.
But the idea of monomorphic classes in Ruby feels very alien. I'm not sure the experience would be good.
(2) feels more Rubyish to me.
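To make approach (2) a bit more concrete, here is a rough C++ sketch of a single list class that carries element-type metadata and uses it to convert values at runtime. Everything in it is hypothetical: HostValue is only a stand-in for a Ruby value (a real extension would use VALUE from ruby.h), and ElemType, DynList and I64_TYPE are names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a dynamically typed host value (not the Ruby API).
struct HostValue {
    enum class Kind { Int, Text } kind;
    std::int64_t int_value;
    std::string text_value;
};

// Hypothetical metadata describing one element type: its size in the list's
// storage and how to convert it to and from a host value.
struct ElemType {
    const char* name;
    std::size_t size;
    HostValue (*to_host)(const void* elem);
    void (*from_host)(const HostValue& value, void* elem);
};

static HostValue i64_to_host(const void* elem) {
    std::int64_t n;
    std::memcpy(&n, elem, sizeof n);
    return HostValue{HostValue::Kind::Int, n, ""};
}

static void i64_from_host(const HostValue& value, void* elem) {
    std::memcpy(elem, &value.int_value, sizeof value.int_value);
}

// Glue generation would only need to emit descriptors like this one.
static const ElemType I64_TYPE = {"I64", sizeof(std::int64_t),
                                  i64_to_host, i64_from_host};

// One generic list class; the descriptor decides the conversions at runtime.
class DynList {
public:
    explicit DynList(const ElemType* type) : type_(type) {}

    void push(const HostValue& value) {
        bytes_.resize(bytes_.size() + type_->size);
        type_->from_host(value, bytes_.data() + bytes_.size() - type_->size);
    }

    HostValue get(std::size_t index) const {
        return type_->to_host(bytes_.data() + index * type_->size);
    }

    std::size_t length() const { return bytes_.size() / type_->size; }

private:
    const ElemType* type_;
    std::vector<unsigned char> bytes_;  // raw element storage
};
```

With approach (1), the conversion functions would instead be baked into a generated class per element type, so the runtime would not need a descriptor at all.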
Should this go in the Roc repo or my own repo?
Good question, it doesn't feel connected enough to the repo to be in there but at the same time it may also get out of sync pretty fast if you put it in its own repo.
I've been doing the Node and TypeScript stuff in a separate repo, but I'd say go with whichever you prefer!
Ok. Is your TypeScript repo public?
part of the reason I'm doing it in a separate repo is because there's an npm package that goes with it, which might not apply in your case
yeah http://github.com/vendrinc/roc-esbuild
I haven't started doing the C glue yet though, not beyond Str
Oh I see
So it's JSON based?
we're doing the JSON intermediary right now, and I'm in the middle of generating TypeScript type definitions
yeah it just does roc string to node string in C, and then both sides use that to send json back and forth
An update on this:
I started looking into how to model Roc data structures in Ruby.
Ruby values are represented as a tagged union of 28 possible C data structures.
The most suitable one for us is RTypedData. It's actually intended for C extensions to expose their own C data structures to Ruby, so that's perfect for us.
RTypedData contains a few fields:
- an RBasic header that's shared by all Ruby objects
- a void* pointer to your custom C type
- a pointer to an rb_data_type_struct that describes your custom data type
The rb_data_type_struct contains:
- a name for the type (used in .inspect, etc.)
- function pointers: dsize (total memory usage), dmark, dfree, dcompact
I'll talk through some of the challenges I've identified.
Firstly: For Rust glue, we were able to create native Rust structures that have exactly the same byte-level representation as our Roc data structures. That means you can transform Rust to Roc and vice-versa just by type casting.
That's just not possible in Ruby. The language doesn't have the capability to describe arbitrary byte layouts without adding a bunch of overhead.
(This will likely also be true of Python, JavaScript, etc.)
So that raises the question of how you represent something like List Str.
It's easy enough to make a Ruby module called Roc that contains a class Roc::List, which uses the RTypedData representation. Its pointer can point at a real Roc List that we are going to pass to a Roc app, or that we just received as a result from a Roc app.
And we can also have a class Roc::Str that again uses the RTypedData representation but its pointer points at a real Roc Str.
But there's just an extra layer of wrapping and unwrapping that we don't have with Rust.
Because if you want to access one of the strings in the list like my_list[3], you have to first wrap it in the RTypedData to make it a Ruby value.
And vice-versa when you want to push a string into the list. You have to unwrap it to turn it into a real Roc Str before you can put it into the List.
This isn't a real issue in Rust because the Str is the same size in Rust and Roc. We don't have to add or remove extra fields, we basically just cast pointers.
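A small C++ sketch of that wrapping and unwrapping, under stated assumptions. RocStrRaw assumes a simplified three-word layout (pointer, length, capacity) and ignores small strings and refcounts. FakeHeader and WrappedStr are hypothetical stand-ins for the Ruby object header and the typed-data fields, not the real definitions from ruby.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed simplified layout of a Roc Str: three machine words.
struct RocStrRaw {
    char* bytes;
    std::size_t length;
    std::size_t capacity;
};

// Assumed simplified layout of a Roc List Str.
struct RocListOfStr {
    RocStrRaw* elements;
    std::size_t length;
    std::size_t capacity;
};

// Hypothetical stand-in for the per-object header the host runtime needs.
struct FakeHeader {
    std::uintptr_t flags;
    std::uintptr_t klass;
};

// Hypothetical wrapper: header + type info + pointer to the real Roc value.
struct WrappedStr {
    FakeHeader header;
    const char* type_name;
    RocStrRaw* data;
};

// The wrapper is strictly bigger than the raw value, so a plain pointer cast
// between the two (as in the Rust glue) is not an option here.
static_assert(sizeof(WrappedStr) > sizeof(RocStrRaw),
              "wrapper adds fields on top of the raw Roc value");

// Reading my_list[index]: wrap the raw element so the host can hold it.
WrappedStr wrap_element(RocListOfStr& list, std::size_t index) {
    return WrappedStr{{0, 0}, "Roc::Str", &list.elements[index]};
}

// Writing into the list: unwrap and store only the raw Roc fields.
// A real implementation would also have to adjust the refcount here.
void store_element(RocListOfStr& list, std::size_t index,
                   const WrappedStr& wrapped) {
    list.elements[index] = *wrapped.data;
}
```

The point is only the shape of the problem: every element access crosses a wrap or unwrap step, where the Rust glue can reuse the same bytes.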
For memory management, I think it probably makes sense to use the Ruby allocator, which is a wrapper around malloc. If you have a Rails app and you want to rewrite parts of it in Roc, you most likely want to use the same allocator rather than a separate one, to keep things simple.
Using that allocator doesn't automatically mean the objects are garbage-collected. That's a separate thing. So when the Roc program allocates values internally, Ruby GC doesn't need to know. But we do need to give info to the GC about Roc objects that are returned to Ruby. The Ruby code will do some stuff with those objects, then leave them for the GC to clean up later.
When the GC does come along to clean up later, we need to be able to somehow free all of our Roc allocations.
We have to provide a dfree function for every value type that Ruby could have a reference to. The List Str version of that function will have to traverse all the strings, free them, and then free the list itself.
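Roughly, the List Str case could look like the C++ sketch below. It is an illustration only: SketchStr and SketchList are simplified made-up layouts, the callback just takes a void* in the style of a dfree function, and it frees unconditionally, whereas real Roc values are refcounted and should only be freed when the count reaches zero. The counter exists purely so the traversal can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical simplified layouts (no refcounts, no small-string handling).
struct SketchStr {
    char* bytes;
    std::size_t length;
};

struct SketchList {
    SketchStr* elements;
    std::size_t length;
};

// Counts successful frees so the traversal is observable.
static int g_free_calls = 0;

static void counted_free(void* ptr) {
    if (ptr != nullptr) {
        ++g_free_calls;
        std::free(ptr);
    }
}

// Builds a heap-allocated list of heap-allocated strings.
// Allocation error handling is omitted to keep the sketch short.
SketchList* make_list(const char* const* words, std::size_t count) {
    auto* list = static_cast<SketchList*>(std::malloc(sizeof(SketchList)));
    list->elements =
        static_cast<SketchStr*>(std::malloc(count * sizeof(SketchStr)));
    list->length = count;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t n = std::strlen(words[i]);
        list->elements[i].bytes = static_cast<char*>(std::malloc(n));
        std::memcpy(list->elements[i].bytes, words[i], n);
        list->elements[i].length = n;
    }
    return list;
}

// dfree-style callback for List Str: free every string's bytes,
// then the element array, then the list itself.
void list_str_dfree(void* data) {
    auto* list = static_cast<SketchList*>(data);
    for (std::size_t i = 0; i < list->length; ++i) {
        counted_free(list->elements[i].bytes);
    }
    counted_free(list->elements);
    counted_free(list);
}
```

For a three-element list this performs five frees: three strings, the element array, and the list.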
This is the same thing we do in the refcount decrement functions that we generate in the Roc binary. So it would be nice if I could just call decrement from Ruby/C. But currently there doesn't seem to be any way for us to call the refcount functions from the platform! So it has to be re-implemented in the glue code and standard library... I think.
By the way, does this come up in the Rust glue and Rust roc_std crate? Do we have a Drop that knows how to traverse all the elements of a List and drop them all?
At this point I've decided that I need to break this up into smaller steps!
And the first step is to create a C++ equivalent of our Rust roc_std crate.
Ruby extensions can be built from C++ as well as C. And C++ has generics, so it's much easier to describe things like List Str. Maybe we can translate it to C at some point, but it seems easier to start with C++.
I'm assuming we can also use C++ to extend other dynamic language runtimes like CPython, Node.js, etc.
https://github.com/brian-carroll/roc-std-cpp
After we have C++ versions of the Roc data structures, we'll be in a position to write C++ wrappers around them that the Ruby runtime will understand (based on RTypedData as described above).
Brian Carroll said:
This is the same thing we do in the refcount decrement functions that we generate in the Roc binary. So it would be nice if I could just call decrement from Ruby/C. But currently there doesn't seem to be any way for us to call the refcount functions from the platform! So it has to be re-implemented in the glue code and standard library... I think.
I think it would be really good to expose at least some of these (maybe all). For example, boxed data and closures are both common examples where exposing the refcounting function would be super useful.
Yeah it would be really useful
In fact, I think it would be very helpful for creating glue for new languages in general to enable roc to expose more functions instead of requiring everything to be reimplemented in glue. Of course they can also be reimplemented, but some are very complex to reimplement (cough, cough RocDict). Maybe add a flag for generating specific types of extra wrappers. We could also conditionally generate them based on exposed types.
In general, exposing some standard library functions and refcounting functions would be super useful. It would be less optimized than reimplementing in the host, but I think correctness and ease of implementation are much more important currently. Maybe we could just allow defining a bunch of functions in the platform that all get exposed and get wrappers generated (though maybe it would be too inconvenient to manually write all the exposed functions in roc).
Hmm there's a tradeoff between convenience and bloat though.
Say your platform requires mainForHost to return a Dict. If the roc compiler then automatically exposes the entire Dict module with all of its functions, that's a lot of dead code and I'm not sure if we have a good way to eliminate it.
Right now I feel like the refcounting functions are more important than the std lib functions. Because you can always write platform-side Roc code.
For sure, maybe it should be specific requests from the platform. That said, if the application depends on the code anyway, it is just extra symbols and not dead code (also, I guess if it is automatic, I was thinking about just exposing the essentials not everything).
And yeah, refcounting is more complex and important in general, though you technically can also deal with it once we enable exposing many functions to the platform. Just make a function that returns 2 of the same object or takes 2 and returns one. That is refcount Inc and Dec.
once we enable exposing many functions to the platform
Yeah this would solve a lot of stuff, good point
Seems hard to define something that's automatic but also just the essentials. Hard to get agreement on what the essentials are, in a way that makes sense for different platforms.
What is the blocker for exposing many functions to the platform?
I think it just never got implemented and no one has dug into it. @Folkert de Vries would know more.
Also, currently you can technically expose multiple but it has to be done via a record of closures and that has other drawbacks.
I think if we exposed the refcounting functions we're already generating in the Roc application, and we did it in a way where they're each in their own section, then a linker step that cleans up dead sections could eliminate those, right?
that would be the best for code sharing and minimizing binary bloat
as Folkert and I learned when working on effect interpreters, one of the challenges with this is naming - as in, specifically what name do you choose for the refcounting function that's exposed to the host?
the name needs to be:
refcount1, refcount2, etc. because then if you have a complicated record coming across from the platform to the host, how do you know which part of the record it's referring to?)
Last updated: Jul 05 2025 at 12:14 UTC