Ruby glue · contributing · Zulip Chat Archive

Brian Carroll (Nov 13 2023 at 12:39):

I started thinking about how we could make roc glue for Ruby.
It would build on some recent experience at work of making an FFI between Ruby and Haskell.
Just want to post some of my thoughts here so that people can comment.

Brian Carroll (Nov 13 2023 at 12:39):

Ruby is implemented in C, and they provide a whole library of C functions for FFI that expose some of the internals of the interpreter. For example there's one called rb_ary_new that creates a new Array and rb_ary_push that pushes a value into it. There are also things like rb_define_module, rb_define_class, rb_define_method, etc.
When your extension is loaded into the Ruby runtime, it runs an initialisation function where you define all of your modules and classes.

Brian Carroll (Nov 13 2023 at 12:39):

What I realised immediately is that before we can even generate any glue, we first need to implement things like List and Str from the Roc standard library, similar to crates/roc_std in the repo. Probably a Ruby Roc module containing Roc::List, Roc::Dict and so on.
I started playing around with some C code for that. I'm not sure where to put it in the directory structure but I guess we can figure that out.

Brian Carroll (Nov 13 2023 at 12:40):

One design decision is how to deal with generics when the host language is dynamically typed.
I see two possible approaches:

1. Generate monomorphised classes like ListI64 and ListStr depending on what your app needs.
2. The List could contain some metadata about the type of its elements and how to convert them to and from Ruby.

Brian Carroll (Nov 13 2023 at 12:44):

So (1) does most of the hard work at glue generation time and (2) does more at runtime.
But the idea of monomorphic classes in Ruby feels very alien. I'm not sure the experience would be good.
(2) feels more Rubyish to me.
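
To make approach (2) a bit more concrete, here is a minimal C++ sketch of a list that carries element metadata at runtime. Everything in it is hypothetical (`ElemInfo`, `DynList`, `I64_INFO`), and a `std::string` stands in for the Ruby value that a real conversion would produce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical descriptor: element size plus a conversion to a host value.
// A std::string stands in for a Ruby object here.
struct ElemInfo {
    const char* name;
    std::size_t size;
    std::string (*to_host)(const void* elem);
};

// Hypothetical type-erased list: raw element bytes plus shared metadata.
struct DynList {
    const ElemInfo* info;
    std::vector<unsigned char> bytes;

    std::size_t len() const { return bytes.size() / info->size; }

    // Copies info->size bytes from elem onto the end of the list.
    void push(const void* elem) {
        const unsigned char* p = static_cast<const unsigned char*>(elem);
        bytes.insert(bytes.end(), p, p + info->size);
    }

    // Converts element i using the metadata, not a compile-time type.
    std::string get(std::size_t i) const {
        assert(i < len());
        return info->to_host(bytes.data() + i * info->size);
    }
};

static std::string i64_to_host(const void* elem) {
    std::int64_t v;
    std::memcpy(&v, elem, sizeof v);
    return std::to_string(v);
}

// One shared descriptor per element type, selected at runtime.
static const ElemInfo I64_INFO = {"I64", sizeof(std::int64_t), i64_to_host};
```

The single `DynList` class serves every element type, which is closer to how a Ruby Array behaves. The cost is an indirect call per element access.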

Brian Carroll (Nov 13 2023 at 14:37):

Should this go in the Roc repo or my own repo?

Anton (Nov 13 2023 at 14:48):

Good question, it doesn't feel connected enough to the repo to be in there but at the same time it may also get out of sync pretty fast if you put it in its own repo.

Richard Feldman (Nov 13 2023 at 15:26):

I've been doing the Node and TypeScript stuff in a separate repo, but I'd say go with whichever you prefer!

Brian Carroll (Nov 13 2023 at 15:29):

Ok. Is your TypeScript repo public?

Richard Feldman (Nov 13 2023 at 15:29):

part of the reason I'm doing it in a separate repo is because there's an npm package that goes with it, which might not apply in your case

Richard Feldman (Nov 13 2023 at 15:30):

yeah http://github.com/vendrinc/roc-esbuild

Richard Feldman (Nov 13 2023 at 15:30):

I haven't started doing the C glue yet though, not beyond Str

Brian Carroll (Nov 13 2023 at 15:30):

Oh I see

Brian Carroll (Nov 13 2023 at 15:30):

So it's JSON based?

Richard Feldman (Nov 13 2023 at 15:30):

we're doing the JSON intermediary right now, and I'm in the middle of generating TypeScript type definitions

Richard Feldman (Nov 13 2023 at 15:31):

yeah it just does roc string to node string in C, and then both sides use that to send json back and forth

Brian Carroll (Nov 24 2023 at 08:11):

An update on this:

I started looking into how to model Roc data structures in Ruby.
Ruby values are represented as a tagged union of 28 possible C data structures.
The most suitable one for us is RTypedData. It's actually intended for C extensions to expose their own C data structures to Ruby, so that's perfect for us.

RTypedData contains a few fields:

- An RBasic header that's shared by all Ruby objects
- A void* pointer to your custom C type
- A pointer to a shared singleton rb_data_type_struct that describes your custom data type

The rb_data_type_struct contains:

- A name string (for stack traces, debug, .inspect, etc.)
- 4 function pointer callbacks for the garbage collector: dsize (total memory usage), dmark, dfree, dcompact
- Some flags you can use for whatever you want
- An optional pointer to the parent class of your custom class, if you want to define that
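
The shape described above can be modelled in plain C++ without the Ruby headers. This is a simplified stand-in, not the real definitions from ruby.h: the `Model*` names, the `Point` payload and the `collect` helper are all made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for the shared per-type descriptor (not the real ruby.h struct).
struct ModelDataType {
    const char* name;                    // used in debug output
    void (*dmark)(void*);                // GC: mark objects the payload references
    void (*dfree)(void*);                // GC: release the payload
    std::size_t (*dsize)(const void*);   // GC: report memory usage
    void (*dcompact)(void*);             // GC: update references after compaction
    const ModelDataType* parent;         // optional parent type
    std::uintptr_t flags;
};

// Stand-in for the header shared by all objects.
struct ModelBasic {
    std::uintptr_t flags;
    std::uintptr_t klass;
};

// Stand-in for the wrapper object the interpreter sees.
struct ModelTypedData {
    ModelBasic basic;
    const ModelDataType* type;  // shared singleton describing the payload
    void* data;                 // the custom C/C++ value
};

// Example payload and its callbacks.
struct Point { double x, y; };

static int g_point_frees = 0;
static void point_free(void* p) { delete static_cast<Point*>(p); ++g_point_frees; }
static std::size_t point_size(const void*) { return sizeof(Point); }

static const ModelDataType POINT_TYPE = {
    "Point", nullptr, point_free, point_size, nullptr, nullptr, 0};

inline ModelTypedData wrap_point(Point* p) {
    return ModelTypedData{{0, 0}, &POINT_TYPE, p};
}

// Roughly what a collector does once the wrapper is unreachable.
inline void collect(ModelTypedData& obj) {
    if (obj.type->dfree != nullptr && obj.data != nullptr) {
        obj.type->dfree(obj.data);
        obj.data = nullptr;
    }
}
```

The key point is that the descriptor is shared by every instance of a type, so each wrapper only pays for the header and two pointers.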

Brian Carroll (Nov 24 2023 at 08:17):

I'll talk through some of the challenges I've identified.

Brian Carroll (Nov 24 2023 at 08:17):

Firstly: For Rust glue, we were able to create native Rust structures that have exactly the same byte-level representation as our Roc data structures. That means you can transform Rust to Roc and vice-versa just by type casting.
That's just not possible in Ruby. The language doesn't have the capability to describe arbitrary byte layouts without adding a bunch of overhead.
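
For contrast, this is roughly what "conversion by casting" looks like when the host language can describe the layout. It is a C++ sketch rather than the Rust glue itself, and the three-word `ThreeWordStr` layout is an assumption made for illustration, not a statement of the exact Roc Str representation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Assumed three-word layout, for illustration only.
struct ThreeWordStr {
    const char* bytes;
    std::size_t len;
    std::size_t capacity;
};

// The host can state and check the layout at compile time.
static_assert(std::is_standard_layout<ThreeWordStr>::value,
              "layout must be predictable");
static_assert(sizeof(ThreeWordStr) == 3 * sizeof(void*),
              "expected exactly three machine words");

// "Converting" a value from the other side is just reinterpreting its bytes.
inline ThreeWordStr from_foreign_bytes(const unsigned char* raw) {
    ThreeWordStr s;
    std::memcpy(&s, raw, sizeof s);
    return s;
}
```

No fields are added or removed on the way across, which is exactly the property a Ruby object cannot offer.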

Brian Carroll (Nov 24 2023 at 08:18):

(This will likely also be true of Python, JavaScript, etc.)

Brian Carroll (Nov 24 2023 at 08:22):

So that raises the question of how you represent something like List Str.
It's easy enough to make a Ruby module called Roc that contains a class Roc::List, which uses the RTypedData representation. Its pointer can point at a real Roc List that we are going to pass to a Roc app, or that we just received as a result from a Roc app.

Brian Carroll (Nov 24 2023 at 08:23):

And we can also have a class Roc::Str that again uses the RTypedData representation but its pointer points at a real Roc Str.

Brian Carroll (Nov 24 2023 at 08:27):

But there's just an extra layer of wrapping and unwrapping that we don't have with Rust.
Because if you want to access one of the strings in the list like my_list[3], you have to first wrap it in the RTypedData to make it a Ruby value.
And vice-versa when you want to push a string into the list. You have to unwrap it to turn it into a real Roc Str before you can put it into the List.
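
A small C++ model of that wrap/unwrap step, with made-up names throughout: `RawStr` plays the role of a real Roc Str, `HostObject` plays the role of the RTypedData wrapper, and `std::string` is only a placeholder payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Placeholder for a real Roc Str.
struct RawStr { std::string text; };

// Placeholder for the interpreter-visible wrapper object.
struct HostObject {
    const char* type_name;
    void* payload;  // points at a heap-allocated RawStr
};

// Placeholder for a real Roc List Str: it stores raw strings, not wrappers.
struct RawList { std::vector<RawStr> items; };

// Wrapping: allocate a host object around a copy of the raw value.
inline HostObject wrap_str(const RawStr& s) {
    return HostObject{"Roc::Str", new RawStr(s)};
}

// Unwrapping: check the type tag, then reach through to the raw value.
inline const RawStr& unwrap_str(const HostObject& obj) {
    assert(std::strcmp(obj.type_name, "Roc::Str") == 0);
    return *static_cast<const RawStr*>(obj.payload);
}

// my_list[i]: every read produces a fresh wrapper.
inline HostObject list_get(const RawList& list, std::size_t i) {
    assert(i < list.items.size());
    return wrap_str(list.items[i]);
}

// my_list << str: every write strips the wrapper first.
inline void list_push(RawList& list, const HostObject& obj) {
    list.items.push_back(unwrap_str(obj));
}

// Stand-in for what the wrapper's free callback would do.
inline void release(HostObject& obj) {
    delete static_cast<RawStr*>(obj.payload);
    obj.payload = nullptr;
}
```

Each element access allocates a wrapper, so a loop over a long list creates one short-lived host object per element.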

Brian Carroll (Nov 24 2023 at 08:28):

This isn't a real issue in Rust because the Str is the same size in Rust and Roc. We don't have to add or remove extra fields, we basically just cast pointers.

Brian Carroll (Nov 24 2023 at 08:40):

For memory management, I think it probably makes sense to use the Ruby allocator, which is a wrapper around malloc. If you have a Rails app and you want to rewrite parts of it in Roc, you most likely want to use the same allocator rather than a separate one, to keep things simple.

Brian Carroll (Nov 24 2023 at 08:43):

Using that allocator doesn't automatically mean the objects are garbage-collected. That's a separate thing. So when the Roc program allocates values internally, Ruby GC doesn't need to know. But we do need to give info to the GC about Roc objects that are returned to Ruby. The Ruby code will do some stuff with those objects, then leave them for the GC to clean up later.
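
As a sketch of what sharing the allocator could look like: a Roc platform supplies `roc_alloc`, `roc_realloc` and `roc_dealloc`, and those can simply forward to the host allocator. The `host_x*` functions below are stand-ins built on malloc so the example compiles without Ruby; in a real extension they would presumably be `ruby_xmalloc`, `ruby_xrealloc` and `ruby_xfree`. The signatures follow the ones used in existing C platforms as I understand them, so treat them as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Counter only exists to make the sketch observable.
static std::size_t g_live_allocations = 0;

// Stand-ins for the host allocator (assumed to be ruby_xmalloc and friends).
static void* host_xmalloc(std::size_t size) {
    ++g_live_allocations;
    return std::malloc(size);
}
static void* host_xrealloc(void* ptr, std::size_t size) {
    return std::realloc(ptr, size);
}
static void host_xfree(void* ptr) {
    if (ptr != nullptr) --g_live_allocations;
    std::free(ptr);
}

// Allocation hooks the Roc app calls. Alignment is only checked, on the
// assumption that a malloc-style allocator already aligns generously enough.
extern "C" void* roc_alloc(std::size_t size, unsigned int alignment) {
    assert(alignment <= alignof(std::max_align_t));
    return host_xmalloc(size);
}

extern "C" void* roc_realloc(void* ptr, std::size_t new_size,
                             std::size_t old_size, unsigned int alignment) {
    (void)old_size;
    assert(alignment <= alignof(std::max_align_t));
    return host_xrealloc(ptr, new_size);
}

extern "C" void roc_dealloc(void* ptr, unsigned int alignment) {
    (void)alignment;
    host_xfree(ptr);
}
```

Nothing here registers the memory with the garbage collector. It only means both sides draw from the same heap, which matches the distinction made above.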

Brian Carroll (Nov 24 2023 at 08:45):

When the GC does come along to clean up later, we need to be able to somehow free all of our Roc allocations.
We have to provide a dfree function for every value type that Ruby could have a reference to. The List Str version of that function will have to traverse all the strings, free them, and then free the list itself.
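
Here is a rough C++ sketch of such a free callback for a list of strings. The `Sketch*` types are simplified placeholders, not the real Roc layouts, and the sketch frees unconditionally. Real Roc values are reference counted, so an actual implementation would decrement and only free when the count reaches zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Simplified placeholders, not the real Roc layouts.
struct SketchStr { char* bytes; std::size_t len; };
struct SketchListStr { SketchStr* elements; std::size_t len; };

// Counts frees so the traversal is observable.
static std::size_t g_frees = 0;
static void counted_free(void* p) {
    if (p != nullptr) {
        ++g_frees;
        std::free(p);
    }
}

// Same shape as a GC free callback: it receives the payload as void*.
inline void list_str_dfree(void* payload) {
    SketchListStr* list = static_cast<SketchListStr*>(payload);
    for (std::size_t i = 0; i < list->len; ++i) {
        counted_free(list->elements[i].bytes);  // each string's bytes
    }
    counted_free(list->elements);               // the element array
    counted_free(list);                         // the list header itself
}

// Helper to build a list for the example (allocation checks omitted).
inline SketchListStr* make_list(const char* const* words, std::size_t count) {
    SketchListStr* list =
        static_cast<SketchListStr*>(std::malloc(sizeof(SketchListStr)));
    list->elements =
        static_cast<SketchStr*>(std::malloc(count * sizeof(SketchStr)));
    list->len = count;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t len = std::strlen(words[i]);
        list->elements[i].bytes = static_cast<char*>(std::malloc(len));
        std::memcpy(list->elements[i].bytes, words[i], len);
        list->elements[i].len = len;
    }
    return list;
}
```

For a two-element list this performs four frees: two strings, the element array, and the list header.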

Brian Carroll (Nov 24 2023 at 08:47):

This is the same thing we do in the refcount decrement functions that we generate in the Roc binary. So it would be nice if I could just call decrement from Ruby/C. But currently there doesn't seem to be any way for us to call the refcount functions from the platform! So it has to be re-implemented in the glue code and standard library... I think.

Brian Carroll (Nov 24 2023 at 08:49):

By the way, does this come up in the Rust glue and Rust roc_std crate? Do we have a Drop that knows how to traverse all the elements of a List and drop them all?

Brian Carroll (Nov 24 2023 at 08:55):

At this point I've decided that I need to break this up into smaller steps!
And the first step is to create a C++ equivalent of our Rust roc_std crate.
Ruby extensions can be built from C++ as well as C. And C++ has generics, so it's much easier to describe things like List Str. Maybe we can translate it to C at some point, but it seems easier to start with C++.
I'm assuming we can also use C++ to extend other dynamic language runtimes like CPython, Node.js, etc.
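
To illustrate why generics help here, a minimal template sketch follows. It is not the code in the roc-std-cpp repo, and `SketchList` and `Tracked` are invented names. The point is that the list's destructor runs each element's destructor, so cleanup for something like List Str falls out of the element type instead of being hand-written per combination.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Counts live Tracked values so element cleanup is observable.
static int g_live = 0;

struct Tracked {
    Tracked() { ++g_live; }
    Tracked(const Tracked&) { ++g_live; }
    ~Tracked() { --g_live; }
};

template <typename T>
class SketchList {
public:
    SketchList() = default;
    SketchList(const SketchList&) = delete;
    SketchList& operator=(const SketchList&) = delete;

    // Dropping the list drops every element, whatever T is.
    ~SketchList() {
        for (std::size_t i = 0; i < len_; ++i) data_[i].~T();
        ::operator delete(data_);
    }

    void push(T value) {
        if (len_ == cap_) grow();
        new (data_ + len_) T(std::move(value));
        ++len_;
    }

    std::size_t len() const { return len_; }

    const T& operator[](std::size_t i) const {
        assert(i < len_);
        return data_[i];
    }

private:
    // Moves elements into a larger buffer and destroys the old copies.
    void grow() {
        std::size_t new_cap = cap_ == 0 ? 4 : cap_ * 2;
        T* fresh = static_cast<T*>(::operator new(new_cap * sizeof(T)));
        for (std::size_t i = 0; i < len_; ++i) {
            new (fresh + i) T(std::move(data_[i]));
            data_[i].~T();
        }
        ::operator delete(data_);
        data_ = fresh;
        cap_ = new_cap;
    }

    T* data_ = nullptr;
    std::size_t len_ = 0;
    std::size_t cap_ = 0;
};
```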

Brian Carroll (Nov 24 2023 at 08:55):

https://github.com/brian-carroll/roc-std-cpp

Brian Carroll (Nov 24 2023 at 09:01):

After we have C++ versions of the Roc data structures, we'll be in a position to write C++ wrappers around them that the Ruby runtime will understand (based on RTypedData as described above).

Brendan Hansknecht (Nov 24 2023 at 16:13):

Brian Carroll said:

This is the same thing we do in the refcount decrement functions that we generate in the Roc binary. So it would be nice if I could just call decrement from Ruby/C. But currently there doesn't seem to be any way for us to call the refcount functions from the platform! So it has to be re-implemented in the glue code and standard library... I think.

I think it would be really good to expose at least some of these (maybe all). For example, boxed data and closures are both common examples where exposing the refcounting function would be super useful.

Brian Carroll (Nov 24 2023 at 16:18):

Yeah it would be really useful

Brendan Hansknecht (Nov 24 2023 at 16:25):

In fact, I think it would be very helpful for creating glue for new languages in general to enable roc to expose more functions instead of requiring everything to be reimplemented in glue. Of course they can also be reimplemented, but some are very complex to reimplement (cough, cough RocDict). Maybe add a flag for generating specific types of extra wrappers. We could also conditionally generate them based on exposed types.

In general, exposing some standard library functions and refcounting functions would be super useful. It would be less optimized than reimplementing in the host, but I think correctness and ease of implementation are much more important currently. Maybe we could just enable defining a bunch of functions in the platform that all get exposed and get wrappers generated for them (though maybe it would be too inconvenient to manually write all the exposed functions in roc).

Brian Carroll (Nov 24 2023 at 16:35):

Hmm there's a tradeoff between convenience and bloat though.
Say your platform requires mainForHost to return a Dict. If the roc compiler then automatically exposes the entire Dict module with all of its functions, that's a lot of dead code and I'm not sure if we have a good way to eliminate it.

Brian Carroll (Nov 24 2023 at 16:40):

Right now I feel like the refcounting functions are more important than the std lib functions. Because you can always write platform-side Roc code.

Brendan Hansknecht (Nov 24 2023 at 16:40):

For sure, maybe it should be specific requests from the platform. That said, if the application depends on the code anyway, it is just extra symbols and not dead code (also, I guess if it is automatic, I was thinking about just exposing the essentials not everything).

Brendan Hansknecht (Nov 24 2023 at 16:41):

And yeah, refcounting is more complex and important in general, though you technically can also deal with it once we enable exposing many functions to the platform. Just make a function that returns 2 of the same object or takes 2 and returns one. That is refcount Inc and Dec.
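
The trick can be modelled in C++ with `std::shared_ptr` standing in for a refcounted Roc value. `duplicate` and `keep_one` are hypothetical names for the two exposed functions. This only models the counting behaviour and says nothing about how Roc would compile such functions.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// shared_ptr stands in for a refcounted Roc value.
using Boxed = std::shared_ptr<std::string>;

// "Returns 2 of the same object": the caller ends up holding one extra
// reference, which is an increment.
inline std::pair<Boxed, Boxed> duplicate(Boxed value) {
    Boxed copy = value;  // refcount + 1
    return {std::move(value), std::move(copy)};
}

// "Takes 2 and returns one": one reference is consumed, which is a decrement.
inline Boxed keep_one(Boxed a, Boxed b) {
    b.reset();  // refcount - 1
    return a;
}
```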

Brian Carroll (Nov 24 2023 at 16:42):

once we enable exposing many functions to the platform

Yeah this would solve a lot of stuff, good point

Brian Carroll (Nov 24 2023 at 16:44):

Seems hard to define something that's automatic but also just the essentials. Hard to get agreement on what the essentials are, in a way that makes sense for different platforms.

Brian Carroll (Nov 24 2023 at 16:45):

What is the blocker for exposing many functions to the platform?

Brendan Hansknecht (Nov 24 2023 at 16:47):

I think it just never got implemented and no one has dug into it. @Folkert de Vries would know more.

Brendan Hansknecht (Nov 24 2023 at 16:47):

Also, currently you can technically expose multiple but it has to be done via a record of closures and that has other drawbacks.

Richard Feldman (Dec 07 2023 at 17:01):

I think if we exposed the refcounting functions we're already generating in the Roc application, and we did it in a way where they're each in their own section, then a linker step that cleans up dead sections could eliminate those, right?

Richard Feldman (Dec 07 2023 at 17:01):

that would be the best for code sharing and minimizing binary bloat

Richard Feldman (Dec 07 2023 at 17:37):

as Folkert and I learned when working on effect interpreters, one of the challenges with this is naming - as in, specifically what name do you choose for the refcounting function that's exposed to the host?

the name needs to be:

- unique
- human-readable enough that the host author can correctly know what calling it will do (e.g. we can't just call them refcount1, refcount2, etc. because then if you have a complicated record coming across from the platform to the host, how do you know which part of the record it's referring to?)
- deterministic from the type (this turns out to be tricky to do quickly, since the integer IDs we have internally for monomorphized types can vary depending on what order modules happen to get worked on in parallel)
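
One way to picture names that meet these requirements is to derive them from the structure of the type rather than from internal IDs. The following C++ sketch is purely illustrative: `TypeDesc`, the `roc_dec_` prefix and the arity-suffix scheme are all made up, and it ignores the speed concern raised above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical structural description of a monomorphized type.
struct TypeDesc {
    std::string head;            // e.g. "List", "Str", "Dict"
    std::vector<TypeDesc> args;  // type arguments, in order
};

// Builds the name from the type's structure only, so it does not depend on
// IDs assigned in compilation order. The argument count is appended to the
// head so that differently nested types cannot collide.
inline std::string mangle(const TypeDesc& t) {
    std::string out = t.head;
    if (!t.args.empty()) out += std::to_string(t.args.size());
    for (const TypeDesc& arg : t.args) out += "_" + mangle(arg);
    return out;
}

// Hypothetical exported symbol name for a type's decrement function.
inline std::string decrement_symbol(const TypeDesc& t) {
    return "roc_dec_" + mangle(t);
}
```

With this scheme, List Str maps to `roc_dec_List1_Str` no matter which module was processed first. Records and tag unions would need their own encoding, which this sketch does not attempt.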

Last updated: Aug 17 2025 at 12:14 UTC
