Stream: contributing

Topic: Ruby glue


Brian Carroll (Nov 13 2023 at 12:39):

I started thinking about how we could make roc glue for Ruby.
It would build on some recent experience at work, where we made an FFI between Ruby and Haskell.
Just want to post some of my thoughts here so that people can comment.

Brian Carroll (Nov 13 2023 at 12:39):

Ruby is implemented in C, and they provide a whole library of C functions for FFI that expose some of the internals of the interpreter. For example there's one called rb_ary_new that creates a new Array and rb_ary_push that pushes a value into it. There are also things like rb_define_module, rb_define_class, rb_define_method, etc.
When your extension is loaded into the Ruby runtime, it runs an initialisation function where you define all of your modules and classes.

Brian Carroll (Nov 13 2023 at 12:39):

What I realised immediately is that before we can even generate any glue, we first need to implement things like List and Str from the Roc standard library, similar to crates/roc_std in the repo. Probably a Ruby Roc module containing Roc::List, Roc::Dict and so on.
I started playing around with some C code for that. I'm not sure where to put it in the directory structure but I guess we can figure that out.

Brian Carroll (Nov 13 2023 at 12:40):

One design decision is how to deal with generics when the host language is dynamically typed.
I see two possible approaches:

  1. Generate monomorphised classes like ListI64 and ListStr depending on what your app needs.
  2. The List could contain some metadata about the type of its elements and how to convert them to and from Ruby.
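
To make approach (2) concrete, here is a minimal C++ sketch, assuming a hypothetical ElemType descriptor that every list carries at runtime. ElemType, DynList, to_host and make_i64_list are illustrative names only, and std::string stands in for a Ruby VALUE so the sketch compiles without Ruby headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical runtime descriptor for a Roc element type (approach 2).
// In a real extension, to_host would build a Ruby VALUE; std::string is a stand-in.
struct ElemType {
    const char* name;
    std::size_t size;                         // bytes per element in Roc memory
    std::string (*to_host)(const void* elem); // converts one element for the host
};

static std::string i64_to_host(const void* elem) {
    std::int64_t v;
    std::memcpy(&v, elem, sizeof v);
    return std::to_string(v);
}

static const ElemType I64_TYPE{"I64", sizeof(std::int64_t), i64_to_host};

// One untyped list class: raw element bytes plus a pointer to their type.
struct DynList {
    const ElemType* elem_type;
    std::vector<unsigned char> bytes;  // stand-in for the Roc-owned buffer

    std::size_t len() const { return bytes.size() / elem_type->size; }

    // Every access consults the metadata at runtime to convert the element.
    std::string get(std::size_t i) const {
        assert(i < len());
        return elem_type->to_host(bytes.data() + i * elem_type->size);
    }
};

// Helper that builds a List I64 from host integers.
static DynList make_i64_list(const std::vector<std::int64_t>& values) {
    DynList list{&I64_TYPE, {}};
    list.bytes.resize(values.size() * sizeof(std::int64_t));
    if (!values.empty()) {
        std::memcpy(list.bytes.data(), values.data(), list.bytes.size());
    }
    return list;
}
```

Approach (1) would instead generate a separate class per element type (ListI64, ListStr), removing this per-access indirection at the cost of more generated code.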

Brian Carroll (Nov 13 2023 at 12:44):

So (1) does most of the hard work at glue generation time and (2) does more at runtime.
But the idea of monomorphic classes in Ruby feels very alien. I'm not sure the experience would be good.
(2) feels more Rubyish to me.

Brian Carroll (Nov 13 2023 at 14:37):

Should this go in the Roc repo or my own repo?

Anton (Nov 13 2023 at 14:48):

Good question, it doesn't feel connected enough to the repo to be in there but at the same time it may also get out of sync pretty fast if you put it in its own repo.

Richard Feldman (Nov 13 2023 at 15:26):

I've been doing the Node and TypeScript stuff in a separate repo, but I'd say go with whichever you prefer!

Brian Carroll (Nov 13 2023 at 15:29):

Ok. Is your TypeScript repo public?

Richard Feldman (Nov 13 2023 at 15:29):

part of the reason I'm doing it in a separate repo is because there's an npm package that goes with it, which might not apply in your case

Richard Feldman (Nov 13 2023 at 15:30):

yeah http://github.com/vendrinc/roc-esbuild

Richard Feldman (Nov 13 2023 at 15:30):

I haven't started doing the C glue yet though, not beyond Str

Brian Carroll (Nov 13 2023 at 15:30):

Oh I see

Brian Carroll (Nov 13 2023 at 15:30):

So it's JSON based?

Richard Feldman (Nov 13 2023 at 15:30):

we're doing the JSON intermediary right now, and I'm in the middle of generating TypeScript type definitions

Richard Feldman (Nov 13 2023 at 15:31):

yeah it just does roc string to node string in C, and then both sides use that to send json back and forth

Brian Carroll (Nov 24 2023 at 08:11):

An update on this:

I started looking into how to model Roc data structures in Ruby.
Ruby values are represented as a tagged union of 28 possible C data structures.
The most suitable one for us is RTypedData. It's actually intended for C extensions to expose their own C data structures to Ruby, so that's perfect for us.

RTypedData contains a few fields

The rb_data_type_struct contains
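
As a rough illustration of the typed-data idea, here is a simplified, hypothetical C++ model: a per-type descriptor with a name and callbacks, shared by all instances, plus an opaque data pointer per object. The field names wrap_struct_name, dfree and dsize echo Ruby's rb_data_type_t, but this is not Ruby's actual definition (that lives in Ruby's own headers); TypeDescriptor, TypedObject, collect and the example descriptor are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Simplified, hypothetical model of Ruby's typed-data idea (NOT Ruby's real
// definitions): a shared per-type descriptor plus an opaque data pointer.
struct TypeDescriptor {
    const char* wrap_struct_name;            // human-readable type name
    void (*dfree)(void* data);               // called when the object is collected
    std::size_t (*dsize)(const void* data);  // memory usage reported to the runtime
};

struct TypedObject {
    const TypeDescriptor* type;  // shared by every instance of the class
    void* data;                  // e.g. a pointer to a real Roc List or Str
};

// Roughly what the garbage collector does when a wrapper becomes unreachable.
inline void collect(TypedObject& obj) {
    if (obj.type->dfree != nullptr && obj.data != nullptr) {
        obj.type->dfree(obj.data);
    }
    obj.data = nullptr;
}

// Example descriptor for a wrapper around a single heap-allocated I64.
static int example_frees = 0;  // bookkeeping so the example can be checked
static void i64_free(void* data) {
    std::free(data);
    ++example_frees;
}
static std::size_t i64_size(const void*) { return sizeof(std::int64_t); }
static const TypeDescriptor EXAMPLE_I64_TYPE{"Roc::ExampleI64", i64_free, i64_size};
```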

Brian Carroll (Nov 24 2023 at 08:17):

I'll talk through some of the challenges I've identified.

Brian Carroll (Nov 24 2023 at 08:17):

Firstly: For Rust glue, we were able to create native Rust structures that have exactly the same byte-level representation as our Roc data structures. That means you can transform Rust to Roc and vice-versa just by type casting.
That's just not possible in Ruby. The language doesn't have the capability to describe arbitrary byte layouts without adding a bunch of overhead.
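
For contrast, this is roughly what a layout-compatible host type looks like in a language that can describe byte layouts. The sketch assumes a Roc Str is three machine words (bytes pointer, length, capacity), as in roc_std at the time, and ignores the small-string optimisation; RocStrRepr and read_str are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical mirror of a Roc Str header, assuming the three-word layout
// (bytes pointer, length, capacity); the small-string optimisation is ignored.
struct RocStrRepr {
    const char* bytes;
    std::size_t length;
    std::size_t capacity;
};

// The point of layout compatibility: the host type has exactly the same size
// and alignment as the Roc value, with no extra header fields.
static_assert(sizeof(RocStrRepr) == 3 * sizeof(void*), "three machine words");
static_assert(alignof(RocStrRepr) == alignof(void*), "word aligned");

// Reinterpreting bytes that Roc produced. memcpy is the portable spelling of
// "just cast it" (a raw pointer cast can run into strict-aliasing rules).
inline RocStrRepr read_str(const unsigned char* raw) {
    RocStrRepr s;
    std::memcpy(&s, raw, sizeof s);
    return s;
}
```

A Ruby class cannot declare that its instances are exactly these three words, which is why the Ruby glue needs a separate wrapper object around each Roc value.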

Brian Carroll (Nov 24 2023 at 08:18):

(This will likely also be true of Python, JavaScript, etc.)

Brian Carroll (Nov 24 2023 at 08:22):

So that raises the question of how you represent something like List Str.
It's easy enough to make a Ruby module called Roc that contains a class Roc::List, which uses the RTypedData representation. Its pointer can point at a real Roc List that we are going to pass to a Roc app, or that we just received as a result from a Roc app.

Brian Carroll (Nov 24 2023 at 08:23):

And we can also have a class Roc::Str that again uses the RTypedData representation but its pointer points at a real Roc Str.

Brian Carroll (Nov 24 2023 at 08:27):

But there's just an extra layer of wrapping and unwrapping that we don't have with Rust.
For example, if you want to access one of the strings in the list, like my_list[3], you first have to wrap it in an RTypedData to make it a Ruby value.
And vice-versa when you want to push a string into the list. You have to unwrap it to turn it into a real Roc Str before you can put it into the List.
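
A minimal C++ model of that extra layer, with std::vector standing in for a Roc List and a hypothetical HostObject standing in for a Ruby object. The sketch only shows the shape of the problem: reading an element allocates a new wrapper, and pushing one unwraps it, whereas Rust glue can use the element in place. RocStr here is a simplified stand-in, not Roc's real layout.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for a Roc Str stored inline in a List.
struct RocStr {
    std::string bytes;  // a real Roc Str is pointer/length/capacity
};

// Stand-in for a Ruby object: every value Ruby sees is its own heap
// allocation with a header (here just a class name) pointing at Roc data.
struct HostObject {
    const char* class_name;
    std::unique_ptr<RocStr> data;
};

// my_list[3]: the element lives inline in the list, but Ruby needs an
// object, so we allocate a wrapper and copy the Str into it
// (real glue would share it by bumping its refcount instead).
inline HostObject list_get(const std::vector<RocStr>& list, std::size_t i) {
    assert(i < list.size());
    return HostObject{"Roc::Str", std::make_unique<RocStr>(list[i])};
}

// my_list << s: unwrap the Ruby object back into a plain Roc Str value.
inline void list_push(std::vector<RocStr>& list, const HostObject& obj) {
    assert(obj.data != nullptr);
    list.push_back(*obj.data);
}
```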

Brian Carroll (Nov 24 2023 at 08:28):

This isn't a real issue in Rust because the Str is the same size in Rust and Roc. We don't have to add or remove extra fields, we basically just cast pointers.

Brian Carroll (Nov 24 2023 at 08:40):

For memory management, I think it probably makes sense to use the Ruby allocator, which is a wrapper around malloc. If you have a Rails app and you want to rewrite parts of it in Roc, you most likely want to use the same allocator rather than a separate one, to keep things simple.
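
A Roc platform host supplies the allocation functions that compiled Roc code calls. The sketch below uses the roc_alloc / roc_realloc / roc_dealloc signatures as they appeared in Roc's C platform examples around that time (worth re-checking against current examples), and uses plain malloc so it compiles without Ruby headers. The comments mark where a Ruby extension would call Ruby's ruby_xmalloc, ruby_xrealloc and ruby_xfree instead. The alignment check is an assumption of this sketch, not a Roc requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Allocation hooks a Roc platform host provides. Assumes Roc never asks for
// more alignment than malloc guarantees (checked, not handled).
extern "C" {

void* roc_alloc(std::size_t size, unsigned int alignment) {
    assert(alignment <= alignof(std::max_align_t));
    (void)alignment;
    return std::malloc(size);  // in a Ruby extension: ruby_xmalloc(size)
}

void* roc_realloc(void* ptr, std::size_t new_size, std::size_t old_size,
                  unsigned int alignment) {
    assert(alignment <= alignof(std::max_align_t));
    (void)old_size;
    (void)alignment;
    return std::realloc(ptr, new_size);  // in Ruby: ruby_xrealloc(ptr, new_size)
}

void roc_dealloc(void* ptr, unsigned int alignment) {
    (void)alignment;
    std::free(ptr);  // in a Ruby extension: ruby_xfree(ptr)
}

}  // extern "C"
```

Routing these through Ruby's allocator keeps a single heap for the whole process, which matches the "keep things simple" argument above. It does not by itself make Roc values garbage-collected.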

Brian Carroll (Nov 24 2023 at 08:43):

Using that allocator doesn't automatically mean the objects are garbage-collected. That's a separate thing. So when the Roc program allocates values internally, Ruby GC doesn't need to know. But we do need to give info to the GC about Roc objects that are returned to Ruby. The Ruby code will do some stuff with those objects, then leave them for the GC to clean up later.

Brian Carroll (Nov 24 2023 at 08:45):

When the GC does come along to clean up later, we need to be able to somehow free all of our Roc allocations.
We have to provide a dfree function for every value type that Ruby could have a reference to. The List Str version of that function will have to traverse all the strings, free them, and then free the list itself.
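
Here is a sketch of such a dfree function for List Str, under a deliberately simplified, hypothetical layout: a refcount word just before each heap payload, strings always heap-allocated, and a small list struct malloc'd by the glue for Ruby to point at. Real Roc values differ (small strings, refcount encoding), so treat this as the shape of the traversal rather than a drop-in implementation. All names here are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

static int live_allocations = 0;  // bookkeeping so the example can be checked

// Hypothetical layout: [refcount word][payload...]; returns the payload.
static void* rc_alloc(std::size_t payload_size) {
    auto* block = static_cast<std::size_t*>(std::malloc(sizeof(std::size_t) + payload_size));
    if (block == nullptr) std::abort();
    block[0] = 1;  // a fresh allocation has one owner
    ++live_allocations;
    return block + 1;
}

// Decrement; true means this was the last reference (caller frees).
static bool rc_dec(void* payload) {
    std::size_t* rc = static_cast<std::size_t*>(payload) - 1;
    assert(*rc > 0);
    return --*rc == 0;
}

static void rc_free(void* payload) {
    std::free(static_cast<std::size_t*>(payload) - 1);
    --live_allocations;
}

struct RocStr { char* bytes; std::size_t len; std::size_t cap; };
struct RocListStr { RocStr* elements; std::size_t len; std::size_t cap; };

static RocStr make_str(const char* s) {
    std::size_t n = std::strlen(s);
    auto* bytes = static_cast<char*>(rc_alloc(n));
    std::memcpy(bytes, s, n);
    return RocStr{bytes, n, n};
}

static void str_drop(RocStr s) {
    if (s.bytes != nullptr && rc_dec(s.bytes)) rc_free(s.bytes);
}

// dfree-style callback: if this was the last reference to the element
// buffer, drop every string first, then the buffer, then the wrapper struct.
static void list_str_dfree(void* data) {
    auto* list = static_cast<RocListStr*>(data);
    if (list->elements != nullptr && rc_dec(list->elements)) {
        for (std::size_t i = 0; i < list->len; ++i) str_drop(list->elements[i]);
        rc_free(list->elements);
    }
    std::free(list);
}
```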

Brian Carroll (Nov 24 2023 at 08:47):

This is the same thing we do in the refcount decrement functions that we generate in the Roc binary. So it would be nice if I could just call decrement from Ruby/C. But currently there doesn't seem to be any way for us to call the refcount functions from the platform! So it has to be re-implemented in the glue code and standard library... I think.

Brian Carroll (Nov 24 2023 at 08:49):

By the way, does this come up in the Rust glue and Rust roc_std crate? Do we have a Drop that knows how to traverse all the elements of a List and drop them all?

Brian Carroll (Nov 24 2023 at 08:55):

At this point I've decided that I need to break this up into smaller steps!
And the first step is to create a C++ equivalent of our Rust roc_std crate.
Ruby extensions can be built from C++ as well as C. And C++ has generics, so it's much easier to describe things like List Str. Maybe we can translate it to C at some point, but it seems easier to start with C++.
I'm assuming we can also use C++ to extend other dynamic language runtimes like CPython, Node.js, etc.
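
A rough idea of why C++ helps: a class template lets one generic definition cover List I64, List Str and nested lists, and the destructor plays the role of Rust's Drop. In this hypothetical sketch, std::vector stands in for Roc's raw element buffer. Copies share a buffer and bump its refcount, push is copy-on-write, and the last owner destroys every element, which recursively releases nested lists.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic, refcounted Roc-style List (not roc-std-cpp's API).
template <typename T>
class RocList {
public:
    RocList() = default;
    RocList(const RocList& other) : buf_(other.buf_) {
        if (buf_) ++buf_->refcount;  // sharing: refcount +1
    }
    RocList(RocList&& other) noexcept : buf_(std::exchange(other.buf_, nullptr)) {}
    RocList& operator=(RocList other) noexcept {  // copy-and-swap
        std::swap(buf_, other.buf_);
        return *this;
    }
    ~RocList() { release(); }  // like Rust's Drop

    // Copy-on-write push: clone the buffer first if anyone else shares it.
    void push(T value) {
        if (buf_ == nullptr) {
            buf_ = new Buffer{1, {}};
        } else if (buf_->refcount > 1) {
            Buffer* fresh = new Buffer{1, buf_->items};
            release();
            buf_ = fresh;
        }
        buf_->items.push_back(std::move(value));
    }

    std::size_t size() const { return buf_ ? buf_->items.size() : 0; }
    std::size_t refcount() const { return buf_ ? buf_->refcount : 0; }
    const T& operator[](std::size_t i) const {
        assert(i < size());
        return buf_->items[i];
    }

private:
    struct Buffer {
        std::size_t refcount;
        std::vector<T> items;  // stand-in for Roc's raw element array
    };

    // Refcount -1; the last owner deletes the buffer, which destroys each
    // element in turn (so a RocList<RocList<U>> releases the inner lists too).
    void release() {
        if (buf_ != nullptr && --buf_->refcount == 0) delete buf_;
        buf_ = nullptr;
    }

    Buffer* buf_ = nullptr;
};
```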

Brian Carroll (Nov 24 2023 at 08:55):

https://github.com/brian-carroll/roc-std-cpp

Brian Carroll (Nov 24 2023 at 09:01):

After we have C++ versions of the Roc data structures, we'll be in a position to write C++ wrappers around them that the Ruby runtime will understand (based on RTypedData as described above).

Brendan Hansknecht (Nov 24 2023 at 16:13):

Brian Carroll said:

This is the same thing we do in the refcount decrement functions that we generate in the Roc binary. So it would be nice if I could just call decrement from Ruby/C. But currently there doesn't seem to be any way for us to call the refcount functions from the platform! So it has to be re-implemented in the glue code and standard library... I think.

I think it would be really good to expose at least some of these (maybe all). For example, boxed data and closures are both common examples where exposing the refcounting function would be super useful.

Brian Carroll (Nov 24 2023 at 16:18):

Yeah it would be really useful

Brendan Hansknecht (Nov 24 2023 at 16:25):

In fact, I think it would be very helpful for creating glue for new languages in general to enable Roc to expose more functions instead of requiring everything to be reimplemented in glue. Of course they can also be reimplemented, but some are very complex to reimplement (cough, cough RocDict). Maybe add a flag for generating specific kinds of extra wrappers. We could also conditionally generate them based on the exposed types.

In general, exposing some standard library functions and refcounting functions would be super useful. It would be less optimized than reimplementing them in the host, but I think correctness and ease of implementation are much more important currently. Maybe we could just allow defining a bunch of functions in the platform that all get exposed, with wrappers generated for them (though it might be too inconvenient to manually write all the exposed functions in Roc).

Brian Carroll (Nov 24 2023 at 16:35):

Hmm there's a tradeoff between convenience and bloat though.
Say your platform requires mainForHost to return a Dict. If the roc compiler then automatically exposes the entire Dict module with all of its functions, that's a lot of dead code and I'm not sure if we have a good way to eliminate it.

Brian Carroll (Nov 24 2023 at 16:40):

Right now I feel like the refcounting functions are more important than the std lib functions. Because you can always write platform-side Roc code.

Brendan Hansknecht (Nov 24 2023 at 16:40):

For sure, maybe it should be driven by specific requests from the platform. That said, if the application depends on the code anyway, it is just extra symbols and not dead code (also, if it were automatic, I was thinking of exposing just the essentials, not everything).

Brendan Hansknecht (Nov 24 2023 at 16:41):

And yeah, refcounting is more complex and important in general, though you technically can also deal with it once we enable exposing many functions to the platform. Just make a function that returns 2 of the same object, or one that takes 2 and returns one. That gives you refcount Inc and Dec.
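
Brendan's trick can be modelled in C++ with std::shared_ptr standing in for a refcounted Roc value. The function names duplicate and keep_first are hypothetical; in practice they would be Roc functions exposed to the host, and Roc's generated code would do the counting.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// "Returns 2 of the same object": handing back a second reference is a
// net refcount increment.
template <typename T>
std::pair<std::shared_ptr<T>, std::shared_ptr<T>> duplicate(std::shared_ptr<T> v) {
    std::shared_ptr<T> copy = v;  // refcount +1
    return {std::move(v), std::move(copy)};
}

// "Takes 2 and returns one": the second reference is consumed and dropped
// when the function returns, a net refcount decrement.
template <typename T>
std::shared_ptr<T> keep_first(std::shared_ptr<T> a, std::shared_ptr<T> b) {
    (void)b;  // b is destroyed at the end of this function: refcount -1
    return a;
}
```

A host that can call both functions gets increment and decrement without a dedicated refcount API, at the cost of a function call per operation.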

Brian Carroll (Nov 24 2023 at 16:42):

once we enable exposing many functions to the platform

Yeah this would solve a lot of stuff, good point

Brian Carroll (Nov 24 2023 at 16:44):

Seems hard to define something that's automatic but also just the essentials. Hard to get agreement on what the essentials are, in a way that makes sense for different platforms.

Brian Carroll (Nov 24 2023 at 16:45):

What is the blocker for exposing many functions to the platform?

Brendan Hansknecht (Nov 24 2023 at 16:47):

I think it just never got implemented and no one has dug into it. @Folkert de Vries would know more.

Brendan Hansknecht (Nov 24 2023 at 16:47):

Also, currently you can technically expose multiple but it has to be done via a record of closures and that has other drawbacks.

Richard Feldman (Dec 07 2023 at 17:01):

I think if we exposed the refcounting functions we're already generating in the Roc application, and we did it in a way where they're each in their own section, then a linker step that cleans up dead sections could eliminate those, right?

Richard Feldman (Dec 07 2023 at 17:01):

that would be the best for code sharing and minimizing binary bloat

Richard Feldman (Dec 07 2023 at 17:37):

as Folkert and I learned when working on effect interpreters, one of the challenges with this is naming: specifically, what name do you choose for the refcounting function that's exposed to the host?

the name needs to be:


Last updated: Jul 05 2025 at 12:14 UTC