zig compiler - libroc exploration · compiler development

Stream: compiler development

Topic: zig compiler - libroc exploration

Luke Boswell (Feb 02 2025 at 23:04):

I would like to throw some ideas around for libroc -- specifically for the use case of embedding roc.

I've discussed some of these use cases previously, but for the sake of discussion -- let's say I'm building a roc playground. I'd like to build a WASM module that runs in the browser. It accepts roc source code, the name of a top-level ident (e.g. main!), and any arguments. The playground then uses libroc to parse, typecheck, and interpret the program. The platform is simple and the only effect available is print!, which the playground uses to append to a textarea.

From this example I have a few questions...

How does roc load modules in a WASM vm?
- libroc probably needs to abstract the file system and not call OS syscalls directly.
- Roc's "module loader" should probably take some function that it uses to load files, e.g. something like Str -> Result (List U8) [NotFound] (using roc syntax).
- then for our WASM playground, we could have the "files" in small UI text editors in the browser, and JS can pass those in as required.
How does libroc expose the API for working with the compiler stages?
- Is there a function for calling parse, check_types, interpret and passing the IR between these stages?
- Are the IRs easy to work with? Maybe these are serialised and deserialised to a common format somehow?
Could libroc be a Zig package? or should it be a C library?
Does the libroc compile nicely to WASM 32-bit target?
Does libroc have an interpreter in it? or does the playground just have a canonicalised IR and need to do its own interpreting? What about builtin things like working with Dec or Dict?

Does this use case sound reasonable? Is there anything obvious here I'm missing?
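
A minimal sketch of the "loader function" idea above, written in Python purely for illustration. It assumes the module loader is handed a function from a path to source bytes (returning None stands in for NotFound) and never touches the file system itself. The names `load_modules`, `LoadFile`, and `IMPORT_RE` are hypothetical, and the regular expression is a toy import matcher, not Roc's real parser.

```python
import re
from typing import Callable, Optional

# Hypothetical loader signature: path -> source bytes, or None for NotFound.
LoadFile = Callable[[str], Optional[bytes]]

# Toy matcher for lines such as "import Greeting"; an assumption, not Roc's parser.
IMPORT_RE = re.compile(rb"^import\s+([A-Z][A-Za-z0-9]*)", re.MULTILINE)


def load_modules(entry_path: str, load_file: LoadFile) -> dict[str, bytes]:
    """Collect the entry module and every local module it imports, transitively."""
    loaded: dict[str, bytes] = {}
    pending = [entry_path]
    while pending:
        path = pending.pop()
        if path in loaded:
            continue  # already fetched through the callback
        source = load_file(path)
        if source is None:
            raise FileNotFoundError(path)
        loaded[path] = source
        for match in IMPORT_RE.finditer(source):
            pending.append(match.group(1).decode() + ".roc")
    return loaded


# In a playground, the "file system" is just the contents of the text editors.
editors = {
    "main.roc": b"import Greeting\n# rest of the app elided\n",
    "Greeting.roc": b"# no imports here\n",
}

modules = load_modules("main.roc", editors.get)
print(sorted(modules))
```

Because the loader only sees a callback, the same code could be backed by browser text editors, an in-memory map in tests, or real files on a native target.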

Luke Boswell (Feb 02 2025 at 23:09):

There is a lot here that is new to me, so I would appreciate any feedback. I may be off on a random tangent hallucinating ways of doing things.

Richard Feldman (Feb 02 2025 at 23:11):

hm, so I just realized that actually libroc might be the wrong way to think about this :thinking:

Richard Feldman (Feb 02 2025 at 23:11):

I don't think we'd want to produce a single library for this

Richard Feldman (Feb 02 2025 at 23:12):

rather, I think we'd want to have the platform be able to build a custom library which includes both the interpreter and the platform-specific entrypoint functions

Richard Feldman (Feb 02 2025 at 23:13):

hmm

Richard Feldman (Feb 02 2025 at 23:14):

actually maybe it doesn't matter after all, nm it can work either way

Brendan Hansknecht (Feb 02 2025 at 23:19):

I assume it would be a zig/c library that can compile to wasm (or any other target). It also would be exposed by the roc executable if you load it as a shared library (assuming we can make it work).

Brendan Hansknecht (Feb 02 2025 at 23:19):

I am assuming it will be how the shim works. The shim will load the roc compiler as a shared library and launch the interpreter. So it will directly use libroc (I guess that forces it to be a cffi library)

Luke Boswell (Feb 02 2025 at 23:20):

Lol, is this the library version of inception?

Brendan Hansknecht (Feb 02 2025 at 23:41):

It's a really nice technique to bundle roc with libroc assuming it works (I feel like I have seen this before, but not sure it works on all platforms)

Richard Feldman (Feb 02 2025 at 23:41):

ok so let's say there's a libroc which exposes a C function which:

accepts the binary contents of the app module to run (e.g. the bytes found in main.roc, or if it's wasm, some bytes in memory)
accepts the usual struct of function pointers for the allocator etc
accepts the Roc argument as a pointer (aside: I think we decided elsewhere that we were going to always have the Roc functions accept a single arg from the host as a pointer, and then tuple them up if desired, in order to simplify the ABI - right?)
accepts a pointer to the return value, which the roc function will populate

Brendan Hansknecht (Feb 02 2025 at 23:43):

accepts the binary contents of the app module to run (e.g. the bytes found in main.roc, or if it's wasm, some bytes in memory)

Not this. It will get the file data by calling roc_load which will be in the struct of function pointers with the allocator. Cause it will need roc_load anyway to load other files.

Richard Feldman (Feb 02 2025 at 23:44):

and then it also accepts a function pointer which takes a path and returns the source bytes associated with that path, or an error if they couldn't be read

Richard Feldman (Feb 02 2025 at 23:45):

yeah :point_up: seems necessary to allow loading other modules

Richard Feldman (Feb 02 2025 at 23:45):

and I guess we could say give me a starting point path and I'll go look it up in there, but kinda seems like unnecessary indirection

Brendan Hansknecht (Feb 02 2025 at 23:45):

Yeah, just need the function, shouldn't need main.roc cause you can use the function to get main.roc

Richard Feldman (Feb 02 2025 at 23:45):

but that works too, sure

Brendan Hansknecht (Feb 02 2025 at 23:45):

and I guess we could say give me a starting point path and I'll go look it up in there, but kinda seems like unnecessary indirection

I think we need this for the shim

Brendan Hansknecht (Feb 02 2025 at 23:46):

Cause the shim will get compiled once, but main.roc source may change between calls

Richard Feldman (Feb 02 2025 at 23:46):

ah sure

Richard Feldman (Feb 02 2025 at 23:46):

ok fair enough!

Richard Feldman (Feb 02 2025 at 23:47):

anyway, so then the other piece of this is getting help from glue to correctly translate the Roc args and return value to/from the host language

Brendan Hansknecht (Feb 02 2025 at 23:48):

aside: I think we decided elsewhere that we were going to always have the Roc functions accept a single arg from the host as a pointer, and then tuple them up if desired, in order to simplify the ABI - right?

I would rather fix c abi, but either is ultimately fine.

For libroc I think it should take a list of tags to specify the types and a list of pointers to specify the args.

Brendan Hansknecht (Feb 02 2025 at 23:48):

The shim would deal with filling in that information (maps standard cffi like we use with llvm to this interpreter form). Otherwise, the platform author is required to fill in the info if they want to use libroc directly.

Brendan Hansknecht (Feb 02 2025 at 23:50):

Taking types as a list separate from the actual args avoids the nesting problem where you have to box everything. Instead it can use the flat representation, but have a nested spec that explains the underlying type layout.
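
One way to picture "flat arguments plus a nested spec", sketched in Python under loud assumptions: the tag names, the `("Tuple", [...])` spec shape, and the function `call_args` are all hypothetical, and the packing ignores real alignment and layout rules. Each argument is a flat byte buffer, and a separate, possibly nested, type spec says how to read it.

```python
import struct

# Hypothetical scalar tags mapped to struct format codes; not libroc's real tags.
SCALARS = {"U8": "B", "U32": "I", "I64": "q", "F64": "d"}


def flatten(spec):
    """Yield the scalar tags of a nested spec in order, depth first."""
    if isinstance(spec, str):
        yield spec
    else:  # ("Tuple", [child, ...])
        for child in spec[1]:
            yield from flatten(child)


def rebuild(spec, values):
    """Rebuild the nested shape from a flat iterator of decoded scalars."""
    if isinstance(spec, str):
        return next(values)
    return tuple(rebuild(child, values) for child in spec[1])


def call_args(specs, buffers):
    """Check each flat argument buffer against its spec, then decode it."""
    if len(specs) != len(buffers):
        raise TypeError("argument count does not match type list")
    decoded = []
    for spec, buf in zip(specs, buffers):
        # "<" means little-endian with no padding (a simplification).
        fmt = "<" + "".join(SCALARS[tag] for tag in flatten(spec))
        if struct.calcsize(fmt) != len(buf):
            raise TypeError(f"buffer size does not match {spec}")
        decoded.append(rebuild(spec, iter(struct.unpack(fmt, buf))))
    return decoded


# A nested spec describing a flat 13-byte buffer: no boxing of the inner tuple.
point = ("Tuple", ["U32", ("Tuple", ["U8", "F64"])])
flat = struct.pack("<IBd", 7, 1, 2.5)
args = call_args([point, "I64"], [flat, struct.pack("<q", -3)])
print(args)
```

Note that the size check in `call_args` runs on every call, which is the price of describing types at runtime rather than generating the calls ahead of time.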

Richard Feldman (Feb 02 2025 at 23:55):

Brendan Hansknecht said:

aside: I think we decided elsewhere that we were going to always have the Roc functions accept a single arg from the host as a pointer, and then tuple them up if desired, in order to simplify the ABI - right?

I would rather fix c abi, but either is ultimately fine.

we can always do that later and relax the restriction, but this is astronomically easier to make correct :big_smile:

Richard Feldman (Feb 02 2025 at 23:56):

Brendan Hansknecht said:

For libroc I think it should take a list of tags to specify the types and a list of pointers to specify the args.

hm, so what's the benefit of this compared to using glue to just generate the correct calls? :thinking:

Richard Feldman (Feb 02 2025 at 23:56):

a downside is the runtime validation on every call

Anthony Bullard (Feb 03 2025 at 00:11):

Would libroc allow for it to be fully embeddable? I.e., could the control be inverted?

Brendan Hansknecht (Feb 03 2025 at 00:28):

Yes, that is exactly the plan. Fully embedded and control inverted

Brendan Hansknecht (Feb 03 2025 at 00:28):

So not necessarily any glue when using libroc

Brendan Hansknecht (Feb 03 2025 at 00:30):

In my mind, using libroc directly should be as nice as using embedded python or Lua interpreters.

Anthony Bullard (Feb 03 2025 at 00:32):

That would be awesome, would make it easier to make my Love2D for Roc port...

Anthony Bullard (Feb 03 2025 at 00:33):

Been doing Love2D with my daughter for a month, and while I don't mind Lua, I like Roc a lot more. :-)

Brendan Hansknecht (Feb 03 2025 at 00:35):

In a perfect world for libroc, I don't even need a main.roc. I can load any module and call onto it directly and even run a string of roc code directly.

Richard Feldman (Feb 03 2025 at 00:57):

hm, I don't think that works

Richard Feldman (Feb 03 2025 at 00:57):

it's more common than not for a module to refer to package shorthands like cli.

Richard Feldman (Feb 03 2025 at 00:57):

to know what those resolve to, you have to have loaded a main.roc

Brendan Hansknecht (Feb 03 2025 at 00:58):

Assuming we want to eventually enable this:

In my mind, using libroc directly should be as nice as using embedded python or Lua interpreters.

I think something more dynamic is required

Richard Feldman (Feb 03 2025 at 00:58):

so that could only possibly work in the specific scenario where I'm loading a module which only imports other local modules and none of them ever tries to import from any package whatsoever
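
The shorthand problem can be sketched as follows, in Python and with hypothetical names throughout (`resolve_import`, the `pkg/cli` location, and the path format are assumptions for illustration). An import such as `cli.Stdout` can only be resolved through a shorthand table, and that table comes from having loaded a main.roc. A lone module works only when every import is local.

```python
def resolve_import(name, shorthands, local_modules):
    """Resolve an import like 'cli.Stdout' or 'Helper' to a source location."""
    if "." in name:
        shorthand, module = name.split(".", 1)
        if shorthand not in shorthands:
            # Without main.roc there is nothing to say what "cli" refers to.
            raise LookupError(f"unknown package shorthand '{shorthand}'")
        return f"{shorthands[shorthand]}/{module}.roc"
    if name in local_modules:
        return f"{name}.roc"
    raise LookupError(f"module '{name}' not found")


local = {"Helper"}

# Loading a lone module: local imports resolve, package imports cannot.
print(resolve_import("Helper", {}, local))
try:
    resolve_import("cli.Stdout", {}, local)
except LookupError as err:
    print("error:", err)

# After loading main.roc, its shorthand table makes the same import resolvable.
from_main = {"cli": "pkg/cli"}  # hypothetical location
print(resolve_import("cli.Stdout", from_main, local))
```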

Brendan Hansknecht (Feb 03 2025 at 00:58):

Though maybe it needs to be at the package boundary, not sure

Brendan Hansknecht (Feb 03 2025 at 00:59):

Like I should be able to load some random roc library and call code in it

Richard Feldman (Feb 03 2025 at 00:59):

it seems like in practice ~100% of use cases for this will want a package

Richard Feldman (Feb 03 2025 at 00:59):

sure, that's fine

Richard Feldman (Feb 03 2025 at 00:59):

well, except for the restrictions we have on host-exposed functions :sweat_smile:

Richard Feldman (Feb 03 2025 at 00:59):

like closures have to be boxed

Brendan Hansknecht (Feb 03 2025 at 01:00):

I assume all closures in the interpreter will be boxed, so that should be fine

Richard Feldman (Feb 03 2025 at 01:00):

hmmm interesting

Brendan Hansknecht (Feb 03 2025 at 01:01):

I think they have to be cause we won't have run any form of specialization

Richard Feldman (Feb 03 2025 at 01:01):

yeah for sure, I'm just trying to think of the layout implications

Richard Feldman (Feb 03 2025 at 01:01):

in the non-libroc case

Richard Feldman (Feb 03 2025 at 01:01):

I'm gonna put it on the other thread haha

Luke Boswell (Feb 03 2025 at 01:02):

I'm very glad I brought this whole topic up... :smiley:

Richard Feldman (Feb 03 2025 at 01:03):

yeah I guess loading any package or app should work

Richard Feldman (Feb 03 2025 at 01:03):

and then once you've loaded it, you can call anything in any of its exposed modules

Richard Feldman (Feb 03 2025 at 01:04):

but I still don't think that affects this:

Richard Feldman said:

in other words, libroc can expose a function which is exactly the same interface as :point_up: except for 3 extra arguments:

The path to main.roc

The function to go from a path to a .roc file to its source bytes

The name of the entrypoint function within main.roc that I want to call

Richard Feldman (Feb 03 2025 at 01:04):

for reference, the :point_up: was referring to:

Brendan Hansknecht said:

Normal platforms only see a single interface. That interface is:

Platform -> Roc standard FFI

A pointer to write the return data to

A record of function pointers (only allocator functions and roc_load)

N pointers, one for each arg.

That is all they see period. Anything libroc is an implementation detail and not exposed to the platform.
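
A rough mock of the interface described above, in Python rather than C and with a fake interpreter. Everything named here is hypothetical: `RocOps` stands in for the record of function pointers, `libroc_call` for the libroc entrypoint taking the three extra arguments (path to main.roc, the loader, and the entrypoint name), and a one-element list plays the role of the return pointer. The "interpreter" merely checks that the entrypoint name occurs in the source and concatenates the argument bytes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RocOps:
    """Stand-in for the record of function pointers (allocator functions and roc_load)."""
    alloc: Callable[[int], bytearray]
    dealloc: Callable[[bytearray], None]
    roc_load: Callable[[str], Optional[bytes]]


def libroc_call(main_path, entrypoint, ops, ret, args):
    """Load main.roc through ops on every call, then run a fake interpreter."""
    source = ops.roc_load(main_path)
    if source is None or entrypoint.encode() not in source:
        return False
    # Fake "interpretation": copy all argument bytes into host-allocated memory.
    scratch = ops.alloc(sum(len(arg) for arg in args))
    offset = 0
    for arg in args:
        scratch[offset:offset + len(arg)] = arg
        offset += len(arg)
    ret[0] = bytes(scratch)  # write the result through the "return pointer"
    ops.dealloc(scratch)
    return True


files = {"main.roc": b"main! = ..."}
ops = RocOps(alloc=bytearray, dealloc=lambda buf: None, roc_load=files.get)

ret = [None]
ok_first = libroc_call("main.roc", "main!", ops, ret, [b"ab", b"c"])
print(ok_first, ret[0])

# The source can change between calls because roc_load is consulted each time.
files["main.roc"] = b"other = 1"
ok_second = libroc_call("main.roc", "main!", ops, [None], [])
print(ok_second)
```

Passing the path and the loader, rather than the source bytes, is what lets a shim that was compiled once pick up an edited main.roc on the next call.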

Brendan Hansknecht (Feb 03 2025 at 01:04):

Via libroc, I should be able to call a function with a type variable. So that requires specifying the type somehow

Richard Feldman (Feb 03 2025 at 01:05):

ahh interesting

Richard Feldman (Feb 03 2025 at 01:05):

ok yeah that's something hosts can't do

Richard Feldman (Feb 03 2025 at 01:05):

but seems reasonable to do when loading a package or something at runtime

Brendan Hansknecht (Feb 03 2025 at 01:05):

:100:

Richard Feldman (Feb 03 2025 at 01:05):

cool, that makes sense to me then! :thumbs_up:

Richard Feldman (Feb 03 2025 at 01:06):

btw I do think in general that if I'm embedding Roc into a larger program, I'm going to want to use glue to generate the bindings anyway

Richard Feldman (Feb 03 2025 at 01:06):

just because that makes it easier to get the types right

Brendan Hansknecht (Feb 03 2025 at 01:07):

That's fair, though the interpreter has to get types right somehow without glue. So it can't be that bad to use

Luke Boswell (Feb 03 2025 at 01:12):

Brendan Hansknecht said:

Via libroc, I should be able to call a function with a type variable. So that requires specifying the type somehow

Is this definitely something we want to support? would this be used for building a REPL or similar thing around libroc?

I thought the "interface" of a roc program was defined in the platform's main.roc file with the exposed entry-points. (and anything crossing the roc-host boundary has a fixed known size and concrete type)

Brendan Hansknecht (Feb 03 2025 at 01:14):

If we don't support it, I don't think there is much of a point to supporting the libroc use case. Just use the standard flow instead.

Brendan Hansknecht (Feb 03 2025 at 01:14):

When embedding Lua or python, one of the huge gains is the dynamic ability to interact with anything

Brendan Hansknecht (Feb 03 2025 at 01:17):

Python makes this possible by making everything a pyobject. That encodes all of the type info.

Brendan Hansknecht (Feb 03 2025 at 01:19):

I think something similar will be needed for the repl flow. At any breakpoint in the repl, I should be able to query a variable for all methods it has and then call one. That call might have a type variable in it.

Brendan Hansknecht (Feb 03 2025 at 01:20):

I should be able to return from the repl to the platform at any point. Then the platform should be able to do something with the object I return (whatever type it may be).

Brendan Hansknecht (Feb 03 2025 at 01:20):

I highly suggest playing around with embedded Lua or python. There is a lot of flexibility (though often also verbosity in generating objects of specific tagged types)

Luke Boswell (Feb 04 2025 at 09:53):

I thought I'd make a PR to get some feedback
https://github.com/roc-lang/roc/pull/7575

Brendan Hansknecht (Feb 04 2025 at 16:51):

I personally wouldn't set up a libroc now. I think libroc will be tailored around the interpreter and cut out a lot of the rest of the compiler. So I don't really want random stuff going in now before we know exactly what it needs.

Brendan Hansknecht (Feb 04 2025 at 16:51):

Especially given libroc is more an experimental idea than something we know will work out.

Brendan Hansknecht (Feb 04 2025 at 16:52):

It will probably get set up naturally when trying to hook up the first platform to the interpreter, and that will probably lead first to a static config and then to a lot of learnings

Luke Boswell (Feb 04 2025 at 19:50):

@Brendan Hansknecht said

If we make a full featured lib roc, we should probably make main.zig strictly build roc via the same interfaces as libroc. That ensures they stay in sync.

That said, I'm not sold we want a full featured lib roc. I think we likely want a super small shim lib roc that only has the ability to interact with the interpreter.

Is there any reason we wouldn't want a full featured libroc?

I assumed we would implement the cli, repl, formatter, LSP etc using it.

Brendan Hansknecht (Feb 04 2025 at 20:01):

In my mind, Libroc is a c library. Those would all just be zig libraries and part of the regular code base (with only the exception being the lsp I guess).

Brendan Hansknecht (Feb 04 2025 at 20:02):

Even for the lsp, it would not use Libroc with the proposed plan. It would work like glue where the compiler is the platform

Brendan Hansknecht (Feb 04 2025 at 20:02):

And the compiler loads a shared library that is the lsp (or runs it via the interpreter)

Luke Boswell (Feb 04 2025 at 22:11):

Libroc is a c library

Even if it's just a super simple implementation. I was thinking of making a platform/host example of fully embedding roc using rust/zig.

I was thinking I could start on things like the playground, or LSP, even if most of it is stubbed out... so we can get a feeling for how it will all come together in future.

Last updated: Jul 26 2025 at 12:14 UTC