So far I've been using the old RustGlue.roc for hints as to how types are laid out, but TagUnion (Recursive _) just seems to reuse the TagUnion (NullableWrapped _) implementation, which uses pointer tagging to store the discriminant. However, it seems to still try to do this when there are more variants than can be stored in the pointer tag. I still have no idea how to figure out these data layouts from the compiler itself. Does anyone know how the Roc compiler actually lays a type like this out, or how I could find out?
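To make the pointer-tagging limit concrete, here is a minimal C sketch. It assumes heap allocations are 8-byte aligned, which leaves the low 3 bits of a pointer free, so at most 8 discriminant values fit. The helper names (tag_fits_in_pointer, tag_pointer, read_tag, clear_tag) are hypothetical and are not taken from the Roc compiler or from RustGlue.roc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: pointers are 8-byte aligned, so the low 3 bits are always zero
 * and can carry a small discriminant. */
enum { TAG_BITS = 3, TAG_MASK = (1 << TAG_BITS) - 1 };

/* True when every variant index 0..variant_count-1 fits in the spare bits. */
static bool tag_fits_in_pointer(unsigned variant_count) {
    return variant_count <= (1u << TAG_BITS);
}

/* Store the discriminant in the low bits of an aligned pointer. */
static uintptr_t tag_pointer(void *ptr, unsigned tag) {
    assert(((uintptr_t)ptr & TAG_MASK) == 0); /* pointer must be aligned */
    assert(tag <= TAG_MASK);                  /* tag must fit in 3 bits */
    return (uintptr_t)ptr | (uintptr_t)tag;
}

/* Recover the discriminant from a tagged pointer. */
static unsigned read_tag(uintptr_t tagged) {
    return (unsigned)(tagged & TAG_MASK);
}

/* Recover the original pointer by clearing the tag bits. */
static void *clear_tag(uintptr_t tagged) {
    return (void *)(tagged & ~(uintptr_t)TAG_MASK);
}
```

Under these assumptions, tag_fits_in_pointer(8) holds but tag_fits_in_pointer(18) does not, so a union with 18 variants needs its discriminant stored somewhere other than the pointer bits.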
Generate the LLVM IR for an app and then you can see it.
So write a fake or test app, preferably with a super simple platform API, no host even required, then generate the LLVM IR using roc build --no-link --emit-llvm-ir app.roc
I usually cheat and make the platform mainForHost or an effect take the type I'm inspecting as an argument so it's easy to search for in the IR and see.
Sometimes I'll just copy-paste the slab of LLVM IR for the effect or at a callsite and give it to Claude, who seems happy to parse it and tell me any details I'm interested in.
I followed your advice! I've been avoiding LLVM IR till now cause I figured it would be too opaque to read without prior experience, but getting an LLM involved really did help!
The data structure in question was:
Expr : [
    String Str,
    Concat Expr Expr,
    Tag3,
    Tag4,
    Tag5,
    Tag6,
    Tag7,
    Tag8,
    Tag9,
    Tag11,
    Tag12,
    Tag13,
    Tag14,
    Tag15,
    Tag16,
    Tag17,
    Tag18,
    Tag19,
]
ChatGPT pointed me to this: { [3 x i64], i8 }, which looks like space reserved for the payload (a RocStr, or whatever Concat's payload looks like) followed by a tag union discriminant.
It doesn't seem to be a very exact science, as I still have to guess which bytes mean what, but it certainly helps!
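One way to take some of the guesswork out of "which bytes mean what" is to mirror the LLVM type in C and let the compiler report sizes and offsets. The sketch below assumes a typical 64-bit target where uint64_t is 8-byte aligned. ExprCell and the two helper functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the LLVM struct type { [3 x i64], i8 }. */
typedef struct {
    uint64_t payload[3]; /* [3 x i64]: 24 bytes shared by all variant payloads */
    uint8_t tag;         /* i8: the discriminant, stored after the payload */
} ExprCell;

/* Total size including trailing padding added for 8-byte alignment. */
static size_t expr_cell_size(void) {
    return sizeof(ExprCell);
}

/* Byte offset of the discriminant within the cell. */
static size_t expr_cell_tag_offset(void) {
    return offsetof(ExprCell, tag);
}
```

On such a target the tag sits at byte offset 24 and the whole cell occupies 32 bytes, 7 of which are trailing padding.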
I figured if I wrote a function that matches on the tag union, it'd tell me more about the data layout of each variant:
app [main] {
    test: platform "platform.roc",
}

main = \expr ->
    when expr is
        Concat _e1 e2 -> e2
        _ -> crash "uh oh"

define internal fastcc {} @_mainForHost_c610e85212d0697cb161d4ba431ba63f273feee7dcb7927c9ff5d74ae6cbfa3() !dbg !12 {
entry:
  ret {} zeroinitializer, !dbg !17
}
but this function appears to be a no-op. There's no sign of the "uh oh" string anywhere either. Taking out the crash or moving the function to platform.roc doesn't appear to make a difference.
Replacing the string with a longer string that doesn't get SSO'd, or returning that string from the function rather than crashing with it, doesn't make it appear in the LLVM IR either.
But having main be a value with a long string in it rather than a function does seem to make the string appear.
I have a lot to learn here, it seems.
Is that the right mainForHost function in the IR? There are usually a few generated. Can you share the full IR in a gist? Also, what is the platform API that you're using here for main?
As in, what does the platform do with the main provided by the app, and how does that relate back to mainForHost or the entrypoint into Roc?
@Sven van Caem here's my attempt at that Expr showing my method of using the free Claude in Zed Assistant
https://gist.github.com/lukewilliamboswell/0adb8b1e95e4bf44c65a93b634713812
Challenging Claude a bit more on this one, and he's sounding pretty confident. :smiley:
All Expr values are handled through pointers (note all the ptr return types)
When the String variant is created (in previously shown IR), it:
When the Concat variant is created (in previously shown IR), it:
The exposed size function confirms that Exprs are passed by pointer:
define i64 @roc__mainForHost_1_exposed_size() {
  ret i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64)
}
This returns 8 (on 64-bit), confirming we're dealing with pointer-sized values in the interface.
And notably:
define void @roc__mainForHost_1_exposed_generic(ptr %0)
Shows that the exposed interface takes/returns Expr values as pointers, which matches our C representation where we'd pass Expr* around rather than the full 32-byte structure directly.
So yes, this IR fully supports our earlier C representation:
typedef struct Expr {
    union {
        RocStr string;
        struct {
            struct Expr* left;
            struct Expr* right;
        } concat;
    } payload;
    uint8_t tag;
    uint8_t padding[7];
} Expr;
Then digging around in gen_llvm, specifically in crates/compiler/gen_llvm/src/llvm/convert.rs, it looks like struct_type_from_union_layout is related... we make a struct layout from the tag union, and then when we generate the LLVM we cast that to a ptr. So I think this is probably also a useful reference for understanding the different kinds of layouts with tag unions.
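If the C representation above is right, a host could read the discriminant through the pointer and follow a Concat child roughly as sketched here. This is an illustration only: the RocStr field names are an assumption (three machine words), the numeric tag value for Concat is passed in as a parameter because the thread does not establish how Roc numbers the variants, and expr_concat_right is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of RocStr: three machine words (24 bytes on 64-bit). */
typedef struct {
    uint8_t *bytes;
    size_t len;
    size_t capacity;
} RocStr;

/* The C representation proposed in the thread. */
typedef struct Expr {
    union {
        RocStr string;
        struct {
            struct Expr *left;
            struct Expr *right;
        } concat;
    } payload;
    uint8_t tag;
    uint8_t padding[7];
} Expr;

/* Return the right child when the cell is a Concat, otherwise NULL.
 * concat_tag is whatever discriminant value the compiler assigns to Concat;
 * that value is an unknown here, so the caller must supply it. */
static const Expr *expr_concat_right(const Expr *e, uint8_t concat_tag) {
    if (e == NULL || e->tag != concat_tag) {
        return NULL;
    }
    return e->payload.concat.right;
}
```

This mirrors the `Concat _e1 e2 -> e2` branch of the test app: check the tag byte stored after the payload, then read the second pointer of the payload.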
Oh wow, this is really useful!
The platform file I was using looks like this:
platform "test-platform"
    requires {} { main : _ }
    exposes []
    packages {}
    imports []
    provides [mainForHost]

Expr : [
    String Str,
    Concat Expr Expr,
    Tag3,
    Tag4,
    Tag5,
    Tag6,
    Tag7,
    Tag8,
    Tag9,
    Tag11,
    Tag12,
    Tag13,
    Tag14,
    Tag15,
    Tag16,
    Tag17,
    Tag18,
    Tag19,
]

mainForHost : Expr -> Expr
mainForHost = main
And here's the generated llvm
Luke Boswell said:
Sven van Caem here's my attempt at that Expr showing my method of using the free Claude in Zed Assistant
https://gist.github.com/lukewilliamboswell/0adb8b1e95e4bf44c65a93b634713812
I wonder how you got it to generate what look like constructors for the Concat and String variants? I feel like that's what must've allowed Claude to figure out the layouts of those.
You can see the way I set up the platform.roc in that gist. I made a few top-level functions to do exactly that and isolate the Expr values I wanted to see being constructed.
Oh sweet! So it'll just generate the top level functions so long as they're used
Thanks! This helps a lot
Glad to help.
Last updated: Jul 06 2025 at 12:14 UTC