Stream: compiler development

Topic: `TagUnion (Recursive _)` memory layout


Sven van Caem (Nov 06 2024 at 10:25):

So far I've been using the old RustGlue.roc for hints as to how types are laid out, but `TagUnion (Recursive _)` just seems to reuse the `TagUnion (NullableWrapped _)` implementation, which uses pointer tagging to store the discriminant. However, it seems to still try to do this when there are more variants than can be stored in the pointer tag. I still have no idea how to figure out these data layouts from the compiler itself. Does anyone know how the Roc compiler actually lays out a type like this, or how I could find out?
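For reference, here is a generic C sketch of pointer tagging (not Roc's implementation). When heap allocations are 8-byte aligned, the low 3 bits of every pointer are always zero, so they can carry a small discriminant. The helper names and the 3-bit budget are assumptions for illustration; the point is that only a handful of tag values fit, so a union with more variants than that has to store its discriminant somewhere else.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: allocations are 8-byte aligned, so the low 3 bits are free. */
#define TAG_BITS 3u
#define TAG_MASK ((uintptr_t)((1u << TAG_BITS) - 1u))
#define MAX_POINTER_TAGS (1u << TAG_BITS)

/* Packs a small tag into the unused low bits of an aligned pointer. */
static uintptr_t tag_pointer(void *ptr, unsigned tag) {
    assert(((uintptr_t)ptr & TAG_MASK) == 0); /* pointer must be 8-byte aligned */
    assert(tag < MAX_POINTER_TAGS);           /* tag must fit in the free bits */
    return (uintptr_t)ptr | tag;
}

/* Reads the tag back out of a tagged pointer. */
static unsigned pointer_tag(uintptr_t tagged) {
    return (unsigned)(tagged & TAG_MASK);
}

/* Clears the tag bits to recover the original pointer. */
static void *untag_pointer(uintptr_t tagged) {
    return (void *)(tagged & ~TAG_MASK);
}

/* A union can only keep its discriminant in the pointer if every
   variant gets a distinct value within the free bits. */
static int fits_in_pointer_tag(unsigned variant_count) {
    return variant_count <= MAX_POINTER_TAGS;
}
```

With 3 free bits there are only 8 possible tag values, so a union with 18 variants cannot keep its discriminant in the pointer alone.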

Luke Boswell (Nov 06 2024 at 10:49):

Generate the LLVM IR for an app and then you can see it.

Luke Boswell (Nov 06 2024 at 10:50):

So write a fake or test app, preferably with a super simple platform API (no host is even required), then generate the LLVM IR using `roc build --no-link --emit-llvm-ir app.roc`.

Luke Boswell (Nov 06 2024 at 10:51):

I usually cheat and make the platform's `mainForHost` or an effect take the type I'm inspecting as an argument, so it's easy to search for in the IR.

Luke Boswell (Nov 06 2024 at 10:53):

Sometimes I'll just copy-paste the slab of LLVM IR for the effect, or at a call site, and give it to Claude, who seems happy to parse it and tell me any details I'm interested in.

Sven van Caem (Nov 07 2024 at 08:22):

I followed your advice! I've been avoiding LLVM IR till now cause I figured it would be too opaque to read without prior experience, but getting an LLM involved really did help!

Sven van Caem (Nov 07 2024 at 08:25):

The data structure in question was:

Expr : [
    String Str,
    Concat Expr Expr,
    Tag3,
    Tag4,
    Tag5,
    Tag6,
    Tag7,
    Tag8,
    Tag9,
    Tag11,
    Tag12,
    Tag13,
    Tag14,
    Tag15,
    Tag16,
    Tag17,
    Tag18,
    Tag19,
]

ChatGPT pointed me to this: `{ [3 x i64], i8 }`, which looks like 24 bytes of space reserved for a RocStr (or whatever Concat's payload looks like), followed by a one-byte tag union discriminant.
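As a rough C sketch of that reading, assuming a 64-bit target where `int64_t` is 8-byte aligned, and using a hypothetical three-word `RocStr` stand-in (pointer, length, capacity; the field names are illustrative), the LLVM type lines up like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RocStr: three machine words. The field names are
   illustrative; only the size matters for this comparison. */
typedef struct {
    char *bytes;
    size_t len;
    size_t capacity;
} RocStr;

/* C mirror of the LLVM struct type { [3 x i64], i8 }. */
typedef struct {
    int64_t payload[3]; /* [3 x i64]: space for the largest variant payload */
    uint8_t tag;        /* i8: the tag union discriminant */
} ExprLlvm;
```

On such a target the `RocStr` exactly fills the 24-byte `[3 x i64]` area, the discriminant sits at offset 24, and 7 bytes of trailing padding bring the whole struct to 32 bytes.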

Sven van Caem (Nov 07 2024 at 08:25):

It doesn't seem to be a very exact science, as I still have to guess which bytes mean what, but it certainly helps!

Sven van Caem (Nov 07 2024 at 08:54):

I figured if I wrote a function that matches on the tag union, it'd tell me more about the data layout of each variant:

app [main] {
    test: platform "platform.roc",
}

main = \expr ->
    when expr is
        Concat _e1 e2 -> e2
        _ -> crash "uh oh"

define internal fastcc {} @_mainForHost_c610e85212d0697cb161d4ba431ba63f273feee7dcb7927c9ff5d74ae6cbfa3() !dbg !12 {
entry:
  ret {} zeroinitializer, !dbg !17
}

But this function appears to be a no-op: there's no sign of the "uh oh" string anywhere either. Taking out the crash or moving the function to platform.roc doesn't seem to make a difference.

Sven van Caem (Nov 07 2024 at 09:57):

Replacing the string with a longer string that doesn't get SSO'd (small-string optimized), or returning that string from the function rather than crashing with it, doesn't make it appear in the LLVM IR either.

Sven van Caem (Nov 07 2024 at 09:58):

But having main be a value with a long string in it rather than a function does seem to make the string appear

Sven van Caem (Nov 07 2024 at 09:59):

I have a lot to learn here, it seems.

Luke Boswell (Nov 07 2024 at 11:01):

Is that the right mainForHost function in the IR? There are usually a few generated. Can you share the full IR in a gist? Also, what is the platform API that you're using here for main?

Luke Boswell (Nov 07 2024 at 11:02):

As in, what does the platform do with the main provided by the app, and how does that relate back to mainForHost or the entry point into Roc?

Luke Boswell (Nov 07 2024 at 11:23):

@Sven van Caem here's my attempt at that Expr showing my method of using the free Claude in Zed Assistant

https://gist.github.com/lukewilliamboswell/0adb8b1e95e4bf44c65a93b634713812

Luke Boswell (Nov 07 2024 at 11:40):

Challenging Claude a bit more on this one, and he's sounding pretty confident. :smiley:

  1. All Expr values are handled through pointers (note all the ptr return types)

  2. When the String variant is created (in previously shown IR), it:

  3. When the Concat variant is created (in previously shown IR), it:

  4. The exposed size function confirms that Exprs are passed by pointer:

define i64 @roc__mainForHost_1_exposed_size() {
  ret i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64)
}

This returns 8 (on 64-bit), confirming we're dealing with pointer-sized values in the interface.
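The `ptrtoint (getelementptr (ptr, ptr null, i32 1))` expression is a common LLVM idiom for "size of one `ptr`": it takes the address of element 1 in a pointer array that starts at address 0. A small C sketch of the same quantity, using `offsetof` instead of doing arithmetic on a null pointer (the `PtrArray2` type and `PTR_SIZE_VIA_GEP` macro are made up for illustration):

```c
#include <stddef.h>

/* Two consecutive pointer slots, like the start of an LLVM array of `ptr`. */
typedef struct {
    void *slots[2];
} PtrArray2;

/* Byte offset of element 1 == size of element 0 == size of one pointer.
   This is the quantity that `getelementptr (ptr, ptr null, i32 1)` followed
   by `ptrtoint` computes in the IR. */
#define PTR_SIZE_VIA_GEP offsetof(PtrArray2, slots[1])
```

On a 64-bit target this evaluates to 8, consistent with the exposed interface handling an `Expr` through a pointer rather than as the full struct.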

And notably:

define void @roc__mainForHost_1_exposed_generic(ptr %0)

This shows that the exposed interface takes/returns Expr values as pointers, which matches our C representation where we'd pass Expr* around rather than the full 32-byte structure directly.

So yes, this IR fully supports our earlier C representation:

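#include <stdint.h>

/* RocStr (Roc's string struct, three words) is assumed to be defined elsewhere. */
typedef struct Expr {
    union {
        RocStr string;              /* String Str */
        struct {
            struct Expr* left;
            struct Expr* right;
        } concat;                   /* Concat Expr Expr */
    } payload;                      /* 24 bytes on 64-bit */
    uint8_t tag;                    /* discriminant */
    uint8_t padding[7];             /* pad to 32 bytes */
} Expr;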
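To see how that representation would be used, here is a C sketch that builds `Concat (String "a") (String "b")` and does the equivalent of the earlier `when expr is Concat _e1 e2 -> e2` match. It repeats the struct so it compiles on its own. The `RocStr` field names, the `TAG_*` discriminant values and the helper functions are hypothetical, and the sketch ignores Roc's reference counting and allocation scheme entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical RocStr stand-in (three words); field names are illustrative. */
typedef struct {
    char *bytes;
    size_t len;
    size_t capacity;
} RocStr;

/* The representation proposed above, repeated so this compiles on its own. */
typedef struct Expr {
    union {
        RocStr string;
        struct {
            struct Expr *left;
            struct Expr *right;
        } concat;
    } payload;
    uint8_t tag;
    uint8_t padding[7];
} Expr;

/* Hypothetical discriminant values; the compiler assigns the real ones. */
enum { TAG_CONCAT = 0, TAG_STRING = 1 };

/* Builds a String node on the heap (capacity is set to len for simplicity). */
static Expr *expr_string(char *bytes, size_t len) {
    Expr *e = calloc(1, sizeof *e);
    if (e != NULL) {
        e->payload.string = (RocStr){ bytes, len, len };
        e->tag = TAG_STRING;
    }
    return e;
}

/* Builds a Concat node whose children are held by pointer. */
static Expr *expr_concat(Expr *left, Expr *right) {
    Expr *e = calloc(1, sizeof *e);
    if (e != NULL) {
        e->payload.concat.left = left;
        e->payload.concat.right = right;
        e->tag = TAG_CONCAT;
    }
    return e;
}

/* Rough C analogue of: when expr is Concat _e1 e2 -> e2; _ -> crash "uh oh" */
static Expr *concat_right(const Expr *e) {
    assert(e != NULL);
    switch (e->tag) {
    case TAG_CONCAT:
        return e->payload.concat.right;
    default:
        abort(); /* stands in for crash "uh oh" */
    }
}
```

In a real host you would receive `Expr *` values from the Roc side rather than allocating them yourself, since Roc manages those heap allocations.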

Luke Boswell (Nov 07 2024 at 11:52):

Then, digging around in gen_llvm, specifically in `crates/compiler/gen_llvm/src/llvm/convert.rs`, it looks like `struct_type_from_union_layout` is related: we make a struct layout from the tag union, and then when we generate the LLVM we cast that to a `ptr`. So I think this is probably also a useful reference for understanding the different kinds of tag union layouts.
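The general idea of turning a tag union layout into a flat struct type can be sketched like this: reserve room for the largest payload, place a one-byte discriminant after it, and round the total up to the strictest alignment. This is a simplified illustration, not the compiler's `struct_type_from_union_layout`; the type and function names are hypothetical, and the payload sizes in the usage note assume a 64-bit target (24-byte `RocStr`, 8-byte pointers).

```c
#include <assert.h>
#include <stddef.h>

/* Size and alignment of one variant's payload, in bytes. */
typedef struct {
    size_t size;
    size_t align;
} PayloadLayout;

/* Resulting flat layout: a payload area followed by a 1-byte discriminant. */
typedef struct {
    size_t payload_size;
    size_t tag_offset;
    size_t total_size;
    size_t align;
} UnionLayout;

/* Rounds n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical helper: computes the flat struct layout for a tag union. */
static UnionLayout union_layout(const PayloadLayout *variants, size_t count) {
    size_t max_size = 0;
    size_t max_align = 1; /* the 1-byte discriminant needs alignment 1 */
    for (size_t i = 0; i < count; i++) {
        if (variants[i].size > max_size) max_size = variants[i].size;
        if (variants[i].align > max_align) max_align = variants[i].align;
    }
    UnionLayout out;
    /* Payload area in whole alignment units, like the [N x i64] array in the IR. */
    out.payload_size = round_up(max_size, max_align);
    out.tag_offset = out.payload_size;
    out.total_size = round_up(out.tag_offset + 1, max_align);
    out.align = max_align;
    return out;
}
```

For `Expr`, feeding in `String` (24 bytes, 8-aligned), `Concat` (two pointers: 16 bytes, 8-aligned) and the nullary tags (0 bytes) gives a 24-byte payload, the tag at offset 24 and a 32-byte total, which matches `{ [3 x i64], i8 }`.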

Sven van Caem (Nov 07 2024 at 13:26):

Oh wow, this is really useful!

Sven van Caem (Nov 07 2024 at 13:26):

The platform file I was using looks like this:

platform "test-platform"
    requires {} { main : _ }
    exposes []
    packages {}
    imports []
    provides [mainForHost]

Expr : [
    String Str,
    Concat Expr Expr,
    Tag3,
    Tag4,
    Tag5,
    Tag6,
    Tag7,
    Tag8,
    Tag9,
    Tag11,
    Tag12,
    Tag13,
    Tag14,
    Tag15,
    Tag16,
    Tag17,
    Tag18,
    Tag19,
]

mainForHost : Expr -> Expr
mainForHost = main

Sven van Caem (Nov 07 2024 at 13:27):

And here's the generated LLVM IR:

Sven van Caem (Nov 07 2024 at 13:28):

Luke Boswell said:

Sven van Caem here's my attempt at that Expr showing my method of using the free Claude in Zed Assistant

https://gist.github.com/lukewilliamboswell/0adb8b1e95e4bf44c65a93b634713812

I wonder how you got it to generate what look like constructors for the Concat and String variants? I feel like that's what must've allowed Claude to figure out the layouts of those

Luke Boswell (Nov 07 2024 at 17:16):

You can see the way I set up the platform.roc in that gist. I made a few top-level functions to do exactly that and isolate the Expr values I wanted to see being constructed.

Sven van Caem (Nov 07 2024 at 20:44):

Oh sweet! So it'll just generate the top-level functions as long as they're used.

Sven van Caem (Nov 07 2024 at 20:44):

Thanks! This helps a lot

Luke Boswell (Nov 07 2024 at 20:48):

Glad to help.


Last updated: Jul 06 2025 at 12:14 UTC