When exactly can tag unions merge? · beginners

Since the tag union [Green, Blue] is a subset of [Red, Green, Blue], I expected the following code to work, but the compiler complains. Is this working as intended? If so, what's the rationale?

f: U8 -> [Red, Green, Blue]
f = |n| {
    match n {
        0 => Red
        _ => g(n)
    }
}

g: U8 -> [Green, Blue]
g = |n| {
    match n {
        1 => Green
        _ => Blue
    }
}

main! = |_args| {
    dbg f(2)
    Ok({})
}

-- TYPE MISMATCH ---------------------------------

The second branch of this match does not match the previous branch :
   ┌─ /var/folders/74/7_g_vgn57nzgjh9mw5smp6rc0000gn/T/roc/debug-f081a87b/yFTm8aktdFZvhh9IG3uovOGfNvuQlGPk/main.roc:30:14
   │
30 │         _ => g(n)
   │              ^^^^

The second branch is:

    [Blue, Green]

But the previous branch results in:

    [Blue, Green, Red]

All branches in a match must have compatible types.
Note: You can wrap branches values in a tag to make them compatible.
To learn about tags, see <https://www.roc-lang.org/tutorial#tags>

Found 1 error(s) and 0 warning(s) for /var/folders/74/7_g_vgn57nzgjh9mw5smp6rc0000gn/T/roc/debug-f081a87b/yFTm8aktdFZvhh9IG3uovOGfNvuQlGPk/main.roc.

Of course the problem disappears if I make both unions open ([Red, Green, Blue, ..] and [Green, Blue, ..]), but I don't see why I need to do that in this case.
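For reference, here is a sketch of the open-union variant described above. It is the original code with only the two return annotations changed; it has not been re-checked against a specific compiler build, so treat it as an illustration rather than a verified program.

```roc
# Same definitions as before; only the return annotations are now open (note the trailing `..`).
f: U8 -> [Red, Green, Blue, ..]
f = |n| {
    match n {
        0 => Red
        _ => g(n)
    }
}

g: U8 -> [Green, Blue, ..]
g = |n| {
    match n {
        1 => Green
        _ => Blue
    }
}
```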

Brendan Hansknecht (Jun 14 2026 at 07:36):

Off the top of my head, I would expect [Green, Blue] to need to be open, but not the other one.

Aurélien Geron (Jun 14 2026 at 07:40):

Plus, I'm not sure why g() would have to return an open union. When writing this function, I might not know who is going to use it and how. It's weird to have to make it open just in case someone needs to use it and return a superset.

Aurélien Geron (Jun 14 2026 at 07:44):

f: U8 -> [Red, Green, Blue]
f = |n| {
    match n {
        0 => Red
        _ => {
            match g(n) {
                Green => Green
                Blue => Blue
            }
        }
    }
}

f: U8 -> [Red, Green, Blue]
f = |n| {
    match n {
        0 => Red
        _ => {
            match g(n) {
                x => x
            }
        }
    }
}

It's as if the Green and Blue returned by g are not the same as the Green and Blue returned by f. Pretty surprising tbh.

Brendan Hansknecht (Jun 14 2026 at 07:44):

Brendan Hansknecht (Jun 14 2026 at 07:45):

Brendan Hansknecht (Jun 14 2026 at 07:46):

Your odd example is not surprising to me. By being explicit, you decoupled the types. In x => x, both sides must be the same type. In Blue => Blue, both sides can be totally different and unrelated types.

Brendan Hansknecht (Jun 14 2026 at 07:46):

Aurélien Geron (Jun 14 2026 at 07:55):

I can see why [A, B] and [B, C] should be incompatible types, because neither is a subset of the other. But I don't see why [A, B] should not be compatible with [A, B, C] (but not the other way round). I guess I'm saying that the compiler should automatically cast [A, B] to [A, B, C] when needed. Other than the complexity of implementing that, is there a downside I'm missing?

Aurélien Geron (Jun 14 2026 at 08:26):

After more thought, it's no different than the fact that U8 doesn't automatically get turned into U64 when needed.

But I can write a.to_u64(), whereas I don't see an easy way to promote a tag union to a superset. Perhaps tag unions could support something like to_superset(), so I could write g().to_superset() rather than match g() { Green => Green, Blue => Blue }.
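Until something like that exists, the explicit match can at least be factored into a small helper. This is a minimal sketch, assuming the explicit Green => Green form type-checks as discussed earlier in the thread; the name `widen` is hypothetical and not part of Roc or its standard library.

```roc
# Hypothetical helper: re-tags each value so the result belongs to the larger union.
widen: [Green, Blue] -> [Red, Green, Blue]
widen = |color| {
    match color {
        Green => Green
        Blue => Blue
    }
}
```

With this in place, the second branch of f could be written as `_ => widen(g(n))`.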

Aurélien Geron (Jun 14 2026 at 09:16):

main! = |_args| {
    f : [Red, Green, ..]
    f = Red

    g : [Red, Green, Blue, ..]
    g = f

    dbg g

    Ok({})
}

Luke Boswell (Jun 14 2026 at 09:38):

Is this the polarity thing that @Jared Ramirez mentioned isn't implemented yet? it sounds familiar

Jared Ramirez (Jun 14 2026 at 15:10):

in this case both are closed unions, so the type is saying “i contain exactly [A, B]”, so it can't be auto-extended to also contain C. if the type was [A, B, ..], then it should be able to be extended

this case is complicated, and it’s actually not related to polarity; it’s about what’s called generalization/instantiation. when a type is generalized, each place that uses it gets a fresh copy in which all the lowercase letters (type variables) in the type signature are replaced by fresh, local copies that are inferred at the call site. note that f: [A, B, ..] becomes [A, B, ..c] during compilation, so if f were instantiated, c would be instantiated to a “flex” variable.

but here’s the wrinkle: in the zig compiler, we decided ONLY to generalize/instantiate functions! so here f is NOT generalized. because of this, the implicit c is not generalized, and stays a “rigid” var. and rigid vars cannot be extended, so the compiler rejects g = f

this raises a broader question about how to handle generalization in cases like this. we went down this path of only generalizing functions because if you generalize everything, you can end up with some surprising performance issues (see https://github.com/seanpm2001/Roc-Lang_RFCs/blob/main/0010-let-generalization-lets-not.md). but the above not working is def confusing

Richard Feldman (Jun 14 2026 at 15:14):

I have thought about expanding the generalization strategy to allow some literals that we know are "harmless" (won't cause perf surprises) such as number literals, tags, records, tuples

Richard Feldman (Jun 14 2026 at 15:15):

the original proposal suggested number literals but I figured in the new compiler we should try the bare minimum (just lambdas) and see if there was demand in practice for the others

Richard Feldman (Jun 14 2026 at 15:16):

there's still the potential surprise of "if you change it to f = dbg Red then it will debug-print Red twice" (once for f and once for g)

Richard Feldman (Jun 14 2026 at 15:17):

but arguably that's good because it accurately informs you about what's happening at runtime

Brendan Hansknecht (Jun 14 2026 at 15:22):

This has been discussed a lot in the past. I wish I remembered it all. I think most people found that once functions assumed open tags for outputs, most of the problems went away and it was not too bad. I guess we could generically assume open unions, but I really want a compiler error to be reported if someone accidentally adds tags to my union.

Jared Ramirez (Jun 14 2026 at 15:23):

yeah, that would be reasonable i think. but for complex tags with polymorphic payloads, the perf issue may still be possible

then also i was thinking as i typed out the above that maybe if there’s a type sig we should always generalize, since from the user's perspective it's syntactically the same as defining a function, which does generalize. but maybe we still never generalize for let defs without annos.

Brendan Hansknecht (Jun 14 2026 at 15:24):

Yeah, could definitely explode memory. The one saving grace is that at least they are unified at compile time and it is not a cast that could be shifting around tons of data.

Richard Feldman (Jun 14 2026 at 16:33):

Richard Feldman (Jun 14 2026 at 16:34):

I guess I'm just wondering if this is an actual perf footgun vs something that could happen theoretically but actually wouldn't in practice

Richard Feldman (Jun 14 2026 at 16:34):

like calling arbitrary functions can obviously explode in practice, like the example in Ayaz's writeup

Richard Feldman (Jun 14 2026 at 16:35):

Jared Ramirez (Jun 14 2026 at 18:52):

f : [Red, Green, ..]

thinking out loud here: in the case where we don't generalize (i.e. binding a let value like f), maybe instead of desugaring [A, B, ..] as [A, B, ..c] (rigid), we desugar it as [A, B, .._] (weak). that way, it can be refined based on its usage. but since functions are generalized & are instantiated at call sites anyway, they keep the existing semantics

Jared Ramirez (Jun 14 2026 at 18:54):

    f : [Red, Green, ..]
    f = Red

    g : [Red, Green, Blue, ..]
    g = f

Richard Feldman (Jun 14 2026 at 18:54):

Richard Feldman (Jun 14 2026 at 18:55):

he talked about the problem with them being that they can break cross-module caching, which is a great point, and his idea was to prohibit exporting them

Richard Feldman (Jun 14 2026 at 18:55):

but I think an even easier idea is just to not generalize them at the top level (just like today)

Richard Feldman (Jun 14 2026 at 18:56):

which prevents the exposing problem while making cases like this one work, and also while keeping generalization rules decoupled from the concept of module boundaries

Jared Ramirez (Jun 14 2026 at 18:57):

so here f would generalize, but the same def at top-level would not? would the top-level version error if you tried to annotate it with a rigid/..?

Richard Feldman (Jun 14 2026 at 19:18):

Richard Feldman (Jun 14 2026 at 19:20):

maybe the rule could be that top-level values can't have unbound type variables unless they're functions? :thinking:

Richard Feldman (Jun 14 2026 at 19:20):

Aurélien Geron (Jun 14 2026 at 19:45):

Richard Feldman (Jun 14 2026 at 21:26):

Aurélien Geron (Jun 14 2026 at 21:50):

Brendan Hansknecht (Jun 15 2026 at 06:15):

For most users and most cases probably not. Definitely hit some pain in rocci bird due to the equivalent when we still had tasks. Each capture was a slightly different layout and constantly was being shuffled around

Brendan Hansknecht (Jun 15 2026 at 06:16):

And the question here is: are stricter types that give stronger guarantees more ergonomic, or are more flexible types?

Brendan Hansknecht (Jun 15 2026 at 06:17):

I think I'm still in the camp that most of the time, stricter is the preferred answer, and if you don't want stricter, you can just avoid adding type annotations and it will infer something relaxed

Brendan Hansknecht (Jun 15 2026 at 06:18):

That said, I totally see the reverse argument of loose by default, and if you want stricter, define a nominal type.
