Some time ago on some GitHub issue I can't find now (@Richard Feldman and @Folkert de Vries may know), we noted the behavior of the following code:
» \x -> when x is
… A -> B
… C -> D
… a -> a
<function> : [ A, B, C, D ]a -> [ A, B, C, D ]a
With a certain perspective, the ideal type for this function would be [A, C]a -> [B, D]a. In order to do this, at the branch a -> a, we need to "remove" the variants A and C from the set of tags that might be in the type of a. This kind of typing-based-on-control-flow is generally known as flow typing, and the removal of types that cannot appear in a branch is called "narrowing".
Here are two more (somewhat contrived) examples where type narrowing might be useful. Suppose that error is a function that immediately yields a runtime error and stops the active thread.
printAddError = \m, n, err ->
when err is
...
Overflow | Underflow ->
overUnder = when err is
Overflow -> "over"
Underflow -> "under"
# Note that we need this branch because `err` is not narrowed!
_ -> error "unreachable"
"Adding \(m) and \(n) results in a value that \(overUnder)flows the type"
...
retryWithDelay : [ RateLimited Request Delay ] -> Response
retryUnauthorized : [ RateLimited Request ] -> Response
when response is
Ok body -> printBody body
# The below two would result in a type error today because `reponse` is not
# narrowed in the respective branches
RateLimited _ _ -> retryWithDelay reponse
NotAuthorized _ -> retryUnauthorized response
With type narrowing in when branches, the issues in both examples above would be resolved. So the proposal here has two components:
when x is
A -> x # x has type [A] here
...
f : [A, B, C, D]a -> _
f = \x -> when x is
A -> x # x has type [A] here
B -> .. # x has type [B] here
o -> o # o has type [C, D]a here
This idea is not new; TypeScript does it, and Java byte code also does a version of it. It turns out it's tricky to do this in a language with subtyping and type inference, but it's doable - and in Roc it would be even more tractable, because for us what it means to "subtract" a type is very easy to define.
Anyway this is a tangent. So why don't other languages with pattern matching usually support this? I think the biggest reason is the performance cost - by changing the type of a variable in a branch, Roc would also have to change its runtime representation; that is, its value. In the last example above, when x goes from [A,B,C,D]a to [B], we must change the value referenced by x to have the layout of the type [B] (which is different from [A,B,C,D]a)! This is basically involves a few memory access and memory stores, not any data modification. This means the CPU/RAM usage isn't a lot, but it's something.
I'm curious what others think of this and if you all think it's worth implementing. The way I see it, there are three options:
when expressionswhen, but that supports type narrowing. For example, when <e> is could be narrow <e> to. This means the user gets to choose whether they want to pay the runtime cost of type narrowing, but it also means there are two very similar concepts in the language that could get confusing to look at.Personally I like (3) the most.
hm, I wonder what the workarounds look like in these cases?
e.g. printAddError could use a helper function instead:
printAddError = \m, n, err ->
when err is
...
Overflow -> toMessage m n "over"
Underflow -> toMessage m n "under"
...
toMessage = \m, n, overUnder ->
"Adding \(m) and \(n) results in a value that \(overUnder)flows the type"
personally I prefer the way this reads to the Overflow | Underflow -> ... approach, and I'd also expect it to run faster because it has 1 when instead of 2 - so in this case it seems like it would be better to encourage the helper function style!
in the retryWithDelay example, I'd make the two helper functions take normal arguments instead of a single-tag union:
retryWithDelay : Request, Delay -> Response
retryUnauthorized : Request -> Response
when response is
Ok body -> printBody body
# The below two would result in a type error today because `reponse` is not
# narrowed in the respective branches
RateLimited req delay -> retryWithDelay req delay
NotAuthorized req -> retryUnauthorized req
I personally prefer this style - it's a little more verbose to call, but the types are more concise, and I'd feel weird manually calling retryWithDelay (RateLimited req delay) if I ever wanted to call it from somewhere else - e.g. from a test :sweat_smile:
I forgot to follow up on this until @Tommy Graves reminded me yesterday :) Here is a more complicated, realistic example of where having type narrowing may be useful, in Roc's codebase: https://github.com/rtfeldman/roc/blob/5374f6d6b83e6665e363750d6ea251b3441514ac/compiler/gen_llvm/src/llvm/build.rs#L5682-L5716, and in particular the call of https://github.com/rtfeldman/roc/blob/5374f6d6b83e6665e363750d6ea251b3441514ac/compiler/gen_llvm/src/llvm/build.rs#L7065-L7138 from that function
The workarounds still apply, they're just bulky to write ): and not narrowing the type is unpleasant because you must manually expect that you covered all the cases you expected to
I also thought about this some more and I've realized the performance issue need not be a concern. Indeed we don't need to do a type conversion at all. During type checking and exhaustiveness checking we can do the type narrowing on the surface syntax, so that the programmer is guaranteed the program is type-correct, but during code generation (in particular monomorphization) we can use the "shadow" type representation which is the larger, unnarrowed type. This is still sound because we'll have already proved the unused branches of that type are never inhabited, and we avoid having to do any conversion.
wow that's awesome!
so that means this is the clear winner then, yeah?
- Support this in all when expressions
in other words, we can make it Just Work the way people expect
Ayaz Hafiz said:
I also thought about this some more and I've realized the performance issue need not be a concern. Indeed we don't need to do a type conversion at all. During type checking and exhaustiveness checking we can do the type narrowing on the surface syntax, so that the programmer is guaranteed the program is type-correct, but during code generation (in particular monomorphization) we can use the "shadow" type representation which is the larger, unnarrowed type. This is still sound because we'll have already proved the unused branches of that type are never inhabited, and we avoid having to do any conversion.
I totally forgot about this lol, I need to think about how I came to this conclusion. I'm not sure it's correct :sweat_smile: Perhaps the poor wisdom of me a year and half ago
My intuition (now) is that the old message is not possible unless we also embrace subtyping but that is a huge change
yeah one counterexample to that old message is suppose that main is defined as follows and escapes to the host
main : [A, B] -> [A]
main = \t ->
when t is
A -> t
_ -> A
having t shadow and be compiled as [A,B] in the output position cannot work without subtyping semantics and the appropriate compilation model change
this extends to non-host-exposed functions as well, imagine that the branch A -> t is instead A -> f t where f : [A] -> [A]. Now we again need subtyping semantics for compilation, or else compile f as we do polymorphic functions (even though f was not declared as polymorphic, it is relative to this use site)
Maybe a reasonable idea is to have semantics like
when ... is
A -> ... # no narrows in this branch
other -> ... # if `other` is referenced in this branch, it is narrowed at runtime, otherwise it is unaffected
since presumably, if you are naming a catchall branch like this, you are intending to opt in to narrowing. So this has no runtime effect for anything where the compiler does not assume you want to narrow (and it will type check that you indeed wanted to narrow, because otherwise you'll get a type error)
Fundamentally, changing from [A, B SomeData, ...] to [B SomeData], would just be changing a tag number, correct? CauseB SomeData with all of it's data is guaranteed to fit into [A, B SomeData, ...]. Of course, you need to copy to update the tag number if the orginal var is ever used elsewhere.
What do you mean by tag number?
As a simple example, [A Str, B Str] is represented equivalently to the struct { tag: 1 byte, payload: RocStr}. But [A Str] is represented as "just" RocStr. Since there is a difference in the size of the runtime representation there is a runtime conversion to do.
Oh yeah, single tag means no index at all.
even in the case where there is a tag index, if the narrowed union type has a different payload size, the conversion is necessary
so how much runtime overhead are we talking about here, in practice?
that said (I think) the conversion will mostly be trivial in cost since it’s only moving top-level wrappers around
yeah
yeah i think almost zero Richard
my default feeling here is that we should have it Just Work even though there's nonzero runtime cost
like one load and store
because it's so low that it's not worth making a distinction in userspace
Yeah, would just be moving some bytes on the stack.
And tags generally should not be too large.
i guess you may have to update some refcounts which is random loads and that would be the slowest part.
So only bad if you have a list of strings in a tag. Then it may update every refcount in the entire list.
Hmm..
Actually, i guess you have to do that either way if you keep around the original var. So that isn't special to this case
So yeah, small in terms of extra per cost.
I think it’s also important to note that I believe this to be a zero-cost abstraction
because if you indeed wanted to narrow you would have to do all this anyway
and if you didn’t then you will end up passing a smaller type to a larger context and the compiler will error to you because the narrowed type will be closed.
Is the extra cost only incurred when you refine to a union with a single tag? I wouldn’t expect refining to a single tag to be as common
no, it would be any refinement (at least this is my proposal)
Gotcha
Would both of these work?
a : [ A, B ,C]
when a is
A -> B
x -> x
# is the final output type here a [B, C] or an [A, B, C].
a : [ A I32, B ,C]
when a is
A val -> A (val + 1)
x -> x
for the first one it would be [B,C]
the second we should probably discuss but we could make it [A I32, B, C], or a type error
I think it would be quite surprising if the second one was an error.
Like, in the second we might assume that the bottom branch wanted to narrow to [B, C] exactly, but then you tried to add the A variant to it. We can either make that work, or say that if you did not actually want to narrow the return value, change that branch to be _ -> a instead
that's fair.
I guess a very basic work around.
making it “just work” won’t catch unnecessary narrows but it does make seamless type expansion possible (https://ayazhafiz.com/articles/22/simple-flow-refinement-of-anonymous-sum-types#type-expansion)
I don’t know how useful that actually is though
the simple answer to type expansion is open unions so maybe it’s always possible to work around that way actually
the type error does mean though that you couldn't write things like
f : [A]a -> [B]a
f = \t -> when t is
A -> B
x -> x
anymore though
oh nevermind, that doesn't work today either lol
thinking about it more, maybe it would be best to never have it type error if you pass in the narrowed type to a larger context, and instead have the narrowing scheme perform type expansion as appropriate if needed. In the case of
a : [ A I32, B ,C]
when a is
A val -> A (val + 1)
x -> x
we would see that the "narrowed" type of x is actually [A I32, B, C], and so actually no conversion at all is necessary, and can be eliminated
If you had
f : [B, C]* -> {}
a : [ A I32, B ,C]
when a is
A val -> A (val + 1)
x -> f x
this would perform a conversion when we didn't need to, so that is not zero-cost, but this seems like a contrived case (and the cost is negligible anyway). maybe the compiler could be smarter here too
I agree it would be nice if it never errored. For example, this should not error
fancyColorName : [ Red, Green, Blue ] -> Str
fancyColorName = \c -> when c is
Red -> "Scarlet"
_ -> defaultColorName c
whether defaultColorName is defined as
defaultColorName : [ Red, Green, Blue ] -> Str
defaultColorName = \c -> when c is
Red -> "Red"
Green -> "Green"
Blue -> "Blue"
or
defaultColorName : [ Green, Blue ] -> Str
defaultColorName = \c -> when c is
Green -> "Green"
Blue -> "Blue"
and whether the _ case is written as _ -> defaultColorName c, gb -> defaultColorName gb, or Green | Blue -> defaultColorName c.
Those would all be normal things for a programmer to write, and it would be frustrating if they behaved differently.
I think there is an important difference between _ -> defaultColorName c and gb -> defaultColorName gb. The former is a form of shadowing (which Roc doesn't currently support), and the latter makes it more explicit " in this branch, here is how I narrow this type "
maybe we could give an error message to tell you what to change it to
I don't think it would be a significant inconvenience in that case, although without an error message like that, it could be surprising/confusing and people might not know what to do to solve it :big_smile:
Seeing _ -> defaultColorName c I assume you are intentially re-expanding the type for some reason.
Here is a concrete proposal for this feature: https://rwx.notion.site/Union-Refinement-5bef5072afb84675948f89525a25dbb7
(Part of it is a bit technical, but I wanted to describe the exact mechanics under proposal.) I think this aligns with the previous ideas discussed in this thread, for the most part. There is a section of multiple examples at the bottom of the document - let me know if there are others that should be discussed!
This is probably due to my lack of understanding of the current type system, but I'm kind of confused about the last section ("Catch-all named variable has larger type than condition"), particularly this part:
Since tag unions are open by default in output position, the value of this is nebulous - you could instead get the conversion automatically just because of how tag unions work. In fact, the compiler would see that! Nevertheless, this is noted for completeness.
Just yesterday, I ran into a situation like this:
toString : [Io Str, Net Str] -> Str
toString = \e ->
when e is
Io str -> str
Net str -> str
doStuff : [Io Str] -> Str
doStuff = \e ->
toString e # fails to type-check
The workaround I know of is the following. (The paragraph quoted above kind of makes it sound like maybe there's another easy way? But I'm not sure what that would be.)
toString : [Io Str, Net Str] -> Str
toString = \e ->
when e is
Io str -> str
Net str -> str
doStuff : [Io Str] -> Str
doStuff = \e ->
toString (expandError e)
# (The type can also just be written [Io Str] -> [Io Str])
expandError : [Io Str] -> [Io Str, Net Str]
expandError = \e ->
when e is
Io str -> Io str
Apparently with this change, I could write expandError like this instead:
expandError : [Io Str] -> [Io Str, Net Str]
expandError = \e ->
when e is
error -> error
That seems pretty useful to me! It doesn't look like a big difference here, but it could save a few lines if there were more tags in the original union. And more importantly, any time I ran into a situation like this, wouldn't I be able to write the exact same function?
# Because of type inference, this could expand any tag union I want, right?
# Though I don't know whether I would need to write a different `expand` function for each type.
expand = \x ->
when x is
a -> a
I'm really confusing myself over this, and I'm probably misunderstanding something.
(I guess in that specific example, you could just write doStuff = \Io str -> toString (Io str). But as far as I know, you still need a when expression if there's more than one tag in the original union.)
I think you should also be able to make doStuff take an [Io Str]*
Or just make it take the full type instead of just half of the type.
Your interpretation is correct
In your specific example, you are right that the only way to do it currently is via the conversion function you would have and the proposal would make the alternative you suggested possible.
Do you have a concrete example of what you ran into? I am surprised you ran into this, I can only think of one case I have before. Unfortunately, changing doStuff to take an [Io Str]* would not work, because then you also need to make toString take an open tag union.
Sure! Now that I'm looking at it, my case was probably unusual. I think I just assumed it was common just because it was one of the first things I ran into. It was pretty easy to work around, and this feature would only have made it slightly more convenient if at all.
I've been working through the first few chapters of the book Types and Programming Languages, and I was implementing big-step evaluation of the untyped lambda calculus with De Bruijn indexing.
Code
By the way, I really like the proposal! Sorry to fill up this topic with my weird example. Part of the reason I'm studying types right now is because I want to be able to think in more detail about Roc's type system and type inference implementation!
de Bruijn indeces.. i remember how painful those were :sweat_smile: I never had a great intuition for them; like it’s fine to implement, but never “pleasant”. Thankfully in practice I’ve found it’s not really needed unless you need to test for higher-kind or dependent type equivalence.
(if you want to talk about any of this stuff feel free to DM me or post in #contributing about possible type-related projects for Roc- we have a few!)
What is the status of this proposal? Are we comfortable with it? Are there outstanding questions or concerns?
I have no concerns, :thumbs_up: from me!
Here's an issue for implementing this proposal: https://github.com/roc-lang/roc/issues/5504
If you'd like to work on it, feel free to message in #contributing and we'd be happy to help as much as is wanted! This is kind of a larger change, but it's well scoped and will give you a feel for the entirety of Roc's type system implementation, if you are interested in that.
@Ayaz Hafiz Under this proposal, would this type check? Because the type of bar would get expanded to [Bar1, Bar2, Bar3, Bar4, Bar5]?
Foo : [A, B Bar]
Bar : [Bar1, Bar2, Bar3, Bar4]
f : Foo -> I64
f = \foo ->
when foo is
A -> 1
B bar ->
when bar is
Bar1 -> 10
Bar2 -> 20
Bar3 -> 30
Bar4 -> 40
Bar5 -> 50
#^ including the last branch (Bar5) was almost certainly a mistake
Yeah I think so. It's a more general problem though. I have an idea for it, bidirectional exhaustiveness checking, but it's an orthogonal proposal I think.
Ayaz Hafiz said:
Here is a concrete proposal for this feature: https://rwx.notion.site/Union-Refinement-5bef5072afb84675948f89525a25dbb7
(Part of it is a bit technical, but I wanted to describe the exact mechanics under proposal.) I think this aligns with the previous ideas discussed in this thread, for the most part. There is a section of multiple examples at the bottom of the document - let me know if there are others that should be discussed!
I was thinking about working on this again, but I can't seem to access the proposal on Notion anymore
Thanks!
Last updated: Jun 16 2026 at 16:19 UTC