Stream: ideas

Topic: Narrowing types in when expressions


view this post on Zulip Ayaz Hafiz (Feb 07 2022 at 01:14):

Some time ago on some GitHub issue I can't find now (@Richard Feldman and @Folkert de Vries may know), we noted the behavior of the following code:

» \x -> when x is
…   A -> B
…   C -> D
…   a -> a

<function> : [ A, B, C, D ]a -> [ A, B, C, D ]a

With a certain perspective, the ideal type for this function would be [A, C]a -> [B, D]a. In order to do this, at the branch a -> a, we need to "remove" the variants A and C from the set of tags that might be in the type of a. This kind of typing-based-on-control-flow is generally known as flow typing, and the removal of types that cannot appear in a branch is called "narrowing".

Here are two more (somewhat contrived) examples where type narrowing might be useful. Suppose that error is a function that immediately yields a runtime error and stops the active thread.

printAddError = \m, n, err ->
  when err is
    ...
    Overflow | Underflow ->
      overUnder = when err is
        Overflow -> "over"
        Underflow -> "under"
        # Note that we need this branch because `err` is not narrowed!
        _ -> error "unreachable"

      "Adding \(m) and \(n) results in a value that \(overUnder)flows the type"
    ...
retryWithDelay : [ RateLimited Request Delay ] -> Response

retryUnauthorized : [ RateLimited Request ] -> Response

when response is
  Ok body -> printBody body
  # The below two would result in a type error today because `reponse` is not
  # narrowed in the respective branches
  RateLimited _ _ -> retryWithDelay reponse
  NotAuthorized _ -> retryUnauthorized response

With type narrowing in when branches, the issues in both examples above would be resolved. So the proposal here has two components:

  1. Branches associated with a specific type have the matched expression narrowed to that type. So for example,
when x is
  A -> x # x has type [A] here
  ...
  1. Default branches have types covered in other branches excluded from them. So for example,
f : [A, B, C, D]a -> _
f = \x -> when x is
  A -> x # x has type [A] here
  B -> .. # x has type [B] here
  o -> o # o has type [C, D]a here

This idea is not new; TypeScript does it, and Java byte code also does a version of it. It turns out it's tricky to do this in a language with subtyping and type inference, but it's doable - and in Roc it would be even more tractable, because for us what it means to "subtract" a type is very easy to define.

Anyway this is a tangent. So why don't other languages with pattern matching usually support this? I think the biggest reason is the performance cost - by changing the type of a variable in a branch, Roc would also have to change its runtime representation; that is, its value. In the last example above, when x goes from [A,B,C,D]a to [B], we must change the value referenced by x to have the layout of the type [B] (which is different from [A,B,C,D]a)! This is basically involves a few memory access and memory stores, not any data modification. This means the CPU/RAM usage isn't a lot, but it's something.

I'm curious what others think of this and if you all think it's worth implementing. The way I see it, there are three options:

  1. Don't support this
  2. Support this in all when expressions
  3. Introduce a new expression like when, but that supports type narrowing. For example, when <e> is could be narrow <e> to. This means the user gets to choose whether they want to pay the runtime cost of type narrowing, but it also means there are two very similar concepts in the language that could get confusing to look at.

Personally I like (3) the most.

view this post on Zulip Richard Feldman (Feb 07 2022 at 01:53):

hm, I wonder what the workarounds look like in these cases?

e.g. printAddError could use a helper function instead:

printAddError = \m, n, err ->
  when err is
    ...
    Overflow -> toMessage m n "over"
    Underflow -> toMessage m n "under"
    ...

toMessage = \m, n, overUnder ->
    "Adding \(m) and \(n) results in a value that \(overUnder)flows the type"

personally I prefer the way this reads to the Overflow | Underflow -> ... approach, and I'd also expect it to run faster because it has 1 when instead of 2 - so in this case it seems like it would be better to encourage the helper function style!

in the retryWithDelay example, I'd make the two helper functions take normal arguments instead of a single-tag union:

retryWithDelay : Request, Delay -> Response

retryUnauthorized : Request -> Response

when response is
  Ok body -> printBody body
  # The below two would result in a type error today because `reponse` is not
  # narrowed in the respective branches
  RateLimited req delay -> retryWithDelay req delay
  NotAuthorized req -> retryUnauthorized req

I personally prefer this style - it's a little more verbose to call, but the types are more concise, and I'd feel weird manually calling retryWithDelay (RateLimited req delay) if I ever wanted to call it from somewhere else - e.g. from a test :sweat_smile:

view this post on Zulip Ayaz Hafiz (Feb 18 2022 at 02:18):

I forgot to follow up on this until @Tommy Graves reminded me yesterday :) Here is a more complicated, realistic example of where having type narrowing may be useful, in Roc's codebase: https://github.com/rtfeldman/roc/blob/5374f6d6b83e6665e363750d6ea251b3441514ac/compiler/gen_llvm/src/llvm/build.rs#L5682-L5716, and in particular the call of https://github.com/rtfeldman/roc/blob/5374f6d6b83e6665e363750d6ea251b3441514ac/compiler/gen_llvm/src/llvm/build.rs#L7065-L7138 from that function

view this post on Zulip Ayaz Hafiz (Feb 18 2022 at 02:19):

The workarounds still apply, they're just bulky to write ): and not narrowing the type is unpleasant because you must manually expect that you covered all the cases you expected to

view this post on Zulip Ayaz Hafiz (Feb 18 2022 at 02:21):

I also thought about this some more and I've realized the performance issue need not be a concern. Indeed we don't need to do a type conversion at all. During type checking and exhaustiveness checking we can do the type narrowing on the surface syntax, so that the programmer is guaranteed the program is type-correct, but during code generation (in particular monomorphization) we can use the "shadow" type representation which is the larger, unnarrowed type. This is still sound because we'll have already proved the unused branches of that type are never inhabited, and we avoid having to do any conversion.

view this post on Zulip Richard Feldman (May 24 2023 at 20:10):

wow that's awesome!

view this post on Zulip Richard Feldman (May 24 2023 at 20:10):

so that means this is the clear winner then, yeah?

  1. Support this in all when expressions

view this post on Zulip Richard Feldman (May 24 2023 at 20:10):

in other words, we can make it Just Work the way people expect

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:12):

Ayaz Hafiz said:

I also thought about this some more and I've realized the performance issue need not be a concern. Indeed we don't need to do a type conversion at all. During type checking and exhaustiveness checking we can do the type narrowing on the surface syntax, so that the programmer is guaranteed the program is type-correct, but during code generation (in particular monomorphization) we can use the "shadow" type representation which is the larger, unnarrowed type. This is still sound because we'll have already proved the unused branches of that type are never inhabited, and we avoid having to do any conversion.

I totally forgot about this lol, I need to think about how I came to this conclusion. I'm not sure it's correct :sweat_smile: Perhaps the poor wisdom of me a year and half ago

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:13):

My intuition (now) is that the old message is not possible unless we also embrace subtyping but that is a huge change

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:17):

yeah one counterexample to that old message is suppose that main is defined as follows and escapes to the host

main : [A, B] -> [A]
main = \t ->
  when t is
    A -> t
    _ -> A

having t shadow and be compiled as [A,B] in the output position cannot work without subtyping semantics and the appropriate compilation model change

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:19):

this extends to non-host-exposed functions as well, imagine that the branch A -> t is instead A -> f t where f : [A] -> [A]. Now we again need subtyping semantics for compilation, or else compile f as we do polymorphic functions (even though f was not declared as polymorphic, it is relative to this use site)

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:22):

Maybe a reasonable idea is to have semantics like

when ... is
  A -> ... # no narrows in this branch
  other -> ... # if `other` is referenced in this branch, it is narrowed at runtime, otherwise it is unaffected

since presumably, if you are naming a catchall branch like this, you are intending to opt in to narrowing. So this has no runtime effect for anything where the compiler does not assume you want to narrow (and it will type check that you indeed wanted to narrow, because otherwise you'll get a type error)

view this post on Zulip Brendan Hansknecht (May 24 2023 at 20:25):

Fundamentally, changing from [A, B SomeData, ...] to [B SomeData], would just be changing a tag number, correct? CauseB SomeData with all of it's data is guaranteed to fit into [A, B SomeData, ...]. Of course, you need to copy to update the tag number if the orginal var is ever used elsewhere.

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:27):

What do you mean by tag number?
As a simple example, [A Str, B Str] is represented equivalently to the struct { tag: 1 byte, payload: RocStr}. But [A Str] is represented as "just" RocStr. Since there is a difference in the size of the runtime representation there is a runtime conversion to do.

view this post on Zulip Brendan Hansknecht (May 24 2023 at 20:29):

Oh yeah, single tag means no index at all.

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:30):

even in the case where there is a tag index, if the narrowed union type has a different payload size, the conversion is necessary

view this post on Zulip Richard Feldman (May 24 2023 at 20:31):

so how much runtime overhead are we talking about here, in practice?

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:32):

that said (I think) the conversion will mostly be trivial in cost since it’s only moving top-level wrappers around

view this post on Zulip Richard Feldman (May 24 2023 at 20:32):

yeah

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:32):

yeah i think almost zero Richard

view this post on Zulip Richard Feldman (May 24 2023 at 20:32):

my default feeling here is that we should have it Just Work even though there's nonzero runtime cost

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:32):

like one load and store

view this post on Zulip Richard Feldman (May 24 2023 at 20:32):

because it's so low that it's not worth making a distinction in userspace

view this post on Zulip Brendan Hansknecht (May 24 2023 at 20:36):

Yeah, would just be moving some bytes on the stack.

view this post on Zulip Brendan Hansknecht (May 24 2023 at 20:36):

And tags generally should not be too large.

view this post on Zulip Brendan Hansknecht (May 24 2023 at 20:36):

i guess you may have to update some refcounts which is random loads and that would be the slowest part.

view this post on Zulip Brendan Hansknecht (May 24 2023 at 20:37):

So only bad if you have a list of strings in a tag. Then it may update every refcount in the entire list.

view this post on Zulip Brendan Hansknecht (May 24 2023 at 20:37):

Hmm..
Actually, i guess you have to do that either way if you keep around the original var. So that isn't special to this case

view this post on Zulip Brendan Hansknecht (May 24 2023 at 20:37):

So yeah, small in terms of extra per cost.

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:40):

I think it’s also important to note that I believe this to be a zero-cost abstraction

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:40):

because if you indeed wanted to narrow you would have to do all this anyway

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:40):

and if you didn’t then you will end up passing a smaller type to a larger context and the compiler will error to you because the narrowed type will be closed.

view this post on Zulip Agus Zubiaga (May 24 2023 at 20:41):

Is the extra cost only incurred when you refine to a union with a single tag? I wouldn’t expect refining to a single tag to be as common

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:42):

no, it would be any refinement (at least this is my proposal)

view this post on Zulip Agus Zubiaga (May 24 2023 at 20:42):

Gotcha

view this post on Zulip Brendan Hansknecht (May 24 2023 at 20:43):

Would both of these work?

a : [ A, B ,C]
when a is
    A -> B
    x -> x

# is the final output type here a [B, C] or an [A, B, C].
a : [ A I32, B ,C]
when a is
    A val -> A (val + 1)
    x -> x

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:44):

for the first one it would be [B,C]

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:45):

the second we should probably discuss but we could make it [A I32, B, C], or a type error

view this post on Zulip Brendan Hansknecht (May 24 2023 at 20:47):

I think it would be quite surprising if the second one was an error.

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:47):

Like, in the second we might assume that the bottom branch wanted to narrow to [B, C] exactly, but then you tried to add the A variant to it. We can either make that work, or say that if you did not actually want to narrow the return value, change that branch to be _ -> a instead

view this post on Zulip Brendan Hansknecht (May 24 2023 at 20:47):

that's fair.

view this post on Zulip Brendan Hansknecht (May 24 2023 at 20:47):

I guess a very basic work around.

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:50):

making it “just work” won’t catch unnecessary narrows but it does make seamless type expansion possible (https://ayazhafiz.com/articles/22/simple-flow-refinement-of-anonymous-sum-types#type-expansion)

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:50):

I don’t know how useful that actually is though

view this post on Zulip Ayaz Hafiz (May 24 2023 at 20:51):

the simple answer to type expansion is open unions so maybe it’s always possible to work around that way actually

view this post on Zulip Ayaz Hafiz (May 24 2023 at 21:08):

the type error does mean though that you couldn't write things like

f : [A]a -> [B]a
f = \t -> when t is
  A -> B
  x -> x

anymore though

view this post on Zulip Ayaz Hafiz (May 24 2023 at 21:10):

oh nevermind, that doesn't work today either lol

view this post on Zulip Ayaz Hafiz (May 24 2023 at 21:13):

thinking about it more, maybe it would be best to never have it type error if you pass in the narrowed type to a larger context, and instead have the narrowing scheme perform type expansion as appropriate if needed. In the case of

a : [ A I32, B ,C]
when a is
    A val -> A (val + 1)
    x -> x

we would see that the "narrowed" type of x is actually [A I32, B, C], and so actually no conversion at all is necessary, and can be eliminated

view this post on Zulip Ayaz Hafiz (May 24 2023 at 21:22):

If you had

f : [B, C]* -> {}

a : [ A I32, B ,C]
when a is
    A val -> A (val + 1)
    x -> f x

this would perform a conversion when we didn't need to, so that is not zero-cost, but this seems like a contrived case (and the cost is negligible anyway). maybe the compiler could be smarter here too

view this post on Zulip Sky Rose (May 26 2023 at 12:30):

I agree it would be nice if it never errored. For example, this should not error

fancyColorName : [ Red, Green, Blue ] -> Str
fancyColorName = \c -> when c is
  Red -> "Scarlet"
  _ -> defaultColorName c

whether defaultColorName is defined as

defaultColorName : [ Red, Green, Blue ] -> Str
defaultColorName = \c -> when c is
  Red -> "Red"
  Green -> "Green"
  Blue -> "Blue"

or

defaultColorName : [ Green, Blue ] -> Str
defaultColorName = \c -> when c is
  Green -> "Green"
  Blue -> "Blue"

and whether the _ case is written as _ -> defaultColorName c, gb -> defaultColorName gb, or Green | Blue -> defaultColorName c.

view this post on Zulip Sky Rose (May 26 2023 at 12:31):

Those would all be normal things for a programmer to write, and it would be frustrating if they behaved differently.

view this post on Zulip Ayaz Hafiz (May 26 2023 at 16:49):

I think there is an important difference between _ -> defaultColorName c and gb -> defaultColorName gb. The former is a form of shadowing (which Roc doesn't currently support), and the latter makes it more explicit " in this branch, here is how I narrow this type "

view this post on Zulip Richard Feldman (May 26 2023 at 16:50):

maybe we could give an error message to tell you what to change it to

view this post on Zulip Richard Feldman (May 26 2023 at 16:51):

I don't think it would be a significant inconvenience in that case, although without an error message like that, it could be surprising/confusing and people might not know what to do to solve it :big_smile:

view this post on Zulip Brendan Hansknecht (May 26 2023 at 17:10):

Seeing _ -> defaultColorName c I assume you are intentially re-expanding the type for some reason.

view this post on Zulip Ayaz Hafiz (May 26 2023 at 20:14):

Here is a concrete proposal for this feature: https://rwx.notion.site/Union-Refinement-5bef5072afb84675948f89525a25dbb7

(Part of it is a bit technical, but I wanted to describe the exact mechanics under proposal.) I think this aligns with the previous ideas discussed in this thread, for the most part. There is a section of multiple examples at the bottom of the document - let me know if there are others that should be discussed!

view this post on Zulip Ajai Nelson (May 28 2023 at 19:59):

This is probably due to my lack of understanding of the current type system, but I'm kind of confused about the last section ("Catch-all named variable has larger type than condition"), particularly this part:

Since tag unions are open by default in output position, the value of this is nebulous - you could instead get the conversion automatically just because of how tag unions work. In fact, the compiler would see that! Nevertheless, this is noted for completeness.

Just yesterday, I ran into a situation like this:

toString : [Io Str, Net Str] -> Str
toString = \e ->
    when e is
        Io str -> str
        Net str -> str

doStuff : [Io Str] -> Str
doStuff = \e ->
    toString e # fails to type-check

The workaround I know of is the following. (The paragraph quoted above kind of makes it sound like maybe there's another easy way? But I'm not sure what that would be.)

toString : [Io Str, Net Str] -> Str
toString = \e ->
    when e is
        Io str -> str
        Net str -> str

doStuff : [Io Str] -> Str
doStuff = \e ->
    toString (expandError e)

# (The type can also just be written [Io Str] -> [Io Str])
expandError : [Io Str] -> [Io Str, Net Str]
expandError = \e ->
    when e is
        Io str -> Io str

Apparently with this change, I could write expandError like this instead:

expandError : [Io Str] -> [Io Str, Net Str]
expandError = \e ->
    when e is
        error -> error

That seems pretty useful to me! It doesn't look like a big difference here, but it could save a few lines if there were more tags in the original union. And more importantly, any time I ran into a situation like this, wouldn't I be able to write the exact same function?

# Because of type inference, this could expand any tag union I want, right?
# Though I don't know whether I would need to write a different `expand` function for each type.
expand = \x ->
    when x is
        a -> a

I'm really confusing myself over this, and I'm probably misunderstanding something.

view this post on Zulip Ajai Nelson (May 28 2023 at 20:05):

(I guess in that specific example, you could just write doStuff = \Io str -> toString (Io str). But as far as I know, you still need a when expression if there's more than one tag in the original union.)

view this post on Zulip Brendan Hansknecht (May 28 2023 at 21:38):

I think you should also be able to make doStuff take an [Io Str]*

view this post on Zulip Brendan Hansknecht (May 28 2023 at 21:39):

Or just make it take the full type instead of just half of the type.

view this post on Zulip Ayaz Hafiz (May 29 2023 at 01:25):

Your interpretation is correct

view this post on Zulip Ayaz Hafiz (May 29 2023 at 01:29):

In your specific example, you are right that the only way to do it currently is via the conversion function you would have and the proposal would make the alternative you suggested possible.

Do you have a concrete example of what you ran into? I am surprised you ran into this, I can only think of one case I have before. Unfortunately, changing doStuff to take an [Io Str]* would not work, because then you also need to make toString take an open tag union.

view this post on Zulip Ajai Nelson (May 29 2023 at 04:10):

Sure! Now that I'm looking at it, my case was probably unusual. I think I just assumed it was common just because it was one of the first things I ran into. It was pretty easy to work around, and this feature would only have made it slightly more convenient if at all.

I've been working through the first few chapters of the book Types and Programming Languages, and I was implementing big-step evaluation of the untyped lambda calculus with De Bruijn indexing.

Code

view this post on Zulip Ajai Nelson (May 29 2023 at 04:17):

By the way, I really like the proposal! Sorry to fill up this topic with my weird example. Part of the reason I'm studying types right now is because I want to be able to think in more detail about Roc's type system and type inference implementation!

view this post on Zulip Ayaz Hafiz (May 31 2023 at 05:05):

de Bruijn indeces.. i remember how painful those were :sweat_smile: I never had a great intuition for them; like it’s fine to implement, but never “pleasant”. Thankfully in practice I’ve found it’s not really needed unless you need to test for higher-kind or dependent type equivalence.

view this post on Zulip Ayaz Hafiz (May 31 2023 at 05:06):

(if you want to talk about any of this stuff feel free to DM me or post in #contributing about possible type-related projects for Roc- we have a few!)

view this post on Zulip Ayaz Hafiz (May 31 2023 at 05:07):

What is the status of this proposal? Are we comfortable with it? Are there outstanding questions or concerns?

view this post on Zulip Richard Feldman (May 31 2023 at 11:21):

I have no concerns, :thumbs_up: from me!

view this post on Zulip Ayaz Hafiz (Jun 03 2023 at 01:13):

Here's an issue for implementing this proposal: https://github.com/roc-lang/roc/issues/5504

If you'd like to work on it, feel free to message in #contributing and we'd be happy to help as much as is wanted! This is kind of a larger change, but it's well scoped and will give you a feel for the entirety of Roc's type system implementation, if you are interested in that.

view this post on Zulip Ajai Nelson (Jun 07 2023 at 05:25):

@Ayaz Hafiz Under this proposal, would this type check? Because the type of bar would get expanded to [Bar1, Bar2, Bar3, Bar4, Bar5]?

Foo : [A, B Bar]
Bar : [Bar1, Bar2, Bar3, Bar4]

f : Foo -> I64
f = \foo ->
    when foo is
        A -> 1
        B bar ->
            when bar is
                Bar1 -> 10
                Bar2 -> 20
                Bar3 -> 30
                Bar4 -> 40
                Bar5 -> 50
                #^ including the last branch (Bar5) was almost certainly a mistake

view this post on Zulip Ayaz Hafiz (Jun 07 2023 at 13:56):

Yeah I think so. It's a more general problem though. I have an idea for it, bidirectional exhaustiveness checking, but it's an orthogonal proposal I think.

view this post on Zulip Ajai Nelson (Jun 18 2024 at 09:09):

Ayaz Hafiz said:

Here is a concrete proposal for this feature: https://rwx.notion.site/Union-Refinement-5bef5072afb84675948f89525a25dbb7

(Part of it is a bit technical, but I wanted to describe the exact mechanics under proposal.) I think this aligns with the previous ideas discussed in this thread, for the most part. There is a section of multiple examples at the bottom of the document - let me know if there are others that should be discussed!

I was thinking about working on this again, but I can't seem to access the proposal on Notion anymore

view this post on Zulip Ayaz Hafiz (Jun 19 2024 at 01:57):

https://github.com/roc-lang/rfcs/blob/092d53e4a26f34154c7583717183510e483f7f36/0011-union-refinement.md

view this post on Zulip Ajai Nelson (Jun 19 2024 at 02:09):

Thanks!


Last updated: Jun 16 2026 at 16:19 UTC