Narrowing types in when expressions · ideas

Ayaz Hafiz (Feb 07 2022 at 01:14):

Some time ago on some GitHub issue I can't find now (@Richard Feldman and @Folkert de Vries may know), we noted the behavior of the following code:

With a certain perspective, the ideal type for this function would be [A, C]a -> [B, D]a. In order to do this, at the branch a -> a, we need to "remove" the variants A and C from the set of tags that might be in the type of a. This kind of typing-based-on-control-flow is generally known as flow typing, and the removal of types that cannot appear in a branch is called "narrowing".
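
For readers without the original snippet: the function under discussion is presumably something along these lines. This is a reconstruction for illustration only, not the code from the original message; the name `f` and the branch bodies are assumptions.

```roc
# Hypothetical reconstruction, not the original snippet.
# As I understand it, inference today gives roughly
#     f : [A, B, C, D]a -> [A, B, C, D]a
# because the catch-all branch returns `a` at the full type of `x`.
# With narrowing, `a` could no longer be `A` or `C` in the last branch,
# which is what would allow the type [A, C]a -> [B, D]a.
f = \x ->
    when x is
        A -> B
        C -> D
        a -> a
```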

Here are two more (somewhat contrived) examples where type narrowing might be useful. Suppose that error is a function that immediately yields a runtime error and stops the active thread.

With type narrowing in when branches, the issues in both examples above would be resolved. So the proposal here has two components:

This idea is not new; TypeScript does it, and Java byte code also does a version of it. It turns out it's tricky to do this in a language with subtyping and type inference, but it's doable - and in Roc it would be even more tractable, because for us what it means to "subtract" a type is very easy to define.

Anyway this is a tangent. So why don't other languages with pattern matching usually support this? I think the biggest reason is the performance cost - by changing the type of a variable in a branch, Roc would also have to change its runtime representation; that is, its value. In the last example above, when x goes from [A,B,C,D]a to [B], we must change the value referenced by x to have the layout of the type [B] (which is different from [A,B,C,D]a)! This basically involves a few memory loads and stores, not any data modification. This means the CPU/RAM usage isn't a lot, but it's something.

I'm curious what others think of this and if you all think it's worth implementing. The way I see it, there are three options:

Richard Feldman (Feb 07 2022 at 01:53):

personally I prefer the way this reads to the Overflow | Underflow -> ... approach, and I'd also expect it to run faster because it has 1 when instead of 2 - so in this case it seems like it would be better to encourage the helper function style!

in the retryWithDelay example, I'd make the two helper functions take normal arguments instead of a single-tag union:

I personally prefer this style - it's a little more verbose to call, but the types are more concise, and I'd feel weird manually calling retryWithDelay (RateLimited req delay) if I ever wanted to call it from somewhere else - e.g. from a test :sweat_smile:

Ayaz Hafiz (Feb 18 2022 at 02:18):

Ayaz Hafiz (Feb 18 2022 at 02:19):

The workarounds still apply, they're just bulky to write ): and not narrowing the type is unpleasant because you must manually expect that you covered all the cases you expected to

Ayaz Hafiz (Feb 18 2022 at 02:21):

I also thought about this some more and I've realized the performance issue need not be a concern. Indeed we don't need to do a type conversion at all. During type checking and exhaustiveness checking we can do the type narrowing on the surface syntax, so that the programmer is guaranteed the program is type-correct, but during code generation (in particular monomorphization) we can use the "shadow" type representation which is the larger, unnarrowed type. This is still sound because we'll have already proved the unused branches of that type are never inhabited, and we avoid having to do any conversion.

Richard Feldman (May 24 2023 at 20:10):

Ayaz Hafiz (May 24 2023 at 20:12):

I totally forgot about this lol, I need to think about how I came to this conclusion. I'm not sure it's correct :sweat_smile: Perhaps the poor wisdom of me a year and a half ago

Ayaz Hafiz (May 24 2023 at 20:13):

My intuition (now) is that the old message is not possible unless we also embrace subtyping but that is a huge change

Ayaz Hafiz (May 24 2023 at 20:17):

yeah one counterexample to that old message is suppose that main is defined as follows and escapes to the host

having t shadow and be compiled as [A,B] in the output position cannot work without subtyping semantics and the appropriate compilation model change

Ayaz Hafiz (May 24 2023 at 20:19):

this extends to non-host-exposed functions as well, imagine that the branch A -> t is instead A -> f t where f : [A] -> [A]. Now we again need subtyping semantics for compilation, or else compile f as we do polymorphic functions (even though f was not declared as polymorphic, it is relative to this use site)

Ayaz Hafiz (May 24 2023 at 20:22):

since presumably, if you are naming a catchall branch like this, you are intending to opt in to narrowing. So this has no runtime effect for anything where the compiler does not assume you want to narrow (and it will type check that you indeed wanted to narrow, because otherwise you'll get a type error)

Brendan Hansknecht (May 24 2023 at 20:25):

Fundamentally, changing from [A, B SomeData, ...] to [B SomeData], would just be changing a tag number, correct? Cause B SomeData with all of its data is guaranteed to fit into [A, B SomeData, ...]. Of course, you need to copy to update the tag number if the original var is ever used elsewhere.

Ayaz Hafiz (May 24 2023 at 20:27):

What do you mean by tag number?
As a simple example, [A Str, B Str] is represented equivalently to the struct { tag: 1 byte, payload: RocStr}. But [A Str] is represented as "just" RocStr. Since there is a difference in the size of the runtime representation there is a runtime conversion to do.
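
To make the conversion concrete, here is roughly what it looks like when written by hand today. This is only a sketch: `onlyB` and the `NotB` error tag are made-up names, and the layout remarks in the comments just restate the description above rather than anything checked against the compiler.

```roc
# Hypothetical helper: rebuild the value at the narrower type by hand.
# Per the description above, the input has a tag plus a Str payload,
# while the `[B Str]` in the output is "just" the Str, so `B s` on the
# right-hand side is a new, smaller value and not the original one.
onlyB : [A Str, B Str] -> Result [B Str] [NotB]
onlyB = \tag ->
    when tag is
        B s -> Ok (B s)
        A _ -> Err NotB
```

Narrowing in a when branch would amount to the compiler doing this re-wrapping for you.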

Brendan Hansknecht (May 24 2023 at 20:29):

Ayaz Hafiz (May 24 2023 at 20:30):

even in the case where there is a tag index, if the narrowed union type has a different payload size, the conversion is necessary

Richard Feldman (May 24 2023 at 20:31):

Ayaz Hafiz (May 24 2023 at 20:32):

that said (I think) the conversion will mostly be trivial in cost since it’s only moving top-level wrappers around

Richard Feldman (May 24 2023 at 20:32):

Ayaz Hafiz (May 24 2023 at 20:32):

Richard Feldman (May 24 2023 at 20:32):

my default feeling here is that we should have it Just Work even though there's nonzero runtime cost

Ayaz Hafiz (May 24 2023 at 20:32):

Richard Feldman (May 24 2023 at 20:32):

Brendan Hansknecht (May 24 2023 at 20:36):

i guess you may have to update some refcounts which is random loads and that would be the slowest part.

Brendan Hansknecht (May 24 2023 at 20:37):

So only bad if you have a list of strings in a tag. Then it may update every refcount in the entire list.

Brendan Hansknecht (May 24 2023 at 20:37):

Hmm..
Actually, i guess you have to do that either way if you keep around the original var. So that isn't special to this case

Brendan Hansknecht (May 24 2023 at 20:37):

Ayaz Hafiz (May 24 2023 at 20:40):

I think it’s also important to note that I believe this to be a zero-cost abstraction

Ayaz Hafiz (May 24 2023 at 20:40):

and if you didn’t then you will end up passing a smaller type to a larger context and the compiler will error to you because the narrowed type will be closed.

Agus Zubiaga (May 24 2023 at 20:41):

Is the extra cost only incurred when you refine to a union with a single tag? I wouldn’t expect refining to a single tag to be as common

Ayaz Hafiz (May 24 2023 at 20:42):

Agus Zubiaga (May 24 2023 at 20:42):

Brendan Hansknecht (May 24 2023 at 20:43):

Ayaz Hafiz (May 24 2023 at 20:44):

Ayaz Hafiz (May 24 2023 at 20:45):

the second we should probably discuss but we could make it [A I32, B, C], or a type error

Brendan Hansknecht (May 24 2023 at 20:47):

Ayaz Hafiz (May 24 2023 at 20:47):

Like, in the second we might assume that the bottom branch wanted to narrow to [B, C] exactly, but then you tried to add the A variant to it. We can either make that work, or say that if you did not actually want to narrow the return value, change that branch to be _ -> a instead

Brendan Hansknecht (May 24 2023 at 20:47):

Ayaz Hafiz (May 24 2023 at 20:50):

Ayaz Hafiz (May 24 2023 at 20:51):

the simple answer to type expansion is open unions so maybe it’s always possible to work around that way actually

Ayaz Hafiz (May 24 2023 at 21:08):

Ayaz Hafiz (May 24 2023 at 21:10):

Ayaz Hafiz (May 24 2023 at 21:13):

thinking about it more, maybe it would be best to never have it type error if you pass in the narrowed type to a larger context, and instead have the narrowing scheme perform type expansion as appropriate if needed. In the case of

we would see that the "narrowed" type of x is actually [A I32, B, C], and so actually no conversion at all is necessary, and can be eliminated

Ayaz Hafiz (May 24 2023 at 21:22):

this would perform a conversion when we didn't need to, so that is not zero-cost, but this seems like a contrived case (and the cost is negligible anyway). maybe the compiler could be smarter here too

Sky Rose (May 26 2023 at 12:30):

I agree it would be nice if it never errored. For example, this should not error

and whether the _ case is written as _ -> defaultColorName c, gb -> defaultColorName gb, or Green | Blue -> defaultColorName c.

Sky Rose (May 26 2023 at 12:31):

Those would all be normal things for a programmer to write, and it would be frustrating if they behaved differently.

Ayaz Hafiz (May 26 2023 at 16:49):

I think there is an important difference between _ -> defaultColorName c and gb -> defaultColorName gb. The former is a form of shadowing (which Roc doesn't currently support), and the latter makes it more explicit " in this branch, here is how I narrow this type "
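
As a sketch of what that looks like without narrowing (not the snippet from the message above): the `Red` tag, the type annotations and both function bodies are assumptions for illustration; only `defaultColorName`, `c`, `gb`, `Green` and `Blue` come from the discussion.

```roc
# Assumed helper that only accepts the two non-red colors.
defaultColorName : [Green, Blue] -> Str
defaultColorName = \color ->
    when color is
        Green -> "green"
        Blue -> "blue"

colorName : [Red, Green, Blue] -> Str
colorName = \c ->
    when c is
        Red -> "red"
        # Today each remaining tag is rebuilt explicitly. Under the
        # proposal, a named catch-all `gb -> defaultColorName gb`
        # would be narrowed to [Green, Blue] instead.
        Green -> defaultColorName Green
        Blue -> defaultColorName Blue
```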

Richard Feldman (May 26 2023 at 16:50):

Richard Feldman (May 26 2023 at 16:51):

I don't think it would be a significant inconvenience in that case, although without an error message like that, it could be surprising/confusing and people might not know what to do to solve it :big_smile:

Brendan Hansknecht (May 26 2023 at 17:10):

Seeing _ -> defaultColorName c I assume you are intentionally re-expanding the type for some reason.

Ayaz Hafiz (May 26 2023 at 20:14):

(Part of it is a bit technical, but I wanted to describe the exact mechanics under proposal.) I think this aligns with the previous ideas discussed in this thread, for the most part. There is a section of multiple examples at the bottom of the document - let me know if there are others that should be discussed!

Ajai Nelson (May 28 2023 at 19:59):

This is probably due to my lack of understanding of the current type system, but I'm kind of confused about the last section ("Catch-all named variable has larger type than condition"), particularly this part:

The workaround I know of is the following. (The paragraph quoted above kind of makes it sound like maybe there's another easy way? But I'm not sure what that would be.)

That seems pretty useful to me! It doesn't look like a big difference here, but it could save a few lines if there were more tags in the original union. And more importantly, any time I ran into a situation like this, wouldn't I be able to write the exact same function?

I'm really confusing myself over this, and I'm probably misunderstanding something.

Ajai Nelson (May 28 2023 at 20:05):

(I guess in that specific example, you could just write doStuff = \Io str -> toString (Io str). But as far as I know, you still need a when expression if there's more than one tag in the original union.)

Brendan Hansknecht (May 28 2023 at 21:38):

Brendan Hansknecht (May 28 2023 at 21:39):

Ayaz Hafiz (May 29 2023 at 01:25):

Ayaz Hafiz (May 29 2023 at 01:29):

In your specific example, you are right that the only way to do it currently is via the conversion function you would have and the proposal would make the alternative you suggested possible.

Do you have a concrete example of what you ran into? I am surprised you ran into this, I can only think of one case where I have before. Unfortunately, changing doStuff to take an [Io Str]* would not work, because then you also need to make toString take an open tag union.

Ajai Nelson (May 29 2023 at 04:10):

Sure! Now that I'm looking at it, my case was probably unusual. I think I just assumed it was common just because it was one of the first things I ran into. It was pretty easy to work around, and this feature would only have made it slightly more convenient if at all.

Code

app "lambda"
    packages { pf: "https://github.com/roc-lang/basic-cli/releases/download/0.3.2/tE4xS_zLdmmxmHwHih9kHWQ7fsXtJr7W7h3425-eZFk.tar.br" }
    imports [pf.Stdout]
    provides [main] to pf

Term : [
    Var I64, # variable, uses a De Bruijn index
    Abs Term, # lambda
    App Term Term, # application
]

# This could be just `Term`, but because `bigStepEval` should always return `Abs`s (lambdas),
# I wanted to be more specific so I wouldn't have to handle impossible recursive cases.
# I ended up changing it back to `Term` because of the very minor inconveniences this caused.
#                            ||||||||||
bigStepEval : Term -> Result [Abs Term] [NoRuleApplies]
bigStepEval = \term ->
    when term is
        # v ⇓ v
        Abs body ->
            # value
            Ok (Abs body)

        App t1 t2 ->
            when bigStepEval t1 is
                # Without De Bruijn:
                # t1 ⇓ λx.t12      t2 ⇓ t2p      [x ↦ t2p]t12 ⇓ t'
                # ------------------------------------------------
                #                  t1 t2 ⇓ t'
                Ok (Abs t12) ->
                    # I originally wanted to do the following:
                    #     t2p <- Result.try (bigStepEval t2)
                    #     bigStepEval (termSubst t12 0 t2p)
                    # but I couldn't because `t2p` is a `Result [Abs Term] [NoRuleApplies]`,
                    # and `termSubst` expects a `Result Term [NoRuleApplies]`. It was easy
                    # to fix by destructuring in this case, though.
                    Abs t2pBody <- Result.try (bigStepEval t2)
                    bigStepEval (apply t12 (Abs t2pBody))

                # Ok (Var _) | Ok (App _ _) -> Err NoRuleApplies
                # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                # Because bigStepEval never returns a `Var` or `App`,
                # this case will never happen, so I thought it would
                # be cool for the type system to recognize that.

                Err NoRuleApplies -> Err NoRuleApplies

        _ ->
            Err NoRuleApplies

main =
    # https://en.wikipedia.org/wiki/De_Bruijn_index example (shifted to use 0-indexing)
    a = Abs (App (App (Var 3) (Var 1)) (Abs (App (Var 0) (Var 2))))
    b = Abs (App (Var 4) (Var 0))
    c = Abs (App (App (Var 2) (Abs (App (Var 5) (Var 0)))) (Abs (App (Var 0) (Abs (App (Var 6) (Var 0))))))

    [
        "Expected value:",
        termToStr c,
        # I originally wanted to do the following
        #     resultToStr (bigStepEval (App (Abs a) b)),
        # Of course, it's not that hard to fix:
        resultToStr (
            when bigStepEval (App (Abs a) b) is
                Ok (Abs t) -> Ok (Abs t)
                Err NoRuleApplies -> Err NoRuleApplies
        ),
        "Actual value^"
    ]
    |> Str.joinWith "\n"
    |> Stdout.line

######### Helper functions (implementations not particularly relevant)
# I had already written the following functions as part of the implementation of
# small-step evaluation, so I couldn't change their signatures.

## Shift the indices of every free variable by the specified amount
termShift : Term, I64 -> Term
termShift = \term, by ->
    walk : Term, I64 -> Term
    walk = \t, cutoff ->
        when t is
            Var n ->
                isBound = n < cutoff
                if isBound then
                    Var n
                else
                    Var (n + by)

            Abs t1 ->
                Abs (walk t1 (cutoff + 1))

            App t1 t2 ->
                App (walk t1 cutoff) (walk t2 cutoff)

    walk term 0

expect termShift (Abs (App (Var 4) (Var 0))) 1 == Abs (App (Var 5) (Var 0))
expect termShift (Abs (App (Var 4) (Var 0))) 2 == Abs (App (Var 6) (Var 0))

## Substitute `beforeVar` with `afterExpr` in `term`
termSubst : Term, I64, Term -> Term
termSubst = \term, beforeVar, afterExpr ->
    when term is
        Var n if n == beforeVar ->
            afterExpr

        Var n ->
            Var n

        App t1 t2 ->
            App (termSubst t1 beforeVar afterExpr) (termSubst t2 beforeVar afterExpr)

        Abs t1 ->
            Abs (termSubst t1 (beforeVar + 1) (termShift afterExpr 1))

## Helper function for applying a function to an argument (given the function body)
apply : Term, Term -> Term
apply = \functionBody, arg ->
    functionBody
    |> termSubst 0 (termShift arg 1)
    |> termShift -1

resultToStr : Result Term [NoRuleApplies] -> Str
resultToStr = \res ->
    when res is
        Ok t -> termToStr t
        Err NoRuleApplies -> "No rule applies"

termToStr : Term -> Str
termToStr = \term ->
    when term is
        Var n ->
            str = Num.toStr n
            "\(str)"

        Abs t ->
            str = termToStr t
            "λ.\(str)"

        App t1 t2 ->
            str1 = termToStr t1
            str1Paren =
                when t1 is
                    Var _ -> str1
                    Abs _ -> "(\(str1))"
                    App _ _ -> str1
            str2 = termToStr t2
            str2Paren =
                when t2 is
                    Var _ -> str2
                    Abs _ -> "(\(str2))"
                    App _ _ -> "(\(str2))"
            "\(str1Paren) \(str2Paren)"

Ajai Nelson (May 29 2023 at 04:17):

By the way, I really like the proposal! Sorry to fill up this topic with my weird example. Part of the reason I'm studying types right now is because I want to be able to think in more detail about Roc's type system and type inference implementation!

Ayaz Hafiz (May 31 2023 at 05:05):

de Bruijn indices.. i remember how painful those were :sweat_smile: I never had a great intuition for them; like it’s fine to implement, but never “pleasant”. Thankfully in practice I’ve found it’s not really needed unless you need to test for higher-kind or dependent type equivalence.

Ayaz Hafiz (May 31 2023 at 05:06):

(if you want to talk about any of this stuff feel free to DM me or post in #contributing about possible type-related projects for Roc- we have a few!)

Ayaz Hafiz (May 31 2023 at 05:07):

What is the status of this proposal? Are we comfortable with it? Are there outstanding questions or concerns?

Richard Feldman (May 31 2023 at 11:21):

Ayaz Hafiz (Jun 03 2023 at 01:13):

If you'd like to work on it, feel free to message in #contributing and we'd be happy to help as much as is wanted! This is kind of a larger change, but it's well scoped and will give you a feel for the entirety of Roc's type system implementation, if you are interested in that.

Ajai Nelson (Jun 07 2023 at 05:25):

@Ayaz Hafiz Under this proposal, would this type check? Because the type of bar would get expanded to [Bar1, Bar2, Bar3, Bar4, Bar5]?

Ayaz Hafiz (Jun 07 2023 at 13:56):

Yeah I think so. It's a more general problem though. I have an idea for it, bidirectional exhaustiveness checking, but it's an orthogonal proposal I think.

Ajai Nelson (Jun 18 2024 at 09:09):

I was thinking about working on this again, but I can't seem to access the proposal on Notion anymore

Ayaz Hafiz (Jun 19 2024 at 01:57):

Ajai Nelson (Jun 19 2024 at 02:09):