Refinement of tag unions · contributing

Stream: contributing

Topic: Refinement of tag unions

Ajai Nelson (Jun 03 2023 at 05:23):

@Ayaz Hafiz I think I'd be interested in helping out with implementing refinement of tag unions if I can. I haven't spent much time looking at the type system implementation yet, but I was hoping to do that soon anyway!

Also, a tiny thing in the proposal: I'm guessing the f : [User, Admin, SuperAdmin] signatures are missing a return type (or should be t : [User, Admin, SuperAdmin]).

Ayaz Hafiz (Jun 03 2023 at 13:45):

sweet! you're totally right, apologies for the typo, I've went ahead and fixed it. Please let us know if you have any questions about the description in the proposal/implementation.

Ajai Nelson (Jun 16 2023 at 04:11):

I would appreciate some help getting started with this.

My initial thought was that I could start with the type expansion part, since implementing the refinement without that could break working programs.

expandError : [Io Str] -> [Io Str, Net Str]
expandError = \e -> when e is
  error -> error

My first idea was that for each identifier introduced in a pattern, I would create a new type variable to represent the possibly expanded version of it. So in this case, the goal is for error_unexpanded to be [Io Str] and error_expanded to be [Io Str, Net Str]. References to error on the right side of the arrow would actually refer to error_expanded. My idea was that I would use unify() with Mode::PRESENTin order to ensure all the tags in the unexpanded variable were present in the expanded variable.

So I tried that. But it turns out Mode::PRESENT isn't as magical as I naively assumed. It doesn't seem to continuously enforce a relationship between error_unexpanded and error_expanded; it just adds the tags in the error_unexpanded type to error_expanded at the point when unify() is called. (In this case, I think it just decided error_unexpandedand error_expanded were equal.)

So all that's to say, my initial idea didn't work out, and I'm not really sure how to approach this.

Ayaz Hafiz (Jun 16 2023 at 15:22):

I wonder if you can perform the inference scheme the following way:

Determine the type of error on the left-hand side (LHS) of the pattern-match error - as it already happens today. Then, as you describe, create a new type variable error_expanded based on the solved type of error in the LHS.
The instantiation of error_expanded proceeds in the following way:
- error_expanded = clone(error_unexpanded) (you don't want to unify them, since they are not equivalent).
- Then, if error_expanded is a tag union, it is "opened" (an unbound variable is added as its extension type, see the open_tag_union function).
- Now, in the example above, error_expanded should look like [Io Str]<fresh_ext> where fresh_ext is unbound.
Binding the RHS occurrence of error to error_expanded and then solving the RHS of the branches will then instantiate fresh_ext to [Net Str], yielding the expanded type you want.

Ajai Nelson (Jun 16 2023 at 19:16):

Would that still work if the RHS imposed more restrictions on e? For example

expandError = \e ->
  _ = f e
  # Now we know the type of e contains [A].
  when e is
    error ->
      # error_expanded is created before solving the RHS.
      _ = g e
      # Now we know that the type of e also contains [B].
      # But that restriction doesn't apply to error_expanded since the clone already happened.
      # So the following doesn't produce a type error, but it should!
      _ = h error
      error

f : [A]* -> {}
g : [B]* -> {}
h : [A] -> {}

Ayaz Hafiz (Jun 16 2023 at 20:40):

I believe that if the type of the branch pattern is non-exhaustive, then the extension variable of the branch pattern (if it exists) should be shared with the expanded type on the RHS. So in your example above we would have

expandError = \e ->
  _ = f e # e = [A]x
  when e is
    error -> # error_unexpanded = [A]x , error_expanded = [A]x
      _ = g e # [A]x ~ [B]y => error_unexpanded = [A, B]y, error_expanded = [A, B]y
      _ = h error # => type error
      error

then in a case where refinement happens, like

expandError : [A, B, C]a -> {}
expandError = \e ->
  when e is
    A -> {}
    error ->
      # first we solve : e = [A, B, C]a
      # then refinement: error_unexpanded = [B, C]a , error_expanded = [B, C]a
      {}

this is still okay. So the extension variable is only freshened if we can't find an extension variable for the unexpanded type.

expandError : [A, B, C] -> {}
expandError = \e ->
  when e is
    A -> {}
    error ->
      # first we solve : e = [A, B, C]
      # then refinement: error_unexpanded = [B, C] , error_expanded = [B, C]x
      # at this point only `x` can grow, so this is also sound
      {}

Ajai Nelson (Jun 16 2023 at 23:42):

Ok, I think I'm understanding better now. So effectively, that means the expansion only happens if we already know that the unexpanded tag union is closed (i.e., its extension variable is []).

Also, does this mean we can't always infer the most general type? Take this example:

expandError : [Io Str, Net Str] -> I64
expandError = \e ->
  when e is
    Io Str -> 1
    error ->
      # first we solve: e = [Io Str, Net Str]
      # then refinement: error_unexpanded = [Net Str], error_expanded = [Net Str]x
      # x is inferred to be [Io Str, Timeout Str]
      f error

f : [Io Str, Net Str, Timeout Str] -> I64

But if you remove the annotation, it would infer a different and incompatible type, right?

expandError = \e ->
  when e is
    Io Str -> 1
    error ->
      # first we solve: e = [Io Str]x
      # then refinement: error_unexpanded = []x, error_expanded = []x
      # x is inferred to be [Io Str, Net Str, Timeout Str]
      f error
# expandError is inferred to be [Io Str, Net Str, Timeout Str] -> I64

f : [Io Str, Net Str, Timeout Str] -> I64

Though I'm not sure if there even is a way to express the most general type of that function.

Ayaz Hafiz (Jun 17 2023 at 01:30):

that is a good example. I think you're right, it loses principality relative to control flow (since the principal type relative to control flow would be [Io Str] -> I64). But it doesn't lose principality relative to unification (the former function wouldn't type-check relative to unification). Maybe we can find a way to recover principality in general, relative to control flow, but I can't think of a way immediately

Ayaz Hafiz (Jun 17 2023 at 01:49):

actually, this function has no principal type right? All of [Io Str] -> I64, [Io Str, Net Str] -> I64, [Io Str, Net Str, Timeout Str] -> I64 are "equivalently principal"

Ayaz Hafiz (Jun 17 2023 at 01:49):

none of those types are compatible with any other

Ayaz Hafiz (Jun 17 2023 at 01:50):

so we infer "a" principal type, but "the" principal type doesn't exist..

Ajai Nelson (Jun 17 2023 at 01:50):

Yeah, I think so

Ajai Nelson (Jun 17 2023 at 02:50):

I think I've implemented the inference scheme you suggested, and it seems to be working: https://github.com/roc-lang/roc/pull/5559

Ajai Nelson (Jun 17 2023 at 02:51):

Though it's still pretty messy, and I still need to do the code gen part

Ajai Nelson (Jun 17 2023 at 03:01):

Do you know why the open_tag_union pattern match doesn't include EmptyTagUnion or FunctionOrTagUnion? https://github.com/roc-lang/roc/blob/0f327d9f38390070b8d774398813b8797897565a/crates/compiler/solve/src/solve.rs#L1910-L1943

Ayaz Hafiz (Jun 17 2023 at 03:34):

Missing FunctionOrTagUnion is probably an oversight. Missing EmptyTagUnion, I think it is an implicit assumption that EmptyTagUnion does not appear in non-extension-type positions, but that is probably an oversight too.

Ajai Nelson (Jun 18 2023 at 18:25):

Given the impact on type inference, do you think maybe we should discuss the proposal more before moving forward?

Ayaz Hafiz (Jun 18 2023 at 20:16):

Is the only concern w.r.t. the loss of a single principal type in some circumstances? If so, I personally don't think that's a big deal, as it does not materially impact expressivity, soundness, or correctness - and we retain principality relative to unification. That is, even though there is no principal type in that scenario, the three possible principal types are all incompatible. Moreover the inferred [Io Str, Net Str, Timeout Str]argument is in a sense the most "general" type - in the sense that it admits more instances of polymorphic types than either [Io Str] or [Io Str, Net Str] does. Also, there are other places in the language where a single principal type is undecidable (namely when ability usages are undecidable). All that to say I don't think this observation pokes any holes.

Richard Feldman (Jun 18 2023 at 20:17):

yeah I think that's fine too :thumbs_up:

Richard Feldman (Jun 18 2023 at 20:18):

the thing I care about is "you couldn't add a type annotation to make it more general than the type we inferred"

Ajai Nelson (Jun 19 2023 at 01:04):

Richard Feldman said:

the thing I care about is "you couldn't add a type annotation to make it more general than the type we inferred"

Here's an example where I think that might happen:

validate = \color ->
  when color is
    Black -> Err "not a color"
    c ->
      # first we solve: color = [Black]x
      # then refinement: c_unexpanded = []x, c_expanded = []x
      # x is inferred (because of isBluish) to be [Red, Blue, Teal, White]
      if isBluish color then
        Err "bluish colors not allowed"
      else
        Ok c
# validate is inferred to be [Red, Blue, Teal, Black, White][] -> Result [Red, Blue, Teal, White][] Str

isBluish : [Red, Blue, Teal, Black, White][] -> Bool

If we add a type annotation, validate can return an open type, which is more general:

validate : [Red, Blue, Teal, Black, White][] -> Result [Red, Blue, Teal, White]* Str
validate = \color ->
  when color is
    Black -> Err "not a color"
    c ->
      # first we solve: color = [Red, Blue, Teal, Black, White]
      # then refinement: c_unexpanded = [Red, Blue, Teal, White], error_expanded = [Red, Blue, Teal, White]x
      if isBluish color then
        Err "bluish colors not allowed"
      else
        Ok c

isBluish : [Red, Blue, Teal, Black, White][] -> Bool

I’m guessing there aren’t a lot of situations where this would be a problem, and it’s easy enough to just add a type annotation. But technically, I think it means adding an annotation can make the type more general.

Ayaz Hafiz (Jun 19 2023 at 01:23):

In this case, the latter program does not type-check. A closed tag union cannot become open in that position - unifying an open union with a closed one makes the union closed.

Ayaz Hafiz (Jun 19 2023 at 01:25):

More generally, the flow-based unification scheme can expand the number of material tags in a union (causing more than one principal type), but I don't think it can unduly close an open union, so we cannot lose generalization where it should be.

Ajai Nelson (Jun 19 2023 at 01:33):

unifying an open union with a closed one makes the union closed

I'm not sure if I understand. Are you referring to the unification of [Red, Blue, Teal, White]x (from the expansion) and [Red, Blue, Teal, White]* (from the type annotation)?

Ayaz Hafiz (Jun 19 2023 at 01:35):

oh, sorry, I misread your example as returning Ok color in the first case. My bad, okay, I see your point.. let me think about this

Ajai Nelson (Jun 19 2023 at 01:40):

Yeah, it's definitely pretty contrived that I used both color and c in that branch

Ajai Nelson (Jun 19 2023 at 01:41):

Though I think maybe this problem could still come up even if the expression that restricts color occurs after the entire when expression

Ayaz Hafiz (Jun 19 2023 at 01:47):

yeah that's true

Ayaz Hafiz (Jun 19 2023 at 02:31):

Thanks for bringing all this up. I think I've really internalized the issue now - which is that we have two possible kinds of branches in unification now, points of branches due to refinement control flow, and points of branches do to the order defs/uses of a variable are constrained in. If we fork in one, it might conflict with the other. With that in mind, some observations:

This issue still means that we always retain principality ("most general types") relative to unification. It is true that you can now type a variable to be more general due to control flow, but inference would not infer types any more generally than it would be today. This may seem like a trivial point, but I'll elaborate on it later on.
If we perform type refinement out-of-band of unification - that, solve e.g. a function definition via unification first, and then do a pass to refine and re-solve applicable variables - I believe that this issue disappears. The reason is that there are no additional constraints that are not already accounted for in the type if we go this route. There are some downsides, two are that (1) it may induce error during unification that refinement would admit, and (2) it is pretty complicated. In pathological cases it might require re-checking a whole function, but I suspect in practice it won't be costly at all. But this feels like a not-very-good option.
A long time ago I had following idea: suppose we are given condition cond with a branch pat -> ... We say that typeof(cond) = typeof(pat) = pat_unexpanded = []x, and then create a pat_expanded = [](x + y), where x is the extension variable used when cond is referenced, and y is used when pat is expanded. This has soundness issues and immediately breaks down, so it's not viable, but maybe there is some derivative design that works.
The current idiomatic usage of the language means that you cannot eagerly type open tag unions in return positions, and as such, it is possible to return closed tag unions even when it doesn't appear so. For example, given the signature {} -> [A, B], this is parsed as {} -> [A, B]<x> where x is unbound. The means a definition

foo : [A, B] -> [A, B]
foo = \x -> x

will indeed be the identity and return [A, B][], whereas the definition

foo : {} -> [A, B]
foo = \{} -> A

returns [A, B]*. This was a deliberate decision for developer ergonomics reasons.

With all that in mind, I think our options here are, if we cannot find another one-pass solving system, are

Drop the proposal in favor of keeping unrestricted principality
Figure out a two-pass solving system (strongly opposed, not for perf reasons but for the others listed above)
Keep the proposal, preserving principality relative to unification (I think this is completely unbroken, but let me know if I'm wrong), and expect that this case never occurs in practice. I am strongly in favor of this option, because (1) it is clear that refinement is the experience many developers prefer, and (2) it is already the case that you cannot explicitly type an open tag union (e.g. [...]*) in a positive position without getting a warning, because the extension is inferred by default.

Let me know if I'm missing anything here.

Ajai Nelson (Jun 19 2023 at 02:43):

What does principality "relative to unification" vs "relative to control flow" mean?

Ayaz Hafiz (Jun 19 2023 at 02:53):

My bad, i should have clarified. By "relative to unification" I mean, suppose our system is S. If we infer a function with our system S, do we get the same type that the system would if it only performed unification? (i.e. is the inferred type the same as the principal would be today) For "relative to control flow" I mean, is the type inferred by S the actually-principal type, considering the behavior of type refinement?

Ajai Nelson (Jun 19 2023 at 03:03):

Ok, that makes sense. Yeah, I'm pretty sure you're right that we still have principality relative to unification

Ajai Nelson (Jun 19 2023 at 03:11):

I think I agree with your comment above

Ajai Nelson (Jun 19 2023 at 03:33):

I haven't totally thought this through, but what if we did something like this:

As before, determine the condition type and the type of the unexpanded variable.
We instantiate the expanded variable to just be an unbound flex variable
Solve the RHS
Make sure the unexpanded variable is a "subtype" of the expanded variable:
- Assuming they're both tag unions, to do this, check that either
  - expanded is not closed, or
  - all the tags in unexpanded are in expanded
- If they're not both tag unions, just unify them

I'm not sure whether this would work, especially if there are multiple whens. If it does, my intuition is that it would solve some (but maybe not all) of these problems. Though maybe it would make it harder to generate good error messages

Ayaz Hafiz (Jun 19 2023 at 03:46):

yeah, I think this could work, but it's not totally clear to me (1) how we localize type errors when they do appear, and (2) if there are cases where in the direct solution we would have inferred a type, but in this model there is a type error (which can happen if expanded is refined, I believe), how to recover the non-erroring type.

There is also the note that this would not resolve the case in which the condition type is further constrained after the when expression, and to resolve that, we need the two-pass system, and I am afraid that winds up with the downsides previously described.

Ajai Nelson (Jun 19 2023 at 04:15):

Yeah, that all makes sense. Probably better to continue with your original plan then

Richard Feldman (Jun 19 2023 at 12:06):

Ayaz Hafiz said:

Keep the proposal, preserving principality relative to unification (I think this is completely unbroken, but let me know if I'm wrong), and expect that this case never occurs in practice. I am strongly in favor of this option, because (1) it is clear that refinement is the experience many developers prefer, and (2) it is already the case that you cannot explicitly type an open tag union (e.g. [...]*) in a positive position without getting a warning, because the extension is inferred by default.

what if we made that annotation a type mismatch instead of a warning?

Richard Feldman (Jun 19 2023 at 12:06):

(I'm not saying we should or shouldn't, just wondering what the downsides would be)

Ayaz Hafiz (Jun 19 2023 at 14:19):

what are the upsides of that? The type annotation may not be wrong (and can catch a bug, sometimes), but in general it is redundant - hence the warning. But if you do foo : [A] -> [A]*; foo = \x -> x, the type annotation catches a bug in your program.

Richard Feldman (Jun 19 2023 at 15:12):

hm, fair point

Richard Feldman (Jun 19 2023 at 15:12):

but if we think that's genuinely valuable, we probably shouldn't make it a warning

Richard Feldman (Jun 19 2023 at 20:58):

regardless, I agree with the point that it's worth proceeding with Ayaz's proposed plan (regardless of whether we make the annotation a warning, an error, or nothing)

Last updated: Jul 26 2025 at 12:14 UTC