Stream: contributing

Topic: Refinement of tag unions


view this post on Zulip Ajai Nelson (Jun 03 2023 at 05:23):

@Ayaz Hafiz I think I'd be interested in helping out with implementing refinement of tag unions if I can. I haven't spent much time looking at the type system implementation yet, but I was hoping to do that soon anyway!

Also, a tiny thing in the proposal: I'm guessing the f : [User, Admin, SuperAdmin] signatures are missing a return type (or should be t : [User, Admin, SuperAdmin]).

view this post on Zulip Ayaz Hafiz (Jun 03 2023 at 13:45):

sweet! you're totally right, apologies for the typo, I've went ahead and fixed it. Please let us know if you have any questions about the description in the proposal/implementation.

view this post on Zulip Ajai Nelson (Jun 16 2023 at 04:11):

I would appreciate some help getting started with this.

My initial thought was that I could start with the type expansion part, since implementing the refinement without that could break working programs.

expandError : [Io Str] -> [Io Str, Net Str]
expandError = \e -> when e is
  error -> error

My first idea was that for each identifier introduced in a pattern, I would create a new type variable to represent the possibly expanded version of it. So in this case, the goal is for error_unexpanded to be [Io Str] and error_expanded to be [Io Str, Net Str]. References to error on the right side of the arrow would actually refer to error_expanded. My idea was that I would use unify() with Mode::PRESENTin order to ensure all the tags in the unexpanded variable were present in the expanded variable.

So I tried that. But it turns out Mode::PRESENT isn't as magical as I naively assumed. It doesn't seem to continuously enforce a relationship between error_unexpanded and error_expanded; it just adds the tags in the error_unexpanded type to error_expanded at the point when unify() is called. (In this case, I think it just decided error_unexpandedand error_expanded were equal.)

So all that's to say, my initial idea didn't work out, and I'm not really sure how to approach this.

view this post on Zulip Ayaz Hafiz (Jun 16 2023 at 15:22):

I wonder if you can perform the inference scheme the following way:

view this post on Zulip Ajai Nelson (Jun 16 2023 at 19:16):

Would that still work if the RHS imposed more restrictions on e? For example

expandError = \e ->
  _ = f e
  # Now we know the type of e contains [A].
  when e is
    error ->
      # error_expanded is created before solving the RHS.
      _ = g e
      # Now we know that the type of e also contains [B].
      # But that restriction doesn't apply to error_expanded since the clone already happened.
      # So the following doesn't produce a type error, but it should!
      _ = h error
      error

f : [A]* -> {}
g : [B]* -> {}
h : [A] -> {}

view this post on Zulip Ayaz Hafiz (Jun 16 2023 at 20:40):

I believe that if the type of the branch pattern is non-exhaustive, then the extension variable of the branch pattern (if it exists) should be shared with the expanded type on the RHS. So in your example above we would have

expandError = \e ->
  _ = f e # e = [A]x
  when e is
    error -> # error_unexpanded = [A]x , error_expanded = [A]x
      _ = g e # [A]x ~ [B]y => error_unexpanded = [A, B]y, error_expanded = [A, B]y
      _ = h error # => type error
      error

then in a case where refinement happens, like

expandError : [A, B, C]a -> {}
expandError = \e ->
  when e is
    A -> {}
    error ->
      # first we solve : e = [A, B, C]a
      # then refinement: error_unexpanded = [B, C]a , error_expanded = [B, C]a
      {}

this is still okay. So the extension variable is only freshened if we can't find an extension variable for the unexpanded type.

expandError : [A, B, C] -> {}
expandError = \e ->
  when e is
    A -> {}
    error ->
      # first we solve : e = [A, B, C]
      # then refinement: error_unexpanded = [B, C] , error_expanded = [B, C]x
      # at this point only `x` can grow, so this is also sound
      {}

view this post on Zulip Ajai Nelson (Jun 16 2023 at 23:42):

Ok, I think I'm understanding better now. So effectively, that means the expansion only happens if we already know that the unexpanded tag union is closed (i.e., its extension variable is []).

Also, does this mean we can't always infer the most general type? Take this example:

expandError : [Io Str, Net Str] -> I64
expandError = \e ->
  when e is
    Io Str -> 1
    error ->
      # first we solve: e = [Io Str, Net Str]
      # then refinement: error_unexpanded = [Net Str], error_expanded = [Net Str]x
      # x is inferred to be [Io Str, Timeout Str]
      f error

f : [Io Str, Net Str, Timeout Str] -> I64

But if you remove the annotation, it would infer a different and incompatible type, right?

expandError = \e ->
  when e is
    Io Str -> 1
    error ->
      # first we solve: e = [Io Str]x
      # then refinement: error_unexpanded = []x, error_expanded = []x
      # x is inferred to be [Io Str, Net Str, Timeout Str]
      f error
# expandError is inferred to be [Io Str, Net Str, Timeout Str] -> I64

f : [Io Str, Net Str, Timeout Str] -> I64

Though I'm not sure if there even is a way to express the most general type of that function.

view this post on Zulip Ayaz Hafiz (Jun 17 2023 at 01:30):

that is a good example. I think you're right, it loses principality relative to control flow (since the principal type relative to control flow would be [Io Str] -> I64). But it doesn't lose principality relative to unification (the former function wouldn't type-check relative to unification). Maybe we can find a way to recover principality in general, relative to control flow, but I can't think of a way immediately

view this post on Zulip Ayaz Hafiz (Jun 17 2023 at 01:49):

actually, this function has no principal type right? All of [Io Str] -> I64, [Io Str, Net Str] -> I64, [Io Str, Net Str, Timeout Str] -> I64 are "equivalently principal"

view this post on Zulip Ayaz Hafiz (Jun 17 2023 at 01:49):

none of those types are compatible with any other

view this post on Zulip Ayaz Hafiz (Jun 17 2023 at 01:50):

so we infer "a" principal type, but "the" principal type doesn't exist..

view this post on Zulip Ajai Nelson (Jun 17 2023 at 01:50):

Yeah, I think so

view this post on Zulip Ajai Nelson (Jun 17 2023 at 02:50):

I think I've implemented the inference scheme you suggested, and it seems to be working: https://github.com/roc-lang/roc/pull/5559

view this post on Zulip Ajai Nelson (Jun 17 2023 at 02:51):

Though it's still pretty messy, and I still need to do the code gen part

view this post on Zulip Ajai Nelson (Jun 17 2023 at 03:01):

Do you know why the open_tag_union pattern match doesn't include EmptyTagUnion or FunctionOrTagUnion? https://github.com/roc-lang/roc/blob/0f327d9f38390070b8d774398813b8797897565a/crates/compiler/solve/src/solve.rs#L1910-L1943

view this post on Zulip Ayaz Hafiz (Jun 17 2023 at 03:34):

Missing FunctionOrTagUnion is probably an oversight. Missing EmptyTagUnion, I think it is an implicit assumption that EmptyTagUnion does not appear in non-extension-type positions, but that is probably an oversight too.

view this post on Zulip Ajai Nelson (Jun 18 2023 at 18:25):

Given the impact on type inference, do you think maybe we should discuss the proposal more before moving forward?

view this post on Zulip Ayaz Hafiz (Jun 18 2023 at 20:16):

Is the only concern w.r.t. the loss of a single principal type in some circumstances? If so, I personally don't think that's a big deal, as it does not materially impact expressivity, soundness, or correctness - and we retain principality relative to unification. That is, even though there is no principal type in that scenario, the three possible principal types are all incompatible. Moreover the inferred [Io Str, Net Str, Timeout Str]argument is in a sense the most "general" type - in the sense that it admits more instances of polymorphic types than either [Io Str] or [Io Str, Net Str] does. Also, there are other places in the language where a single principal type is undecidable (namely when ability usages are undecidable). All that to say I don't think this observation pokes any holes.

view this post on Zulip Richard Feldman (Jun 18 2023 at 20:17):

yeah I think that's fine too :thumbs_up:

view this post on Zulip Richard Feldman (Jun 18 2023 at 20:18):

the thing I care about is "you couldn't add a type annotation to make it more general than the type we inferred"

view this post on Zulip Ajai Nelson (Jun 19 2023 at 01:04):

Richard Feldman said:

the thing I care about is "you couldn't add a type annotation to make it more general than the type we inferred"

Here's an example where I think that might happen:

validate = \color ->
  when color is
    Black -> Err "not a color"
    c ->
      # first we solve: color = [Black]x
      # then refinement: c_unexpanded = []x, c_expanded = []x
      # x is inferred (because of isBluish) to be [Red, Blue, Teal, White]
      if isBluish color then
        Err "bluish colors not allowed"
      else
        Ok c
# validate is inferred to be [Red, Blue, Teal, Black, White][] -> Result [Red, Blue, Teal, White][] Str

isBluish : [Red, Blue, Teal, Black, White][] -> Bool

If we add a type annotation, validate can return an open type, which is more general:

validate : [Red, Blue, Teal, Black, White][] -> Result [Red, Blue, Teal, White]* Str
validate = \color ->
  when color is
    Black -> Err "not a color"
    c ->
      # first we solve: color = [Red, Blue, Teal, Black, White]
      # then refinement: c_unexpanded = [Red, Blue, Teal, White], error_expanded = [Red, Blue, Teal, White]x
      if isBluish color then
        Err "bluish colors not allowed"
      else
        Ok c

isBluish : [Red, Blue, Teal, Black, White][] -> Bool

I’m guessing there aren’t a lot of situations where this would be a problem, and it’s easy enough to just add a type annotation. But technically, I think it means adding an annotation can make the type more general.

view this post on Zulip Ayaz Hafiz (Jun 19 2023 at 01:23):

In this case, the latter program does not type-check. A closed tag union cannot become open in that position - unifying an open union with a closed one makes the union closed.

view this post on Zulip Ayaz Hafiz (Jun 19 2023 at 01:25):

More generally, the flow-based unification scheme can expand the number of material tags in a union (causing more than one principal type), but I don't think it can unduly close an open union, so we cannot lose generalization where it should be.

view this post on Zulip Ajai Nelson (Jun 19 2023 at 01:33):

unifying an open union with a closed one makes the union closed

I'm not sure if I understand. Are you referring to the unification of [Red, Blue, Teal, White]x (from the expansion) and [Red, Blue, Teal, White]* (from the type annotation)?

view this post on Zulip Ayaz Hafiz (Jun 19 2023 at 01:35):

oh, sorry, I misread your example as returning Ok color in the first case. My bad, okay, I see your point.. let me think about this

view this post on Zulip Ajai Nelson (Jun 19 2023 at 01:40):

Yeah, it's definitely pretty contrived that I used both color and c in that branch

view this post on Zulip Ajai Nelson (Jun 19 2023 at 01:41):

Though I think maybe this problem could still come up even if the expression that restricts color occurs after the entire when expression

view this post on Zulip Ayaz Hafiz (Jun 19 2023 at 01:47):

yeah that's true

view this post on Zulip Ayaz Hafiz (Jun 19 2023 at 02:31):

Thanks for bringing all this up. I think I've really internalized the issue now - which is that we have two possible kinds of branches in unification now, points of branches due to refinement control flow, and points of branches do to the order defs/uses of a variable are constrained in. If we fork in one, it might conflict with the other. With that in mind, some observations:

foo : [A, B] -> [A, B]
foo = \x -> x

will indeed be the identity and return [A, B][], whereas the definition

foo : {} -> [A, B]
foo = \{} -> A

returns [A, B]*. This was a deliberate decision for developer ergonomics reasons.

With all that in mind, I think our options here are, if we cannot find another one-pass solving system, are

Let me know if I'm missing anything here.

view this post on Zulip Ajai Nelson (Jun 19 2023 at 02:43):

What does principality "relative to unification" vs "relative to control flow" mean?

view this post on Zulip Ayaz Hafiz (Jun 19 2023 at 02:53):

My bad, i should have clarified. By "relative to unification" I mean, suppose our system is S. If we infer a function with our system S, do we get the same type that the system would if it only performed unification? (i.e. is the inferred type the same as the principal would be today) For "relative to control flow" I mean, is the type inferred by S the actually-principal type, considering the behavior of type refinement?

view this post on Zulip Ajai Nelson (Jun 19 2023 at 03:03):

Ok, that makes sense. Yeah, I'm pretty sure you're right that we still have principality relative to unification

view this post on Zulip Ajai Nelson (Jun 19 2023 at 03:11):

I think I agree with your comment above

view this post on Zulip Ajai Nelson (Jun 19 2023 at 03:33):

I haven't totally thought this through, but what if we did something like this:

I'm not sure whether this would work, especially if there are multiple whens. If it does, my intuition is that it would solve some (but maybe not all) of these problems. Though maybe it would make it harder to generate good error messages

view this post on Zulip Ayaz Hafiz (Jun 19 2023 at 03:46):

yeah, I think this could work, but it's not totally clear to me (1) how we localize type errors when they do appear, and (2) if there are cases where in the direct solution we would have inferred a type, but in this model there is a type error (which can happen if expanded is refined, I believe), how to recover the non-erroring type.

There is also the note that this would not resolve the case in which the condition type is further constrained after the when expression, and to resolve that, we need the two-pass system, and I am afraid that winds up with the downsides previously described.

view this post on Zulip Ajai Nelson (Jun 19 2023 at 04:15):

Yeah, that all makes sense. Probably better to continue with your original plan then

view this post on Zulip Richard Feldman (Jun 19 2023 at 12:06):

Ayaz Hafiz said:

Keep the proposal, preserving principality relative to unification (I think this is completely unbroken, but let me know if I'm wrong), and expect that this case never occurs in practice. I am strongly in favor of this option, because (1) it is clear that refinement is the experience many developers prefer, and (2) it is already the case that you cannot explicitly type an open tag union (e.g. [...]*) in a positive position without getting a warning, because the extension is inferred by default.

what if we made that annotation a type mismatch instead of a warning?

view this post on Zulip Richard Feldman (Jun 19 2023 at 12:06):

(I'm not saying we should or shouldn't, just wondering what the downsides would be)

view this post on Zulip Ayaz Hafiz (Jun 19 2023 at 14:19):

what are the upsides of that? The type annotation may not be wrong (and can catch a bug, sometimes), but in general it is redundant - hence the warning. But if you do foo : [A] -> [A]*; foo = \x -> x, the type annotation catches a bug in your program.

view this post on Zulip Richard Feldman (Jun 19 2023 at 15:12):

hm, fair point

view this post on Zulip Richard Feldman (Jun 19 2023 at 15:12):

but if we think that's genuinely valuable, we probably shouldn't make it a warning

view this post on Zulip Richard Feldman (Jun 19 2023 at 20:58):

regardless, I agree with the point that it's worth proceeding with Ayaz's proposed plan (regardless of whether we make the annotation a warning, an error, or nothing)


Last updated: Jul 06 2025 at 12:14 UTC