@Ayaz Hafiz I think I'd be interested in helping out with implementing refinement of tag unions if I can. I haven't spent much time looking at the type system implementation yet, but I was hoping to do that soon anyway!
Also, a tiny thing in the proposal: I'm guessing the f : [User, Admin, SuperAdmin]
signatures are missing a return type (or should be t : [User, Admin, SuperAdmin]
).
sweet! you're totally right, apologies for the typo, I've went ahead and fixed it. Please let us know if you have any questions about the description in the proposal/implementation.
I would appreciate some help getting started with this.
My initial thought was that I could start with the type expansion part, since implementing the refinement without that could break working programs.
expandError : [Io Str] -> [Io Str, Net Str]
expandError = \e -> when e is
error -> error
My first idea was that for each identifier introduced in a pattern, I would create a new type variable to represent the possibly expanded version of it. So in this case, the goal is for error_unexpanded
to be [Io Str]
and error_expanded
to be [Io Str, Net Str]
. References to error
on the right side of the arrow would actually refer to error_expanded
. My idea was that I would use unify()
with Mode::PRESENT
in order to ensure all the tags in the unexpanded variable were present in the expanded variable.
So I tried that. But it turns out Mode::PRESENT
isn't as magical as I naively assumed. It doesn't seem to continuously enforce a relationship between error_unexpanded
and error_expanded
; it just adds the tags in the error_unexpanded
type to error_expanded
at the point when unify()
is called. (In this case, I think it just decided error_unexpanded
and error_expanded
were equal.)
So all that's to say, my initial idea didn't work out, and I'm not really sure how to approach this.
I wonder if you can perform the inference scheme the following way:
error
on the left-hand side (LHS) of the pattern-match error - as it already happens today. Then, as you describe, create a new type variable error_expanded
based on the solved type of error in the LHS.error_expanded
proceeds in the following way: error_expanded = clone(error_unexpanded)
(you don't want to unify them, since they are not equivalent).error_expanded
is a tag union, it is "opened" (an unbound variable is added as its extension type, see the open_tag_union
function).error_expanded
should look like [Io Str]<fresh_ext>
where fresh_ext
is unbound.error
to error_expanded
and then solving the RHS of the branches will then instantiate fresh_ext
to [Net Str]
, yielding the expanded type you want.Would that still work if the RHS imposed more restrictions on e
? For example
expandError = \e ->
_ = f e
# Now we know the type of e contains [A].
when e is
error ->
# error_expanded is created before solving the RHS.
_ = g e
# Now we know that the type of e also contains [B].
# But that restriction doesn't apply to error_expanded since the clone already happened.
# So the following doesn't produce a type error, but it should!
_ = h error
error
f : [A]* -> {}
g : [B]* -> {}
h : [A] -> {}
I believe that if the type of the branch pattern is non-exhaustive, then the extension variable of the branch pattern (if it exists) should be shared with the expanded type on the RHS. So in your example above we would have
expandError = \e ->
_ = f e # e = [A]x
when e is
error -> # error_unexpanded = [A]x , error_expanded = [A]x
_ = g e # [A]x ~ [B]y => error_unexpanded = [A, B]y, error_expanded = [A, B]y
_ = h error # => type error
error
then in a case where refinement happens, like
expandError : [A, B, C]a -> {}
expandError = \e ->
when e is
A -> {}
error ->
# first we solve : e = [A, B, C]a
# then refinement: error_unexpanded = [B, C]a , error_expanded = [B, C]a
{}
this is still okay. So the extension variable is only freshened if we can't find an extension variable for the unexpanded type.
expandError : [A, B, C] -> {}
expandError = \e ->
when e is
A -> {}
error ->
# first we solve : e = [A, B, C]
# then refinement: error_unexpanded = [B, C] , error_expanded = [B, C]x
# at this point only `x` can grow, so this is also sound
{}
Ok, I think I'm understanding better now. So effectively, that means the expansion only happens if we already know that the unexpanded tag union is closed (i.e., its extension variable is []
).
Also, does this mean we can't always infer the most general type? Take this example:
expandError : [Io Str, Net Str] -> I64
expandError = \e ->
when e is
Io Str -> 1
error ->
# first we solve: e = [Io Str, Net Str]
# then refinement: error_unexpanded = [Net Str], error_expanded = [Net Str]x
# x is inferred to be [Io Str, Timeout Str]
f error
f : [Io Str, Net Str, Timeout Str] -> I64
But if you remove the annotation, it would infer a different and incompatible type, right?
expandError = \e ->
when e is
Io Str -> 1
error ->
# first we solve: e = [Io Str]x
# then refinement: error_unexpanded = []x, error_expanded = []x
# x is inferred to be [Io Str, Net Str, Timeout Str]
f error
# expandError is inferred to be [Io Str, Net Str, Timeout Str] -> I64
f : [Io Str, Net Str, Timeout Str] -> I64
Though I'm not sure if there even is a way to express the most general type of that function.
that is a good example. I think you're right, it loses principality relative to control flow (since the principal type relative to control flow would be [Io Str] -> I64
). But it doesn't lose principality relative to unification (the former function wouldn't type-check relative to unification). Maybe we can find a way to recover principality in general, relative to control flow, but I can't think of a way immediately
actually, this function has no principal type right? All of [Io Str] -> I64
, [Io Str, Net Str] -> I64
, [Io Str, Net Str, Timeout Str] -> I64
are "equivalently principal"
none of those types are compatible with any other
so we infer "a" principal type, but "the" principal type doesn't exist..
Yeah, I think so
I think I've implemented the inference scheme you suggested, and it seems to be working: https://github.com/roc-lang/roc/pull/5559
Though it's still pretty messy, and I still need to do the code gen part
Do you know why the open_tag_union
pattern match doesn't include EmptyTagUnion
or FunctionOrTagUnion
? https://github.com/roc-lang/roc/blob/0f327d9f38390070b8d774398813b8797897565a/crates/compiler/solve/src/solve.rs#L1910-L1943
Missing FunctionOrTagUnion
is probably an oversight. Missing EmptyTagUnion, I think it is an implicit assumption that EmptyTagUnion does not appear in non-extension-type positions, but that is probably an oversight too.
Given the impact on type inference, do you think maybe we should discuss the proposal more before moving forward?
Is the only concern w.r.t. the loss of a single principal type in some circumstances? If so, I personally don't think that's a big deal, as it does not materially impact expressivity, soundness, or correctness - and we retain principality relative to unification. That is, even though there is no principal type in that scenario, the three possible principal types are all incompatible. Moreover the inferred [Io Str, Net Str, Timeout Str]
argument is in a sense the most "general" type - in the sense that it admits more instances of polymorphic types than either [Io Str]
or [Io Str, Net Str]
does. Also, there are other places in the language where a single principal type is undecidable (namely when ability usages are undecidable). All that to say I don't think this observation pokes any holes.
yeah I think that's fine too :thumbs_up:
the thing I care about is "you couldn't add a type annotation to make it more general than the type we inferred"
Richard Feldman said:
the thing I care about is "you couldn't add a type annotation to make it more general than the type we inferred"
Here's an example where I think that might happen:
validate = \color ->
when color is
Black -> Err "not a color"
c ->
# first we solve: color = [Black]x
# then refinement: c_unexpanded = []x, c_expanded = []x
# x is inferred (because of isBluish) to be [Red, Blue, Teal, White]
if isBluish color then
Err "bluish colors not allowed"
else
Ok c
# validate is inferred to be [Red, Blue, Teal, Black, White][] -> Result [Red, Blue, Teal, White][] Str
isBluish : [Red, Blue, Teal, Black, White][] -> Bool
If we add a type annotation, validate
can return an open type, which is more general:
validate : [Red, Blue, Teal, Black, White][] -> Result [Red, Blue, Teal, White]* Str
validate = \color ->
when color is
Black -> Err "not a color"
c ->
# first we solve: color = [Red, Blue, Teal, Black, White]
# then refinement: c_unexpanded = [Red, Blue, Teal, White], error_expanded = [Red, Blue, Teal, White]x
if isBluish color then
Err "bluish colors not allowed"
else
Ok c
isBluish : [Red, Blue, Teal, Black, White][] -> Bool
I’m guessing there aren’t a lot of situations where this would be a problem, and it’s easy enough to just add a type annotation. But technically, I think it means adding an annotation can make the type more general.
In this case, the latter program does not type-check. A closed tag union cannot become open in that position - unifying an open union with a closed one makes the union closed.
More generally, the flow-based unification scheme can expand the number of material tags in a union (causing more than one principal type), but I don't think it can unduly close an open union, so we cannot lose generalization where it should be.
unifying an open union with a closed one makes the union closed
I'm not sure if I understand. Are you referring to the unification of [Red, Blue, Teal, White]x
(from the expansion) and [Red, Blue, Teal, White]*
(from the type annotation)?
oh, sorry, I misread your example as returning Ok color
in the first case. My bad, okay, I see your point.. let me think about this
Yeah, it's definitely pretty contrived that I used both color
and c
in that branch
Though I think maybe this problem could still come up even if the expression that restricts color
occurs after the entire when
expression
yeah that's true
Thanks for bringing all this up. I think I've really internalized the issue now - which is that we have two possible kinds of branches in unification now, points of branches due to refinement control flow, and points of branches do to the order defs/uses of a variable are constrained in. If we fork in one, it might conflict with the other. With that in mind, some observations:
cond
with a branch pat -> ..
. We say that typeof(cond) = typeof(pat) = pat_unexpanded = []x
, and then create a pat_expanded = [](x + y)
, where x
is the extension variable used when cond
is referenced, and y
is used when pat
is expanded. This has soundness issues and immediately breaks down, so it's not viable, but maybe there is some derivative design that works.{} -> [A, B]
, this is parsed as {} -> [A, B]<x>
where x
is unbound. The means a definitionfoo : [A, B] -> [A, B]
foo = \x -> x
will indeed be the identity and return [A, B][]
, whereas the definition
foo : {} -> [A, B]
foo = \{} -> A
returns [A, B]*
. This was a deliberate decision for developer ergonomics reasons.
With all that in mind, I think our options here are, if we cannot find another one-pass solving system, are
[...]*
) in a positive position without getting a warning, because the extension is inferred by default.Let me know if I'm missing anything here.
What does principality "relative to unification" vs "relative to control flow" mean?
My bad, i should have clarified. By "relative to unification" I mean, suppose our system is S. If we infer a function with our system S, do we get the same type that the system would if it only performed unification? (i.e. is the inferred type the same as the principal would be today) For "relative to control flow" I mean, is the type inferred by S the actually-principal type, considering the behavior of type refinement?
Ok, that makes sense. Yeah, I'm pretty sure you're right that we still have principality relative to unification
I think I agree with your comment above
I haven't totally thought this through, but what if we did something like this:
expanded
is not closed, orunexpanded
are in expanded
I'm not sure whether this would work, especially if there are multiple when
s. If it does, my intuition is that it would solve some (but maybe not all) of these problems. Though maybe it would make it harder to generate good error messages
yeah, I think this could work, but it's not totally clear to me (1) how we localize type errors when they do appear, and (2) if there are cases where in the direct solution we would have inferred a type, but in this model there is a type error (which can happen if expanded is refined, I believe), how to recover the non-erroring type.
There is also the note that this would not resolve the case in which the condition type is further constrained after the when
expression, and to resolve that, we need the two-pass system, and I am afraid that winds up with the downsides previously described.
Yeah, that all makes sense. Probably better to continue with your original plan then
Ayaz Hafiz said:
Keep the proposal, preserving principality relative to unification (I think this is completely unbroken, but let me know if I'm wrong), and expect that this case never occurs in practice. I am strongly in favor of this option, because (1) it is clear that refinement is the experience many developers prefer, and (2) it is already the case that you cannot explicitly type an open tag union (e.g.
[...]*
) in a positive position without getting a warning, because the extension is inferred by default.
what if we made that annotation a type mismatch instead of a warning?
(I'm not saying we should or shouldn't, just wondering what the downsides would be)
what are the upsides of that? The type annotation may not be wrong (and can catch a bug, sometimes), but in general it is redundant - hence the warning. But if you do foo : [A] -> [A]*; foo = \x -> x
, the type annotation catches a bug in your program.
hm, fair point
but if we think that's genuinely valuable, we probably shouldn't make it a warning
regardless, I agree with the point that it's worth proceeding with Ayaz's proposed plan (regardless of whether we make the annotation a warning, an error, or nothing)
Last updated: Jul 06 2025 at 12:14 UTC