Stream: beginners

Topic: `bp_N>M>var` syntax in RFC 011?


view this post on Zulip osa1 (Jan 04 2025 at 10:50):

Hi @Ayaz Hafiz. Could you explain what N, M, and var are in bp_N>M>var syntax that you use in RFC 011?

view this post on Zulip osa1 (Jan 04 2025 at 11:05):

As far as I can see there are no examples of N and M being different numbers in the examples, so it's hard to tell why you have two numbers N and M and when they can be different.

view this post on Zulip Ayaz Hafiz (Jan 04 2025 at 16:53):

yes, apologies that this is a bit confusing. the idea is to distinguish between the specific branch being targeted, and the pattern within the branch that is being constrained. bp_N refers to the Nth branch being targeted. M>var refers to the branch pattern that should be matched.

view this post on Zulip Ayaz Hafiz (Jan 04 2025 at 16:59):

For example, take the case

x: [A [B, C], D [E]]

when x is
  A B -> D E
  D E -> A C
  A other -> A other

Our goal is constrain other down to [C], so we are looking at bp_3. The way we want to constrain is by eliminating whatever is in its type that appears in previous branches. What could appear in other branches is given by the pattern A other that occurs in the second branch, so call this 3>other (the pattern path in branch 3 that leads to other). Now other = [B, C] - bp_1>(3>other) - bp_2>(3>other). bp_1>(3>other) is [B], because that's where other would match the pattern. bp_2>(3>other) is [], because the third branch does not match the second. So we send up with other = [B, C] - [B] - [] = [C].

view this post on Zulip osa1 (Jan 04 2025 at 17:44):

Thanks, it makes more sense to me now.

Have you considered this alternative (or maybe it's the same as what you describe?) approach? When unifying the pattern types with the scrutinee type, you copy the scrutinee type, updating it with the fully covered variants (non-variant types are copied as-is). So in your example, when you check the second branch's pattern, you consider the scrutinee type not as [A [B, C], D [E]] but as [A [C], D [E]]. Then in the third branch, you consider it as [A [C]].

Is this the same as what you describe?

view this post on Zulip osa1 (Jan 04 2025 at 17:48):

The copying should share the unification variables, but copy everything else, updating variants based on fully covered variants in the previous branches.

view this post on Zulip osa1 (Jan 04 2025 at 17:56):

I wonder if this approach is a bit too eager, because you do refinements before generating all of the unification constraints.

view this post on Zulip osa1 (Jan 04 2025 at 18:16):

Do you know if there's prior work on this? I can't find anything.

view this post on Zulip Ayaz Hafiz (Jan 04 2025 at 18:19):

yes, i think it is equivalent. you probably need to solve at least the constraints in the body of all prior branches before updating the scrutinee. This still might be too eager, but I think the only other alternative is some fixpoint analysis (maybe there's a simpler method)

view this post on Zulip Ayaz Hafiz (Jan 04 2025 at 18:20):

i think you have to break unification variables when you update the scrutinee, but yeah copy the unification vars up to that

view this post on Zulip Ayaz Hafiz (Jan 04 2025 at 18:21):

i know that this is quite simple in a typesystem with subtyping - i don't have any papers in mind off the top of my head but kotlin, typescript, etc have done this

view this post on Zulip osa1 (Jan 04 2025 at 18:24):

Why is it simpler with subtyping?

view this post on Zulip Ayaz Hafiz (Jan 04 2025 at 18:32):

you can model this as intersections which are often part of the basic operations of the type system, so you don't need to materialize the types partway

view this post on Zulip osa1 (Jan 05 2025 at 08:39):

Have you considered handling cases where the refinement cannot be done based on the scrutinee type? E.g.

type Err:
    Err([A, B, C])
    Ok(I32)

match x:
    Err('A): ...
    Err(other): ...     # other : [B, C]
    Ok(x): ...

Here [A, B, C] is not a type argument to the scrutinee (x) type, so it can't be refined.

view this post on Zulip osa1 (Jan 05 2025 at 14:37):

I ended up implementing this without updating the scrutinee type as described above: https://github.com/fir-lang/fir/pull/39.

What I did was: for a branch, first unify the pattern type with the scrutinee type as usual, but before type checking the right-hand side, refine binders in the pattern based on coverage information.

Here are some examples of what this can handle: https://github.com/fir-lang/fir/blob/main/tests/Rows3.fir

view this post on Zulip Ayaz Hafiz (Jan 05 2025 at 16:26):

ah very nice. i guess the coverage of the pattern indeed gives you all you need

view this post on Zulip osa1 (Jan 08 2025 at 10:00):

FYI you may want to review https://langdev.stackexchange.com/questions/4239/flow-sensitive-type-refinement-of-polymorphic-variants before implementing this.

view this post on Zulip Richard Feldman (Jan 08 2025 at 14:17):

wow that is the best StackOverflow answer to an advanced question I've ever seen in my entire life :astonished:

view this post on Zulip Richard Feldman (Jan 08 2025 at 14:17):

Alexis King is awesome

view this post on Zulip Anton (Jan 08 2025 at 14:29):

I didn't know about langdev.stackexchange.com , that's useful

view this post on Zulip osa1 (Jan 20 2025 at 08:58):

I discovered an interesting case, where refining a pattern binding causes a type error. Without refinement there's no type error.

test(x: [A, ..r]): [A, ..r]
     match x:
         ~A: ~A
         other: other

Here the RHS type of the first branch is [A, ..x1] where x1 is a fresh unification variable.

In the second branch, normally the type would be [A, ..r] and RHS types of these two branches would unify.

But with refinement, we get [..r], which doesn't unify with [A, ..x1] (type of first RHS) as r is a rigid type variable (coming from the type signature), not a unification variable.

Not sure how to approach this..

@Ayaz Hafiz have you considered this case before?

view this post on Zulip osa1 (Jan 20 2025 at 09:55):

I feel like this is where "absent" fields in variant and record types will come into play.

E.g. we refine other as [..r] \ A ("a record with rows r, not including label A")

Then allow unification [..r] \ A ~ [A, ..x1] (which unifies r ~ x1).

view this post on Zulip Ayaz Hafiz (Jan 20 2025 at 13:58):

yes.. one approach is to avoid distinguishing rigid and unification variables until after generalization

view this post on Zulip Ayaz Hafiz (Jan 20 2025 at 13:59):

i assume your definition of rigid vs unification variable is rigid can be instantiated, unification can be instantiated with at most one type until generalization?

view this post on Zulip Ayaz Hafiz (Jan 20 2025 at 14:00):

if you do this, then the type is x: [A, ..r'] -> [A, ..r'] for fresh unification var r' and I believe it solves correctly. Then after solving the body of test, generalize r' to r

view this post on Zulip Ayaz Hafiz (Jan 20 2025 at 14:03):

One thing to be careful about with this approach though is whether you would like to support polymorphic recursion. If polymorphic recursion is permitted, then I think you must take the set-theoretic approach you describe

view this post on Zulip Ayaz Hafiz (Jan 20 2025 at 14:04):

imo set theoretic models are better in general, just much harder to implement

view this post on Zulip osa1 (Jan 20 2025 at 14:22):

My definition of rigid type variable is when you have a type parameter in a type signature, that type variable becomes "rigid" when checking the body of the function, i.e. it only unifies with itself.

So in my example r is rigid: test(x: [A, ..r]): [A, ..r] = .... In this identity function t is rigid: id(x: t): t = x

view this post on Zulip osa1 (Jan 20 2025 at 14:24):

I don't care too much about poly. recursion.. but I'm curious what's the difference between the set theoretic approach from non- set theoretic one?

view this post on Zulip osa1 (Jan 20 2025 at 14:30):

I think you are right that if I make r a unification variable and then generalize at the end, I would get the same type as in the signature so it would work.

view this post on Zulip osa1 (Jan 20 2025 at 14:47):

Then allow unification [..r] \ A ~ [A, ..x1] (which unifies r ~ x1).

I'm actually not sure if this is possible. The type [..r] \ A doesn't have any unification variables (r is a rigid variable) so not sure how to make it equal to [..r] again.

view this post on Zulip Ayaz Hafiz (Jan 20 2025 at 15:17):

I see - yes then i think making it a unification var would work, as long as you preserve the link (eg all instances of r become a unification var pointing at the same thing)

view this post on Zulip Ayaz Hafiz (Jan 20 2025 at 15:19):

W.r.t set theoretic types, i mean treating types as sets of values rather than the typical type-theoretic approach - the type-theoretic approach makes unification easy but distinguishing based on values harder. For most cases the difference is not so severe, like in this case, but with intersections, negations, etc it can make a big difference in my experience. I think Castagna is the leading expert on this topic, https://www.irif.fr/~gc/papers/set-theoretic-types-2022.pdf is a good paper of his i like

view this post on Zulip Anthony Bullard (Jan 20 2025 at 15:24):

I’m interested in how the Elixir project of implementing set-theoretic types , especially for gradual typing, goes.

view this post on Zulip Anthony Bullard (Jan 20 2025 at 15:28):

Giuseppe is the advisor to the project, so I'm pretty sure this is the exact paper that is being implemented here

view this post on Zulip Anthony Bullard (Jan 20 2025 at 15:28):

Here's a good talk from the recent ElixirConf about what they are doing: https://www.youtube.com/watch?v=Bf7dp0Yj8Po

view this post on Zulip osa1 (Jan 20 2025 at 15:31):

I just found another repro of the issue that doesn't use refinements:

test(x: [..errs]): [E1, ..errs]
    if Bool.True:
        x
    else:
        ~E1

This can't be type checked if we consider type of x as [..errs] where errs is rigid (i.e. can't be unified with anything else).

So I think it's clear that I need something like what @Ayaz Hafiz suggested.

view this post on Zulip osa1 (Jan 20 2025 at 20:37):

Thinking about this more, I don't think creating unification variables for type variables in the signature and then generalizing and compring the generalized type with the signature can type check this.

Here's how the type checking goes:

I think this function's type signature is not right.

view this post on Zulip osa1 (Jan 20 2025 at 20:53):

Unfortunately this takes us back to my original question with the refinement.

test(x: [A, ..r]): [A, ..r]
     match x:
         ~A: ~A
         other: other

Here if I refine type of other as [..r], I can't unify it with [A, ..r2] and get the argument or return type in the signature.

view this post on Zulip osa1 (Jan 20 2025 at 21:08):

This seems relevant: https://drops.dagstuhl.de/storage/00lipics/lipics-vol263-ecoop2023/LIPIcs.ECOOP.2023.17/LIPIcs.ECOOP.2023.17.pdf

view this post on Zulip osa1 (Jan 20 2025 at 21:33):

I think we just accept this program as ill-typed. If we don't want refinement we can use the argument:

test(x: [A, ..r]): [A, ..r]
     match x:
         ~A: ~A
         _: x

which type checks.

view this post on Zulip Ayaz Hafiz (Jan 20 2025 at 21:40):

yes i think that program has to be ill-typed

view this post on Zulip Ayaz Hafiz (Jan 20 2025 at 21:41):

to do anything else you need something other than unification (ie distinguish the input parameter typevar from the output value var)

view this post on Zulip Ayaz Hafiz (Jan 20 2025 at 21:42):

i think it is sufficient to check whether any user-annotated rigid variables after generalization can be unified (without instantiation) to the inferred generalized types

view this post on Zulip Ayaz Hafiz (Jan 20 2025 at 21:42):

there is probably a simpler way though

view this post on Zulip osa1 (Jan 20 2025 at 21:47):

Now my problem with this system is that it seems a bit counterintuitive. I implemented it, wrote tests, wrote a blog post about it, but I still find cases that I expect to work but doesn't, then debug it for a few hours, only to realize that it's working as expected and my program should be tweaked. Users will have the same experience, or worse depending on how well they understand rows.

view this post on Zulip Ayaz Hafiz (Jan 21 2025 at 01:33):

yeah

view this post on Zulip Ayaz Hafiz (Jan 21 2025 at 01:33):

this is why unification is not ideal for this

view this post on Zulip Ayaz Hafiz (Jan 21 2025 at 01:34):

my general take is that subtyping is almost always superior

view this post on Zulip Ayaz Hafiz (Jan 21 2025 at 01:34):

especially in the preference of inference

view this post on Zulip Ayaz Hafiz (Jan 21 2025 at 01:34):

but it's hard

view this post on Zulip Ayaz Hafiz (Jan 21 2025 at 01:35):

undecidability of semi-unification etc etc (but possible in a restricted context)

view this post on Zulip osa1 (Jan 21 2025 at 07:48):

If you impolement this in Roc are you going to use rows or subtyping?

I don't want to introduce subtyping in my system because (1) I have no experience in implementing subtyping (2) I also don't have any experience in using subtyping outside of OOP langs so I can't predict the interactions it's going to have with the other features.

For now, I'm thinking maybe I can adjust the user type signatures under the hood to make this work like subtyping. For example, if user has [A, ..r] in argument position, maybe I can replace it with [A, B, ..r] somehow to make the types align.

view this post on Zulip osa1 (Jan 21 2025 at 08:18):

So far I found two things that users should be aware of and understand:

view this post on Zulip Ayaz Hafiz (Jan 21 2025 at 14:24):

probably rows but i do think they are the imperfect model for this, for example exactly because the rows leak into userspace and are kind of confusing to users (eg ocaml's [> ...] vs [< ...])

view this post on Zulip Ayaz Hafiz (Jan 21 2025 at 14:29):

Sorry, i think i missed something - is there a reason this doesn't work? Or are there other patterns you are referring to that don't work with your model

test(x: [A, ..r]): [B, ..r]
  match x:
    ~A: ~B
    other: other

view this post on Zulip osa1 (Jan 21 2025 at 14:29):

Ayaz Hafiz said:

probably rows but i do think they are the imperfect model for this, for example exactly because the rows leak into userspace and are kind of confusing to users (eg ocaml's [> ...] vs [< ...])

Yeah, but subtyping exposes users to covariance which is also confusing.. I'm also not sure if subtyping based approaches can be as flexible, as upcasting means losing information.

view this post on Zulip Ayaz Hafiz (Jan 21 2025 at 14:30):

yeah i agree

view this post on Zulip Ayaz Hafiz (Jan 21 2025 at 14:31):

all these models are kind of confusing haha. this is me coming from the perspective of ocaml variants and roc tags (esp. polymorphic versions) seeming to be much more confusing than any typescript types in the common case for people i've talked to about it

view this post on Zulip osa1 (Jan 21 2025 at 14:37):

Sorry, i think i missed something - is there a reason this doesn't work?

The problem is the return types of branches don't unify:

These two don't unify because you can't unify r with anything as it's coming from the type signature (I call this "rigid type variable"), so you can't add B to this record.

Am I doing it wrong?

view this post on Zulip Ayaz Hafiz (Jan 21 2025 at 14:40):

ah i see, and you found that instantiation doesn't work

view this post on Zulip Ayaz Hafiz (Jan 21 2025 at 14:40):

yeah that is unfortunate

view this post on Zulip osa1 (Jan 21 2025 at 14:54):

Ayaz Hafiz said:

ah i see, and you found that instantiation doesn't work

To be clear it works, but it infers [A, B, ..r] -> [B, ..r] instead of [A, ..r] -> [B, ..r]. Which is fine (I think the types are identical in terms of how you would use them), but intuitively not what one might expect.

How the type checking would go with instantiation:

So the type before generalization becomes [A, B, ..x3] -> [B, ..x3].

view this post on Zulip Ayaz Hafiz (Jan 21 2025 at 15:41):

yeah that's tough


Last updated: Jul 06 2025 at 12:14 UTC