Stream: compiler development

Topic: Allowing implicit cycles in the type representation


view this post on Zulip Ayaz Hafiz (Nov 12 2023 at 18:29):

I don't know if this would fix some of our woes with recursive types or not, but I've been thinking about our representation, which tries to convert implicit type cycles into explicit type cycles (i.e. a transformation from equirecursive to isorecursive types, in the literature's terms). Dealing with isorecursive types when printing types and generating code is super useful, but I also wonder whether most of our problems stem from that transformation

view this post on Zulip Ayaz Hafiz (Nov 12 2023 at 18:31):

What we could do instead is go all-in on the implicit cyclic representation. So, we would get rid of any explicit occurs checks, and always unify types when we begin to compare them (this is safe because our checker is non-backtracking), rather than after the point where we determine that they are equivalent (as we currently do)
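
To make that concrete, here is a minimal sketch of the union-before-descend idea in Rust. This is not the actual compiler's Subs/Content API, just invented stand-in types; the point is that unioning the two variables before comparing their structures makes cycles bottom out at the `ra == rb` check, with no occurs check anywhere. It is only safe because unification never backtracks.

use std::collections::HashMap;

type Var = usize;

#[derive(Clone)]
enum Content {
    Flex,                            // unbound type variable
    Tags(HashMap<String, Vec<Var>>), // tag union, e.g. [Nil, Cons a b]
}

struct Subs {
    parent: Vec<Var>,      // union-find forest over type variables
    content: Vec<Content>, // content per variable (read at the root)
}

impl Subs {
    fn fresh(&mut self, c: Content) -> Var {
        self.parent.push(self.parent.len());
        self.content.push(c);
        self.parent.len() - 1
    }

    fn find(&mut self, v: Var) -> Var {
        if self.parent[v] != v {
            let root = self.find(self.parent[v]);
            self.parent[v] = root; // path compression
        }
        self.parent[v]
    }

    fn unify(&mut self, a: Var, b: Var) -> Result<(), String> {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return Ok(()); // already the same node: cycles terminate here
        }
        let (ca, cb) = (self.content[ra].clone(), self.content[rb].clone());
        self.parent[ra] = rb; // union FIRST, before descending into structure
        match (ca, cb) {
            (Content::Flex, c) | (c, Content::Flex) => {
                self.content[rb] = c;
                Ok(())
            }
            (Content::Tags(ta), Content::Tags(tb)) => {
                let mut merged = tb;
                for (tag, args_a) in ta {
                    if let Some(args_b) = merged.get(&tag).cloned() {
                        if args_a.len() != args_b.len() {
                            return Err(format!("arity mismatch for {tag}"));
                        }
                        for (x, y) in args_a.into_iter().zip(args_b) {
                            self.unify(x, y)?;
                        }
                    } else {
                        merged.insert(tag, args_a);
                    }
                }
                let root = self.find(rb);
                self.content[root] = Content::Tags(merged);
                Ok(())
            }
        }
    }
}

fn main() {
    let mut subs = Subs { parent: vec![], content: vec![] };
    // l = [Nil, Cons a l] as an implicit cycle: no explicit Rec marker anywhere.
    let a = subs.fresh(Content::Flex);
    let l = subs.fresh(Content::Flex);
    let t = subs.fresh(Content::Tags(HashMap::from([
        ("Nil".to_string(), vec![]),
        ("Cons".to_string(), vec![a, l]),
    ])));
    subs.unify(l, t).unwrap();
    // A second, structurally identical cyclic type.
    let a2 = subs.fresh(Content::Flex);
    let l2 = subs.fresh(Content::Flex);
    let t2 = subs.fresh(Content::Tags(HashMap::from([
        ("Nil".to_string(), vec![]),
        ("Cons".to_string(), vec![a2, l2]),
    ])));
    subs.unify(l2, t2).unwrap();
    // Tree-based structural unification would recurse forever here; unioning
    // first makes the inner l ~ l2 comparison hit `ra == rb` and stop.
    subs.unify(l, l2).unwrap();
    println!("unified two cyclic types without an occurs check");
}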

view this post on Zulip Ayaz Hafiz (Nov 12 2023 at 18:31):

Printing types becomes more annoying, but right now if we miss a recursion point the printer stack overflows anyway

view this post on Zulip Ayaz Hafiz (Nov 12 2023 at 18:33):

Code-generation of layouts also becomes more difficult, but the layout-interner operates on type variables anyway, so it should automatically cache layouts of the implicit recursion points (though, at that point, we would explicitly need to insert a recursion pointer in the memory layout, so this is a bit tricky)

view this post on Zulip Ayaz Hafiz (Nov 12 2023 at 18:33):

If this works, https://roc.zulipchat.com/#narrow/stream/395097-compiler-development/topic/Recursive.20Types.20with.20List would also trivially work

view this post on Zulip Ayaz Hafiz (Nov 12 2023 at 18:34):

Might be an interesting thing to try at some point

view this post on Zulip Richard Feldman (Nov 12 2023 at 19:08):

wouldn't that also speed up type checking if it worked?

view this post on Zulip Ayaz Hafiz (Nov 12 2023 at 21:22):

possibly yeah

view this post on Zulip Richard Feldman (Nov 12 2023 at 21:24):

yeah this sounds really interesting

view this post on Zulip Richard Feldman (Nov 12 2023 at 21:43):

if nothing else, it sounds like a really nice simplification of the type checker

view this post on Zulip Richard Feldman (Nov 12 2023 at 21:45):

and printing types would get more complex, but that's not a troublesome area of the code base anyway...seems like a great trade to make that part of the code base more complex in order to make the really tricky part simpler

view this post on Zulip Ayaz Hafiz (Nov 12 2023 at 22:04):

I don't actually know why there's a recursive type representation/occurs check in the current impl. Is that just a holdover from the initial Elm port?

view this post on Zulip Richard Feldman (Nov 12 2023 at 22:07):

that's definitely where we originally got it! :big_smile:

view this post on Zulip Richard Feldman (Nov 13 2023 at 01:14):

honestly this sounds to me like a better thing to try next than boxing closures

view this post on Zulip Richard Feldman (Nov 13 2023 at 01:15):

because it might mean we don't end up needing to do that after all, but even if not—if it works, I suspect it'll improve both the code base and performance anyway, so we'll be glad to have it and can then proceed boxing closures anyway

view this post on Zulip Ayaz Hafiz (Nov 13 2023 at 03:03):

well, I don't actually know if this works, it's just a thought. And we have a few other problems with lambda sets that are unrelated to recursive types specifically. So I'm not sure - if someone has the opportunity to spike it out, definitely useful though

view this post on Zulip Ayaz Hafiz (Nov 21 2023 at 05:09):

I really want to figure out how to make this work. Here's a request for help. Suppose we have the following function:

mapper = \l, f -> when l is
  Nil -> Nil
  Cons x xs -> Cons (f x) (mapper xs f)

It's clear that the inferred type for l is <t> = [Nil, Cons a <t>]. Now, the question is, can we find an algorithm that can efficiently "see" that <t> is recursive during or right after inference?

As it is, the unification would proceed as follows:

l = tvar_0 (tvar = type variable)
l ~ Nil = [Nil]a
l ~ Cons x xs = [Nil, Cons a b]c; xs has type c
mapper xs f => typeof<xs> = typeof<l> => c = typeof<l>
=> typeof<l> = [Nil, Cons a b]<typeof l>

But we don't yet explicitly see that <typeof l> is recursive in itself. The obvious solution is to perform an occurs check at this point, but this is not efficient.

Can we do this efficiently?

view this post on Zulip Richard Feldman (Nov 21 2023 at 05:15):

some ideas from GPT-4:

1. Structure Sharing in Unification: By employing structure sharing in the unification algorithm, you can avoid redundant traversals of type structures. When two type variables unify, they can share the same structure, reducing the need for repeated checks.
2. Efficient Occurs Check: Implementing an efficient occurs check can significantly improve performance. This could involve optimizations such as:
   - Path Compression: During unification, flatten the structure of type variables to minimize the depth of the tree.
   - Tail Recursion Optimization: Make the occurs check tail-recursive if possible, to reduce stack usage.
3. Incremental Unification: Instead of performing a full unification pass at every step, you can incrementally update the type information as you traverse the structure. This reduces the overall computational load.
4. Caching Results: Cache intermediate results during unification and type checking. If you encounter a type structure you've already processed, you can reuse the cached results instead of recomputing.
5. Detecting Recursion Early: Analyze the structure of the function and its parameters to identify potential recursion early in the process. For instance, in your example, the recursive call to mapper within itself is a strong hint of recursive types.
6. Graph-based Representations: Representing types as graphs rather than as tree structures can make it easier to detect cycles (indicative of recursion) efficiently.
7. Lazy Evaluation: In some cases, using lazy evaluation strategies in your type inference algorithm can defer processing until necessary, reducing immediate computational overhead.

view this post on Zulip Richard Feldman (Nov 21 2023 at 05:16):

"Efficient Occurs Check" - thanks, just make it efficient, good call :stuck_out_tongue:

view this post on Zulip Richard Feldman (Nov 21 2023 at 05:19):

hm, here's an idea: can we potentially benefit from knowing that there's a restricted set of places where recursion can validly occur?

view this post on Zulip Richard Feldman (Nov 21 2023 at 05:20):

for example, what if we just did an occurs check on the ext var of the tag union at the end there?

view this post on Zulip Richard Feldman (Nov 21 2023 at 05:20):

eh I guess it could just as easily be in one of the tags

view this post on Zulip Ayaz Hafiz (Nov 21 2023 at 05:21):

yeah that's what we do today

view this post on Zulip Ayaz Hafiz (Nov 21 2023 at 05:22):

I'm just trying to figure out if there's a better way

view this post on Zulip Ayaz Hafiz (Nov 21 2023 at 05:22):

I realized we have to have at least a flag that tells us if a type is recursive, during unification. otherwise there's just no way to reliably generate code that needs to be boxed in certain places

view this post on Zulip Ayaz Hafiz (Nov 21 2023 at 05:23):

but figuring out where to place the flag is super hard

view this post on Zulip Richard Feldman (Nov 21 2023 at 05:23):

like when to try to detect it you mean?

view this post on Zulip Ayaz Hafiz (Nov 21 2023 at 05:25):

yes

view this post on Zulip Richard Feldman (Nov 21 2023 at 05:38):

not a direct answer, but I could ask Stephanie Weirich if she could introduce us to someone who might have more experience with the problem (e.g. OCaml devs)

view this post on Zulip Richard Feldman (Nov 22 2023 at 02:05):

ok I want to try to lay out some thoughts on this one, maybe it will help

view this post on Zulip Richard Feldman (Nov 22 2023 at 02:06):

the goal is to have some way of saying (I guess on Content?) whether a given type is known to be recursive, and we want to set that during unification, such that by the time solving is done we just have that information going into mono

view this post on Zulip Richard Feldman (Nov 22 2023 at 02:07):

the occurs strategy would be to, at certain points, walk through all the different variables in the type and see if any of them refer to the original type itself, and then recursively do that for all the variables until we either run out of things to recurse on or detect that the original type refers to itself.

this works, but is slow.

view this post on Zulip Richard Feldman (Nov 22 2023 at 02:10):

another idea would be to track the set of all the variables each variable references, including transitively, as we go

view this post on Zulip Richard Feldman (Nov 22 2023 at 02:11):

but that also sounds slow (maybe even slower)

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 02:16):

yeah

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 02:20):

I think it's helpful to work out of this example, just based on first principles.

mapper = \l, f -> when l is
  Nil -> Nil
  Cons x xs -> Cons (f x) (mapper xs f)

In more detail, let's consider what it takes to infer the type of l here.

1. `l` is introduced. Give it some type variable `v1`.
2. We encounter `Nil` and `Cons x xs` matched against `l`. Suppose typeof<Nil> = [Nil] and typeof<Cons x xs> = [Cons v2 v3]. Then we set v1 = [Nil, Cons v2 v3]
3. We encounter `mapper xs f`. This induces the constraint `v3 = v1`. So now we have v1 = [Nil, Cons v2 v1].

This type is correct, but what we'd like is to note at that point that v1 is recursive in itself, and to store that fact somewhere (e.g. in a lookaside table)
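
As a minimal sketch of that last step (hypothetical Rust, with invented names like Desc and occurs, not the real compiler's representation): when the constraint that closes the cycle is applied (v3 := v1), a targeted occurs check from v1 fires and v1 gets recorded in the lookaside table. The occurs walk itself is still the cost this thread is trying to avoid or amortize.

use std::collections::{HashMap, HashSet};

type Var = usize;

enum Desc {
    Flex,                // unbound
    Link(Var),           // bound to another variable (the v3 := v1 step)
    Tags(Vec<Vec<Var>>), // tag union payload variables, e.g. [Nil, Cons v2 v3]
}

fn resolve(subs: &HashMap<Var, Desc>, mut v: Var) -> Var {
    while let Some(Desc::Link(next)) = subs.get(&v) {
        v = *next;
    }
    v
}

// Does `root` occur inside its own structure?
fn occurs(subs: &HashMap<Var, Desc>, root: Var, current: Var, seen: &mut HashSet<Var>) -> bool {
    let current = resolve(subs, current);
    if !seen.insert(current) {
        return false; // already visited on this walk
    }
    if let Some(Desc::Tags(tags)) = subs.get(&current) {
        for child in tags.iter().flatten() {
            if resolve(subs, *child) == resolve(subs, root) || occurs(subs, root, *child, seen) {
                return true;
            }
        }
    }
    false
}

fn main() {
    // v1 = [Nil, Cons v2 v3]; then the call `mapper xs f` forces v3 := v1.
    let (v1, v2, v3) = (1, 2, 3);
    let mut subs: HashMap<Var, Desc> = HashMap::new();
    subs.insert(v1, Desc::Tags(vec![vec![], vec![v2, v3]]));
    subs.insert(v2, Desc::Flex);
    subs.insert(v3, Desc::Link(v1)); // the step that closes the cycle

    // Lookaside table: variables known to be recursive in themselves.
    let mut recursive: HashSet<Var> = HashSet::new();
    if occurs(&subs, v1, v1, &mut HashSet::new()) {
        recursive.insert(v1);
    }
    println!("v1 recursive? {}", recursive.contains(&v1)); // true
}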

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 02:21):

For what it's worth,

the occurs strategy would be to, at certain points, walk through all the different variables in the type and see if any of them refer to the original type itself, and then recursively do that for all the variables until we either run out of things to recurse on or detect that the original type refers to itself.

is what we currently do, but only for tag unions. And it's half baked. So we could add it to more places, but I'm not sure that scales. For example, we'd also need it to be able to infer something like X : (Str, List X) as recursive properly. And I'm sure there are more cases like this.

view this post on Zulip Richard Feldman (Nov 22 2023 at 02:22):

we currently do that during unification?

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 02:22):

yes

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 02:23):

for tags only

view this post on Zulip Richard Feldman (Nov 22 2023 at 02:27):

Ayaz Hafiz said:

I realized we have to have at least a flag that tells us if a type is recursive, during unification. otherwise there's just no way to reliably generate code that needs to be boxed in certain places

just to check my understanding: does this mean the original idea at the top of this thread can't work?

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 02:34):

for Roc's compilation strategy, i don't think it can

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 02:35):

although I just had an idea. maybe it only matters to mark the recursion points during specialization (monomorphization)

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 02:35):

since at that point you must traverse the full types anyway
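
Roughly, in hypothetical Rust (invented names, not the real layout interner): keep a stack of variables whose layout is currently being built during that traversal; re-entering one of them is exactly a recursion point, which becomes a boxed recursion pointer in the layout.

use std::collections::HashMap;

type Var = usize;

enum Type {
    Int,
    Tags(Vec<(String, Vec<Var>)>), // tag union with payload variables
}

#[derive(Debug)]
enum Layout {
    Int,
    Union(Vec<Vec<Layout>>),
    RecursionPointer(Var), // boxed back-edge to an enclosing union
}

fn layout_of(types: &HashMap<Var, Type>, var: Var, in_progress: &mut Vec<Var>) -> Layout {
    if in_progress.contains(&var) {
        // Re-entered a variable we are still laying out: that's a recursion point.
        return Layout::RecursionPointer(var);
    }
    match &types[&var] {
        Type::Int => Layout::Int,
        Type::Tags(tags) => {
            in_progress.push(var);
            let mut union = Vec::new();
            for (_, payload) in tags {
                let mut fields = Vec::new();
                for p in payload {
                    fields.push(layout_of(types, *p, in_progress));
                }
                union.push(fields);
            }
            in_progress.pop();
            Layout::Union(union)
        }
    }
}

fn main() {
    // v1 = [Nil, Cons v2 v1] with v2 = Int: the mapper example, already cyclic.
    let mut types = HashMap::new();
    types.insert(1, Type::Tags(vec![
        ("Nil".to_string(), vec![]),
        ("Cons".to_string(), vec![2, 1]),
    ]));
    types.insert(2, Type::Int);

    let layout = layout_of(&types, 1, &mut Vec::new());
    println!("{layout:?}"); // the Cons payload ends in RecursionPointer(1)
}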

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 02:36):

the problem is you can end up with duplicate specializations, potentially exponentially many

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 02:36):

but maybe that's not a problem in practice?

view this post on Zulip Richard Feldman (Nov 22 2023 at 02:37):

what if we actually mutated the subs during specialization when doing that?

view this post on Zulip Richard Feldman (Nov 22 2023 at 02:40):

like we're about to traverse a type, we see that we've never checked whether it's recursive or not, so we check that during the current specialization and then mark it in the subs so future specializations don't need to check again

view this post on Zulip Richard Feldman (Nov 22 2023 at 02:41):

Ayaz Hafiz said:

the problem is you can end up with duplicate specializations, potentially exponentially many

hm, why would checking for recursion during specialization cause more specializations? :thinking:

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 02:53):

Because consider, for example, that we have that mapper function, but with a more complex type [Cons a {x: b}, Nil], and we need to specialize it twice. The first time we specialize it, it's because it's called with a type b = [Cons a {x : b}, Nil]. But the second time, it's because it's called with a type whose recursion point was determined to be in a different place, like c = [Cons a (c = {x: b}), Nil]. Since the specialization of mapper is determined by the caller, and the callers can be disjoint from each other, we can't adjust this.

This is the advantage of finding the recursion points up-front. If we found the recursion point in the generalized mapper definition, then each specialization would also inherit that recursion point during unification.

view this post on Zulip Richard Feldman (Nov 22 2023 at 02:53):

also, don't we need to check for this during unification for error reporting?

e.g. X : { y : X } needs to report an error at compile time, and I'm not sure how we could know to do that without doing the work of answering the question of whether it's a (valid) recursive type

view this post on Zulip Richard Feldman (Nov 22 2023 at 02:54):

I guess that's just limited to type alias declarations though

view this post on Zulip Richard Feldman (Nov 22 2023 at 02:54):

and opaque type definitions

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 02:55):

yeah. type definitions are the easy version, cuz we can always add the recursion marker at that time

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 02:55):

nominal types make it easy

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 02:55):

nominal even in the sense of aliases. "Explicitly typed"

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 02:56):

because they tell you where the recursion should happen (and hence where you should compile it as boxed)

view this post on Zulip Richard Feldman (Nov 22 2023 at 03:00):

so going back to your example of calling with two specializations - what I was suggesting earlier is that the first specialization would essentially do what the occurs check during unification would do

view this post on Zulip Richard Feldman (Nov 22 2023 at 03:01):

such that by the end of the specialization we'd have a generalized answer

view this post on Zulip Richard Feldman (Nov 22 2023 at 03:02):

hm but I guess that doesn't work because the specialization wouldn't have access to the original variables during specialization :thinking:

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 03:03):

What if the first specialization makes it so that [Cons a {x: b}, Nil] is specialized to [Cons Int {x: self}, Nil]? You need to roll the changes back, and now you've lost the markers, right?

view this post on Zulip Richard Feldman (Nov 22 2023 at 21:43):

I wonder how OCaml does this

view this post on Zulip Richard Feldman (Nov 22 2023 at 21:44):

because no matter how often people use polymorphic variants, having them in the language at all means you have to do this analysis

view this post on Zulip Richard Feldman (Nov 22 2023 at 21:44):

and OCaml's compiler has a good reputation for performance, so either occurs is not that expensive in practice (although that would surprise me) or else maybe they found something else that works

view this post on Zulip Elias Mulhall (Nov 22 2023 at 22:39):

FWIW I think polymorphic variants are usually used pretty sparingly in OCaml code. The general wisdom is that polymorphic variants produce less efficient compiled code. I haven't heard anything about impacting compile times, but there might be a similar philosophy where you're trading speed for flexibility.

view this post on Zulip Richard Feldman (Nov 22 2023 at 22:57):

yeah the thing is that just because they exist in the language at all, this analysis has to happen

view this post on Zulip Richard Feldman (Nov 22 2023 at 22:58):

how often they're used only affects how often the analysis comes back with the answer being yes vs no, but it's the same amount of work to get the answer either way

view this post on Zulip Richard Feldman (Nov 22 2023 at 23:00):

I asked how OCaml does this and got some answers: https://twitter.com/rtfeldman/status/1727443546897252521

view this post on Zulip Richard Feldman (Nov 22 2023 at 23:01):

this one is interesting, but if it's correct, it's not something we could do: https://twitter.com/welltypedwitch/status/1727449171207074289

view this post on Zulip Richard Feldman (Nov 22 2023 at 23:01):

it didn't occur to me that OCaml could get away with not having that info once type checking is done, but maybe they can?

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 23:24):

I'm pretty sure this is a non-problem for OCaml. Inference of recursive types is not really the issue (that part is easy, just eagerly unify all types you compared). Figuring out where recursive types "are recursive" is the problem, and OCaml doesn't have that issue because it boxes ~everything (so codegen doesn't need to care about where the point of recursion happens). At least that's my understanding

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 23:24):

Put another way, the problem for us is that during code generation you must translate a structural type to an equivalent nominal type.

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 23:25):

unrelated, but fwiw OCaml doesn't support recursive types of the form X : (X, List X) in general. See https://caml.inria.fr/pub/docs/oreilly-book/html/book-ora209.html

view this post on Zulip Ayaz Hafiz (Nov 22 2023 at 23:27):

Richard Feldman said:

this one is interesting, but if it's correct, it's not something we could do: https://twitter.com/welltypedwitch/status/1727449171207074289

I'm pretty sure this description is https://roc.zulipchat.com/#narrow/stream/395097-compiler-development/topic/Allowing.20implicit.20cycles.20in.20the.20type.20representation/near/401632093. I think by "reference" they mean "are these two type variables equivalent", which is the standard way to infer recursive types

view this post on Zulip Richard Feldman (Nov 23 2023 at 00:25):

Ayaz Hafiz said:

unrelated, but fwiw OCaml doesn't support recursive types of the form X : (X, List X) in general. See https://caml.inria.fr/pub/docs/oreilly-book/html/book-ora209.html

just checking, but was this meant to be X : (Str, List X) or similar? I don't think X : (X, ...) is possible in general :big_smile:

view this post on Zulip Ayaz Hafiz (Nov 23 2023 at 01:24):

Right, thanks for the clarification

view this post on Zulip Richard Feldman (Nov 23 2023 at 11:23):

I wonder if it's possible somehow to make occurs cheap :thinking:

view this post on Zulip Richard Feldman (Nov 23 2023 at 11:23):

like by trying to arrange it such that everything we're checking is already in cache

view this post on Zulip Richard Feldman (Nov 23 2023 at 15:17):

here's an idea: what if during unification we record in Content whether a type is concrete? That should be super cheap.

Then we do an occurs check except it short-circuits on concrete types because we know they're either not recursive or else they're nominal and therefore we already know where they recurse
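
A small sketch of that short-circuit, in hypothetical Rust with invented names (Entry, concrete): a per-variable concrete flag set during unification lets the occurs walk stop at subtrees that cannot close a cycle back to the root.

use std::collections::{HashMap, HashSet};

type Var = usize;

struct Entry {
    children: Vec<Var>, // payload/field variables of this type node
    concrete: bool,     // set during unification; true => nothing to check below
}

fn occurs(subs: &HashMap<Var, Entry>, root: Var, current: Var, seen: &mut HashSet<Var>) -> bool {
    if !seen.insert(current) {
        return false;
    }
    let entry = &subs[&current];
    if current != root && entry.concrete {
        return false; // short-circuit: a concrete subtree cannot reach `root`
    }
    for &child in &entry.children {
        if child == root || occurs(subs, root, child, seen) {
            return true;
        }
    }
    false
}

fn main() {
    // root = [Nil, Cons <concrete Int> root]: the concrete child is skipped,
    // but the back-edge to `root` is still found.
    let subs = HashMap::from([
        (1, Entry { children: vec![2, 1], concrete: false }),
        (2, Entry { children: vec![], concrete: true }),
    ]);
    println!("{}", occurs(&subs, 1, 1, &mut HashSet::new())); // true
}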

view this post on Zulip Richard Feldman (Nov 23 2023 at 15:18):

maybe that would cheaply reduce the number of occurs checks enough to be fast in practice?

view this post on Zulip Ayaz Hafiz (Nov 23 2023 at 15:19):

v1 is concrete in https://roc.zulipchat.com/#narrow/stream/395097-compiler-development/topic/Allowing.20implicit.20cycles.20in.20the.20type.20representation/near/403501113. How do you detect the recursion point there?

view this post on Zulip Richard Feldman (Nov 23 2023 at 15:22):

ugh I'm on mobile and the link doesn't work - what's something I can search for to see where that is? :sweat_smile:

view this post on Zulip Ayaz Hafiz (Nov 23 2023 at 15:26):

My message from Nov 21 in this thread that starts with "I think it's helpful to work out of this example, just based on first principles."

view this post on Zulip Richard Feldman (Nov 23 2023 at 15:32):

hm, yeah fair point :thinking:

view this post on Zulip Richard Feldman (Nov 23 2023 at 15:53):

so I guess part of the challenge is that the last unification, v1 = v3, is isolated and doesn't "realize" the implication that it just created a loop

view this post on Zulip Richard Feldman (Nov 23 2023 at 15:55):

I wonder if there's some (typically small) set of variables we could keep track of that are "potentially recursive" and then use SIMD to check them 4 at a time every time we resolve an equals constraint?

view this post on Zulip Richard Feldman (Nov 23 2023 at 15:56):

but maybe everything is potentially recursive as far as we can tell

view this post on Zulip Richard Feldman (Nov 23 2023 at 16:14):

so I guess one of the questions is "when to check" right?

view this post on Zulip Richard Feldman (Nov 23 2023 at 16:15):

(I realize I'm catching up on a lot of questions you've probably already thought of, appreciate the patience :laughing:)

view this post on Zulip Richard Feldman (Nov 23 2023 at 16:19):

it seems like if we tried to do a full check after resolving every eq constraint, that would be way worse than if we just did one pass at the end, before mono

view this post on Zulip Richard Feldman (Nov 23 2023 at 16:20):

although I guess that depends a lot on cache behavior

view this post on Zulip Ayaz Hafiz (Nov 23 2023 at 17:08):

Richard Feldman said:

so I guess one of the questions is "when to check" right?

yeah

view this post on Zulip Richard Feldman (Nov 24 2023 at 13:26):

Ayaz Hafiz said:

although I just had an idea. maybe it only matters to mark the recursion points during specialization (monomorphization)

so what if we started with this, and then whenever we detect a recursion point during specialization, we go look up the name of the function we're specializing, look up its unspecialized type and canonical IR, and then check that whole function for recursion points, writing them down in subs.

That way, if any future specializations run into this, they will all use the same one.
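
A sketch of the caching part in hypothetical Rust (all names invented, and the analysis itself is a stand-in): the first specialization that trips over a recursion point pays for one full analysis of the function's generalized type, stores the result keyed by the function symbol, and later specializations just reuse it.

use std::collections::{HashMap, HashSet};

type Var = usize;
type Symbol = &'static str;

struct Subs {
    // Cache: recursion points already computed per function, stored once.
    recursion_points: HashMap<Symbol, HashSet<Var>>,
}

// Stand-in for the real work: walk the generalized type/IR of `func` and
// return every variable at which it recurses into itself.
fn analyze_recursion_points(_func: Symbol) -> HashSet<Var> {
    HashSet::from([1]) // pretend variable 1 is the single recursion point
}

fn recursion_points_for<'a>(subs: &'a mut Subs, func: Symbol) -> &'a HashSet<Var> {
    subs.recursion_points
        .entry(func)
        .or_insert_with(|| analyze_recursion_points(func))
}

fn main() {
    let mut subs = Subs { recursion_points: HashMap::new() };
    // First specialization: does the full pass and caches the answer.
    let first = recursion_points_for(&mut subs, "mapper").clone();
    // Second specialization: pure cache hit, no re-analysis.
    let second = recursion_points_for(&mut subs, "mapper").clone();
    assert_eq!(first, second);
    println!("recursion points for mapper: {first:?}");
}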

view this post on Zulip Richard Feldman (Nov 24 2023 at 13:27):

the theory would be that this wouldn't come up often in practice, so the extra work (only 1 extra pass per recursion point, not N of them) would be fine

view this post on Zulip Ayaz Hafiz (Nov 24 2023 at 22:54):

whenever we detect a recursion point during specialization, we go look up the name of the function we're specializing, look up its unspecialized type and canonical IR, and then check that whole function for recursion points, writing them down in subs.

I'm not sure that this gets us much. Like, if there are specializations that force the recursion point to occur in a different place, then we still require another specialization - and now we may have two places in the type that we need to Box, instead of just one

view this post on Zulip Ayaz Hafiz (Nov 24 2023 at 22:57):

But I agree we should do this. The only downside of more specializations is more code to generate, and you can bound the number of specializations by using an explicit type annotation

