Learning about the `Subs` type. · compiler development

so for example here, whenever we ask "hey what's the type of x?" we don't want to jump through all those hops through y and z to get to "oh, x has the same type as blah"

Richard Feldman (Mar 10 2024 at 14:18):

so instead we compress it in the database to just say x has the same type as blah - like that's what we symlink x directly to

Richard Feldman (Mar 10 2024 at 14:18):

this happens during type inference

Richard Feldman (Mar 10 2024 at 14:19):

we make sure that we never have more than 1 "symlink hop"

Richard Feldman (Mar 10 2024 at 14:19):

this keeps type inference fast

Miilyn (Mar 10 2024 at 14:19):

Yes, exactly. So it's also considered a crucial part of Canonicalization, right?

Richard Feldman (Mar 10 2024 at 14:19):

canonicalization happens before all of this

Richard Feldman (Mar 10 2024 at 14:20):

canonicalization basically goes around and creates a new Variable (aka TypeId) for everything we're going to do type inference on (e.g. every expression, every pattern, etc.)

Richard Feldman (Mar 10 2024 at 14:20):

but it doesn't actually do anything with them yet, it just sort of hands out the IDs

Richard Feldman (Mar 10 2024 at 14:21):

then after canonicalization, we have constraint generation, which generates a Constraint data structure which describes how each of the Variables relates to each other (e.g. "x has the same type as y" is a constraint, or maybe "foo is a Str" because we had foo = "..." and we know that string literals have the type Str)

Miilyn (Mar 10 2024 at 14:21):

Richard Feldman said:

so instead we compress it in the database to just say x has the same type as blah - like that's what we symlink x directly to

Is that "deduplication" of types related to what you and Ayaz were talking about on the podcast in regards to "lambda set defunctionalization" ?

Richard Feldman (Mar 10 2024 at 14:22):

and then finally we have the "solving" step, where we go through and take the initial Subs (which just has a bunch of uninitialized Variables in it) and iterate through all the constraints to apply them to the Subs

Richard Feldman (Mar 10 2024 at 14:22):

such that at the end of solving, we have a complete Subs type database and can ask it "hey what's the type of this Variable I have here?" and it can always answer that question correctly

Richard Feldman (Mar 10 2024 at 14:23):

Miilyn said:

Richard Feldman said:

so instead we compress it in the database to just say x has the same type as blah - like that's what we symlink x directly to

Is that "deduplication" of types related to what you and Ayaz were talking about on the podcast in regards to "lambda set defunctionalization" ?

not really - it's called a union-find data structure, and type checkers that use Hindley-Milner type inference (like Roc, Elm, Haskell, etc. do) basically all use it

Richard Feldman (Mar 10 2024 at 14:23):

regardless of whether they have lambda sets

Miilyn (Mar 10 2024 at 14:28):

Ok, I will have to digest and do further research from here.
It's a wonderful starting point, Im super grateful. Thank you for generously taking the time to explain.

Richard Feldman (Mar 10 2024 at 14:29):

happy to help! :grinning:

Isaac Van Doren (Mar 10 2024 at 17:59):

I learned a lot from this thread :smiley:, thanks!

Luke Boswell (Mar 10 2024 at 20:09):

Me too

Luke Boswell (Mar 10 2024 at 20:10):

I think Ayaz started on a longer guide for compiler internals https://github.com/roc-lang/roc/blob/main/crates/compiler/DESIGN.md

Luke Boswell (Mar 10 2024 at 20:13):

It would be good to flesh that out if anyone is doing research and learns something. Im sure even a few dot points would be helpful for the next person.

Ayaz Hafiz (Mar 16 2024 at 01:08):

That doc could definitely use some love, but also feel free to ask any questions here anytime!

Last updated: Aug 17 2025 at 12:14 UTC