Stream: compiler development

Topic: Learning about the `Subs` type.


view this post on Zulip Miilyn (Mar 10 2024 at 13:49):

Im somewhat embarrassed to ask but could someone please give me a short, informal explanation
on the concept and role of Subs ?

view this post on Zulip Notification Bot (Mar 10 2024 at 14:14):

This topic was moved here from #beginners > Learning about the Subs type. by Richard Feldman.

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:15):

nothing to be embarrassed about, it's a great question! :smiley:

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:15):

Subs is short for "substitutions"

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:15):

I like to think of it as a type information database

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:15):

each entry into it is a Variable (which could have also been called TypeId if we were thinking of it as a database)

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:15):

and then that Variable maps to a Content (essentially the "value" part of the key-value pair in the database)

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:16):

a very important part of it is that values can be "symlinks"

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:16):

in other words, "this type maps to whatever this other type is"

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:16):

so if you have like

x = y
y = z
z = blah

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:17):

then we know that:

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:17):

but the symlinks get collapsed automatically

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:18):

so for example here, whenever we ask "hey what's the type of x?" we don't want to jump through all those hops through y and z to get to "oh, x has the same type as blah"

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:18):

so instead we compress it in the database to just say x has the same type as blah - like that's what we symlink x directly to

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:18):

this happens during type inference

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:19):

we make sure that we never have more than 1 "symlink hop"

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:19):

this keeps type inference fast

view this post on Zulip Miilyn (Mar 10 2024 at 14:19):

Yes, exactly. So it's also considered a crucial part of Canonicalization, right?

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:19):

canonicalization happens before all of this

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:20):

canonicalization basically goes around and creates a new Variable (aka TypeId) for everything we're going to do type inference on (e.g. every expression, every pattern, etc.)

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:20):

but it doesn't actually do anything with them yet, it just sort of hands out the IDs

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:21):

then after canonicalization, we have constraint generation, which generates a Constraint data structure which describes how each of the Variables relates to each other (e.g. "x has the same type as y" is a constraint, or maybe "foo is a Str" because we had foo = "..." and we know that string literals have the type Str)

view this post on Zulip Miilyn (Mar 10 2024 at 14:21):

Richard Feldman said:

so instead we compress it in the database to just say x has the same type as blah - like that's what we symlink x directly to

Is that "deduplication" of types related to what you and Ayaz were talking about on the podcast in regards to "lambda set defunctionalization" ?

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:22):

and then finally we have the "solving" step, where we go through and take the initial Subs (which just has a bunch of uninitialized Variables in it) and iterate through all the constraints to apply them to the Subs

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:22):

such that at the end of solving, we have a complete Subs type database and can ask it "hey what's the type of this Variable I have here?" and it can always answer that question correctly

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:23):

Miilyn said:

Richard Feldman said:

so instead we compress it in the database to just say x has the same type as blah - like that's what we symlink x directly to

Is that "deduplication" of types related to what you and Ayaz were talking about on the podcast in regards to "lambda set defunctionalization" ?

not really - it's called a union-find data structure, and type checkers that use Hindley-Milner type inference (like Roc, Elm, Haskell, etc. do) basically all use it

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:23):

regardless of whether they have lambda sets

view this post on Zulip Miilyn (Mar 10 2024 at 14:28):

Ok, I will have to digest and do further research from here.
It's a wonderful starting point, Im super grateful. Thank you for generously taking the time to explain.

view this post on Zulip Richard Feldman (Mar 10 2024 at 14:29):

happy to help! :grinning:

view this post on Zulip Isaac Van Doren (Mar 10 2024 at 17:59):

I learned a lot from this thread :smiley:, thanks!

view this post on Zulip Luke Boswell (Mar 10 2024 at 20:09):

Me too

view this post on Zulip Luke Boswell (Mar 10 2024 at 20:10):

I think Ayaz started on a longer guide for compiler internals https://github.com/roc-lang/roc/blob/main/crates/compiler/DESIGN.md

view this post on Zulip Luke Boswell (Mar 10 2024 at 20:13):

It would be good to flesh that out if anyone is doing research and learns something. Im sure even a few dot points would be helpful for the next person.

view this post on Zulip Ayaz Hafiz (Mar 16 2024 at 01:08):

That doc could definitely use some love, but also feel free to ask any questions here anytime!


Last updated: Jul 06 2025 at 12:14 UTC