Im somewhat embarrassed to ask but could someone please give me a short, informal explanation
on the concept and role of Subs ?
This topic was moved here from #beginners > Learning about the Subs
type. by Richard Feldman.
nothing to be embarrassed about, it's a great question! :smiley:
Subs
is short for "substitutions"
I like to think of it as a type information database
each entry into it is a Variable
(which could have also been called TypeId
if we were thinking of it as a database)
and then that Variable
maps to a Content
(essentially the "value" part of the key-value pair in the database)
a very important part of it is that values can be "symlinks"
in other words, "this type maps to whatever this other type is"
so if you have like
x = y
y = z
z = blah
then we know that:
x
is the same as whatever the type of y
isy
is the same as whatever the type of z
isz
is the same as whatever the type of blah
isbut the symlinks get collapsed automatically
so for example here, whenever we ask "hey what's the type of x
?" we don't want to jump through all those hops through y
and z
to get to "oh, x
has the same type as blah
"
so instead we compress it in the database to just say x
has the same type as blah
- like that's what we symlink x
directly to
this happens during type inference
we make sure that we never have more than 1 "symlink hop"
this keeps type inference fast
Yes, exactly. So it's also considered a crucial part of Canonicalization, right?
canonicalization happens before all of this
canonicalization basically goes around and creates a new Variable
(aka TypeId
) for everything we're going to do type inference on (e.g. every expression, every pattern, etc.)
but it doesn't actually do anything with them yet, it just sort of hands out the IDs
then after canonicalization, we have constraint generation, which generates a Constraint
data structure which describes how each of the Variable
s relates to each other (e.g. "x has the same type as y" is a constraint, or maybe "foo is a Str
" because we had foo = "..."
and we know that string literals have the type Str
)
Richard Feldman said:
so instead we compress it in the database to just say
x
has the same type asblah
- like that's what we symlinkx
directly to
Is that "deduplication" of types related to what you and Ayaz were talking about on the podcast in regards to "lambda set defunctionalization" ?
and then finally we have the "solving" step, where we go through and take the initial Subs
(which just has a bunch of uninitialized Variable
s in it) and iterate through all the constraints to apply them to the Subs
such that at the end of solving, we have a complete Subs
type database and can ask it "hey what's the type of this Variable
I have here?" and it can always answer that question correctly
Miilyn said:
Richard Feldman said:
so instead we compress it in the database to just say
x
has the same type asblah
- like that's what we symlinkx
directly toIs that "deduplication" of types related to what you and Ayaz were talking about on the podcast in regards to "lambda set defunctionalization" ?
not really - it's called a union-find data structure, and type checkers that use Hindley-Milner type inference (like Roc, Elm, Haskell, etc. do) basically all use it
regardless of whether they have lambda sets
Ok, I will have to digest and do further research from here.
It's a wonderful starting point, Im super grateful. Thank you for generously taking the time to explain.
happy to help! :grinning:
I learned a lot from this thread :smiley:, thanks!
Me too
I think Ayaz started on a longer guide for compiler internals https://github.com/roc-lang/roc/blob/main/crates/compiler/DESIGN.md
It would be good to flesh that out if anyone is doing research and learns something. Im sure even a few dot points would be helpful for the next person.
That doc could definitely use some love, but also feel free to ask any questions here anytime!
Last updated: Jul 06 2025 at 12:14 UTC