Stream: compiler development

Topic: regions


view this post on Zulip Richard Feldman (Jul 02 2025 at 13:01):

I wanted to lay out a goal - not something we need to pursue right away, necessarily! - but rather a place I'd like us to end up when it comes to Regions

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:02):

this is not about what their numbers represent (e.g. line numbers, indices, whatever) but rather how they're stored

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:02):

this is also starting from Parse AST nodes - tokenization is out of scope (and decoupled from this)

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:03):

so let's say we're parsing some tokens and we end up with this memory layout:

Idx:      0   1   2
AST:     {…} {…} {…}
Regions: {…} {…} {…}

this means we've parsed 3 AST nodes, and we have a Region for each of them

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:03):

if I have an Idx, I can look up the AST node at that index, and also I can look up its Region

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:03):

they're just stored in different arrays

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:04):

now let's say I'm doing canonicalization, and - in this world, although we don't have it today! - we have made it be the case that there is exactly 1 CIR node for each AST node. They are exactly 1-to-1.

Now we've added 3 CIR nodes to the mix, one for each AST node, and memory looks like this:

Idx:      0   1   2
AST:     {…} {…} {…}
CIR:     {…} {…} {…}
Regions: {…} {…} {…}

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:04):

importantly, I didn't have to touch Regions to do this

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:05):

because we made AST and CIR 1-to-1, the entire canonicalization pass was able to just look at AST nodes and make CIR nodes and then in the end we just automatically still have the property that, given an Idx for a CIR node, I can get its Region - even though we didn't have to do any work to copy Region info over or anything

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:05):

now we get to type-checking

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:06):

it's already becoming the case (although we're not there yet) that there is one CIR node per Type Var

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:06):

and if we use the same indexing scheme for those, then we get:

Idx:      0   1   2
AST:     {…} {…} {…}
CIR:     {…} {…} {…}
Types:   {…} {…} {…}
Regions: {…} {…} {…}

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:06):

again, no Region copying was necessary

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:07):

we didn't need to touch it, we just introduced the types and the indices just work out that we still have the same Regions we initially set up during parsing

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:08):

and when we report type mismatch error messages, we have the region info easily accessible - just take the Idx of the type (aka the type var) and go look up the region, and that's the source location

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:09):

there is one more case here, which is that during type-checking, we sometimes need to introduce new types - for example, when we do a polymorphic function call, we have to copy the function's type and then mutate the copy based on how it was called (e.g. if the function was Num(a), Num(a) -> Num(a), we need to be able to call it using a F32 as well as an I64, and each of those calls need to make a separate copy of the type so that we can infer the return value at the call site correctly)

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:10):

and when we're doing that, we still want to be able to refer the regions back to the original source code point if there's an error

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:10):

so at that point, we would actually finally make some new Regions, one per new type added during checking - e.g.

Idx:      0   1   2   3   4
AST:     {…} {…} {…}
CIR:     {…} {…} {…}
Types:   {…} {…} {…} {…} {…}
Regions: {…} {…} {…} {…} {…}

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:10):

importantly, we don't need to go back and make new CIR nodes or AST nodes for this, it's just about types and regions

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:11):

so this is about as memory-efficient as I think we can get things, and it's pretty simple!

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:12):

there is one more potential trick we could do, although I'm not 100% sure it will work (but I suspect it is), which is that if we have a 1-to-1 correspondence between AST nodes and CIR nodes, and we never use AST nodes again after canonicalization...

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:12):

we could just have the canonicalization pass mutate the AST nodes in-place rather than creating new CIR nodes and then discarding the old AST nodes

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:12):

like I said, I'm not 100% sure we could/should do that, but it's worth noting that it's something we could try too

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:12):

anyway, the main thing here (like the title of the topic says) is regions!

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:13):

I think if we can organize things such that regions are laid out in memory this way, it'll make a lot of things simpler and nicer and faster

view this post on Zulip Richard Feldman (Jul 02 2025 at 13:13):

any thoughts on this welcome!

view this post on Zulip Brendan Hansknecht (Jul 02 2025 at 22:05):

Out of curiosity:

  1. What of tokenization and regions?
  2. Old irs go away as we get deeper down the stack, correct?
  3. What of later passes that definitely will add nodes or make larger changes (refcounting as a simple example)?

view this post on Zulip Richard Feldman (Jul 02 2025 at 22:32):

  1. out of scope for this; whatever we decide to do there can work just fine with this design
  2. correct, unless we're in a language server situation
  3. this works up through type checking but stops working as soon as monomorphization enters the picture :smile:

view this post on Zulip Richard Feldman (Jul 02 2025 at 22:32):

so from there on we'll need to do something else

view this post on Zulip Luke Boswell (Jul 02 2025 at 22:37):

Is it right to say the benefit here is that we reduce the size of our AST and CIR nodes by 64bits, moving that memory to the side, such that doesn't need to be touched at until we are generating error reports or debug info like s-expressions.

So improving the likelihood that the IR for a module will fit completely in a smaller cache?

And if we ever achieved the modify in place idea, then we would be effectively halving the memory used again.

view this post on Zulip Anthony Bullard (Jul 02 2025 at 22:41):

i was actually already planning on doing the first thing here in ParseAST!

view this post on Zulip Anthony Bullard (Jul 02 2025 at 22:41):

not mainly for memory savings but because i was sick of all the region extracting code we have for every node type

view this post on Zulip Anthony Bullard (Jul 02 2025 at 22:42):

but this could really reduce the node size and fit way more nodes in the cache line

view this post on Zulip Luke Boswell (Jul 02 2025 at 22:44):

Is there a good way to measure/benchmark this? I'd be super interested to see the difference between our current implementation and after we make this change.

view this post on Zulip Richard Feldman (Jul 02 2025 at 23:00):

partly the benefits are reducing size of nodes, but also the lack of copying means that there's less memory traffic to/from the cache

view this post on Zulip Richard Feldman (Jul 02 2025 at 23:01):

like if I want to copy a Region from AST to CIR node, I need the source Region's address in cache, and also the destination's address in cache

view this post on Zulip Richard Feldman (Jul 02 2025 at 23:01):

so if we can skip the copy, canonicalization doesn't need either of those in cache

view this post on Zulip Richard Feldman (Jul 02 2025 at 23:02):

so the savings in terms of cache impact during canonicalization are actually @sizeOf(Region) * 2 per node

view this post on Zulip Richard Feldman (Jul 02 2025 at 23:02):

because the AST's Region can just get evicted from cache for other things and never need to be brought back in, and there is no such thing as "CIR's Region" to need to get brought in

view this post on Zulip Anthony Bullard (Jul 03 2025 at 14:40):

is there an argument that during parsing this could cause more cache eviction of the node array since we are switching to writing to a different region (pun intended) of memory for adding the region? i guess the same could be said of extra data

view this post on Zulip Anthony Bullard (Jul 03 2025 at 14:41):

But we'll see. it'd be nice if we had a ready made benchmark set up to run when we make a change like this

view this post on Zulip Brendan Hansknecht (Jul 03 2025 at 23:26):

Technically yes depending on usage pattern, but I would guess quite unlikely to be a problem.

view this post on Zulip Richard Feldman (Jul 04 2025 at 00:23):

I don't think so

view this post on Zulip Richard Feldman (Jul 04 2025 at 00:24):

we're writing the same total number of bytes during parsing, and it's all being written sequentially (appending to arrays) so it probably doesn't matter much whether it's split between 2 address ranges or 1

view this post on Zulip Anton (Jul 04 2025 at 09:43):

Luke Boswell said:

Is there a good way to measure/benchmark this? I'd be super interested to see the difference between our current implementation and after we make this change.

Cache misses can be benchmarked like here


Last updated: Jul 26 2025 at 12:14 UTC