Stream: ideas

Topic: Stmt SoA: moving symbols out


view this post on Zulip Folkert de Vries (Feb 09 2022 at 21:17):

I just had a fun SoA-related realization: all but one (RuntimeError) of our Stmt variants have a Symbol (or JoinPointId, which is a newtype around a symbol). We can then store this snippet

let a = map left f 
let b = (map right f)
let d = Node a b
return d;

As two arrays

0: "a"
1: "b"
2: "d"
3: "d"

0: Let (map left f)
1: Let (map right f)
2: Let (Node 0 1)
3: Return 2

Here left, right and f are not converted to this array form but they would be. Also I'm ignoring the layout here, which we can put into a third array.

Moving the symbols out of the Stmt variants saves 8 bytes per variant. Our idea so far was that we'd replace the symbol by a NodeId<Symbol>, a 32-bit index into a symbol array. This new approach leaves the index implicit (it's the same as the statement you're looking at), saving an extra 4 bytes for each Stmt node.

But the really exciting thing is that this representation allows us to efficiently move up the tree. E.g. if we encounter

let d = Node a b

Then currently there is not a good way to figure out what a actually is. The only way to do it is to keep some environment around that maps symbols to values. That means you're almost building an interpreter. Also it's slow and keeping the map up-to-date is error prone.

In contrast, given

2: Let (Node 0 1)

If we access to the Stmt array, we can just look at what's at index 0 and 1. But during code generation, at least in debug builds, we can still use the symbols in the symbol array to transfer the variable names into the LLVM IR that we generate.

view this post on Zulip Richard Feldman (Feb 09 2022 at 21:29):

coooooooool :heart_eyes:


Last updated: Jun 16 2026 at 16:19 UTC