Stream: ideas

Topic: UnionFlowList


view this post on Zulip Norbert Hajagos (Sep 28 2023 at 09:02):

Datastructure UnionFlowList (scratch name)

TLDR: Kinda like utf-8 strings, reading the n-th element from a UnionFlowList would always be a linear, left-to-right operation. In return internally the tag discriminators would specify the length of their payload, placed subsequently in the byte array, so an element's length would not be the size of the biggest tag. It would be the size of the tag actually stored.

Reading Bob Nystrom's Crafting Interpretes book, he created a Bytecode interpreter in C where a Chunk was a series of bytes that held op-codes and (depending on the preceeding op-code) data. When he read a CONST op code, he knew the next 2 bytes were a const, so next time he read 2 bytes in. Lots of people followed the book in rust and have their own implementation. Almost all of them are slower, since they use a Vec<Code> for the bytecodes where Code is an enum of payload-less op-codes or op-codes with payloads (like the 2 byte CONST).

Why were the "naive" rust implementations slower?
The size of the elements of the list will be size of the tag discriminator + padding +the chunkiest payload of the union. The list will have a lot of blank memory if the union's type has a big Tag, but it occures rarely in the list. Bigger lists => more cache misses => more waiting for data for the cpu. Advantage: List can be randomly accessed [ O(1) ] since all elements have the same size.

With a UnionFlowList, we could have the same thing that Bob achieved. This is a good case for when you want to use the entire payload. Performance-wise, I think it would complement the talked-about #ideas > Struct Of Arrays idea, having most of the use-cases covered.
Though very specific, I often think about this when using rust and wish I wasn't wasting that memory space, but I'm not going to trade my comfy Vec<Code> with pattern matching for an obscure Vec<u8>.

Why not implement in userspace?

Sounds hard, but could be done. Acessing the payload would require a bound-check tough. A built-in can assume that if the payload-size associated with the tag discriminator is greater than 0, the payload is there.

Linting

There could be a linter warning if the UnionFlowList doesn't acceppt a tag union, saying that you probably don't want that, but sill... it would be better if it only acceppted Tag unions, but idk if that could be done.

Im pretty unqualified

I'm not a low level guy. I just have some free time. idk SIMD, or the penalty on processing speed (if any) the variable-lenght offset jumping imposes when iterating. Would be nice to know if this would actually be faster.


Last updated: Jun 16 2026 at 16:19 UTC