monomorphizing dictionaries · ideas

Stream: ideas

Topic: monomorphizing dictionaries

Richard Feldman (Sep 15 2022 at 04:16):

Java's EnumMap is a specialized dictionary for keys that are enumerations, where it essentially uses the fact that they're contiguous integers to back the data structure with a flat array, so there's the fastest possible lookup times and it never needs to reallocate

Richard Feldman (Sep 15 2022 at 04:17):

it occurs to me that we could do this automatically by essentially monomorphizing Dict SomeEnumeration V into a pointer to an array - where the length would be statically known to be the number of tags in the enumeration, and there wouldn't be a capacity because it would never grow

Brendan Hansknecht (Sep 15 2022 at 04:36):

That's a really cool idea.

Brian Carroll (Sep 15 2022 at 05:24):

Ooh! Love it!

Brian Carroll (Sep 15 2022 at 05:25):

No bounds checks needed either, if we can convince LLVM of that.

Richard Feldman (Sep 15 2022 at 05:26):

ooh good point! I think having the length get inserted statically would take care of that

Richard Feldman (Sep 15 2022 at 05:27):

like if the specialization of Dict.len for those was hardcoded at compile time to return a static number, then once that gets inlined, LLVM should see the bounds check as being something like if 1 < 2 and elide it!

Brian Carroll (Sep 15 2022 at 05:27):

Yeah or we could treat it as a tuple

Brian Carroll (Sep 15 2022 at 05:28):

Oh right, good point

Brian Carroll (Sep 15 2022 at 05:29):

But couldn't it be a variable like x<2 ?

Richard Feldman (Sep 15 2022 at 05:32):

oh hm, yeah

Richard Feldman (Sep 15 2022 at 05:32):

oh, actually

Richard Feldman (Sep 15 2022 at 05:33):

since we're monomorphizing, we could just directly monomorphize Dict.get specially in this case to a different implementation which doesn't even emit a bounds check in the first place

Brian Carroll (Sep 15 2022 at 05:36):

Oh yeah :+1:

Folkert de Vries (Sep 15 2022 at 11:09):

wait, the length of such a dict is not a constant right? the set of keys is limited, but it's no guarantee that each key is in the map?

Richard Feldman (Sep 15 2022 at 11:37):

oh good point

Richard Feldman (Sep 15 2022 at 11:37):

so length varies, but capacity does not

Folkert de Vries (Sep 15 2022 at 11:38):

yes and you need some tag in the array to distinguish a present from a missing element

Folkert de Vries (Sep 15 2022 at 11:38):

so, Option<T> effectively

Folkert de Vries (Sep 15 2022 at 11:38):

or some bitmask off to the side somewhere

Richard Feldman (Sep 15 2022 at 15:46):

hrm, another interesting point: if we offer the guarantee (from the IndexMap implementation) that insertion order is maintained, then this would need to be less efficient

Richard Feldman (Sep 15 2022 at 15:47):

in the sense that it would need to have 2 arrays, one for ordering and the other for fast lookups

Richard Feldman (Sep 15 2022 at 15:48):

but I suppose insertions and lookups would both be extremely fast (basically it's what I originally said plus a Vec::push that never needs to reallocate or check capacity)

Richard Feldman (Sep 15 2022 at 15:48):

and then iteration actually gets faster, since you just have an array that you can walk over directly instead of having to check for presence on every key

Richard Feldman (Sep 15 2022 at 15:49):

so maybe that's fine (or even an improvement, depending on whether you want to traverse them) after all :big_smile:

Richard Feldman (Sep 15 2022 at 15:49):

I guess in the case of a Set, traversing would be common

Brendan Hansknecht (Sep 15 2022 at 17:36):

Luckily we could size the integers in the ordering array to save space. That will make it more cache friendly.

Brendan Hansknecht (Sep 15 2022 at 17:37):

But yeah, extra lookup on every use

Qqwy / Marten (Sep 16 2022 at 09:47):

It might make sense to expose both variations of dictionaries (ones with and without the guarantee about insertion order being maintained as long as there are no deletes) as different user types. I think that there are many cases in which the guarantee is not needed and extra performance would be nice in those situations.

Brendan Hansknecht (Sep 16 2022 at 14:53):

We have discussed that a lot in the past. I think the core issue is that the unordered version implicitly exposes randomness to the program and doesn't feel like it fits s pure functional language.

Also, roc general pushes for simplicity more than max performance, and based on some tests of indexmap in rust, it tends to be similarly performant.

Brendan Hansknecht (Sep 16 2022 at 14:57):

Also, the random version can be made in a roc userland library if there is really enough demand for it.

Last updated: Jul 23 2026 at 13:15 UTC