Stream: ideas

Topic: monomorphizing dictionaries


view this post on Zulip Richard Feldman (Sep 15 2022 at 04:16):

Java's EnumMap is a specialized dictionary for keys that are enumerations, where it essentially uses the fact that they're contiguous integers to back the data structure with a flat array, so there's the fastest possible lookup times and it never needs to reallocate

view this post on Zulip Richard Feldman (Sep 15 2022 at 04:17):

it occurs to me that we could do this automatically by essentially monomorphizing Dict SomeEnumeration V into a pointer to an array - where the length would be statically known to be the number of tags in the enumeration, and there wouldn't be a capacity because it would never grow

view this post on Zulip Brendan Hansknecht (Sep 15 2022 at 04:36):

That's a really cool idea.

view this post on Zulip Brian Carroll (Sep 15 2022 at 05:24):

Ooh! Love it!

view this post on Zulip Brian Carroll (Sep 15 2022 at 05:25):

No bounds checks needed either, if we can convince LLVM of that.

view this post on Zulip Richard Feldman (Sep 15 2022 at 05:26):

ooh good point! I think having the length get inserted statically would take care of that

view this post on Zulip Richard Feldman (Sep 15 2022 at 05:27):

like if the specialization of Dict.len for those was hardcoded at compile time to return a static number, then once that gets inlined, LLVM should see the bounds check as being something like if 1 < 2 and elide it!

view this post on Zulip Brian Carroll (Sep 15 2022 at 05:27):

Yeah or we could treat it as a tuple

view this post on Zulip Brian Carroll (Sep 15 2022 at 05:28):

Oh right, good point

view this post on Zulip Brian Carroll (Sep 15 2022 at 05:29):

But couldn't it be a variable like x<2 ?

view this post on Zulip Richard Feldman (Sep 15 2022 at 05:32):

oh hm, yeah

view this post on Zulip Richard Feldman (Sep 15 2022 at 05:32):

oh, actually

view this post on Zulip Richard Feldman (Sep 15 2022 at 05:33):

since we're monomorphizing, we could just directly monomorphize Dict.get specially in this case to a different implementation which doesn't even emit a bounds check in the first place

view this post on Zulip Brian Carroll (Sep 15 2022 at 05:36):

Oh yeah :+1:

view this post on Zulip Folkert de Vries (Sep 15 2022 at 11:09):

wait, the length of such a dict is not a constant right? the set of keys is limited, but it's no guarantee that each key is in the map?

view this post on Zulip Richard Feldman (Sep 15 2022 at 11:37):

oh good point

view this post on Zulip Richard Feldman (Sep 15 2022 at 11:37):

so length varies, but capacity does not

view this post on Zulip Folkert de Vries (Sep 15 2022 at 11:38):

yes and you need some tag in the array to distinguish a present from a missing element

view this post on Zulip Folkert de Vries (Sep 15 2022 at 11:38):

so, Option<T> effectively

view this post on Zulip Folkert de Vries (Sep 15 2022 at 11:38):

or some bitmask off to the side somewhere

view this post on Zulip Richard Feldman (Sep 15 2022 at 15:46):

hrm, another interesting point: if we offer the guarantee (from the IndexMap implementation) that insertion order is maintained, then this would need to be less efficient

view this post on Zulip Richard Feldman (Sep 15 2022 at 15:47):

in the sense that it would need to have 2 arrays, one for ordering and the other for fast lookups

view this post on Zulip Richard Feldman (Sep 15 2022 at 15:48):

but I suppose insertions and lookups would both be extremely fast (basically it's what I originally said plus a Vec::push that never needs to reallocate or check capacity)

view this post on Zulip Richard Feldman (Sep 15 2022 at 15:48):

and then iteration actually gets faster, since you just have an array that you can walk over directly instead of having to check for presence on every key

view this post on Zulip Richard Feldman (Sep 15 2022 at 15:49):

so maybe that's fine (or even an improvement, depending on whether you want to traverse them) after all :big_smile:

view this post on Zulip Richard Feldman (Sep 15 2022 at 15:49):

I guess in the case of a Set, traversing would be common

view this post on Zulip Brendan Hansknecht (Sep 15 2022 at 17:36):

Luckily we could size the integers in the ordering array to save space. That will make it more cache friendly.

view this post on Zulip Brendan Hansknecht (Sep 15 2022 at 17:37):

But yeah, extra lookup on every use

view this post on Zulip Qqwy / Marten (Sep 16 2022 at 09:47):

It might make sense to expose both variations of dictionaries (ones with and without the guarantee about insertion order being maintained as long as there are no deletes) as different user types. I think that there are many cases in which the guarantee is not needed and extra performance would be nice in those situations.

view this post on Zulip Brendan Hansknecht (Sep 16 2022 at 14:53):

We have discussed that a lot in the past. I think the core issue is that the unordered version implicitly exposes randomness to the program and doesn't feel like it fits s pure functional language.

Also, roc general pushes for simplicity more than max performance, and based on some tests of indexmap in rust, it tends to be similarly performant.

view this post on Zulip Brendan Hansknecht (Sep 16 2022 at 14:57):

Also, the random version can be made in a roc userland library if there is really enough demand for it.


Last updated: Jun 16 2026 at 16:19 UTC