Stream: ideas

Topic: hashmap alternatives


view this post on Zulip Richard Feldman (Nov 15 2021 at 05:01):

so I've been surprised to learn over the past year that hashmaps are often slower in practice than flat arrays of key/value tuples where you do an O(n) linear scan for most operations instead of the hashmap's O(1).

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:02):

https://terrainformatica.com/2017/10/15/when-linear-search-is-faster-than-stdmapfind-and-stdunordered_mapfind/ is an example of this in C++ with extremely small collections (like under 10 items) but I've also heard of it being true in practice for some collections with thousands of elements or more

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:03):

but of course eventually the O(n) will catch up with you; if you have a billion elements in the collection, for example, a hash map will be way faster than a linear scan of an array

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:03):

but this makes me wonder about a sorted version of a dictionary backed by a flat array

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:04):

like what if it's just sorted key/value tuples in an array

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:05):

one downside I can see is that you have a lot more branch mispredictions when it's searching the array to find the right spot

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:06):

which wouldn't happen in the case of the linear array scan; instead, the CPU would always be able to prefetch the next cache line because it would know you're doing a linear scan

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:06):

so maybe the O(log n) sorted version might also run slower than the O(n) linear scan! :sweat_smile:

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:11):

it makes me wonder if there's a "small string optimization" version of the data structure which could be helpful - that is, at a small enough capacity (say, under 1024 elements) the backing data structure is a flat array, and above that it's a hash map

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:11):

so you get linear scan performance for smaller collections, and then for larger collections you get a branch misprediction followed by O(1) operations

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:12):

I don't know if there's any data structure out there like that, or any research into its performance!

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:34):

I also be wonder about SIMD and data structures like that - for example, if you had small enough values, you could potentially do a linear scan faster using SIMD

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:35):

but you could take that idea a step further and, instead of using tuples, organize the backing array into 64B chunks with the keys at the beginning and then as many values as will fit in the remainder of the 64B

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:36):

(64B because that's the typical cache line size these days)

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:37):

so for small keys (e.g. numbers) you could probably often scan an entire 64B chunk in 1-2 SIMD instructions to find out if the key is there, and if so, the corresponding value is already in cache

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:39):

alternatively, you could store all the keys in one array and all the values in another; that would mean you'd be able to find the key much faster, but when you did, the value wouldn't be prefetched, so you might have a cache miss

view this post on Zulip Richard Feldman (Nov 15 2021 at 05:39):

interesting things to think about! :smiley:

view this post on Zulip Brendan Hansknecht (Nov 15 2021 at 06:09):

I think the very interesting question and likely hard part of a data structure for something like this is the graceful transition from one form to another. If you use a flat map based dictionary and it grows large enough that you want to swap it to a hashmap based dictionary, there will likely be a huge cost that will make performance analysis quite brittle.

E.g. yesterday it ran fast but today it is terribly slow. Oh, today we have 1025 items and the cost to convert to a hash map broke our performance.

view this post on Zulip Brendan Hansknecht (Nov 15 2021 at 06:12):

I think being able to pick an implementation based on the use case is very important, but without a ton of research and testing, I would be extremely skeptical of defaulting to something other than a hash map.

view this post on Zulip Brendan Hansknecht (Nov 15 2021 at 06:16):

Would mostly be worried about blowing up the worst case performance and users believing roc is a terrible language because of it.

So i guess hashmap is defense against the worst case as opposed to picking for average or best case.

view this post on Zulip Folkert de Vries (Nov 15 2021 at 08:16):

just a general question (also relevant to rust): at what point does storing the keys and values in separate arrays make sense?

view this post on Zulip Folkert de Vries (Nov 15 2021 at 08:17):

it removes padding potentially, but if the key is small, then simd will just tear through that array of keys

view this post on Zulip Folkert de Vries (Nov 15 2021 at 08:17):

and you pay the cost of one load from the values array

view this post on Zulip David K (Nov 15 2021 at 09:40):

I wonder if the best approach would be to have 2 different structures with an identical API, so that one is a drop in for another. Users can choose which ever one they need (and test the performance on a case by case basis).

view this post on Zulip David K (Nov 15 2021 at 09:40):

So you could have a Map which is the default "probably good enough" version backed by a hash map and a SmallMap for increased perf on small maps.

view this post on Zulip David K (Nov 15 2021 at 09:40):

Downside is of course this is now one more thing the user has to think about.

view this post on Zulip Folkert de Vries (Nov 15 2021 at 09:41):

Yes, I prefer rust's model here where there is one solid choice in the stdlib, and if you want something special, there are packages for that

view this post on Zulip Folkert de Vries (Nov 15 2021 at 09:41):

I prefer that to c++'s stdlib having many competing subtly different data structures that seem to do the same thing

view this post on Zulip David K (Nov 15 2021 at 09:45):

Ye, in c++ the annoying thing is that the one that looks like the "default" one, std::map, is the one you should probably never use.

view this post on Zulip Folkert de Vries (Nov 15 2021 at 09:48):

right, that's terrible

view this post on Zulip David K (Nov 15 2021 at 10:56):

I am slightly confused by the if example in that article. The ternary operator chain they use skips the step of loading the string from the array for the comparison.

view this post on Zulip David K (Nov 15 2021 at 10:56):

I tried running it with a function that does have to load the string and got a result that's about 2 times worse (but still faster then std::map and std::unordered_map.

view this post on Zulip David K (Nov 15 2021 at 11:04):

(turns out I messed up the function slightly and it's actually closer to 1/3 slower)

view this post on Zulip David K (Nov 15 2021 at 11:05):

if2_t2e 1.8518 # having to look up values in the array
if_t2e 1.12448
unordered_map_t2e 2.30018
map_t2e 2.99629

view this post on Zulip Folkert de Vries (Nov 15 2021 at 11:10):

C/C++ but can be any modern programming language
:thinking:

view this post on Zulip Folkert de Vries (Nov 15 2021 at 11:10):

40 years old, very modern

view this post on Zulip David K (Nov 15 2021 at 11:17):

C++20 is a pretty nice modern-ish language. Unfortunately this article uses C++11...

view this post on Zulip Zeljko Nesic (Nov 15 2021 at 13:31):

I think that most of those kinds confusion comes from naming.

Eg. is it String or Text or Varchar
Similar words for similar concept.

What might make sense is calling one thing:FastDict and the other one BigDict.

I absolutely agree with the idea to give optimal data structures, but I think we should be transparent about it. Maybe you'd be able to query it at runtime eg:

implementation : Dict k a -> [ FastVariant, BigVariant ]

view this post on Zulip Joshua Warner (Nov 15 2021 at 14:06):

I like the idea of the implementation being easily swappable - and furthermore that the roc runtime/library/etc can just pick the "right" one most of the time. This could be driven by:

This could be either staticly or dynamically dispatched to the correct impl - but if it's static, I'd say that mostly rules out the automated/PGO option since we'd definitely want to avoid cases where PGO could accidentally make the wrong decision and lead to pessimization if you happen to hit some extreme case.

Also note, this is potentially a security problem if you pick the wrong one, since it can lead to much easier DOS attacks. The hashmap impl will be the safe-but-slow one.

view this post on Zulip Jim Duey (Nov 15 2021 at 14:36):

Very interesting area to put some thought into. Pardon me for asking a basic question, but does Roc use anything like a Hash Array Mapped Trie? https://en.wikipedia.org/wiki/Hash_array_mapped_trie (Clojure programmer here. :grinning: ) There's been some work in the Clojure community to make persistent hash maps fast. And I did a version in C that automatically detects when you can mutate a map rather than duplicate structure that relies on the ref count of the data structs.

view this post on Zulip kprotty (Nov 15 2021 at 14:38):

Richard Feldman said:

it makes me wonder if there's a "small string optimization" version of the data structure which could be helpful - that is, at a small enough capacity (say, under 1024 elements) the backing data structure is a flat array, and above that it's a hash map

This seems to be what Erlang does so maybe it has some merit in practice: https://www.erlang.org/doc/efficiency_guide/maps.html
TLDR: they use a flat array for maps/dicts <= 32 items and switch to a HAMT which @Jim Duey mentions above

view this post on Zulip kprotty (Nov 15 2021 at 14:47):

Richard Feldman said:

but you could take that idea a step further and, instead of using tuples, organize the backing array into 64B chunks with the keys at the beginning and then as many values as will fit in the remainder of the 64B

I'm just here to give out references :), but it seems like you're close to describing the design of abseil/swiss hash tables (the default implementation of HashMap in Rust). Some references down below, but the idea is to store 7 or 8 bits of the hash in a separate 16-byte chunk and use SIMD to scan 16 slots at a time (by comparing their hashes: cmpeq_epi8). There would be false positives / collisions but the hope is that the linear SIMD scan instead of jumping around like in quadratic probing + quickly skipping over cells that definitely don't match = speed increase. Also just something to look into

CppCon talk on it: https://www.youtube.com/watch?v=ncHmEUmJZf4
Design notes: https://abseil.io/about/design/swisstables
a TLDR design notes: https://gankra.github.io/blah/hashbrown-tldr/

view this post on Zulip Richard Feldman (Nov 15 2021 at 16:45):

also interesting:

https://preshing.com/20110603/hash-table-performance-tests/
https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/

view this post on Zulip Brendan Hansknecht (Nov 15 2021 at 17:32):

Especially if abilities pan out so that Dict doesn't need to be a builtin, I would be 100% for offering any of the super efficient hash maps as our base. Then we can document the performance characteristics well and add comments about other map alternatives that either we also provide implementations of, or we leave to the community to make packages for.
Of course, some of these algorithms might be a tad complex to get the needed performance in only Roc, so not sure how to deal with that.

view this post on Zulip Richard Feldman (Nov 15 2021 at 17:45):

I prefer rust's model here where there is one solid choice in the stdlib, and if you want something special, there are packages for that

yeah one idea to build on that is to aim for a builtin dictionary that has good asymptotics (so your program doesn't explode if you get an outlier amount of entries) but which overall prioritizes speed for a low number of entries (at the expense of performance at a high number of entries), since the more entries you have, the more likely you are to want to specialize away from a one-size-fits-all default and towards a collection more tailored to your particular use case

view this post on Zulip Richard Feldman (Nov 15 2021 at 18:38):

this series of benchmarks is :astonished:

https://martin.ankerl.com/2019/04/01/hashmap-benchmarks-01-overview/

view this post on Zulip Richard Feldman (Nov 15 2021 at 18:43):

it makes a very thorough case for robin_hood::unordered_flat_map!

view this post on Zulip Richard Feldman (Nov 15 2021 at 18:43):

which is MIT-licensed, conveniently

view this post on Zulip Brendan Hansknecht (Nov 15 2021 at 19:32):

Also, another random note, it is interesting to think what different groups are optimizing for. For example, Google variants of things tend to be optimized for the plethora of internal use cases and tend to focus more on memory usage than speed (at least if they have been optimized in the last 5ish years). The reason for this is that memory usage costs more than compute. Of course there are certain code paths that will directly affect latency of a web service and compute performance matters, but generally speaking adding more cores is cheap and web services are embarrassingly parallel.

view this post on Zulip Joseph Anthony Zullo (Nov 17 2021 at 13:16):

Well, since our lists are really arrays, I don't think tree maps would make sense here? To be consistent we should probably be using array like data structures?

view this post on Zulip Joseph Anthony Zullo (Nov 17 2021 at 13:17):

It's also worth pointing out that there are differences depending on your type of input, I remember reading an article somewhere that showed that Java bitsets are more efficient than Java hashmaps well into bitset sizes into the 1000s, but this only works for integer sets.

view this post on Zulip Joseph Anthony Zullo (Nov 17 2021 at 13:20):

I know elixir/erlang does just association arrays into hash trees, to be consistent with our list implementation we should probably do association arrays to hash tables, unless I'm misunderstanding how lists work here.

view this post on Zulip Brendan Hansknecht (Nov 17 2021 at 16:57):

Can you elaborate more on this? Why should the map data structure be consistent with the array data structure?

Also, by consistent you mean a similar underlying implementation?

And yes, our lists are array like. They are essentially a std::vector.

view this post on Zulip Joseph Anthony Zullo (Nov 18 2021 at 15:03):

On second thought I don't think I'm right, it does clearly say that Dict is like that in Elm in the roc-for-elm-programmers. I just thought it was weird to have lists be a mutable vector under the hood and then to use hash trees (someone mentions HAMTs) for Dicts which are a purely functional data structure. We have Python's list and Elm's Dict. There isn't anything necessarily wrong with this approach, and it could be potentially useful. But it can also be confusing why lists can't be used nonlinearly without cloning, but dicts are cloned for free automatically.


Last updated: Jun 16 2026 at 16:19 UTC