There have been discussions about adding the Hash ability and a hash-based Dict/Set implementation for some time, but AFAIK, there isn't a concrete description of the API to be exposed (please correct me if I'm wrong here). So to kick this off, here is a barebones API that I think could work for us, at least to get started:
Hash has
hash : a, hasher -> hasher | a has Hash, hasher has Hasher
Hasher has
addBytes : a, List U8 -> a | a has Hasher
complete : a -> U64 | a has Hasher
hashValue : a, hasher -> U64 | a has Hash, hasher has Hasher
hashValue = \val, hasher ->
hash val hasher |> complete
we'll derive the Hash ability for structural types, and Hasher won't be derivable - it will be an ability an author of hashing algorithms implements to support hashing types that implement Hash.
I'm intentionally offering a simple API here so we can iterate as needed. Thoughts?
could Hasher run faster if it had all the methods of Rust's Hasher?
so for example it didn't need to heap-allocate a List when given an integer
I was thinking the derivers would always call Num.toBytes for when it was time to hash a number. once we have constant lists that would be zero-costs since you could just treat the number as a slice into the list of bytes
But I think having a write* for each number type also makes sense. That's what we do with Encoding/Decoding as well
hm, but wouldn't that only be true of a statically-known number?
No, because a U64 is equivalent to a slice of 8 bytes right
I think we will want follow rust. The cost of going to bytes and not getting optimizations for the specific algorithm will be high.
Rust's default implementations of Hash also take a slice into the bytes of a number: https://doc.rust-lang.org/src/core/hash/mod.rs.html#373-375
The core algorithm would to a lot of extra checks that could be removed with compile time know sizes and separately optimized implemtenation.
ok, so separate question: it seems that Rust hardcodes the output to be U64 - is that good? :sweat_smile:
like as opposed to requiring the hasher to also provide (for example) a U32 and U128 output
for example, I know some hashing algorithms are much faster at making 32-bit than 64-bit numbers
so maybe it would be useful to have that as an output option, for when you know you're using one of those
and similarly, maybe in the future there would be a desire for 128-bit hashes? I don't really know enough to say whether that would be true or not though
One challenge there is that not all hashing algorithms can support smaller-sized output sizes
The question is what is the goal? If we want Hasher for hash maps, it should be Nat or U64 or something of that sort. If we also want it to support cryptographic hashes, it would need to support other sizes.
I definitely think hash maps is the main goal
Then I would say the output should be either Nat or U64.
trying to output Nat sounds tricky with all the arithmetic involved, so I guess U64 it is :sweat_smile:
Also, I was looking at a rust implementation of wyhash, I am am surprised that everything is done with bytes. I would assume the overhead would be quite expensive, but I guess they inline everything which essentially removes the cast to bytes and just converts things to u64.
this is kind of interesting re. u64 vs usize hashing in Rust: https://github.com/rust-lang/rust/pull/36595#issuecomment-249725336
I am am surprised that everything is done with bytes. I would assume the overhead would be quite expensive
It looks like they need 32 bits at a time exactly: https://github.com/eldruin/wyhash-rs/blob/8707fbe33bcd8354819712a93a4457cc83c367c2/src/v1/functions.rs#L78-L79 I bet most of the bytes you get are just direct indexes into values on the stack so it's not a big deal
Just for a quick reference, with current roc, hashing a u64 is about 3x faster than converting a u64 to a list and then hashing it.
This is with reusing the same list and using List.set to avoid calling roc_alloc
Definitely some future work will make this better, but I am not sure how much better.
I was thinking the derivers would always call Num.toBytes for when it was time to hash a number. once we have constant lists that would be zero-costs since you could just treat the number as a slice into the list of bytes
I don't think we could make this a seamless slice. It would have to be something else special. There would be no actual memory to point to due to the number being in a register. You can't just put the number on the stack because once you return from a function, it could get clobbered.
Thus, you would still have to call roc_alloc and put it on the heap to keep the memory around.
Otherwise, the list returned from Num.toBytes would need to ensure it never escaped the current scope without allocating first.
So I guess not having real references means that we will have issues that Rust does not run into.
With all this said, I like the idea of requiring a hash function for all unsigned numbers. All other numbers will just automatically get mapped to those number types. Everything else would get mapped to a List of bytes (or a List of some unsigned number type).
Also, seamless slices may make this point mute, but I think it might also be useful to have a function with the signiture List U8 -> {transmuted: List U64, remainder: List U8} that essentially is a memory transmute with limited scope related to number types.
I don't think we could make this a seamless slice. It would have to be something else special. There would be no actual memory to point to due to the number being in a register. You can't just put the number on the stack because once you return from a function, it could get clobbered.
I don't understand why this is a problem - if I have a U64 rvalue, I can transmute that to a [u8; 8] rvalue, and then I can create a RocList from that with length 4 and zero capacity. The U64 doesn't need to be bound anywhere. What am I missing here?
When we return a list, we don't do anything to it's data. We just return a pointer, length, and capacity. There also is no small list.
if we created the list from a reference to a U64. The U64 would be bytes on the stack. The list would point to those bytes. If we return, those bytes now should be considered clobbered with the stack frame. The next function call might overwrite it.
If we instead put the bytes of the U64 onto the heap, it will cost a call to roc_alloc.
oh, I see. there is no small list optimization
okay got it
I was assuming the list owned the elements
I guess we could technically add small list optimization for types 16 bytes or under (due to padding constraints). That would also make this fast.
Yeah, that seems ideal. Then Num.toBytes also wins in general, not just for hashing purposes
elsewhere this might be annoying though, I would guess most data structures are >= 16 bytes so you could only put one of them in the small list before having to roc_alloc
For a record, you wouldn't turn the entire thing into bytes. You would instead call hash an each element within the record. So the small list optimization should still help.
This is because opaque types may not want every field hashed, lists need to hash the data they point to, and you have to be careful around hashing padding.
Anyway, I think as a summary, the api you mentioned should be fine if we add some form of small list optimization. If we do not add some form of small list optimization, I think that we should add a function for every unsigned integer type as well. (aside, not sure how the api works with bool and tags in general)
yeah I think it's best to assume no small list optimization here
i think for every integer type is fine, to align with Encode/Decode. Bools/tags should not need any ability member, though we will probably also want a way to grab a tag union’s discriminant from user space to make hashing them easier
Ah, makes sense.
Then let's go with that. Hash really is just a weird form of encode that loses information.
it also can't fail, unlike Encode
Encode can fail....interesting. i would have guessed only decode can fail.
Oh, one more potential change, but it would require some for of transmute type function (List Num a -> List U8, though it could also theoretically support some record types if they can safely be converted to bytes). Looking at the rust Hash trait, it also has hash slice. This is specifically so that some types can make their hash function just hash the list as a list of bytes instead of a list of individual elements. This would greatly improve the speed of hashing for a list of ints for example.
also worth noting that for Dict to be generalized, we'll need not only Hash but also Eq - although for now everything kinda "has Eq" anyway, so that's not a blocker for anything except custom equality
Encode can fail because some serialization formats can't represent every type Roc supports - this didn't occur to me at first, but then I realized serde's serialize can fail and started thinking about why they would do that...
another Zulip thread talked about some values that aren't representable in JSON: integers too big to fit in a F64 (since that's what JS uses for its number types) and NaN
(I don't think the JSON spec actually puts a cap on max integer size, actually, although in practice there's a limit to what a JS engine will decode successfully)
yeah we’ll need some more stuff for abilities to support binding variables to both Hash and Eq. I made an issue about that earlier today, will try to implement tomorrow
Also re tag discriminants- I think we can get away with not exposing that in user space. Derived Hash impls can use a discriminant-extracting function we hide away, and Hash for opaque types should decide how to represent discriminants themselves. That way we avoid leaking the discriminant representation through a surface function (you could still maybe figure it out by hashing a tag union, but only if the union has no tags that have payloads)
Also, just did some testing of my wyhash implementation again. If you tell roc to always inline all of the wyhash function, the performance is pretty decent. Compared to the original C++ implementation.
Single byte -> ~ 0.5x
256B chunks -> ~0.66x
65KB chunks -> ~1x
As an aside, if we write direct functions like I created for U64, it can actually be faster than the C++ version. Though not by much iirc.
Is that a time metric or a throughput metric?
Ah based on your last comment it must be throughput
Richard Feldman said:
also worth noting that for
Dictto be generalized, we'll need not onlyHashbut alsoEq- although for now everything kinda "has Eq" anyway, so that's not a blocker for anything except custom equality
Eq is necessary to handle hash conflicts I presume?
If so, IIRC there are ways around this, by having a second (and third if necessary, etc.) round of hashing if a conflict arises. Let me see if I can find the literature about this again.
EDIT: No, I was mistaken. This is not possible.
About working with bytes vs U32 vs U64: Many 'fast' (that is, non-cryptographic) hashing algorithms try to harness SIMD as much as possible. My educated guess is that this might be why some hashing algorithms ostensibly work on bytes but still are able to be fast.
Richard Feldman said:
another Zulip thread talked about some values that aren't representable in JSON: integers too big to fit in a
F64(since that's what JS uses for its number types) and NaN(I don't think the JSON spec actually puts a cap on max integer size, actually, although in practice there's a limit to what a JS engine will decode successfully)
:upside_down: I read all seven variants of the JSON spec a while back while working on the Roc JSON parser.
I recommend https://seriot.ch/projects/parsing_json.html for anyone who wants a brief summary of the intricacies of working with JSON.
Any parser that turns all numeric values encountered in JSON to IEEE754 double-precision floats (such as, but by far not limited to, JavaScript's built-in JSON parser) is unable to parse integers outside of the range (-2^53..2^53) because a 64-bit float has a 52-bit mantissa.
Most notably Twitter encountered this problem in the wild when moving their API ids from sequential integers to 'snowflakes' (IDs with rough ordering because the first part of their number is based on a timestamp) https://developer.twitter.com/ja/docs/basics/twitter-ids.
The standard itself confusingly claims both "A JSON parser MUST accept all texts that conform to the JSON grammar" and in the same paragraph, "An implementation may set limits on the range and precision of numbers.". (section 9 of RFC8259). That's as much clarity as we get.
So: While the spec does not explicitly disallow it, how large numbers are handled is implementation defined, and thus encoding large numbers in JSON is not portable.
Another relatively well-known tidbit is that NaN and +-Infinity are not allowed in JSON.
Another edge case is raw (non-Unicode) bytestrings. These are also plainly not allowed.
A less well-known problem is that older standards only allowed arrays and objects at the top level in JSON. This restriction was relaxed in newer versions of the standard (RFC7158 from 2014), but many parsers in the wild still follow this older rule. I've spent two full days of debugging on a professional project once because of this problem.
OK, sorry for my slightly off-topic rant :sweat_smile:
The tl;dr: Hash encoding is definitely different enough from other kinds of encoding to merit its own ability.
Is that a time metric or a throughput metric?
Yeah, it was a throughput metric.
That being said, I just worked on it this morning to see if I could close the gap. The answer is yes. We can get 100% performance parity with the C++ version for those benchmarks.
For the single byte benchmark, I had to improve my test runner and run longer (the hash was actually running fast).
For the other two benchmarks, I changed all of my additions and subtractions to wrapping addition and subtraction (which should be 100% safe assuming no bugs).
Now all of those benchmarks are approximately the same speed as C++ :tada: (Though we more heavily rely on inlining in small List cases)
wow, beautiful! I guess that's the first pure-Roc hashing function to achieve performance parity with C++! :tada:
did that require telling the Roc compiler to do more inlining than it normally would? Or was this with a stock (e.g. nightly) compiler build?
This is with the modified compiler that does more inlining. Without it the performance story is much worse different. ~2x slower for small lists and ~10x slower for large lists.
I wonder if we should increase llvms default inline aggressiveness (or add a flag to modify it. The reverse of -opt-size). Due to our extra checks on math operations and lists sizes, functions are less likely to get inlined than the c equivalent without the checks.
In our case, we use the same value rust uses, iirc, which is 275. For hashing, the magic number seems to be about 550.
Otherwise, a more direct solution is to special case the hash module, but the feels less great.
In this specific case, there is no notable executable size difference.
I'm definitely game for trying out increasing that number! Seems like an easy knob to turn, and if we see problems from it in other programs we can always reevaluate
I guess in the future if someone starts looking into binary size bloat (and -opt-size is too slow), we will have a good hunch of what to look into.
So I was doing some tests of tracking state in the hasher vs passing in a list of bytes. So in one form, it basically adds bytes one at a time, but keeps track of an internal state to batch operations. In the other version, it processes a list of bytes as a whole.
As expected, the processing the list of bytes as whole is way faster (~5-10x). This is even the case if I force a copy in the list of bytes case. There may be something wrong with my code gen, but I think it is correct. Also, to accumulate state, I originally used a statically sized list that never got reallocated. That was slow, so I tried the equivalent of an array....I don't advise it. It was faster than the list version, but still way slower than the version without state.
The Must Fun Data Structure to Use
That's really interesting. Do you have two diffs? I'd love to see what mono and LLVM we produce
but yeah it sounds like we should flip the impl to generate a list of bytes whenever possible
Here is a monster diff using the record to store data. It only works if you set the number of elements to hash to x where x > 48 and x%48=16. That was just the easiest case to get working: https://github.com/bhansconnect/roc-dict/commit/8dbb8e5abe72e993c276e5140c34499a1d4ca9f1
Also, the performance difference is much worse if you increase the compilers inline threshold (I set it to 750).
I also added a stateful list based version for reference. Requires a modified compiler for any vague performance. Will upload and post branch here shortly: https://github.com/bhansconnect/roc-dict/commit/67f0ef0b458a96107cca3309605022f919e48782
The hashing-perf branch has my performance changes for running hashing.
Also, I realized that in the last commit I pushed, I accidentally did so with debug = true in the platform. If doing benchmarks, make sure to change that back to false.
Also, I just pushed an update stateful List based version that uses a ring buffer to avoid extra copying. It is now close to the record based version in performance. Still much slower than the stateless List of bytes based version even if we are forced to copy the list of bytes on every call to hash.
For hashing, do you think it is worth supporting unordered hashing? Then you could make a set hashable for example (without insertion order mattering).
I think to support it, either it would have to be added directly to the hasher API, or it would need to expose some way to create a new/sub-hasher from the Hasher API. That way, you can hash individual elements and merge them in an unordered manner. So essentially you need some special accumulation.
We could expose something like Hash.hashUnordered that would be generic between all hashers.
Could that be implemented as the custom hash implementation for a container, rather than something exposed by the ability?
Not with the current hasher impl. Cause you need to hash essentially
So you can't only accumulate with add*
Can you elaborate by what you mean by "unordered friendly manner" here? Ultimately the elements are going to follow some ordering, how do you want to hash them?
One of the simplest ways and what absl does (performant, though i don't think it is guaranteed to be as good of a hash). Sum the hash of each element with wrapping addition. Also add 1 every time you overflow to not lose entropy. Then take that final value do a proper hash mix/combine with the initial state.
A more mathematically correct way would sort the hashes and then hash the hash list. But that costs a lot of memory space. Either way require hashing each item with a clean state.
I see
if you provide the container a constructor to build a hasher this is not a problem though, right? That's what Rust does with default type variables I think
Ah, so you pass a constructor into the containers hash function along with your current hasher.
But wouldn't that not match the hash ability API anymore?
Or, the container keeps the hash function in their data. Then it can use that in its hash implementation
Seems odd to add that to the constructor of every Set for example....so I don't particularly like that solution.
Well, you could have a default constructor that provides the default hasher, whatever that is
another downside of this though is that e.g. Set would gain an extra type variable. You could expose Set as the default alias with the default hasher though, and have an SetWithHasher (better name needed) as the actual implementation folks can use if they want to use a custom hasher.
e.g.
SetWithHasher k hasher := {makeHasher : {} -> hasher, ...} | hasher has Hasher
Set k : SetWithHasher k DefaultHasher
empty : Set k
emptyWithHasher : hasher -> Set k hasher
This is needed if you want a container that is generic over the hashing algorithm anyway
I guess the other option is to expose the "sub hasher" constructor on the Hasher ability, but I don't know how prudent that is - like what if you want to create a sub hasher with other configuration parameters? Or is that not an actual concern
It seems wrong to have a hash function that takes a hasher but then hash with a different algorithm. Maybe someone intentionally wants to use a stronger hash function. They would assume the set would use the hasher they provide to it's hash function, but it wouldn't.
Ah, you're talking about hashing Set itself
I think mostly you just want to create a version with a reset internal state.
I don't think you would change any config
Yeah, hashing set itself or anything else that is unordered.
hmm okay
so we would need
Hasher has
# .. the other stuff
new : a -> a | a has Hasher
yeah?
How common of use case is this? I had never really heard of this rehashing idea before, I've only ever seen the appending to a constant hasher and calling "finish" at the end
Literally only required for unordered hashing and not needed for anything else
Unordered hashing is pretty rare.
But yeah, thats all you would need
on the other hand you would need this if you want a hashmap/hashset that is unordered but you want to preserve equality under hashing, ie x = y => hash(x) = hash(y) right
which is a goal we want
so I feel like we should add this
:+1:
yeah the relationship with equality was my first thought here
so given where we ended up on equality for sets and dictionaries, I agree we should do unordered hashing for them too!
One other random thought/questions:
Wyhash (in it's default form) depends on 128bit multiplication. On some devices, that will not be directly support and will be slower (32 bit devices, many arm devices that aren't M1, wasm?). This may be a case where we want some sort of lowlevel that can switch out the default standard library hashing implementation. For example, absl, uses cityhash for those devices. Also, wyhash has an alternate form that swaps out the 128bit multiplication.
Do we have a way/want to support this? As I see it, the main upside is performance and the main downside is that hashes may not be the same across different devices (based on cpu architecture).
makes sense to me!
agreed- how would a lowlevel for that look like? what would we need to support? just something that takes a list of bytes and spits out a hash, or something more?
I was more thinking some sort of compile time constant to select between two different hashers both implemented in Roc.
Cause I think the hashing algorithms are fine to write in Roc.
yeah we can make that happen
Oh, just realized one other thing while porting. I have a function that crashes if it fails to get a value from a list. It should never fail (fail a bug in the implementation). How should I write that function in the standard? Should I somehow propagate up the result? Should I just default to 0? Should I just leave in the crash? Maybe make a better named function for the crash?
getByte : List U8, Nat -> U8
getByte = \list, index ->
when List.get list index is
Ok b -> b
_ -> 0 - 1
is there any way you can get rid of that branch?
i can give it a pass if you like if you have the source
would could expose a lowlevel unreachable to built ins but that isn’t ideal
I really don't think so, I ultimately have to get data from the list. I can't walk the list because I want to load data in chunks without the cost of managing state and accumulation. Also, at the end there is a chance that we actually have to shift backwards and reanalyze some bytes.
Also, it is for porting this: https://github.com/bhansconnect/roc-dict/blob/main/examples/Wyhash.roc#L121 to the standard library.
Basically a lot of looping in various chunk sizes (starting big and getting smaller) then loading a bunch of blocks of bytes as U64s. Then hashing those blocks.
The functions that actually grab from the list are they wyr3, wyr4, and wyr8 functions that load up to 8 bytes into a U64.
hashes being different shouldn't matter, right? Since we don't expose hash ordering anyway, people shouldn't be able to tell other than performance
also would getUnsafe work to get rid of the branch? We already have a lowlevel for that, and could just let Dict implement something that compiles down to it too
Hashes being different I guess wouldn't be exposed to the user at all (past performance). We aren't exposing the algorithm and there won't be a way to serialize a dictionary that includes the hashes.
getUnsafe should work... Thanks!
Last updated: Jun 16 2026 at 16:19 UTC