this is interesting...something we could think about for our hashing API: https://purplesyringa.moe/blog/thoughts-on-rust-hashing/
Only starting to skim this, but the author is already pointing out many of my complaints with the hashing api
Though roc will not end up being accidentally quadratic (thanks index map)
> How this API should look like and whether it can be shoehorned into the existing interfaces remains to be seen. I have not started work on the design yet, and perhaps this article might be a bit premature
Oh, they don't really have any sort of suggestion or solution
Also, this is why in roc, hashing a list of ints is super slow, but hashing an array of bytes is fast.
We really need some sort of way to aggregate things, especially all of our primitive types that could just be hashed as one giant chunk of bytes
I tried at one point to do a streaming hasher (probably should try again), but the perf was pretty abysmal due to having to copy all bytes over to a separate list for accumulation. I think that could probably be done a lot smarter, but as mentioned in the article, it is ad hoc in an inconvenient way. This is especially true for fast hashers for hash tables
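Roughly the shape of what I tried, as a minimal Rust sketch (names invented, FNV-1a standing in for the real byte hash; the `extend_from_slice` copy is exactly the overhead):

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical accumulating hasher: every write is copied into one
/// contiguous buffer, and the real byte hash runs over it once at the end.
struct BufferingHasher {
    buf: Vec<u8>,
}

impl Hasher for BufferingHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Every field from anywhere in the object graph lands here first.
        self.buf.extend_from_slice(bytes);
    }

    fn finish(&self) -> u64 {
        // One pass over the accumulated bytes. Fine for slow, secure
        // hashes; the copy dominates for fast hash-table hashes.
        self.buf.iter().fold(0xcbf29ce484222325u64, |h, &b| {
            (h ^ b as u64).wrapping_mul(0x100000001b3)
        })
    }
}

fn main() {
    let mut h = BufferingHasher { buf: Vec::new() };
    (42u32, [1u64, 2, 3]).hash(&mut h);
    println!("{:016x}", h.finish());
}
```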
For more robust and secure hashes, the copy matters a lot less
All hash algorithms really just want to operate on a giant array of bytes. The more things that can be viewed that way, the better
yeah, one vague idea I had was to maybe allow our derived hashing implementations to cheat and view things as raw bytes even though a handwritten pure roc implementation couldn't do that
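IIRC Rust's std already does a version of this cheat for integer slices (`hash_slice` feeds the whole slice to the hasher as a single write). The trick looks roughly like this, and it is only sound because the element type has no padding and holds no pointers:

```rust
use std::hash::Hasher;

/// View a slice of plain-old-data ints as one byte slice, so the hasher
/// gets a single contiguous write instead of one write per element.
/// Note this makes the result endianness-dependent (non-portable).
fn hash_u32s_as_bytes<H: Hasher>(data: &[u32], state: &mut H) {
    let bytes: &[u8] = unsafe {
        std::slice::from_raw_parts(data.as_ptr().cast(), std::mem::size_of_val(data))
    };
    state.write(bytes);
}
```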
yeah, though you probably shouldn't hash padding (though I guess you could, but that would make hashing system-dependent), and you also can't do that with more complex types that contain pointers.
Also, I guess this is their core idea:
> In my opinion, `Hasher` and `Hash` are a wrong abstraction. Instead of the `Hash` driving the `Hasher` insane, it should be the other way round: `Hash` providing introspection facilities and `Hasher` navigating the hashed objects recursively. As a bonus, this could enable (opt-in) portable hashers.
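Sketching that inversion in Rust (all names here are made up, this is just the shape of the idea): the type exposes its layout, and the hasher walks it, choosing a strategy per shape:

```rust
/// Hypothetical inverted API: the type describes its layout and the
/// hasher decides how to traverse it. A fast hasher grabs byte runs
/// whole; a portable one could normalize endianness at the leaves.
enum Shape<'a> {
    /// Contiguous plain bytes: no padding, no pointers. One chunk.
    Bytes(&'a [u8]),
    /// Fields/elements the hasher recurses into.
    Composite(Vec<&'a dyn Introspect>),
}

trait Introspect {
    fn shape(&self) -> Shape<'_>;
}

trait ShapeHasher {
    fn consume(&mut self, bytes: &[u8]);

    fn hash_value(&mut self, v: &dyn Introspect) {
        match v.shape() {
            Shape::Bytes(b) => self.consume(b),
            // The hasher, not the value, drives the recursion.
            Shape::Composite(fields) => {
                for f in fields {
                    self.hash_value(f);
                }
            }
        }
    }
}

impl Introspect for u32 {
    fn shape(&self) -> Shape<'_> {
        // Leaf: four padding-free bytes the hasher may consume directly.
        Shape::Bytes(unsafe {
            std::slice::from_raw_parts((self as *const u32).cast(), std::mem::size_of::<u32>())
        })
    }
}
```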
I feel like the idea here, when mapped to roc, would be to have the hasher use the hash spec to create a bespoke lambda that will hash in the correct block size. It would aggregate the input into blocks and be smart enough to aggregate many list elements into a single block instead of hashing each of them separately.
so instead of thinking from smallest unit to biggest, it would be using a spec to think about the whole picture at once.
Still not sure how it would work in practice, but worth messing around with the idea.
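Roughly, in Rust pseudocode (the spec shape and all names are invented): the spec gets inspected once, and the returned closure is what actually runs per hash:

```rust
/// Hypothetical per-type spec that a derive/compiler pass could produce.
enum HashSpec {
    /// Elements are plain bytes: hash the whole list as one block.
    PodList,
    /// Elements need per-element handling (pointers, padding, ...).
    ElementWise { elem_size: usize },
}

/// Inspect the spec once and return a bespoke hashing closure.
fn compile_list_hasher(spec: HashSpec) -> Box<dyn Fn(&[u8]) -> u64> {
    match spec {
        // Fast path: one call over the contiguous element bytes.
        HashSpec::PodList => Box::new(|raw| fnv1a(raw)),
        // Slow path: fall back to element-by-element hashing.
        HashSpec::ElementWise { elem_size } => Box::new(move |raw| {
            raw.chunks(elem_size)
                .fold(0u64, |acc, chunk| acc.wrapping_mul(0x100000001b3) ^ fnv1a(chunk))
        }),
    }
}

/// Stand-in byte hasher (FNV-1a) to keep the sketch self-contained.
fn fnv1a(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0xcbf29ce484222325u64, |h, &b| {
        (h ^ b as u64).wrapping_mul(0x100000001b3)
    })
}
```

Then something like Dict would call `compile_list_hasher` once at construction and reuse the boxed closure for every insert/lookup.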
agreed!
we'll have a perfect opportunity to do that with static dispatch
Also, adding a small-list optimization for List U8 would probably help a lot here. Then we could just convert any numeric primitive straight to bytes without an allocation
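e.g. in Rust terms, no heap traffic at all:

```rust
use std::hash::Hasher;

// No allocation: the primitive becomes a small stack array of bytes.
// Fixing the byte order here is also what would make the hash portable.
fn write_u64<H: Hasher>(x: u64, state: &mut H) {
    state.write(&x.to_le_bytes());
}
```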
my biggest concern is that the logic to come up with the correct hashing would be quite complex and expensive. I guess if the logic generates a lambda, it could be run once and stored in a dict object, for example.
I've been thinking about this more. I feel like for the general case of hashing a list of things, the best bet would be to hash it as an array of bytes, but with a bit of logic that automatically zeroes out all padding bytes when loading a chunk of the list.
The two main pain points are:
- elements that contain pointers (nested lists, strings, etc.), which can't be viewed as raw bytes
- user-defined hash implementations, which don't fit the "view everything as one byte array" model
But in the general case, this probably would be way faster.
If user-defined hashes were only allowed to decide which extra bytes to consider padding (basically ignoring some fields), they would also work in this form. But more complex user-defined padding would still be problematic
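Here's a tiny Rust sketch of the chunk-zeroing idea (the element layout and mask are invented; the real mask would come from the compiler's knowledge of the type's layout):

```rust
/// Invented example element: #[repr(C)] struct { a: u8, /* 3 pad */ b: u32 }
const ELEM_SIZE: usize = 8;

/// 0xff marks real data, 0x00 marks padding; derivable per type at
/// compile time.
const PAD_MASK: [u8; ELEM_SIZE] = [0xff, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff];

/// Load one element's bytes with padding forced to zero, so a whole list
/// can be streamed to a byte hasher without the result depending on
/// whatever garbage happens to sit in the padding.
fn load_zeroed(elem: &[u8; ELEM_SIZE]) -> [u8; ELEM_SIZE] {
    let mut out = [0u8; ELEM_SIZE];
    for i in 0..ELEM_SIZE {
        out[i] = elem[i] & PAD_MASK[i];
    }
    out
}
```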