Stream: beginners

Topic: Hash builtin


view this post on Zulip Luke Boswell (Dec 08 2022 at 07:30):

Is there a way to use the Hash ability to make a function like hashMyList : List U8 -> U64? I've been looking a the test_gen examples but haven't managed to figure it out. I was thinking of using this as an index for a Dict U64 SomeOtherType.

view this post on Zulip Brendan Hansknecht (Dec 08 2022 at 07:40):

You want the hash as the key and not the list of bytes?

view this post on Zulip Luke Boswell (Dec 08 2022 at 07:41):

Can I just use the list of bytes as the key?

view this post on Zulip Brendan Hansknecht (Dec 08 2022 at 07:41):

That should work.

view this post on Zulip Luke Boswell (Dec 08 2022 at 07:41):

Is that efficient thought? Like what about if I am doing lots of isEq's everywhere?

view this post on Zulip Brendan Hansknecht (Dec 08 2022 at 07:42):

Lists of hashable things should be hashable. Though i guess i am not 100% sure that the derive will work correctly (definitely can make an opaque type and implement the hash ability manually though)

view this post on Zulip Brendan Hansknecht (Dec 08 2022 at 07:43):

Like outside of the dictionary you are doing is Eq of the lists of bytes? I am not sure I fully understand the question

view this post on Zulip Brendan Hansknecht (Dec 08 2022 at 07:46):

Also, to answer the general question on hashing: you can write a hash function for a type that dict will use. So you could define how to hash some opaque type and dict would use it. That can be done in a generic way over the hasher implementation.

As for hashing some arbitrary value, that is not exposed in roc anywhere. The dict hasher is intentionally not exposed to avoid end users depending on it (we want to be able to upgrade it). So currently to get a hash in roc, you would have to implement a hasher (or copy the one from dict and make that into a library)

view this post on Zulip Kevin Gillette (Dec 09 2022 at 05:05):

To ensure users don't depend on it, we could just have the platform provide a seed to Roc at program init which can be used as a something like a salt or shift for any dict hashing, such that the same process space will always yield consistent results for the same values, but across different runs with the same binary (or different binaries), you'll get different results. That would still leave us with legitimate referential transparency, yet immediately dissuade anyone from depending on particular values, because, unless they never change their code and never run it as more than a single immortal process (in which case the concern is moot), they'll never have the opportunity to reproduce observed values.

Additionally (or instead), we could expose an opaque type to represent hashes, with a function to wrap a U64 directly into an opaque hash (for those who know what they're doing), but no function to get a printable value back out, thus preventing anyone from observing the particulars of builtin dict hashing. We would also need functions to perform various operations on hashes, such as merging two hashes together, but frankly those would be useful anyways: consider hashing your own singly-linked list data structure (e.g. conses): it'd be far more convenient if you could just recursively ask the Roc stdlib to produce an opaque hash for each node value and, in turn, merge it with the previous accumulated/avalanched hashed node values, much as you might keep feeding bytes into a running sha hash.

Asking developers to come up with their own hash algorithms merely because we're worried about exposing details (which again, we can work around) seems like it's quite error prone, because almost everyone will come up with pretty bad hash values if left to their own devices. Instead, providing a system whereby we allow people who know what they're doing to have full control, while providing convenient hash combinators for everyone else, seems like a win.

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 05:34):

Oh, the issue technically isn't with dict anymore. No matter the hashing algorithm, dict will act the same for end users. And users are able to add the hash ability to any type. So any type can be used with a dict.

The issue is if someone wants to hash a value and use the generated hash for something else. If we expose our hash algorithm and someone depends on it, we no longer can change it.

So the current idea is that hasher implementations will be provided by userland libraries. Someone just has to implement the ability. The first userland library will likely be a wyhash implementation made by copying what we currently have in the standard. Of course this requires a package system before it is convenient, but not a big deal to the end user once we have that.

Theoretically, we could expose some in the standard, but if we do so, they will be separate from the dict hashing algorithm. That way the dict hashing algorithm can be whatever we need. Even dynamic based on platform or data size. That's all.

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 05:42):

Oh, extra note, we have a hasher ability that might be worth looking at. This is the generic API for hashing we currently expose:

https://github.com/roc-lang/roc/blob/74b3d14277a67388813531faad0a89649e5ad964/crates/compiler/builtins/roc/Hash.roc#L36

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 05:45):

It doesn't currently expose a combine, but that is because the hasher keeps the state and merges it automatically.

view this post on Zulip Kevin Gillette (Dec 09 2022 at 09:00):

Brendan Hansknecht said:

The issue is if someone wants to hash a value and use the generated hash for something else. If we expose our hash algorithm and someone depends on it, we no longer can change it.

Indeed, that's how I understood your meaning. That was my point about an opaque type: you can put values in, but you can't get them back out. You can use the stdlib to manipulate them and pass them into Dict or Set, but can't use or depend on their particular values in any way that would permit you to send them outside of the process.

The tradeoffs/usability for non-stdlib hashed data-structures would be an open question, but there are potentially some enforcement measures we could take in theory (allow the opaque type to get unpacked, but don't allow that to happen in a package that imports the platform, and don't allow libraries to store or expose the hash or derived values across the package boundary. Such could be achieved with flow analysis or a provenance system (such as a "secret" bit set in much the same way as we handle ref counters).


Last updated: Jul 05 2025 at 12:14 UTC