hash flooding attacks · ideas · Zulip Chat Archive

I just had an idea about hash flooding attacks - what if we encouraged decoding HTTP headers in servers by using the Decode ability to directly parse only the headers you care about?

Richard Feldman (Sep 15 2022 at 05:22):

in other words, rather than doing what most languages do (parsing the header bytes into a dictionary, and then letting you access that dictionary and do further decoding kinda like you do with environment variables, where you parse individual header strings into numbers/booleans/etc as needed), the server provides some function along the lines of:

headers : Request -> val | val has Decode

Richard Feldman (Sep 15 2022 at 05:24):

and it parses the raw header bytes directly into whatever data structure you want - e.g. let's say I have a record of type { contentLength : U64, contentType : Str }, the decoder could translate camelCase record field names into Capitalized-Kebab-Case, and just parse out exactly the headers I'm looking for in a single pass through the bytes

Richard Feldman (Sep 15 2022 at 05:25):

Richard Feldman (Sep 15 2022 at 05:27):

alternatively, if you need to access all the headers (because for whatever reason you want to iterate over them) you can always ask for them as a plain List, which is both the most efficient way to do that and also not vulnerable to hash flooding either

Brendan Hansknecht (Sep 15 2022 at 06:15):

How does the decode ability work with ignoring data? Also, how does decode work with scanning once while ignoring field ordering? Just haven't thought about those depths of it

Brian Hicks (Sep 15 2022 at 10:35):

I really like the idea extemporalgenome shared on GH: parse the common ones into a struct (constant time lookups, yay!) and hold the rest in some other data structure.

That said, I don't see how the "other data structure" with this signature can be something other than either an array of bytes or a linked list, both of which are gonna exhibit the perf characteristics that we wanted to avoid. (Unless the initial scan pass just found byte offsets? How would this actually work?)

Richard Feldman (Sep 15 2022 at 11:01):

yeah and the hashing required just to do the insertions alone can be enough for the attack to succeed

Brendan Hansknecht (Sep 15 2022 at 14:36):

Due to deciding on index map as the base of our dict, the hashing alg doesn't matter, correct? As such, to stop has flooding, we just need to seed the alg with random data when creating a dict and then it becomes super hard to flood due to being unique. Flooding requires knowing the target. Also, setting practically limits on where to stop loading headers should help.

Brendan Hansknecht (Sep 15 2022 at 14:36):

Richard Feldman (Sep 15 2022 at 15:43):

that's a possibility, but seeding the algorithm with random data slows down the performance of all hashmaps - e.g. SipHash is a lot slower than Wyhash pretty much across the board

Richard Feldman (Sep 15 2022 at 15:44):

a lot of languages pay that cost, but I'd rather not make all hashmaps slow just for the sake of this one use case (HTTP headers in web servers) if we can avoid it, especially when it seems that hash flooding can be mitigated in server-specific code without affecting all other users of the stdlib!

Brendan Hansknecht (Sep 15 2022 at 17:32):

Brendan Hansknecht (Sep 15 2022 at 17:33):

Brendan Hansknecht (Sep 15 2022 at 17:34):

Since hashing tends to be the fast part and the lookup is slow, this should have minimal costs

Richard Feldman (Sep 15 2022 at 18:00):

Richard Feldman (Sep 15 2022 at 18:01):

I'm still overall skeptical that it's even necessary, but good to know it's an option!

Richard Feldman (Sep 17 2022 at 04:00):

Richard Feldman (Sep 17 2022 at 04:02):

the linked writeup (which is super detailed and includes examples of how to test a server for hash flooding susceptibility!) includes one of the mitigations we talked about early on here: don't alter the hash function itself, but rather have the collision resolution use something with a nonlinear traversal time, e.g. another hash map with a different hashing algorithm

Richard Feldman (Sep 17 2022 at 04:04):

that reason for rejection sounds disappointing to me; the original talk introducing hash flooding explicitly cites this category of mitigation (improving the asymptotics of collision resolution) as an effective defense, and its seems like their testing harness is a pretty effective way of testing out whether the attack is still reproducible!

Richard Feldman (Sep 17 2022 at 04:05):

JSON dictionaries are perhaps a stronger case than HTTP headers for having the stdlib Dict be resistant to hash flooding

Richard Feldman (Sep 17 2022 at 04:05):

because certainly decoding JSON into a Dict will be something people will want to do, and often that JSON will be coming from an untrusted input (e.g. HTTP request)

Richard Feldman (Sep 17 2022 at 04:07):

and in that scenario, it seems like the only way someone could mitigate this (assuming Dict itself didn't) would be to cap the input size to a small enough number of bytes that executing the attack would be infeasible

Richard Feldman (Sep 17 2022 at 04:07):

but of course that limitation might not be compatible with their use case; it might be important that they allow untrusted inputs large enough to perform a successful hash flood attack

Richard Feldman (Sep 17 2022 at 04:11):

Richard Feldman (Sep 17 2022 at 04:13):

with a careful enough design of the fallback mechanism, it seems plausible that it would never affect performance in any noticeable way if an attack is not happening, while still being able to effectively mitigate an attack

Richard Feldman (Sep 17 2022 at 04:15):

relatedly, I'm not sure that the fallback map would need to use randomness; if it used a different hashing algorithm from the main map, and especially if it used one that was good at minimizing collisions (at the expense of hashing speed, say), it seems like it could potentially be infeasible for an attacker to find a set of keys that would actually collide on both hashing algorithms in practice

Richard Feldman (Sep 17 2022 at 04:18):

but if it needed to, as Brendan noted earlier, it would always be possible to do that; after the 4 "normal" collisions (or whatever number), it could generate a random seed on the fly and store that on the heap for future collision resolution. I'd like to avoid that if possible though, because we'd need to introduce another primitive for all hosts to implement (e.g. roc_random or something, similar to roc_alloc) unless we maybe used something like heap pointers as seeds - which of course would vary by Dict, but wouldn't be terribly random

Brendan Hansknecht (Sep 17 2022 at 05:21):

I am very confused, how does the second hash map "strictly improve performance". Haven't read the write up yet, but that just does not sound possible.

EDIT: after reading the article, i think they mean strictly improve performance in the case of a hash collision attack, I do no think they mean improve performance in the general case.

Brendan Hansknecht (Sep 17 2022 at 05:52):

Also, if we are already using indexmap because of ordering, salting would not affect the roc app at all. It would not be visible in roc user space at all. This means that the more complex solution of multiple hash maps shouldn't be needed in our case.

Also, depending on the goal, it could be a per compile random value instead of a new random value (that would avoid host additions at the cost of some security).

Long term, I think that host functions of this sort are of little cost because they should be possible to make into a library in the host language and shared. Even without a library, it is likely about 5 lines of code.

Brendan Hansknecht (Sep 17 2022 at 05:58):

I don't think you can have a slow hashing algorithm for the second hashing algorithm. I am pretty sure every miss in the first map needs to check the second map. As such, if the second hashing function is slow, lookup time for items not in the map would be very slow.

Richard Feldman (Sep 17 2022 at 20:34):

Brendan Hansknecht (Mar 04 2023 at 05:22):

Brendan Hansknecht (Mar 04 2023 at 05:24):

Brendan Hansknecht (Mar 04 2023 at 05:26):

U128 has the expected exponential time cost when has flooding. Every time we insert double the number of elements, it takes 4 times longer to perform the operation.

Brendan Hansknecht (Mar 04 2023 at 05:28):

For Str, since we have to allocate and copying into a List in order to get at the bytes, hashing takes a ton of time. As such, the cost of hashing overshadows the cost of the probing for inserts. So until we have seemless slices, I guess hash flooding doesn't make Str hashing much worse in Roc. Of course with enough elements, eventually hash flooding would effect strings as well.

Stream: ideas

Topic: hash flooding attacks

Richard Feldman (Sep 15 2022 at 05:21):

Richard Feldman (Sep 15 2022 at 05:22):

Richard Feldman (Sep 15 2022 at 05:24):

Richard Feldman (Sep 15 2022 at 05:25):

Richard Feldman (Sep 15 2022 at 05:27):

Brendan Hansknecht (Sep 15 2022 at 06:15):

Brian Hicks (Sep 15 2022 at 10:35):

Richard Feldman (Sep 15 2022 at 11:01):

Brendan Hansknecht (Sep 15 2022 at 14:36):

Brendan Hansknecht (Sep 15 2022 at 14:36):

Richard Feldman (Sep 15 2022 at 15:43):

Richard Feldman (Sep 15 2022 at 15:44):

Brendan Hansknecht (Sep 15 2022 at 17:32):

Brendan Hansknecht (Sep 15 2022 at 17:33):

Brendan Hansknecht (Sep 15 2022 at 17:33):

Brendan Hansknecht (Sep 15 2022 at 17:34):

Richard Feldman (Sep 15 2022 at 18:00):

Richard Feldman (Sep 15 2022 at 18:01):

Richard Feldman (Sep 17 2022 at 04:00):

Richard Feldman (Sep 17 2022 at 04:02):

Richard Feldman (Sep 17 2022 at 04:04):

Richard Feldman (Sep 17 2022 at 04:05):

Richard Feldman (Sep 17 2022 at 04:05):

Richard Feldman (Sep 17 2022 at 04:07):

Richard Feldman (Sep 17 2022 at 04:07):

Richard Feldman (Sep 17 2022 at 04:11):

Richard Feldman (Sep 17 2022 at 04:13):

Richard Feldman (Sep 17 2022 at 04:15):

Richard Feldman (Sep 17 2022 at 04:18):

Brendan Hansknecht (Sep 17 2022 at 05:21):

Brendan Hansknecht (Sep 17 2022 at 05:52):

Brendan Hansknecht (Sep 17 2022 at 05:58):

Richard Feldman (Sep 17 2022 at 20:34):

Brendan Hansknecht (Mar 04 2023 at 05:22):

Brendan Hansknecht (Mar 04 2023 at 05:22):

Brendan Hansknecht (Mar 04 2023 at 05:24):

Brendan Hansknecht (Mar 04 2023 at 05:26):

Brendan Hansknecht (Mar 04 2023 at 05:28):