two stage lookup · beginners · Zulip Chat Archive

Stream: beginners

Topic: two stage lookup

Luke Boswell (Oct 27 2023 at 11:05):

Reading about the two-stage lookup tables for Unicode properties, and the implementation notes suggest the following is an optimal method.

I've been scratching my head thinking about how to do this in Roc, and I'm not quite sure I really understand how this works.

My understanding is that instead of just having a list that is Num.MaxU32 in length and then doing a getProperty = \u32 -> List.get u32, you could break this into two tables which are smaller, and then do a lookup on the first table to get the offset into the second table.

I'm probably getting tired, but how does this work? Is anyone able to provide a simple example of this?

I'm reasonably confident I can generate the tables using a script, and I feel like I could sample the desired values by looping through all possible values, but I think I'm missing a key ingredient here.

unsigned get_property(unsigned ch)
{
    const unsigned BLOCK_SIZE = 256;
    unsigned block_offset = stage1[ch / BLOCK_SIZE] * BLOCK_SIZE;
    return stage2[block_offset + ch % BLOCK_SIZE];
}

Joshua Warner (Oct 28 2023 at 02:56):

IIRC a key detail here is that some (most?) of the blocks will actually be identical, and thus can be duplicated. That's the primary advantage of this method over just doing the large table look up.

Joshua Warner (Oct 28 2023 at 02:57):

I saw a blog post a while ago about someone building a better compression method for these tables. I couldn't find it tho.

Last updated: Aug 17 2025 at 12:14 UTC