Stream: beginners

Topic: Max expression size


Luke Boswell (Oct 17 2023 at 03:50):

Is there a limit on the size of an expression? For example, what if I write a function that includes an expression with thousands of boolean and arithmetic operations?

I'm looking to write a function List U32 -> List [A,B,C,D] that is efficient, and I figured branchless would give a fixed number of calculations per U32. The use case is mapping Unicode code points into grapheme cluster break classes.

The other idea was to create a top-level Dict U32 [A,B,C,...], but I'm not sure whether that would be more efficient.

I can write a benchmark, but thought I might ask here first in case there is an obvious solution to reach for.

Brendan Hansknecht (Oct 17 2023 at 03:55):

Like a when expression?

Brendan Hansknecht (Oct 17 2023 at 03:55):

I just need something more concrete to understand.

Brendan Hansknecht (Oct 17 2023 at 03:59):

Or a doc with some technical info on what you want to do would be good

Luke Boswell (Oct 17 2023 at 04:04):

Yeah, I could explain that better. I mean a huge sequence of comparisons, like isControlClass = (u32 >= 0x0 && u32 <= 0x9) || (u32 >= 0xB && u32 <= 0xC) || (u32 == 0x61C) || ...

Luke Boswell (Oct 17 2023 at 04:07):

I might be mixing operators there... but I mean boolean.
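The kind of expression described above can be sketched as follows. This is written in Rust rather than Roc, purely as an illustration. The names `is_control_class` and `classify_all` are hypothetical, and only the three ranges quoted in this thread are included; the real Control class has many more.

```rust
/// Returns true if `cp` falls in one of the Control ranges quoted in the
/// thread. Deliberately truncated: the real property lists far more ranges.
pub fn is_control_class(cp: u32) -> bool {
    // `|` on bools does not short-circuit, so every comparison is evaluated.
    // That leaves the compiler free to emit branch-free code, although it is
    // not guaranteed to do so.
    (cp <= 0x9) | (0xB..=0xC).contains(&cp) | (cp == 0x61C)
}

/// Applies the predicate to every code point in a slice (the List U32 case).
pub fn classify_all(code_points: &[u32]) -> Vec<bool> {
    code_points.iter().map(|&cp| is_control_class(cp)).collect()
}
```

With this sketch, `classify_all(&[0x9, 0xA, 0x61C])` returns true, false, true, since 0xA is not in any of the quoted ranges.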

Luke Boswell (Oct 17 2023 at 04:11):

I have wondered if it would be efficient to use some kind of lookup table, but that seems like a lot of memory. I've done some research on that and maybe it can be reduced by using multiple lookups in sequence. The goal is U32 -> [A,B,C,...N].

Luke Boswell (Oct 17 2023 at 04:13):

I can imagine creating the lookup tables by sampling every possible value from 0x0 to 0x10FFFF, and then generating the equivalent Roc List [A,B,C,...].
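One way the "multiple lookups in sequence" idea could look is a two-stage table, built by sampling every code point as described above. The following is a minimal sketch in Rust rather than Roc. `TwoStageTable`, the 256-entry block size, and the `u8` class encoding are all assumptions made for illustration, not something taken from the thread.

```rust
use std::collections::HashMap;

const BLOCK_SIZE: usize = 256; // assumed block size, chosen for illustration
const MAX_CODE_POINT: u32 = 0x10FFFF;

pub struct TwoStageTable {
    /// Maps a block number (code point / 256) to an index into `blocks`.
    stage1: Vec<u16>,
    /// Deduplicated 256-entry blocks of class values.
    blocks: Vec<[u8; BLOCK_SIZE]>,
}

impl TwoStageTable {
    /// Samples `classify` for every code point and stores identical blocks once.
    pub fn build(classify: impl Fn(u32) -> u8) -> Self {
        let mut stage1 = Vec::new();
        let mut blocks: Vec<[u8; BLOCK_SIZE]> = Vec::new();
        let mut seen: HashMap<[u8; BLOCK_SIZE], u16> = HashMap::new();
        let block_count = (MAX_CODE_POINT as usize + 1) / BLOCK_SIZE;
        for b in 0..block_count {
            let mut block = [0u8; BLOCK_SIZE];
            for (i, slot) in block.iter_mut().enumerate() {
                *slot = classify((b * BLOCK_SIZE + i) as u32);
            }
            // Reuse the index of an identical block if one was already stored.
            let idx = *seen.entry(block).or_insert_with(|| {
                blocks.push(block);
                (blocks.len() - 1) as u16
            });
            stage1.push(idx);
        }
        TwoStageTable { stage1, blocks }
    }

    /// Two array reads per code point. Panics if `cp` exceeds 0x10FFFF.
    pub fn lookup(&self, cp: u32) -> u8 {
        let block = self.stage1[cp as usize / BLOCK_SIZE] as usize;
        self.blocks[block][cp as usize % BLOCK_SIZE]
    }

    /// Number of distinct blocks actually stored.
    pub fn unique_blocks(&self) -> usize {
        self.blocks.len()
    }
}
```

A flat table with one byte per code point would take 1,114,112 bytes. In this layout the first stage takes 4,352 two-byte entries (8,704 bytes), plus 256 bytes per distinct block, so the saving depends on how many blocks turn out to be identical for the real property data.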

Luke Boswell (Oct 17 2023 at 04:19):

https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/GraphemeBreakProperty.txt

Luke Boswell (Oct 17 2023 at 04:20):

This doc maps the Unicode code points to a grapheme cluster break property. The rules for when to split up into extended grapheme clusters are determined using these property values.

Brendan Hansknecht (Oct 17 2023 at 04:33):

Ah yeah, I would do the giant Boolean expression and not waste the memory on something that big

Brendan Hansknecht (Oct 17 2023 at 04:33):

That said, if it can be represented as a chain of very small lookups or a few medium lookups, that could be faster.

Brendan Hansknecht (Oct 17 2023 at 04:34):

Depends mostly on the size of the lookup and what it optimizes to

Brendan Hansknecht (Oct 17 2023 at 04:34):

The Boolean expression should optimize just fine

Brendan Hansknecht (Oct 17 2023 at 04:35):

Whether it is faster than the lookup depends mostly on whether the lookup will be in cache and how many instructions the expression becomes


Last updated: Jul 05 2025 at 12:14 UTC