Stream: ideas

Topic: maximum number of tags in a union


view this post on Zulip Richard Feldman (Apr 22 2022 at 12:23):

what if the maximum number of tags in a tag union was 256?

view this post on Zulip Richard Feldman (Apr 22 2022 at 12:24):

can anyone think of a real-world use case where that would cause a problem?

view this post on Zulip Richard Feldman (Apr 22 2022 at 12:24):

e.g. you could have [ A, B, C, ... ] up to 256 of those, but not 257+ in the same union

view this post on Zulip Richard Feldman (Apr 22 2022 at 12:25):

of course you could nest them, so if you really needed more than 256 alternatives, you could break one of them down like [ A, B, C, ..., Other [ A, B, C, ... ] ]

view this post on Zulip Richard Feldman (Apr 22 2022 at 12:26):

the context is that I'm working on some serialization/deserialization APIs, and if tags can always be represented as 1 byte in memory, then we always know how to serialize and deserialize them; if they're potentially bigger than that, we need to store more information (e.g. the size) in the serialization, and serialization gets more complicated and slower
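
To make the trade-off concrete, here is a minimal Python sketch (not Roc's actual serialization API; the function names and the one-byte width prefix are assumptions made for illustration). It contrasts a tag that is always one byte with a tag whose width has to be recorded next to it.

```python
import struct


def encode_fixed(tag_id: int) -> bytes:
    """Tag always fits in one byte, so no width information is stored."""
    # struct.pack("B", ...) raises struct.error for values above 255.
    return struct.pack("B", tag_id)


def decode_fixed(data: bytes) -> int:
    """Read back a one-byte tag."""
    return data[0]


def encode_variable(tag_id: int) -> bytes:
    """Tag may be 1 or 2 bytes wide, so the width is written first (assumed format)."""
    width = 1 if tag_id < 256 else 2
    return bytes([width]) + tag_id.to_bytes(width, "little")


def decode_variable(data: bytes) -> int:
    """Read the width byte, then that many bytes of tag."""
    width = data[0]
    return int.from_bytes(data[1:1 + width], "little")
```

In this sketch the variable-width form costs an extra byte per tag and an extra branch on every decode, which is the kind of overhead the message above describes.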

view this post on Zulip Richard Feldman (Apr 22 2022 at 12:28):

so there's a concrete runtime performance downside to automatically upgrading to U16 if we go over 256 tags, and I'm trying to see if there's a real-world downside anyone's aware of that limiting to 256 at compile time would cause

view this post on Zulip Maxime Tremblay (Apr 22 2022 at 12:39):

Total Roc beginner here, but in Rust and Haskell, most of my enums contain 2 to 5 variants. The only time I use more than that is when building parsers, and even then I never get close to 256. And, as you mention, you can always do [Part1 [A1, B1, C1 ...], Part2 [A2, B2, C2 ... ], ...]

view this post on Zulip Folkert de Vries (Apr 22 2022 at 12:43):

I remember that in Clean there is a module containing all HTML tags as an ADT

view this post on Zulip Folkert de Vries (Apr 22 2022 at 12:44):

I believe that contained more than 256 members

view this post on Zulip Folkert de Vries (Apr 22 2022 at 12:44):

but... I don't consider that to be good code

view this post on Zulip Richard Feldman (Apr 22 2022 at 13:03):

yeah plus you can have custom elements with arbitrary strings for HTML tag names, so there's no real way to enumerate all of them

view this post on Zulip Martin Stewart (Apr 22 2022 at 13:25):

At my previous job we had an Elm custom type listing every sport recognized by the Swedish government (for insurance purposes). It had 305 variants.

view this post on Zulip Richard Feldman (Apr 22 2022 at 13:28):

hmm, how annoying would it have been to nest them somewhat?

view this post on Zulip Martin Stewart (Apr 22 2022 at 13:33):

Probably not so bad since it would still be possible to treat it like a flat list by always writing

allSports =
    [ SportGroup1 Swimming
    , SportGroup1 ...
    , SportGroup2 Tennis
    , SportGroup2 ...
    ]

toString sport =
    when sport is
        SportGroup1 Swimming -> "Swimming"
        SportGroup1 ... -> ...
        SportGroup2 Tennis -> ...
        SportGroup2 ... -> ...
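
As a rough illustration of why nesting keeps every level within one byte, here is a small Python sketch. It assumes a flat numbering of the variants and a limit of 256 tags per union; `GROUP_SIZE` and the function names are hypothetical and not part of any Roc or Elm API.

```python
GROUP_SIZE = 256  # assumed per-union limit from this thread


def to_nested(flat_index: int):
    """Split a flat variant index into (group, index within that group)."""
    return divmod(flat_index, GROUP_SIZE)


def from_nested(group: int, inner: int) -> int:
    """Rebuild the flat variant index from the nested pair."""
    return group * GROUP_SIZE + inner


def encode_nested(flat_index: int) -> bytes:
    """One byte for the outer tag, one byte for the inner tag."""
    group, inner = to_nested(flat_index)
    # bytes([...]) raises ValueError if either value does not fit in a byte.
    return bytes([group, inner])
```

With 305 variants, indices 0 to 255 land in group 0 and indices 256 to 304 land in group 1, so both the outer and inner tags stay below 256.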

view this post on Zulip Martin Stewart (Apr 22 2022 at 13:34):

Worth mentioning, I wrote a serialization package for Elm. In my case, I chickened out and used a UInt16 instead :sweat_smile:

view this post on Zulip Fabian Hoffmann (Apr 22 2022 at 14:06):

Martin showed it is easy to work around a 256 limitation. The question is, what would happen if I tried to use more? I would be fine with a nice error message that explains the nesting solution.

view this post on Zulip Folkert de Vries (Apr 22 2022 at 14:28):

hmm, we can only know the actual max size of a tag union during monomorphization

view this post on Zulip Kevin Gillette (Apr 22 2022 at 14:31):

Since Roc has implicit tag unions, that would suggest to me that languages with non-implicit unions/enums aren't fully representative of what this might look like here.

Since error handling is the main case I'm aware of in which tags can accumulate implicitly, would that provide us with a fair idea of whether 256 is enough?

I can _imagine_ a non-trivial imperative system having more than 256 ways in which something can fail, when you consider perhaps, an HTTP API service with various kinds of input validation, data-layer validation (before writes and after reads), conflict resolution, many types of network-related failures, and then the possibility that each layer will contextualize some low level failures into high level terms (instead of NetWriteFailure, it might be UserCreateFailure, with NetWriteFailure still showing up in other places as well).

That said, I don't know if an explosion of error tags will happen to Roc since it can't directly perform world-interacting side effects, thus must handle its existing error tags, in practice, before asking the platform to cause more error-producing side effects.

view this post on Zulip Kevin Gillette (Apr 22 2022 at 14:39):

In any case, if 256 ends up being too limited, they'll open project issues. If the majority of unions have fewer than 256 members, we could signal by setting the high bit to 1 to indicate a special case (which gives us 7 other signal bits as well). Alternatively, we could reserve value 0 or 255 for the special case. The special case could mean "consult other, wider memory in a known offset/address for these tags"
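
The reserved-value idea could be sketched roughly as follows in Python. This is a simplification under stated assumptions: 255 is chosen as the escape value, and the wider tag is stored inline right after the escape byte rather than at a separate known offset. `ESCAPE`, `encode_tag`, and `decode_tag` are hypothetical names.

```python
from typing import Tuple

ESCAPE = 0xFF  # assumed reserved value meaning "a wider tag follows"


def encode_tag(tag_id: int) -> bytes:
    """Common case: one byte. Rare case: escape byte plus a 16-bit tag."""
    if tag_id < 0 or tag_id > 0xFFFF:
        raise ValueError("tag id out of range for this sketch")
    if tag_id < ESCAPE:
        return bytes([tag_id])
    return bytes([ESCAPE]) + tag_id.to_bytes(2, "little")


def decode_tag(data: bytes) -> Tuple[int, int]:
    """Return (tag_id, number of bytes consumed)."""
    if data[0] != ESCAPE:
        return data[0], 1
    return int.from_bytes(data[1:3], "little"), 3
```

Unions with fewer than 255 tags never pay for the wide form here; only tags at or above the escape value take three bytes.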

view this post on Zulip Kevin Gillette (Apr 22 2022 at 14:45):

Does 1 byte buy us a lot compared to 1 word? Bytes only pack efficiently with other bytes, so I'd imagine that in many contexts on the stack, our 1 byte tag variable would be followed by 3 or 7 bytes of wasted padding.

A list of tags would certainly benefit and pack nicely, especially if they're all non-parameterized tags, but if we only have 12 such tags in a particular case, for example, a packed list of 4-bit tag elements would have twice the memory efficiency.
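
A small Python sketch of the 4-bit packing arithmetic, assuming payload-free tags numbered 0 to 15 and low-nibble-first ordering (both are assumptions for illustration, as are the function names):

```python
def pack_nibbles(tags):
    """Pack tag ids in the range 0..15 two per byte, low nibble first."""
    out = bytearray((len(tags) + 1) // 2)
    for i, tag in enumerate(tags):
        if not 0 <= tag < 16:
            raise ValueError("tag does not fit in 4 bits")
        out[i // 2] |= tag << (4 * (i % 2))
    return bytes(out)


def unpack_nibbles(data, count):
    """Recover `count` tag ids from nibble-packed bytes."""
    return [(data[i // 2] >> (4 * (i % 2))) & 0xF for i in range(count)]
```

For an even number of tags the packed form takes exactly half the bytes of a one-byte-per-tag list; an odd count wastes one nibble in the last byte.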

view this post on Zulip Zeljko Nesic (Apr 22 2022 at 17:02):

I haven't come anywhere near 256 variants; the most I've had, in Elm, was 113.

That being said, I have a concern along the lines of Kevin's: on some "bigger" project, error tags might accumulate really fast. In the end you might end up with an open union that has way more than 256 tags. The problem is that you would have to go back and wrap some of them in another tag, which then defeats the purpose of open tags.

Maybe they should pay a performance penalty if they go over 256, but that is just some arcane stuff at that point.

view this post on Zulip Martin Stewart (Apr 25 2022 at 21:13):

A use case I thought of that would exceed 256 tags in a union.

Suppose someone wanted to define some UI with input fields. Each input field would have an id so that the model could track which input has focus (something like this type Model ids = { selected = Result {} ids }).

For the ids, instead of something wasteful and typo-prone like Str, you'd use tag unions

button : id, msg, Str -> Ui [ id ]* msg
button id onPress text = ...

and then all the ids being used would accumulate in view functions

footer : Ui [ FaqButtonId, TosButtonId, HomepageButtonId ]* Msg
footer =
    Ui.column
       [ button FaqButtonId PressedFaqButton "Faq"
       , button TosButtonId PressedTosButton "Terms of service"
       , button HomepageButtonId PressedHomepageButton "Homepage"
       ]

until the top-level view function, which would have all the ids (potentially many more than 256 of them!)

Maybe I've made a mistake somewhere that makes this impossible to implement. But if it's possible I think it's a reason to make tag unions support more than 256 variants.

view this post on Zulip Richard Feldman (Apr 25 2022 at 21:22):

interesting!

view this post on Zulip Richard Feldman (Apr 25 2022 at 21:23):

I actually recently realized that there's a design that can gracefully upgrade to U16 without causing problems anyway, so I think I can mark this as resolved

view this post on Zulip Richard Feldman (Apr 25 2022 at 21:24):

but it's great to know about use cases like this in case something similar comes up in the future, so thanks for sharing it!

view this post on Zulip Martin Stewart (Oct 26 2023 at 14:40):

Very late reply, I know, but I had another use case* in Elm where we ended up creating an enum with a little over 3000 variants. Still fits in a U16 of course, but I thought it was worth noting.

*It was for enumerating all the possible outcodes for a UK postal code


Last updated: Jun 16 2026 at 16:19 UTC