I have some use cases for a bitset type to represent a set of flags in a highly compact way, and I was wondering if we could have this as a Roc built-in module.
I think it would be cool to declare a bitset type in terms of ordinary tags, but with obvious restrictions, like having only up to 64 tags in a single bitset type, and no payloads allowed. Then each tag could be represented as a bit in an integer, and automatically compile it to the smallest integer that can represent the type.
idk, I’m a big fan of data oriented design, and I know Roc isn’t trying to give you full control over the underlying data representations, but this seems like a nice semi-low-level feature that could be possible in Roc.
Hmmm.... So you want a bitset tag, not a generic bitset?
As in you want it to be a specific size and to have names for each variant
Normally I think of a bitset as a compacted listof bools.
A tag union without payloads under the hood is represented by the smallest unsigned integer that fits.
Is that similar to a bitset in this context?
Could you make one using Nominal types, wrap a U8 and provide helpers that do the bit twiddling?
@Brendan Hansknecht A list of bools that is properly compacted into a single integer is conceptually the same thing as a bitset, although you probably don't want the API to be the same as List. You really want it to look like a Set API.
I'm mostly wanting it to look something like Set [ Tag1, Tag2, Tag3 ], but where the entire Set instance is represented as a single integer (in this case of 3 members it would be a simple U8 under the hood).
@Luke Boswell Right, the tags are represented as integers, but unless I'm able to cast them to actual integer types, they cannot serve as members of a bitset, simply because you need to perform bitwise operations on their underlying integer representations in order to do the Set operations, plus you need to store the Set as a single integer internally.
Right now, here are the ways you could theoretically implement a bitset in Roc:
Write a Set type whose members are integers and hope LLVM (or perhaps the Roc compiler itself) will optimize it into a single integer representation under the hood.
Write a Set type whose members are tags (without payloads), and, just like #1 above, hope the compiler pipeline is smart enough to compile it into the compact integer form. So basically the same as #1 but relying on the underlying integer representation of tags.
Write a nominal type wrapper (with a standard Set API) around an int type, and then pass integers to that API, using aliases to identify the meaning of each integer. This wouldn't involve any tag union types, and is not semantically safe code, since there is no way to statically prevent you from mixing up members from two distinct bitsets.
Do the same as #3, but allow the user to pass in two functions (currently via Abilities, later via static dispatch) that convert tags into integers and vice versa, and then perform the bitset calculations. This adds type safety at the cost of potential runtime overhead (unless the compiler pipeline is smart enough to eliminate those mapping operations). #4 is the way I would currently approach this problem. But it's a bit more tedious than it needs to be, at least IMO.
If we really wanted Roc to have this as a built-in, then we could effectively get #4 without the tedious user-side code or the potential runtime cost.
In terms of compiler design, the way I would approach it is to take the type declaration (e.g. BitSet [ One, Two, Three ]), and restrict the tag union from having any tags with payloads. Then I would map the tags to integers in increments of 1 (in contrast to OCaml's polymorphic variants, which are represented as hashcodes of the variant names). The number of tags in the union would determine the size of the underlying integer (e.g. 28 tags would map to a U32). Then all the BitSet functions would compile to inlined bitwise operations for speed.
And you could actually allow an arbitrarily large number of tags (greater than 64) in a BitSet if you represent it as a tuple of integers and just eat the cost of additional bitwise operations on multiple integers in the tuple. In the end, it could work exactly like a Set [ ...tags ] type, but with a guaranteed compact representation. In fact, you could just implement this entire thing as a special case of the Set type, where a Set of payload-less tag unions is just automatically compiled into the ultra-compact representation.
And hey, I haven't dug into Roc's source code, so maybe something like that is already being done, but I just wanted to float the idea in case it isn't.
Actually, the more I think about it, the more I like the idea of just making this a special case of the built-in Set type. Perhaps a dedicated BitSet type is a bit niche, especially given the high-level nature of the existing built-ins. It's probably an awkward addition to the standard library.
If a Set of a payload-less, non-exensible tag union could be magically transformed into an integer or tuple of integers under the hood, then the user doesn't need to think about it, and they get the performance benefits for free.
Yeah, thanks for the clarity
Just making sure I understood your scope/goals
Currently, we would store at least a byte per tag instead of a bit per tag. I think on average 9 bits per tag in a set. Assuming the tag is less than 256 elemetnts
Yeah, that makes sense. I guess right now a Set of tags would likely provide the most compact representation, since it should just be a pointer to the Set instance, right? And I'm just talking about a data structure like this:
Item : {
name : Str,
data : Data,
flags : Set [ Flag1, Flag2, Flag3, Flag4 ],
}
Obviously you still have to allocate space for at least one byte per tag, but for the sake of cache locality and alignment, that would probably be the best use of space at this time. Unless I'm missing something...
Yeah, current best would be manual but twiddling. Next best is probably a flat list where you manually write a function to convert from tag to index.
Not far behind that would be a set of a tag (more overhead due to hashing and due to metadata), but still flat and not too bad.
More than just a pointer though. Pointer, length, capacity.
Also, in the long term, I would prefer to just automatically specialize for this
But roc doesn't really have support for that...so not sure the best way to have the compiler handle it.
Any set of tags with no payloads is just automatically a bitset. (Maybe with a limitation on tag size)
Gotcha. I guess for now I'll just do the Set of tags for ease of use and optimize it later into the bit-twiddling version if I really need the perf and this optimization isn't already in place.
I would be happy to take a look at the compiler internals to see if there is anything I can contribute there. I just have zero Zig experience and don't have a good mental model for the codebase just yet.
Currently dictionary is implement in pure roc. Given that roc has no type introspection, this actually makes it impossible to specialize here.
I think we want to move it into zig though (to get more perf gains).
I think that makes sense as a longer-term goal. Maybe this should just go in the issue tracker as a feature request so it doesn't get lost. I'll go write an issue soon.
And thanks for the background info, @Brendan Hansknecht
Last updated: Jun 16 2026 at 16:19 UTC