Tags are super useful and elegant and I really like them, but what the hell are they, mathematically speaking? They just seem like magic objects to me, even though I understand how they function in a practical sense. Like what do they compile to? Do they compile to anything at all, or are they just an abstraction that vanishes? What's actually happening when I decide to add a branch to a function that returns a tag?
The tag ID is represented as an integer at runtime.
For each expression in your program that has a tag type, the type checker will figure out the set of all possible tags that can end up there. Then that set of tags gets a runtime representation where each tag is assigned an integer value.
So the same tag name A
could be represented as different numbers in different places, depending on what other tags reach there.
Tags can also have payload values and in that case they form a data structure containing the payload values and the tag ID.
You did ask for a mathematical answer and I realise this is a programming answer!
Still helpful, it gives a skeleton that might help me double check my understanding when trying to understand a more mathy version. I guess part of why I want the maths answer is how neat and tight the rest of the type system is. It's just a highly comprehensible, unified system, and is otherwise conventional, but then you have tags that seem to blur the very lines between variable, name, and value.
They're so bizzare, even as they're so simple in practical use.
https://vimeo.com/653510682 around the 40 minute mark talks about the in-memory representation, kinda aimed at a C/C++ audience - I'm happy to elaborate on any of that!
I think of them as functions. So if you have MyType : [Foo Str]
then Foo
is a function Foo : Str -> [Foo Str]
and when you write Foo "some string"
you're applying Foo
to the argument "some string"
. This isn't what Foo
explicitly is in the language but I think it works conceptually.
It is however explicit in languages like Elm and Haskell. For example in elm:
> type Foo = Foo String
> Foo
<function> : String -> Foo
Here tags are just functions and you can use them in all the same ways. (in haskell they are called value constructors)
Here is the math interpretation:
You can think of tags as forming an ADT/sum type/coproduct (I'll use coproduct). Specifically, for a given tag union type [Zero, Add x y]
, we have the inclusion operations
Now, Roc has an important distinction between "open" and "closed" tag union types. The distinction is that a closed tag union cannot grow - that is, there is no way to transform the closed union type [Zero, Add x y]
to any other union type. However, the open tag union type [Zero, Add x y]*
can grow - that is, there is an operation I can perform to add a new variant to that type, thus making the type larger. We can thus think of some relationship like
If you think of all possible tag unions as sets, you can say adding the tag Sub x y
to [Zero, Add x y]*
is a function that maps [Zero, Add x y]*
to [Zero, Add x y, Sub x y]*
and leaves all other tag unions intact.
If you are familiar with category theory, we can say precisely that there is some category consisting of all open and closed tag unions, where growing an open tag union by one variant is an endofunctor mapping that particular tag union. And, there is a free functor from a open tag union to a closed one, and a forgetful functor the other way, that form a monad over the category
Okay from these replies and some independent research, I think I get how Tags work. Tags without payloads are basically Just Types but with weird rules that mean you can create them on the fly and treat them syntactically like values (if there's something deeper to this, tell me!). Without payloads, the type has only one inhabitant with no properties beyond that, but the with "with payload" version is still a little mysterious to me mathematically. The rules around scope and equality are still a little murky to me, which I think is where a lot of my confusion comes from.
Tags without payloads are enums.
Tags only disappear fully is there is only a single possible tag.
With a payload, a tag is justed an enum and a union of data.
Things get a tiny bit mor complex with recusion
Yeah on a moment-to-moment level I'd understood that's how they worked, I just found it bizzare (if extremely practical) that I could just write one function over here that spits out a Tag, another over there that accepts a Tag, and the two just know that they're talking about the same thing. Like that's not how things normally work, if you defined a type in a function then it wouldn't magically exist elsewhere. I guess it only works because Tags without values have no properties beyond their names and thus having things like that have such a wide scope won't cause things to stop working in one function just because you created a tag elsewhere. The payload is almost the same but opposite, a fungible bag of data unconstrained by its tag.
Oh, that is really more about our type inference than tags specifically
With hindsight I can see that, but it was just so strange because it clearly would always work, but felt somehow like it shouldn't. Like it should somehow cause problems, contradictions. I think the more abstract side is starting to click though.
Like I'm confused by backpassing too, but that doesn't bother me because that's currently so opaque that I know it's just a matter of getting it, whereas Tags felt more mysterious because it felt like there was something that wasn't clicking even though I understood the pragmatic use perfectly.
I think one key point to reconciling this is the fact that almost all types in Roc are structural, rather than nominal (except for opaque types, but we can disregard that here). Unlike in C/Rust/Java where if you declare, for example, and enum in a scope, each variant in the enum is scoped to the “name” of the enum. But in Roc, when enum variants (tags) compose into an enum, the enum does not have a “name” - its type is determined solely by the variants it consists of. That’s why passing a tag A between two functions that declare it separately works - because the types are compared as [A]=[A], which is true - whereas in, for example C++ if you try to compare OneEnum::A=OtherEnum::A, this is false, because the names of the enums are different.
Those are some great searchterms to investigate later. Yeah, the fact that you can be so loose with it, the fact that you don't need a scope or enclosing name, is what's so surprising to me. You can see in the other comments how it slowly dawns on me why this doesn't break things.
Ayaz Hafiz said:
If you are familiar with category theory, we can say precisely that there is some category consisting of all open and closed tag unions, where growing an open tag union by one variant is an endofunctor mapping that particular tag union. And, there is a free functor from a open tag union to a closed one, and a forgetful functor the other way, that form a monad over the category
By the way, I have had this thought at the back of my mind for a while, sorry for resurrecting an old thread :sweat_smile: ... This sounds like something there should be a paper about. Is anyone writing a paper about this? Or is this based on some paper already? Maybe someone should write a paper about it? :grinning_face_with_smiling_eyes:
I don't know of any academic literature about that interpretation exactly, but it's the same as the idea of "pointed sets" (as opposed to regular sets) in math which are well-studied.
Thanks, that gives me something to read about!
Hi, I got a bit nerd sniped by that category theory connection / question and tried to work out some details. I mostly ran into problems with that. I don't think the Tags
subcategory "just works" and would need some work to define what the Objects actually are that is more concrete than "all Roc types". Once that Object set has some mathematical modeling, that will directly include a model of open and closed tag unions.
If you actually do that modelling, you'll probably want to start with a smaller subset of Roc.
Anyway, here is what I wrote about that earlier, without trying to model what the Objects and Arrows look like.
I don't think we can throw categories on this and expect it to be simple.
growing an open tag union by one variant is an endofunctor mapping that particular tag union
Here a functor is a mechanism to turn one union type into another union type. The example functor here is "Adding Sub x y to the union".
[Zero] -> [Zero, Sub x y]
[Zero, Add x y] -> [Zero, Add x y, Sub x y]
[Roc] -> [Roc, Sub x y]
[Zero, Add x y, Sub x y] -> [Zero, Add x y, Sub x y]
Now the last one is a bit weird. It doesn't actually change the type because "Sub x y" is already in there. But it hints at a problem this example functor has:
[Sub Marine] -> TYPE MISMATCH, does not compile.
So you'll need to define each such functor on a separate subset (sub-category) of the "Tags" category.
there is a free functor from a open tag union to a closed one, and a forgetful functor the other way
I just arrived here from a podcast on Roc so I had some reading up to do on open and closed. But I don't think you can forget either way. I failed to construct the forgetful functor.
Here is what I tried:
Given a Roc function f : [Red Str] -> [Red Nat]
(e.g. string length) we can extend it to f*: [Red Str]* -> [Red Nat]*
by defining
f* : [Red Str]* -> [Red Nat]*
f* = \tag ->
when tag is
Red str -> f str
x -> x
So in this instance the functor looks ok.
I don't know if an open tag union is actually allowed in return position in Roc. I assume it is allowed, otherwise we wouldn't have much of a Tags
category to start with.
But let's consider g: [Red] -> [Green Str]
with Red -> Green "Apple"
as definition.
The extension would be defined on g*: [Red]* -> [Green Str]*
- but where does this map Green 7
? It is not allowed to map to Green 7
, because the type [Green Str]*
guarantees that any value tagged as Green
must carry a Str
payload.
I had no success trying to define a functor going the other way either :-(
More on open/closed from the Tutorial
If you have an open union, that means it can accumulate more tags through conditional branches.
If you accept an open union, that means you have to handle the possibility that it has a tag you can't know about.
To me this sounds very much like "an open union you have" and "an open union you receive" are actually two different things that just happen to share a name.
The "open union you receive" seems to be a proper type. It can appear it type signatures.
The "open union you have" seems to be the type of an expression - and it feels a lot more like a "closed union" to me. Because I can pass the value of the expression to functions that require a closed union, I know that only values of type "closed union" can actually be the value of this expression.
Last updated: Jul 06 2025 at 12:14 UTC