nominal types · ideas · Zulip Chat Archive

Stream: ideas

Topic: nominal types

Richard Feldman (Oct 25 2024 at 23:00):

I've been thinking a bunch about https://roc.zulipchat.com/#narrow/channel/395097-compiler-development/topic/Alias.20analysis.20error.20across.20modules/near/438500819 and I think that despite my initial reservations, it's worth talking through what it could look like in Roc to change from "opaque types" to "nominal types" - specifically for tag unions.

Richard Feldman (Oct 25 2024 at 23:03):

for example, here's one sketch of a design idea:

* change := to no longer mean "opaque type" but instead "nominal tag union" - e.g. instead of Email := Str we might do Email := [Wrapper Str] (the usual implements stuff Just Works the same way it does today)
* the tags in this nominal union could be referred to by qualifying them using the type, e.g. in that Email example, I could match on Email.Wrapper str -> ...
* modules can choose to expose these tags, e.g. exposes [Email exposing [Wrapper], ...] - if they're exposed, then other modules can import the Email type and use Email.Wrapper, but otherwise they can't (and the type is opaque)
* you could also bring these tags into scope by using e.g. import Email exposing [Email exposing Wrapper]

Richard Feldman (Oct 25 2024 at 23:05):

one thing this would immediately let us do is to say:

* Bool is now a nominal tag union, defined as Bool := [True, False]
* we automatically import its True and False tags, and expose them
* this means when you put True into the repl, it says True : Bool - and pattern matching etc still works the normal way
* it also means you can't construct a userspace tag named True anymore, or False

Richard Feldman (Oct 25 2024 at 23:07):

this would still let us customize how Bool encoding and decoding works, which was the original motivation for making it opaque

Richard Feldman (Oct 25 2024 at 23:08):

(we could consider doing the same to Result, which has also come up in the past)

Richard Feldman (Oct 25 2024 at 23:11):

going back to https://roc.zulipchat.com/#narrow/channel/395097-compiler-development/topic/Alias.20analysis.20error.20across.20modules/near/438500819 - currently, the best solution to this problem is to disallow recursive tag unions unless they're opaque. However, this does create a problem, which is that it's important that we have a rule that opaque types can't be sent to the host.

Besides the fact that the future replay feature can't work otherwise, on a fundamental level, signaling that a type is Opaque means "I am reserving the right to modify the internal implementation details of this thing, including its structure" - but modifying the internal structure of something you send to the host is undefined behavior; the host relies on statically knowing that type's layout, so if you say "I am going to feel free to change this thing's layout" while also sending it to the host, that would be among the biggest footguns imaginable. :sweat_smile:

Richard Feldman (Oct 25 2024 at 23:12):

and yet, we want to be able to send recursive types to the host!

Richard Feldman (Oct 25 2024 at 23:12):

nominal types would be the best fix for that. It would be a way to say "here is this thing's exact structure, it's not opaque, and yet it is nominal, so that bug can't happen."

Richard Feldman (Oct 25 2024 at 23:13):

anyway, that's the general shape of the idea. I'm curious what others think!

Richard Feldman (Oct 25 2024 at 23:58):

oh yeah I forgot to mention - nominal tag unions would also give us a way to have enumerations where you associate each tag with a number or a string or something like that
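
For comparison, here is roughly what "associate each tag with a number or a string" looks like in Python's enum module. This is only an analogy, not proposed Roc syntax, and the HttpStatus and LogLevel names are invented for illustration:

```python
from enum import Enum


# A named enumeration whose variants are each pinned to a number.
class HttpStatus(Enum):
    OK = 200
    NOT_FOUND = 404


# The same idea with strings as the associated values.
class LogLevel(Enum):
    DEBUG = "debug"
    ERROR = "error"


# Going from a variant to its associated value...
not_found_code = HttpStatus.NOT_FOUND.value

# ...and from a raw value back to the variant it belongs to.
status_from_code = HttpStatus(200)
level_from_text = LogLevel("error")
```

The point of the analogy is that the mapping lives with the type's declaration, which is something an anonymous tag union has no place to hold.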

Richard Feldman (Oct 26 2024 at 00:07):

and similarly, nominal record types could be a way to specify serialization customizations, e.g. "this field should encode and decode as if it had this name instead of its actual name"
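
A small sketch of the "encode and decode under a different name" idea, written in Python because the Roc design doesn't exist yet. The User type, the encode_as metadata key, and the encode/decode helpers are all hypothetical names made up for this example:

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class User:
    # In code the field is user_name; on the wire it should appear as userName.
    user_name: str = field(metadata={"encode_as": "userName"})
    age: int = 0


def wire_name(f):
    # Fall back to the field's own name when no override is declared.
    return f.metadata.get("encode_as", f.name)


def encode(record) -> str:
    return json.dumps({wire_name(f): getattr(record, f.name) for f in fields(record)})


def decode(cls, text: str):
    raw = json.loads(text)
    return cls(**{f.name: raw[wire_name(f)] for f in fields(cls)})


encoded = encode(User("sam", 30))
decoded = decode(User, encoded)
```

Because the override is attached to a named type, every encoder and decoder that handles User sees the same rule.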

Richard Feldman (Oct 26 2024 at 00:08):

there might also be some path for better null handling in json there, not sure

Luke Boswell (Oct 26 2024 at 01:45):

This sounds great :grinning:

Brendan Hansknecht (Oct 26 2024 at 06:31):

What if I want to send a roc dictionary to the host? :grinning_face_with_smiling_eyes: That is opaque but maybe something a host would want. I guess that would lock it to a specific version of roc potentially if we update the dict impl....still might be useful in a handful of cases especially where perf really matters. Though maybe in those cases, it is basic to control the data structures and unwrap them before passing to the host

Brendan Hansknecht (Oct 26 2024 at 06:32):

Otherwise I don't really grok the tradeoffs between nominal and opaque, but nominal tags sound nice in general.

Jasper Woudenberg (Oct 26 2024 at 08:26):

Richard Feldman said:

Besides the fact that the future replay feature can't work otherwise, on a fundamental level, signaling that a type is Opaque means "I am reserving the right to modify the internal implementation details of this thing, including its structure" - but modifying the internal structure of something you send to the host is undefined behavior; the host relies on statically knowing that type's layout, so if you say "I am going to feel free to change this thing's layout" while also sending it to the host, that would be among the biggest footguns imaginable. :sweat_smile:

I don't completely understand this point. I get that making a type opaque means reserving the right to make changes to the internal structure, but who is making that promise to whom? I'm trying and failing to come up with a scenario where this becomes a problem, here's the ones I considered:

If it's the platform author making the promise to application authors, then I don't think it matters. If as a platform author I change the internal implementation of an opaque type, then I can also change the host code to match.

If it's an application author using that promise to abstract one part of the application from another, then I also don't see how it matters. The host cannot statically know application-defined types because the application is written and compiled later. From the host's perspective, all application-defined types might as well be opaque.

The one scenario where I might see a problem is if a platform directly depends on a Roc library and generates glue for the structure of the library's opaque types in host code. Then if the library updates the structure of the opaque type, the host will break. Forbidding sending opaque types to the host would prevent this, but refusing to generate glue for opaque types would too.

Richard Feldman (Oct 26 2024 at 12:40):

Brendan Hansknecht said:

What if I want to send a roc dictionary to the host?

oh yeah, builtins are always fine - the host knows their layout based on the Roc version

Sam Mohr (Oct 26 2024 at 12:48):

Richard Feldman said:

currently, the best solution to this problem is to disallow recursive tag unions unless they're opaque

What about anonymous error unions? For example:

tryFoo = \x ->
    result1 = try x ? FooErr
    result2 = try tryBar 123 ? FooSawBarErr

    "success"

tryBar = \y ->
    result1 = try y ? BarErr
    result2 = try tryFoo "abc" ? BarSawFooErr

    200

FooErrors : [FooErr, FooSawBarErr BarErrors]

BarErrors : [BarErr, BarSawFooErr FooErrors]

Would this be allowed?

Richard Feldman (Oct 26 2024 at 12:52):

those don't look recursive to me, unless I'm missing something, so should be fine

Sam Mohr (Oct 26 2024 at 12:52):

Whoops, naming

Sam Mohr (Oct 26 2024 at 12:53):

The point is that we currently let users "just propagate with try". It seems that without allowing for recursive tag unions, then this wouldn't be allowed, and they'd have to figure out how to propagate mutually recursive errors

Sam Mohr (Oct 26 2024 at 12:54):

Maybe with a try tryFoo "abc" ? BarErrors.BarSawFooErr or equivalent

Sam Mohr (Oct 26 2024 at 12:54):

Which isn't that bad

Richard Feldman (Oct 26 2024 at 12:54):

:thinking: do mutually recursive errors come up in practice? I can't think of a scenario where that would come up

Sam Mohr (Oct 26 2024 at 12:55):

I agree, not a popular usage of tag unions

Richard Feldman (Oct 26 2024 at 12:59):

Jasper Woudenberg said:

The one scenario where I might see a problem is if a platform directly depends on a Roc library and generates glue for the structure of the library's opaque types in host code. Then if the library updates the structure of the opaque type, the host will break. Forbidding sending opaque types to the host would prevent this, but refusing to generate glue for opaque types would too.

yeah this is the scenario that's a problem. I don't think forbidding glue generation is an ok solution because glue generation is optional (e.g. today most platform development is done without glue)

Richard Feldman (Oct 26 2024 at 13:03):

I also don't think it would be useful to have a rule of "you can use opaque types but only if they came from the platform, not from third party packages" - if you can only use opaque types that you have access to unwrap, then it's simpler to have the rule be "ok so just unwrap them before you send them"

Jasper Woudenberg (Oct 26 2024 at 13:27):

Richard Feldman said:

yeah this is the scenario that's a problem. I don't think forbidding glue generation is an ok solution because glue generation is optional (e.g. today most platform development is done without glue)

Yeah, I'm doing that myself in Zig at the moment :sweat_smile:.

If it's platform development that benefits from the 'opaque types cannot be sent to the host' rule, is it then fair to say this proposal is trading application author convenience for platform author convenience, by not going with the 'recursive types need to be opaque' approach?

For a platform author not using glue, integrating with an opaque type from a library would mean looking at the library source code and finding the opaque type's implementation. I guess that sounds so iffy to me that I can't really imagine a host author doing it by accident. I wonder if a "don't do that!" would be enough, considering how many other opportunities platform authors have to mess up given how low they are in the stack.

Richard Feldman (Oct 26 2024 at 13:52):

oh the bigger problem with the third party thing is that version ranges mean you can't possibly know what layout you're getting

Richard Feldman (Oct 26 2024 at 13:52):

because opaque types can change their internal structure as a nonbreaking change

Richard Feldman (Oct 26 2024 at 13:53):

so let's say I'm a platform author, I depend on v1.0.0 of a third party package, I look at its layout and code my host against that

Richard Feldman (Oct 26 2024 at 13:53):

then they release 1.0.1 which has a different internal structure, the application selects that, and now we have UB
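
A rough way to picture that failure, sketched in Python rather than Roc. The two package versions, their field layouts, and the function names are all invented for illustration; a real host would be reading raw memory, and struct.unpack just stands in for that:

```python
import struct
from typing import Tuple


# Hypothetical third-party package, v1.0.0: the opaque type is laid out as
# (id: u32, flags: u16), little-endian, no padding.
def encode_v1_0_0(ident: int, flags: int) -> bytes:
    return struct.pack("<IH", ident, flags)


# Hypothetical v1.0.1: same public API, but the internals were reordered to
# (flags: u16, id: u32). For an opaque type that is a non-breaking change.
def encode_v1_0_1(ident: int, flags: int) -> bytes:
    return struct.pack("<HI", flags, ident)


# The host was coded against the v1.0.0 layout and hard-codes it.
def host_read(raw: bytes) -> Tuple[int, int]:
    ident, flags = struct.unpack("<IH", raw)
    return ident, flags


# With v1.0.0 the host reads back what the application meant.
expected = host_read(encode_v1_0_0(7, 1))

# With v1.0.1 the byte count is identical, so nothing fails loudly;
# the host simply reads different values than the application wrote.
garbled = host_read(encode_v1_0_1(7, 1))
```

Python at least yields well-defined wrong numbers here. In a compiled host reading Roc memory directly, the same mismatch is the UB described above.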

Jasper Woudenberg (Oct 26 2024 at 14:21):

Sure, that makes sense. I guess I'm wondering how likely it is someone will try binding host code to the implementation of an opaque type from an external library in the first place, and so how much we're willing to sacrifice application development to prevent it.

I say that because writing C-bindings between languages feels like an advanced skill, much more so than the rule of thumb "don't integrate with the private implementation details of external code", so I have trouble imagining someone trying this without understanding the problems with it. I'm open to the possibility that I'm biased by the order in which I learned things myself, though :sweat_smile:.

Brendan Hansknecht (Oct 26 2024 at 16:18):

As a note, I would not be surprised if recursive tags become semi common in the future of roc. In essentially every language that has errors instead of exceptions, some form of library is written that creates deeply nested errors to essentially build up context/a stack trace. This almost always eventually hits some form of mutual recursion.

This could easily happen in roc if at every call that could fail a user simply wraps the error with more context and returns
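
A minimal illustration of that pattern, in Python rather than Roc. The ContextError type and the read_file/load_config functions are hypothetical; the only point is that an error which wraps "whatever error came from below" ends up with a type that refers to itself:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContextError:
    message: str
    # The wrapped error has the same type as the wrapper: a recursive type.
    cause: Optional["ContextError"] = None


def read_file(path: str) -> ContextError:
    # Pretend the read always fails, to keep the sketch short.
    return ContextError("could not read " + path)


def load_config(path: str) -> ContextError:
    # Each layer adds its own context around the error it received.
    return ContextError("could not load config", cause=read_file(path))


def trace(err: Optional[ContextError]) -> List[str]:
    # Walk the chain from the outermost context to the root cause.
    messages = []
    while err is not None:
        messages.append(err.message)
        err = err.cause
    return messages


context_chain = trace(load_config("app.toml"))
```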

Brendan Hansknecht (Oct 26 2024 at 16:22):

Also, I'm a bit confused by some of the initial statements, is this fully replacing opaque types or is it specific to tag unions? Cause you also mention maybe adding nominal records? Just don't fully understand the scope here

Richard Feldman (Oct 26 2024 at 18:33):

it's a pretty vague idea; I'm not sure either! :big_smile:

Richard Feldman (Oct 26 2024 at 18:34):

this isn't at the stage of a proposal or anything, just trying to feel out the general idea

Richard Feldman (Oct 26 2024 at 18:35):

Brendan Hansknecht said:

As a note, I would not be surprised if recursive tags become semi common in the future of roc. In essentially every language that has errors instead of exceptions, some form of library is written that creates deeply nested errors to essentially build up context/a stack trace. This almost always eventually hits some form of mutual recursion.

but none of those languages have polymorphic sum types; is there still demand for that in practice if you do? (I hope not, but I guess we'll see!)

Brendan Hansknecht (Oct 26 2024 at 18:37):

Sounds reasonable to try

Isaac Van Doren (Oct 27 2024 at 17:47):

It seems to me that adding nominal tag unions would effectively add nominal records at the same time because you could define a tag union with a single tag and a record as the payload. We can already do something similar with anonymous unions today but it’s not quite the same because you don’t need access to the specific constructor to make one.

Isaac Van Doren (Oct 27 2024 at 17:51):

This seems like a good solution to the problem and I think it would be nice to have nominal tag unions at times. That being said, it’s unfortunate that it would increase the size of the language. I could see the presence of nominal and structural unions being very confusing for beginners.

It could also introduce more decision points where you have to ask yourself which tool to use. Maybe this wouldn’t be much of an issue if it is explicitly communicated that you should always use structural unions unless you have a specific need for nominal ones.

Richard Feldman (Oct 27 2024 at 17:53):

yeah maybe terminology like "fixed" vs "flexible" might help? :thinking:

Richard Feldman (Oct 27 2024 at 17:53):

might also be more confusing haha

Richard Feldman (Oct 27 2024 at 17:53):

or like "anonymous records"/"anonymous tag unions" vs "named records"/"named tag unions"

Isaac Van Doren (Oct 27 2024 at 18:00):

Anonymous/named sounds promising to me, but I think the presence of both kinds of unions will be somewhat confusing regardless

Brendan Hansknecht (Oct 27 2024 at 18:52):

More confusing than opaque tag unions?

Brendan Hansknecht (Oct 27 2024 at 19:09):

Given the :thinking:, I'll clarify a bit. I don't think this makes the language any more complex. It is just swapping out opaque tags for nominal tags. Slightly different tradeoffs, but no more complexity or confusion.

Isaac Van Doren (Oct 27 2024 at 19:22):

Right now if you want to use a tag union there's only one choice. You could choose to wrap that tag union in an opaque type if that is desired but it's a separate concept. With this design, there would now be two ways to use tag unions:

Color : [Red, Blue]

Color := [Red, Blue]

The difference between these two choices is not obvious. They can both be used to solve similar problems, but they each have different consequences. This seems more confusing than the current situation.
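
One way to see the difference, using Python as a stand-in since the := form above is not implemented. Plain strings play the role of structural tags and Enum classes play the role of nominal ones; the Color and Team names are just examples:

```python
from enum import Enum

# "Structural" flavour: a tag is nothing more than its name, so a Red made in
# one place is the same value as a Red made anywhere else.
traffic_light = "Red"
paint = "Red"
structural_equal = traffic_light == paint

# Structural sets of tags can also grow by taking a union.
widened = {"Red", "Blue"} | {"Green"}


# "Nominal" flavour: the declaration itself is part of the identity.
class Color(Enum):
    RED = "Red"
    BLUE = "Blue"


class Team(Enum):
    RED = "Red"
    BLUE = "Blue"


# Same variant names, same payloads, but different declarations,
# so the two REDs are not interchangeable.
nominal_equal = Color.RED == Team.RED
```

The two forms look nearly identical at the declaration site, and the consequences only show up later when values from different declarations meet.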

Isaac Van Doren (Oct 27 2024 at 19:25):

I could also see a world where users familiar with other languages with nominal tag unions like Elm or Haskell might not realize that Roc has structural unions and default to always explicitly declaring nominal unions with :=. That might not be too difficult of an anti-pattern to correct, but I think it would happen with this approach.

Brendan Hansknecht (Oct 27 2024 at 19:40):

I think you are just pointing out that opaque tags today are a pain to use and don't really work. It has been requested multiple times to make them more flexible. We have opaque tags today. Look at bool. They just suck to use. I think long term, eventually something would give in that system as well.

Brendan Hansknecht (Oct 27 2024 at 19:40):

So I would still argue it is not really more complex in practice

Brendan Hansknecht (Oct 27 2024 at 19:41):

Also, I think it would be totally fine if someone only wants to use nominal tags. I think it would even be a reasonable best practice

Brendan Hansknecht (Oct 27 2024 at 19:42):

Apart from error-style tags that accumulate tons of ad hoc variants, I think that nominal tags are more type safe and that is beneficial

Isaac Van Doren (Oct 28 2024 at 01:32):

Well right now because you can't use an opaque tag union like a tag union normally there's never really the need to make a decision about which you should use. But this change will introduce that decision which seems more complex to me.

I don't have a better solution to propose and I don't think this is enough of a reason not to go with nominal types, it's just unfortunate that it could introduce some confusion.

Richard Feldman (Nov 01 2024 at 14:08):

one idea is that we could just give them a separate name - so like "tag union" means what it does today, and the idea is that "it's a union of tags, and the union can grow on the fly" and then separately we have a concept of "enum" (for example, to use Rust's terminology) which is a hardcoded enumeration of alternatives

Richard Feldman (Nov 01 2024 at 14:08):

rather than using the terminology of "nominal" and "structural"

Richard Feldman (Nov 01 2024 at 14:10):

so then we could say like

* tag unions have the feature of being able to grow on the fly, you don't have to qualify (or import unqualified) the tag names, and equality etc. are all inferred for you automatically
* enums are for when you want more control than that - you don't want them to grow on the fly, you want to control which variants can be accessed in other modules, you want to customize how equality/serialization/etc. works

Richard Feldman (Nov 01 2024 at 14:14):

the declaration syntax could use a keyword to reinforce that, e.g.

enum Bool [True, False]

Richard Feldman (Nov 01 2024 at 14:57):

or maybe another way to explain it could be in terms of the nominal types, e.g.

* Roc has enums and structs
* Records are anonymous structs
* Tag unions are anonymous enums

Brendan Hansknecht (Nov 01 2024 at 16:06):

I think we would need to remove the idea of closed tags completely from user space for that framing to feel cohesive

Brendan Hansknecht (Nov 01 2024 at 16:06):

Cause it sounds like enums are just closed tags

Brendan Hansknecht (Nov 01 2024 at 16:06):

Of course we would still need closed tags in general to avoid requiring _ -> in pattern matching

Fritz Psiorz (Nov 01 2024 at 18:06):

Richard Feldman said:

one thing this would immediately let us do is to say:

Bool is now a nominal tag union, defined as Bool := [True, False]

we automatically import its True and False tags, and expose them

this means when you put True into the repl, it says True : Bool - and pattern matching etc still works the normal way

it also means you can't construct a userspace tag named True anymore, or False

I don't really like this. I like the fact that you can just use any capitalized identifier as a tag. There are situations in which one might want to use True and False as tags, e.g. when you're handling some kind of external language or data representation that has true and false as possible values.

Norbert Hajagos (Nov 01 2024 at 19:05):

I think it is an upside that you can't name your tags True or False. You know it is either one of the two values, not maybe the Roc True, but actually, it is an open tag union that in certain code paths is Truh (typo), or Undefined (example: because you model something like a dynamic language value with your tag).
I like the idea of removing the concept of closed tags in favor of enums for the public (tutorial and general explanations), like how Task is fading to the background with the current purity inference proposal

Richard Feldman (Nov 01 2024 at 20:42):

Fritz Psiorz said:

Richard Feldman said:

one thing this would immediately let us do is to say:

Bool is now a nominal tag union, defined as Bool := [True, False]

we automatically import its True and False tags, and expose them

this means when you put True into the repl, it says True : Bool - and pattern matching etc still works the normal way

it also means you can't construct a userspace tag named True anymore, or False

I don't really like this. I like the fact that you can just use any capitalized identifier as a tag. There are situations in which one might want to use True and False as tags, e.g. when you're handling some kind of external language or data representation that has true and false as possible values.

we could allow opting out of that, e.g.

import Bool exposing [Bool]

Richard Feldman (Nov 01 2024 at 20:43):

I guess in general we could allow you to import builtin modules with different exposing settings in case you (for some reason) really want to choose names that builtin modules reserve

Richard Feldman (Nov 01 2024 at 20:44):

but if I'm being honest, I think demand for that in practice would be close to zero

Richard Feldman (Nov 01 2024 at 20:48):

especially because the workaround is so easy: just name them True_ and False_ or something, just like how people work around reserved record field names like if

Isaac Van Doren (Nov 03 2024 at 21:47):

Richard Feldman said:

one idea is that we could just give them a separate name - so like "tag union" means what it does today, and the idea is that "it's a union of tags, and the union can grow on the fly" and then separately we have a concept of "enum" (for example, to use Rust's terminology) which is a hardcoded enumeration of alternatives

Calling only one of the kinds of unions tag unions and the other enums feels odd to me given that they are both tagged unions. I like the idea of calling them named tag unions and anonymous tag unions more.

I think avoiding nominal/structural as the primary way of communicating the differences is a good idea. I suspect these are less familiar terms than anonymous/named and there seems to be a fair amount of confusion about what they mean.

Isaac Van Doren (Nov 03 2024 at 21:50):

Richard Feldman said:

or maybe another way to explain it could be in terms of the nominal types, e.g.
* Roc has enums and structs
* Records are anonymous structs
* Tag unions are anonymous enums

It seems like in Elm people almost always use structural records rather than nominal ones. If that is the case, maybe there isn't a need to have separate names for structural and named records.

Richard Feldman (Nov 03 2024 at 23:23):

what about "custom tag union"?

Richard Feldman (Nov 03 2024 at 23:24):

that name suggests what the default is: you have normal tag unions, and then when you want to customize them beyond the defaults they give you, you switch to a custom tag union

Richard Feldman (Nov 03 2024 at 23:24):

and a custom one isn't compatible with the normal ones (which means it can't grow automatically) because, well, it's custom!

Richard Feldman (Nov 03 2024 at 23:24):

that is, it's not the same as them anymore - which was the whole goal anyway

jan kili (Nov 03 2024 at 23:29):

Smooth idea. Would records also rename to "custom structs"?

Richard Feldman (Nov 03 2024 at 23:36):

assuming we want nominal versions of both, I think I'd go "tag union / custom tag union" and "record / custom record"

Richard Feldman (Nov 03 2024 at 23:36):

keeping with the theme of "you can make a custom version if you want it to work differently from the default"

Isaac Van Doren (Nov 04 2024 at 02:48):

Ooh yeah I like that! Nice that then the default can be described just as a tag union and there doesn’t have to be any other modifier like anonymous or structural.

jan kili (Nov 04 2024 at 03:05):

Does this mean you can't define a type alias for an open tag union or open record?

Isaac Van Doren (Nov 04 2024 at 03:06):

No you could definitely still define type aliases for anything

Isaac Van Doren (Nov 04 2024 at 03:10):

To use a custom (nominal) tag union you will have to declare it explicitly and its tags will be associated with it exclusively. Defining a type alias just serves as a shorthand to refer to a type rather than creating a new, distinct type.

Last updated: Jul 23 2026 at 13:15 UTC