So I've been working on getting the LSP spec working in Roc and honestly it's been pretty painful; enums and unions are pretty rough. Here is an example of what's needed to decode a single enum:
interface CompletionItemKind
    exposes [
        CompletionItemKind,
    ]
    imports [
        DecodeUtils,
    ]
CompletionItemKind := [
    Text,
    Method,
    Function,
    Constructor,
    Field,
    Variable,
    Class,
    Interface,
    Module,
    Property,
    Unit,
    Value,
    Enum,
    Keyword,
    Snippet,
    Color,
    File,
    Reference,
    Folder,
    EnumMember,
    Constant,
    Struct,
    Event,
    Operator,
    TypeParameter,
]
    implements [
        Decoding {
            decoder: decodeCompletionItemKind,
        },
        Encoding {
            toEncoder: encodeCompletionItemKind,
        },
    ]
get = \@CompletionItemKind val -> val
from = \val -> @CompletionItemKind val
decodeCompletionItemKind =
    ok = \tag -> Ok (@CompletionItemKind tag)
    DecodeUtils.wrapDecode \val ->
        when val is
            1 -> ok Text
            2 -> ok Method
            3 -> ok Function
            4 -> ok Constructor
            5 -> ok Field
            6 -> ok Variable
            7 -> ok Class
            8 -> ok Interface
            9 -> ok Module
            10 -> ok Property
            11 -> ok Unit
            12 -> ok Value
            13 -> ok Enum
            14 -> ok Keyword
            15 -> ok Snippet
            16 -> ok Color
            17 -> ok File
            18 -> ok Reference
            19 -> ok Folder
            20 -> ok EnumMember
            21 -> ok Constant
            22 -> ok Struct
            23 -> ok Event
            24 -> ok Operator
            25 -> ok TypeParameter
            _ -> Err TooShort
encodeCompletionItemKind = \@CompletionItemKind val ->
    num =
        when val is
            Text -> 1
            Method -> 2
            Function -> 3
            Constructor -> 4
            Field -> 5
            Variable -> 6
            Class -> 7
            Interface -> 8
            Module -> 9
            Property -> 10
            Unit -> 11
            Value -> 12
            Enum -> 13
            Keyword -> 14
            Snippet -> 15
            Color -> 16
            File -> 17
            Reference -> 18
            Folder -> 19
            EnumMember -> 20
            Constant -> 21
            Struct -> 22
            Event -> 23
            Operator -> 24
            TypeParameter -> 25
    Encode.u32 num
I was hoping someone might have some ideas how this might be improved. If no such improvement exists I think a discussion of some improved syntax for defining number and string enum encodings might be warranted.
For comparison here is the same definition in C#:
public enum CompletionItemKind
{
    Text = 1,
    Method = 2,
    Function = 3,
    Constructor = 4,
    Field = 5,
    Variable = 6,
    Class = 7,
    Interface = 8,
    Module = 9,
    Property = 10,
    Unit = 11,
    Value = 12,
    Enum = 13,
    Keyword = 14,
    Snippet = 15,
    Color = 16,
    File = 17,
    Reference = 18,
    Folder = 19,
    EnumMember = 20,
    Constant = 21,
    Struct = 22,
    Event = 23,
    Operator = 24,
    TypeParameter = 25,
}
yeah I've thought about this in the past
one idea is to allow something like this:
CompletionItemKind := [
    Text = 1,
    Method = 2,
    ...etc
]
and then only allow that syntax specifically when defining opaque types
and not when defining type aliases or anonymous tag union types
there are some follow-up questions there though, such as:
List U8? If so, how would that look?
another case where this comes up is http response codes
like wanting to be able to say NotFound and have that turn into 404
I was thinking something similar.
Would it be remotely possible within type aliases?
With an opaque type we do still have to annoyingly convert to and from using the get and from methods.
I was thinking, could we perhaps attach some extra field to the type at compile time and then use that when generating the decoder/encoder for those tags?
So you can just write: CompletionKind:[Text=1,Method=2]
I'd say a definite no to "enum tags" with payloads. It's hard to imagine how that would work in most of the "key value" style encodings.
the problem with doing it in type aliases is that there could be conflicts when unions merge
like if I have [Foo, Bar] and that gets unioned with [Foo, Baz], that results in [Foo, Bar, Baz]
but what if in the first one I said Foo is 1 and in the second one I said Foo is 3...in the combined union, what is Foo?
Ahh, yeah I see what you mean. It does seem like something that should probably only exist in annotations; these types are at the edges of your program after all. Like if your unions merge, that info gets dropped.
Could it be required that it be a closed union? Would that fix it?
right now since you can't specify, the compiler is free to choose whatever numbers it wants and there's no ambiguity
hm maybe, but I think it could still be confusing
opaque types seem like a better fit for this because they don't get unioned etc.
Do we plan to encode standard un-opaque tag unions as numbers or strings?
no plans, but if there's some motivating use case we could discuss it
one idea for how to go to/from numbers would be abilities:
ToU8 implements {
    toU8 : val -> U8
        where val implements ToU8
}
FromU8 implements {
    fromU8 : U8 -> Result val [OutOfRange]
        where val implements FromU8
}
and then auto implement those for opaque types which use that syntax
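For a rough idea of what that auto-implementation might boil down to, here is a hand-written sketch for a three-tag version of the enum. The `= 1` syntax doesn't exist today, so this just spells the two functions out by hand. Plain `toU8`/`fromU8` functions stand in for the proposed `ToU8`/`FromU8` abilities, and the module name `Kind` is made up for the example.

```roc
interface Kind
    exposes [Kind, toU8, fromU8]
    imports []

# Hypothetical three-tag enum. Under the proposal this might instead be
# written as `Kind := [Text = 1, Method = 2, Function = 3]`.
Kind := [Text, Method, Function]

# Tag -> number: the compiler checks this `when` is exhaustive.
toU8 : Kind -> U8
toU8 = \@Kind tag ->
    when tag is
        Text -> 1
        Method -> 2
        Function -> 3

# Number -> tag: nothing checks that every tag appears here.
fromU8 : U8 -> Result Kind [OutOfRange]
fromU8 = \num ->
    when num is
        1 -> Ok (@Kind Text)
        2 -> Ok (@Kind Method)
        3 -> Ok (@Kind Function)
        _ -> Err OutOfRange

# Round-trip through the opaque type without needing `Eq` on `Kind`.
expect (fromU8 2 |> Result.map toU8) == Ok 2
expect (fromU8 9 |> Result.map toU8) == Err OutOfRange
```

If the numbers lived in the type definition, both functions could presumably be generated from it, so the two directions could not drift apart.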
Richard Feldman said:
opaque types seem like a better fit for this because they don't get unioned etc.
I do think not having to wrap and unwrap the opaque type is a significant ergonomic improvement when working with these types.
I just don't see how it can work though haha
like if I write:
Even : [Foo = 2, Bar = 4]
Odd : [Foo = 1, Baz = 5]

getAnEven : Stuff -> Even
getAnOdd : Stuff -> Odd

x =
    if something then
        getAnEven stuff1
    else
        getAnOdd stuff2
what is the type of x?
or is the idea that [Foo = 1] is a totally separate type from [Foo] and now we track numbers as part of the type?
I guess that's an option
in which case x would be a type mismatch, whereas it would work fine if you removed the type annotations or removed the = 1 =2 etc
as a brief aside, one of the notable things about having this baked into the language is that there can be a performance benefit compared to the status quo
so there's the ergonomic piece of having the conversion to/from number be auto-derived instead of having to write it out (like you do today), but then separately there's the piece of having the in-memory representation already be the number, so converting it to the desired number is free at runtime
Basically, but potentially you could have it so that when you merge the tag unions it just drops the numbers, also Foo could automatically cast to Foo=1
I'll make some examples and play with the idea a bit more tomorrow when I'm at my computer.
Basically I was thinking you have to do these annotations at the decode encode edges of your program and it all gets automatically converted to and from normal tag unions.
isn't that essentially the opaque type design? :thinking:
How would you feel about storing the mapping in a dictionary, with maybe two helpers encoderFromDict and decoderFromDict?
In a case like this would probably be better to just use a list and make the index be the integer. Dict would be unnecessarily slow for something this small.
I have tried that, but it's messy and I dislike it. It's easy to add another tag and then forget to add it to the list and then get a panic
Yeah, sorry, I was saying use a list instead of a dict (cause perf), probably still isn't nice to use a list in general
I feel like enabling this feature on type aliases will likely cause problems due to type inference.
Like with default values, this can't be defined only in the type system. It has to be defined even if the user never specified types.
That's where I expect the complexity of this feature to come in. That and by default everything is ordered alphabetically, not in definition order.
The default order will make autoderive not useful by default.
Well it wouldn't make any sense if the user didn't specify types. I'd say in this case it doesn't make sense to infer it. Like, you don't know what values to assign to each tag, sometimes enums have gaps or start at 0. The user should be made to annotate the type or just not use the feature
Yeah, so then it would be an opaque type feature
Though, maybe we can accept that some features are only allowed when adding a type definition. That's what I want us to do for default values as well. Only allow them in type definitions and require the type definition to use them at all.
The big issue with that is it means you can't comment out the type alias and leave roc to infer the type. It will change the semantics of the program.
True. But you can say the same about opaque types. If this feature replaces the need for them in this case and improves ergonomics, have we given up anything?
I'm not saying we should, I'd just like to explore all options so we can be aware of the compromises we are making by picking one :)
I think being able to define a type like this:
Message : {
    messageType : [Text = 1, Email = 2],
    body : Str,
}
Is about as good as the ergonomics could be, and far superior to an opaque type where you need these functions:
get: ....
from: ...
# and maybe
text: ...
Email ...
Yeah, I 100% agree. Which is why I also want us to put default values into the type definition. I think giving extra power if you add type definitions is super valuable.
At least so far, it just isn't a tradeoff that has been accepted.
Cool, well maybe as we accumulate more cases where it would be an improvement it might even out that tradeoff more :).
I do like that both these features wouldn't at all prevent you from never annotating a type. Just give extra power if you do
I think the big difference with opaque types is that the compiler can infer them even if you comment out all type info.
This is due to the @MyOpaque someData that clearly adds the type info even though types are not explicitly specified.
So this is inferable with the types commented out
MyOpaque := [
    Foo = 7,
    Bar = 3,
] implements [
    ToU8, # auto derive impl
]

# This program does the same thing with this commented out
# Foo is always 7
# someFunc : MyOpaque -> List U8
someFunc = \myOpaque ->
    Encode.encode myOpaque

main =
    ....
    # This adds in the type info due to the `@MyOpaque`
    someFunc (@MyOpaque Foo)
This changes semantics:
MyAlias : [
    Foo = 7,
    Bar = 3,
]

# This encodes Foo as 1 if commented out and as 7 if typed.
# someFunc : MyAlias -> List U8
someFunc = \myAlias ->
    Encode.encode myAlias

main =
    ....
    # No added type info to connect to MyAlias
    someFunc Foo
a relevant distinction between this and default record fields is whether the program is still runnable without the annotation
as opposed to giving a type mismatch at compile time (due to missing fields and not knowing what to use as a default value) and then having to crash at runtime if the program is run anyway
I think a compiler error is the better case. So I would label default values as safer. It blocks compilation instead of compiling a subtly incorrect program.
Specifically I am imagining the ci and debugging story (if it gets into prod).
I have never come across a case where I had to map enums to integers (or almost never); it is always an enum to a string instead. If anything from the discussion goes into Roc, will it be limited to enum↔integer mappings?
If we did implement some alternative syntax I would like it to work for both strings and ints.
C and CPP and C# and Java all use ints for enums. I've used quite a lot of APIs that use that. String enums are a very JS thing in my mind
I have seen limited use of string enums in printing/parsing. I mean that is even what decode does by default, but the name exactly matches the tag.
We convert enums to strings all the time in our Java code base at work. I can’t think of any instances of converting them to numbers.
Ah okay, my bad. I should have checked that, I thought I remembered it being the same but it seems like that's a C, CPP, C# thing.
You know I thought that there were automatically assigned numbers for enums in Java but looks like there actually aren’t
So it sounds like both numbers and strings are common.
Obviously due to type checking, it is easy to go from enum to number/string and ensure you haven't missed anything (even if it is a bit verbose, it is just a simple when ... is). Going the other way is where most of the pain comes in. No clean way to match. No clean way to ensure you didn't actually miss something.
Of course both could be automatically generated, but the general concern may be solvable even in a simpler manner.
Suppose a person defines a toEnum function with a when statement:
toEnum = \tag ->
    when tag is
        Foo -> 0
        Bar -> 1
        ...
Another approach might be for the standard library to include Decode.enum or Encode.enum helpers, that take a function like the above and generate an encoder or decoder from it.
encoder = Encode.enum toEnum
decoder = Decode.enum toEnum
The implementation of Decode.enum would have to 'reverse' the implementation of toEnum, which it could do by calling toEnum once with every value in the tag. Don't know how tricky this would be, but I imagine it might be possible to implement in the standard library?
To support strings, we could instead have stdlib helpers Encode.strEnum, Encode.intEnum, and similar decoders.
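As a rough userland sketch of that "reverse it by trying every value" idea: `Decode.enum` doesn't exist, and as far as I know Roc has no way to enumerate the tags of a union, so the hand-maintained `allKinds` list below is an assumption standing in for whatever the compiler or stdlib would provide. `Kind`, `toEnum` and `fromEnum` are names made up for the example.

```roc
# Hypothetical enum-style tag union for the example.
Kind : [Text, Method, Function]

# The one direction the user writes by hand; exhaustiveness is type-checked.
toEnum : Kind -> U8
toEnum = \tag ->
    when tag is
        Text -> 1
        Method -> 2
        Function -> 3

# Assumption: every tag listed by hand, since tags can't be enumerated.
allKinds : List Kind
allKinds = [Text, Method, Function]

# The reverse direction, derived by calling `toEnum` on each tag in turn.
fromEnum : U8 -> Result Kind [NotFound]
fromEnum = \num ->
    List.findFirst allKinds \tag -> toEnum tag == num

expect fromEnum 2 == Ok Method
expect fromEnum 9 == Err NotFound
```

Written like this, decoding is a linear scan that calls `toEnum` once per tag, so a real helper would probably want to build the reverse mapping ahead of time.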
Oh hey, that's a really cool idea! It's certainly a lot simpler than adding new syntax and stuff. Great suggestion!
We could definitely do that using the "macros" within the compiler (like the way deriving decoding is implemented).
Could it be implemented in a reasonable way though? I'd assume it would either be a super brittle pattern match or it would have a large runtime cost.
Brendan Hansknecht said:
Going the other way is where most of the pain comes in. No clean way to match. No clean way to ensure you didn't actually miss something.
Another approach to helping you ensure you didn't miss something could be making that easy to test. Currently if you wanted to test "for each possible tag t of my tag union, decode (encode t) == Ok t" how would you write that?
We don't have a good way sadly. Would be a list that has to be manually updated as well.
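To spell out what that looks like today, here is a minimal sketch of such a round-trip test with a top-level `expect`. `Kind`, `kindToU8`, `kindFromU8` and `allKinds` are made-up names for the example, not anything from the stdlib.

```roc
# Hypothetical enum-style tag union for the example.
Kind : [Text, Method, Function]

kindToU8 : Kind -> U8
kindToU8 = \kind ->
    when kind is
        Text -> 1
        Method -> 2
        Function -> 3

kindFromU8 : U8 -> Result Kind [OutOfRange]
kindFromU8 = \num ->
    when num is
        1 -> Ok Text
        2 -> Ok Method
        3 -> Ok Function
        _ -> Err OutOfRange

# The manually updated list: a tag missing from here is silently untested.
allKinds : List Kind
allKinds = [Text, Method, Function]

# For every listed tag, decoding its encoding gives the tag back.
expect
    List.all allKinds \kind -> kindFromU8 (kindToU8 kind) == Ok kind
```

The weak spot is the same one as before: adding a tag forces an update to `kindToU8`, but nothing forces an update to `kindFromU8` or `allKinds`.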
Richard Feldman said:
one idea is to allow something like this:
CompletionItemKind := [ Text = 1, Method = 2, ...etc ]
and then only allow that syntax specifically when defining opaque types
Many languages have a possibility to annotate different things, like enums and then serialization libraries use it to decode/encode automatically. This is the simplest way, because you define everything once and in-place. So, following @Richard Feldman we could define:
CompletionItemKind := [
    Text = 1,
    Method = 2,
    ...etc
]
or
CompletionItemKind := [
    Text = "text",
    Method = "method",
    ...etc
]
and voilà!
But... there is a case to be made that we should not mix types (application logic) and some "trivial matters" like serialization. Or we simply cannot do it (when e.g. the enums are too generic or from other packages)... or maybe when we would have to use same enums in different serialization contexts!
Having said all that, I think we should allow for flexibility. The best of both worlds would be to let simple case be simple while not limiting other scenarios:
It would be gorgeous if it would be dead-easy to provide just one-way mapping, like @Jasper Woudenberg suggested, and the other direction would be derived :heart_eyes:
One very important note:
Tags are more than just enums. Each tag can contain data. That should hopefully fit seamlessly into whatever design we pick.
Brendan Hansknecht said:
One very important note:
Tags are more than just enums. Each tag can contain data. That should hopefully fit seamlessly into whatever design we pick.
Currently, my colleagues are working with APIs and have to decode and encode JSONs back and forth and "enums" (i.e. tag unions without any additional payload) are super common. Most popular languages (the rumors of them being dead are sadly still premature) do not have tag unions anyway and this is probably why most of the API (and so the JSONs) we have to deal with have nothing that would fit them. It is always simple fields of strings, numbers and "enums".
So, it would be really great if we could:
I don't see any reason to restrict to only one. The poor man's tagged enum are in a lot of apis.
I have seen many APIs in roughly the form
{
    message: Enum (often string enum in json),
    data: varying data type that is specific to the message variant,
}
This is a tagged union just in a poor form.
Also, more properly typed protocols like protobuf have something that maps directly to tagged unions.
I wanted to check I understood what we are talking about here, so I made the below example, which is a simple encoding and decoding of a list of CompletionItemKind. This may be helpful for others so sharing here. The below app will print the following to stdout:
$ roc dev example.roc
(@CompletionItemKind Text)
(@CompletionItemKind Method)
(@CompletionItemKind Function)
(@CompletionItemKind Constructor)
(@CompletionItemKind Field)
(@CompletionItemKind Variable)
(@CompletionItemKind Class)
(@CompletionItemKind Interface)
(@CompletionItemKind Module)
app "example"
    packages {
        pf: "https://github.com/roc-lang/basic-cli/releases/download/0.8.1/x8URkvfyi9I0QhmVG98roKBUs_AZRkLFwFJVJ3942YA.tar.br",
        json: "https://github.com/lukewilliamboswell/roc-json/releases/download/0.6.3/_2Dh4Eju2v_tFtZeMq8aZ9qw2outG04NbkmKpFhXS_4.tar.br",
    }
    imports [pf.Stdout.{ line }, json.Core.{ json }]
    provides [main] to pf

main =
    input : List U8
    input = ['[', '1', ',', '2', ',', '3', ',', '4', ',', '5', ',', '6', ',', '7', ',', '8', ',', '9', ']']

    itemKinds : List CompletionItemKind
    itemKinds = Decode.fromBytes input json |> Result.withDefault []

    itemKinds
    |> List.map Inspect.toStr
    |> Str.joinWith "\n"
    |> Stdout.line
CompletionItemKind := [
    Text,
    Method,
    Function,
    Constructor,
    Field,
    Variable,
    Class,
    Interface,
    Module,
] implements [
    Decoding { decoder: decodeThing },
    Encoding { toEncoder: encodeThing },
    Inspect,
]
decodeThing : Decoder CompletionItemKind fmt
decodeThing = Decode.custom \bytes, _ ->
    ok : _, List U8 -> DecodeResult CompletionItemKind
    ok = \tag, rest -> { result: Ok (@CompletionItemKind tag), rest }

    when bytes is
        ['1', .. as rest] -> ok Text rest
        ['2', .. as rest] -> ok Method rest
        ['3', .. as rest] -> ok Function rest
        ['4', .. as rest] -> ok Constructor rest
        ['5', .. as rest] -> ok Field rest
        ['6', .. as rest] -> ok Variable rest
        ['7', .. as rest] -> ok Class rest
        ['8', .. as rest] -> ok Interface rest
        ['9', .. as rest] -> ok Module rest
        _ -> { result: Err TooShort, rest: bytes }
encodeThing : CompletionItemKind -> Encoder fmt
encodeThing = \@CompletionItemKind tag -> Encode.custom \bytes, _ ->
    append : U8 -> List U8
    append = \u8 -> List.append bytes u8

    # Append the ASCII digit (not the raw number) so the output matches what decodeThing reads
    when tag is
        Text -> append '1'
        Method -> append '2'
        Function -> append '3'
        Constructor -> append '4'
        Field -> append '5'
        Variable -> append '6'
        Class -> append '7'
        Interface -> append '8'
        Module -> append '9'
It's basically the same thing as what @Eli Dowling posted at the start, just another version I guess and I've cut some of the Tags out for brevity.
Yeah, so we are talking about auto generating the encode/decode implementations that you specified manually in that example.
Then it expanded scope to say you might also want to auto generate a string mapping as well
Text -> "text"
Method -> "method"
...
Specifically the question is can we add information to the type such that we can auto generate encode/decode and ensure we never miss a case.
The last piece on top of that is can the implementation be flexible enough to also make for easy support of tag unions that contain data rather than just enums with no data.
I think that is the full set of questions being looked at here
In your example, if you add a new enum variant, you will get a type error that leads to updating encode, but no help to update decode.
Well, if we do go with this design I would consider it completely separate to tag unions with data inside. Those encoders and decoders can be decided by the format. We can look at how other languages with tag unions encode this. But for json you might have something like:
"myUnion": {
    "Tag1": {
        //..tag content...
    }
}
// Or:
"myUnion": {
    "tag": "Tag1",
    "value": {
        //..tag contents...
    }
}
Enum tags are just a completely different thing
For sure, I guess you can't autoderive a tag union with data encoder in many formats, but we should be able to autoderive the two core pieces still. We should be able to autoderive the tag encoding and autoderive the contained data encoding. It would still be up to the user to decide how that wires into the final output.
I think it should be implemented in the format. We just provide the encoding for the name of the tag and the contents and the format decides how to encode and decode the two. Just like we do for record encoding and decoding. Then we just implement something like I suggested above.
Which is what we do currently, right?
tag : Str, List (Encoder fmt) -> Encoder fmt where fmt implements EncoderFormatting
The Str is the encoding of the tag. The List (Encoder fmt) is how to encode each field of data.
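As a small sketch of how `Encode.tag` gets used for a tag union with payloads (the `Shape` type and `encodeShape` are made up for the example, and how the name and payloads end up laid out in the output is entirely up to the format):

```roc
# Hypothetical tag union with payloads, just for illustration.
Shape : [Circle F64, Rect F64 F64]

encodeShape : Shape -> Encoder fmt where fmt implements EncoderFormatting
encodeShape = \shape ->
    when shape is
        # Tag name first, then one encoder per payload value.
        Circle radius -> Encode.tag "Circle" [Encode.f64 radius]
        Rect width height -> Encode.tag "Rect" [Encode.f64 width, Encode.f64 height]
```

An enum-style annotation would presumably only change what gets passed as the first argument (or replace it with a number), while the payload encoders stay the same.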
Oh, yeah exactly :sweat_smile:. I'm on mobile, I would have checked otherwise, oops .
So if that is the case, it just means whatever syntax we pick, we need to make sure that it works with tags with data.
CompletionItemKind := [
    Text = 1,
    Method = 2,
    ...etc
]
CompletionItemKindWithData := [
    Text Str = 1,
    Method U64 I32 = 2,
    ...etc
]
Ahh, I see, good point. Well that makes my bottom suggestion much more appealing
I was actually initially thinking we might just want to not allow these enum style tag annotations on tags with data at all. But it does seem like it'd work fine
Could something like comptime help us here? Or maybe some kind of code-gen?
Well that's basically what we're talking about. Autoderived decoding for records and tuples currently uses what are basically macros/comptime but using hand written roc expressions in the compiler. Think of it as a macro system with the worst syntax imaginable :sweat_smile:. That's what my null decoding PR is updating
I was thinking in roc userland
Well I agree, even just in compiler land it would be handy. It's a big addition but I think it would enable some really cool stuff. Like testing out syntax changes very easily. And doing super fun stuff like automatic generation of types for JSON. Which would make scripting in roc super fun. Basically you provide a slice of sample json and the types get generated and added to your program then you can use it to do transformations on the Json data. F# has this and it's amazingly useful for scripts
I had a cool idea in this area for automated encode and decode for tag unions.
Obviously the most general solution to all of this is macros/comptime, but I believe we can make some good progress with this
I often want my tag unions to just be the names but camel cased, or snake cased or pascal cased.
Sometimes I want tag unions to be unions but not tagged to interop with JS, they should just try decoding each tag until one matches and then also encode with no tag info.
We could use custom formatters for that that wrap an existing formatter like this:
interface UnionTags
    exposes [
        unionTags,
    ]
    imports []

UnionTags fmt := { otherFormatter : fmt } where fmt implements EncoderFormatting
    implements [
        EncoderFormatting {
            u8: encodeU8,
            u16: encodeU16,
            u32: encodeU32,
            u64: encodeU64,
            u128: encodeU128,
            i8: encodeI8,
            i16: encodeI16,
            i32: encodeI32,
            i64: encodeI64,
            i128: encodeI128,
            f32: encodeF32,
            f64: encodeF64,
            dec: encodeDec,
            bool: encodeBool,
            string: encodeString,
            list: encodeList,
            record: encodeRecord,
            tuple: encodeTuple,
            tag: encodeTag,
        },
    ]

unionTags = \fmt -> @UnionTags { otherFormatter: fmt }

# The tag name is ignored; only the single payload is encoded
encodeTag : Str, List (Encoder _) -> Encoder _
encodeTag = \name, encoders ->
    Encode.custom \bytes, @UnionTags { otherFormatter } ->
        when encoders is
            [only] ->
                bytes |> Encode.appendWith only otherFormatter

            _ -> crash "cannot encode multi arg tags as unions"

forward = \n ->
    Encode.custom \bytes, @UnionTags { otherFormatter } ->
        bytes |> Encode.append n otherFormatter

# all the other functions just forward
Basically we ignore the tag part of the tag union in this encoder.
Currently this crashes the compiler, probably because it doesn't like the EncoderFormatting type being generic. But with module params it won't have to be and will probably work quite well :)
You could just implement this encoder/decoder in your opaque type and then
Last updated: Jun 16 2026 at 16:19 UTC