Stream: ideas

Topic: Supporting discriminants when encoding/decoding unions


view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 16:47):

I'm evaluating using Roc in a specific system, where one of the needs is to encode/decode structures as unions discriminated on a particular key that I may or may not want reflected in the type being encoded or decoded. As a simple example, suppose our system already defines a "Status" message with the rough JSON schema of

status : { "kind": "success", "message": string } | { "kind": "failure", "code": number, "message": string }

perhaps we'd like to encode this message as the Roc type

Status := [Success { message: Str } , Failure { code : U64, message : Str }] has [Encoding, Decoding]

Unfortunately, the standard derived encoding and decoding for this opaque type, with the current JSON formatting implementation, would expect JSON messages that look like

{ "Success": [{ "message": string }] } | { "Failure": [{ "message": string, "code": number }] }
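To make the mismatch concrete, here is a rough Python sketch of the two JSON shapes for the same value. The function names `encode_externally_tagged` and `encode_internally_tagged` are made up for illustration and do not come from any Roc or JSON package. The first mimics the derived shape described above; the second mimics the shape the existing schema expects, assuming the discriminant key is "kind" and the value is the lowercased tag name.

```python
import json


def encode_externally_tagged(tag: str, payload: dict) -> str:
    """Wrap the payload in a one-element list keyed by the tag name."""
    return json.dumps({tag: [payload]})


def encode_internally_tagged(tag: str, payload: dict, key: str = "kind") -> str:
    """Inline the discriminant into the payload under `key` (assumed name)."""
    if key in payload:
        raise ValueError(f"payload already has a {key!r} field")
    return json.dumps({key: tag.lower(), **payload})


failure = {"code": 404, "message": "not found"}
derived_shape = encode_externally_tagged("Failure", failure)
schema_shape = encode_internally_tagged("Failure", failure)
```

The second function has to reject payloads that already contain the discriminant key, which is one of the collisions an inlined encoding has to think about and a wrapped encoding does not.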

One way to deal with this is to define a JSON formatting implementation that has some kind of configuration option to specify how certain discriminants should be encoded. For example, maybe the JSON implementation defines

interface Json exposes [JsonFormatting, TagEncoding, format]

JsonFormatting := { encodingsForTags: Dict Str TagEncoding, <other fields...> }

TagEncoding: [
  Default, # wrap the tag payloads as an array keyed by the tag name, like `{ "Success": [{ "message": string }] }`
  InlineAsSingleton { key: Str, value: Str }, # inline the discriminant in the payload, if it's unary, like `{ "type": "success", "message": string }`
]

format : Dict Str TagEncoding -> JsonFormatting

this is okay, but it has a few drawbacks. A few off the top of my head:

Another option is to write a custom JSON encoder/decoder for this type, but that loses the advantages of the Encoding/Decoding abilities and usage of JSON encoding/decoding packages in the ecosystem.

Another option is to define the Roc type as something like

Status := { type : Str, code : Result U64 {}, message : Str }

but this loses the desired type safety.

I wonder if there is something we can do at the language level to better support this kind of pattern, as I believe it occurs quite often.

For context, here's how I might define this in TypeScript, using literal types

type Status = { type : 'success', message: string } | { type : 'failure', code: number, message: string }

in Python:

import typing as t

class Success(t.TypedDict):
  type: t.Literal["success"]
  message: str

class Failure(t.TypedDict):
  type: t.Literal["failure"]
  code: int
  message: str

Status = t.Union[Success, Failure]

in Rust with serde:

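use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
#[serde(tag = "type")]
enum Status {
    // `rename` sets the discriminant value written for each variant
    #[serde(rename = "success")]
    Success { message: String },
    #[serde(rename = "failure")]
    Failure { message: String, code: u64 },
}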
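On the decoding side, all three of these definitions imply the same step: read the discriminant key first, then pick the variant. A minimal Python sketch of that dispatch, assuming the key is "type" as in the examples above; `decode_status` and the dataclasses are hypothetical names for illustration only:

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class Success:
    message: str


@dataclass
class Failure:
    code: int
    message: str


Status = Union[Success, Failure]


def decode_status(raw: str, key: str = "type") -> Status:
    """Decode an internally tagged JSON object by dispatching on `key`."""
    obj = json.loads(raw)
    # Everything except the discriminant is treated as the variant's payload.
    fields = {k: v for k, v in obj.items() if k != key}
    kind = obj.get(key)
    if kind == "success":
        return Success(**fields)
    if kind == "failure":
        return Failure(**fields)
    raise ValueError(f"unknown discriminant {kind!r}")
```

For example, `decode_status('{"type": "success", "message": "ok"}')` yields `Success(message="ok")`, and the discriminant never appears as a field of the decoded value.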

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 16:57):

One option is to extend the Encoding/Decoding ability API to take a parameter like the TagEncoding described above in the JSON module, for example

# [tag tagName payloads tagEncoding] encodes a tag variant and its payload types
tag : Str, List (Encoder fmt), TagEncoding -> Encoder fmt | fmt has EncoderFormatting

and then have some kind of new annotation syntax you can use for the purposes of deriving Encoding/Decoding for a type, à la serde's use of Rust annotations or Go's tags on struct fields. For example

Status := [Success { message: Str } , Failure { code : U64, message : Str }] has [Encoding with {
  tagEncoding: InlineDiscriminant,
  renameTags : when tag is
    Success -> "success",
    Failure -> "failure",
}]

which are only applicable for builtin types. I haven't really thought this through, but on the surface it doesn't strike me as a good idea.

view this post on Zulip Richard Feldman (Mar 30 2023 at 17:12):

what about this?

TagNameTransformation : [
    ToSnakeCase,
    ToKebabCase,
    ToCamelCase,
    None,
]

TagNameStrategy : [
    # e.g. [Success { message : Str }] ==> { "Success": [{ "message": String }] }
    BecomeFieldName TagNameTransformation,

    # e.g. [Success { message : Str }] ==> { "kind": "Success", "message": string }
    StoredInField Str TagNameTransformation,
]

format : TagNameStrategy -> JsonFormatting
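A rough Python sketch of how a formatter might interpret such a strategy, purely to illustrate the idea. The tuple representation of the strategy, the string names for the transformations, and the helpers `transform` and `encode_tag` are all assumptions made for this sketch, and the word splitting only handles simple PascalCase tag names such as `NotFound`.

```python
import re


def split_words(tag: str) -> list:
    """Split a simple PascalCase tag name into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z][a-z0-9]*", tag)]


def transform(tag: str, how: str) -> str:
    """Apply a tag name transformation: 'snake', 'kebab', 'camel' or 'none'."""
    if how == "none":
        return tag
    words = split_words(tag)
    if how == "snake":
        return "_".join(words)
    if how == "kebab":
        return "-".join(words)
    if how == "camel":
        return words[0] + "".join(w.capitalize() for w in words[1:])
    raise ValueError(f"unknown transformation {how!r}")


def encode_tag(strategy: tuple, tag: str, payload: dict) -> dict:
    """strategy is ('become_field_name', how) or ('stored_in_field', field, how)."""
    if strategy[0] == "become_field_name":
        _, how = strategy
        return {transform(tag, how): [payload]}
    if strategy[0] == "stored_in_field":
        _, field, how = strategy
        return {field: transform(tag, how), **payload}
    raise ValueError(f"unknown strategy {strategy[0]!r}")
```

Because the strategy is chosen once per formatter, every tag union in the payload is encoded the same way, which is the consistency assumption this proposal relies on.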

view this post on Zulip Richard Feldman (Mar 30 2023 at 17:15):

in my experience at least, it's most common to encounter a given JSON payload that has a consistent policy like this

view this post on Zulip Richard Feldman (Mar 30 2023 at 17:16):

e.g. all field names are kebab-case or snake_case or camelCase, discriminants are handled the same way throughout the payload, etc.

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 17:18):

yeah, that works as long as you use different formattings when the behavior changes

view this post on Zulip Richard Feldman (Mar 30 2023 at 17:18):

right, but in my experience a given payload is going to be consistent within itself and not mix & match

view this post on Zulip Richard Feldman (Mar 30 2023 at 17:18):

and you already need to specify a formatting per payload that you want to encode or decode

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 17:19):

In the specific case I'm considering, there are instances where there are unions discriminated by a key, and cases where they are not (the unions are just disjoint without a common key name)

view this post on Zulip Richard Feldman (Mar 30 2023 at 17:19):

within the same payload?

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 17:21):

They are not the same type, but they compose like

MyType1 : ... # discriminate on "type"

MyType2 : ... # no discriminant

..

MyCompoundType : ... # deeper references to MyType1 and MyType2

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 17:21):

So the payloads are disjoint, but they end up being used as part of a larger structure - if that answers your question

view this post on Zulip Richard Feldman (Mar 30 2023 at 17:22):

hm, do you have JSON of MyCompoundType? or just separate JSON payloads for MyType1 and MyType2, and then the outputs of those get put into MyCompoundType on the TypeScript side?

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 17:23):

Yeah, I would like to encode/decode MyCompoundType as a whole

view this post on Zulip Richard Feldman (Mar 30 2023 at 17:24):

what if the discriminant was there but got ignored on the TypeScript side?

view this post on Zulip Richard Feldman (Mar 30 2023 at 17:24):

that is, Roc encodes the discriminant but TS ignores it

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 17:25):

Roc would not be able to decode it if the discriminant was missing in that case, though - if you used the derived ability and the single tag encoding configuration

view this post on Zulip Richard Feldman (Mar 30 2023 at 17:26):

hm, but how would Roc decode it without a discriminant anyway? :thinking:

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 17:27):

One option is backtracking - for example if you have [Foo {a : Str}, Bar {b : Str}], if you don't see an "a" key in the message, fall back on decoding the "Bar" variant by looking for the "b" key
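A minimal Python sketch of that backtracking idea, assuming each variant is identified only by the set of keys it requires. The `VARIANTS` table and `decode_untagged` are hypothetical names for illustration:

```python
# Hypothetical variant table: (tag name, keys that variant requires),
# tried in order.
VARIANTS = [("Foo", {"a"}), ("Bar", {"b"})]


def decode_untagged(message: dict):
    """Return (tag, payload) for the first variant whose keys are all present."""
    for tag, required in VARIANTS:
        if required.issubset(message):
            return tag, {k: message[k] for k in required}
    raise ValueError("no variant matches this message")
```

Note that the order of the table matters: a message containing both "a" and "b" decodes as `Foo` here, so variants with overlapping fields are ambiguous without a discriminant.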

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 17:29):

Another practical option is to have your platform encode/decode the type appropriately if it's a case like this. Since in practice you'll probably be getting these messages over a network or other effectful operation.

view this post on Zulip Richard Feldman (Mar 30 2023 at 17:30):

well you can always handwrite a decoder at that point

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 17:32):

If you hand write a decoder, you would either need the Decoding API to be extended to take a tag configuration option, or you wouldn't be able to parameterize over Decoding formatters arbitrarily, right? Like, you would have to write a decoder specifically for the JSON case, unless Decode.tag took a tag configuration option

view this post on Zulip Richard Feldman (Mar 30 2023 at 17:32):

right, but I think having the platform encode/decode the type has the same drawback, yeah?

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 17:33):

yeah

view this post on Zulip Richard Feldman (Mar 30 2023 at 17:35):

is there some way the Decoding API could be changed to make this possible without either of the following being true?

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 17:58):

yeah I'm not sure. Naively I would say do what serde does and have some way to specify whether it's tagged/untagged in Decode.tag/Encode.tag, but I'm not sure that generalizes to arbitrary formattings. Maybe formattings can ignore that if it's not applicable.

view this post on Zulip Richard Feldman (Mar 30 2023 at 18:07):

serde also relies on nominal types

view this post on Zulip Richard Feldman (Mar 30 2023 at 18:08):

one of the things I think about with encoding/decoding designs is "if you want to use this, do you have to convert all your structural types into nominal types just to get the encoding/decoding behavior you want?"

view this post on Zulip Richard Feldman (Mar 30 2023 at 18:08):

ideally not, of course :big_smile:

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 18:12):

well, changing the decoding API doesn't need to relate to nominal/structural typing at all. For example, similar to the API I first described, maybe you can define Encode.tag as

Encode.tag : Str, List (Encoder fmt), TagEncodingStrategy, TagNaming -> Encoder fmt | fmt has EncoderFormatting

this API works equally for structural and nominal types, other than that derived implementations must choose a default encoding and naming strategy (but I don't see how they couldn't without language changes)

view this post on Zulip Martin Stewart (Mar 30 2023 at 20:27):

Could this be solved with a code generation tool? You give it a type, it generates Roc decoders/encoders for you. Then if you need to tweak it, it's easy since it's just user code at that point.

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:28):

that's an interesting direction...I've been thinking about how roc glue could be expanded to work on arbitrary interface modules instead of just platform modules

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:29):

and then you could give it a glue spec to generate client encoders and decoders (e.g. in Elm or TypeScript)

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:29):

and those could be specific to your application and its types

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:29):

downside being that you'd have to incorporate it into your build process

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:32):

Ayaz Hafiz said:

well, changing the decoding API doesn't need to relate to nominal/structural typing at all.

what I meant is that the serde approach relies on nominal types

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:33):

it's like "here's my type, and also here's how I want to encode its fields"

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:34):

that only works for nominal types, since with structural types you'd be saying "here's the shape of my type, and any time in the entire program that any type happens to have this shape, encode it as follows" which (even if abilities worked that way) wouldn't be a great design :sweat_smile:

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:35):

but then again, to be fair - because of how opaque types work in Roc, it's pretty easy to define an opaque wrapper just for serialization/deserialization and then say "ok now unwrap it and that's the type I'll actually use"

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:35):

so maybe that's not actually a real concern

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:36):

I guess it's annoying if you have a nested data structure which stores a structural type, and you want to serialize the whole data structure while getting some custom behavior for one of its nested structural types

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:36):

but you want to specify that custom encoding/decoding behavior in terms of the nested type, not the nested data structure that contains it

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 23:39):

i feel like that would only really come up during prototyping though. If these are messages over a network you’re likely to explicitly type their structure, at which point yeah there might be little overhead to use an opaque type there

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 23:40):

in any case, this distinction for serde’s case only matters with regard to annotations that tell it how to encode/decode the type right? which would have to be a new language addition to Roc if it were to be reflected, which i think we would prefer not to do?

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:42):

I'd definitely prefer not to do it

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:45):

I'm open to the idea in some form, but it feels like a Pandora's Box to open

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:50):

a thing I also don't love about it is that it feels like something to add to the language almost exclusively for JSON specifically, which is a category of language design decision that tends not to age well (e.g. Scala having baked-in syntax for XML, which was about as popular a serialization format at the time as JSON is now)

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:50):

like binary formats don't care about this, and I assume XML would have explicit discriminants...feels like a JSON-specific thing

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 23:52):

yeah I have the same feeling.

view this post on Zulip Ayaz Hafiz (Mar 30 2023 at 23:52):

binary formats might care about it but yeah they’re usually more standardized

view this post on Zulip Richard Feldman (Mar 30 2023 at 23:52):

I assume a binary format would use an integer discriminant
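As a rough illustration of that assumption (not a description of any particular binary format), a Python sketch where the discriminant is just the variant's index stored in one leading byte. `TAGS`, `encode_binary` and `decode_binary` are made-up names:

```python
import struct

# Hypothetical fixed ordering of variants; the index is the discriminant.
TAGS = ["Success", "Failure"]


def encode_binary(tag: str, payload: bytes) -> bytes:
    """Prefix the already-encoded payload with a one-byte variant index."""
    return struct.pack("<B", TAGS.index(tag)) + payload


def decode_binary(data: bytes):
    """Read the one-byte variant index and return (tag, remaining payload)."""
    (index,) = struct.unpack_from("<B", data, 0)
    return TAGS[index], data[1:]
```

In this shape there is no key name or tag spelling to configure, which is why the naming question looks specific to text formats like JSON.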

view this post on Zulip Martin Stewart (Mar 31 2023 at 11:16):

Richard Feldman said:

downside being that you'd have to incorporate it into your build process

What I had in mind was an editor tool. You'd create a type representing the external data you want to decode, and then you press some hotkey to generate a JSON decoder for it. After that it's just user code so you don't regenerate it unless you decide later you want to throw away your old decoder and start again. In other words, no complicated build process.


Last updated: Jun 16 2026 at 16:19 UTC