Supporting discriminants when encoding/decoding unions · ideas

I'm evaluating Roc for use in a specific system, where one of the needs is to encode/decode structures as unions discriminated on a particular key that I may or may not want reflected in the type being encoded or decoded. As a simple example, suppose our system already defines a "Status" message with the rough JSON schema of

status : { "kind": "success", "message": string } | { "kind": "failure", "code": number, "message": string }

Status := [Success { message: Str } , Failure { code : U64, message : Str }] has [Encoding, Decoding]

Unfortunately, the standard derived encoding and decoding for this opaque type, with the current JSON formatting implementation, would expect JSON messages that look like

{ "Success": [{ "message": string }] } | { "Failure": [{ "code": number, "message": string }] }
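
To make the mismatch concrete, here is a minimal Python sketch of the two layouts. It is only an illustration, not the Roc JSON implementation: `encode_external` and `encode_inline` are hypothetical helper names, and lowercasing the tag name is an assumption taken from the schema above.

```python
import json


def encode_external(tag: str, payload: dict) -> str:
    """Layout the derived encoder is described as producing:
    the tag name becomes the key and the payloads are wrapped in a list."""
    return json.dumps({tag: [payload]})


def encode_inline(tag: str, payload: dict, key: str = "kind") -> str:
    """Layout the existing system expects: the discriminant is stored
    in a field alongside the payload's own fields."""
    return json.dumps({key: tag.lower(), **payload})


print(encode_external("Success", {"message": "ok"}))
# {"Success": [{"message": "ok"}]}
print(encode_inline("Success", {"message": "ok"}))
# {"kind": "success", "message": "ok"}
```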

One way to deal with this is to define a JSON formatting implementation that has some kind of configuration option to specify how certain discriminants should be encoded. For example, maybe the JSON implementation defines

interface Json exposes [JsonFormatting, TagEncoding, format]

JsonFormatting := { encodingsForTags: Dict Str TagEncoding, <other fields...> }

TagEncoding : [
  Default, # wrap the tag payloads as an array keyed by the tag name, like `{ "Success": [{ "message": string }] }`
  InlineAsSingleton { key: Str, value: Str }, # inline the discriminant in the payload, if it's unary, like `{ "type": "success", "message": string }`
]

format : Dict Str TagEncoding -> JsonFormatting
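
As a rough sketch of how such a per-tag configuration could behave, the following Python mirrors the proposed `encodingsForTags` dictionary. Everything here is hypothetical: `encode_tag` and the `InlineAsSingleton` dataclass are stand-ins for the Roc definitions above, and a tag missing from the dictionary is assumed to mean `Default`.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class InlineAsSingleton:
    """Inline the discriminant into the payload as `key: value`."""
    key: str
    value: str


def encode_tag(tag: str, payload: Dict[str, Any],
               encodings_for_tags: Dict[str, InlineAsSingleton]) -> str:
    enc: Optional[InlineAsSingleton] = encodings_for_tags.get(tag)
    if enc is None:
        # Default: wrap the payload in a list keyed by the tag name.
        return json.dumps({tag: [payload]})
    if enc.key in payload:
        # The discriminant must not collide with a real payload field.
        raise ValueError(f"payload already has a {enc.key!r} field")
    return json.dumps({enc.key: enc.value, **payload})


config = {
    "Success": InlineAsSingleton(key="type", value="success"),
    "Failure": InlineAsSingleton(key="type", value="failure"),
}
print(encode_tag("Failure", {"code": 404, "message": "not found"}, config))
# {"type": "failure", "code": 404, "message": "not found"}
print(encode_tag("Other", {"x": 1}, config))
# {"Other": [{"x": 1}]}
```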

Another option is to write a custom JSON encoder/decoder for this type, but that loses the advantages of the Encoding/Decoding abilities and usage of JSON encoding/decoding packages in the ecosystem.

Status := { type : Str, code : Result U64 {}, message : Str }

I wonder if there is something we can do at the language level to better support this kind of pattern, as I believe it occurs quite often.

type Status = { type : 'success', message: string } | { type : 'failure', code: number, message: string }

import typing as t

class Success(t.TypedDict):
  type: t.Literal["success"]
  message: str

class Failure(t.TypedDict):
  type: t.Literal["failure"]
  code: int
  message: str

Status = t.Union[Success, Failure]

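use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
#[serde(tag = "type")] // internally tagged: the discriminant is stored in the "type" field
enum Status {
    #[serde(rename = "success")] // per-variant names use `rename`, not `tag`
    Success { message: String },
    #[serde(rename = "failure")]
    Failure { message: String, code: u64 },
}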

Ayaz Hafiz (Mar 30 2023 at 16:57):

One option is to extend the Encoding/Decoding ability API to take a parameter like the TagEncoding described above in the JSON module, for example

# [tag tagName payloads tagEncoding] encodes a tag variant and its payload types
tag : Str, List (Encoder fmt), TagEncoding -> Encoder fmt | fmt has EncoderFormatting

and then have some kind of new annotation syntax you can use when deriving Encoding/Decoding for a type, à la serde's use of Rust attributes or Go's tags on struct fields. For example

Status := [Success { message: Str } , Failure { code : U64, message : Str }] has [Encoding with {
  tagEncoding: InlineDiscriminant,
  renameTags : when tag is
    Success -> "success",
    Failure -> "failure",
}]

which are only applicable for builtin types. I haven't really thought this through, but on the surface it doesn't strike me as a good idea.

Richard Feldman (Mar 30 2023 at 17:12):

TagNameTransformation : [
    ToSnakeCase,
    ToKebabCase,
    ToCamelCase,
    None,
]

TagNameStrategy : [
    # e.g. [Success { message : Str }] ==> { "Success": [{ "message": string }] }
    BecomeFieldName TagNameTransformation,

    # e.g. [Success { message : Str }] ==> { "kind": "Success", "message": string }
    StoredInField Str TagNameTransformation,
]

format : TagNameStrategy -> JsonFormatting
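
A small Python sketch of what one payload-wide policy like this might do. The tuple encoding of the strategy, the string names for the transformations, and the helpers `transform` and `encode` are all assumptions made for illustration. The word splitting is naive: it treats every capital letter as a word start, so an acronym-style tag such as `HTTPError` would be split letter by letter.

```python
import re
from typing import Any, Dict, Tuple


def transform(name: str, how: str) -> str:
    """Apply a tag name transformation to a PascalCase tag name."""
    words = re.findall(r"[A-Z][a-z0-9]*", name)
    if not words or how == "none":
        return name
    if how == "snake":
        return "_".join(w.lower() for w in words)
    if how == "kebab":
        return "-".join(w.lower() for w in words)
    if how == "camel":
        return words[0].lower() + "".join(words[1:])
    raise ValueError(f"unknown transformation: {how}")


def encode(tag: str, payload: Dict[str, Any], strategy: Tuple[str, ...]) -> Dict[str, Any]:
    """Build the JSON-shaped dict for one tag under the chosen strategy."""
    if strategy[0] == "become_field_name":
        _, how = strategy
        return {transform(tag, how): [payload]}
    if strategy[0] == "stored_in_field":
        _, field, how = strategy
        return {field: transform(tag, how), **payload}
    raise ValueError(f"unknown strategy: {strategy[0]}")


print(encode("NotFound", {"message": "gone"}, ("become_field_name", "none")))
# {'NotFound': [{'message': 'gone'}]}
print(encode("NotFound", {"message": "gone"}, ("stored_in_field", "kind", "snake")))
# {'kind': 'not_found', 'message': 'gone'}
```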

Richard Feldman (Mar 30 2023 at 17:15):

in my experience at least, it's most common to encounter a given JSON payload that has a consistent policy like this

Richard Feldman (Mar 30 2023 at 17:16):

e.g. all field names are kebab-case or snake_case or camelCase, discriminants are handled the same way throughout the payload, etc.

Ayaz Hafiz (Mar 30 2023 at 17:18):

yeah, that works as long as you use different formattings when the behavior changes

Richard Feldman (Mar 30 2023 at 17:18):

right, but in my experience a given payload is going to be consistent within itself and not mix & match

Richard Feldman (Mar 30 2023 at 17:18):

and you already need to specify a formatting per payload that you want to encode or decode

Ayaz Hafiz (Mar 30 2023 at 17:19):

In the specific case I'm considering, there are instances where there are unions discriminated by a key, and cases where they are not (the unions are just disjoint without a common key name)

Richard Feldman (Mar 30 2023 at 17:19):

Ayaz Hafiz (Mar 30 2023 at 17:21):

MyType1 : ... # discriminate on "type"

MyType2 : ... # no discriminant

..

MyCompoundType : ... # deeper references to MyType1 and MyType2

Ayaz Hafiz (Mar 30 2023 at 17:21):

So the payloads are disjoint, but they end up being used as part of a larger structure - if that answers your question

Richard Feldman (Mar 30 2023 at 17:22):

hm, do you have JSON of MyCompoundType? or just separate JSON payloads for MyType1 and MyType2, and then the outputs of those get put into MyCompoundType on the TypeScript side?

Ayaz Hafiz (Mar 30 2023 at 17:23):

Richard Feldman (Mar 30 2023 at 17:24):

Ayaz Hafiz (Mar 30 2023 at 17:25):

Roc would not be able to decode it if the discriminant was missing in that case, though - if you used the derived ability and the single tag encoding configuration

Richard Feldman (Mar 30 2023 at 17:26):

Ayaz Hafiz (Mar 30 2023 at 17:27):

One option is backtracking - for example, if you have [Foo {a : Str}, Bar {b : Str}] and you don't see an "a" key in the message, fall back on decoding the "Bar" variant by looking for the "b" key
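
A minimal Python sketch of that backtracking idea, assuming each variant can be recognised by a set of required keys. The `VARIANTS` table and `decode_untagged` are hypothetical names. Variants are tried in order, so a message containing both "a" and "b" would decode as `Foo`.

```python
import json
from typing import Any, Dict, List, Set, Tuple

# Each variant of [Foo {a : Str}, Bar {b : Str}] with the keys it requires.
VARIANTS: List[Tuple[str, Set[str]]] = [("Foo", {"a"}), ("Bar", {"b"})]


def decode_untagged(text: str) -> Tuple[str, Dict[str, Any]]:
    """Try each variant in turn and return the first whose keys are all present."""
    obj = json.loads(text)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    for tag, required in VARIANTS:
        if required.issubset(obj.keys()):
            return tag, {key: obj[key] for key in required}
    raise ValueError("no variant matches the message")


print(decode_untagged('{"a": "hello"}'))
# ('Foo', {'a': 'hello'})
print(decode_untagged('{"b": "world"}'))
# ('Bar', {'b': 'world'})
```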

Ayaz Hafiz (Mar 30 2023 at 17:29):

Another practical option is to have your platform encode/decode the type appropriately if it's a case like this, since in practice you'll probably be getting these messages over a network or other effectful operation.

Richard Feldman (Mar 30 2023 at 17:30):

Ayaz Hafiz (Mar 30 2023 at 17:32):

If you hand-write a decoder, you would either need the Decoding API to be extended to take a tag configuration option, or you wouldn't be able to parameterize over Decoding formatters arbitrarily, right? Like, you would have to write a decoder specifically for the JSON case, unless Decode.tag took a tag configuration option

Richard Feldman (Mar 30 2023 at 17:32):

right, but I think having the platform encode/decode the type has the same drawback, yeah?

Ayaz Hafiz (Mar 30 2023 at 17:33):

Richard Feldman (Mar 30 2023 at 17:35):

is there some way the Decoding API could be changed to make this possible without either of the following being true?

Ayaz Hafiz (Mar 30 2023 at 17:58):

yeah I'm not sure. Naively I would say do what serde does and have some way to specify whether it's tagged/untagged in Decode.tag/Encode.tag, but I'm not sure that generalizes to arbitrary formattings. Maybe formattings can ignore that if it's not applicable.

Richard Feldman (Mar 30 2023 at 18:07):

Richard Feldman (Mar 30 2023 at 18:08):

one of the things I think about with encoding/decoding designs is "if you want to use this, do you have to convert all your structural types into nominal types just to get the encoding/decoding behavior you want?"

Richard Feldman (Mar 30 2023 at 18:08):

Ayaz Hafiz (Mar 30 2023 at 18:12):

well, changing the decoding API doesn't need to relate to nominal/structural typing at all. For example, similar to the API I first described, maybe you can define Encode.tag as

Encode.tag : Str, List (Encoder fmt), TagEncodingStrategy, TagNaming -> Encoder fmt | fmt has EncoderFormatting

this API works equally well for structural and nominal types, except that derived implementations must choose a default encoding and naming strategy (but I don't see how they couldn't without language changes)

Martin Stewart (Mar 30 2023 at 20:27):

Could this be solved with a code generation tool? You give it a type, it generates Roc decoders/encoders for you. Then if you need to tweak it, it's easy since it's just user code at that point.

Richard Feldman (Mar 30 2023 at 23:28):

that's an interesting direction...I've been thinking about how roc glue could be expanded to work on arbitrary interface modules instead of just platform modules

Richard Feldman (Mar 30 2023 at 23:29):

and then you could give it a glue spec to generate client encoders and decoders (e.g. in Elm or TypeScript)

Richard Feldman (Mar 30 2023 at 23:29):

Richard Feldman (Mar 30 2023 at 23:32):

Richard Feldman (Mar 30 2023 at 23:33):

Richard Feldman (Mar 30 2023 at 23:34):

that only works for nominal types, since with structural types you'd be saying "here's the shape of my type, and any time in the entire program that any type happens to have this shape, encode it as follows" which (even if abilities worked that way) wouldn't be a great design :sweat_smile:

Richard Feldman (Mar 30 2023 at 23:35):

but then again, to be fair - because of how opaque types work in Roc, it's pretty easy to define an opaque wrapper just for serialization/deserialization and then say "ok now unwrap it and that's the type I'll actually use"
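
The wrapper pattern can be sketched in Python as follows. This is only an analogy for a Roc opaque type: `StatusWire`, its methods, and the `("Tag", payload)` tuple standing in for the structural type are all hypothetical, as is the choice of "kind" for the discriminant field.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Tuple

# Stand-in for the structural type the rest of the program works with.
Status = Tuple[str, Dict[str, Any]]


@dataclass(frozen=True)
class StatusWire:
    """Wrapper that exists only to carry the custom wire format."""
    value: Status

    def to_json(self) -> str:
        tag, payload = self.value
        return json.dumps({"kind": tag.lower(), **payload})

    @classmethod
    def from_json(cls, text: str) -> "StatusWire":
        obj = json.loads(text)
        kind = obj.pop("kind")  # raises KeyError if the discriminant is missing
        return cls((kind.capitalize(), obj))


wire = StatusWire(("Failure", {"code": 7, "message": "boom"})).to_json()
print(wire)
# {"kind": "failure", "code": 7, "message": "boom"}
status = StatusWire.from_json(wire).value  # unwrap and use the plain value
print(status)
# ('Failure', {'code': 7, 'message': 'boom'})
```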

Richard Feldman (Mar 30 2023 at 23:35):

Richard Feldman (Mar 30 2023 at 23:36):

I guess it's annoying if you have a nested data structure which stores a structural type, and you want to serialize the whole data structure while getting some custom behavior for one of its nested structural types

Richard Feldman (Mar 30 2023 at 23:36):

but you want to specify that custom encoding/decoding behavior in terms of the nested type, not the nested data structure that contains it

Ayaz Hafiz (Mar 30 2023 at 23:39):

i feel like that would only really come up during prototyping though. If these are messages over a network you’re likely to explicitly type their structure, at which point yeah there might be little overhead to use an opaque type there

Ayaz Hafiz (Mar 30 2023 at 23:40):

in any case, this distinction for serde’s case only matters with regard to annotations that tell it how to encode/decode the type, right? which would have to be a new language addition to Roc if it were to be reflected, which i think we would prefer not to do?

Richard Feldman (Mar 30 2023 at 23:42):

Richard Feldman (Mar 30 2023 at 23:45):

Richard Feldman (Mar 30 2023 at 23:50):

a thing I also don't love about it is that it feels like something to add to the language almost exclusively for JSON specifically, which is a category of language design decision that tends not to age well (e.g. Scala having baked-in syntax for XML, which was about as popular a serialization format at the time as JSON is now)

Richard Feldman (Mar 30 2023 at 23:50):

like binary formats don't care about this, and I assume XML would have explicit discriminants...feels like a JSON-specific thing

Ayaz Hafiz (Mar 30 2023 at 23:52):

Richard Feldman (Mar 30 2023 at 23:52):

Martin Stewart (Mar 31 2023 at 11:16):

What I had in mind was an editor tool. You'd create a type representing the external data you want to decode, and then you press some hotkey to generate a JSON decoder for it. After that it's just user code so you don't regenerate it unless you decide later you want to throw away your old decoder and start again. In other words, no complicated build process.

Stream: ideas

Topic: Supporting discriminants when encoding/decoding unions

Ayaz Hafiz (Mar 30 2023 at 16:47):
