Stream: contributing

Topic: Decoding Json fields


Luke Boswell (Apr 19 2023 at 22:12):

How should we handle decoding field strings with characters that are not valid in Roc identifiers, e.g. capital letters, or symbols such as @ or -?

Say we want to decode into this record: { fruit: 2, owner: "Farmer Joe" }. Using the PR branch, this input succeeds: " {\n\"fruit\"\t:2\n, \"owner\": \"Farmer Joe\" } ", and this input fails because of the capitalised field name: " {\n\"Fruit\"\t:2\n, \"owner\": \"Farmer Joe\" } ".

One option I see is to actually decode into a Dict Str a instead and then have another method to convert the output to a record? This might also be a good way to handle input that succeeds in part.

@Ayaz Hafiz did you have a particular design in mind when you implemented this? I feel like there may be an obvious solution staring at me here, I'm not sure I fully grok the full capability of the below. Any assistance would be greatly appreciated.

## `record state stepField finalizer` decodes a record field-by-field.
##
## `stepField` returns a decoder for the given field in the record, or
## `Skip` if the field is not a part of the decoded record.
##
## `finalizer` should produce the record value from the decoded `state`.
record : state, (state, Str -> [Keep (Decoder state fmt), Skip]), (state -> Result val DecodeError) -> Decoder val fmt | fmt has DecoderFormatting

Ayaz Hafiz (Apr 19 2023 at 23:13):

Yeah this is a hard problem. I previously brought up something similar in #ideas > Supporting discriminants when encoding/decoding unions

Ayaz Hafiz (Apr 19 2023 at 23:13):

I think for now we could have the Json config take a mapping of how fields should be renamed when the Json object is constructed

Ayaz Hafiz (Apr 19 2023 at 23:15):

I know we’ve also talked about allowing more kinds of cases in record field names, but I think at present that only includes allowing underscores in field names. There are other ideas like having Decode.record take a renaming strategy, or adding some language level way to say a field should be renamed (which i personally think is a bad idea), but those are all separate discussions

Ayaz Hafiz (Apr 19 2023 at 23:16):

anyway for now I think adding an optional mapping to the config of Json would unblock users. And folks can fall back to decoding a Dict if need be as well.

Luke Boswell (Apr 19 2023 at 23:25):

Ok, I'll need to read this through more thoroughly. Do you have any thoughts on #5294? If this is easy enough to implement with our current design, then it sounds like we can use Dicts and have users convert to a record.

Luke Boswell (Apr 19 2023 at 23:27):

Another question: is there anything here that the proposed Record Builder syntax might help with? It feels similar, but I'm not sure.

Ayaz Hafiz (Apr 19 2023 at 23:28):

I think we should add a decodeDict ability member as you describe as it would alleviate the problem

Ayaz Hafiz (Apr 19 2023 at 23:32):

record builder makes it easier to write effectful decoders i think but i don’t think it changes the Encode/decode api here

Ayaz Hafiz (Apr 20 2023 at 00:15):

Oh wait. I think decodeDict can already be implemented in terms of “record”

Luke Boswell (Apr 20 2023 at 06:58):

Ok, so I have it working in a crude form. I'm going to play with it some more to see what API I think feels nice. Just thought I would share what I have so far.

I'm thinking the map could be a Dict Str Str instead, though I thought I might try a record first as it is probably more descriptive.

The current API only supports a single mapping, so I'll need to update this to a List to support multiple, or use something like a Dict.

DecodeOption : [
    Default,

    # e.g. map JSON Object name "Fruit" to Roc Record field "fruit"
    DecodeMap { objectName : Str, recordField : Str },
]

Json := {
    decodeOption : DecodeOption,
} has [... stuff]

# Create a Json decoder with options
fromUtf8WithOptions = \{decodeOption? Default} ->
    @Json { decodeOption }

# Test decode of a record, ignoring whitespace, with the JSON object name "Fruit" mapped to the field "fruit"
expect
    input = Str.toUtf8 " {\n\"Fruit\"\t:2\n, \"owner\": \"Farmer Joe\" } "
    decodeOption = DecodeMap {recordField : "fruit", objectName : "Fruit" }

    actual = Decode.fromBytesPartial input (fromUtf8WithOptions {decodeOption})
    expected = Ok { fruit: 2, owner: "Farmer Joe" }

    actual.result == expected

Luke Boswell (Apr 20 2023 at 07:13):

Or maybe it could take a function Str -> Str which maps between them? Then different strategies can be thought of and implemented on the user side of the API. For example, if I wanted to capitalise or camel case the names, I would provide a function which takes the Roc field name and transforms it to the expected Json Object name. Probably better than hard coding all the possible fields. Eventually there should be Unicode packages which can help with things like toCamelCase, toKebabCase etc.
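
As a rough illustration of the user-side function described above, here is a minimal sketch that turns a Roc field name such as "fruit" into the JSON object name "Fruit". The name `capitaliseFirst` is hypothetical and not part of the PR, and the code assumes Basic Latin (ASCII) input only; any other name is returned unchanged.

```roc
# Hypothetical user-side mapping from a Roc field name to a JSON object name.
# Assumption: Basic Latin (ASCII) only; anything else is returned unchanged.
capitaliseFirst : Str -> Str
capitaliseFirst = \fieldName ->
    bytes = Str.toUtf8 fieldName

    when List.first bytes is
        # 97 and 122 are the ASCII codes for 'a' and 'z'
        Ok first if first >= 97 && first <= 122 ->
            # Upper and lower case letters are 32 apart in ASCII
            bytes
            |> List.set 0 (first - 32)
            |> Str.fromUtf8
            |> Result.withDefault fieldName

        _ -> fieldName

expect capitaliseFirst "fruit" == "Fruit"
expect capitaliseFirst "Owner" == "Owner"
```

Only the first byte is changed, so a name like "thumbnailUrl" would become "ThumbnailUrl"; strategies such as kebab case would need to walk the whole name.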

Luke Boswell (Apr 20 2023 at 07:19):

^^ this was inspired by Richard's tag name strategy linked in the previous post above. But applied to Records and Decoding instead of Tag Encoding.

Richard Feldman (Apr 20 2023 at 12:24):

yeah I think the common strategies are:

Richard Feldman (Apr 20 2023 at 12:24):

if JSON had those listed as options, it feels like that would cover 99% of use cases

Richard Feldman (Apr 20 2023 at 12:25):

I could see an argument for Custom (Str -> Str) as another option there
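
One way a `Custom (Str -> Str)` option might sit alongside the other options is sketched below. The names `FieldNameMapping` and `toObjectName` are assumptions made for illustration, not the API in the PR, and only `Default` and `Custom` are shown; any named strategies would be further tags in the same union.

```roc
# Hypothetical configuration type; only `Default` and `Custom` are sketched.
FieldNameMapping : [
    Default,
    Custom (Str -> Str),
]

# Turn a Roc record field name into the JSON object name to look for.
toObjectName : FieldNameMapping, Str -> Str
toObjectName = \mapping, fieldName ->
    when mapping is
        Default -> fieldName
        Custom transform -> transform fieldName

expect toObjectName Default "fruit" == "fruit"
expect toObjectName (Custom (\name -> Str.concat "x-" name)) "owner" == "x-owner"
```

Keeping `Custom` as one tag among several would let common cases stay declarative while still giving users an escape hatch for unusual naming schemes.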

Luke Boswell (Apr 20 2023 at 12:30):

Thank you, I'll have a go at implementing that.

Luke Boswell (Apr 20 2023 at 12:33):

@Richard Feldman did you have any preference for using a Dict, or would a List of these tags be the best way to go? Maybe Dict Str OptionTagsHere or something for configuration. The other issue is that at the moment we don't have a way to convert between uppercase and lowercase. I can make something quick and dirty for Basic Latin, but we haven't got Unicode support or anything yet.

Richard Feldman (Apr 20 2023 at 16:02):

hm yeah good point about uppercasing being locale-specific - I forgot about that :thinking:

Luke Boswell (Apr 21 2023 at 01:06):

Just pushed some updates to that PR. It's not ready for merge as it will need the fixes @Brian Carroll is working on. However, I thought I would share that it can now fully decode the following, which is pretty neat. This includes a strategy for mapping the field names, which only supports Basic Latin (until we have a Unicode package).

# Test complex example from IETF RFC 8259 (2017)
expect
    input =
        """
        {
            "Image": {
                "Width":  800,
                "Height": 600,
                "Title":  "View from 15th Floor",
                "Thumbnail": {
                    "Url":    "http://www.example.com/image/481989943",
                    "Height": 125,
                    "Width":  100
                },
                "Animated" : false,
                "Ids": [116, 943, 234, 38793]
            }
        }
        """
        |> Str.toUtf8

    decoder = jsonWithOptions { fieldNameMapping: PascalCase }
    actual = Decode.fromBytes input decoder
    expected = Ok {
        image: {
            width: 800,
            height: 600,
            title: "View from 15th Floor",
            thumbnail: {
                url: "http://www.example.com/image/481989943",
                height: 125,
                width: 100,
            },
            animated: Bool.false,
            ids: [116, 943, 234, 38793],
        },
    }

    actual == expected

Ayaz Hafiz (Apr 21 2023 at 01:17):

abilities are crazy :mind_blown: very cool Luke!


Last updated: Jul 05 2025 at 12:14 UTC