Stream: ideas

Topic: Revamped Encode and Decode


view this post on Zulip Brendan Hansknecht (Jun 26 2024 at 04:39):

Though the current iterations of Encode and Decode have served us well, they have restrictions that make them less than ideal. While they are roughly based off of serde from rust, they are not powerful enough to represent many of the patterns that serde can.

State in Roc

For Decode in roc, it is fundamentally a list of bytes along with a formatting config being converted into a result of any type. That said, the error type is only allowed to be TooShort.

For Encode in roc, it is any type with a formatting config to a list of bytes.

Roughly, how does that compare to serde?

For deserialization (decode), serde is any type (that a deserializer uses) to a result of any type with any error.

For serialization (encode), serde is any type to a result of any type (that a serializer uses) with any error.

Roc is missing a ton of flexibility

The core missing piece is that Encode and Decode need to be able to define an internal state and use that to generate input/output. They can not be restricted to List U8. They can not be restricted in allowed error variants.

What would this look like?

Luckily, we already have a perfect template for what the correct interface would look like (at least for Encode). Inspect is an ability that can take any type as input and allows the specific ability implementation to define exactly what the output is. I would argue that today, Inspect is a better Encode than Encode is. A perfect example of this is the GuiFormatter. Though a simple proof of concept, it is able to "encode" any data into a tree structure that can be rendered in a gui. It does this by allowing the implementation of GuiFormatter to define its own internal state that is built up with every call and eventually exposed as a tree.

That said, Inspect does not replace Encode. We can't just map one to one. Inspect has a different set of autoderive rules and defaults related to opaque types and functions that don't make sense in encode. On top of that, both Encode and Decode likely should enforce the final return value being a Result of something. So we still need a bespoke interface.

Important Note

One minor loss of the Inspect api is that without higher kinded types, it is not possible to have a generic Inspect.inspect method that returns the final wanted output type. Instead, inspect returns the Formatter which must be unwrapped by a different function.

That said, I don't see this as a real drawback at all. It just means that the Json library will create a wrapper function that when called generates the formatter and then unwraps it. So the api will be Json.decode someJson instead of Decode.decode someJson. Decode.decode someJson would still work, but it would output a JsonDecoder instead of the final wanted roc type.

Proposed API

I don't want to write out this entire api, so just gonna note the core of the change.

Instead of:
Encoder fmt := List U8, fmt -> List U8 where fmt implements EncoderFormatting
Where we strictly have a List U8

It would be:
Encoder fmt err := fmt -> Result fmt err where fmt implements EncoderFormatting
The fmt is able to store any data type that it likes.
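
A rough model of that Encoder shape, written in Python rather than Roc only so it can be run. Every name here (JsonState, encode_u8, raw, chain) is hypothetical and not part of any Roc or roc-json API. The point is just that an encoder is a function from formatter state to either a new state or an error, and the formatter decides what it stores:

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Ok:
    value: "JsonState"


@dataclass(frozen=True)
class Err:
    error: object


# Hypothetical formatter state: it owns both its output and its config.
@dataclass(frozen=True)
class JsonState:
    out: str = ""
    indent: int = 0  # unused below; only shows config living in the state


# Mirrors `Encoder fmt err := fmt -> Result fmt err`.
Encoder = Callable[[JsonState], Union[Ok, Err]]


def raw(text: str) -> Encoder:
    # Append literal text; this encoder can never fail.
    return lambda state: Ok(JsonState(state.out + text, state.indent))


def encode_u8(n: int) -> Encoder:
    def run(state: JsonState) -> Union[Ok, Err]:
        if not 0 <= n <= 255:
            return Err(("OutOfRange", n))  # the error type is the formatter's choice
        return Ok(JsonState(state.out + str(n), state.indent))
    return run


def chain(first: Encoder, second: Encoder) -> Encoder:
    # Run `second` on the state produced by `first`, stopping at the first error.
    def run(state: JsonState) -> Union[Ok, Err]:
        res = first(state)
        return second(res.value) if isinstance(res, Ok) else res
    return run
```

Nothing in chain knows about bytes. Swapping JsonState for a tree or a byte buffer only changes the leaf encoders.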

Note: I am actually less sure this Decode change is correct. I noticed that serde had to use some sort of visitor builder pattern, so we may need to follow suit. But am not really sure.... Just would need to tinker with types.

Decoder would similarly be:
Decoder val fmt err := fmt -> Result { value: val, rest: fmt } err where fmt implements DecoderFormatting
where fmt is initialized with what is being decoded and would store all the internal state.
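
The same idea for the Decoder shape, again as a Python model with made-up names (BytesState, decode_u8, decode_pair), and with the caveat from the note above that the real types may need to change. The state is initialized with the input, and each decoder returns either an error or the value plus the remaining state:

```python
from dataclasses import dataclass


# Hypothetical decoder state: the input plus a cursor into it.
@dataclass(frozen=True)
class BytesState:
    data: bytes
    pos: int = 0


def decode_u8(state):
    """Return ("Ok", value, rest) or ("Err", reason)."""
    if state.pos >= len(state.data):
        return ("Err", "TooShort")
    return ("Ok", state.data[state.pos], BytesState(state.data, state.pos + 1))


def decode_pair(first, second):
    # Thread the `rest` state of the first decoder into the second one.
    def run(state):
        r1 = first(state)
        if r1[0] == "Err":
            return r1
        r2 = second(r1[2])
        if r2[0] == "Err":
            return r2
        return ("Ok", (r1[1], r2[1]), r2[2])
    return run
```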

From that change, some other minor api differences would be needed, but it would mostly be the same, just with corrected types.


One important difference from Inspect and Encode/Decode. Inspect requires an init method that generates a formatter. I don't think Encode/Decode should have the same requirement. Instead, each api will decide what is required for initialization and expose that. For example, a JsonFormatter may require config options for initialization. So it would have Json.encode : x, Config -> List U8. Where it takes a piece of data to encode, a set of configuration options and then outputs a list of bytes. Internally, it would deal with initialization, running encode, extracting the List U8 result from its formatter, and unwrapping the result (I'm assuming the error type was chosen to be [])


Extra minor note, when we do this change, we should also add a special encoding/decoding method for at least dictionaries. Other options that we should consider adding are listed in the Deserializer trait required methods.

view this post on Zulip Sam Mohr (Jun 26 2024 at 04:53):

It seems like the majority of users would still have a very easy time using this API, but I just want to confirm that: we would still autoderive Encode and Decode, and it would be one extra function call, but otherwise basically the same effort to turn JSON into a Roc datatype?

view this post on Zulip Brendan Hansknecht (Jun 26 2024 at 04:56):

I guess two extra calls (though realistically, they would be all wrapped behind a single function call)

# Json.roc

encode : x, Config -> List U8
encode = \val, config ->
  initEncoder config
  |> Encode.encode val
  |> unwrapEncoder

view this post on Zulip Brendan Hansknecht (Jun 26 2024 at 04:57):

And yeah still autoderive in the same cases as today.

view this post on Zulip Sam Mohr (Jun 26 2024 at 04:59):

What is unwrapEncoder doing there?

view this post on Zulip Luke Boswell (Jun 26 2024 at 05:00):

Is roc-json the only implementation of Encode/Decode in the wild? If so it shouldn't be too hard to update just that one package and avoid breaking lots of things for users. We need a way to test these changes and roc-json would probably be a good choice to have a pair PR developed.

view this post on Zulip Brendan Hansknecht (Jun 26 2024 at 05:45):

unwrapEncoder : Result JsonEncoder [] -> List U8
unwrapEncoder = \res ->
    when res is
        Ok (@JsonEncoder {bytes}) -> bytes

view this post on Zulip Brendan Hansknecht (Jun 26 2024 at 05:46):

This is a case where I assume that json encoding chose to have no error types at all. So its error tag is []. And I am assuming JsonEncoder := {bytes: List U8, config: Config}.

view this post on Zulip Luke Boswell (Jun 26 2024 at 06:53):

Slightly related. But for now I think we might want to throw an error if we are trying to encode something that we can't decode. For example with #5294 we can't decode a Dict. I'd love it if we could include that functionality when we make this larger change, though not sure if that would only complicate things.

view this post on Zulip Brendan Hansknecht (Jun 26 2024 at 14:55):

Yeah. That should be doable with this change. Both encode and decode have the option to return errors.

view this post on Zulip Brendan Hansknecht (Jun 26 2024 at 14:56):

It is up to the specific encoder/decoder to pick what errors it might generate

view this post on Zulip Trevor Settles (Jun 28 2024 at 03:08):

I've started on something in PR 6846. So far it's just been brute force changes, after adjusting the types

view this post on Zulip Brendan Hansknecht (Jul 06 2024 at 18:44):

As I am working on sqlite in basic webserver, I realized that even these revamped versions of decode/encode are potentially missing a big feature.

I'm not sure if it fits in standard decode or makes more sense in a separate ability (or always using bespoke code), but effectual decoding would be huge. For SQLite, I have a stmt and would like to decode directly from it. The decoding would be running effects to continue executing the query and load the data.

I'm sure the same would happen with many network protocols where you may want to decode from a network stream instead of waiting for all bytes to be loaded into a roc list before beginning to decode.
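
As a small illustration of the streaming case, here is a Python sketch (not Roc, and not tied to any real protocol) that decodes length-prefixed frames from an iterator of chunks. The one-byte length prefix and the name decode_frames are assumptions of the sketch. Pulling the next chunk is where the effect would happen, and frames come out before the whole input has arrived:

```python
def decode_frames(chunks):
    """Yield frames as soon as enough bytes have arrived.

    Toy wire format: one length byte followed by that many payload bytes.
    """
    buf = b""
    for chunk in chunks:  # each pull could be a network read or a query step
        buf += chunk
        # Emit every complete frame currently sitting in the buffer.
        while buf and len(buf) >= 1 + buf[0]:
            size = buf[0]
            yield buf[1 : 1 + size]
            buf = buf[1 + size :]
```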

view this post on Zulip Brendan Hansknecht (Jul 06 2024 at 18:44):

Hmm... actually, if the state was a task, would that enable effectful decoding? I think it would, but I also expect that it would fail to compile with the current compiler

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 00:34):

One other related update to this api to suggest. This is already done in Inspect.

I think we should make encode/decode have a function for both lists and dictionaries.
I also think that they should avoid being type locked to List a and Dict k v. Instead, they will take in any value and a walk function. So list would be an element walk function:

ElemWalker state collection elem : collection, state, (state, elem -> state) -> state

This way a List a, Set a, any other Container a can be encoded/decoded as a list.
For decode, it will instead take a container builder function (like List.append).
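
A Python sketch of the walker idea, with hypothetical names throughout (walk_list, walk_sorted_set, encode_sequence, decode_sequence) and a toy comma-separated text format. The encoder only sees a walk function, so a list and a set go through the same path, and the decoder only sees an initial container and an append-style builder:

```python
def walk_list(xs, state, step):
    for x in xs:
        state = step(state, x)
    return state


def walk_sorted_set(s, state, step):
    # Sets have no order; sorting is a choice of this sketch for stable output.
    for x in sorted(s):
        state = step(state, x)
    return state


def encode_sequence(collection, walker, encode_elem):
    """Encode any walkable collection as "[a,b,c]" without building a list first."""
    def step(state, elem):
        out, count = state
        sep = "," if count else ""
        return (out + sep + encode_elem(elem), count + 1)

    out, _ = walker(collection, ("", 0), step)
    return "[" + out + "]"


def decode_sequence(text, init, append, decode_elem):
    """Decode "[a,b,c]" into whatever container `init` and `append` build."""
    state = init
    body = text.strip()[1:-1]
    for part in (body.split(",") if body else []):
        state = append(state, decode_elem(part))
    return state
```

For example, decoding into a set only needs a different builder: decode_sequence("[1,2,3]", frozenset(), lambda s, x: s | {x}, int).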

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 00:35):

In serde, they call these two encoding types seq and map. They are used for any sort of container / dict type structure.

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 00:38):

Serde api is built differently, but for useful context:

// Seq
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        let mut seq = serializer.serialize_seq(Some(self.len()))?;
        for element in self {
            seq.serialize_element(element)?;
        }
        seq.end()
    }

// Map
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        let mut map = serializer.serialize_map(Some(self.len()))?;
        for (k, v) in self {
            map.serialize_entry(k, v)?;
        }
        map.end()
    }

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 02:41):

https://gist.github.com/bhansconnect/3854d9bc5951ea306e777745a92ec924
cc: @Trevor Settles

I am not 100% sure all of this is correct, but I think it is close. Also including changes from #API Design > Encode with list manifesting, I think this is what our Encode.roc and Decode.roc should be (maybe with an extra helper function or two). The types are a bit more complex in some places to deal with allocations and to make it more flexible.

Specifically, any list like collection can encode/decode the same as a list. Currently, I named this Encode.seq for any sequence. Just matched serde here. But names of course can change. Same with any dict like collection. They can now encode/decode the same as a dict via Encode.map. Again, took the serde name. We can change it.

Then also removed all List ... from the api. Instead requiring the use of lambdas that load values and build the type out. This avoids an allocation and instead makes things a few direct calls to build up the encoder.


Compared to serde we are still missing a few minor pieces and require extra mapping in some cases.

  1. Serde supports named structs (honestly not sure if this matters, would only be important for opaque types if it does. I guess it would be a way to distinguish two specific opaque types with the same internal data)
  2. Serde passes both the tag name and the tag index to the constructor. We only allow for the name due to tags not being nominal types. So opaque types are required for tags to map as integer variants. That said, we don't have any way to encode an integer variant. It has to be encoded as something like { tag: U8, data: * }. I think we might need something special here that opaque types can opt into (but that definitely can be added later).
  3. Serde doesn't require field names in the serialized format to decode a record. Instead, the decoder is given a slice of field names and chooses how to use them. So it could ignore the names and just decode sequentially like a tuple (many binary formats do this) or it could choose to check the names (like json would). This is likely something we may want to change, but I'm not fully sure. Otherwise, users need to change their decoder type to use a binary format. It would have to use all tuples and no records.

Thoughts? (point 3 being the most important for opinions)

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 02:43):

Note, for 3 passing a list in isn't as big of a deal as our other lists. The other lists were lists of Encoders that could only be generated at runtime. This list would be a constant list of strings that could be compiled directly into the binary.

EDIT: I guess that would make the whole api change to be this:

    record :
        state,
        List Str,
        (state, U64 -> [Next (Decoder state state err), TooLong]),
        (state -> Result val err)
        -> Decoder state val err where state implements DecoderFormatting

Basically the exact same as a tuple, but you also pass in a List Str of field names. If the implementation encodes/decodes field names, it can look up the name in the list to get the index. If it doesn't encode field names, it can just ignore them.

EDIT 2: Since I realized this is simple, I just put it in the gist as well.
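
To illustrate the two ways a format can treat that list of field names, here is a Python sketch with invented helpers (decode_record_json, decode_record_binary, build_rgb) and a toy one-byte-per-field binary layout. It is not the gist's API, only the lookup-or-ignore idea:

```python
import json


def decode_record_json(text, field_names, build):
    """Name-checking format: look each key up in field_names to get its slot index."""
    slots = [None] * len(field_names)
    for key, value in json.loads(text).items():
        if key not in field_names:
            return ("Err", ("UnknownField", key))
        slots[field_names.index(key)] = value
    return build(slots)


def decode_record_binary(data, field_names, build):
    """Positional format: ignore the names and read one byte per field, in list order."""
    if len(data) < len(field_names):
        return ("Err", "TooShort")
    return build(list(data[: len(field_names)]))


def build_rgb(slots):
    # Final step: turn filled slots into the value, or fail if a field is missing.
    if any(v is None for v in slots):
        return ("Err", "MissingField")
    b, g, r = slots  # same order as the names list passed to the decoder
    return ("Ok", {"r": r, "g": g, "b": b})
```

With names = ["b", "g", "r"], both decode_record_json('{"r": 1, "g": 2, "b": 3}', names, build_rgb) and decode_record_binary(bytes([3, 2, 1]), names, build_rgb) produce the same record.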

view this post on Zulip Luke Boswell (Jul 08 2024 at 02:46):

Have we got a design for encoding/decoding Dicts?

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 02:47):

Yep! It's in that gist I posted.

view this post on Zulip Luke Boswell (Jul 08 2024 at 02:47):

Ohk, having trouble seeing it

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 02:47):

https://gist.github.com/bhansconnect/3854d9bc5951ea306e777745a92ec924#file-decode-roc-L92-L97

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 02:49):

I called it map to match serde...It is generic. So it could work for Dict k v or OrderedMap k v or EtcMap k v

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 02:49):

maybe I should still call it dict

view this post on Zulip Luke Boswell (Jul 08 2024 at 02:49):

Ohk, I see. So Seq is for a collection that only has values, and Map is for collections with values and keys

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 02:50):

Yeah. So Seq for List a, Set a, etc

view this post on Zulip Luke Boswell (Jul 08 2024 at 02:51):

May be a bit lame, but I find this simpler

ValueInit
ValueBuilder

KeyValueInit
KeyValueBuilder

view this post on Zulip Luke Boswell (Jul 08 2024 at 02:53):

Is it not possible to also combine record, tags and tuples in the same way? Like a record or tag is just a KeyValue but with Str names, and tuples might be KeyValue but with numbers for names?

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 02:53):

The original names were:

ElemInit
ElemBuilder

KeyValueInit
KeyValueBuilder

But it doesn't quite make sense. it isn't initializing an Elem or a KeyValue. It is initializing a List elem or Dict key value

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 02:54):

Is it not possible to also combine record, tags and tuples in the same way? Like a record or tag is just a KeyValue but with Str names, and tuples might be KeyValue but with numbers for names?

No. Cause this type of combination function only works if all elements are the same type, all keys are the same type, and all values are the same type. That is not true for tuples and records.

view this post on Zulip Luke Boswell (Jul 08 2024 at 02:57):

Brendan Hansknecht said:

But it doesn't quite make sense. it isn't initializing an Elem or a KeyValue. It is initializing a List elem or Dict key value

Hence using the words Sequence or Mapping to generate one of these things.

view this post on Zulip Luke Boswell (Jul 08 2024 at 02:59):

Would you consider spelling them out in full, considering this isn't going to be seen every day. Like

SequenceInit
SequenceBuilder

MappingInit
MappingBuilder

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 03:00):

Then Encode.mapping and Encode.sequence. I think that reads really well actually. I would be for it.

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 03:07):

Updated gist with new names.

view this post on Zulip Luke Boswell (Jul 08 2024 at 03:07):

I think this API looks good.

For point 3. -- If I understand correctly the issue is that when we pass the fields of a record to be decoded they won't be ordered. I would have assumed this ordering would be stable, like sorted by field name.

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 03:09):

I think we can just order it by name or order by alignment and name. Either would be fine. Just have to pick something stable and fly with it.

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 03:09):

Given we auto generate the encoder, the auto generated encoders can just pick something and sort

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 03:10):

For user generated encoders. They can pick any order they please.

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 03:11):

It is decided by the order of the list of names passed to decode for a decoder. For encoders, it is ordered by the order in which the field encoders are called. This is 100% controllable in code.

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 03:11):

Just can't be field declaration order for roc cause that has no meaning.

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 03:12):

Personally, I think alignment then name makes the most sense cause that is also what we expose to the platform (plus for some formats, it might make a difference on bytes used just like it does on hardware)

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 03:13):

That said, if we just want simplicity, I see zero problem with alphabetical by default.
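
The two candidate orders side by side, as a Python sketch. The alignment table and the "largest alignment first" direction are assumptions made for illustration, as are the function names:

```python
# Assumed alignments in bytes for a few number types (illustrative only).
ALIGNMENT = {"U8": 1, "U16": 2, "U32": 4, "U64": 8}


def order_by_name(fields):
    """Plain alphabetical order of field names."""
    return sorted(fields)


def order_by_alignment_then_name(fields):
    """Largest alignment first, ties broken alphabetically."""
    return sorted(fields, key=lambda name: (-ALIGNMENT[fields[name]], name))
```

For { id : U64, age : U8, port : U16, count : U64 } the first gives age, count, id, port and the second gives count, id, port, age. Either is stable, which is the only hard requirement.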

view this post on Zulip Trevor Settles (Jul 08 2024 at 03:20):

This looks neat! Ok, just checking my understanding here. This new API would define a new way to encode/decode list-like things and map-like things either streaming, or all in one go. How would records and tuples fit into this? Since they don't contain a single type, but instead contain a mix of types

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 03:21):

Records and tuples still have their own special functions cause they can't use the sequence/mapping functions

view this post on Zulip Richard Feldman (Jul 08 2024 at 10:31):

Brendan Hansknecht said:

Personally, I think alignment then name makes the most sense cause that is also what we expose to the platform (plus for some formats, it might make a difference on bytes used just like it does on hardware)

yeah, although the downside would be that now application authors have to learn about alignment if they want to understand how fields are encoded :big_smile:

view this post on Zulip Richard Feldman (Jul 08 2024 at 10:32):

that said, for binary formats I could imagine that making a significant performance difference :thinking:

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 15:27):

It actually probably won't for most binary formats

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 15:27):

Most pack without any padding no matter the order

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 15:29):

So only a handful of more tailored binary formats would see benefits. Generally the binary formats that are tailored to attempt to match a language's runtime format.

view this post on Zulip Brendan Hansknecht (Jul 08 2024 at 15:30):

Also, I think it mostly would be opaque to application authors. More a funny quirk than something they have to learn about. If they want to control order they have to make an opaque type (or tuple I guess). If they have a standard record, they probably will just think that roc has a strange order and not know why....at least that would be my guess.

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 00:08):

One other random thought/question:

Should DecodeResult return the decoder state even on failure? I changed it to Result (state, val) err. My thought was that on failure to decode, the decoder may be in an invalid state. So there isn't a point of returning a failed decoder. But maybe I am missing something.

view this post on Zulip Luke Boswell (Jul 09 2024 at 00:12):

I'm not sure.

view this post on Zulip Trevor Settles (Jul 09 2024 at 00:13):

What about trying to get the results in the middle of a stream? There might be a better way around that though

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 00:18):

What about trying to get the results in the middle of a stream?

Like skipping some part of the stream and then decoding into a result?

view this post on Zulip Trevor Settles (Jul 09 2024 at 03:42):

That was in reference to adding state to the DecodeResult type. (This may be a contrived example) Say you've got a stream of bytes. Calling decode part way through the stream would give part of the List of elems for val. Calling decode on the state from Ok (state, val) would only return elems that haven't already been decoded.

I was trying to come up with some example where the act of decoding itself would cause a change in state itself. If that's not clear, it's on me. It's just the seed of an idea I just came up with

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 03:45):

state is actually already in the decode result. The difference is if you get a state on the failure case or not:

Result (state, val) err

vs
{ val: Result val err, rest: state }

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 05:17):

Also, I am now working on implementing a version of the new decoder and encoder just to make sure the types in my gist are correct. Definitely have some pieces that are off.

https://github.com/bhansconnect/roc-msgpack

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 05:20):

I've just been updating the gist as I notice issues.

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 05:51):

Something definitely still doesn't feel right with the record encoding signature. Playing around with it as I am trying to implement the encoder.

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 06:33):

This is my current iteration of record encoding (not sure if this is the best way, but it avoids the list allocation):

TestRGB := { r : U8, g : U8, b : U8 }
    implements [
        FutureEncoding {
            toFutureEncoder: toFutureEncoderTestRGB,
        },
    ]

toFutureEncoderTestRGB : TestRGB -> FutureEncoder state
toFutureEncoderTestRGB = \@TestRGB { r, g, b } ->
    encodeFields =
        FutureEncode.namedField "r" (FutureEncode.u8 r)
        |> FutureEncode.chain (FutureEncode.namedField "g" (FutureEncode.u8 g))
        |> FutureEncode.chain (FutureEncode.namedField "b" (FutureEncode.u8 b))

    FutureEncode.record 3 encodeFields

The definition seems to be valid. Basically you define how to encode all fields and pass that into the record encoder function. That said, this form breaks the compiler if I try to call encodeFields in the record encoder:

encodeRecord = \size, addFields ->
    encodeHeader size
    # Works if the following line is commented out
    |> FutureEncode.chain addFields

Errors with:

thread 'main' panicked at crates/compiler/mono/src/borrow.rs:396:33:
internal error: entered unreachable code: no borrow signature for LambdaName { name: `18.IdentId(119)`, niche: Niche(Captures([InLayout(146), InLayout(147)])) } layout
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 15:53):

Oh, I think I found a nicer api for records:

toFutureEncoderTestRGB : TestRGB -> FutureEncoder state
toFutureEncoderTestRGB = \@TestRGB { r, g, b } ->
    FutureEncode.record 3 \state, addNamedField ->
        state
        |> addNamedField "r" (FutureEncode.u8 r)
        |> addNamedField "g" (FutureEncode.u8 g)
        |> addNamedField "b" (FutureEncode.u8 b)
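
The same callback shape modeled in Python, to show why it avoids the list: the record encoder hands the caller an add-named-field function, and the format decides what that function does. The byte layout is a toy (a count byte, length-prefixed names), not real MessagePack, and the encode_field_names flag and all function names are hypothetical:

```python
def encode_u8(n):
    return lambda out: out + bytes([n])


def encode_str(s):
    raw = s.encode("utf-8")
    return lambda out: out + bytes([len(raw)]) + raw  # toy: one length byte


def record(size, add_fields, encode_field_names=True):
    """Build a record encoder; `add_fields` gets the state and an add_named_field function."""
    def add_named_field(out, name, value_encoder):
        # The format, not the caller, decides whether names are written.
        if encode_field_names:
            out = encode_str(name)(out)
        return value_encoder(out)

    def run(out):
        out = out + bytes([size])  # toy header: just the field count
        return add_fields(out, add_named_field)
    return run


def encode_rgb(r, g, b):
    def add_fields(state, add_named_field):
        state = add_named_field(state, "r", encode_u8(r))
        state = add_named_field(state, "g", encode_u8(g))
        return add_named_field(state, "b", encode_u8(b))
    return record(3, add_fields)
```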

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 15:55):

Then the actual encoder implementation looks like this:

encodeRecord : U64, (MsgPack, FutureEncode.NamedFieldFn MsgPack MsgPack -> MsgPack) -> FutureEncoder MsgPack
encodeRecord = \size, addFields ->
    msgPack <- FutureEncode.custom
    msgPack
    |> FutureEncode.appendWith (encodeHeader size)
    |> addFields encodeNamedField

encodeNamedField : MsgPack, Str, FutureEncoder MsgPack -> MsgPack
encodeNamedField = \@MsgPack res, key, value ->
    when res is
        Ok { bytes, encodeFieldNames } ->
            if encodeFieldNames then
                @MsgPack (Ok { bytes, encodeFieldNames })
                |> FutureEncode.appendWith (encodeString key)
                |> FutureEncode.appendWith value
            else
                @MsgPack (Ok { bytes, encodeFieldNames })
                |> FutureEncode.appendWith value

        Err e ->
            @MsgPack (Err e)

Sadly, still can't seem to win the type battle. Now hitting:

ambient functions don't unify
Location: crates/compiler/unify/src/unify.rs:201:18

view this post on Zulip Anton (Jul 09 2024 at 17:33):

@Brendan Hansknecht in case it can still be of value; I also just hit no borrow signature for LambdaName and it happened because I called a function (that takes one argument) with zero arguments.

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 22:46):

From what I can tell, the issue seems to be related to an encoder capturing another encoder in its lambda closure.

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 23:01):

Running with a debug build of the compiler I get:

thread '<unnamed>' panicked at /Users/bren077s/Projects/roc/crates/compiler/unify/src/unify.rs:1217:5:
member signature should not have solved lambda sets

Does that have meaning to anyone?

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 00:02):

After a bit more messing with types and trying slightly different apis, I at least have something which I think is useful:

Full source: procedure `FutureEncode.appendWith` (`FutureEncode.state`, `FutureEncode.122`):
    let `FutureEncode.164` : [C [C U64, C , C , C ], C {List U8, Int1}] = CallByName `MsgPack.163` `FutureEncode.state` `FutureEncode.122`;
    ret `FutureEncode.164`;
IR PROBLEMS FOUND:
── PROC SPECIALIZATION NOT DEFINED ─────────────────────────────────────────────

in appendWith : ([C [C U64, C , C , C ], C {List U8, Int1}], {U64, []}) ->
[C [C U64, C , C , C ], C {List U8, Int1}] ((niche {}))

0│  procedure `FutureEncode.appendWith` (`FutureEncode.state`, `FutureEncode.122`):
1│>     let `FutureEncode.164` : [C [C U64, C , C , C ], C {List U8, Int1}] = CallByName `MsgPack.163` `FutureEncode.state` `FutureEncode.122`;
2│      ret `FutureEncode.164`;

No specialization

163 : ([C [C U64, C , C , C ], C {List U8, Int1}], {U64, []}) ->
[C [C U64, C , C , C ], C {List U8, Int1}] ((niche {U64, []}))

was found

The following specializations of MsgPack.163 were built:163 :
([C [C U64, C , C , C ], C {List U8, Int1}], {U64, {U8, U8, U8}}) ->
[C [C U64, C , C , C ], C {List U8, Int1}] ((niche {U64, {U8, U8, U8}}))

Caused by this test: https://github.com/bhansconnect/roc-msgpack/blob/3d545816c18b9a7fd74746137f280348c123f18c/package/MsgPack.roc#L443-L447

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 00:02):

That said, I still don't really know how to debug it.

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 03:42):

Ok. A little more context that I have derived:

  1. The bug seems to always happen when calling one encoder from within another. So always when you are inside an encoder and call FutureEncode.appendWith
  2. The expected lambdaset definition is wrong. It is missing captures.

More specific on 2:
If you look at the printout above, the wanted specialization is missing {U64, {U8, U8, U8}}. {U64, {U8, U8, U8}} is the actual data captured by the passed in encoder. U64 is the size. {U8, U8, U8} is { r : U8, g : U8, b : U8 } .

So fundamentally we are ending up with a call to the wrong specialization that is missing captures.

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 03:44):

I am uncertain why this affects the modified version of encode, but does not seem to affect Inspect. I think it also should have the same issue. (it's also possible it affects Inspect and is just more specific than I realize, so I haven't hit it)

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 04:00):

@Ayaz Hafiz I know you are very busy nowadays, but if you have any time to give tips or take a quick look, it would be greatly appreciated. I don't know much about what level of complexity this form of error falls into. Like I get the rough idea of what is wrong, but I know nothing of the code that actually deals with the wiring here.

view this post on Zulip Ayaz Hafiz (Jul 10 2024 at 04:15):

to be honest i’m not sure i can give any advice without debugging this, which i don’t have the bandwidth for ATM unfortunately. I would say probably the easiest thing to do here would be to finish implementing boxed closures. it will be much simpler and problems like this will likely not disappear, but be much easier to diagnose and resolve for anyone

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 04:15):

Fair answer. Do we have any documentation on the state of that

view this post on Zulip Ayaz Hafiz (Jul 10 2024 at 04:16):

there’s a long standing issue with recursive structural types, which may be the problem here too, i discussed a suggestion for how it can be fixed elsewhere

view this post on Zulip Ayaz Hafiz (Jul 10 2024 at 04:16):

umm yes give me 10 minutes i’ll type it up

view this post on Zulip Sam Mohr (Jul 10 2024 at 04:16):

Is the boxed closure work something that a layperson can work on, or does it really need expertise that's expensive to transfer? I should be able to help with that once I finish with built-in Tasks and documentation for the new record builder syntax.

view this post on Zulip Ayaz Hafiz (Jul 10 2024 at 04:17):

my hope is anyone can work on it

view this post on Zulip Sam Mohr (Jul 10 2024 at 04:17):

Hell yeah brother

view this post on Zulip Ayaz Hafiz (Jul 10 2024 at 04:18):

that’s partially why i want it to land/am upset at myself that i didn’t land it when i had more time to give to Roc, it’s a much simpler and easier model than what currently exists (at the expense of other things obviously)

view this post on Zulip Sam Mohr (Jul 10 2024 at 04:18):

Well, the nice thing is that Roc seems to be getting better with respect to its bus factor

view this post on Zulip Sam Mohr (Jul 10 2024 at 04:19):

So you being unavailable isn't blocking the change from going through

view this post on Zulip Sam Mohr (Jul 10 2024 at 04:19):

I think it's a good thing to recognize. And also, don't beat yourself up too much, you've already given so much for this project.

view this post on Zulip Ayaz Hafiz (Jul 10 2024 at 04:27):

alright here's the high level of how i think it should be implemented
https://github.com/roc-lang/rfcs/blob/main/0012-type-erasure.md

view this post on Zulip Ayaz Hafiz (Jul 10 2024 at 04:28):

here's the only PR i made for it: https://github.com/roc-lang/roc/pull/5576

it supports erased closure types for very simple programs

view this post on Zulip Ayaz Hafiz (Jul 10 2024 at 04:29):

here's an example of a program and the corresponding IR I sent to Folkert recently. It's worth understanding it I think
https://gist.github.com/ayazhafiz/dd16a5586b9621dce061c902e6d44bb0

view this post on Zulip Ayaz Hafiz (Jul 10 2024 at 04:34):

The missing pieces are:

view this post on Zulip Sam Mohr (Jul 10 2024 at 04:37):

Okay, cool. Unless someone else starts on this before I get back from my trip, I'll be in Japan until July 23rd, after which point I'll try to David this Goliath.

view this post on Zulip Ayaz Hafiz (Jul 10 2024 at 04:38):

love the metaphor

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 15:55):

Thanks Ayaz!

Looping back to the general encode/decode design. I realized one more inconsistency that I thought might be good to reconsider:

For containers, we explicitly pass in an element encode function elem -> Encoder state. Instead we could skip that by requiring where elem implements Encoding. This is a simpler API with one fewer arg, but it does have a minor but important difference. If we take elem -> Encoder state, it allows opaque containers to modify or otherwise change what the element encoder function would be when calling Encode.sequence. I'm not sure if this matters in practice, but I am curious what others think:
A)

    sequence :
        seq,
        [Size U64, UnknownSize],
        SequenceWalker state seq elem,
        (elem -> Encoder state)
        -> Encoder state where state implements EncoderFormatting

vs
B)

    sequence :
        seq,
        [Size U64, UnknownSize],
        SequenceWalker state seq elem
        -> Encoder state where elem implements Encoding, state implements EncoderFormatting

With A, you would pass in an elem -> Encoder state function that transforms the element type in some way or ignores fields before encoding. It enables a container of non-encodable things to be encoded if you write a custom mapper.

With B, that power is still technically available, but it has to be obtained by modifying the SequenceWalker to transform the element type. So probably less likely to be used.

Note: with std lib containers and auto-derive, there is no difference between A and B. So only matters for user defined opaque types.
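
To make the A vs B difference concrete, here is a rough analogy in Rust rather than Roc. It is only a sketch: the `Encoding` trait, `sequence_a`, `sequence_b`, and `Secret` are hypothetical names invented for illustration, the output is a plain byte vector with a toy one-byte length prefix, and none of it is the proposed Roc API.

```rust
// Hypothetical stand-in for "elem implements Encoding"; not a Roc or serde API.
trait Encoding {
    fn encode(&self, out: &mut Vec<u8>);
}

impl Encoding for u8 {
    fn encode(&self, out: &mut Vec<u8>) {
        out.push(*self);
    }
}

// Option A: the caller passes the element encoder explicitly.
fn sequence_a<T>(items: &[T], encode_elem: impl Fn(&T, &mut Vec<u8>)) -> Vec<u8> {
    let mut out = vec![items.len() as u8]; // toy length prefix, assumes < 256 items
    for item in items {
        encode_elem(item, &mut out);
    }
    out
}

// Option B: the element encoder comes from the element type's own implementation.
fn sequence_b<T: Encoding>(items: &[T]) -> Vec<u8> {
    let mut out = vec![items.len() as u8]; // same toy length prefix
    for item in items {
        item.encode(&mut out);
    }
    out
}

// An element type that deliberately has no Encoding implementation.
#[allow(dead_code)]
struct Secret {
    id: u8,
    token: String,
}
```

In this sketch, `sequence_a(&secrets, |s, out| out.push(s.id))` encodes a slice of `Secret` by writing only `id` and skipping `token`, while `sequence_b(&secrets)` is rejected by the compiler because `Secret` does not implement `Encoding`. For `u8` elements the two produce the same bytes, which mirrors the note that std lib containers see no difference.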

view this post on Zulip Luke Boswell (Aug 08 2024 at 21:12):

Is this the issue for tracking tag decoding?

Would future decode support tag unions?

view this post on Zulip Brendan Hansknecht (Aug 08 2024 at 21:21):

I think decoding supports tags. Maybe it just doesn't auto derive

view this post on Zulip Brendan Hansknecht (Aug 08 2024 at 21:32):

Oh, method is literally missing

view this post on Zulip Brendan Hansknecht (Aug 08 2024 at 21:40):

And we want to support tags both by name and by index for encode and decode. Index will just be alphabetical
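
As an illustration of what "index will just be alphabetical" could mean, here is a small sketch in Rust (not Roc). The function name `tag_indices` is hypothetical, and the byte-wise, case-sensitive sort is an assumption, since the exact ordering rule is not specified here.

```rust
// Assign each tag an index by sorting the tag names.
// Assumption: plain byte-wise (case-sensitive) ordering of the names.
fn tag_indices(tags: &[&str]) -> Vec<(String, usize)> {
    let mut names: Vec<&str> = tags.to_vec();
    names.sort_unstable();
    names
        .into_iter()
        .enumerate()
        .map(|(index, name)| (name.to_string(), index))
        .collect()
}
```

With tags `Red`, `Green`, `Blue` this assigns Blue 0, Green 1, Red 2. Under this scheme, adding or removing a tag can shift the indices of the tags that sort after it, which matters for any format that stores tags by index.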

view this post on Zulip Richard Feldman (Aug 08 2024 at 21:51):

yeah we never implemented it

view this post on Zulip Luke Boswell (Aug 13 2024 at 07:40):

I added an issue to track Type Erasure from above

view this post on Zulip Richard Feldman (Aug 13 2024 at 11:07):

btw I'd definitely recommend anyone talk to @Folkert de Vries about this before starting on implementation stuff, to get tips on how to proceed!

view this post on Zulip Simon Taeter (Feb 26 2025 at 14:33):

Hello there :) Complete beginner here :hello:

I didn't take the time to read the whole discussion here but I had a reflection about the Encode and Hash section of the design article and I believe this is the place to talk about it.

I am writing a lot of Elm and I have to say codecs saved my life. They combine Encoding and Decoding into a single type and guarantee that what you encode will be decoded in the exact same way. Through that lens, mashing the hashing concept into the encoding one makes a lot less sense. Encoding comes with the implicit idea that your data is conserved, only encoded in another format.

Other notes :

And again on the following paragraph, the use of both isEq and isNotEq could create strange results if their implementations do not perfectly match (eg. expect isEq a b == True and expect isNotEq a b == True at the same time). Though I guess that it was an example and probably not the actual implementation you would go for.

Though I just got on board with Roc so I hope my remarks are relevant to you :)

view this post on Zulip Brendan Hansknecht (Feb 26 2025 at 17:58):

Welcome!

view this post on Zulip Brendan Hansknecht (Feb 26 2025 at 18:04):

Just took a skim over where our hash and encode abilities landed.

They definitely could be forced to be merged together. That said, they have some pertinent differences. A very simple example is that when hashing a set, you have to hash without considering order. When serializing a set, you have to pick an order to serialize to (most serialization formats only have lists and don't have sets). So I don't think they should be merged in practice. On top of that, encoding does lots of extra work that would just slow down hashing
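
The set example can be sketched in Rust (not Roc) like this. The function names are hypothetical, and both the wrapping-add combination and the sorted output order are just one possible choice for each side, not how Roc's Hash or Encode actually work.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Hashing: combine per-element hashes with a commutative operation,
// so the result does not depend on iteration order.
fn hash_set_unordered(set: &HashSet<u32>) -> u64 {
    set.iter()
        .map(|elem| {
            let mut hasher = DefaultHasher::new();
            elem.hash(&mut hasher);
            hasher.finish()
        })
        .fold(0u64, |acc, h| acc.wrapping_add(h))
}

// Serializing: the output is a list, so an order has to be chosen.
// Here the elements are sorted ascending (an arbitrary but stable choice).
fn encode_set_sorted(set: &HashSet<u32>) -> Vec<u32> {
    let mut items: Vec<u32> = set.iter().copied().collect();
    items.sort_unstable();
    items
}
```

Two sets with the same elements get the same hash without any sorting, while the encoder pays for a sort (or some other ordering step) to produce a deterministic list.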

view this post on Zulip Brendan Hansknecht (Feb 26 2025 at 18:05):

As for codecs, that is a really interesting idea. I do think we will want a separate encode and decode either way though. It is valid to encode to a lossy format that has no equivalent decoder. A simple example is that you could use encode to pretty print a datastructure.

view this post on Zulip Brendan Hansknecht (Feb 26 2025 at 18:06):

That said, enabling codec libraries sounds like a good idea to keep track of

view this post on Zulip Brendan Hansknecht (Feb 26 2025 at 18:06):

Cause most formats will want encode and decode

view this post on Zulip Simon Taeter (Feb 27 2025 at 13:24):

Brendan Hansknecht said:

A simple example is that you could use encode to pretty print a datastructure.

To me pretty print is more of a toStr thing though.

A codec ability might make sense here. Though I believe you want to remove abilities? Curious to know about the proposed alternative.

view this post on Zulip Anthony Bullard (Feb 27 2025 at 13:46):

Abilities as a separate concept is going away, but you can create a type alias that works in a similar fashion

view this post on Zulip Anthony Bullard (Feb 27 2025 at 13:51):

@Simon Taeter Here is a link to the relevant section of the proposal: https://docs.google.com/document/d/1OUd0f4PQjH8jb6i1vEJ5DOnfpVBJbGTjnCakpXAYeT8/edit?tab=t.0#heading=h.x6h2qzxs7o9o


Last updated: Jun 16 2026 at 16:19 UTC