Though the current iterations of Encode and Decode have served us well, they have restrictions that make them less than ideal. While they are roughly based on serde from Rust, they are not powerful enough to represent many of the patterns that serde can.
For Decode in Roc, it is fundamentally a list of bytes along with a formatting config being converted into a result of any type. That said, the error type is only allowed to be TooShort.
For Encode in Roc, it is any type with a formatting config to a list of bytes.
For deserialization (decode), serde is any type (that a deserializer uses) to a result of any type with any error.
For serialization (encode), serde is any type to a result of any type (that a serializer uses) with any error.
The core missing piece is that Encode and Decode need to be able to define an internal state and use that to generate input/output. They cannot be restricted to List U8. They cannot be restricted in allowed error variants.
Luckily, we already have a perfect template for what the correct interface would look like (at least for Encode). Inspect is an ability that can take any type as input and allows the specific ability implementation to define exactly what the output is. I would argue that today, Inspect is a better Encode than Encode is. A perfect example of this is the GuiFormatter. Though a simple proof of concept, it is able to "encode" any data into a tree structure that can be rendered in a GUI. It does this by allowing the implementation of GuiFormatter to define its own internal state that is built up with every call and eventually exposed as a tree.
That said, Inspect does not replace Encode. We can't just map one to one. Inspect has a different set of autoderive rules and defaults related to opaque types and functions that don't make sense in Encode. On top of that, both Encode and Decode likely should enforce the final return value being a Result of something. So we still need a bespoke interface.
One minor loss of the Inspect api is that without higher kinded types, it is not possible to have a generic Inspect.inspect method that returns the final wanted output type. Instead, inspect returns the Formatter which must be unwrapped by a different function.
That said, I don't see this as a real drawback at all. It just means that the Json library will create a wrapper function that when called generates the formatter and then unwraps it. So the api will be Json.decode someJson instead of Decode.decode someJson. Decode.decode someJson would still work, but it would output a JsonDecoder instead of the final wanted roc type.
I don't want to write out this entire api, so just gonna note the core of the change.
Instead of:
Encoder fmt := List U8, fmt -> List U8 where fmt implements EncoderFormatting
Where we strictly have a List U8
It would be:
Encoder fmt err := fmt -> Result fmt err where fmt implements EncoderFormatting
The fmt is able to store any data type that it likes.
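To make the shape concrete, here is a minimal sketch in Rust (the language of the serde comparison), not Roc and not the proposed API itself. It assumes an encoder is just a function from formatter state to a Result of that state. All names (encode_u8, chain, Bytes, Limited, OutOfSpace, encode_rgb) are hypothetical. One formatter accumulates bytes and can never fail; the other keeps a byte budget in its state and reports its own error.

```rust
use std::convert::Infallible;

// Hypothetical shape: an encoder maps formatter state to a Result of new state.
type Encoder<S, E> = Box<dyn Fn(S) -> Result<S, E>>;

// Each formatter chooses its own state (Self) and its own error type.
trait EncoderFormatting: Sized + 'static {
    type Error: 'static;
    fn encode_u8(value: u8) -> Encoder<Self, Self::Error>;
}

// Run two encoders in order, stopping at the first error.
fn chain<S: 'static, E: 'static>(first: Encoder<S, E>, second: Encoder<S, E>) -> Encoder<S, E> {
    Box::new(move |state: S| -> Result<S, E> { second(first(state)?) })
}

// Formatter 1: plain byte buffer, no possible errors.
#[derive(Debug, PartialEq)]
struct Bytes(Vec<u8>);

impl EncoderFormatting for Bytes {
    type Error = Infallible;
    fn encode_u8(value: u8) -> Encoder<Self, Infallible> {
        Box::new(move |state: Bytes| -> Result<Bytes, Infallible> {
            let mut out = state.0;
            out.push(value);
            Ok(Bytes(out))
        })
    }
}

// Formatter 2: extra internal state (a budget) and a custom error.
#[derive(Debug, PartialEq)]
struct OutOfSpace;

#[derive(Debug, PartialEq)]
struct Limited {
    out: Vec<u8>,
    budget: usize,
}

impl EncoderFormatting for Limited {
    type Error = OutOfSpace;
    fn encode_u8(value: u8) -> Encoder<Self, OutOfSpace> {
        Box::new(move |mut state: Limited| -> Result<Limited, OutOfSpace> {
            if state.budget == 0 {
                return Err(OutOfSpace);
            }
            state.budget -= 1;
            state.out.push(value);
            Ok(state)
        })
    }
}

// The value-side code is written once and works with any formatter.
fn encode_rgb<F: EncoderFormatting>(r: u8, g: u8, b: u8) -> Encoder<F, F::Error> {
    chain(chain(F::encode_u8(r), F::encode_u8(g)), F::encode_u8(b))
}
```

Calling encode_rgb::<Bytes>(1, 2, 3) and applying the result to an empty Bytes yields the three bytes, while the same call with Limited and a budget of 2 yields Err(OutOfSpace).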
Note: I am actually less sure this Decode change is correct. I noticed that serde had to use some sort of visitor builder pattern, so we may need to follow suit. But am not really sure.... Just would need to tinker with types.
Decoder would similarly be:
Decoder val fmt err := fmt -> Result { value: val, rest: fmt } err where fmt implements DecoderFormatting
where fmt is initialized with what is being decoded and would store all the internal state.
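A rough Rust sketch of that decoder shape, with plain functions standing in for the Decoder opaque type. The Cursor state, the ByteError variants, and the decode_* helpers are all hypothetical and only illustrate that the state threads through each step and that the error type is no longer limited to TooShort.

```rust
// Hypothetical result shape mirroring { value: val, rest: fmt }.
#[derive(Debug, PartialEq)]
struct Decoded<V, S> {
    value: V,
    rest: S,
}

// The formatter picks its own errors, not just TooShort.
#[derive(Debug, PartialEq)]
enum ByteError {
    TooShort,
    NotABool(u8),
}

// The state is initialized with the input and tracks the read position.
#[derive(Debug, PartialEq)]
struct Cursor {
    bytes: Vec<u8>,
    pos: usize,
}

fn decode_u8(mut state: Cursor) -> Result<Decoded<u8, Cursor>, ByteError> {
    let byte = *state.bytes.get(state.pos).ok_or(ByteError::TooShort)?;
    state.pos += 1;
    Ok(Decoded { value: byte, rest: state })
}

fn decode_bool(state: Cursor) -> Result<Decoded<bool, Cursor>, ByteError> {
    let Decoded { value, rest } = decode_u8(state)?;
    match value {
        0 => Ok(Decoded { value: false, rest }),
        1 => Ok(Decoded { value: true, rest }),
        other => Err(ByteError::NotABool(other)),
    }
}

// Decoders compose by handing the remaining state to the next step.
fn decode_pair(state: Cursor) -> Result<Decoded<(u8, bool), Cursor>, ByteError> {
    let Decoded { value: first, rest } = decode_u8(state)?;
    let Decoded { value: second, rest } = decode_bool(rest)?;
    Ok(Decoded { value: (first, second), rest })
}
```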
From that change, some other minor api differences would be needed, but it would mostly be the same, just with corrected types.
One important difference between Inspect and Encode/Decode: Inspect requires an init method that generates a formatter. I don't think Encode/Decode should have the same requirement. Instead, each api will decide what is required for initialization and expose that. For example, a JsonFormatter may require config options for initialization. So it would have Json.encode : x, Config -> List U8, where it takes a piece of data to encode and a set of configuration options, and then outputs a list of bytes. Internally, it would deal with initialization, running encode, extracting the List U8 result from its formatter, and unwrapping the result (I'm assuming the error type was chosen to be []).
Extra minor note, when we do this change, we should also add a special encoding/decoding method for at least dictionaries. Other options that we should consider adding are listed in the Deserializer trait required methods.
It seems like the majority of users would still have a very easy time using this API, but I just want to confirm that: we would still autoderive Encode and Decode, and it would be one extra function call, but otherwise basically the same effort to turn JSON into a Roc datatype?
I guess two extra calls (though realistically, they would be all wrapped behind a single function call)
# Json.roc
encode : x, Config -> List U8
encode = \val, config ->
    initEncoder config
    |> Encode.encode val
    |> unwrapEncoder
And yeah still autoderive in the same cases as today.
What is unwrapEncoder doing there?
Is roc-json the only implementation of Encode/Decode in the wild? If so it shouldn't be too hard to update just that one package and avoid breaking lots of things for users. We need a way to test these changes and roc-json would probably be a good choice to have a pair PR developed.
unwrapEncoder : Result JsonEncoder [] -> List U8
unwrapEncoder = \res ->
    when res is
        Ok (@JsonEncoder {bytes}) -> bytes
This is a case where I assume that json encoding chose to have no error types at all. So its error tag is []. And I am assuming JsonEncoder := {bytes: List U8, config: Config}.
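For comparison, the same "no possible error" unwrap can be sketched in Rust, where std::convert::Infallible plays the role of Roc's empty tag union []. The JsonEncoder and Config structs here are hypothetical stand-ins for the assumed Roc definitions above.

```rust
use std::convert::Infallible;

// Hypothetical stand-ins for the assumed Roc types.
struct Config {
    pretty: bool,
}

struct JsonEncoder {
    bytes: Vec<u8>,
    config: Config,
}

// With an uninhabited error type, the Err branch can never run, so the
// wrapper can hand back plain bytes instead of a Result.
fn unwrap_encoder(res: Result<JsonEncoder, Infallible>) -> Vec<u8> {
    match res {
        Ok(encoder) => encoder.bytes,
        // An empty match is exhaustive because Infallible has no values.
        Err(never) => match never {},
    }
}
```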
Slightly related. But for now I think we might want to throw an error if we are trying to encode something that we can't decode. For example with #5294 we can't decode a Dict. I'd love it if we could include that functionality when we make this larger change, though not sure if that would only complicate things.
Yeah. That should be doable with this change. Both encode and decode have the option to return errors.
It is up to the specific encoder/decoder to pick what errors it might generate
I've started on something in PR 6846. So far it's just been brute force changes, after adjusting the types
As I am working on sqlite in basic webserver, I realized that even these revamped versions of decode/encode are potentially missing a big feature.
I'm not sure if it fits in standard decode or makes more sense in a separate ability (or always using bespoke code), but effectful decoding would be huge. For SQLite, I have a stmt and would like to decode directly from it. The decoding would be running effects to continue executing the query and load the data.
I'm sure the same would happen with many network protocols where you may want to decode from a network stream instead of waiting for all bytes to be loaded into a roc list before beginning to decode.
Hmm... actually, if the state was a task, would that enable effectful decoding? I think it would, but I also expect that it would fail to compile with the current compiler.
One other related update to this api to suggest. This is done in Inspect already.
I think we should make encode/decode have a function for both lists and dictionaries.
I also think that they should avoid being type locked to List a and Dict k v. Instead, they will take in any value and a walk function. So list would be an element walk function:
ElemWalker state collection elem : collection, state, (state, elem -> state) -> state
This way a List a, Set a, any other Container a can be encoded/decoded as a list.
For decode, it will instead take a container builder function (like List.append).
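A small Rust sketch of the walker and builder idea, under heavy simplification: the state is a plain Vec<u8>, sequences are written as a one-byte length followed by the elements, and every name (walk_vec, walk_set, push_u8, encode_seq, decode_seq) is hypothetical. The point is only that the encode side never mentions a concrete collection type, and the decode side builds whatever the init and append functions produce.

```rust
use std::collections::BTreeSet;

// Walkers: fold a step function over a collection, threading the state.
fn walk_vec<S, T>(items: &Vec<T>, state: S, step: &dyn Fn(S, &T) -> S) -> S {
    items.iter().fold(state, |acc, elem| step(acc, elem))
}

fn walk_set<S, T>(items: &BTreeSet<T>, state: S, step: &dyn Fn(S, &T) -> S) -> S {
    items.iter().fold(state, |acc, elem| step(acc, elem))
}

// Element encoder for u8: append the byte to the output.
fn push_u8(mut out: Vec<u8>, elem: &u8) -> Vec<u8> {
    out.push(*elem);
    out
}

// Encode any collection as a sequence: one length byte, then each element.
fn encode_seq<C, T>(
    collection: &C,
    len: usize,
    walk: impl Fn(&C, Vec<u8>, &dyn Fn(Vec<u8>, &T) -> Vec<u8>) -> Vec<u8>,
    encode_elem: impl Fn(Vec<u8>, &T) -> Vec<u8>,
) -> Vec<u8> {
    walk(collection, vec![len as u8], &encode_elem)
}

// Decode a sequence into any container, given an init and an append function.
fn decode_seq<C>(
    bytes: &[u8],
    init: impl Fn(usize) -> C,
    append: impl Fn(C, u8) -> C,
) -> Option<C> {
    let (&len, rest) = bytes.split_first()?;
    let items = rest.get(..len as usize)?;
    Some(items.iter().fold(init(len as usize), |acc, &b| append(acc, b)))
}
```

With these, encode_seq(&some_vec, some_vec.len(), walk_vec, push_u8) and encode_seq(&some_set, some_set.len(), walk_set, push_u8) share one implementation, and decode_seq can target a Vec via Vec::with_capacity plus push, or a BTreeSet via BTreeSet::new plus insert.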
In serde, they call these two encoding types seq and map. They are used for any sort of container / dict type structure.
Serde api is built different, but for useful context:
// Seq
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
    S: Serializer,
{
    let mut seq = serializer.serialize_seq(Some(self.len()))?;
    for element in self {
        seq.serialize_element(element)?;
    }
    seq.end()
}

// Map
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
    S: Serializer,
{
    let mut map = serializer.serialize_map(Some(self.len()))?;
    for (k, v) in self {
        map.serialize_entry(k, v)?;
    }
    map.end()
}
https://gist.github.com/bhansconnect/3854d9bc5951ea306e777745a92ec924
cc: @Trevor Settles
I am not 100% sure all of this is correct, but I think it is close. Also including changes from #API Design > Encode with list manifesting, I think this is what our Encode.roc and Decode.roc should be (maybe with an extra helper function or two). The types are a bit more complex in some places to deal with allocations and to make it more flexible.
Specifically, any list like collection can encode/decode the same as a list. Currently, I named this Encode.seq for any sequence. Just matched serde here. But names of course can change. Same with any dict like collection. They can now encode/decode the same as a dict via Encode.map. Again, took the serde name. We can change it.
Then also removed all List ... from the api. Instead requiring the use of lambdas that load values and build the type out. This avoids an allocation and instead makes things a few direct calls to build up the encoder.
Compared to serde we are still missing a few minor pieces and require extra mapping in some cases.
{ tag: U8, data: * }. I think we might need something special here that opaque types can opt into (but that definitely can be added later). Thoughts? (point 3 being the most important for opinions)
Note, for 3 passing a list in isn't as big of a deal as our other lists. The other lists were Lists of Encoders that could only be generated at runtime. This list would be a constant list of strings that could be compiled directly into the binary.
EDIT: I guess that would make the whole api change to be this:
record :
    state,
    List Str,
    (state, U64 -> [Next (Decoder state state err), TooLong]),
    (state -> Result val err)
    -> Decoder state val err where state implements DecoderFormatting
Basically the exact same as a tuple, but you also pass in a List Str of field names. If the implementation encodes/decodes field names, it can look up the name in the list to get the index. If it doesn't encode field names, it can just ignore them.
EDIT 2: Since I realized this is simple, I just put it in the gist as well.
Have we got a design for encoding/decoding Dicts?
Yep! It's in that gist I posted.
Ohk, having trouble seeing it
https://gist.github.com/bhansconnect/3854d9bc5951ea306e777745a92ec924#file-decode-roc-L92-L97
I called it map to match serde...It is generic. So it could work for Dict k v or OrderedMap k v or EtcMap k v
maybe I should still call it dict
Ohk, I see. So Seq is for a collection that only has values, and Map is for collections with values and keys
Yeah. So Seq for List a, Set a, etc
May be a bit lame, but I find this simpler
ValueInit
ValueBuilder
KeyValueInit
KeyValueBuilder
Is it not possible to also combine record, tags and tuples in the same way? Like a record or tag is just a KeyValue but with Str names, and tuples might be KeyValue but with numbers for names?
The original names were:
ElemInit
ElemBuilder
KeyValueInit
KeyValueBuilder
But it doesn't quite make sense. It isn't initializing an Elem or a KeyValue. It is initializing a List elem or Dict key value.
Is it not possible to also combine record, tags and tuples in the same way? Like a record or tag is just a KeyValue but with Str names, and tuples might be KeyValue but with numbers for names?
No. Cause this type of combination function only works if all elements are the same type, all keys are the same type, and all values are the same type. That is not true for tuples and records.
Brendan Hansknecht said:
But it doesn't quite make sense. it isn't initializing an Elem or a KeyValue. It is initializing a List elem or Dict key value
Hence using the words Sequence or Mapping to generate one of these things.
Would you consider spelling them out in full, considering this isn't going to be seen every day? Like
SequenceInit
SequenceBuilder
MappingInit
MappingBuilder
Then Encode.mapping and Encode.sequence. I think that reads really well actually. I would be for it.
Updated gist with new names.
I think this API looks good.
For point 3. -- If I understand correctly the issue is that when we pass the fields of a record to be decoded they won't be ordered. I would have assumed this ordering would be stable, like sorted by field name.
I think we can just order it by name or order by alignment and name. Either would be fine. Just have to pick something stable and fly with it.
Given we auto generate the encoder, the auto generated encoders can just pick something and sort
For user generated encoders. They can pick any order they please.
For a decoder, it is decided by the order of the list of names passed to decode. For encoders, it is decided by the order in which the field encoders are called. This is 100% controllable in code.
Just can't be field declaration order for roc cause that has no meaning.
Personally, I think alignment then name makes the most sense cause that is also what we expose to the platform (plus for some formats, it might make a difference on bytes used just like it does on hardware)
That said, if we just want simplicity, I see zero problem with alphabetical by default.
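The two candidate default orders can be sketched as plain sort functions. This Rust snippet is illustrative only: Field is a hypothetical description of a record field, and "alignment then name" is assumed here to mean larger alignment first with ties broken alphabetically.

```rust
// Hypothetical description of one record field.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: &'static str,
    align: usize,
}

// Option 1: alphabetical by field name.
fn alphabetical(mut fields: Vec<Field>) -> Vec<Field> {
    fields.sort_by(|a, b| a.name.cmp(b.name));
    fields
}

// Option 2: alignment first, then name.
// Assumption: larger alignment sorts first; equal alignments fall back to name.
fn alignment_then_name(mut fields: Vec<Field>) -> Vec<Field> {
    fields.sort_by(|a, b| b.align.cmp(&a.align).then(a.name.cmp(b.name)));
    fields
}

// Helper to show only the resulting field order.
fn names(fields: &[Field]) -> Vec<&'static str> {
    fields.iter().map(|f| f.name).collect()
}
```

For fields port (align 2), age (align 1), id (align 8) and count (align 8), the first function gives age, count, id, port and the second gives count, id, port, age. Both are stable regardless of declaration order, which is the property that matters here.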
This looks neat! Ok, just checking my understanding here. This new API would define a new way to encode/decode list-like things and map-like things either streaming, or all in one go. How would records and tuples fit into this? Since they don't contain a single type, but instead contain a mix of types
Records and tuples still have their own special functions cause they can't use the sequence/mapping functions
Brendan Hansknecht said:
Personally, I think alignment then name makes the most sense cause that is also what we expose to the platform (plus for some formats, it might make a difference on bytes used just like it does on hardware)
yeah, although the downside would be that now application authors have to learn about alignment if they want to understand how fields are encoded :big_smile:
that said, for binary formats I could imagine that making a significant performance difference :thinking:
It actually probably won't for most binary formats
Most pack without any padding no matter the order
So only a handful of more tailored binary formats would see benefits. Generally the binary formats that are tailored to attempt to match a language's runtime format.
Also, I think it mostly would be opaque to application authors. More a funny quirk than something they have to learn about. If they want to control order they have to make an opaque type (or tuple I guess). If they have a standard record, they probably will just think that roc has a strange order and not know why....at least that would be my guess.
One other random thought/question:
Should DecodeResult return the decoder state even on failure? I changed it to Result (state, val) err. My thought was that on failure to decode, the decoder may be in an invalid state. So there isn't a point of returning a failed decoder. But maybe I am missing something.
I'm not sure.
What about trying to get the results in the middle of a stream? There might be a better way around that though
What about trying to get the results in the middle of a stream?
Like skipping some part of the stream and then decoding into a result?
That was in reference to adding state to the DecodeResult type. (This may be a contrived example) Say you've got a stream of bytes. Calling decode part way through the stream would give part of the List of elems for val. Calling decode on the state from Ok (state, val) would only return elems that haven't already been decoded.
I was trying to come up with some example where the act of decoding itself would cause a change in state itself. If that's not clear, it's on me. It's just the seed of an idea I just came up with
state is actually already in the decode result. The difference is if you get a state on the failure case or not:
Result (state, val) err
vs
{ val: Result val err, rest: state }
Also, I am now working on implementing a version of the new decoder and encoder just to make sure the types in my gist are correct. Definitely have some pieces that are off.
https://github.com/bhansconnect/roc-msgpack
I've just been updating the gist as I notice issues.
Something definitely still doesn't feel right with the record encoding signature. Playing around with it as I am trying to implement the encoder.
This is my current iteration of record encoding (not sure if this is the best way, but it avoids the list allocation):
TestRGB := { r : U8, g : U8, b : U8 }
    implements [
        FutureEncoding {
            toFutureEncoder: toFutureEncoderTestRGB,
        },
    ]

toFutureEncoderTestRGB : TestRGB -> FutureEncoder state
toFutureEncoderTestRGB = \@TestRGB { r, g, b } ->
    encodeFields =
        FutureEncode.namedField "r" (FutureEncode.u8 r)
        |> FutureEncode.chain (FutureEncode.namedField "g" (FutureEncode.u8 g))
        |> FutureEncode.chain (FutureEncode.namedField "b" (FutureEncode.u8 b))

    FutureEncode.record 3 encodeFields
The definition seems to be valid. Basically you define how to encode all fields and pass that into the record encoder function. That said, this form breaks the compiler if I try to call encodeFields in the record encoder:
encodeRecord = \size, addFields ->
    encodeHeader size
    # Works if the following line is commented out
    |> FutureEncode.chain addFields
Errors with:
thread 'main' panicked at crates/compiler/mono/src/borrow.rs:396:33:
internal error: entered unreachable code: no borrow signature for LambdaName { name: `18.IdentId(119)`, niche: Niche(Captures([InLayout(146), InLayout(147)])) } layout
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Oh, I think I found a nicer api for records:
toFutureEncoderTestRGB : TestRGB -> FutureEncoder state
toFutureEncoderTestRGB = \@TestRGB { r, g, b } ->
    FutureEncode.record 3 \state, addNamedField ->
        state
        |> addNamedField "r" (FutureEncode.u8 r)
        |> addNamedField "g" (FutureEncode.u8 g)
        |> addNamedField "b" (FutureEncode.u8 b)
Then the actual encoder implementation looks like this:
encodeRecord : U64, (MsgPack, FutureEncode.NamedFieldFn MsgPack MsgPack -> MsgPack) -> FutureEncoder MsgPack
encodeRecord = \size, addFields ->
    msgPack <- FutureEncode.custom
    msgPack
    |> FutureEncode.appendWith (encodeHeader size)
    |> addFields encodeNamedField

encodeNamedField : MsgPack, Str, FutureEncoder MsgPack -> MsgPack
encodeNamedField = \@MsgPack res, key, value ->
    when res is
        Ok { bytes, encodeFieldNames } ->
            if encodeFieldNames then
                @MsgPack (Ok { bytes, encodeFieldNames })
                |> FutureEncode.appendWith (encodeString key)
                |> FutureEncode.appendWith value
            else
                @MsgPack (Ok { bytes, encodeFieldNames })
                |> FutureEncode.appendWith value

        Err e ->
            @MsgPack (Err e)
Sadly, still can't seem to win the type battle. Now hitting:
ambient functions don't unify
Location: crates/compiler/unify/src/unify.rs:201:18
@Brendan Hansknecht in case it can still be of value; I also just hit no borrow signature for LambdaName and it happened because I called a function (that takes one argument) with zero arguments.
From what I can tell, the issue seems to be related to an encoder capturing another encoder in its lambda closure.
Running with a debug build of the compiler I get:
thread '<unnamed>' panicked at /Users/bren077s/Projects/roc/crates/compiler/unify/src/unify.rs:1217:5:
member signature should not have solved lambda sets
Does that have meaning to anyone?
After a bit more messing with types and trying slightly different apis, I at least have something which I think is useful:
Full source: procedure `FutureEncode.appendWith` (`FutureEncode.state`, `FutureEncode.122`):
let `FutureEncode.164` : [C [C U64, C , C , C ], C {List U8, Int1}] = CallByName `MsgPack.163` `FutureEncode.state` `FutureEncode.122`;
ret `FutureEncode.164`;
IR PROBLEMS FOUND:
── PROC SPECIALIZATION NOT DEFINED ─────────────────────────────────────────────
in appendWith : ([C [C U64, C , C , C ], C {List U8, Int1}], {U64, []}) ->
[C [C U64, C , C , C ], C {List U8, Int1}] ((niche {}))
0│ procedure `FutureEncode.appendWith` (`FutureEncode.state`, `FutureEncode.122`):
1│> let `FutureEncode.164` : [C [C U64, C , C , C ], C {List U8, Int1}] = CallByName `MsgPack.163` `FutureEncode.state` `FutureEncode.122`;
2│ ret `FutureEncode.164`;
No specialization
163 : ([C [C U64, C , C , C ], C {List U8, Int1}], {U64, []}) ->
[C [C U64, C , C , C ], C {List U8, Int1}] ((niche {U64, []}))
was found
The following specializations of MsgPack.163 were built:
163 : ([C [C U64, C , C , C ], C {List U8, Int1}], {U64, {U8, U8, U8}}) ->
[C [C U64, C , C , C ], C {List U8, Int1}] ((niche {U64, {U8, U8, U8}}))
Caused by this test: https://github.com/bhansconnect/roc-msgpack/blob/3d545816c18b9a7fd74746137f280348c123f18c/package/MsgPack.roc#L443-L447
That said, I still don't really know how to debug it.
Ok. A little more context that I have derived:
FutureEncode.appendWith
More specific on 2:
If you look at the printout above, the wanted specialization is missing {U64, {U8, U8, U8}}. {U64, {U8, U8, U8}} is the actual data captured by the passed in encoder. U64 is the size. {U8, U8, U8} is { r : U8, g : U8, b : U8 } .
So fundamentally we are ending up with a call to the wrong specialization that is missing captures.
I am uncertain why this affects the modified version of encode, but does not seem to affect Inspect. I think it also should have the same issue. (It is also possible that it affects Inspect, but the trigger is more specific than I realize, so I haven't hit it.)
@Ayaz Hafiz I know you are very busy nowadays, but if you have any time to give tips or take a quick look, it would be greatly appreciated. I don't know much of what level of complexity this form of error falls into. Like I get the rough idea of what is wrong, but I know nothing of the code that actually deals with the wiring here.
to be honest i’m not sure i can give any advice without debugging this, which i don’t have the bandwidth for ATM unfortunately. I would say probably the easiest thing to do here would be to finish implementing boxed closures. it will be much simpler and problems like this will likely not disappear, but be much easier to diagnose and resolve for anyone
Fair answer. Do we have any documentation on the state of that?
there’s a long standing issue with recursive structural types, which may be the problem here too, i discussed a suggestion for how it can be fixed elsewhere
umm yes give me 10 minutes i’ll type it up
Is the boxed closure work something that a layperson can work on, or does it really need expertise that's expensive to transfer? I should be able to help with that once I finish with built-in Tasks and documentation for the new record builder syntax.
my hope is anyone can work on it
Hell yeah brother
that’s partially why i want it to land/am upset at myself that i didn’t land it when i had more time to give to Roc, it’s a much simpler and easier model than what currently exists (at the expense of other things obviously)
Well, the nice thing is that Roc seems to be getting better with respect to its bus factor
So you being unavailable isn't blocking the change from going through
I think it's a good thing to recognize. And also, don't beat yourself up too much, you've already given so much for this project.
alright here's the high level of how i think it should be implemented
https://github.com/roc-lang/rfcs/blob/main/0012-type-erasure.md
here's the only PR i made for it: https://github.com/roc-lang/roc/pull/5576
it supports erased closure types for very simple programs
here's an example of a program and the corresponding IR I sent to Folkert recently. It's worth understanding it I think
https://gist.github.com/ayazhafiz/dd16a5586b9621dce061c902e6d44bb0
The missing pieces are:
Okay, cool. Unless someone else starts on this before I get back from my trip, I'll be in Japan until July 23rd, after which point I'll try to David this Goliath.
love the metaphor
Thanks Ayaz!
Looping back to the general encode/decode design. I realized one more inconsistency that I thought might be good to reconsider:
For containers, we explicitly pass in an element encode function elem -> Encoder state. Instead we could skip that by requiring where elem implements Encoding. This is a simpler api with one fewer arg, but does have a minor but important difference. If we take elem -> Encoder state, it allows opaque containers to modify or otherwise change what the element encoder function would be when calling Encode.sequence. I'm not sure if this matters in practice, but I am curious what others think:
A)
sequence :
    seq,
    [Size U64, UnknownSize],
    SequenceWalker state seq elem,
    (elem -> Encoder state)
    -> Encoder state where state implements EncoderFormatting
vs
B)
sequence :
    seq,
    [Size U64, UnknownSize],
    SequenceWalker state seq elem
    -> Encoder state where elem implements Encoding, state implements EncoderFormatting
With A, you would pass in an elem -> Encoder state function that transforms the element type in some way or ignores fields before encoding. It enables a container of non-encodable things to be encoded if you write a custom mapper.
With B, that power is still technically available, but it has to be obtained by modifying the SequenceWalker to transform the element type. So probably less likely to be used.
Note: with std lib containers and auto-derive, there is no difference between A and B. So only matters for user defined opaque types.
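The tradeoff between A and B can be sketched in Rust with a trait bound standing in for "where elem implements Encoding". This leaves out the walker and size arguments to keep it short, and the names (Encoding, sequence_a, sequence_b, Session, session_ids) are hypothetical.

```rust
// Stand-in for the Encoding ability.
trait Encoding {
    fn encode(&self, out: &mut Vec<u8>);
}

impl Encoding for u8 {
    fn encode(&self, out: &mut Vec<u8>) {
        out.push(*self);
    }
}

// Option A: the caller passes the element encoder explicitly.
fn sequence_a<T>(items: &[T], encode_elem: impl Fn(&T, &mut Vec<u8>)) -> Vec<u8> {
    let mut out = vec![items.len() as u8];
    for item in items {
        encode_elem(item, &mut out);
    }
    out
}

// Option B: one fewer argument, but the element type must implement Encoding.
fn sequence_b<T: Encoding>(items: &[T]) -> Vec<u8> {
    sequence_a(items, |item: &T, out: &mut Vec<u8>| item.encode(out))
}

// A type with no Encoding implementation at all.
struct Session {
    id: u8,
    token: String,
}

// Only A can encode it directly: the mapper picks `id` and drops `token`.
fn session_ids(sessions: &[Session]) -> Vec<u8> {
    sequence_a(sessions, |s: &Session, out: &mut Vec<u8>| out.push(s.id))
}
```

With B, session_ids would need either an Encoding implementation for Session or a walker that converts each Session into an encodable value first.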
Is this the issue for tracking tag decoding?
Would future decode support tag unions?
I think decoding supports tags. Maybe it just doesn't auto derive
Oh, method is literally missing
And we want to support tags both by name and by index for encode and decode. Index will just be alphabetical
yeah we never implemented it
I added an issue to track Type Erasure from above
btw I'd definitely recommend anyone talk to @Folkert de Vries about this before starting on implementation stuff, to get tips on how to proceed!
Hello there :) Complete beginner here :hello:
I didn't take the time to read the whole discussion here but I had a reflection about the Encode and Hash section of the design article and I believe this is the place to talk about it.
I am writing a lot of Elm and I have to say codecs saved my life. They combine Encoding and Decoding into a single type and guarantee that what you encode will be decoded in the exact same way. Through that lens, mashing the hashing concept into the encoding one makes a lot less sense. Encoding comes with the implicit idea that your data is conserved, only encoded in another format.
Other notes :
And again on the following paragraph, the use of both isEq and isNotEq could create strange results if their implementations do not perfectly match (eg. expect isEq a b == True and expect isNotEq a b == True at the same time). Though I guess that it was an example and probably not the actual implementation you would go for.
Though I just got on board with Roc so I hope my remarks are relevant to you :)
Welcome!
Just took a skim over where our hash and encode abilities landed.
They definitely could be forced to be merged together. That said, they have some pertinent differences. A very simple example is that when hashing a set, you have to hash without considering order. When serializing a set, you have to pick an order to serialize to (most serialization formats only have lists and don't have sets). So I don't think they should be merged in practice. On top of that, encoding does lots of extra work that would just slow down hashing.
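A tiny Rust sketch of that difference, using only the standard library. The function names are hypothetical, and summing per-element hashes is just one simple way to get an order-independent result, not a claim about how Roc hashes sets.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Hash a single element with a freshly created hasher.
fn hash_elem(value: u32) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// Hashing: combine element hashes with a commutative operation so the
// order in which elements are visited does not matter.
fn hash_unordered(items: &[u32]) -> u64 {
    items
        .iter()
        .fold(0u64, |acc, item| acc.wrapping_add(hash_elem(*item)))
}

// Serializing: the output is a list, so some order has to be chosen.
// Here the elements are simply sorted.
fn serialize_sorted(set: &HashSet<u32>) -> Vec<u32> {
    let mut items: Vec<u32> = set.iter().copied().collect();
    items.sort_unstable();
    items
}
```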
As for codecs, that is a really interesting idea. I do think we will want a separate encode and decode either way though. It is valid to encode to a lossy format that has no equivalent decoder. A simple example is that you could use encode to pretty print a datastructure.
That said, enabling codec libraries sounds like a good idea to keep track of
Cause most formats will want encode and decode
Brendan Hansknecht said:
A simple example is that you could use encode to pretty print a datastructure.
To me pretty print is more of a toStr thing though.
A codec ability might make sense here. Though I believe you want to remove abilities? Curious to know about the proposed alternative.
Abilities as a separate concept are going away, but you can create a type alias that works in a similar fashion
@Simon Taeter Here is a link to the relevant section of the proposal: https://docs.google.com/document/d/1OUd0f4PQjH8jb6i1vEJ5DOnfpVBJbGTjnCakpXAYeT8/edit?tab=t.0#heading=h.x6h2qzxs7o9o
Last updated: Jun 16 2026 at 16:19 UTC