Hi! The idea is to conditionally encode composite data structures (like records, tuples, lists and tags) based on their child elements' types.
I started to write an encoder for TOML using the EncoderFormatting ability, but I'm facing a challenge. I just started, based on roc-json, so I'm new to this, but I think I've found a genuine design problem.
I want to encode this record
{
    a: 1,
    b: {
        c: 2,
    }
}
as
a = 1
[b]
c = 2
When encoding a record, I need to encode the field names differently based on the type of each field's value. The value of a is not nested, so its key is just "a = ", but the value of b is another record, so its key is "[b]\n". How can I access this information when encoding the record? Afaik, in the formatter I only get a list of field name-value pairs. Names being strings, values being, err... functions? I only know they are used to encode the values. So the only way the child can communicate back to the parent record encoder is through the result of the child encoding. My best attempt would be to look at the encoded child's bytes and figure out whether that is a record or not. But that feels really bad, since I would basically need to parse the just-written bytes to recover information that was available when they were encoded. It's bad for performance, and there may be cases where the child's encoded bytes don't tell the parent enough to decide how to encode.
I propose that the encoder formatting functions should return a record that has the resulting bytes and optionally some other data. I would use it to signal what kind of Roc value was encoded.
I also thought of a simple but pretty hacky solution. Sharing it for fun. I could use the end of the buffer to store some information about what was encoded. So when encoding b, the result of encoding its value would be "c = 2r", where "r" would just be a signal to b's encoder that its value was a record type. b's encoder would delete that signal byte, encode as needed and also end with an r. For that solution to work, there would need to be a function that runs at the end of the encoding to get rid of the signal byte. That would need to be part of the Encoding ability, unless we want a big warning from package authors saying "After encoding, run this cleanup function on the resulting bytes".
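To make the signal-byte hack concrete, here is a rough sketch in Rust rather than Roc. Everything in it is hypothetical: the names `encode_int`, `encode_record` and `finish` are made up, and it assumes every encoded value ends with exactly one marker byte ("r" for records, plus an "s" for scalars that the message above does not mention) so the parent can always pop one byte safely.

```rust
// Hypothetical marker bytes appended to every encoded value.
const RECORD: u8 = b'r';
const SCALAR: u8 = b's'; // assumption: scalars get a marker too

/// Encode an integer and append the scalar marker.
fn encode_int(n: i64) -> Vec<u8> {
    let mut bytes = n.to_string().into_bytes();
    bytes.push(SCALAR);
    bytes
}

/// Encode a record from already-encoded field values.
fn encode_record(fields: Vec<(&str, Vec<u8>)>) -> Vec<u8> {
    let mut out = Vec::new();
    for (name, mut child) in fields {
        // Pop the child's marker and use it to pick the key syntax.
        match child.pop() {
            Some(RECORD) => {
                out.extend_from_slice(format!("[{}]\n", name).as_bytes());
                out.extend_from_slice(&child);
            }
            _ => {
                out.extend_from_slice(format!("{} = ", name).as_bytes());
                out.extend_from_slice(&child);
                out.push(b'\n');
            }
        }
    }
    // Signal to our own parent that this value was a record.
    out.push(RECORD);
    out
}

/// The cleanup step every caller would have to remember to run.
fn finish(mut bytes: Vec<u8>) -> Vec<u8> {
    bytes.pop();
    bytes
}
```

The `finish` function is exactly the part that would need to live in the encoding API itself, otherwise the marker leaks into the output.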
First 2 thoughts:
For 2, I think the answer is that TOML has to serialize to a dictionary-like type first. Then from a dictionary to a file.
So basically build up the TOML dictionary in memory, then serialize it. The dictionary will have values that are tagged. The tags can be used to deal with this issue.
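A minimal sketch of that two-phase idea, in Rust for illustration only. The `TomlValue` type and the `write_table` / `to_toml` functions are hypothetical names, only integers and nested tables are modelled, and keys are assumed to be valid bare TOML keys. A `BTreeMap` is used for simplicity, so keys come out sorted rather than in field order.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory TOML value; the enum variant is the "tag".
enum TomlValue {
    Int(i64),
    Table(BTreeMap<String, TomlValue>),
}

/// Second phase: walk the tagged tree and write TOML text.
fn write_table(table: &BTreeMap<String, TomlValue>, path: &str, out: &mut String) {
    // Plain key/value pairs go first so they stay attached to this table.
    for (key, value) in table {
        if let TomlValue::Int(n) = value {
            out.push_str(&format!("{} = {}\n", key, n));
        }
    }
    // Nested tables follow, each with a header built from the full path.
    for (key, value) in table {
        if let TomlValue::Table(child) = value {
            let child_path = if path.is_empty() {
                key.clone()
            } else {
                format!("{}.{}", path, key)
            };
            out.push_str(&format!("[{}]\n", child_path));
            write_table(child, &child_path, out);
        }
    }
}

/// Serialize a root table to a TOML string.
fn to_toml(root: &BTreeMap<String, TomlValue>) -> String {
    let mut out = String::new();
    write_table(root, "", &mut out);
    out
}
```

Because the serializer sees the tag of every child, it never has to inspect already-written bytes to decide between "key = value" and a "[key]" header.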
Oh, I think the biggest difference, and what we should change, is that serde's serialize is closer to inspect than to encode
Encode only works with a list of bytes as the data
Inspect works with any type as the data
Serde also works with any type as the data
Like I bet you could write a TOML serializer rn in Roc using inspect, cause it is more flexible than encode
So I think we just need to change away from List U8 as the encoder state and let it be any type the encoder wants
Serde is actually crazy flexible here. Instead of having a singular serializer type, every different type of serializer can return a different type. So a struct serializer can return a different type than a tuple serializer.
In Roc, I think we probably should stick with one serializer data type and force the user to put tags within the serializer data type if a struct should return a different type than a tuple
Anyway, the summary is that I think we need to make encode closer to inspect. It should not be restricted to a List U8 for the state.
It should be flexible enough to have anything as the state. In the case of toml, the state might be { bytes: List U8, isNested: Bool }
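Here is a rough sketch of what "any state the encoder wants" could look like, written in Rust as an illustration and not as the Roc API. The `Formatter` trait, the `Value` type and `TomlFormatter` are all hypothetical; the only part taken from the thread is the TOML state shape `{ bytes, isNested }`, spelled `TomlState { bytes, is_nested }` here.

```rust
/// Hypothetical formatter interface: each format picks its own state type.
trait Formatter {
    type State;
    fn int(&self, n: i64) -> Self::State;
    fn record(&self, fields: Vec<(&str, Self::State)>) -> Self::State;
    /// Turn the final state into output bytes.
    fn finish(&self, state: Self::State) -> Vec<u8>;
}

/// A small value type that can be encoded with any formatter.
enum Value {
    Int(i64),
    Record(Vec<(&'static str, Value)>),
}

/// Encode children first, then hand their states to the parent.
fn encode<F: Formatter>(value: &Value, fmt: &F) -> F::State {
    match value {
        Value::Int(n) => fmt.int(*n),
        Value::Record(fields) => {
            let encoded: Vec<(&str, F::State)> =
                fields.iter().map(|(k, v)| (*k, encode(v, fmt))).collect();
            fmt.record(encoded)
        }
    }
}

/// TOML state: the bytes plus a flag the parent can inspect.
struct TomlState {
    bytes: Vec<u8>,
    is_nested: bool,
}

struct TomlFormatter;

impl Formatter for TomlFormatter {
    type State = TomlState;

    fn int(&self, n: i64) -> TomlState {
        TomlState { bytes: n.to_string().into_bytes(), is_nested: false }
    }

    fn record(&self, fields: Vec<(&str, TomlState)>) -> TomlState {
        let mut bytes = Vec::new();
        for (name, child) in fields {
            if child.is_nested {
                // Child was a record: emit a table header.
                bytes.extend_from_slice(format!("[{}]\n", name).as_bytes());
                bytes.extend_from_slice(&child.bytes);
            } else {
                // Child was a scalar: emit "key = value".
                bytes.extend_from_slice(format!("{} = ", name).as_bytes());
                bytes.extend_from_slice(&child.bytes);
                bytes.push(b'\n');
            }
        }
        TomlState { bytes, is_nested: true }
    }

    fn finish(&self, state: TomlState) -> Vec<u8> {
        state.bytes
    }
}
```

A format like JSON that needs no extra information could simply set `State` to a plain byte list, so the flexibility costs nothing for the simple case. This sketch only handles one level of nesting; deeper tables would need the full dotted path in the header.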
Flexible state sounds good to me, tho I'm not familiar with serde. Just talking as someone wanting to use the encoding API.
No worries about knowing serde...it is just roughly what we based encode and decode on
Last updated: Jun 16 2026 at 16:19 UTC