Stream: ideas

Topic: Additional return value for formatter fn-s in Encoding


view this post on Zulip Norbert Hajagos (Jun 14 2024 at 15:59):

Hi! This idea is to conditionally encode composite data structures (such as records, tuples, lists and tags) based on their child elements' types.
I started to write an encoder for TOML using the EncoderFormatting ability, but I'm facing a challenge. I just started based on roc-json, so I'm new to this, but I think I've found a genuine design problem.
I want to encode this record

{
  a: 1,
  b: {
    c: 2,
  }
}

as

a = 1
[b]
c = 2

When encoding a record, I need to encode the field names differently based on the type of the field's value. The value of a is not nested, so its key is just "a = ", but the value of b is another record, so its key is "[b]\n". How can I access this information when encoding the record? Afaik, in the formatter I only get a list of field name-value pairs. Names being strings, values being, err... functions? I only know they are used to encode the values. So I can only communicate back to the parent record encoder through the result of the child encoding. My best attempt would be to look at the encoded child's bytes and figure out if that is a record or not. But that feels really bad, since I would basically need to parse the just-written bytes to get information that was available when they were encoded. Bad for performance, and there may be cases where the child's encoded bytes don't tell the parent enough to decide how to encode.

I propose that the encoder formatting functions should return a record that has the resulting bytes and optionally some other data. I would use it to signal what kind of roc value was encoded.
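To make the proposal concrete, here is a minimal sketch in Python rather than Roc. It is not the Encoding API: the `Encoded` record, its `kind` field and the function names are all hypothetical, and it only assumes integer values and a single level of nesting (deeper nesting would need dotted table names in TOML).

```python
from dataclasses import dataclass


@dataclass
class Encoded:
    data: bytes  # the encoded bytes
    kind: str    # hypothetical extra return value: "scalar" or "record"


def encode_int(n: int) -> Encoded:
    return Encoded(str(n).encode(), "scalar")


def encode_record(fields: dict) -> Encoded:
    scalars = []
    tables = []
    for name, value in fields.items():
        child = encode_value(value)
        # The parent picks the key format from the child's kind,
        # without re-parsing the child's bytes.
        if child.kind == "record":
            tables.append(f"[{name}]\n".encode() + child.data)
        else:
            scalars.append(f"{name} = ".encode() + child.data + b"\n")
    # TOML requires plain key/value pairs to precede sub-table headers.
    return Encoded(b"".join(scalars + tables), "record")


def encode_value(value) -> Encoded:
    if isinstance(value, dict):
        return encode_record(value)
    return encode_int(value)
```

Encoding `{"a": 1, "b": {"c": 2}}` with `encode_value` yields the bytes of the TOML shown above, with a trailing newline.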

I thought of a simple, but pretty hacky solution. Sharing it for fun. I could use the end of the buffer to store some information on what was encoded. So when encoding b, the result of encoding its value would be "c = 2r", where "r" would just be a signal to b's encoder that its value was a record type. b's encoder would delete that signal byte, encode as needed and also end with an r. For that solution to work, there would need to be a function that could run at the end of the encoding to get rid of the signal byte. That would need to be part of the Encoding ability, unless we want a big warning from package authors saying "After encoding, run this cleanup function on the resulting bytes".
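The signal-byte hack can be sketched as follows, again in Python rather than Roc. The names `SCALAR`, `RECORD` and `cleanup` are made up for illustration, values are assumed to be integers or nested dicts, and scalar fields are assumed to come before nested ones.

```python
SCALAR = b"s"  # hypothetical signal byte: the value was a scalar
RECORD = b"r"  # hypothetical signal byte: the value was a record


def encode_value(value) -> bytes:
    if isinstance(value, dict):
        return encode_record(value)
    # Every encoder appends one signal byte describing what it encoded.
    return str(value).encode() + SCALAR


def encode_record(fields: dict) -> bytes:
    out = b""
    for name, value in fields.items():
        child = encode_value(value)
        # Split off the child's signal byte before using its bytes.
        body, signal = child[:-1], child[-1:]
        if signal == RECORD:
            out += f"[{name}]\n".encode() + body
        else:
            out += f"{name} = ".encode() + body + b"\n"
    # The record leaves its own signal byte for whoever encodes its parent.
    return out + RECORD


def cleanup(encoded: bytes) -> bytes:
    # Must run once after the top-level encode to drop the last signal byte.
    return encoded[:-1]
```

Forgetting `cleanup` leaves a stray `r` at the end of the output, which is exactly the footgun described above.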

view this post on Zulip Brendan Hansknecht (Jun 14 2024 at 16:17):

First 2 thoughts:

  1. I feel like this is possible with state on the encoder itself but would need to think about it more/mess around
  2. What does serde do?

view this post on Zulip Brendan Hansknecht (Jun 14 2024 at 16:34):

For 2, I think the answer is that toml has to serialize to a dictionary-like type first. Then from the dictionary to a file.

view this post on Zulip Brendan Hansknecht (Jun 14 2024 at 16:35):

So basically build up the toml dictionary in memory, then serialize it. The dictionary will have values that are tagged. The tags can be used to deal with this issue
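A rough sketch of that two-step approach, in Python and not tied to serde's or Roc's actual APIs: first build a tree whose nodes carry a tag, then render the tree to TOML text. The tag names and the functions `to_tagged` and `render` are hypothetical, and only integer scalars are assumed.

```python
def to_tagged(value):
    # Step 1: build an in-memory tree where every node is (tag, payload).
    if isinstance(value, dict):
        return ("table", {k: to_tagged(v) for k, v in value.items()})
    return ("scalar", value)


def render(node, path=()):
    # Step 2: walk the tagged tree and produce TOML lines.
    tag, fields = node
    assert tag == "table", "only tables can be rendered at the top level"
    lines = []
    # Plain key/value pairs must come before any sub-table header.
    for name, (child_tag, child) in fields.items():
        if child_tag == "scalar":
            lines.append(f"{name} = {child}")
    for name, child_node in fields.items():
        if child_node[0] == "table":
            child_path = path + (name,)
            # Nested tables use dotted names such as [b.c].
            lines.append("[" + ".".join(child_path) + "]")
            lines.extend(render(child_node, child_path))
    return lines
```

Because the tags survive until rendering, the writer never has to inspect already-encoded bytes to find out what a value was.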

view this post on Zulip Brendan Hansknecht (Jun 14 2024 at 16:44):

Oh, I think the biggest difference and what we should change is that serde serialize is closer to inspect than encode

view this post on Zulip Brendan Hansknecht (Jun 14 2024 at 16:45):

Encode only works with a list of bytes as the data

view this post on Zulip Brendan Hansknecht (Jun 14 2024 at 16:45):

Inspect works with any type as the data

view this post on Zulip Brendan Hansknecht (Jun 14 2024 at 16:45):

Serde also works with any type as the data

view this post on Zulip Brendan Hansknecht (Jun 14 2024 at 16:45):

Like I bet you could write a toml serializer rn in roc using inspect cause it is more flexible than encode

view this post on Zulip Brendan Hansknecht (Jun 14 2024 at 16:46):

So I think we just need to change away from List U8 as the encoder state and let it be any type the encoder wants

view this post on Zulip Brendan Hansknecht (Jun 14 2024 at 16:48):

Serde is actually crazy flexible here. Instead of having a singular serializer type, every different type of serializer can return a different type. So a struct serializer can return a different type than a tuple serializer.

In roc, I think we probably should stick with one serializer data type and force the user to put tags within the serializer data type if that struct should return a different type than a tuple

view this post on Zulip Brendan Hansknecht (Jun 14 2024 at 16:50):

Anyway, the summary is that I think we need to make encode closer to inspect. It should not be restricted to a List U8 for the state.

It should be flexible enough to have anything as the state. In the case of toml, the state might be { bytes: List U8, isNested: Bool }
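As an illustration of what a formatter-chosen state could look like, here is a small Python model (not Roc, and not the real Encode API). An encoder is modelled as a function from state to state, and the record formatter receives name/encoder pairs. `TomlState`, `int_encoder` and `record_encoder` are hypothetical names, and only integers and one level of nesting are handled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TomlState:
    data: bytes = b""
    is_nested: bool = False  # set by whichever encoder ran last


# An encoder transforms a state into a new state.
Encoder = Callable[[TomlState], TomlState]


def int_encoder(n: int) -> Encoder:
    return lambda state: TomlState(state.data + str(n).encode(), False)


def record_encoder(fields: list[tuple[str, Encoder]]) -> Encoder:
    def run(state: TomlState) -> TomlState:
        data = state.data
        for name, encode_value in fields:
            # Run the child on a fresh state, then read its flag.
            child = encode_value(TomlState())
            if child.is_nested:
                data += f"[{name}]\n".encode() + child.data
            else:
                data += f"{name} = ".encode() + child.data + b"\n"
        return TomlState(data, True)

    return run
```

For example, `record_encoder([("a", int_encoder(1)), ("b", record_encoder([("c", int_encoder(2))]))])(TomlState())` produces a state whose `data` is the TOML from the first post and whose `is_nested` is true.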

view this post on Zulip Norbert Hajagos (Jun 14 2024 at 19:43):

Flexible state sounds good to me, tho I'm not familiar with serde. Just talking as someone wanting to use the encoding api.

view this post on Zulip Brendan Hansknecht (Jun 14 2024 at 21:25):

No worries about knowing serde...it is just roughly what we based encode and decode on


Last updated: Jun 16 2026 at 16:19 UTC