Decoder APIs likely require streaming · ideas

So this isn't a proposal, more of a general discussion that @Eli Dowling pointed out to me.

Fundamentally, our encoder/decoder api is based on serde, but it is missing one vital piece. There is no way to encode/decode from a reader like type. We only allow for decoding from a list with a non-resumable api.

This can be really important for webservers. They want to start decoding json right as it streams in. They don't want to have to load all of the bytes into a list before even beginning to decode. Loading everything first can be a huge perf hit due to delaying the execution.

On top of that, always requiring a full list for decoding can have heavy memory use effects.

One piece that makes the api design exceptionally tricky in roc is that a reader like type requires tasks in order to function. Whenever it runs out of bytes, it runs a task to load more bytes if they are available. We probably don't want to require all decoding/encoding to use tasks.

Totally open to thoughts and discussion. Any ideas around how we can support this usecase in roc.

Richard Feldman (Apr 20 2024 at 01:26):

Richard Feldman (Apr 20 2024 at 01:27):

like we can already partially decode a List U8 and end up with bytes left over, right?

Richard Feldman (Apr 20 2024 at 01:28):

oh I guess we aren't resumable because we don't give you back the state at the end

Richard Feldman (Apr 20 2024 at 01:29):

Richard Feldman (Apr 20 2024 at 01:30):

that is, we should be able to say "I have some bytes, doesn't matter where I got them from, decode whatever I gave you, and then later I'll give you some more bytes and I want you to pick up where you left off"

Richard Feldman (Apr 20 2024 at 01:30):

in other words, I don't think Task should need to be involved; at best, it would be a convenience wrapper around something that doesn't have Task involved

Richard Feldman (Apr 20 2024 at 01:31):

and I think the only missing piece there is being able to get the state back out the other side, so you can pick up where you left off

Brendan Hansknecht (Apr 20 2024 at 01:33):

Correct. Making it resumable is also a valid solution. Can we do that while keeping perf reasonable? That requires being able to stop in the middle of any nested decode and preferably never redo work.

Brendan Hansknecht (Apr 20 2024 at 01:33):

Eli Dowling (Apr 20 2024 at 01:35):

We could model it like an effects system perhaps? Like when it needs more bytes it returns a state and a continuation that can be run to keep processing once there are more
Is that how tasks are modelled in roc?

Richard Feldman (Apr 20 2024 at 01:39):

Brendan Hansknecht (Apr 20 2024 at 01:42):

So we have a list decoder that calls a record decoder that calls it's second field decoder to decode an integer. The integer decoder runs out of bytes after decoding 5 digits.

How is this represented in a way such that we resume decoding right with decoding the 6th digit without needing to redo any of the list, record, or previous digit decoding?

Richard Feldman (Apr 20 2024 at 01:44):

I haven't thought this through all the way, but I wonder if that could be as simple as something like changing this...

DecodeResult val : {
    result : Result val DecodeError,
    rest : List U8,
}

DecodeResult val : {
    result : Result val DecodeError,
    rest : List U8,
    continue : List U8 -> DecodeResult val,
}

Richard Feldman (Apr 20 2024 at 01:45):

I'm not sure if maybe that should go in the Result val DecodeError or something

Richard Feldman (Apr 20 2024 at 01:46):

but the general idea is that if we get partway through decoding and don't finish, we return a function which closes over the in-progress state (and a slice of the remaining bytes to parse, or maybe even just all the original bytes?)

Richard Feldman (Apr 20 2024 at 01:46):

such that you can just call that function with more bytes as you receive them, and it'll keep making progress

Richard Feldman (Apr 20 2024 at 01:46):

it'd also close over the original Decoder so all the settings would be preserved too

Brendan Hansknecht (Apr 20 2024 at 01:47):

So when the int decoder runs out of bytes, it returns an error that it is to short, an empty rest, and a continuation that captures how to continue decoding the rest of the integer (so presumably the captured state is just the partially loaded int). Then the nested record decoder does the same with a partial record state. And the nested list decoder does the same with a partially filled list.

So essentially builds a chain of nested state. That gets resolved by a chain of closure calls piping in new data?

Richard Feldman (Apr 20 2024 at 01:49):

Richard Feldman (Apr 20 2024 at 01:50):

then suddenly we attempt to do the next thing we want to try, and we discover that we've run out of bytes unexpectedly

Richard Feldman (Apr 20 2024 at 01:51):

well, at that point we already have all the information in scope to continue; the only problem is that we ran out of bytes

Richard Feldman (Apr 20 2024 at 01:51):

so right at that point, we can return a closure which takes more bytes and continues

Richard Feldman (Apr 20 2024 at 01:51):

so I think the key thing is not so much nesting but rather being able to stitch together chunks of bytes

Richard Feldman (Apr 20 2024 at 01:52):

because let's say we had N bytes and that wasn't enough, but then when we get another M bytes it's still (somehow) not enough - like some super long string literal that still isn't closed somehow

Richard Feldman (Apr 20 2024 at 01:52):

well then we need to wait for a third chunk of bytes, but we probably still need to have kept around the previous two chunks of bytes so we can copy out of them

Richard Feldman (Apr 20 2024 at 01:53):

once we eventually finally do find the close quote that ends the gigantic string literal in the JSON

Richard Feldman (Apr 20 2024 at 01:53):

Brendan Hansknecht (Apr 20 2024 at 01:54):

List Decoder calls Elem Decoder which happens to be Record Decoder. Record Decoder calls Field Decoder which happens to be Int Decoder. Int Decoder runs out of bytes.

Richard Feldman (Apr 20 2024 at 01:55):

like I think it's the same amount of nesting as what we currently have, just with the bytes needing to be stitched together somehow

Richard Feldman (Apr 20 2024 at 01:55):

which could potentially be as simple as doing like a List (List U8) or something

Brendan Hansknecht (Apr 20 2024 at 01:56):

Yeah, I guess you would potentially prefer to append bytes to the original list in that case to make one seamless slice for the string.

Otherwise, with this API, I guess you need to make the string by contacting chunks of all the lists into a new allocation

Brendan Hansknecht (Apr 20 2024 at 01:57):

I guess it also opens some API design. Do you want to be able to represent a string that spans multiple lists and only lazily load it when it actually is needed.

Richard Feldman (Apr 20 2024 at 01:57):

yeah my worry about appending bytes to the original string is basically what if it's a gigantic number of bytes, and they're used somewhere else, and now we're cloning all of that :sweat_smile:

Brendan Hansknecht (Apr 20 2024 at 01:57):

Richard Feldman (Apr 20 2024 at 01:58):

I mean I guess if you're okay just mutating a thing which others will look at later, and maybe they're now looking at a broken thing :laughing:

Brendan Hansknecht (Apr 20 2024 at 01:59):

Yeah. If everything is using indices instead of references, you can safely keep appending.

Richard Feldman (Apr 20 2024 at 02:01):

Richard Feldman (Apr 20 2024 at 02:02):

if multiple things have access to a chunk of data, then if appending to it is allowed, multiple things could attempt to append and end up overriding each other's data, right? (since each of them would have a local understanding of the data's length, and where appending "begins")

Brendan Hansknecht (Apr 20 2024 at 02:03):

The only thing that can append is the active decoder when it loads more bytes. So the inner string decoder. Nothing else would be appending at the same time

Richard Feldman (Apr 20 2024 at 02:04):

but if it's shared, we can't know whether 2 of the things sharing it happen to be decoders

Brendan Hansknecht (Apr 20 2024 at 02:05):

Ah, I was thinking that this could be done that way in rust or zig. In roc, it wouldn't work.

Richard Feldman (Apr 20 2024 at 02:05):

like if only one thing is referring to that memory (e.g. &mut or mut in Rust) then that one thing can safely modify it as desired without affecting anything else

Richard Feldman (Apr 20 2024 at 02:06):

but if multiple things are referring to it, then if any of the things that's referring to it modifies it, that can cause bugs for the other things when they go back to it later and also try to modify it

Brendan Hansknecht (Apr 20 2024 at 02:06):

The decoder stores a state List U8. All string elements are just offsets. Other elements decode fully cause they have no underlying data that needs to be referenced. If we pause in the middle of loading a string. We can just append more bytes safely.

Richard Feldman (Apr 20 2024 at 02:07):

Brendan Hansknecht (Apr 20 2024 at 02:07):

Brendan Hansknecht (Apr 20 2024 at 02:08):

Richard Feldman (Apr 20 2024 at 02:08):

oh, sure - but the problem is when 2 different decoders both decide to append things

Richard Feldman (Apr 20 2024 at 02:08):

Brendan Hansknecht (Apr 20 2024 at 02:08):

There is only ever one decoder per List U8. They own the data they are decoding

Richard Feldman (Apr 20 2024 at 02:08):

so if they both do that, they will stomp on each other and overwrite each others' data, causing all sorts of problems

Richard Feldman (Apr 20 2024 at 02:09):

not necessarily - I can certainly pass the same List U8 to multiple decoders! :big_smile:

Richard Feldman (Apr 20 2024 at 02:09):

Brendan Hansknecht (Apr 20 2024 at 02:09):

Richard Feldman (Apr 20 2024 at 02:10):

Brendan Hansknecht (Apr 20 2024 at 02:10):

Would it every make sense to pass the same List U8 to different decode while also using continuations to process more data? I don't see the point of that

Richard Feldman (Apr 20 2024 at 02:10):

I don't think it's possible (even theoretically, in any language) to safely allow receiving the bytes as a potentially-shared reference (without defensively cloning it) and still appending to it

Brendan Hansknecht (Apr 20 2024 at 02:10):

Eli Dowling (Apr 20 2024 at 02:11):

This might be where a stream type would come in very handy! You get the stream from the platform and the platform can uniquely just keep appending to it as Brendan is saying. And inside roc you can just keep accessing it. That seems to solve the problem?

Richard Feldman (Apr 20 2024 at 02:12):

unfortunately if the decoder is in progress, then it still necessarily has references into the original stream

Richard Feldman (Apr 20 2024 at 02:12):

it also may have created things which are slices into the original list of bytes

Eli Dowling (Apr 20 2024 at 02:13):

So what? As Brendan says if you are only ever appending the decoder shouldn't care.

Richard Feldman (Apr 20 2024 at 02:13):

Richard Feldman (Apr 20 2024 at 02:14):

a separate question: let's say we're appending to the original list of bytes...what happens when we exceed its capacity?

Richard Feldman (Apr 20 2024 at 02:14):

presumably we do the usual thing of making a fresh allocation and copying everything over

Richard Feldman (Apr 20 2024 at 02:14):

that seems undesirable for streaming, where we'd probably rather have a fixed-size buffer and not have to grow-and-copy every time we get a new chunk

Richard Feldman (Apr 20 2024 at 02:15):

as opposed to, say, having a fresh (empty) allocation for each new chunk we receive

Richard Feldman (Apr 20 2024 at 02:15):

Eli Dowling (Apr 20 2024 at 02:15):

Yeah, I was about to say just that. It doesn't solve the issue of the stream just growing forever it needs to be mutated to "take bytes out"

Richard Feldman (Apr 20 2024 at 02:16):

yeah so we might want a List (List U8) representation anyway just because that's a more natural fit for the allocation pattern the host will want to do when streaming anyway

Richard Feldman (Apr 20 2024 at 02:16):

that is, a list of buffers, each having probably the same capacity and similar lengths

Eli Dowling (Apr 20 2024 at 02:28):

This would be a situation where uniqueness guarantees would be super helpful... But then you're going towards the borrow checker

Richard Feldman (Apr 20 2024 at 02:29):

well the specific scenario I was concerned about is the scenario where you genuinely want to share it

Richard Feldman (Apr 20 2024 at 02:29):

Eli Dowling (Apr 20 2024 at 02:32):

I think you should have to explicitly copy the stream. Otherwise you're pretending a stream is something it isn't. Like two people cannot read from a network stream at once, I think pretending otherwise is just adding performance footguns.

Brendan Hansknecht (Apr 20 2024 at 02:41):

Just to clarify here:
Due to using seamless slices for strings, this will never work in safe roc.
You would have to manually use indices into the list to make this potentially safe. (manually make a list view type that is a just some indices. Later have to pass the full list in to load the actually string from it)

That said, a huge point of streaming is to reduce memory footprint.
As such, if you are just appending to the list anyway, I don't think there is much of a gain from streaming.

I think a streaming api would be best suited to not use slices.
Instead, it would always copy the strings out of the original list and into a new allocation.
This leads to the minimal memory use possible.

At the edge where you need to parse part way through string parsing, you would simply copy out the first chunk of the string, load a new chunk of the original list, continue parsing, and eventually append the rest of the bytes of the string on.

Brendan Hansknecht (Apr 20 2024 at 02:47):

If continuation costs are low enough, and the data is streaming in slow enough, there is theoretically a time where the streaming api is faster than the non-streaming api. I would guess this is pretty rare given #ideas > Decoder APIs: Streaming = Bad Perf? and the potential ~3x to 5x slow down due to using streaming apis.

As such, I think any sort of streaming api is likely only going to be worth it from a memory saving perspective. Load in chunks, make more allocations, but reduce peak memory. Instead of load all at once, and use slices to the original allocation.

Brendan Hansknecht (Apr 20 2024 at 02:48):

As a note, rust's streaming api should have less overhead than a roc version. Roc has to capture the decoder state into a closure. Rust just performs a side effect to load more bytes, leaving the decoder state untouched on the stack.

Brendan Hansknecht (Apr 20 2024 at 02:50):

All this long rant to say, lets build a great non-streaming decoder. Once that works, lets try to build a streaming decoder (probably manually without depending on decode) and see what type of perf we can get out of it. Then let's decide if it seams worth including in the standard.

Richard Feldman (Apr 20 2024 at 02:52):

witoldsz (Apr 20 2024 at 09:56):

I remember, back in the early 2000's when learning Java XML parsers (I remember one called Stax), there was a push and pull approach. I can't remember all the details, but it was something like this:

That :point_up: is something different from what we are used to when a decoder itself is responsible to produce an end result. My point is maybe the idea of streaming APIs for decoding should come down to event-like operations? We would not try to retrofit the streaming decoder into a "regular" one.

Imagine handling an almost endless stream of JSON objects and all we need is to extract a small piece of selected objects, or just to count them, or if we have a huge object but all we care is just one property. Having a Stax-like API would reduce memory consumption and improve performance a lot, I guess.

Eli Dowling (Apr 20 2024 at 10:55):

I think you're talking about an SAX parser and It's how RapidJson and Serde work, I do agree that it should be explored.
I tried to implement one in roc actually, sadly it doesn't really work easily becasue roc requires a task to do a side effect which is what sending an event is. eg: In RapidJson you pass the JsonReader an object with functions that get called when a string, or a number or something is decoded.
I did think it might work if you could take the events and push them onto a stream that could then be read by the consumer, but we don't have streams yet.
I tested an implementation for fun, that just pushes all the events onto a big list and then reads them again, but as you would imagine it's horrifically slow.

Eli Dowling (Apr 20 2024 at 10:59):

As for streams. I had a think, and I wonder what people think of this as an idea:

walkStreamUntil:Stream, a,(a,List U8->[Contiune a, Break a])->(Task (a, Stream),[StreamClosed,StreamReused])

countBytes=
     stream|>walkStreamUntil 0 \state,buf->
         Continue ((buf|>List.len)+state)

(count,stream)<- countBytes|>Task.await
# This would have read to the end, but if we had exited after only a few reads we could then read some more from the stream

Brendan Hansknecht (Apr 20 2024 at 14:22):

In a platform task, it should be easy to make a stream that overwrites a buffer if it is unique and just leaves it to roc to free if it isn't unique. So the perf would depend on how roc consumes it, but it would be safe. If roc is keeping many references to the stream, it will keep allocating new buffers. If the roc is written well and avoid referencing the buffer, it will get reused.

If we want to enforce perf, we force the stream API to return an error to roc if the buffer is not unique. Just like File.read will return an error if they file doesn't exist

Brendan Hansknecht (Apr 20 2024 at 14:23):

Brendan Hansknecht (Apr 20 2024 at 14:25):

Or instead of returning a result to roc, the stream could even panic if it isn't unique and give the user a message that explains why that is a bad design and a bug when using streaming apis to keep a reference.

Brendan Hansknecht (Apr 20 2024 at 14:27):

Cause that should get caught during development and be considered a bug that needs to be fixed. Of course, if the user is lazy and doesn't really care they can just introduce explicit copying into the roc to duplicate the stream buffer.

Brendan Hansknecht (Apr 20 2024 at 14:27):

Eli Dowling (Apr 20 2024 at 22:47):

Okay, I've also done a little more investigation on what a streaming decoder api could look like.
Dotnet defintely uses basically the design i proposed. They actually store the current stack frame of the parser, and return that every time the buffer is empty. If it's not yet finished parsing they read more bytes from the stream and run the continuation. https://source.dot.net/#System.Text.Json/System/Text/Json/Serialization/JsonConverterOfT.ReadCore.cs,bd75b9c1b037eafc

@witoldsz I had a proper look at StAX parsers. I see now what you mean, they are basically the same except StAX is pull based as you say. I do think that design may also be helpful but I do worry that calling Stream.next() would be a task and one Task per token in the json might be prohibitively slow. Maybe you could make Stream.next() give you a list of tokens that were parsed and that might reduce the task cost enough

Eli Dowling (Apr 20 2024 at 23:32):

Okay, so If when making Tasks use continuations we make the concept a little more general, we could have the json decoder return:[Continue ContinueDecode, Done DecodeResult] where ContinueDecode: List U8->[Continue ContinueDecode, Done DecodeResult] .

So that ContinueDecode is basically the equivalent to a task (It holds a continuation) but instead of being handled by the platform it's handled by some code within roc.

Then at the top level in the stream reading loop we read from the stream every time we are asked to continue, and then resume the decoder.

reading_from_stream \ continue, buf->
   when continue bytes is
   Effect (WaitForBytes continueParsing)-> Next(continueParsing )
   Result result-> StopReading(result)

string=\bytes, state->
   when bytes is
   ['"', .. as rest] -> {result:Ok state, rest}|>Effect.result
   [a,.. as rest ]-> string rest (state|>List.append a)
   #waitforBytes is the the
   []-> waitForBytes |>Effect.await \ moreBytes-> string moreBytes

We don't get any overhead decoding the entire buf at once, and the same code can be re-used between the stream and non-stream version. The non-stream version just returns TooShort if it gets the continuation

I'm sure most roc folk already know this, but if anyone reading this isn't super familiar with the general idea of algebraic effect handlers, This is the best explanation I've found: https://gopiandcode.uk/logs/log-bye-bye-monads-algebraic-effects.html . "An effect is an exception that allows you to resume from where it was thrown, potentially with some additional data"

Stream: ideas

Topic: Decoder APIs likely require streaming

Brendan Hansknecht (Apr 19 2024 at 23:31):

Richard Feldman (Apr 20 2024 at 01:26):

Richard Feldman (Apr 20 2024 at 01:27):

Richard Feldman (Apr 20 2024 at 01:28):

Richard Feldman (Apr 20 2024 at 01:29):

Richard Feldman (Apr 20 2024 at 01:30):

Richard Feldman (Apr 20 2024 at 01:30):

Richard Feldman (Apr 20 2024 at 01:31):

Brendan Hansknecht (Apr 20 2024 at 01:33):

Brendan Hansknecht (Apr 20 2024 at 01:33):

Brendan Hansknecht (Apr 20 2024 at 01:33):

Eli Dowling (Apr 20 2024 at 01:35):

Richard Feldman (Apr 20 2024 at 01:39):

Brendan Hansknecht (Apr 20 2024 at 01:42):

Richard Feldman (Apr 20 2024 at 01:44):

Richard Feldman (Apr 20 2024 at 01:45):

Richard Feldman (Apr 20 2024 at 01:46):

Richard Feldman (Apr 20 2024 at 01:46):

Richard Feldman (Apr 20 2024 at 01:46):

Brendan Hansknecht (Apr 20 2024 at 01:47):

Richard Feldman (Apr 20 2024 at 01:49):

Richard Feldman (Apr 20 2024 at 01:49):

Richard Feldman (Apr 20 2024 at 01:49):

Richard Feldman (Apr 20 2024 at 01:50):

Richard Feldman (Apr 20 2024 at 01:51):

Richard Feldman (Apr 20 2024 at 01:51):

Richard Feldman (Apr 20 2024 at 01:51):

Richard Feldman (Apr 20 2024 at 01:52):

Richard Feldman (Apr 20 2024 at 01:52):

Richard Feldman (Apr 20 2024 at 01:53):

Richard Feldman (Apr 20 2024 at 01:53):

Brendan Hansknecht (Apr 20 2024 at 01:54):

Richard Feldman (Apr 20 2024 at 01:55):

Richard Feldman (Apr 20 2024 at 01:55):

Richard Feldman (Apr 20 2024 at 01:55):

Richard Feldman (Apr 20 2024 at 01:55):

Brendan Hansknecht (Apr 20 2024 at 01:56):

Brendan Hansknecht (Apr 20 2024 at 01:57):

Richard Feldman (Apr 20 2024 at 01:57):

Brendan Hansknecht (Apr 20 2024 at 01:57):

Richard Feldman (Apr 20 2024 at 01:58):

Richard Feldman (Apr 20 2024 at 01:58):

Brendan Hansknecht (Apr 20 2024 at 01:59):

Richard Feldman (Apr 20 2024 at 02:01):

Richard Feldman (Apr 20 2024 at 02:02):

Brendan Hansknecht (Apr 20 2024 at 02:03):

Richard Feldman (Apr 20 2024 at 02:04):

Brendan Hansknecht (Apr 20 2024 at 02:05):

Richard Feldman (Apr 20 2024 at 02:05):

Richard Feldman (Apr 20 2024 at 02:05):

Richard Feldman (Apr 20 2024 at 02:06):

Richard Feldman (Apr 20 2024 at 02:06):

Brendan Hansknecht (Apr 20 2024 at 02:06):

Richard Feldman (Apr 20 2024 at 02:07):

Richard Feldman (Apr 20 2024 at 02:07):

Brendan Hansknecht (Apr 20 2024 at 02:07):

Brendan Hansknecht (Apr 20 2024 at 02:08):

Richard Feldman (Apr 20 2024 at 02:08):

Richard Feldman (Apr 20 2024 at 02:08):

Richard Feldman (Apr 20 2024 at 02:08):

Brendan Hansknecht (Apr 20 2024 at 02:08):

Richard Feldman (Apr 20 2024 at 02:08):

Richard Feldman (Apr 20 2024 at 02:09):

Richard Feldman (Apr 20 2024 at 02:09):

Brendan Hansknecht (Apr 20 2024 at 02:09):

Richard Feldman (Apr 20 2024 at 02:10):

Brendan Hansknecht (Apr 20 2024 at 02:10):

Richard Feldman (Apr 20 2024 at 02:10):

Brendan Hansknecht (Apr 20 2024 at 02:10):

Eli Dowling (Apr 20 2024 at 02:11):

Richard Feldman (Apr 20 2024 at 02:12):

Richard Feldman (Apr 20 2024 at 02:12):

Eli Dowling (Apr 20 2024 at 02:13):

Richard Feldman (Apr 20 2024 at 02:13):

Richard Feldman (Apr 20 2024 at 02:14):

Richard Feldman (Apr 20 2024 at 02:14):

Richard Feldman (Apr 20 2024 at 02:14):

Richard Feldman (Apr 20 2024 at 02:15):