Stream: ideas

Topic: Decoder APIs likely require streaming


view this post on Zulip Brendan Hansknecht (Apr 19 2024 at 23:31):

So this isn't a proposal, more of a general discussion that @Eli Dowling pointed out to me.

Fundamentally, our encoder/decoder api is based on serde, but it is missing one vital piece. There is no way to encode/decode from a reader like type. We only allow for decoding from a list with a non-resumable api.

This can be really important for webservers. They want to start decoding json right as it streams in. They don't want to have to load all of the bytes into a list before even beginning to decode. Loading everything first can be a huge perf hit due to delaying the execution.

On top of that, always requiring a full list for decoding can have heavy memory use effects.


One piece that makes the api design exceptionally tricky in roc is that a reader like type requires tasks in order to function. Whenever it runs out of bytes, it runs a task to load more bytes if they are available. We probably don't want to require all decoding/encoding to use tasks.


Totally open to thoughts and discussion. Any ideas around how we can support this usecase in roc.

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:26):

I might be missing something, but isn't that just a convenience? :thinking:

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:27):

like we can already partially decode a List U8 and end up with bytes left over, right?

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:28):

oh I guess we aren't resumable because we don't give you back the state at the end

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:29):

but it seems like the I/O piece is a convenience

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:30):

that is, we should be able to say "I have some bytes, doesn't matter where I got them from, decode whatever I gave you, and then later I'll give you some more bytes and I want you to pick up where you left off"

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:30):

in other words, I don't think Task should need to be involved; at best, it would be a convenience wrapper around something that doesn't have Task involved

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:31):

and I think the only missing piece there is being able to get the state back out the other side, so you can pick up where you left off

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 01:33):

Correct. Making it resumable is also a valid solution. Can we do that while keeping perf reasonable? That requires being able to stop in the middle of any nested decode and preferably never redo work.

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 01:33):

I don't think that is an easy task

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 01:33):

Much much easier to just run a Task and load a new chunk of bytes

view this post on Zulip Eli Dowling (Apr 20 2024 at 01:35):

We could model it like an effects system perhaps? Like when it needs more bytes it returns a state and a continuation that can be run to keep processing once there are more
Is that how tasks are modelled in roc?

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:39):

it's how we want them to be, but they aren't modeled that way yet

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 01:42):

So we have a list decoder that calls a record decoder that calls it's second field decoder to decode an integer. The integer decoder runs out of bytes after decoding 5 digits.

How is this represented in a way such that we resume decoding right with decoding the 6th digit without needing to redo any of the list, record, or previous digit decoding?

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:44):

I haven't thought this through all the way, but I wonder if that could be as simple as something like changing this...

DecodeResult val : {
    result : Result val DecodeError,
    rest : List U8,
}

...to this:

DecodeResult val : {
    result : Result val DecodeError,
    rest : List U8,
    continue : List U8 -> DecodeResult val,
}

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:45):

I'm not sure if maybe that should go in the Result val DecodeError or something

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:46):

but the general idea is that if we get partway through decoding and don't finish, we return a function which closes over the in-progress state (and a slice of the remaining bytes to parse, or maybe even just all the original bytes?)

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:46):

such that you can just call that function with more bytes as you receive them, and it'll keep making progress

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:46):

it'd also close over the original Decoder so all the settings would be preserved too

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 01:47):

So when the int decoder runs out of bytes, it returns an error that it is to short, an empty rest, and a continuation that captures how to continue decoding the rest of the integer (so presumably the captured state is just the partially loaded int). Then the nested record decoder does the same with a partial record state. And the nested list decoder does the same with a partially filled list.

So essentially builds a chain of nested state. That gets resolved by a chain of closure calls piping in new data?

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:49):

I'm not sure if it needs to be significantly more nested than it is today

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:49):

like for example, imagine we're running a decoder over a bunch of bytes

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:49):

we have all this state in memory about the progress we've made so far

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:50):

then suddenly we attempt to do the next thing we want to try, and we discover that we've run out of bytes unexpectedly

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:51):

well, at that point we already have all the information in scope to continue; the only problem is that we ran out of bytes

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:51):

so right at that point, we can return a closure which takes more bytes and continues

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:51):

so I think the key thing is not so much nesting but rather being able to stitch together chunks of bytes

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:52):

because let's say we had N bytes and that wasn't enough, but then when we get another M bytes it's still (somehow) not enough - like some super long string literal that still isn't closed somehow

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:52):

well then we need to wait for a third chunk of bytes, but we probably still need to have kept around the previous two chunks of bytes so we can copy out of them

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:53):

once we eventually finally do find the close quote that ends the gigantic string literal in the JSON

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:53):

I'm not sure what the best way would be to make that part work :thinking:

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 01:54):

It isn't all in scope. It is represented in a nested form already.

List Decoder calls Elem Decoder which happens to be Record Decoder. Record Decoder calls Field Decoder which happens to be Int Decoder. Int Decoder runs out of bytes.

So I'm pretty sure we will be returning a nested closure.

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:55):

oh sure, I just mean it's not more nested to be resumable

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:55):

if that makes sense

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:55):

like I think it's the same amount of nesting as what we currently have, just with the bytes needing to be stitched together somehow

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:55):

which could potentially be as simple as doing like a List (List U8) or something

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 01:56):

I'm not sure what the best way would be to make that part work :thinking:

Yeah, I guess you would potentially prefer to append bytes to the original list in that case to make one seamless slice for the string.

Otherwise, with this API, I guess you need to make the string by contacting chunks of all the lists into a new allocation

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 01:57):

I guess it also opens some API design. Do you want to be able to represent a string that spans multiple lists and only lazily load it when it actually is needed.

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:57):

yeah my worry about appending bytes to the original string is basically what if it's a gigantic number of bytes, and they're used somewhere else, and now we're cloning all of that :sweat_smile:

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 01:57):

For sure. It doesn't really work in roc.

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:58):

does it work anywhere? haha

view this post on Zulip Richard Feldman (Apr 20 2024 at 01:58):

I mean I guess if you're okay just mutating a thing which others will look at later, and maybe they're now looking at a broken thing :laughing:

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 01:59):

Yeah. If everything is using indices instead of references, you can safely keep appending.

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:01):

hm, I don't follow

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:02):

if multiple things have access to a chunk of data, then if appending to it is allowed, multiple things could attempt to append and end up overriding each other's data, right? (since each of them would have a local understanding of the data's length, and where appending "begins")

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 02:03):

The only thing that can append is the active decoder when it loads more bytes. So the inner string decoder. Nothing else would be appending at the same time

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:04):

but if it's shared, we can't know whether 2 of the things sharing it happen to be decoders

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 02:05):

Ah, I was thinking that this could be done that way in rust or zig. In roc, it wouldn't work.

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:05):

well it's the same thing there though

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:05):

like if only one thing is referring to that memory (e.g. &mut or mut in Rust) then that one thing can safely modify it as desired without affecting anything else

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:06):

which corresponds to refcount of 1 in Roc

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:06):

but if multiple things are referring to it, then if any of the things that's referring to it modifies it, that can cause bugs for the other things when they go back to it later and also try to modify it

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 02:06):

The decoder stores a state List U8. All string elements are just offsets. Other elements decode fully cause they have no underlying data that needs to be referenced. If we pause in the middle of loading a string. We can just append more bytes safely.

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:07):

of you mean append more bytes to the string in progress

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:07):

I thought we were talking about appending to the original input List U8

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 02:07):

Yes, appending to the original List U8

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 02:08):

Cause we aren't changing the index of anything. So all indices are still valid.

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:08):

oh, sure - but the problem is when 2 different decoders both decide to append things

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:08):

they each have their own local idea of what the length of the List U8 is

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:08):

which is where they start "appending" from

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 02:08):

There is only ever one decoder per List U8. They own the data they are decoding

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:08):

so if they both do that, they will stomp on each other and overwrite each others' data, causing all sorts of problems

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:09):

not necessarily - I can certainly pass the same List U8 to multiple decoders! :big_smile:

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:09):

and unless they clone it, it'll be a shared reference

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 02:09):

That's a design choice

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:10):

but it's not avoidable

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 02:10):

Would it every make sense to pass the same List U8 to different decode while also using continuations to process more data? I don't see the point of that

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:10):

I don't think it's possible (even theoretically, in any language) to safely allow receiving the bytes as a potentially-shared reference (without defensively cloning it) and still appending to it

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 02:10):

They would just be doing the same work twice for no reason

view this post on Zulip Eli Dowling (Apr 20 2024 at 02:11):

This might be where a stream type would come in very handy! You get the stream from the platform and the platform can uniquely just keep appending to it as Brendan is saying. And inside roc you can just keep accessing it. That seems to solve the problem?

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:12):

unfortunately if the decoder is in progress, then it still necessarily has references into the original stream

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:12):

it also may have created things which are slices into the original list of bytes

view this post on Zulip Eli Dowling (Apr 20 2024 at 02:13):

So what? As Brendan says if you are only ever appending the decoder shouldn't care.

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:13):

hm, potentially yeah in that scenario

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:14):

a separate question: let's say we're appending to the original list of bytes...what happens when we exceed its capacity?

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:14):

presumably we do the usual thing of making a fresh allocation and copying everything over

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:14):

that seems undesirable for streaming, where we'd probably rather have a fixed-size buffer and not have to grow-and-copy every time we get a new chunk

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:15):

as opposed to, say, having a fresh (empty) allocation for each new chunk we receive

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:15):

sized to whatever our buffer size is

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:15):

and then work in terms of that instead of appending to the previous one

view this post on Zulip Eli Dowling (Apr 20 2024 at 02:15):

Yeah, I was about to say just that. It doesn't solve the issue of the stream just growing forever it needs to be mutated to "take bytes out"

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:16):

yeah so we might want a List (List U8) representation anyway just because that's a more natural fit for the allocation pattern the host will want to do when streaming anyway

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:16):

that is, a list of buffers, each having probably the same capacity and similar lengths

view this post on Zulip Eli Dowling (Apr 20 2024 at 02:28):

This would be a situation where uniqueness guarantees would be super helpful... But then you're going towards the borrow checker

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:29):

well the specific scenario I was concerned about is the scenario where you genuinely want to share it

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:29):

in other words, where it just isn't unique, period

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:29):

not just that we couldn't tell whether it would be at compile time

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:29):

if that makes sense

view this post on Zulip Eli Dowling (Apr 20 2024 at 02:32):

I think you should have to explicitly copy the stream. Otherwise you're pretending a stream is something it isn't. Like two people cannot read from a network stream at once, I think pretending otherwise is just adding performance footguns.

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 02:41):

Just to clarify here:
Due to using seamless slices for strings, this will never work in safe roc.
You would have to manually use indices into the list to make this potentially safe. (manually make a list view type that is a just some indices. Later have to pass the full list in to load the actually string from it)

That said, a huge point of streaming is to reduce memory footprint.
As such, if you are just appending to the list anyway, I don't think there is much of a gain from streaming.

I think a streaming api would be best suited to not use slices.
Instead, it would always copy the strings out of the original list and into a new allocation.
This leads to the minimal memory use possible.

At the edge where you need to parse part way through string parsing, you would simply copy out the first chunk of the string, load a new chunk of the original list, continue parsing, and eventually append the rest of the bytes of the string on.

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 02:47):

If continuation costs are low enough, and the data is streaming in slow enough, there is theoretically a time where the streaming api is faster than the non-streaming api. I would guess this is pretty rare given #ideas > Decoder APIs: Streaming = Bad Perf? and the potential ~3x to 5x slow down due to using streaming apis.

As such, I think any sort of streaming api is likely only going to be worth it from a memory saving perspective. Load in chunks, make more allocations, but reduce peak memory. Instead of load all at once, and use slices to the original allocation.

Would 100% need to:

  1. improve the perf of our current parsing
  2. actually compare the two (probably with a chain of lists in memory to see the max possible perf of the streaming api)
  3. If 2 had okish perf, actually compare with some sort of io based workload and see the results.

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 02:48):

As a note, rust's streaming api should have less overhead than a roc version. Roc has to capture the decoder state into a closure. Rust just performs a side effect to load more bytes, leaving the decoder state untouched on the stack.

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 02:50):

All this long rant to say, lets build a great non-streaming decoder. Once that works, lets try to build a streaming decoder (probably manually without depending on decode) and see what type of perf we can get out of it. Then let's decide if it seams worth including in the standard.

view this post on Zulip Richard Feldman (Apr 20 2024 at 02:52):

love it, makes great sense to me!

view this post on Zulip witoldsz (Apr 20 2024 at 09:56):

I remember, back in the early 2000's when learning Java XML parsers (I remember one called Stax), there was a push and pull approach. I can't remember all the details, but it was something like this:

That :point_up: is something different from what we are used to when a decoder itself is responsible to produce an end result. My point is maybe the idea of streaming APIs for decoding should come down to event-like operations? We would not try to retrofit the streaming decoder into a "regular" one.

Imagine handling an almost endless stream of JSON objects and all we need is to extract a small piece of selected objects, or just to count them, or if we have a huge object but all we care is just one property. Having a Stax-like API would reduce memory consumption and improve performance a lot, I guess.

view this post on Zulip Eli Dowling (Apr 20 2024 at 10:55):

I think you're talking about an SAX parser and It's how RapidJson and Serde work, I do agree that it should be explored.
I tried to implement one in roc actually, sadly it doesn't really work easily becasue roc requires a task to do a side effect which is what sending an event is. eg: In RapidJson you pass the JsonReader an object with functions that get called when a string, or a number or something is decoded.
I did think it might work if you could take the events and push them onto a stream that could then be read by the consumer, but we don't have streams yet.
I tested an implementation for fun, that just pushes all the events onto a big list and then reads them again, but as you would imagine it's horrifically slow.

view this post on Zulip Eli Dowling (Apr 20 2024 at 10:59):

As for streams. I had a think, and I wonder what people think of this as an idea:

walkStreamUntil:Stream, a,(a,List U8->[Contiune a, Break a])->(Task (a, Stream),[StreamClosed,StreamReused])

countBytes=
     stream|>walkStreamUntil 0 \state,buf->
         Continue ((buf|>List.len)+state)

(count,stream)<- countBytes|>Task.await
# This would have read to the end, but if we had exited after only a few reads we could then read some more from the stream

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 14:22):

In a platform task, it should be easy to make a stream that overwrites a buffer if it is unique and just leaves it to roc to free if it isn't unique. So the perf would depend on how roc consumes it, but it would be safe. If roc is keeping many references to the stream, it will keep allocating new buffers. If the roc is written well and avoid referencing the buffer, it will get reused.

If we want to enforce perf, we force the stream API to return an error to roc if the buffer is not unique. Just like File.read will return an error if they file doesn't exist

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 14:23):

So I think this could be done safely in roc with pretty minimal overhead

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 14:25):

Or instead of returning a result to roc, the stream could even panic if it isn't unique and give the user a message that explains why that is a bad design and a bug when using streaming apis to keep a reference.

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 14:27):

Cause that should get caught during development and be considered a bug that needs to be fixed. Of course, if the user is lazy and doesn't really care they can just introduce explicit copying into the roc to duplicate the stream buffer.

view this post on Zulip Brendan Hansknecht (Apr 20 2024 at 14:27):

I feel that something like this should work quite well

view this post on Zulip Eli Dowling (Apr 20 2024 at 22:47):

Okay, I've also done a little more investigation on what a streaming decoder api could look like.
Dotnet defintely uses basically the design i proposed. They actually store the current stack frame of the parser, and return that every time the buffer is empty. If it's not yet finished parsing they read more bytes from the stream and run the continuation. https://source.dot.net/#System.Text.Json/System/Text/Json/Serialization/JsonConverterOfT.ReadCore.cs,bd75b9c1b037eafc

@witoldsz I had a proper look at StAX parsers. I see now what you mean, they are basically the same except StAX is pull based as you say. I do think that design may also be helpful but I do worry that calling Stream.next() would be a task and one Task per token in the json might be prohibitively slow. Maybe you could make Stream.next() give you a list of tokens that were parsed and that might reduce the task cost enough

view this post on Zulip Eli Dowling (Apr 20 2024 at 23:32):

Okay, so If when making Tasks use continuations we make the concept a little more general, we could have the json decoder return:[Continue ContinueDecode, Done DecodeResult] where ContinueDecode: List U8->[Continue ContinueDecode, Done DecodeResult] .

So that ContinueDecode is basically the equivalent to a task (It holds a continuation) but instead of being handled by the platform it's handled by some code within roc.

Then at the top level in the stream reading loop we read from the stream every time we are asked to continue, and then resume the decoder.

It might look something a little like this:

reading_from_stream \ continue, buf->
   when continue bytes is
   Effect (WaitForBytes continueParsing)-> Next(continueParsing )
   Result result-> StopReading(result)

where the pasing function would look something like

string=\bytes, state->
   when bytes is
   ['"', .. as rest] -> {result:Ok state, rest}|>Effect.result
   [a,.. as rest ]-> string rest (state|>List.append a)
   #waitforBytes is the the
   []-> waitForBytes |>Effect.await \ moreBytes-> string moreBytes

We don't get any overhead decoding the entire buf at once, and the same code can be re-used between the stream and non-stream version. The non-stream version just returns TooShort if it gets the continuation

I'm sure most roc folk already know this, but if anyone reading this isn't super familiar with the general idea of algebraic effect handlers, This is the best explanation I've found: https://gopiandcode.uk/logs/log-bye-bye-monads-algebraic-effects.html . "An effect is an exception that allows you to resume from where it was thrown, potentially with some additional data"

This is all just an idea, but I think it could be a good one ;)


Last updated: Jun 16 2026 at 16:19 UTC