So this isn't a proposal, more of a general discussion that @Eli Dowling pointed out to me.
Fundamentally, our encoder/decoder api is based on serde, but it is missing one vital piece. There is no way to encode/decode from a reader like type. We only allow for decoding from a list with a non-resumable api.
This can be really important for webservers. They want to start decoding json right as it streams in. They don't want to have to load all of the bytes into a list before even beginning to decode. Loading everything first can be a huge perf hit due to delaying the execution.
On top of that, always requiring a full list for decoding can have heavy memory use effects.
One piece that makes the api design exceptionally tricky in roc is that a reader like type requires tasks in order to function. Whenever it runs out of bytes, it runs a task to load more bytes if they are available. We probably don't want to require all decoding/encoding to use tasks.
Totally open to thoughts and discussion. Any ideas around how we can support this usecase in roc.
I might be missing something, but isn't that just a convenience? :thinking:
like we can already partially decode a List U8 and end up with bytes left over, right?
oh I guess we aren't resumable because we don't give you back the state at the end
but it seems like the I/O piece is a convenience
that is, we should be able to say "I have some bytes, doesn't matter where I got them from, decode whatever I gave you, and then later I'll give you some more bytes and I want you to pick up where you left off"
in other words, I don't think Task should need to be involved; at best, it would be a convenience wrapper around something that doesn't have Task involved
and I think the only missing piece there is being able to get the state back out the other side, so you can pick up where you left off
Correct. Making it resumable is also a valid solution. Can we do that while keeping perf reasonable? That requires being able to stop in the middle of any nested decode and preferably never redo work.
I don't think that is an easy task
Much much easier to just run a Task and load a new chunk of bytes
We could model it like an effects system perhaps? Like when it needs more bytes it returns a state and a continuation that can be run to keep processing once there are more
Is that how tasks are modelled in roc?
it's how we want them to be, but they aren't modeled that way yet
So we have a list decoder that calls a record decoder that calls it's second field decoder to decode an integer. The integer decoder runs out of bytes after decoding 5 digits.
How is this represented in a way such that we resume decoding right with decoding the 6th digit without needing to redo any of the list, record, or previous digit decoding?
I haven't thought this through all the way, but I wonder if that could be as simple as something like changing this...
DecodeResult val : {
result : Result val DecodeError,
rest : List U8,
}
...to this:
DecodeResult val : {
result : Result val DecodeError,
rest : List U8,
continue : List U8 -> DecodeResult val,
}
I'm not sure if maybe that should go in the Result val DecodeError or something
but the general idea is that if we get partway through decoding and don't finish, we return a function which closes over the in-progress state (and a slice of the remaining bytes to parse, or maybe even just all the original bytes?)
such that you can just call that function with more bytes as you receive them, and it'll keep making progress
it'd also close over the original Decoder so all the settings would be preserved too
So when the int decoder runs out of bytes, it returns an error that it is to short, an empty rest, and a continuation that captures how to continue decoding the rest of the integer (so presumably the captured state is just the partially loaded int). Then the nested record decoder does the same with a partial record state. And the nested list decoder does the same with a partially filled list.
So essentially builds a chain of nested state. That gets resolved by a chain of closure calls piping in new data?
I'm not sure if it needs to be significantly more nested than it is today
like for example, imagine we're running a decoder over a bunch of bytes
we have all this state in memory about the progress we've made so far
then suddenly we attempt to do the next thing we want to try, and we discover that we've run out of bytes unexpectedly
well, at that point we already have all the information in scope to continue; the only problem is that we ran out of bytes
so right at that point, we can return a closure which takes more bytes and continues
so I think the key thing is not so much nesting but rather being able to stitch together chunks of bytes
because let's say we had N bytes and that wasn't enough, but then when we get another M bytes it's still (somehow) not enough - like some super long string literal that still isn't closed somehow
well then we need to wait for a third chunk of bytes, but we probably still need to have kept around the previous two chunks of bytes so we can copy out of them
once we eventually finally do find the close quote that ends the gigantic string literal in the JSON
I'm not sure what the best way would be to make that part work :thinking:
It isn't all in scope. It is represented in a nested form already.
List Decoder calls Elem Decoder which happens to be Record Decoder. Record Decoder calls Field Decoder which happens to be Int Decoder. Int Decoder runs out of bytes.
So I'm pretty sure we will be returning a nested closure.
oh sure, I just mean it's not more nested to be resumable
if that makes sense
like I think it's the same amount of nesting as what we currently have, just with the bytes needing to be stitched together somehow
which could potentially be as simple as doing like a List (List U8) or something
I'm not sure what the best way would be to make that part work :thinking:
Yeah, I guess you would potentially prefer to append bytes to the original list in that case to make one seamless slice for the string.
Otherwise, with this API, I guess you need to make the string by contacting chunks of all the lists into a new allocation
I guess it also opens some API design. Do you want to be able to represent a string that spans multiple lists and only lazily load it when it actually is needed.
yeah my worry about appending bytes to the original string is basically what if it's a gigantic number of bytes, and they're used somewhere else, and now we're cloning all of that :sweat_smile:
For sure. It doesn't really work in roc.
does it work anywhere? haha
I mean I guess if you're okay just mutating a thing which others will look at later, and maybe they're now looking at a broken thing :laughing:
Yeah. If everything is using indices instead of references, you can safely keep appending.
hm, I don't follow
if multiple things have access to a chunk of data, then if appending to it is allowed, multiple things could attempt to append and end up overriding each other's data, right? (since each of them would have a local understanding of the data's length, and where appending "begins")
The only thing that can append is the active decoder when it loads more bytes. So the inner string decoder. Nothing else would be appending at the same time
but if it's shared, we can't know whether 2 of the things sharing it happen to be decoders
Ah, I was thinking that this could be done that way in rust or zig. In roc, it wouldn't work.
well it's the same thing there though
like if only one thing is referring to that memory (e.g. &mut or mut in Rust) then that one thing can safely modify it as desired without affecting anything else
which corresponds to refcount of 1 in Roc
but if multiple things are referring to it, then if any of the things that's referring to it modifies it, that can cause bugs for the other things when they go back to it later and also try to modify it
The decoder stores a state List U8. All string elements are just offsets. Other elements decode fully cause they have no underlying data that needs to be referenced. If we pause in the middle of loading a string. We can just append more bytes safely.
of you mean append more bytes to the string in progress
I thought we were talking about appending to the original input List U8
Yes, appending to the original List U8
Cause we aren't changing the index of anything. So all indices are still valid.
oh, sure - but the problem is when 2 different decoders both decide to append things
they each have their own local idea of what the length of the List U8 is
which is where they start "appending" from
There is only ever one decoder per List U8. They own the data they are decoding
so if they both do that, they will stomp on each other and overwrite each others' data, causing all sorts of problems
not necessarily - I can certainly pass the same List U8 to multiple decoders! :big_smile:
and unless they clone it, it'll be a shared reference
That's a design choice
but it's not avoidable
Would it every make sense to pass the same List U8 to different decode while also using continuations to process more data? I don't see the point of that
I don't think it's possible (even theoretically, in any language) to safely allow receiving the bytes as a potentially-shared reference (without defensively cloning it) and still appending to it
They would just be doing the same work twice for no reason
This might be where a stream type would come in very handy! You get the stream from the platform and the platform can uniquely just keep appending to it as Brendan is saying. And inside roc you can just keep accessing it. That seems to solve the problem?
unfortunately if the decoder is in progress, then it still necessarily has references into the original stream
it also may have created things which are slices into the original list of bytes
So what? As Brendan says if you are only ever appending the decoder shouldn't care.
hm, potentially yeah in that scenario
a separate question: let's say we're appending to the original list of bytes...what happens when we exceed its capacity?
presumably we do the usual thing of making a fresh allocation and copying everything over
that seems undesirable for streaming, where we'd probably rather have a fixed-size buffer and not have to grow-and-copy every time we get a new chunk
as opposed to, say, having a fresh (empty) allocation for each new chunk we receive
sized to whatever our buffer size is
and then work in terms of that instead of appending to the previous one
Yeah, I was about to say just that. It doesn't solve the issue of the stream just growing forever it needs to be mutated to "take bytes out"
yeah so we might want a List (List U8) representation anyway just because that's a more natural fit for the allocation pattern the host will want to do when streaming anyway
that is, a list of buffers, each having probably the same capacity and similar lengths
This would be a situation where uniqueness guarantees would be super helpful... But then you're going towards the borrow checker
well the specific scenario I was concerned about is the scenario where you genuinely want to share it
in other words, where it just isn't unique, period
not just that we couldn't tell whether it would be at compile time
if that makes sense
I think you should have to explicitly copy the stream. Otherwise you're pretending a stream is something it isn't. Like two people cannot read from a network stream at once, I think pretending otherwise is just adding performance footguns.
Just to clarify here:
Due to using seamless slices for strings, this will never work in safe roc.
You would have to manually use indices into the list to make this potentially safe. (manually make a list view type that is a just some indices. Later have to pass the full list in to load the actually string from it)
That said, a huge point of streaming is to reduce memory footprint.
As such, if you are just appending to the list anyway, I don't think there is much of a gain from streaming.
I think a streaming api would be best suited to not use slices.
Instead, it would always copy the strings out of the original list and into a new allocation.
This leads to the minimal memory use possible.
At the edge where you need to parse part way through string parsing, you would simply copy out the first chunk of the string, load a new chunk of the original list, continue parsing, and eventually append the rest of the bytes of the string on.
If continuation costs are low enough, and the data is streaming in slow enough, there is theoretically a time where the streaming api is faster than the non-streaming api. I would guess this is pretty rare given #ideas > Decoder APIs: Streaming = Bad Perf? and the potential ~3x to 5x slow down due to using streaming apis.
As such, I think any sort of streaming api is likely only going to be worth it from a memory saving perspective. Load in chunks, make more allocations, but reduce peak memory. Instead of load all at once, and use slices to the original allocation.
Would 100% need to:
As a note, rust's streaming api should have less overhead than a roc version. Roc has to capture the decoder state into a closure. Rust just performs a side effect to load more bytes, leaving the decoder state untouched on the stack.
All this long rant to say, lets build a great non-streaming decoder. Once that works, lets try to build a streaming decoder (probably manually without depending on decode) and see what type of perf we can get out of it. Then let's decide if it seams worth including in the standard.
love it, makes great sense to me!
I remember, back in the early 2000's when learning Java XML parsers (I remember one called Stax), there was a push and pull approach. I can't remember all the details, but it was something like this:
That :point_up: is something different from what we are used to when a decoder itself is responsible to produce an end result. My point is maybe the idea of streaming APIs for decoding should come down to event-like operations? We would not try to retrofit the streaming decoder into a "regular" one.
Imagine handling an almost endless stream of JSON objects and all we need is to extract a small piece of selected objects, or just to count them, or if we have a huge object but all we care is just one property. Having a Stax-like API would reduce memory consumption and improve performance a lot, I guess.
I think you're talking about an SAX parser and It's how RapidJson and Serde work, I do agree that it should be explored.
I tried to implement one in roc actually, sadly it doesn't really work easily becasue roc requires a task to do a side effect which is what sending an event is. eg: In RapidJson you pass the JsonReader an object with functions that get called when a string, or a number or something is decoded.
I did think it might work if you could take the events and push them onto a stream that could then be read by the consumer, but we don't have streams yet.
I tested an implementation for fun, that just pushes all the events onto a big list and then reads them again, but as you would imagine it's horrifically slow.
As for streams. I had a think, and I wonder what people think of this as an idea:
walkStreamUntil:Stream, a,(a,List U8->[Contiune a, Break a])->(Task (a, Stream),[StreamClosed,StreamReused])
countBytes=
stream|>walkStreamUntil 0 \state,buf->
Continue ((buf|>List.len)+state)
(count,stream)<- countBytes|>Task.await
# This would have read to the end, but if we had exited after only a few reads we could then read some more from the stream
Stream type as an Id to a stream rather than the stream itself. That way it's just like a filePath. If you call File.Delete myPath and then File.Read myPath you accept that that will cause an error. Same as calling stream functions. So perhaps it doesn't make sense to return a new stream?In a platform task, it should be easy to make a stream that overwrites a buffer if it is unique and just leaves it to roc to free if it isn't unique. So the perf would depend on how roc consumes it, but it would be safe. If roc is keeping many references to the stream, it will keep allocating new buffers. If the roc is written well and avoid referencing the buffer, it will get reused.
If we want to enforce perf, we force the stream API to return an error to roc if the buffer is not unique. Just like File.read will return an error if they file doesn't exist
So I think this could be done safely in roc with pretty minimal overhead
Or instead of returning a result to roc, the stream could even panic if it isn't unique and give the user a message that explains why that is a bad design and a bug when using streaming apis to keep a reference.
Cause that should get caught during development and be considered a bug that needs to be fixed. Of course, if the user is lazy and doesn't really care they can just introduce explicit copying into the roc to duplicate the stream buffer.
I feel that something like this should work quite well
Okay, I've also done a little more investigation on what a streaming decoder api could look like.
Dotnet defintely uses basically the design i proposed. They actually store the current stack frame of the parser, and return that every time the buffer is empty. If it's not yet finished parsing they read more bytes from the stream and run the continuation. https://source.dot.net/#System.Text.Json/System/Text/Json/Serialization/JsonConverterOfT.ReadCore.cs,bd75b9c1b037eafc
@witoldsz I had a proper look at StAX parsers. I see now what you mean, they are basically the same except StAX is pull based as you say. I do think that design may also be helpful but I do worry that calling Stream.next() would be a task and one Task per token in the json might be prohibitively slow. Maybe you could make Stream.next() give you a list of tokens that were parsed and that might reduce the task cost enough
Okay, so If when making Tasks use continuations we make the concept a little more general, we could have the json decoder return:[Continue ContinueDecode, Done DecodeResult] where ContinueDecode: List U8->[Continue ContinueDecode, Done DecodeResult] .
So that ContinueDecode is basically the equivalent to a task (It holds a continuation) but instead of being handled by the platform it's handled by some code within roc.
Then at the top level in the stream reading loop we read from the stream every time we are asked to continue, and then resume the decoder.
It might look something a little like this:
reading_from_stream \ continue, buf->
when continue bytes is
Effect (WaitForBytes continueParsing)-> Next(continueParsing )
Result result-> StopReading(result)
where the pasing function would look something like
string=\bytes, state->
when bytes is
['"', .. as rest] -> {result:Ok state, rest}|>Effect.result
[a,.. as rest ]-> string rest (state|>List.append a)
#waitforBytes is the the
[]-> waitForBytes |>Effect.await \ moreBytes-> string moreBytes
We don't get any overhead decoding the entire buf at once, and the same code can be re-used between the stream and non-stream version. The non-stream version just returns TooShort if it gets the continuation
I'm sure most roc folk already know this, but if anyone reading this isn't super familiar with the general idea of algebraic effect handlers, This is the best explanation I've found: https://gopiandcode.uk/logs/log-bye-bye-monads-algebraic-effects.html . "An effect is an exception that allows you to resume from where it was thrown, potentially with some additional data"
This is all just an idea, but I think it could be a good one ;)
Last updated: Jun 16 2026 at 16:19 UTC