I'm pulling this into a second thread so it doesn't derail the first one.
Apparently, according to the serde_json docs, using `from_reader` is generally a lot slower than just loading the entire list/str beforehand:
> Note that counter to intuition, this function is usually slower than reading a file completely into memory and then applying `from_str` or `from_slice` on it. See issue #160.
So this may actually be a pattern that isn't worth worrying a ton about. The perf hit of using the streaming decoder can apparently be huge (like 3 to 5x slower). For files, you essentially always just want to mmap the entire file into memory (this is super great for Roc, where you can have a seamless slice that points into the mmap).
For web servers, to mitigate DDoS attacks, it is generally recommended to limit the size of the request body. Given that the body size is limited, it may actually be better to just load the entire body into memory and then decode it. Obviously, on a slow enough network, the streaming decode will be faster.
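A minimal sketch of the "cap the body size, then decode it all at once" approach, using only the Rust standard library (`std::io::Read::take`). `read_limited_body` and the 1024-byte limit are hypothetical; the actual decode step is left out.

```rust
use std::io::{self, Read};

/// Hypothetical helper: read a whole request body into memory, but reject it
/// if it is larger than `max_len` bytes.
fn read_limited_body<R: Read>(body: R, max_len: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Ask for one byte more than the limit; getting it means the body is too big.
    body.take(max_len.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > max_len {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "request body exceeds size limit",
        ));
    }
    Ok(buf)
}

fn main() -> io::Result<()> {
    let body = read_limited_body(&b"[1, 2, 3]"[..], 1024)?;
    // The whole body is now one contiguous buffer, so a slice-based decoder
    // (such as serde_json's `from_slice`) could run over it directly.
    println!("read {} bytes", body.len()); // prints "read 9 bytes"
    assert!(read_limited_body(&[b' '; 2048][..], 1024).is_err());
    Ok(())
}
```

Because the limit bounds the allocation, the memory cost of skipping the streaming decoder is known up front.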
Probably the biggest concern with not having a streaming decode is if you have some sort of trusted, generated input of unknown length. It may not be reasonable to load everything in one go, since you might run out of memory. In that specific case, loading in chunks may be required.
Another example is simd-json. From what I can tell, it does not work with a continuation-style API. It requires the entire input to be loaded into a single allocation up front, and then it parses the JSON lazily.
So, just more food for thought: maybe we don't actually want a streaming API?
huh, wow!
I never knew about that
if it's a footgun, then at a minimum we probably shouldn't add it eagerly :sweat_smile:
and instead wait to really confirm there's a use case where it's actually the right choice, even with the knowledge of this :point_up:
Oh wow, reading through that issue more, in some cases serde_json can be more than 10x slower even with buffered readers.
Oh wow, this one is crazy (EDIT: huge, but not that crazy):
Benchmarks in dotnet don't agree with that at all sadly:
| Method | Mean | Error | StdDev |
|----------------- |---------:|---------:|---------:|
| IdOnlyJsonStream | 38.68 ms | 0.548 ms | 0.512 ms |
| IdOnlyReadFIle | 42.02 ms | 0.840 ms | 1.843 ms |
I'm using 21 MB of JSON. I'll try to make a Rust benchmark too, but I'm not as familiar with streaming there.
If you'd like to audit the highly complex code that got us here :sweat_smile: :
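```fsharp
open System.IO
open System.Text.Json
open BenchmarkDotNet.Attributes

// `MyObject` is the benchmark's record type (definition not shown); it has an `id` field.
// The wrapper type name `JsonBenchmarks` is a placeholder.
type JsonBenchmarks() =

    [<Benchmark>]
    member _.IdOnlyJsonStream() =
        // Deserialize straight from the file stream; `use` disposes the handle.
        use jsonStream = File.OpenRead("/home/eli/Code/roc/lsp/small-json.json")
        let objects = JsonSerializer.Deserialize<MyObject ResizeArray>(jsonStream)
        let item = (objects.Item(objects.Count - 1)).id
        item

    [<Benchmark>]
    member _.IdOnlyReadFIle() =
        // Load the whole file into memory first, then deserialize from the bytes.
        let json = File.ReadAllBytes("/home/eli/Code/roc/lsp/small-json.json")
        let objects = JsonSerializer.Deserialize<MyObject ResizeArray>(json)
        let item = (objects.Item(objects.Count - 1)).id
        item
```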
Edit: Using ReadAllBytes instead
In this case it's what I'd expect: the streaming version is a tiny bit faster, because the data is decoded as it's being read, and reading the data takes time.
Does dotnet let you mmap a file? If so, how does that compare?
Ok, so @Eli Dowling and I did some more digging. The original perf numbers of 5x to 10x slower are definitely wrong. The reader API is still not as fast in Rust, but later commits have sped it up tremendously (mostly by adding and reusing buffers).
These are the rough findings; we would have to do more testing to fully confirm all of them:
simd-json: many of the techniques can be applied to a streaming setup. That said, lazy decoding and avoiding all data copying are only possible in a non-streaming setup (a string with escaped characters still needs a single copy, if the string is used).
So this definitely makes the tradeoff feel much more nuanced.
I think my suggestion in #ideas > Decoder APIs likely require streaming still stands, but I now think that a streaming API could definitely be useful. It seems that in many cases the perf difference may be negligible but the memory usage difference significant.
I do think it is worth remembering that there are some applications that would simply be impossible without some kind of stream API.
E.g. summing all the values in a 1 GB JSON file on a little cloud instance with only 500 MB of memory, streaming a video file over the network, or any number of other things.
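To make the "sum a huge JSON file with little memory" case concrete, here is a toy Rust sketch using only the standard library. `stream_sum` is hypothetical and is NOT a real JSON parser: it assumes the input is a flat array of integers. It pulls bytes through one reused 8 KB buffer, so memory stays constant regardless of file size, and it keeps parser state across reads so values split between chunks still decode correctly.

```rust
use std::io::{self, Read};

/// Toy streaming decoder (hypothetical, not a real JSON parser): sums a flat
/// JSON array of integers such as `[1, -2, 30]`. No overflow checking.
fn stream_sum<R: Read>(mut input: R) -> io::Result<i64> {
    let mut buf = [0u8; 8 * 1024]; // one buffer, reused for every chunk
    let mut sum: i64 = 0;
    // Parser state lives outside the read loop, so a number that is split
    // across two chunks (e.g. "12" | "3") is still parsed as 123.
    let mut current: i64 = 0;
    let mut negative = false;
    let mut in_number = false;

    loop {
        let n = input.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        for &b in &buf[..n] {
            match b {
                b'0'..=b'9' => {
                    in_number = true;
                    current = current * 10 + i64::from(b - b'0');
                }
                b'-' if !in_number => {
                    in_number = true;
                    negative = true;
                }
                b'[' | b']' | b',' | b' ' | b'\n' | b'\r' | b'\t' => {
                    // A delimiter ends any number in progress.
                    if in_number {
                        sum += if negative { -current } else { current };
                        current = 0;
                        negative = false;
                        in_number = false;
                    }
                }
                _ => {
                    return Err(io::Error::new(
                        io::ErrorKind::InvalidData,
                        "toy parser only supports flat arrays of integers",
                    ));
                }
            }
        }
    }
    // Flush a trailing number that was not followed by a delimiter.
    if in_number {
        sum += if negative { -current } else { current };
    }
    Ok(sum)
}

fn main() -> io::Result<()> {
    // `chain` hands the data over in two reads, so 123 straddles a chunk boundary.
    let input = (&b"[12"[..]).chain(&b"3, -4]"[..]);
    println!("{}", stream_sum(input)?); // prints 119
    Ok(())
}
```

For the 1 GB case you would pass a `std::fs::File` instead of in-memory slices; peak memory is the fixed buffer plus a few integers, not the file size.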
I think a streams design in Roc is worth keeping in mind.
Yeah, theoretically mmap partially helps for the "1gb json file on a little cloud instance with only 500mb of memory", but the streaming decode/encode is definitely needed for video/audio streams.
would streaming video/audio streams use Decoding though? :thinking:
No, likely not, but this conversation had kind of wandered into the "should we support streams" area as well, so I just wanted to reiterate their importance :sweat_smile:
Last updated: Jun 16 2026 at 16:19 UTC