so historically I have been resistant to the idea of having json as a builtin serialization format, largely because serialization formats come and go, and (for example) Scala's decision to make xml part of the syntax apparently ended up being considered a mistake in hindsight.
however, I've been reconsidering this based on 2 factors:
so I wonder about the idea of having some Roc-specific builtin serialization format that can be an obvious default choice for Roc programs that are in charge of their own serialization needs (I would not like the idea of encouraging JSON as a default for Roc programs)
and then having a JSON builtin to acknowledge the reality that if you want to build a modern web server, which is one of Roc's main use cases, you're almost certainly going to be speaking JSON to another server, and it's valuable for that to be as fast as possible
as a note on context, someone was telling me about a project they worked on where they had to get multi-MB json files over the network, and no matter what they did their response latency was terrible because the json parsing took so long.
I realized this is (sadly) a common scenario these days, and it's a shame that Roc would be unhelpful in that situation. "Rewrite it in Rust I guess?" feels really unfortunate when this is a use case we explicitly want to be great at, and in every other way Roc would probably be a great choice.
for example, nodejs now uses SIMDjson, so anyone in that scenario could accurately say "if I switch to Roc my server will be slower than nodejs" just because the JSON parsing is so much slower in Roc; the rest of the language doesn't even register on a flame graph
hmm... This one is at a weird place for me. Cryptography is special mostly because of the security implications. So I don't think it is a fair comparison at all.
What specifically I take some issue with is that Roc really should work to enable performance for these kinds of use cases. I think making json builtin for perf is kinda disappointing. Maybe it can't reach simdjson speeds, but it should be able to reach speeds that can saturate a network card. I feel like making json a builtin is just one of many places where roc will hit perf limits, and it is important to figure out how to surpass those within roc instead of by working outside of roc.
At the same time, being practical, it makes a lot of sense (though if we add json, I would really push for also including a number of faster protocols, just to help push roc users toward those as well).
and if this were an esoteric edge case that rarely came up in practice, I don't think it would be worth bringing up, but unfortunately it feels closer to the default than obscure :sweat_smile:
So to me this kinda feels like giving up on roc perf in a major way, but obviously the ceiling in roc will always be lower than the ceiling in c++/rust/zig
I think we should try to make a fast json implementation and optimize that before we seriously consider making it a builtin. We've barely scratched the surface on performance there, and it's a good way to find and tackle the issues.
As a concrete example, I think it is really important for us to figure out simd. We should be able to follow in the footsteps of mojo and generate a much better simd story in roc. I think we could unlock a ton of performance within roc via figuring out simd. If we always push simd and perf off to c++/rust/zig, we will never build out a critical piece of the perf story.
Brendan Hansknecht said:
What specifically I take some issue with is that Roc really should work to enable performance for these kinds of use cases. I think making json builtin for perf is kinda disappointing. Maybe it can't reach simdjson speeds, but it should be able to reach speeds that can saturate a network card. I feel like making json a builtin is just one of many places where roc will hit perf limits, and it is important to figure out how to surpass those within roc instead of by working outside of roc.
I don't think it's possible to have a memory safe JSON parsing implementation (in any language) that achieves I/O bottleneck perf, but I'd love to be wrong about that!
Why do you think that?
one of the points Daniel Lemire (SIMDjson author) makes repeatedly is that everyone thinks they're I/O bound until they measure, and then it turns out they are not remotely close to I/O bound
Our experiments with simdJson suggested we could use a similar approach
I think we could do a lot better than our own status quo with a SIMDjson-like parser but I don't think it would be I/O bound
Mojo allows memory unsafety; if we did too, then I think it would be totally doable!
Daniel Lemire (SIMDjson author) makes repeatedly is that everyone thinks they're I/O bound until they measure
This is talking about contexts that aren't trivially parallelizable, right? As in, webservers are regularly io bound even with slow json parsing
Mojo allows memory unsafety; if we did too, then I think it would be totally doable!
I'm not sure why memory unsafety is so important for this.
I think it's a common misconception that webservers are regularly I/O bound :big_smile:
I bet for many people that is true, but I think it is still pretty common. I have seen a python webserver that saturates a 10Gbps nic
But that was from a co-worker trying to prove a point around python perf. So idk. I need to dig into it more.
maybe an interesting thing to consider might be: "if you start with a fully builtin JSON implementation, is there some way you can take it apart into primitives which are perhaps seemingly oddly specific, but still not quite coupled to JSON"
like for example an API where you specify things like byte delimiters for strings and lists and records etc
so for json you'd say like "ok use double quotes to delimit strings, allow backslashes for escaping..." etc
and then after inlining you basically end up with SIMDjson
So maybe a builtin simd lexer/parser?
yeah something like that
Yeah, that sounds much more like the right way to go if we can do it (and if we can't provide generic simd overall)
Also, regarding the i/o bound comment: I think the important note is that 1 cpu is rarely anywhere near i/o bound (with a modern ssd or network), but modern computers will have 100s of cpus all pulling through a single i/o path.
It's only 10MB/s per core for 128 cores to saturate a 10Gbps nic.
For reference, a modern ssd, which can be hard to saturate with a single core, has a sustained rate of about 100MB/s.
So we have a buffer of about 10x
Even RapidJSON, at 4x slower than simdjson in raw single-threaded benchmarks, still parses json at 500MB/s (these numbers come from the simdjson perf benchmarks), which is more than enough perf for this.
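The per-core arithmetic above can be checked as follows, assuming decimal units (1 GB = 1000 MB, 8 bits per byte) and the 500 MB/s RapidJSON figure quoted from the simdjson benchmarks:

```latex
\frac{10\ \text{Gbit/s}}{8\ \text{bit/B}} = 1.25\ \text{GB/s} = 1250\ \text{MB/s},
\qquad
\frac{1250\ \text{MB/s}}{128\ \text{cores}} \approx 9.8\ \text{MB/s per core},
\qquad
\frac{500\ \text{MB/s}}{9.8\ \text{MB/s}} \approx 51\times
```

So under these assumptions even a parser at a fraction of RapidJSON's single-core speed would keep up with the NIC once the work is spread across all cores.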
hm, interesting!
:thinking: I wonder if we could do some optimizations based on the new Decoding API
like detect certain properties of the ability members and optimize to SIMD tokenization and parsing as long as they don't violate certain properties
for example, one of the beneficial optimizations is that if you're only ever parsing utf8 strings out of it, and also you're only ever splitting on bytes in the ASCII range (which we know statically bc we know the ranges of number literals in the type system) then we can validate the whole thing for utf8 using SIMD, like SIMDjson does, instead of having to do it on small unaligned chunks wherever the strings happen to be
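A minimal Rust sketch of the "validate once" idea, under stated assumptions: the function name `strings_validated_once` is hypothetical, escapes are ignored for brevity, and `std::str::from_utf8` stands in for a SIMD UTF-8 validator. Because every delimiter is an ASCII byte, and ASCII bytes never occur inside a multi-byte UTF-8 sequence, slicing an already-validated buffer at those byte offsets always yields valid strings with no per-string re-validation.

```rust
use std::str;

/// Validate the whole buffer once, then slice out the contents of every "..." string.
/// Since '"' is ASCII, each quote's byte offset is a char boundary in valid UTF-8, so
/// the &str slices below need no further validation and cannot panic.
/// Simplification for the sketch: backslash escapes are ignored.
fn strings_validated_once(input: &[u8]) -> Result<Vec<&str>, str::Utf8Error> {
    let text = str::from_utf8(input)?; // single validation pass over the entire input
    let mut strings = Vec::new();
    let mut open_at: Option<usize> = None;
    for (i, b) in text.bytes().enumerate() {
        if b == b'"' {
            match open_at.take() {
                None => open_at = Some(i + 1),                 // opening quote
                Some(start) => strings.push(&text[start..i]), // closing quote
            }
        }
    }
    Ok(strings)
}

fn main() {
    let json = r#"{"naïve":"café","n":1}"#;
    println!("{:?}", strings_validated_once(json.as_bytes()));
}
```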
so for example maybe we could have a new primitive like:
Str.parseUtf8 : List U8, {
open : U8,
close : U8,
escape ? [Supported U8, Unsupported],
} -> Result { parsed : Str, rest : List U8 } [...]
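To make the proposed primitive concrete, here is a rough Rust analogue of `Str.parseUtf8`. Everything here is an assumption for illustration: the function and error names are hypothetical (the proposal leaves the error tags as `[...]`), `None` plays the role of `Unsupported` escapes, and escape sequences are kept verbatim (the escape byte only stops the following byte from closing the string; decoding things like `\n` is left to the caller). Restricting delimiters to ASCII is what lets a plain byte scan find them safely.

```rust
/// Hypothetical error type; the proposal leaves the error tags unspecified.
#[derive(Debug, PartialEq)]
enum ParseUtf8Err {
    NonAsciiDelimiter,
    MissingOpen,
    Unterminated,
    BadUtf8,
}

/// Sketch of a `Str.parseUtf8`-like primitive: `input` must start with `open`, and the
/// string runs until the first unescaped `close`. Returns the string and the unconsumed rest.
fn parse_utf8(
    input: &[u8],
    open: u8,
    close: u8,
    escape: Option<u8>,
) -> Result<(String, &[u8]), ParseUtf8Err> {
    // ASCII delimiters never occur inside a multi-byte UTF-8 sequence,
    // so scanning raw bytes for them is safe without decoding first.
    let all_ascii = open.is_ascii() && close.is_ascii() && escape.map_or(true, |e| e.is_ascii());
    if !all_ascii {
        return Err(ParseUtf8Err::NonAsciiDelimiter);
    }
    if input.first() != Some(&open) {
        return Err(ParseUtf8Err::MissingOpen);
    }
    let body = &input[1..];
    let mut i = 0;
    while i < body.len() {
        let b = body[i];
        if Some(b) == escape {
            i += 2; // skip the escape byte and the byte it protects
        } else if b == close {
            // Validate only this string's bytes, then hand back what follows the close byte.
            let text = std::str::from_utf8(&body[..i]).map_err(|_| ParseUtf8Err::BadUtf8)?;
            return Ok((text.to_owned(), &body[i + 1..]));
        } else {
            i += 1;
        }
    }
    Err(ParseUtf8Err::Unterminated)
}

fn main() {
    let (s, rest) = parse_utf8(br#""h\"i" rest"#, b'"', b'"', Some(b'\\')).unwrap();
    println!("{:?} {:?}", s, rest);
}
```

With JSON-style arguments this returns the raw string body and the remaining bytes, so a caller can keep parsing from `rest`.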
hm, I think one big step that would help is having separate abilities for encoding/decoding bytes vs strings
e.g. a Parse ability for going from strings to values
that way the UTF-8 validation could be baked in as its own step
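A small Rust sketch of the split being suggested, with hypothetical names throughout: a `ParseStr` trait stands in for a text-level `Parse` ability, and `decode_bytes` is the single byte-level entry point where UTF-8 validation happens once, as its own step. This is not Roc's Decoding API, only an illustration of how separating the two lets every text-level implementation assume valid input.

```rust
/// Hypothetical counterpart of a `Parse` ability: build a value from already-valid text.
trait ParseStr: Sized {
    fn parse_str(s: &str) -> Option<Self>;
}

/// Byte-level entry point. UTF-8 validation happens exactly once, here;
/// every `ParseStr` impl below can then assume its input is valid text.
fn decode_bytes<T: ParseStr>(bytes: &[u8]) -> Option<T> {
    let text = std::str::from_utf8(bytes).ok()?;
    T::parse_str(text)
}

impl ParseStr for u32 {
    fn parse_str(s: &str) -> Option<Self> {
        s.trim().parse().ok()
    }
}

#[derive(Debug, PartialEq)]
struct Point {
    x: u32,
    y: u32,
}

impl ParseStr for Point {
    // Expects "x,y"; the field parsers reuse the already-validated &str.
    fn parse_str(s: &str) -> Option<Self> {
        let (x, y) = s.split_once(',')?;
        Some(Point { x: u32::parse_str(x)?, y: u32::parse_str(y)? })
    }
}

fn main() {
    println!("{:?}", decode_bytes::<Point>(b"3, 4"));
}
```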
I don't have a good enough background with different types of parsers.. but I feel like there might be a way to build up a data structure which describes the parser for utf-8 segments, and then do some kind of analysis and transformation, that ends up in a form where it's basically simdjson like.
Apologies for how hand-wavy that statement is...
I don't think it's possible to have a memory safe JSON parsing implementation (in any language) that achieves I/O bottleneck perf
Can it be done in rust without using the unsafe features?
https://docs.rs/simd-json/latest/simd_json/#safety
simd-json uses a lot of unsafe code.
There are a few reasons for this:
* SIMD intrinsics are inherently unsafe. These uses of unsafe are inescapable in a library such as simd-json.
* We work around some performance bottlenecks imposed by safe rust. These are avoidable, but at a performance cost. This is a more considered path in simd-json.
https://roc.zulipchat.com/#narrow/stream/304641-ideas/topic/SIMD.20API/near/279933993
Where did we get to with the SIMD for roc ideas?
1. Introduce a new Simd builtin module which has:
* an opaque SIMD vector type called Simd which automatically stores its data with appropriate alignment necessary for SIMD operations
* ways to convert them to/from tuples, for example:
Simd.new2 : (a, a) -> Simd a [L2]
Simd.new4 : (a, a, a, a) -> Simd a [L4]
* The [L2] and [L4] types are ordinary closed tag unions, serving as phantom types that represent SIMD vector lane counts of 2 and 4, respectively.
* Since only certain combinations of SIMD operations are supported by the hardware, it makes sense to only expose constructors for those. There's no need to expose a Simd.new17, for example, because no CPU supports that.
* SIMD-powered operations on those types, for example:
Simd.mulWrap : Simd (Num a), Simd (Num a) -> Simd (Num a)
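For comparison, a Rust sketch of the same API shape. Assumptions: Rust has const generics, so the lane count is the parameter `N` where the Roc proposal uses the phantom tags `[L2]`/`[L4]`; `WrapMul` is a hypothetical helper trait; and the lane loop is plain scalar code rather than real SIMD instructions. The private field means the only constructors are `new2` and `new4`, and `mul_wrap` only type-checks when both operands have the same lane count, which is the guarantee the phantom types are meant to provide.

```rust
/// Elements that support a lane-wise wrapping multiply (hypothetical helper trait).
trait WrapMul: Copy {
    fn wrap_mul(self, other: Self) -> Self;
}
impl WrapMul for u32 {
    fn wrap_mul(self, other: Self) -> Self { self.wrapping_mul(other) }
}
impl WrapMul for i32 {
    fn wrap_mul(self, other: Self) -> Self { self.wrapping_mul(other) }
}

/// Opaque N-lane vector, over-aligned to 16 bytes. The field is private, so values
/// can only be built through the width-specific constructors below.
#[repr(C, align(16))]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Simd<T, const N: usize> {
    lanes: [T; N],
}

impl<T> Simd<T, 2> {
    fn new2(t: (T, T)) -> Self {
        Simd { lanes: [t.0, t.1] }
    }
}

impl<T> Simd<T, 4> {
    fn new4(t: (T, T, T, T)) -> Self {
        Simd { lanes: [t.0, t.1, t.2, t.3] }
    }
}

impl<T: WrapMul, const N: usize> Simd<T, N> {
    /// Lane-wise wrapping multiply; both operands must have the same lane count N.
    fn mul_wrap(self, other: Self) -> Self {
        let mut lanes = self.lanes;
        for (a, b) in lanes.iter_mut().zip(other.lanes) {
            *a = (*a).wrap_mul(b);
        }
        Simd { lanes }
    }

    fn to_array(self) -> [T; N] {
        self.lanes
    }
}

fn main() {
    let a = Simd::new4((1u32, 2, 3, u32::MAX));
    let b = Simd::new4((5u32, 6, 7, 2));
    println!("{:?}", a.mul_wrap(b).to_array()); // the last lane wraps on overflow
    println!("{:?}", Simd::new2((-3i32, 4)).mul_wrap(Simd::new2((2, 2))).to_array());
    println!("align = {}", std::mem::align_of::<Simd<u32, 4>>());
}
```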
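so breaking down what SIMDjson does, it's basically:
* use SIMD comparisons to get bitmaps over each block of input, e.g. a 64-bit bitmap where it's all 0s except where the `"`s are, or a 64-bit bitmap where it's all 0s except where the `{`s are
* turn those bitmaps into structural tokens (OpenCurlyBrace, Comma, etc.)

this part: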
do a clever fancy thing with the bitmaps to figure out where the strings are, including taking escapes into account
I don't think can possibly be done in userspace without memory unsafety
(technically UTF-8 unsafety, which translates into memory unsafety because other things assume Str is valid UTF-8 and if that assumption is violated, it could lead to memory unsafety)
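A scalar Rust sketch of the bitmap steps described above, for a single 64-byte block, to show what the "clever fancy thing" computes. Assumptions: all function names are hypothetical; `eq_mask` is a plain loop standing in for one SIMD compare plus movemask; the escape handling adapts the add-with-carry trick simdjson uses to find odd-length backslash runs; `prefix_xor` uses shifts where simdjson uses a carry-less multiplication; and no state is carried between blocks. It only produces bitmaps, so it says nothing about the UTF-8/`Str` safety question.

```rust
/// Scalar stand-in for one SIMD compare + movemask: bit i is set when block[i] == byte.
fn eq_mask(block: &[u8; 64], byte: u8) -> u64 {
    let mut mask = 0u64;
    for (i, &b) in block.iter().enumerate() {
        if b == byte {
            mask |= 1u64 << i;
        }
    }
    mask
}

/// Bit i is set when byte i follows an odd-length run of backslashes (i.e. is escaped).
/// Simplification: assumes no backslash run continues in from a previous block.
fn escaped_mask(backslashes: u64) -> u64 {
    const EVEN: u64 = 0x5555_5555_5555_5555; // bits 0, 2, 4, ...
    const ODD: u64 = !EVEN;
    let run_starts = backslashes & !(backslashes << 1);
    // Adding 1 at a run's start carries through the run and lands just past its end.
    let after_even_start = backslashes.wrapping_add(run_starts & EVEN) & !backslashes;
    let after_odd_start = backslashes.wrapping_add(run_starts & ODD) & !backslashes;
    // A run has odd length exactly when its start and one-past-end differ in parity.
    (after_even_start & ODD) | (after_odd_start & EVEN)
}

/// Bit i becomes the XOR of bits 0..=i: 1 from an opening quote up to (not incl.) its close.
fn prefix_xor(mut x: u64) -> u64 {
    for shift in [1, 2, 4, 8, 16, 32] {
        x ^= x << shift;
    }
    x
}

/// Returns (unescaped quotes, inside-string mask, structural chars outside strings).
fn classify_block(block: &[u8; 64]) -> (u64, u64, u64) {
    let quotes = eq_mask(block, b'"') & !escaped_mask(eq_mask(block, b'\\'));
    let in_string = prefix_xor(quotes);
    let mut structural = 0u64;
    for &c in b"{}[]:," {
        structural |= eq_mask(block, c);
    }
    (quotes, in_string, structural & !in_string)
}

/// Copy `s` into a space-padded 64-byte block (panics if `s` is longer than 64 bytes).
fn pad_block(s: &str) -> [u8; 64] {
    let mut block = [b' '; 64];
    block[..s.len()].copy_from_slice(s.as_bytes());
    block
}

/// Positions of the set bits, for printing.
fn positions(mask: u64) -> Vec<usize> {
    (0..64).filter(|&i| (mask & (1u64 << i)) != 0).collect()
}

fn main() {
    let (quotes, in_string, structural) = classify_block(&pad_block(r#"{"a":"x\"{","b":[1]}"#));
    println!("unescaped quotes: {:?}", positions(quotes));
    println!("inside strings:   {:?}", positions(in_string));
    println!("structural chars: {:?}", positions(structural));
}
```

In the example input, the escaped quote at offset 8 is dropped, and the `{` at offset 9 is correctly excluded from the structural bitmap because it sits inside a string.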
Luke Boswell said:
1. Introduce a new Simd builtin module which has:
* an opaque SIMD vector type called Simd which automatically stores its data with appropriate alignment necessary for SIMD operations
* ways to convert them to/from tuples, for example:
Simd.new2 : (a, a) -> Simd a [L2]
Simd.new4 : (a, a, a, a) -> Simd a [L4]
* The [L2] and [L4] types are ordinary closed tag unions, serving as phantom types that represent SIMD vector lane counts of 2 and 4, respectively.
* Since only certain combinations of SIMD operations are supported by the hardware, it makes sense to only expose constructors for those. There's no need to expose a Simd.new17, for example, because no CPU supports that.
* SIMD-powered operations on those types, for example:
Simd.mulWrap : Simd (Num a), Simd (Num a) -> Simd (Num a)
Has anyone thought about using an ISPC/higher level/SPMD feel for vector code? It'd be nice to not have to write intrinsic-like code or have the compiler handle the lane width etc.
Yeah, I think we need it parameterized in a way that programmers still have control but generally don't have to dig under the hood.
I still think mojo is the model here. simd and number types are the same allowing for easy upgrading from one to the other. On top of that, simd algorithms should almost never consider the width even when written explicitly. That said, if you have a reason to use width 4, you can. And the compiler will map that to whatever existing simd hardware as best as possible.
At certain points simd requires algorithmic changes (often removing early exits and using select operators instead of if conditionals for example), but you want minimal churn otherwise.
You also 100% want control to be able to explicitly vectorize a function over a range of inputs or simd execute an element wise function across a list.
With roc, this will be interesting to map because we don't have any sort of metaprogramming, which I think tools like simd often benefit from significantly.
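To illustrate the "remove early exits" point, a Rust sketch contrasting a scalar search that stops at the first match with a chunked, SIMD-shaped version. Assumptions: the names are hypothetical, `WIDTH = 16` stands in for a vector width, and the inner loop is ordinary scalar code that a compiler may or may not auto-vectorize. The algorithmic change is that every lane in a chunk is compared and packed into a bitmask (a select, not a branch), with a single test per chunk.

```rust
/// Scalar style: stop at the first match.
fn find_first_early_exit(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// SIMD-shaped style: no early exit inside a chunk. Every lane is compared and the
/// results are packed into a bitmask; the only branch is one test per chunk.
fn find_first_chunked(haystack: &[u8], needle: u8) -> Option<usize> {
    const WIDTH: usize = 16; // stand-in for the vector width
    for (chunk_index, chunk) in haystack.chunks(WIDTH).enumerate() {
        let mut mask: u32 = 0;
        for (lane, &b) in chunk.iter().enumerate() {
            mask |= ((b == needle) as u32) << lane;
        }
        if mask != 0 {
            // The lowest set bit is the first match within this chunk.
            return Some(chunk_index * WIDTH + mask.trailing_zeros() as usize);
        }
    }
    None
}

fn main() {
    let mut data = vec![0u8; 40];
    data[33] = 7;
    println!("{:?} {:?}", find_first_early_exit(&data, 7), find_first_chunked(&data, 7));
}
```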
Brendan Hansknecht said:
At certain points simd requires algorithmic changes (often removing early exits and using select operators instead of if conditionals for example), but you want minimal churn otherwise.
You also 100% want control to be able to explicitly vectorize a function over a range of inputs or simd execute an element wise function across a list.
This is what I like ISPC for: being able to use existing constructs (i.e. conditionals) and know it's using their SPMD equivalent. Like using an if and having the compiler handle the execution mask for me, or automatically doing a wide for loop. For me it makes writing SIMD easier and has comparable or better performance than a hand-tuned SIMD program. But it adds a lot more to what the compiler has to do to deliver that experience.
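A rough Rust illustration of what "the compiler handles the execution mask" means for an `if`. This is not ISPC output; it is a hand-written sketch (hypothetical function names, 8 lanes assumed) of the usual lowering: both arms are evaluated for every lane, and a per-lane mask that is all ones where the condition holds selects between them.

```rust
/// What the programmer writes: an ordinary per-element `if`.
fn branchy(xs: [i32; 8]) -> [i32; 8] {
    xs.map(|x| if x > 0 { x.wrapping_mul(2) } else { x.wrapping_neg() })
}

/// Roughly what an SPMD compiler emits: evaluate both arms for every lane,
/// then blend them with an execution mask (all ones where the condition holds).
fn masked(xs: [i32; 8]) -> [i32; 8] {
    let mut out = [0i32; 8];
    for (o, &x) in out.iter_mut().zip(xs.iter()) {
        let mask = -((x > 0) as i32); // -1 (all bits set) when true, 0 when false
        let then_arm = x.wrapping_mul(2);
        let else_arm = x.wrapping_neg();
        *o = (then_arm & mask) | (else_arm & !mask);
    }
    out
}

fn main() {
    let xs = [-3, 0, 5, 7, -1, 2, -8, 4];
    println!("{:?}", branchy(xs));
    println!("{:?}", masked(xs));
}
```

The trade-off is that masked code always pays for both arms, which is why it wins only when the arms are cheap relative to a mispredicted branch per element.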
Looking at ISPC code and some example docs, I think mojo and ISPC give very similar tooling. Though mojo definitely exposes more control if wanted. But yeah, same vein of solution. At least in the simplest use cases.
Brendan Hansknecht said:
Yeah, I think we need it parameterized in a way that programmers still have control but generally don't have to dig under the hood.
I still think mojo is the model here. simd and number types are the same allowing for easy upgrading from one to the other. On top of that, simd algorithms should almost never consider the width even when written explicitly. That said, if you have a reason to use width 4, you can. And the compiler will map that to whatever existing simd hardware as best as possible.
Mojo defines overflow as undefined behavior, right? If we did this, we'd either have to do the same (or define it as wrapping) or else polymorphism would surprisingly work differently in terms of overflow for scalar integers (crash on overflow) and vectorized ones (wrap, because there's no way to implement simd crash on overflow without destroying the perf benefits of SIMD)
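A small Rust sketch of the semantic mismatch described above. Assumptions: the function names are hypothetical, Rust's `checked_mul` returning `None` stands in for Roc crashing on overflow, and `u8` keeps the numbers small. The same multiplication is an error as a scalar but silently wraps lane-wise; keeping crash semantics for vectors means adding an overflow test and an "any lane overflowed?" reduction plus a branch to every operation.

```rust
/// Scalar semantics described for Roc integers: overflow is an error.
/// (Roc would crash; this sketch returns None instead.)
fn scalar_mul(a: u8, b: u8) -> Option<u8> {
    a.checked_mul(b)
}

/// Lane-wise wrapping multiply, i.e. the semantics of a `mulWrap`-style SIMD op.
fn simd_mul_wrap(a: [u8; 4], b: [u8; 4]) -> [u8; 4] {
    let mut out = [0u8; 4];
    for i in 0..4 {
        out[i] = a[i].wrapping_mul(b[i]);
    }
    out
}

/// Crash-on-overflow for vectors: every multiply also needs a per-lane overflow test
/// reduced to a single flag, then a branch. (Written with iterators here; vector code
/// would do a lane-wise compare followed by a horizontal reduction.)
fn simd_mul_checked(a: [u8; 4], b: [u8; 4]) -> Option<[u8; 4]> {
    let any_overflow = a.iter().zip(b.iter()).any(|(x, y)| x.checked_mul(*y).is_none());
    if any_overflow { None } else { Some(simd_mul_wrap(a, b)) }
}

fn main() {
    println!("{:?}", scalar_mul(200, 2));
    println!("{:?}", simd_mul_wrap([200, 100, 3, 255], [2; 4]));
    println!("{:?}", simd_mul_checked([200, 100, 3, 255], [2; 4]));
}
```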
Something that's also important to consider in this is being able to operate the parser on a data stream.
It's all very well to be able to parse json at 100MB/s but if you need to load the whole thing into memory first to do that, you're leaving a lot of perf on the table.
Plus if you want to consume gigabytes of json, having to put it all in ram at once is just not always feasible.
Streaming decoding and encoding definitely can be a big win for overall server perf :)
Sure, but performant streaming operates in chunks. Once the chunk size is large enough, that is really no different from just loading a gigantic JSON into memory in terms of perf (bar cache misses once your size is too large).
So I would argue they are the exact same problem for simd design. Or almost exactly the same.
Definitely for the simd part. I was more talking about the overall implementation of the parser.
Implementing parsers that work on streaming data often has to work quite differently (as we discovered last time we spoke about this: how do you realize you have no more data, poll for more, and then resume where you left off?).
I just felt it important to note that simd is something that can make parsing much faster but doesn't have a solution in roc, and resumable parsing is also an unsolved problem that could make parsing much faster, but I'm not sure it's possible to do efficiently in roc right now.
:)
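One common shape for resumable parsing is a scanner whose entire "where was I" state lives in a struct, so it never needs to detect end-of-input itself: the caller feeds whatever chunk arrived and comes back with the next one. The Rust sketch below shows that shape under stated assumptions: `Scanner` and `feed` are hypothetical names, it tracks only string/escape state and bracket depth (the same kind of state a SIMD parser carries from one block to the next), and it does not validate JSON or handle tokens such as numbers that can be split across chunks.

```rust
/// Resumable scanner: all state needed to continue lives in the struct, so input can
/// arrive in arbitrary chunks (a string or an escape may be split across chunks).
/// It only tracks string/escape state and bracket depth; it does not validate JSON.
#[derive(Default)]
struct Scanner {
    in_string: bool,
    after_escape: bool,
    depth: usize,
    offset: usize,         // total bytes consumed so far
    completed: Vec<usize>, // offsets just past each finished top-level {...} or [...]
}

impl Scanner {
    fn feed(&mut self, chunk: &[u8]) {
        for &b in chunk {
            if self.in_string {
                if self.after_escape {
                    self.after_escape = false;
                } else if b == b'\\' {
                    self.after_escape = true;
                } else if b == b'"' {
                    self.in_string = false;
                }
            } else {
                match b {
                    b'"' => self.in_string = true,
                    b'{' | b'[' => self.depth += 1,
                    b'}' | b']' if self.depth > 0 => {
                        self.depth -= 1;
                        if self.depth == 0 {
                            self.completed.push(self.offset + 1);
                        }
                    }
                    _ => {}
                }
            }
            self.offset += 1;
        }
    }
}

fn main() {
    let input = br#"{"a":"}\""}[1,2]"#;
    let mut scanner = Scanner::default();
    for chunk in input.chunks(3) {
        scanner.feed(chunk); // e.g. data arriving from a socket in small pieces
    }
    println!("{:?}", scanner.completed);
}
```

The result is the same whatever the chunk boundaries are, which is the property a streaming decoder needs before anything fancier (like SIMD per chunk) can be layered on top.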
Last updated: Jun 16 2026 at 16:19 UTC