json builtin · ideas · Zulip Chat Archive

Stream: ideas

Topic: json builtin

Richard Feldman (Oct 02 2024 at 19:44):

so historically I have been resistant to the idea of having json as a builtin serialization format, largely because serialization formats historically come and go, and (for example) Scala's decision to make xml part of the syntax apparently ended up being considered a mistake in hindsight.

Richard Feldman (Oct 02 2024 at 19:44):

however, I've been reconsidering this based on 2 factors:

1. the same objection applies even more strongly to cryptography (cryptographic algorithms get cracked or otherwise become obsolete for performance reasons even more frequently than serialization formats do)
2. there's an old joke, "most of what our servers' CPUs do is serialize and deserialize JSON, and occasionally they do something that isn't that" - and the fastest JSON implementations (based on SIMDjson) require memory unsafety, which roc should never add
   it's one thing to say "well just don't use JSON if you want your servers to go faster" but JSON specifically is so common in third party APIs that many servers have no choice in the matter.

Richard Feldman (Oct 02 2024 at 19:51):

so I wonder about the idea of having some Roc-specific builtin serialization format that can be an obvious default choice for Roc programs that are in charge of their own serialization needs (I would not like the idea of encouraging JSON as a default for Roc programs)

Richard Feldman (Oct 02 2024 at 19:52):

and then having a JSON builtin to acknowledge the reality that if you want to build a modern web server, which is one of Roc's main use cases, you're almost certainly going to be speaking JSON to another server, and it's valuable for that to be as fast as possible

Richard Feldman (Oct 02 2024 at 20:00):

as a note on context, someone was telling me about a project they worked on where they had to get multi-MB json files over the network, and no matter what they did their response latency was terrible because the json parsing took so long.

I realized this is (sadly) a common scenario these days, and it's a shame that Roc would be unhelpful in that situation. "Rewrite it in Rust I guess?" feels really unfortunate when this is a use case we explicitly want to be great at, and in every other way Roc would probably be a great choice.

Richard Feldman (Oct 02 2024 at 20:05):

for example, nodejs now uses SIMDjson, so anyone in that scenario could accurately say "if I switch to Roc my server will be slower than nodejs" just because the JSON parsing is so much slower in Roc, the rest of the language doesn't even register on a flame graph

Brendan Hansknecht (Oct 02 2024 at 20:06):

hmm... This one is at a weird place for me. Cryptography is special mostly because of the security implications. So I don't think it is a fair comparison at all.

What specifically I take some issue with is that Roc really should work to enable performance for these kinds of use cases. I think making json builtin for perf is kinda disappointing. Maybe it can't reach simdjson speeds, but it should be able to reach speeds that can saturate a network card. I feel like making json a builtin is just one of many places where roc will hit perf limits and it is important to figure out how to surpass those within roc instead of by working outside of roc.

At the same time, being practical, it makes a lot of sense (though I would really push for also including a number of faster protocols if we add json just to help push roc users to those as well).

Richard Feldman (Oct 02 2024 at 20:06):

and if this were an esoteric edge case that rarely came up in practice, I don't think it would be worth bringing up, but unfortunately it feels closer to the default than obscure :sweat_smile:

Brendan Hansknecht (Oct 02 2024 at 20:08):

So to me this kinda feels like giving up on roc perf in a major way, but obviously the ceiling in roc will always be lower than the ceiling in c++/rust/zig

Luke Boswell (Oct 02 2024 at 20:08):

I think we should try to make a fast json and optimize that before we seriously consider making it a builtin. We've barely scratched the surface on performance there, and it's a good way to find and tackle some of the issues.

Brendan Hansknecht (Oct 02 2024 at 20:09):

As a concrete example, I think it is really important for us to figure out simd. We should be able to follow in the footsteps of mojo and generate a much better simd story in roc. I think we could unlock a ton of performance within roc via figuring out simd. If we always push simd and perf off to c++/rust/zig, we will never build out a critical piece of the perf story.

Richard Feldman (Oct 02 2024 at 20:10):

Brendan Hansknecht said:

What specifically I take some issue with is that Roc really should work to enable performance for these kinds of use cases. I think making json builtin for perf is kinda disappointing. Maybe it can't reach simdjson speeds, but it should be able to reach speeds that can saturate a network card. I feel like making json a builtin is just one of many places where roc will hit perf limits and it is important to figure out how to surpass those within roc instead of by working outside of roc.

I don't think it's possible to have a memory safe JSON parsing implementation (in any language) that achieves I/O bottleneck perf, but I'd love to be wrong about that!

Luke Boswell (Oct 02 2024 at 20:11):

Why do you think that?

Richard Feldman (Oct 02 2024 at 20:11):

one of the points Daniel Lemire (SIMDjson author) makes repeatedly is that everyone thinks they're I/O bound until they measure, and then it turns out they are not remotely close to I/O bound

Luke Boswell (Oct 02 2024 at 20:11):

Our experiments with simdJson suggested we could use a similar approach

Richard Feldman (Oct 02 2024 at 20:12):

I think we could do a lot better than our own status quo with a SIMDjson-like parser but I don't think it would be I/O bound

Richard Feldman (Oct 02 2024 at 20:13):

Mojo allows memory unsafety; if we did too, then I think it would be totally doable!

Brendan Hansknecht (Oct 02 2024 at 20:13):

Daniel Lemire (SIMDjson author) makes repeatedly is that everyone thinks they're I/O bound until they measure

This is talking about contexts that aren't trivially parallelizable, right? As in, webservers are regularly I/O bound even with slow json parsing

Brendan Hansknecht (Oct 02 2024 at 20:14):

Mojo allows memory unsafety; if we did too, then I think it would be totally doable!

I'm not sure why memory unsafety is so important for this.

Richard Feldman (Oct 02 2024 at 20:14):

I think it's a common misconception that webservers are regularly I/O bound :big_smile:

Brendan Hansknecht (Oct 02 2024 at 20:15):

I bet for many people that is true, but I think it is still pretty common. I have seen a python webserver that saturates a 10GB/s nic

Brendan Hansknecht (Oct 02 2024 at 20:19):

But that was from a co-worker trying to prove a point around python perf. So idk. I need to dig into it more.

Richard Feldman (Oct 02 2024 at 20:19):

maybe an interesting thing to consider might be: "if you start with a fully builtin JSON implementation, is there some way you can take it apart into primitives which are perhaps seemingly oddly specific, but still not quite coupled to JSON"

Richard Feldman (Oct 02 2024 at 20:20):

like for example an API where you specify things like byte delimiters for strings and lists and records etc

Richard Feldman (Oct 02 2024 at 20:21):

so for json you'd say like "ok use double quotes to delimit strings, allow backslashes for escaping..." etc

Richard Feldman (Oct 02 2024 at 20:21):

and then after inlining you basically end up with SIMDjson

Luke Boswell (Oct 02 2024 at 20:24):

So maybe a builtin simd lexer/parser?

Richard Feldman (Oct 02 2024 at 20:24):

yeah something like that

Brendan Hansknecht (Oct 02 2024 at 20:24):

Yeah, that sounds much more like the right way to go if we can do it (assuming we can't offer generic simd overall)

Brendan Hansknecht (Oct 02 2024 at 20:27):

Also, regarding the I/O-bound comment: I think the important note is that 1 cpu is rarely anywhere near I/O bound (with a modern SSD or network), but modern computers will have 100s of cpus all pulling through a single I/O path.

Brendan Hansknecht (Oct 02 2024 at 20:31):

It's only 10MB/s per core for 128 cores to saturate a 10Gbps nic.

For reference, a modern ssd, which can be hard to saturate with a single core, has a sustained rate of about 100MB/s.

Brendan Hansknecht (Oct 02 2024 at 20:31):

So we have a buffer of about 10x

Brendan Hansknecht (Oct 02 2024 at 20:37):

Even RapidJSON, at 4x slower than simdjson in raw single-threaded benchmarks, still parses json at 500MB/s (these numbers come from the simdjson perf benchmarks), which is more than enough perf for this.
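The arithmetic behind these estimates, written out (decimal units, protocol overhead ignored; the 100MB/s and 500MB/s figures are the ones quoted in the messages above, not independent measurements):

```latex
\[
\begin{aligned}
10\ \text{Gb/s} &= \tfrac{10}{8}\ \text{GB/s} = 1250\ \text{MB/s} \\
\frac{1250\ \text{MB/s}}{128\ \text{cores}} &\approx 9.8\ \text{MB/s per core} \\
\frac{100\ \text{MB/s}}{9.8\ \text{MB/s}} &\approx 10\times \qquad
\frac{500\ \text{MB/s}}{9.8\ \text{MB/s}} \approx 51\times
\end{aligned}
\]
```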

Richard Feldman (Oct 02 2024 at 22:42):

hm, interesting!

Richard Feldman (Oct 02 2024 at 22:43):

:thinking: I wonder if we could do some optimizations based on the new Decoding API

Richard Feldman (Oct 02 2024 at 22:44):

like detect certain properties of the ability members and optimize to SIMD tokenization and parsing as long as they don't violate certain properties

Richard Feldman (Oct 02 2024 at 22:53):

for example, one of the beneficial optimizations is that if you're only ever parsing utf8 strings out of it, and you're only ever splitting on bytes in the ASCII range (which we know statically because we know the ranges of number literals in the type system), then we can validate the whole thing for utf8 using SIMD, like SIMDjson does, instead of having to do it on small unaligned chunks wherever the strings happen to be
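A minimal sketch of why this works, in Rust as a stand-in for Roc (the helper name `split_validated` is hypothetical): the input is validated as UTF-8 in one bulk pass, and because bytes below 0x80 never occur inside a multi-byte UTF-8 sequence, splitting on an ASCII delimiter always produces valid UTF-8 pieces that need no re-validation. Rust's `std::str::from_utf8` is not guaranteed to use SIMD; the point here is the single pass over the whole buffer rather than many small ones.

```rust
use std::str::Utf8Error;

/// Hypothetical helper: validate `input` once, then split on an ASCII delimiter.
fn split_validated(input: &[u8], delim: u8) -> Result<Vec<&str>, Utf8Error> {
    // Only ASCII delimiters are safe: bytes < 0x80 never appear inside a
    // multi-byte UTF-8 sequence, so splitting on them cannot cut a code point.
    assert!(delim.is_ascii(), "delimiter must be ASCII");
    // The single bulk validation pass over the whole buffer.
    let text = std::str::from_utf8(input)?;
    // Every piece is a slice of already-validated text; nothing is re-checked.
    Ok(text.split(delim as char).collect())
}

fn main() {
    let parts = split_validated("héllo,wörld,ok".as_bytes(), b',').unwrap();
    println!("{:?}", parts);
}
```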

Richard Feldman (Oct 02 2024 at 23:02):

so for example maybe we could have a new primitive like:

Str.parseUtf8 : List U8, {
    open : U8,
    close : U8,
    escape ? [Supported U8, Unsupported],
} -> Result { parsed : Str, rest : List U8 } [...]
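For a feel of how such a primitive might behave, here is a Rust sketch with the same shape as the proposed signature. Everything in it is hypothetical: `Delims`, `Escape`, `ParseError`, and `parse_utf8` are illustrative names, the error variants are this sketch's own choice (not a guess at the elided `[...]`), and the escape handling only skips the byte after the escape character instead of decoding sequences like `\n`.

```rust
/// Whether the format has an escape byte (JSON would use b'\\').
#[derive(Clone, Copy)]
enum Escape {
    Supported(u8),
    Unsupported,
}

/// Delimiter configuration, mirroring the record in the Roc signature.
struct Delims {
    open: u8,
    close: u8,
    escape: Escape,
}

/// Hypothetical error cases chosen for this sketch only.
#[derive(Debug, PartialEq)]
enum ParseError {
    MissingOpen,
    Unterminated,
    BadUtf8,
}

/// Parse one delimited string from the front of `input`. Returns the raw
/// contents (escapes left undecoded) and the bytes remaining after `close`.
fn parse_utf8<'a>(input: &'a [u8], d: &Delims) -> Result<(&'a str, &'a [u8]), ParseError> {
    if input.first() != Some(&d.open) {
        return Err(ParseError::MissingOpen);
    }
    let mut i = 1;
    while i < input.len() {
        let b = input[i];
        // Skip the byte after an escape so an escaped `close` doesn't end the string.
        if let Escape::Supported(esc) = d.escape {
            if b == esc {
                i += 2;
                continue;
            }
        }
        if b == d.close {
            let body = std::str::from_utf8(&input[1..i]).map_err(|_| ParseError::BadUtf8)?;
            return Ok((body, &input[i + 1..]));
        }
        i += 1;
    }
    Err(ParseError::Unterminated)
}

fn main() {
    let json = Delims { open: b'"', close: b'"', escape: Escape::Supported(b'\\') };
    let plain = Delims { open: b'\'', close: b'\'', escape: Escape::Unsupported };
    let (s, rest) = parse_utf8(br#""a\"b",1"#, &json).unwrap();
    println!("{} / {:?}", s, std::str::from_utf8(rest).unwrap());
    println!("{:?}", parse_utf8(b"'a\\'", &plain));
}
```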

Richard Feldman (Oct 03 2024 at 00:40):

hm, I think one big step that would help is having separate abilities for encoding/decoding bytes vs strings

Richard Feldman (Oct 03 2024 at 00:40):

e.g. a Parse ability for going from strings to values

Richard Feldman (Oct 03 2024 at 00:41):

that way the UTF-8 validation could be baked in as its own step
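One way to picture the split, using Rust's standard `FromStr` trait as a stand-in for a string-level `Parse` ability (the `decode_bytes` function and `DecodeError` type are hypothetical): UTF-8 validation happens exactly once at the bytes-to-`&str` boundary, and string-level parsers below it only ever see text that is already known to be valid.

```rust
use std::str::FromStr;

/// Errors from the hypothetical byte-level entry point.
#[derive(Debug, PartialEq)]
enum DecodeError<E> {
    InvalidUtf8,
    Parse(E),
}

/// Validate once, then hand off to a string-level parser (`FromStr` here
/// plays the role of the proposed `Parse` ability).
fn decode_bytes<T: FromStr>(bytes: &[u8]) -> Result<T, DecodeError<T::Err>> {
    let text = std::str::from_utf8(bytes).map_err(|_| DecodeError::InvalidUtf8)?;
    text.trim().parse::<T>().map_err(DecodeError::Parse)
}

/// A string-level parser never sees raw bytes, so it never re-validates UTF-8.
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

impl FromStr for Point {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let (x, y) = s.split_once(',').ok_or("expected `x,y`")?;
        Ok(Point {
            x: x.trim().parse().map_err(|e| format!("bad x: {e}"))?,
            y: y.trim().parse().map_err(|e| format!("bad y: {e}"))?,
        })
    }
}

fn main() {
    println!("{:?}", decode_bytes::<Point>(b"1,2"));
}
```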

Luke Boswell (Oct 03 2024 at 00:44):

I don't have a good enough background with different types of parsers.. but I feel like there might be a way to build up a data structure which describes the parser for utf-8 segments, and then do some kind of analysis and transformation, that ends up in a form where it's basically simdjson like.

Luke Boswell (Oct 03 2024 at 00:44):

Apologies for how hand-wavy that statement is...

Luke Boswell (Oct 03 2024 at 01:14):

I don't think it's possible to have a memory safe JSON parsing implementation (in any language) that achieves I/O bottleneck perf

Can it be done in rust without using the unsafe features?

Luke Boswell (Oct 03 2024 at 01:15):

https://docs.rs/simd-json/latest/simd_json/#safety

Luke Boswell (Oct 03 2024 at 01:15):

simd-json uses a lot of unsafe code.
There are a few reasons for this:
* SIMD intrinsics are inherently unsafe. These uses of unsafe are inescapable in a library such as simd-json.
* We work around some performance bottlenecks imposed by safe rust. These are avoidable, but at a performance cost. This is a more considered path in simd-json.

Luke Boswell (Oct 03 2024 at 01:17):

https://roc.zulipchat.com/#narrow/stream/304641-ideas/topic/SIMD.20API/near/279933993

Where did we get to with the SIMD for roc ideas?

Luke Boswell (Oct 03 2024 at 01:19):

1. Introduce a new Simd builtin module which has:

* an opaque SIMD vector type called Simd which automatically stores its data with appropriate alignment necessary for SIMD operations

* ways to convert them to/from tuples, for example:
  Simd.new2 : (a, a) -> Simd a [L2]
  Simd.new4 : (a, a, a, a) -> Simd a [L4]

* The [L2] and [L4] types are ordinary closed tag unions, serving as phantom types that represent SIMD vector lane counts of 2 and 4, respectively.

* Since only certain combinations of SIMD operations are supported by the hardware, it makes sense to only expose constructors for those. There's no need to expose a Simd.new17, for example, because no CPU supports that.

* SIMD-powered operations on those types, for example:
  Simd.mulWrap : Simd (Num a), Simd (Num a) -> Simd (Num a)
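A rough Rust rendering of the proposal, for illustration only: the lane count is part of the type (as a const generic here, rather than the `[L2]`/`[L4]` phantom tag unions), the struct is forced to 16-byte alignment, and only 2- and 4-lane constructors exist, so a 17-lane value cannot be built from outside the module. `WrapMul` is a hypothetical stand-in for the `Num a` constraint. The lane-wise loop is ordinary scalar code that LLVM may or may not auto-vectorize; Rust's portable `std::simd` module, which would guarantee vector types, is still unstable.

```rust
/// 16-byte-aligned vector whose lane count N is part of the type.
/// The field is private, so outside this module only `new2`/`new4` can build one.
#[repr(C, align(16))]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Simd<T, const N: usize>([T; N]);

impl<T: Copy> Simd<T, 2> {
    pub fn new2(t: (T, T)) -> Self {
        Simd([t.0, t.1])
    }
}

impl<T: Copy> Simd<T, 4> {
    pub fn new4(t: (T, T, T, T)) -> Self {
        Simd([t.0, t.1, t.2, t.3])
    }
}

impl<T: Copy, const N: usize> Simd<T, N> {
    pub fn to_array(self) -> [T; N] {
        self.0
    }
}

/// Hypothetical stand-in for `Num a`: element types with a wrapping multiply.
pub trait WrapMul: Copy {
    fn wrap_mul(self, other: Self) -> Self;
}
impl WrapMul for u32 {
    fn wrap_mul(self, o: Self) -> Self { self.wrapping_mul(o) }
}
impl WrapMul for i16 {
    fn wrap_mul(self, o: Self) -> Self { self.wrapping_mul(o) }
}

impl<T: WrapMul, const N: usize> Simd<T, N> {
    /// Lane-wise wrapping multiply, analogous to the proposed `Simd.mulWrap`.
    pub fn mul_wrap(self, other: Self) -> Self {
        let mut out = self.0;
        for i in 0..N {
            out[i] = self.0[i].wrap_mul(other.0[i]);
        }
        Simd(out)
    }
}

fn main() {
    let a = Simd::new4((1u32, 2, 3, u32::MAX));
    let b = Simd::new4((5u32, 6, 7, 2));
    println!("{:?}", a.mul_wrap(b).to_array());
    println!("{:?}", Simd::new2((300i16, 200)).mul_wrap(Simd::new2((300i16, 2))));
}
```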

Richard Feldman (Oct 03 2024 at 01:31):

so breaking down what SIMDjson does, it's basically:

1. break down the input into 64B chunks (each one 16B-aligned)
2. for each 64B chunk, use SIMD to very quickly make 64-bit bitmaps of where "interesting characters" are - for example, a 64-bit bitmap where it's all 0s except 1s where the " characters are, or a 64-bit bitmap where it's all 0s except where the { characters are.
3. do a clever fancy thing with the bitmaps to figure out where the strings are, including taking escapes into account
4. translate the remaining bits in the bitmaps into a (heap-allocated) list of tokens (e.g. OpenCurlyBrace, Comma, etc.)
5. traverse the tokens to parse as normal
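Steps 2 and 3 can be sketched in portable Rust, with simplifications: a plain loop stands in for SIMDjson's vector compare-and-movemask, and the "clever fancy thing" is reduced to its core prefix-XOR trick, which turns a bitmap of quote positions into a bitmap of bytes inside strings (SIMDjson computes this with a carry-less multiply). This sketch ignores backslash escapes and strings that cross a 64-byte chunk boundary, both of which the real implementation has to handle, and the helper names are hypothetical.

```rust
/// Scalar stand-in for a SIMD byte-compare + movemask:
/// bit i is set when chunk[i] == needle.
fn byte_bitmap(chunk: &[u8; 64], needle: u8) -> u64 {
    let mut mask = 0u64;
    for (i, &b) in chunk.iter().enumerate() {
        if b == needle {
            mask |= 1u64 << i;
        }
    }
    mask
}

/// Prefix XOR: bit i of the result is the XOR of bits 0..=i of `x`.
/// On a quote bitmap this sets every bit from an opening quote up to
/// (but not including) its closing quote.
fn prefix_xor(mut x: u64) -> u64 {
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    x
}

/// Pad a short input (assumed <= 64 bytes) with spaces to one chunk.
fn chunk_from(s: &[u8]) -> [u8; 64] {
    let mut chunk = [b' '; 64];
    chunk[..s.len()].copy_from_slice(s);
    chunk
}

/// Bitmap of commas that are structural, i.e. not inside a string.
fn structural_commas(chunk: &[u8; 64]) -> u64 {
    let in_string = prefix_xor(byte_bitmap(chunk, b'"'));
    byte_bitmap(chunk, b',') & !in_string
}

fn main() {
    let chunk = chunk_from(br#"{"a":1,"b":"x,y"}"#);
    // Only the comma at index 6 is structural; the one at index 13 is inside "x,y".
    println!("{:#b}", structural_commas(&chunk));
}
```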

Richard Feldman (Oct 03 2024 at 01:32):

this part:

do a clever fancy thing with the bitmaps to figure out where the strings are, including taking escapes into account

I don't think can possibly be done in userspace without memory unsafety

Richard Feldman (Oct 03 2024 at 01:33):

(technically UTF-8 unsafety, which translates into memory unsafety because other things assume Str is valid UTF-8 and if that assumption is violated, it could lead to memory unsafety)

Der Schutz (Oct 05 2024 at 07:27):

Luke Boswell said:

1. Introduce a new Simd builtin module which has:

* an opaque SIMD vector type called Simd which automatically stores its data with appropriate alignment necessary for SIMD operations

* ways to convert them to/from tuples, for example:
  Simd.new2 : (a, a) -> Simd a [L2]
  Simd.new4 : (a, a, a, a) -> Simd a [L4]

* The [L2] and [L4] types are ordinary closed tag unions, serving as phantom types that represent SIMD vector lane counts of 2 and 4, respectively.

* Since only certain combinations of SIMD operations are supported by the hardware, it makes sense to only expose constructors for those. There's no need to expose a Simd.new17, for example, because no CPU supports that.

* SIMD-powered operations on those types, for example:
  Simd.mulWrap : Simd (Num a), Simd (Num a) -> Simd (Num a)

Has anyone thought about using an ISPC/higher-level/SPMD feel for vector code? It'd be nice to not have to write intrinsic-like code, and to have the compiler handle the lane width etc.

Brendan Hansknecht (Oct 05 2024 at 15:57):

Yeah, I think we need it parameterized in a way that programmers still have control but generally don't have to dig under the hood.

I still think mojo is the model here. simd and number types are the same allowing for easy upgrading from one to the other. On top of that, simd algorithms should almost never consider the width even when written explicitly. That said, if you have a reason to use width 4, you can. And the compiler will map that to whatever existing simd hardware as best as possible.
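A small Rust illustration of writing an algorithm once with the width as a parameter, loosely in the spirit of the width-agnostic style described here (it is not Mojo's API): `sum_u32::<W>` keeps `W` independent accumulators, which a compiler can often map onto vector registers, then handles the leftover elements with a scalar tail. The function name and the use of wrapping addition are assumptions of this sketch, and nothing in it forces SIMD code generation.

```rust
/// Sum a slice using W independent lanes (W must be > 0); the caller picks W.
/// Wrapping addition is associative and commutative, so every W gives the same result.
fn sum_u32<const W: usize>(xs: &[u32]) -> u32 {
    let mut lanes = [0u32; W];
    let mut chunks = xs.chunks_exact(W);
    for chunk in &mut chunks {
        for (acc, &x) in lanes.iter_mut().zip(chunk) {
            *acc = acc.wrapping_add(x);
        }
    }
    // Combine the lanes, then handle the leftover elements one at a time.
    let mut total = lanes.iter().fold(0u32, |a, &b| a.wrapping_add(b));
    for &x in chunks.remainder() {
        total = total.wrapping_add(x);
    }
    total
}

fn main() {
    let xs: Vec<u32> = (1..=103).collect();
    println!("{} {}", sum_u32::<4>(&xs), sum_u32::<8>(&xs));
}
```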

Brendan Hansknecht (Oct 05 2024 at 16:01):

At certain points simd requires algorithmic changes (often removing early exits and using select operators instead of if conditionals for example), but you want minimal churn otherwise.

You also 100% want control to be able to explicitly vectorize a function over a range of inputs or simd-execute an element-wise function across a list.
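The "select instead of if" and "no early exit" changes can be sketched in Rust (all function names hypothetical). `contains_early_exit` branches on every byte; `contains_blockwise` checks a whole 16-byte block before testing the result, turning each comparison into a 0/1 value that is OR-ed together; `select_max` replaces a per-element `if` with an all-ones/all-zeros mask blend. These are the shapes that vectorize well, though whether a given compiler actually emits SIMD instructions for them is not guaranteed.

```rust
/// Branchy scalar version: early exit on the first match.
fn contains_early_exit(haystack: &[u8], needle: u8) -> bool {
    for &b in haystack {
        if b == needle {
            return true;
        }
    }
    false
}

/// SIMD-friendly shape: no early exit inside a block; comparisons become 0/1 values.
fn contains_blockwise(haystack: &[u8], needle: u8) -> bool {
    let mut blocks = haystack.chunks_exact(16);
    for block in &mut blocks {
        let mut hit = 0u8;
        for &b in block {
            hit |= (b == needle) as u8; // no branch per byte
        }
        if hit != 0 {
            return true; // one check per 16-byte block instead of per byte
        }
    }
    blocks.remainder().iter().any(|&b| b == needle)
}

/// Select instead of `if`: pick x or y per element with a mask.
fn select_max(a: &[u32], b: &[u32], out: &mut [u32]) {
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        let mask = 0u32.wrapping_sub((x > y) as u32); // all ones if x > y, else 0
        *o = (x & mask) | (y & !mask);
    }
}

fn main() {
    let data: Vec<u8> = (0u8..40).collect();
    println!("{} {}", contains_early_exit(&data, 39), contains_blockwise(&data, 39));
    let mut out = [0u32; 3];
    select_max(&[1, 9, 5], &[4, 2, 5], &mut out);
    println!("{:?}", out);
}
```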

Brendan Hansknecht (Oct 05 2024 at 16:04):

With roc, this will be interesting to map because we don't have any sort of metaprogramming, which I think tools like simd often benefit from significantly.

Der Schutz (Oct 05 2024 at 16:25):

Brendan Hansknecht said:

At certain points simd requires algorithmic changes (often removing early exits and using select operators instead of if conditionals for example), but you want minimal churn otherwise.

You also 100% want control to be able to explicitly vectorize a function over a range of inputs or simd execute an element wise function across a list.

This is what I like ISPC for: being able to use existing constructs (i.e. conditionals) and know they map to their SPMD counterparts. Like using an if and having the compiler handle the execution mask for me, or automatically doing a wide for loop. For me it makes writing SIMD easier and has comparable or better performance than a hand-tuned SIMD program. But it adds a lot more to what the compiler has to do to deliver that experience.

Brendan Hansknecht (Oct 05 2024 at 16:53):

Looking at ISPC code and some example docs, I think mojo and ISPC give very similar tooling. Though mojo definitely exposes more control if wanted. But yeah, same vein of solution. At least in the simplest use cases.

Richard Feldman (Oct 05 2024 at 18:38):

Brendan Hansknecht said:

Yeah, I think we need it parameterized in a way that programmers still have control but generally don't have to dig under the hood.

I still think mojo is the model here. simd and number types are the same allowing for easy upgrading from one to the other. On top of that, simd algorithms should almost never consider the width even when written explicitly. That said, if you have a reason to use width 4, you can. And the compiler will map that to whatever existing simd hardware as best as possible.

Mojo defines overflow as undefined behavior, right? If we did this, we'd either have to do the same (or define it as wrapping) or else polymorphism would surprisingly work differently in terms of overflow for scalar integers (crash on overflow) and vectorized ones (wrap, because there's no way to implement simd crash on overflow without destroying the perf benefits of SIMD)
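The mismatch is easy to see in Rust, which exposes both behaviors explicitly. In this sketch (function names hypothetical), `mul_checked` follows scalar "crash on overflow" semantics by reporting the first overflowing lane, while `mul_wrapping` keeps only the low 32 bits of each product, which is what SIMD integer multiply instructions typically compute. The same inputs produce an error in one and a silent `0` in the other.

```rust
/// Scalar-style semantics: any overflowing lane is an error
/// (where Roc's scalar integers would crash). Returns the failing lane index.
fn mul_checked(a: &[u32], b: &[u32]) -> Result<Vec<u32>, usize> {
    a.iter()
        .zip(b)
        .enumerate()
        .map(|(i, (&x, &y))| x.checked_mul(y).ok_or(i))
        .collect()
}

/// SIMD-style semantics: each lane keeps only the low 32 bits of the product.
fn mul_wrapping(a: &[u32], b: &[u32]) -> Vec<u32> {
    a.iter().zip(b).map(|(&x, &y)| x.wrapping_mul(y)).collect()
}

fn main() {
    let a = [3u32, 0x1_0000, 7];
    let b = [5u32, 0x1_0000, 6];
    println!("{:?}", mul_checked(&a, &b));
    println!("{:?}", mul_wrapping(&a, &b));
}
```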

Eli Dowling (Oct 06 2024 at 06:44):

Something that's also important to consider in this is being able to operate the parser on a data stream.

It's all very well to be able to parse json at 100MB/s but if you need to load the whole thing into memory first to do that, you're leaving a lot of perf on the table.
Plus if you want to consume gigabytes of json, having to put it all in ram at once is just not always feasible.

Streaming decoding and encoding definitely can be a big win for overall server perf :)

Brendan Hansknecht (Oct 06 2024 at 06:48):

Sure, but performant streaming operates in chunks. Once the chunk size is large enough, that is really no different from just loading a gigantic JSON into memory in terms of perf (bar cache misses once your size is too large).

So I would argue they are the exact same problem for simd design. Or almost exactly the same.

Eli Dowling (Oct 17 2024 at 07:25):

Definitely for the simd part. I was more talking about the overall implementation of the parser.

Implementing parsers that work on streaming data often has to work quite differently (as we discovered last time we spoke about this: how do you realize you have no more data, poll for more, and then resume where you left off?).

I just felt it important to note that simd is something that can make parsing much faster that doesn't have a solution in roc, and resumable parsing is also an unsolved problem that could make parsing much faster, but I'm not sure it's possible to do efficiently in roc right now.
:)
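One common way to make a scanner resumable is to keep its mid-token state in a struct and feed it chunks as they arrive. The Rust sketch below is hypothetical and far from a full JSON parser (`CommaCounter` only counts commas outside strings), but it carries the "inside a string" and "after a backslash" flags across chunk boundaries, so splitting the input at any byte gives the same answer as scanning it in one piece.

```rust
/// Resumable scanner: all mid-input state lives in the struct, so `feed`
/// can be called with arbitrary chunks as they arrive.
#[derive(Default)]
struct CommaCounter {
    in_string: bool,
    escaped: bool,
    commas: usize,
}

impl CommaCounter {
    fn feed(&mut self, chunk: &[u8]) {
        for &b in chunk {
            if self.in_string {
                if self.escaped {
                    self.escaped = false; // this byte was escaped; skip it
                } else if b == b'\\' {
                    self.escaped = true;
                } else if b == b'"' {
                    self.in_string = false;
                }
            } else if b == b'"' {
                self.in_string = true;
            } else if b == b',' {
                self.commas += 1;
            }
        }
    }
}

/// Feed `input` in pieces of `chunk_size` bytes (must be > 0).
fn count_in_chunks(input: &[u8], chunk_size: usize) -> usize {
    let mut counter = CommaCounter::default();
    for chunk in input.chunks(chunk_size) {
        counter.feed(chunk); // state carries over between calls
    }
    counter.commas
}

fn main() {
    let input = br#"[1,"a,\"b",{"k":"v,w"},2]"#;
    println!("{}", count_in_chunks(input, 4));
}
```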

Last updated: Jul 23 2026 at 13:15 UTC