This is an attempt to describe a problem I think I have with a design for a fast JSON decoder. I am a fair way off being blocked on this, as I haven't implemented the stage-1 pre-processing code and I don't know exactly what that will look like. However, I'm trying to scope out how that stage might integrate with the decode ability, as there are things from simdjson etc. that don't translate directly into the Roc context.
Imagine I want to decode JSON into the following Roc type:

```roc
Person : {
    name : Str,
    contacts : {
        email : Str,
        phone : Str,
    },
}
```
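For concreteness, a matching JSON document could look like this (the values here are just illustrative):

```json
{
    "name": "Alice",
    "contacts": {
        "email": "alice@example.com",
        "phone": "555-0100"
    }
}
```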
So the user passes a `List U8` of bytes into `Decode.fromBytes` etc., and Roc will call the `decodeRecord` function in the JSON implementation, which is currently implemented using `Decode.custom`, something like the following:
```roc
decodeRecord = \initialState, stepField, finalizer -> Decode.custom \bytes, @Json { ... decoder state } ->
    # Recursively build up record from object field:value pairs
    decodeFields = \recordState, bytesBeforeField -> ...
```
When this implementation gets to the `contacts` field, it will retrieve a decoder and call `Decode.decodeWith`, passing in the sublist of `List U8` bytes for the `contacts` field. In this case the decoder will be `decodeRecord` again, because this field is also an object.
The idea I currently have for implementing a fast JSON decoder is to have a preprocessing step to identify the document structure and then use that information to slice into the original input bytes.
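One shape that pre-processed structure could take (purely a sketch; `Slice`, `Structure`, and the tag names are all hypothetical) is a tree of offset/length slices into the original input bytes:

```roc
# Hypothetical stage-1 output: instead of copying bytes, record where each
# value lives in the input so stage 2 can slice it out on demand.
Slice : { start : U64, len : U64 }

Structure : [
    JsonString Slice,
    JsonNumber Slice,
    JsonBool Bool,
    JsonNull,
    JsonObject (List { field : Slice, value : Structure }),
    JsonArray (List Structure),
]
```

Stage 2 would then walk this tree alongside the decoders rather than re-scanning the bytes for delimiters.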
One problem with this idea is that `Decode.custom` is provided a `List U8` of bytes, and this is the only information we have to work with. So if we preprocessed the input in an earlier stage (function call), we don't have that information available.
One idea might be to preprocess the JSON document and store the original input bytes and field indexes in the decoder state, e.g. `@Json { inputBytes : List U8, fieldSlices : ... }`, then maybe have some special sequence of bytes that flags that the preprocessed information should be used to get the bytes we want to process and proceed with decoding. Or maybe this special sequence includes the information required to slice into the original input bytes.
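As a sketch of that idea (the field names here are made up, and `Structure` stands for whatever hypothetical type the stage-1 pass produces), the opaque decoder state could carry both the original bytes and the pre-processed structure:

```roc
# Hypothetical decoder state: the original input plus the stage-1 result.
# `structure` points at the part of the document the current decoder is
# responsible for, so it can be narrowed as we descend into sub-objects.
Json := {
    inputBytes : List U8,
    structure : [Preprocessed Structure, NotPreprocessed],
}
```

Then `decodeRecord` could check `structure` first and fall back to the existing byte-by-byte path when it is `NotPreprocessed`.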
I'm not sure if this is a good problem description... I am likely missing something obvious and feel like we can probably do what we need with the current implementation.
It may also not be that important to solve this, I probably should use benchmarks to test some ideas. If the preprocess stage is fast enough it may not be that bad to run it each time we decode a new object/record and still use the current recursive descent strategy.
What are the limitations of storing the offset information in the decoder state? That is where my head was at. I think I do not totally follow what the downsides of that approach are.
I've had a bit of a breakthrough and made some progress putting things into the decoder state. :octopus:
Unfortunately I've hit a compiler bug with `roc check` :cry:
```
% roc check package/Core.roc
An internal compiler expectation was broken.
This is definitely a compiler bug.
Please file an issue here: https://github.com/roc-lang/roc/issues/new/choose
thread '<unnamed>' panicked at 'ambient lambda set function import is not a function, found: Error', crates/compiler/solve/src/module.rs:182:36
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
I've uploaded the relevant code to this gist, and you can see on line 2070 where I have isolated the issue.
I've tried a bunch of different things, re-structuring the code with functions and type annotations etc, but I can't seem to get it to type check.
@Ayaz Hafiz could this be the first application of Checkmate? :smiley:
not the first sadly, but definitely a perfect candidate for checkmate! :chess:
Do you have a minimal reproducer Luke?
Sorry for derailing, but what's checkmate?
A tool that Ayaz made to debug the type solver (checker + inference + specialization engine); you can find it here.
I think Ayaz also has it available live (hosted) somewhere.
I can try minimising; it's the `Decode.decodeWith` part that causes it, so I think I have to keep all the other unrelated decode ability functions around.
I tried implementing the function using `List.walkUntil` instead of recursion, but I still get the same issue.
```roc
decodeRecordPreProcessed = \stepField, finalizer, initialState, @Json ds ->
    when ds.structure is
        JsonObject fields -> decodeRecordPreProcessedHelp stepField finalizer (@Json ds) initialState fields
        _ -> crash "unreachable, pre-processed string index"

# Check each field/value pair of the object and decode if it is required
decodeRecordPreProcessedHelp = \stepField, finalizer, @Json ds, initialState, recordFieldValues ->
    help = \recordState, recordFieldValue ->
        result =
            # Decode the field name
            fieldNameStr <- decodeObjectFieldName recordFieldValue.field (@Json ds) |> Result.map

            # Retrieve value decoder for the current field
            when stepField recordState fieldNameStr is
                # Skip the field and value, leave record state unchanged
                Skip ->
                    recordState

                # Decode the value using the decoder from the recordState
                Keep valueDecoder ->
                    # UNCOMMENT TO 'STOP COMPILER BUG'
                    # { result: Err TooShort, rest: [] }
                    # COMMENTING OUT BELOW TO 'STOP COMPILER BUG'
                    Decode.decodeWith [] valueDecoder (objectFieldValueDecoder (@Json { ds & structure: recordFieldValue.value }))

        when result is
            Err _ ->
                # Return early, failed to decode the field
                Break recordState

            Ok updatedRecordState ->
                # Decode the next field, passing updated recordState
                Continue updatedRecordState

    finalRecordState = List.walkUntil recordFieldValues initialState help

    # Build final record
    when finalizer finalRecordState is
        Ok record -> { result: Ok record, rest: [] }
        Err _ -> { result: Err TooShort, rest: [] }
```
Thanks for the update Luke. I'll take a look tomorrow morning (central US time), but I suspect a minimal reproducer will still be necessary.
Last updated: Jul 05 2025 at 12:14 UTC