Stream: ideas

Topic: Bidirectional parsing/prettyprinting


view this post on Zulip Qqwy / Marten (Oct 03 2022 at 15:30):

I recently fell into a rabbit hole of how to do bidirectional parsing.

The core idea is that you define one 'Codec' (AKA 'Grammar' or 'Schema') which is the bidirectional translation to-and-from your datastructure and the external format.
That is, rather than have a separate definition of e.g. "how to go from JSON to Post" and "how to go from Post to JSON", you only specify the structure once.

view this post on Zulip Qqwy / Marten (Oct 03 2022 at 15:33):

The basic datastructure to be able to do this is known as a BiMap:

BiMap e a b : { forward : (a -> Result b e), backward : (b -> Result a e)

So you can transform data between an a and a b, but either transformation might fail.

view this post on Zulip Qqwy / Marten (Oct 03 2022 at 15:36):

Example: If you'd want to parse a float from a JSON AST, you'd have for instance a BiMap e JSON F64, where forward would complain if the JSON does not contain a float, and backward would complain if the floating point value is NaN or Infinity which is not allowed inside JSON.

And then there is a higher-abstraction datatype commonly called 'Codec', 'Schema' or 'Grammar' in which you can combine multiple of these bimaps to use it with your custom datatypes. (i.e. match key-values in the AST with fields in your datatypes), usually written as something along the lines of

Codec r w a : { read : r a, write : a -> w a}

(invalid Roc as-is because it uses higher kindred datatypes. But concrete Codec types can be made by using concrete examples for 'r' and 'w', datatypes containing the appropriate environment to parse an a from resp. print an a into.)

view this post on Zulip Ayaz Hafiz (Oct 03 2022 at 15:41):

not sure if this is part of what you read, but "A Monadic Framework for Bidirectional Programming" is the title I know best wrt to this: https://poisson.chat/mfbp/poster.pdf

view this post on Zulip Ayaz Hafiz (Oct 03 2022 at 15:42):

I could have sworn there was a poster at sigplan about this as well last year but I can't find it now

view this post on Zulip Ayaz Hafiz (Oct 03 2022 at 15:42):

swift-parsing seems to implement this

view this post on Zulip Qqwy / Marten (Oct 03 2022 at 15:43):

Another good example is tomland which is a bidirectional TOML parser-prettyprinter library written in Haskell. (link goes to the blog post explaining its internals).

view this post on Zulip lue (Oct 03 2022 at 16:10):

I'm writing one called elm-morph if you want a reference (WIP).
It's type is

BiMap e a b : { forward : (a -> Result b e), backward : (b -> a) }

instead of

BiMap e a b : { forward : (a -> Result b e), backward : (b -> Result a e) }

which is ofc less powerful but still a good fit for most cases

view this post on Zulip Qqwy / Marten (Oct 03 2022 at 16:10):

Very cool!

view this post on Zulip Qqwy / Marten (Oct 03 2022 at 16:17):

lue said:

I'm writing one called elm-morph if you want a reference (WIP).
It's type is

BiMap e a b : { forward : (a -> Result b e), backward : (b -> a) }

instead of

BiMap e a b : { forward : (a -> Result b e), backward : (b -> Result a e) }

:+1:
As long as your concrete datatype is fully representable in the serialized format then the backward function indeed does not need to return a Result.

Do you handle examples like Infinity/NaN for floats that cannot be written down in JSON some other way?

view this post on Zulip Qqwy / Marten (Oct 03 2022 at 16:18):

(Another example: Integers larger than 2^53 are also not portably representable in JSON)

view this post on Zulip Qqwy / Marten (Oct 03 2022 at 16:19):

Your library looks great by the way. There already is so much well-documented code there 🤯

view this post on Zulip lue (Oct 03 2022 at 16:32):

I let elm/json handle all the ugliness ;). Which means even I don't know what elm encodes NaN etc. to

That's pretty meh, I agree

view this post on Zulip Qqwy / Marten (Oct 03 2022 at 16:33):

The cherry on top of these bidirectional definitions by the way, is that you can also read the codec structures directly and create other schema formats from them (e.g. a "Json Schema", a TreeSitter schema, etc.) that could be used when describing an API or to use in external tools (like your IDE) so they can warn for common mistakes immediately. (Like: "This field name is not known" or "the hostname field in your configuration file is only valid if it contains a string that can be parsed as an URL")

view this post on Zulip Qqwy / Marten (Oct 03 2022 at 16:34):

lue said:

I let elm/json handle all the ugliness ;). Which means even I don't know what elm encodes NaN etc. to

That's pretty meh, I agree

Now you've made me curious. Time to see how Elm/json does it under the hood :happy:

view this post on Zulip Qqwy / Marten (Oct 03 2022 at 16:35):

Ah. They are silently turned into null :sad: https://github.com/elm/json/blob/0206c00884af953f2cba8823fee111ee71a0330e/src/Json/Encode.elm#L106.
To be honest, I consider this a 🦶 :gun:

view this post on Zulip Qqwy / Marten (Oct 03 2022 at 16:46):

At least it matches JSON.stringify in plain JS, so I guess it makes sense :shrug:

view this post on Zulip lue (Oct 03 2022 at 16:52):

Thanks for investigating :smile:

I agree it's pretty bad behavior.

view this post on Zulip Qqwy / Marten (Oct 03 2022 at 17:45):

By the way: This same pattern can be used to talk about how a datastructure might map to a record in a relational database table!


Last updated: Jun 16 2026 at 16:19 UTC