Stream: ideas

Topic: A parser library?


view this post on Zulip jan kili (Sep 12 2022 at 16:43):

In a recent meeting, @Richard Feldman suggested that Roc might never provide first-party support for regular expressions. My JS-soaked brain was shocked at this, but now I see that FPLs like Elm (& Haskell?) prefer parser libraries like elm-parser. Do we expect to want a roc-parser port of elm-parser? If so, should that be first-party (Roc standard library / builtin) or third-party (one or more decentralized libraries, perhaps JanCVanB/roc-parser)?

view this post on Zulip jan kili (Sep 12 2022 at 16:43):

To be clear, parsers sound lovely and I'm intrigued :)

view this post on Zulip Ayaz Hafiz (Sep 12 2022 at 16:49):

Yeah, libraries for building parsers via parser combinators are very common in FP. That's the approach elm-parser also seems to use, but I'm not terribly familiar with it. I think we'll want a community library around it, but I don't think a general parser library is something that should be provided by the standard library.

view this post on Zulip Brian Carroll (Sep 12 2022 at 16:51):

Oh yes there will be a parser library at some point for sure! For one thing, there's no way to stop anyone writing one! It's all pure functions, no builtin support needed, so no reason for it to be in the standard lib.

view this post on Zulip Ayaz Hafiz (Sep 12 2022 at 16:52):

There are many approaches to parsing. For example, for parsing programming languages, lex/yacc and their derivatives are very popular (in fact, OCaml's parser is entirely generated by a yacc-like tool). They all have their tradeoffs, and performance characteristics can vary drastically depending on what you're parsing, so it's good to have a variety of different options offered by the community.

view this post on Zulip Brian Carroll (Sep 12 2022 at 16:53):

In Roc it's interesting because you can go with parser combinators in the style of elm-parser or Haskell's attoparsec. But you can probably also do something that compiles to a more imperative approach.
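
A minimal sketch of the parser-combinator style mentioned here, written in Python rather than Roc so it can be run as-is. The names `satisfy`, `many1`, `map_parser` and `sequence` are hypothetical and are not taken from elm-parser or attoparsec; the only assumption is that a parser is a plain function from the remaining input to either `(value, rest)` or `None`.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes the remaining input and returns (value, rest) on success,
# or None on failure. Everything below is a pure function.
Parser = Callable[[str], Optional[Tuple[Any, str]]]


def satisfy(pred: Callable[[str], bool]) -> Parser:
    """Parse one character if it satisfies the predicate."""
    def parse(text: str):
        if text and pred(text[0]):
            return text[0], text[1:]
        return None
    return parse


def many1(p: Parser) -> Parser:
    """Apply p one or more times and collect the results in a list.

    Assumes p consumes input whenever it succeeds, otherwise this would loop.
    """
    def parse(text: str):
        values, rest = [], text
        while True:
            result = p(rest)
            if result is None:
                break
            value, rest = result
            values.append(value)
        return (values, rest) if values else None
    return parse


def map_parser(f: Callable[[Any], Any], p: Parser) -> Parser:
    """Transform the value produced by p, leaving the remaining input alone."""
    def parse(text: str):
        result = p(text)
        if result is None:
            return None
        value, rest = result
        return f(value), rest
    return parse


def sequence(*parsers: Parser) -> Parser:
    """Run the parsers one after another; fail if any of them fails."""
    def parse(text: str):
        values, rest = [], text
        for p in parsers:
            result = p(rest)
            if result is None:
                return None
            value, rest = result
            values.append(value)
        return values, rest
    return parse


# Small parsers composed into bigger ones.
digit = satisfy(lambda c: "0" <= c <= "9")
integer = map_parser(lambda ds: int("".join(ds)), many1(digit))
comma = satisfy(lambda c: c == ",")
pair = map_parser(lambda vs: (vs[0], vs[2]), sequence(integer, comma, integer))
```

With these definitions, `pair("3,14!")` yields the tuple `(3, 14)` together with the unconsumed `"!"`, and `pair("3;14")` yields `None`. Since nothing here needs language support beyond first-class functions, the same shape should carry over to a Roc library.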

view this post on Zulip Brian Carroll (Sep 12 2022 at 16:55):

It should be really efficient to "walk" over the bytes and accumulate some state along the way. But I don't know how nice it would be to compose sub-parsers together. Probably you end up with something like a "pull parser", which is meant to be fast.
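
A rough sketch of the "walk over the bytes and accumulate state" idea, again in Python so it can be run directly. Here `functools.reduce` stands in for a fold such as Roc's `List.walk`; the `step` and `count_fields` names and the simplified CSV quoting rule are assumptions made for illustration only.

```python
from functools import reduce
from typing import Tuple

# State carried along the walk: (currently inside quotes?, fields seen so far).
State = Tuple[bool, int]

QUOTE = ord('"')
COMMA = ord(",")


def step(state: State, byte: int) -> State:
    """Fold one byte into the state without mutating anything."""
    in_quotes, fields = state
    if byte == QUOTE:
        # A quote toggles quoted mode (a doubled quote toggles twice).
        return (not in_quotes, fields)
    if byte == COMMA and not in_quotes:
        # An unquoted comma starts a new field.
        return (in_quotes, fields + 1)
    return state


def count_fields(line: bytes) -> int:
    """Count comma-separated fields in one line, ignoring commas inside quotes.

    An empty line counts as a single empty field.
    """
    _, fields = reduce(step, line, (False, 1))
    return fields
```

For example, `count_fields(b'a,"b,c",d')` gives 3. The walk itself is a single pass, but all the parsing logic lives in one `step` function, which is where the composability question above comes from.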

view this post on Zulip J.Teeuwissen (Sep 12 2022 at 17:05):

What about including read and show functions (like in Haskell, which also sees extensive parser combinator usage)? The show function turns a value into a string, which can be extremely helpful while debugging or quickly storing state.

view this post on Zulip Brian Carroll (Sep 12 2022 at 17:09):

Yeah we have some things like that called Encoding and Decoding "abilities".
https://github.com/roc-lang/roc/blob/main/crates/compiler/builtins/roc/Encode.roc
https://github.com/roc-lang/roc/blob/main/crates/compiler/builtins/roc/Decode.roc

view this post on Zulip Brian Carroll (Sep 12 2022 at 17:13):

Mainly focused on things like JSON or CSV for now, though debug printing has been discussed at some point.

view this post on Zulip J.Teeuwissen (Sep 12 2022 at 17:16):

Would e.g. JSON require an implementation for a specific record or would generics (not the <T> kind but these) be used for such tasks?

view this post on Zulip Ayaz Hafiz (Sep 12 2022 at 17:21):

It will require an implementation; Roc does not provide a mechanism for runtime type information the way Haskell's Generic does. However, the compiler will derive an implementation for structural types when they are used for encoding/decoding. For example, the following works:

app "test" imports [Encode, Decode, Json] provides [main] to "./platform"

main =
    when Str.toUtf8 "{\"outer\":{\"inner\":\"a\"},\"other\":{\"one\":\"b\",\"two\":10}}" |> Decode.fromBytes Json.fromUtf8 is
        Ok {outer: {inner: "a"}, other: {one: "b", two: 10u8}} -> "ab10"
        _ -> "something went wrong"

Internally, the compiler creates an implementation for decoding the record being matched in the when branch, which is the implementation used at runtime.

view this post on Zulip Qqwy / Marten (Sep 12 2022 at 18:38):

JanCVanB said:

In a recent meeting, Richard Feldman suggested that Roc might never provide first-party support for regular expressions. My JS-soaked brain was shocked at this, but now I see that FPLs like Elm (& Haskell?) prefer parser libraries like elm-parser. Do we expect to want a roc-parser port of elm-parser? If so, should that be first-party (Roc standard library / builtin) or third-party (one or more decentralized libraries, perhaps JanCVanB/roc-parser)?

(Opt-in) arbitrary look-ahead (which elm-parser calls 'backtracking') and providing context are not unique to elm-parser; they are seen in most Parsec descendants (cf. the try and label functions in the OG Parsec).

The example parser for CSV that I started to write is very much in this style as well and could easily support it :blush:.

view this post on Zulip Qqwy / Marten (Sep 12 2022 at 18:41):

A big disadvantage of PCRE-style regular expressions is that they do not compose. You have a very large chunk of 'special syntax' which cannot be split up into smaller functions. Another is that, except when your language has special support to either evaluate them at compile-time or do dynamic compilation at runtime, such a regex string has to be turned into a parser automaton over and over again each time it is used.
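
A small illustration of both points using Python's standard `re` module. The `INT`, `PAIR` and `parse_pair` names are made up for this sketch. The pattern is compiled once at module load and reused, and the only way to "compose" the two integer sub-patterns is to concatenate strings and then rely on positional group numbers.

```python
import re
from typing import Optional, Tuple

# A reusable fragment is just a string; composing means string concatenation.
INT = r"(\d+)"

# Compiled once here, instead of being re-parsed on every call.
PAIR = re.compile(INT + "," + INT)


def parse_pair(text: str) -> Optional[Tuple[int, int]]:
    """Parse 'N,M' into two ints, or return None if the text does not match."""
    match = PAIR.fullmatch(text)
    if match is None:
        return None
    # Group numbers depend on where each fragment ended up in the final
    # pattern, so adding another capturing group earlier would shift them.
    return int(match.group(1)), int(match.group(2))
```

Here `parse_pair("3,14")` returns `(3, 14)` and `parse_pair("3;14")` returns `None`. Hoisting `re.compile` out of the function is the manual version of the "compile once" support described above.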

view this post on Zulip Qqwy / Marten (Sep 12 2022 at 18:41):

Of course, there are advantages to supporting PCRE as well. The main one I can think of is familiarity.


Last updated: Jun 16 2026 at 16:19 UTC