Writing Parsers with Roc · beginners

I've started learning how to write parsers using Roc from the examples for my AoC puzzles and have found it quite enjoyable.

I began by trying to parse a single character, for example the EOL character or a comma:

eolParser : Parser (List U8) {}
eolParser =
    buildPrimitiveParser (\input ->
        first = List.first input
        if first == Ok '\n' then
            Ok { val : {}, input : List.dropFirst input }
        else
            Err (ParsingFailure "Not an EOL")
    )

I then started playing around with different combinations of const, apply, and map to build up more complex parsers. I found it was pretty easy to work my way through and rapidly iterate on ideas. It was also easy to write small unit tests (examples below) and use roc check and roc test to build up more complex functionality.

expect
    input = Str.toUtf8 "\n"
    parse eolParser input List.isEmpty == Ok {}

expect
    input = Str.toUtf8 ",,,20,03\n"
    parser =
        const (\_ -> \a -> \_ -> \b -> \_ -> a*b)
        |> apply (oneOrMore commaParser)
        |> apply numberParser
        |> apply commaParser
        |> apply numberParser
        |> apply eolParser
    parse parser input List.isEmpty == Ok 60
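The tests above only exercise const and apply. A minimal sketch of the map side, assuming the library's map has roughly the shape Parser input a, (a -> b) -> Parser input b and that numberParser is the same number parser used above (doubledParser is a hypothetical name, for illustration only):

```roc
# Hypothetical example: transform the value produced by numberParser.
# Assumes map : Parser input a, (a -> b) -> Parser input b.
doubledParser =
    numberParser
    |> map (\n -> n * 2)

expect
    input = Str.toUtf8 "21"
    # Assumes numberParser turns the bytes of "21" into the number 21.
    parse doubledParser input List.isEmpty == Ok 42
```

Unlike apply, map never consumes extra input; it only changes the value that comes out of a parser that already succeeded.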

Still a long way to go, but I thought I would share my experience so far for anyone who may be interested.

Luke Boswell (Nov 30 2022 at 09:32):

Also, I'm pretty keen to have functionality like check and test in the roc editor, as I think that would really speed up and improve the experience. I imagine we can get it to the point where, after we write a test, it can run it in the background and indicate if it passes or fails in the UI. :grinning:

Artur Swiderski (Nov 30 2022 at 20:41):

Luke Boswell (Nov 30 2022 at 20:56):

Artur Swiderski (Nov 30 2022 at 23:02):

Luke Boswell (Nov 30 2022 at 23:21):

Ayaz Hafiz (Dec 01 2022 at 03:03):

Chris Duncan (Dec 03 2022 at 22:13):

Heyo. :wave:🏻 I participated in the previous Advent of Code and I want to participate in this month's. This Parser library looks great. Is it available as a Module that I can import into my project? The first step in every AoC puzzle is parsing the input, so this library would be great to use.

Brian Carroll (Dec 03 2022 at 23:08):

We're still working on the module import system! For now I think you might have to build your app in the parser platform directory

Brendan Hansknecht (Dec 03 2022 at 23:09):

I think the parser is a standalone interface, so you should be able to copy the file to your app directory and then import it directly.

Luke Boswell (Dec 04 2022 at 00:46):

Just copy and paste the Parser.roc file next to your app and then import the file directly. Here is an example of how I am doing it. Works really well. In future I'm sure there will probably be different parser packages which you can import from a URL, but for now I've been doing this. :grinning:

FWIW I'm finding AoC really helpful to level up my parser knowledge. It's one thing to read about them, but I have found they are so much simpler to create and use than I imagined. They're almost too simple. I think Roc makes it so easy that people will use a parser for everything.

Andy Kluger (Dec 04 2022 at 23:57):

Luke Boswell (Dec 05 2022 at 01:09):

Andy Kluger (Dec 05 2022 at 01:30):

Luke Boswell (Dec 06 2022 at 01:57):

For anyone interested I have figured out how to write a skip function for parsing. I renamed apply to keep and re-wrote it without the backpassing so I could follow along with the logic a bit more easily. I've copied the functions I am using below.

keep : Parser input (a -> b), Parser input a -> Parser input b
keep = \funParser, valParser ->
    buildPrimitiveParser \input ->
        when parsePartial funParser input is
            Err msg -> Err msg
            Ok { val: funVal, input: rest } ->
                when parsePartial valParser rest is
                    Err msg2 -> Err msg2
                    Ok { val: val, input: rest2 } ->
                        Ok { val: funVal val, input: rest2 }

skip : Parser input a, Parser input * -> Parser input a
skip = \funParser, skipParser ->
    buildPrimitiveParser \input ->
        when parsePartial funParser input is
            Err msg -> Err msg
            Ok { val: funVal, input: rest } ->
                when parsePartial skipParser rest is
                    Err msg2 -> Err msg2
                    Ok { val: _, input: rest2 } -> Ok { val: funVal, input: rest2 }

Using these I can rewrite my parser combinators as follows, which removes the unnecessary \_ -> curried functions.

assignmentPairParser : Parser (List U8) AssignmentPair
assignmentPairParser =
    const
        (\a -> \b -> \c -> \d -> {
            startElfA: a,
            endElfA: b,
            startElfB: c,
            endElfB: d,
        })
    |> skip (many (codepoint '\n'))
    |> keep numberParser
    |> skip (codepoint '-')
    |> keep numberParser
    |> skip (codepoint ',')
    |> keep numberParser
    |> skip (codepoint '-')
    |> keep numberParser

Luke Boswell (Dec 11 2022 at 08:20):

An update on my progress trying to implement a markdown parser in pure Roc. I'm aiming for the Cmark spec, though I am just focussed on getting the bare bones together and prioritising the basic elements like paragraphs and headings. I would love to get this functional enough for the Roc tutorial and other website content, though there are currently a few things blocking progress.

I've updated the parser based on what I've learnt so far. I would appreciate any feedback on this. I'm sure there is a lot of room for improvement. I know the parsers are pretty inefficient with a lot of allocations, based on discussion with Brendan from his AoC analysis, but I'm hopeful they will play nicely with seamless slices when they land.

I've added the following combinators to the Core.roc module specifically for the use case of eating through a line that is whitespace. These feel pretty resource heavy; is there a better way to do this? Might this be something that Roc could eventually automagically parallelise under the hood?

# Parser/Core.roc
eatWhile : Parser (List a) (List a) -> Parser (List a) (List a)
eatWhileNot : Parser (List a) (List a) -> Parser (List a) (List a)

Running roc test on spec-tests.roc currently crashes with the error below. This is due to the test that follows it. I've not figured out what is causing this. Note there is another failing test, which is due to Issue #4732, related to the ordering of list pattern matching in a when statement.

thread 'main' panicked at 'internal error: entered unreachable code: Something had a Struct layout, but instead of a Record type, it had: Structure(TagUnion(UnionLabels { length: 5, labels_start: 191, values_start: 1008, _marker: PhantomData }, 3678))', /Users/luke/Documents/GitHub/roc/crates/repl_eval/src/eval.rs:899:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

expect
    input = Str.toUtf8 "abc\n"
    got = parsePartial paragraphParser input
    got == Ok {val : Paragraph input, input : ['\n']}

I've been experimenting with different options trying to find an ergonomic and hopefully efficient parser pattern. I've currently settled on the following which feels nice, and I think should handle Utf8 bytes nicely.

lineEnding : Parser (List U8) (List U8)
lineEnding =
    input <- buildPrimitiveParser

    when input is
        [10, ..] -> Ok {val : utf8 LF, input : List.drop input 1}
        [13,10, ..] -> Ok {val : utf8 CRLF, input : List.drop input 2}
        ...

However, everything explodes when you run roc dev spec-tests.roc at the moment. It gives the following error. I need to investigate this further, but I think it is related to the paragraphParser use of lineEnding with eatWhileNot.

thread 'main' panicked at 'There was no entry for `23.IdentId(34)` in scope Scope { symbols: {`23.IdentId(35)`: (Buil
----
... another 200 lines
----
No predecessors!\n  %joinpointarg = phi { [0 x i64], [48 x i8], i8, [7 x i8] }* , !dbg !471\n", llvm_type: "label" }, [PhiValue { phi_value: Value { name: "joinpointarg", address: 0x600003b99598, is_const: false, is_null: false, is_undef: false, llvm_value: "  %joinpointarg = phi { [0 x i64], [48 x i8], i8, [7 x i8] }* , !dbg !471", llvm_type: "{ [0 x i64], [48 x i8], i8, [7 x i8] }*" } }])} }', crates/compiler/gen_llvm/src/llvm/build.rs:2746:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Stream: beginners

Topic: Writing Parsers with Roc

Luke Boswell (Nov 30 2022 at 09:28):