Stream: beginners

Topic: Roc "multipart/form-data" parser


view this post on Zulip Adrian (Apr 09 2024 at 10:06):

I'm having trouble, parsing multipart/form-data, since the builtin List lacks a splitSequence function.

view this post on Zulip Adrian (Apr 09 2024 at 10:07):

Is there a known parser for this format? Should I just bite the bullet and implement it with roc-parser, or am I misusing roc's Lists

view this post on Zulip Adrian (Apr 09 2024 at 10:17):

Should I write the zig primitive for the standard library?

view this post on Zulip Luke Boswell (Apr 09 2024 at 10:56):

I don't know if anyone has looked at parsing that before.

view this post on Zulip Luke Boswell (Apr 09 2024 at 10:58):

https://roc-lang.github.io/basic-webserver/Http/#parseFormUrlEncoded

view this post on Zulip Luke Boswell (Apr 09 2024 at 10:59):

I added a parser for FormUrlEncoded data, I'm not sure if that helps.

view this post on Zulip Luke Boswell (Apr 09 2024 at 11:02):

I think it would be great to have a parser written in Roc :grinning:

view this post on Zulip Adrian (Apr 09 2024 at 11:23):

Luke Boswell said:

https://roc-lang.github.io/basic-webserver/Http/#parseFormUrlEncoded

This is great! Sadly it doesn't solve my issue, since i'm parsing the format:

-----------------------------306137057221634204471155786395
Content-Disposition: form-data; name="text"

abc
-----------------------------306137057221634204471155786395
Content-Disposition: form-data; name="file"; filename=""
Content-Type: application/octet-stream


-----------------------------306137057221634204471155786395--

view this post on Zulip Adrian (Apr 09 2024 at 11:25):

Luke Boswell said:

I think it would be great to have a parser written in Roc :grinning:

I'm a bit worried about the performance :face_with_spiral_eyes:.
I'll try the roc only approach and if it doesn't work out I'll probably send a pr to either roc-builtins or roc-webserver.

view this post on Zulip Adrian (Apr 09 2024 at 12:18):

Here's my attempt, I don't think it's too performant

listSplit : List a, List a -> List (List a) where a implements Eq
listSplit = \inp, sep ->
    if List.len sep == 0 then
        [inp]
    else
        inp
        |> List.walk
            { out: [], buf: [], consume: [] }
            \{out, buf, consume}, elem ->
                if elem == List.get sep (List.len consume) |> unwrap then
                    { consume: consume |> List.append elem, out, buf} |> tryToConsume sep
                else
                    {consume:[], out, buf: buf |> List.concat consume|> List.append elem}
        |> \{out,buf,consume} -> out|>List.append (List.concat buf consume)

tryToConsume = \{out,buf,consume},sep->
            if consume == sep then
                {out: out|> List.append buf, buf:[],consume:[]}
            else
                {out,buf,consume}

expect listSplit ("a/b/c/d"|>Str.toUtf8) ['/'] == ["a","b","c","d"]|>List.map Str.toUtf8
expect listSplit ("a/b/c/d"|>Str.toUtf8) ['/','/'] == ["a/b/c/d"]|>List.map Str.toUtf8
expect listSplit ("/a//b/c/d/"|>Str.toUtf8) ['/'] == ["","a","","b","c","d",""]|>List.map Str.toUtf8
expect listSplit ("//a////b//c//d/"|>Str.toUtf8) ['/','/'] == ["","a","","b","c","d/"]|>List.map Str.toUtf8

unwrap = \a->
       when a is
            Ok e -> e
            Err e -> crash "unwraped Err $(Inspect.toStr e)"

view this post on Zulip Adrian (Apr 09 2024 at 12:19):

If any roc wizards would look at this, that would be just fine by me

view this post on Zulip Brendan Hansknecht (Apr 09 2024 at 14:53):

So this function is just Str.split, but on a list of bytes?

view this post on Zulip Adrian (Apr 09 2024 at 15:15):

Jep, that's why I was asking if I should implement it the same way (in zig) or just in roc

view this post on Zulip Adrian (Apr 09 2024 at 15:16):

And if the std lib would accept it

view this post on Zulip Brendan Hansknecht (Apr 09 2024 at 15:37):

It should be fine to implement in Roc with good performance. All of the sub pieces needed for list are there.

I know that separately, some changes are planned for List.split that never got implemented. We wanted to make the signatures cleaner and more consistent: https://github.com/roc-lang/roc/issues/6164

That said, I think those changes would be around changing the current List.split to List.splitAt. Then implementing a new split function that is more simple to the string version, but still for single element splits. List.splitBy: List a, a -> List (List a). So no explicit plans for a 'List.splitSequences, but would still be a good general discussion to have. Maybe it should fully match Str.split` for flexibility.

view this post on Zulip Brendan Hansknecht (Apr 09 2024 at 15:40):

Anyway, for a performant version in Roc, it will be very important to use List.take and List.drop or List.sublist to extract the splits. I should be able to write something up later today. May also want to use a smart search algorithm like https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm, but that would be more complex to get write and use in roc. Linear scan is probably a fine start.

view this post on Zulip Adrian (Apr 10 2024 at 14:59):

Should I mark as complete until the List.split rework?

view this post on Zulip Brendan Hansknecht (Apr 10 2024 at 15:39):

I don't think anyone is specifically working on the List.split rework. Someone claimed it in december, but it never progressed. Overall, it is open to ideas an contributors.

As for a performant implementation in pure roc, this should be a solid base:

listSplitSeq = \list, seq ->
    if List.isEmpty seq then
        [list]
    else
        listSplitSeqHelper [] list seq 0 0 0

listSplitSeqHelper : List (List a), List a, List a, U64, U64, U64 -> List (List a) where a implements Eq & Inspect
listSplitSeqHelper = \out, list, seq, lastMatch, i, j ->
    when (List.get list (i + j), List.get seq j) is
        (Ok x, Ok y) if x == y ->
            # Still matching correct
            listSplitSeqHelper out list seq lastMatch i (j + 1)

        (Ok _, Ok _) ->
            # Failure to match
            listSplitSeqHelper out list seq lastMatch (i + 1) 0

        # (Ok _, Err OutOfBounds) ->
        (Ok _, Err _) ->
            # Managed to match the whole subseq
            subList = List.sublist list {start: lastMatch, len: i - lastMatch}
            nextOut = List.append out subList

            listSplitSeqHelper nextOut list seq (i + j) (i + j) 0

        (Err OutOfBounds, Err OutOfBounds) ->
            # Match all the way to the end, append last match and extra empty list
            subList = List.sublist list {start: lastMatch, len: i - lastMatch}
            nextOut = List.append out subList

            List.append nextOut []

        (Err OutOfBounds, _) ->
            # No more possible matches, append remaining
            subList = List.sublist list {start: lastMatch, len: i - lastMatch + 1}
            List.append out subList

It is just a linear scan, which is best for short sequences to match. To make it more optimal would be to add the fancy algorithm if the key is larger or the input list is really large. This might be worth making a builtin only for all of the extra bounds checks we could remove (especially if we add in the fancier algorithm).

view this post on Zulip Adrian (Apr 10 2024 at 16:30):

This is great! One minor question is it possible for anything to match (Ok _, Ok _) and not match (Ok x, Ok y). To me this seems like the same case?

view this post on Zulip Brendan Hansknecht (Apr 10 2024 at 17:25):

The (Ok x, Ok y) has a guard clause. It only matches if x == y

view this post on Zulip Brendan Hansknecht (Apr 10 2024 at 17:26):

Oh, one extra note List (List a) and List Str may have some general perf issues currently. I am actively working on a fix, but we have a refcounting issue with nested container types.

view this post on Zulip Adrian (Apr 10 2024 at 17:43):

Perhaps the quicket fix is to just send a patch to basic-webserver platform and implement the parser directly. Is there any other way to run native code other than platoform and builtins?

view this post on Zulip Brendan Hansknecht (Apr 10 2024 at 18:48):

Is there any other way to run native code other than platoform and builtins?

nope (well, depends on what you mean by "native code". Both builtins and regular roc code compile the same way. Builtins just have minor abilities to cut corners. Platform is the only place where side effects can happen at all)

view this post on Zulip Brendan Hansknecht (Apr 10 2024 at 19:10):

Perhaps the quicket fix is to just send a patch to basic-webserver platform and implement the parser directly

Eh, I think it would be best to keep it in roc or the standard library. If this does lead to measurable perf issues, that just more pressure to improve roc.

view this post on Zulip Adrian (Apr 10 2024 at 20:28):

Seems like the most sane action: https://github.com/roc-lang/basic-webserver/blob/3f1028a742180da649f73361b8f389b98e10ca7a/platform/Http.roc#L147
In the future sending files over http might make this a bottleneck

view this post on Zulip Brendan Hansknecht (Apr 11 2024 at 01:02):

I don't follow. That is just normal roc code.

view this post on Zulip Adrian (Apr 11 2024 at 07:17):

Brendan Hansknecht said:

I don't follow. That is just normal roc code.

Oh, sorry; It was meant as an example of the prevous parser also being just native roc.

view this post on Zulip Adrian (Apr 11 2024 at 07:17):

:sweat_smile:


Last updated: Jul 05 2025 at 12:14 UTC