I'm having trouble, parsing multipart/form-data
, since the builtin List
lacks a splitSequence
function.
Is there a known parser for this format? Should I just bite the bullet and implement it with roc-parser, or am I misusing roc's Lists
Should I write the zig primitive for the standard library?
I don't know if anyone has looked at parsing that before.
https://roc-lang.github.io/basic-webserver/Http/#parseFormUrlEncoded
I added a parser for FormUrlEncoded data, I'm not sure if that helps.
I think it would be great to have a parser written in Roc :grinning:
Luke Boswell said:
https://roc-lang.github.io/basic-webserver/Http/#parseFormUrlEncoded
This is great! Sadly it doesn't solve my issue, since i'm parsing the format:
-----------------------------306137057221634204471155786395
Content-Disposition: form-data; name="text"
abc
-----------------------------306137057221634204471155786395
Content-Disposition: form-data; name="file"; filename=""
Content-Type: application/octet-stream
-----------------------------306137057221634204471155786395--
Luke Boswell said:
I think it would be great to have a parser written in Roc :grinning:
I'm a bit worried about the performance :face_with_spiral_eyes:.
I'll try the roc only approach and if it doesn't work out I'll probably send a pr to either roc-builtins or roc-webserver.
Here's my attempt, I don't think it's too performant
listSplit : List a, List a -> List (List a) where a implements Eq
listSplit = \inp, sep ->
if List.len sep == 0 then
[inp]
else
inp
|> List.walk
{ out: [], buf: [], consume: [] }
\{out, buf, consume}, elem ->
if elem == List.get sep (List.len consume) |> unwrap then
{ consume: consume |> List.append elem, out, buf} |> tryToConsume sep
else
{consume:[], out, buf: buf |> List.concat consume|> List.append elem}
|> \{out,buf,consume} -> out|>List.append (List.concat buf consume)
tryToConsume = \{out,buf,consume},sep->
if consume == sep then
{out: out|> List.append buf, buf:[],consume:[]}
else
{out,buf,consume}
expect listSplit ("a/b/c/d"|>Str.toUtf8) ['/'] == ["a","b","c","d"]|>List.map Str.toUtf8
expect listSplit ("a/b/c/d"|>Str.toUtf8) ['/','/'] == ["a/b/c/d"]|>List.map Str.toUtf8
expect listSplit ("/a//b/c/d/"|>Str.toUtf8) ['/'] == ["","a","","b","c","d",""]|>List.map Str.toUtf8
expect listSplit ("//a////b//c//d/"|>Str.toUtf8) ['/','/'] == ["","a","","b","c","d/"]|>List.map Str.toUtf8
unwrap = \a->
when a is
Ok e -> e
Err e -> crash "unwraped Err $(Inspect.toStr e)"
If any roc wizards would look at this, that would be just fine by me
So this function is just Str.split, but on a list of bytes?
Jep, that's why I was asking if I should implement it the same way (in zig) or just in roc
And if the std lib would accept it
It should be fine to implement in Roc with good performance. All of the sub pieces needed for list are there.
I know that separately, some changes are planned for List.split
that never got implemented. We wanted to make the signatures cleaner and more consistent: https://github.com/roc-lang/roc/issues/6164
That said, I think those changes would be around changing the current List.split
to List.splitAt
. Then implementing a new split function that is more simple to the string version, but still for single element splits. List.splitBy: List a, a -> List (List a)
. So no explicit plans for a 'List.splitSequences, but would still be a good general discussion to have. Maybe it should fully match
Str.split` for flexibility.
Anyway, for a performant version in Roc, it will be very important to use List.take
and List.drop
or List.sublist
to extract the splits. I should be able to write something up later today. May also want to use a smart search algorithm like https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm, but that would be more complex to get write and use in roc. Linear scan is probably a fine start.
Should I mark as complete until the List.split rework?
I don't think anyone is specifically working on the List.split
rework. Someone claimed it in december, but it never progressed. Overall, it is open to ideas an contributors.
As for a performant implementation in pure roc, this should be a solid base:
listSplitSeq = \list, seq ->
if List.isEmpty seq then
[list]
else
listSplitSeqHelper [] list seq 0 0 0
listSplitSeqHelper : List (List a), List a, List a, U64, U64, U64 -> List (List a) where a implements Eq & Inspect
listSplitSeqHelper = \out, list, seq, lastMatch, i, j ->
when (List.get list (i + j), List.get seq j) is
(Ok x, Ok y) if x == y ->
# Still matching correct
listSplitSeqHelper out list seq lastMatch i (j + 1)
(Ok _, Ok _) ->
# Failure to match
listSplitSeqHelper out list seq lastMatch (i + 1) 0
# (Ok _, Err OutOfBounds) ->
(Ok _, Err _) ->
# Managed to match the whole subseq
subList = List.sublist list {start: lastMatch, len: i - lastMatch}
nextOut = List.append out subList
listSplitSeqHelper nextOut list seq (i + j) (i + j) 0
(Err OutOfBounds, Err OutOfBounds) ->
# Match all the way to the end, append last match and extra empty list
subList = List.sublist list {start: lastMatch, len: i - lastMatch}
nextOut = List.append out subList
List.append nextOut []
(Err OutOfBounds, _) ->
# No more possible matches, append remaining
subList = List.sublist list {start: lastMatch, len: i - lastMatch + 1}
List.append out subList
It is just a linear scan, which is best for short sequences to match. To make it more optimal would be to add the fancy algorithm if the key is larger or the input list is really large. This might be worth making a builtin only for all of the extra bounds checks we could remove (especially if we add in the fancier algorithm).
This is great! One minor question is it possible for anything to match (Ok _, Ok _)
and not match (Ok x, Ok y)
. To me this seems like the same case?
The (Ok x, Ok y)
has a guard clause. It only matches if x == y
Oh, one extra note List (List a)
and List Str
may have some general perf issues currently. I am actively working on a fix, but we have a refcounting issue with nested container types.
Perhaps the quicket fix is to just send a patch to basic-webserver platform and implement the parser directly. Is there any other way to run native code other than platoform and builtins?
Is there any other way to run native code other than platoform and builtins?
nope (well, depends on what you mean by "native code". Both builtins and regular roc code compile the same way. Builtins just have minor abilities to cut corners. Platform is the only place where side effects can happen at all)
Perhaps the quicket fix is to just send a patch to basic-webserver platform and implement the parser directly
Eh, I think it would be best to keep it in roc or the standard library. If this does lead to measurable perf issues, that just more pressure to improve roc.
Seems like the most sane action: https://github.com/roc-lang/basic-webserver/blob/3f1028a742180da649f73361b8f389b98e10ca7a/platform/Http.roc#L147
In the future sending files over http might make this a bottleneck
I don't follow. That is just normal roc code.
Brendan Hansknecht said:
I don't follow. That is just normal roc code.
Oh, sorry; It was meant as an example of the prevous parser also being just native roc.
:sweat_smile:
Last updated: Jul 05 2025 at 12:14 UTC