spin off of #ideas>Should we have Str.walkUtf8*?
With lists, it is quite common to use pattern matching with recursion to write parsers. Ex:
fn = \list, sum ->
when list is
['0', .. as rest] -> fn rest sum
['1', .. as rest] -> fn rest (sum + 1)
...
['9', .. as rest] -> fn rest (sum + 9)
_ -> sum
This is currently not possible to do with pattern matching on strings.
That said, we have the primitives to enable a user to write this manually via Str.startsWith and Str.splitFirst or Str.replaceFirst.
Why not enable direct pattern matching on Str with rest?
I am not sure the right syntax here, but it should enable matching the start, end, and getting the rest.
Here is a port of the same code in a random syntax:
fn = \list, sum ->
when list is
"0" with .. as rest -> fn rest sum
"1" with .. as rest -> fn rest (sum + 1)
...
"9" with .. as rest -> fn rest (sum + 9)
_ -> sum
I really don't know a good syntax for this, but I think it would be a really nice add to the language if we can find a syntax.
Would enable recursive pattern matching parsing that supports arbitrary unicode and small strings without an allocation.
Also would mean you can directly match "😀" with .. as rest instead of [0x01, 0xF6, 0x00, .. as rest]
one idea that sounds ergonomic (but potentially tricky to implement) would be to allow interpolation in string patterns, e.g.:
when str is
"blah\(rest)" -> ...
This would be great!
there's definitely a rabbit hole there haha
like what if I do this:
when str is
"foo\(bar)baz\(blah)" -> ...
"blah\(rest)" -> ...
at that point it's partway to running a series of regular expressions on each string
which on its own is doable (although certainly more complex than what we have today) but then what about exhaustiveness checking?
also there are various edge cases which should probably generate warnings, e.g. here \(y) would always match empty string because \(x) would have consumed everything
when str is
"\(x)\(y)" -> ...
regarding exhaustiveness, "\(x)" -> on its own would be equivalent to x ->, so that's exhaustive
but then also any of the "\(x)\(y)\(z)" -> variations would be exhaustive too (but with warnings about unused identifiers)
I guess actually exhaustiveness wouldn't be that tricky, in the sense that as soon as you have any non-interpolation content in the string literal, it's no longer exhaustive
It's pretty cool, but I wonder how often it'd be actually useful. I have a feeling people will start using this, realize it's not quite enough for their use case, and demand full regex support :upside_down:
The way this works in erlang/Elixir is quite powerful and maybe be worth looking at for inspiration :)
<> is the concat operator or special syntax for the pattern matching specifically
If concat "one" <> rest would be equivalent to "one\(rest)" in current roc. So that would suggest using interpolation for variable binding in roc.
Yeah, I would definitely save the string pattern matching to regexes or weakened form that's syntax sugar over list pattern matching
https://www.egison.org/ is interesting but doing this kind of stuff is super complicated
I think a good start that could be expanded later if we want is just matching the prefix, rest, and suffix
Agus Zubiaga said:
It's pretty cool, but I wonder how often it'd be actually useful.
one way it could be used: for matching URLs, it could save splitting the string on "/" first:
["users", username, "posts", postId] ->
vs
"/users/\(username)/posts/\(postId)" ->
also could facilitate breaking down URLs into pieces and matching on separate parts separately, e.g.
"/users/\(username)/\(rest)" ->
when rest is
"posts/\(postId)" -> ...
"profile" -> ...
heh, the backslash-escapes for interpolation look awkward there, but if we had a different interpolation syntax e.g...
"/users/${username}/posts/${postId}" -> ...
"/users/${username}/${rest}" ->
when rest is
"comments/${commentId}" -> ...
"profile" -> ...
If we had a concatenation operator, we could do:
fn = \str, sum ->
when str is
"0" ++ rest -> fn rest sum
"1" ++ rest -> fn rest (sum + 1)
...
"9" ++ rest -> fn rest (sum + 9)
_ -> sum
Is there an inherent advantage of that over using interpolation? I think it boils down to the same thing.
Brendan Hansknecht said:
<>is the concat operator or special syntax for the pattern matching specifically
The same operator is used for concatenation of strings (binaries). It can only be used last in a pattern match though so it is like a “rest” param.
The cool thing is that you can pattern match and bind variables on the bytes. “Extract a 2-byte integer to variable a, followed by the byte value 57 and bind the rest to the variable b”
It is very convenient for parsing binary protocols and formats.
Richard Feldman said:
also could facilitate breaking down URLs into pieces and matching on separate parts separately, e.g.
"/users/\(username)/\(rest)" -> when rest is "posts/\(postId)" -> ... "profile" -> ...
Wouldn't this leave room for ambiguities, like:
"/users/foo/bar/baz/"
could match into
username = "foo"
rest = "bar/baz"
or
username = "foo/bar"
rest = "baz"
you could make rules to make one of these the "principle" one but that doesn't feel satisfying imo
it's a good question - I think it would have to be a greedy match (so, it would always match username = "foo")
If I'd have to pick I'd probably land on that too, but if I encountered this feature for the first time I feel like I'd have to google or test this behavior before using it-- and that's if the potential ambiguity even occurred to me
but then again I've never used Erlang/Elixr or another language that already has this so this might be the unfamiliarity talking
i think my preference would be guarantees that converting a string to an list of bytes, codepoints, or grapheme clusters would be be especially convenient when pattern matching. That way we could use the already effective list matching syntax while remaining confident about performance.
Generally, converting a Str to List U8 should just be a refcount increment, but for other key functions, given something like...
when Str.graphemes str is
["a", "b", ..] -> 1
["a", ..] -> 2
["c", ..] -> 3
I would hope (in the eventual future) that the above would not actually split the whole string into graphemes, but would instead decode at most the first two graphemes.
I would hope the same would apply with pattern matching on the result of Str.toScalars as well.
A Str to a List U8 only works that way for big strings
As for graphemes, while theoretically doable, hard (impossible?) to make generically apply (though maybe it only needs to apply for common builtins)
Okay, fair. For strings small enough to have special handling, the copying cost isn't too consequential, I guess
And I'm not suggesting this be a generic thing. Literally just optimization support for just those 2-3 Str functions with pattern matching as a roadmap item.
Ah, yeah. That should be doable with some grit to update the pattern matching code for variable lengths
That way we could use the already effective list matching syntax
Typing out a pattern as a list is significantly more annoying to me than being able to just type a string. Using lists also stops you from being able to copy and paste existing strings for use in a pattern.
Using interpolation to pull data back out of strings seems neat but also not great to me. The CUE language has identical interpolation syntax, and is also quite focused on pattern matching, including matching based off of interpolated strings.
Because of how it works, I think CUE could get there (i.e. match on "\(a)+\(b)" and then constrain a and b in different ways, such as based on a regex). It's very complex to achieve, and it's not there yet, and it might be that they decide to not support that kind of transitive-constraint string matching at all.
Conversely, Roc is not a constraint language, and so can't rely on an accumulation of rules to give the matching fidelity some people might hope for. Given that conversations about interpretation-based formatting have been reasonably non-committal, I don't see Roc adding, e.g. richer interpolation syntax to say: match "\(x)\(y)" but x must be composed of numeric digits, such that an input of "123a" sets x to "123" and y to "a". We could use pattern match guards for very limited use cases, but they wouldn't be powerful enough to solve meaningful problems in this space, since guards can only validate after the fact, not guide the bounds of each interpolation match.
These are things that existing techniques, including list matching, can do pretty well, provided that we just make sure they're pretty cheap (i.e. not re-splitting a whole string on each recursive step).
Also, what if I want to use interpolation based on already defined variables, not to match within the string, but to construct the string to be matched as a whole?
# toy example
isAlreadyConcatenated = \str, a, b ->
when str is
"\(a)\(b)" -> true
"\(b)\(a)" -> true
_ -> false
We can always add interpolation-based matching later, but I suspect that, even later, we'll determine that it creates more of an expectation of capability then we're willing to support. i.e. "that's all it does? / that's all it can be used for?" At that point it probably isn't worth supporting unless we're willing to make this kind of string matching, specifically, a highlight feature of the language (i.e. start intentionally claiming and supporting some perl use-case territory).
Hmm :thinking:. To me, I feel that I would find the simple case super convenient and nice in general. I feel like I might occasionally hit edge cases that want more power, but I would expect that to be rare enough that I can write custom functions for those cases.
I do agree that the syntax is super flexible which can be quite dangerous for expectations
I just saw this article that might be relevant. Apparently OCaml has a package to implement binary pattern matching, vaguely like Erlang has. OCaml is similar enough to Roc that maybe it could translate?
https://practicalocaml.com/parsing-with-binary-string-pattern-matching/
Has any more thought been put into this? I think it would be a good feature to have, regardless of the exact syntax
Last updated: Jun 16 2026 at 16:19 UTC