Stream: beginners

Topic: Comparing strings


Andy Ferris (May 04 2024 at 06:34):

How do I compare strings in Roc? I am looking for something along the lines of "a" < "b" returning Bool.true, or equivalently a cmp function.

As a follow-up, if there is no builtin for this, how do I co-iterate two strings? I see Str.walkUtf8. Is there a zip function, an iterate ability, etc? Why is walkUtf8 giving U8 on each iteration instead of a codepoint (U32)?

Anton (May 04 2024 at 09:18):

Hi @Andy Ferris,
We've made several deliberate decisions about how strings work in Roc, though they may seem annoying at first.

How do I compare strings in Roc? I am looking for something along the lines of "a" < "b" returning Bool.true, or equivalently a cmp function.

We talk a bit about this in "String equality and normalization" here.
We really need to look at finishing up the github.com/roc-lang/unicode package.

how do I co-iterate two strings? I see Str.walkUtf8. Is there a zip function, an iterate ability, etc?

If possible, we recommend going with Str.toUtf8; a zip can be achieved with List.map2. If you can, working with List U8 will yield superior performance.
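As a rough illustration of the "convert to UTF-8 bytes, then walk both lists together" idea, here is a minimal sketch in Python rather than Roc. The name compare_utf8 is made up for this example and is not a Roc builtin or part of any package mentioned in this thread. It only orders strings by their raw bytes, which is not the same as alphabetical order.

```python
def compare_utf8(a: str, b: str) -> int:
    """Return -1, 0, or 1 by comparing the UTF-8 bytes of a and b.

    Hypothetical helper for illustration only.
    """
    xs = a.encode("utf-8")
    ys = b.encode("utf-8")
    # Walk both byte sequences in lockstep, similar to zipping two byte lists.
    for x, y in zip(xs, ys):
        if x != y:
            return -1 if x < y else 1
    # Every shared position is equal, so the shorter sequence sorts first.
    return (len(xs) > len(ys)) - (len(xs) < len(ys))


print(compare_utf8("a", "b"))     # -1
print(compare_utf8("abc", "ab"))  # 1
print(compare_utf8("abc", "abc")) # 0
```

Because UTF-8 preserves code point order, this byte-wise result matches a code point comparison, but it says nothing about how a human reader would expect the strings to be sorted.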

Why is walkUtf8 giving U8 on each iteration instead of a codepoint (U32)?

I believe this was done because new code points can be added in new versions of Unicode, and we didn't want to be required to make a new Roc release when this happens, so we chose to put codepoint stuff in roc-lang/unicode.
Perhaps also because we did not want to make it too easy to reach for; in several cases, codepoints may not really be what you should use.

There are a lot of gotchas in working with strings, so we wanted to encourage users to educate themselves.

I'll make it a priority to get the unicode package ready for release.

Feel free to ask additional questions.

Richard Feldman (May 04 2024 at 11:08):

an important question here is what you're looking to use the comparison for! There are a lot of considerations for string comparison :big_smile:

Richard Feldman (May 04 2024 at 11:10):

for example, if you just want to compare the bytes as fast as possible but the actual ordering doesn't really matter (e.g. to put them into a tree data structure where the goal is a set of performance characteristics and not displaying to the user), the best comparison function for that is different from the best comparison function for sorting alphabetically (for user display)

Richard Feldman (May 04 2024 at 11:11):

and correctly sorting alphabetically also requires giving the comparison function an extra argument because different locales have different rules for sorting the same string sequences, e.g. in Danish, the sequence "aa" is sorted as a single character, and it comes after "z"
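To make the "extra argument" point concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the names toy_danish_key and toy_sort are hypothetical, the Danish rule is reduced to "treat aa as å and use the alphabet a-z, æ, ø, å", and real collation (for example the Unicode Collation Algorithm with locale tailoring) handles far more cases than this.

```python
# Simplified Danish alphabet: å (and therefore "aa") comes after z.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"


def toy_danish_key(s):
    """Build a sort key under a heavily simplified Danish rule (illustration only)."""
    folded = s.lower().replace("aa", "å")
    # Characters outside the toy alphabet are placed after every known letter.
    return [
        DANISH_ALPHABET.index(ch) if ch in DANISH_ALPHABET else len(DANISH_ALPHABET)
        for ch in folded
    ]


def toy_sort(words, locale):
    """Sort words; the locale argument changes the resulting order."""
    if locale == "da":
        return sorted(words, key=toy_danish_key)
    # Fallback: plain case-insensitive code point order.
    return sorted(words, key=str.lower)


words = ["Aalborg", "Zebra", "Berlin"]
print(toy_sort(words, "en"))  # ['Aalborg', 'Berlin', 'Zebra']
print(toy_sort(words, "da"))  # ['Berlin', 'Zebra', 'Aalborg']
```

The same input list comes back in a different order depending on the locale, which is why a single two-argument comparison cannot be correct for user-facing sorting everywhere.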

Anton (May 04 2024 at 11:27):

We should make a flowchart some time to help users figure out what they need for their string operations.

Luke Boswell (May 04 2024 at 18:24):

Also check out @Hannes's new package https://github.com/Hasnep/roc-ascii

Luke Boswell (May 04 2024 at 18:25):

We don't have the Sortable ability, but I imagine we could implement it for ASCII and then be able to compare strings using that.

Luke Boswell (May 04 2024 at 18:25):

As in, the Sortable ability hasn't been implemented yet, but it is planned, I think.

Jasper Woudenberg (May 04 2024 at 18:56):

Luke Boswell said:

As in, the Sortable ability hasn't been implemented yet, but it is planned, I think.

I am working on it! (but slowly :sweat_smile:).

Hannes (May 06 2024 at 00:59):

I've just released v0.2.0 of roc-ascii which adds the Ascii.compare and Ascii.sortAsc functions for sorting strings in "ASCIIbetical" order, i.e. sorting by ASCII codepoints. It's essentially the same as alphabetical order, but with some quirks around case and punctuation.

If all your strings are known to be ASCII, then that could be an option :)
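The quirks of "ASCIIbetical" order are easy to see in a small sketch. This is Python, not Roc, and it does not use or reproduce the roc-ascii API; the function name ascii_sort is hypothetical. It simply sorts by ASCII byte values and rejects anything that is not ASCII.

```python
def ascii_sort(words):
    """Sort strings by their ASCII byte values (illustration only)."""
    for w in words:
        if not w.isascii():
            raise ValueError(f"not an ASCII string: {w!r}")
    # Digits (0x30-0x39) come before uppercase (0x41-0x5A), then '_' (0x5F),
    # then lowercase (0x61-0x7A).
    return sorted(words, key=lambda w: w.encode("ascii"))


print(ascii_sort(["banana", "Cherry", "apple", "_temp", "10", "9"]))
# ['10', '9', 'Cherry', '_temp', 'apple', 'banana']
```

Note how "Cherry" lands before "apple" because uppercase letters have smaller byte values, and "10" lands before "9" because the comparison is character by character rather than numeric.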

Hristo (May 06 2024 at 06:24):

Nice!

I have StrTools.cmpU8Based in roc-tools, but I haven't been mentioning roc-tools here (I did notice that it was captured in your package index, @Hannes) because it's still severely lacking docstrings.


Last updated: Jul 05 2025 at 12:14 UTC