Stream: beginners

Topic: ✔ case-insensitive string comparison


view this post on Zulip Chuck Daniels (Dec 19 2024 at 22:41):

Hi there! I'm new to Roc, and I'm having fun with it on Exercism. Thanks for the nice language!

Is there a convenient way to perform case-insensitive string comparisons using the stdlib? If not, is there some reasonably idiomatic way to do so?

view this post on Zulip Luke Boswell (Dec 19 2024 at 22:47):

If you know you only have ASCII then it's easy to roll you're own using the builtins.

If you have utf-8/unicode then there isn't an easy way to do it just with the builtins.

view this post on Zulip Luke Boswell (Dec 19 2024 at 22:48):

If you can use a package then I'd recommend you checkout @Hannes's https://github.com/Hasnep/roc-ascii

view this post on Zulip Luke Boswell (Dec 19 2024 at 22:48):

The longer term plan is for working with unicode is https://github.com/roc-lang/unicode

view this post on Zulip Chuck Daniels (Dec 19 2024 at 22:49):

Yep, just ASCII. What would be an idiomatic way to do that?

view this post on Zulip Luke Boswell (Dec 19 2024 at 22:49):

The unicode package is a WIP and doesn't have the Str comparison functions your looking for

view this post on Zulip Luke Boswell (Dec 19 2024 at 22:50):

Can you use a package in exercism? @Isaac Van Doren would know

view this post on Zulip Luke Boswell (Dec 19 2024 at 22:50):

I'll make an example for you, 1 sec

view this post on Zulip Sam Mohr (Dec 19 2024 at 22:51):

I'd say Ascii.toLowercase left == Ascii.toLowercase right

view this post on Zulip Chuck Daniels (Dec 19 2024 at 22:51):

I don't know if I can use a package in exercism.

view this post on Zulip Sam Mohr (Dec 19 2024 at 22:51):

For a simple approach (ASCII only)

view this post on Zulip Sam Mohr (Dec 19 2024 at 22:53):

toLowercase : Str -> List U8
toLowercase = \str ->
    casingDelta = 'a' - 'A'
    Str.toUtf8 str
    |> List.map \byte ->
        if byte >= 'A' && byte <= 'Z' then
            byte + casingDelta
        else
            byte

insensitiveEqual : Str, Str -> Bool
insensitiveEqual = \left, right ->
    toLowercase left == toLowercase right

view this post on Zulip Isaac Van Doren (Dec 19 2024 at 22:53):

You can use packages in Exercism but they have to be explicitly added to that exercise in the language track. If it seems like many users of an exercise would want a particular package we can certainly add one

view this post on Zulip Isaac Van Doren (Dec 19 2024 at 22:54):

In this case hand-rolling might be straightforward enough

view this post on Zulip Notification Bot (Dec 19 2024 at 22:58):

Chuck Daniels has marked this topic as resolved.

view this post on Zulip Luke Boswell (Dec 19 2024 at 22:59):

Looks like @Sam Mohr beat me to it...

module [to_upper, to_lower]

to_upper : Str -> Str
to_upper = \str ->
    str
    |> Str.toUtf8
    |> List.map to_upper_help
    |> Str.fromUtf8
    |> unwrap

to_lower : Str -> Str
to_lower = \str ->
    str
    |> Str.toUtf8
    |> List.map to_lower_help
    |> Str.fromUtf8
    |> unwrap

expect to_upper "hello" == "HELLO"
expect to_upper "Hello" == "HELLO"

expect to_lower "HELLO" == "hello"
expect to_lower "Hello" == "hello"

to_upper_help = \c -> if c >= 'a' && c <= 'z' then c - delta else c
to_lower_help = \c -> if c >= 'A' && c <= 'Z' then c + delta else c
delta = 'a' - 'A'

unwrap : Result a _ -> a
unwrap = \result ->
    when result is
        Ok a -> a
        Err _ -> crash "quick and dirty error handling"
$ roc test conversion.roc
0 failed and 4 passed in 523 ms.

view this post on Zulip Sam Mohr (Dec 19 2024 at 23:00):

A cleaner quick-and-dirty solution

view this post on Zulip Luke Boswell (Dec 20 2024 at 00:06):

I've added a Good First Issue for someone to add this to our Examples repo

https://github.com/roc-lang/examples/issues/222

view this post on Zulip Agus Zubiaga (Dec 20 2024 at 03:33):

Nothing wrong with that solution, but I'd like to suggest an alternative. ASCII letters are conveniently arranged so that only one bit flips between lowercase and uppercase. So you can actually do this branchless with a bitwise operation:

to_lower_help = \c -> Num.bitwiseOr 32 c

# or:

to_lower_help = \c -> Num.bitwiseOr ('A' - 'a') c

view this post on Zulip Agus Zubiaga (Dec 20 2024 at 03:46):

Actually, now that I think about it, this probably shouldn’t be used in the examples. Some non-letter characters would become different ones :sweat_smile:

view this post on Zulip Agus Zubiaga (Dec 20 2024 at 03:47):

So it’s only safe if you know you’re only working with a restricted subset. Probably just an interesting trick then.

view this post on Zulip Richard Feldman (Dec 20 2024 at 05:14):

you can check first, e.g.

if c >= 'a' && c <= 'Z' then
    Num.bitwiseOr ('A' - 'a') c
else
    c

view this post on Zulip Chuck Daniels (Dec 20 2024 at 11:21):

Richard Feldman said:

you can check first, e.g.

if c >= 'a' && c <= 'Z' then
    Num.bitwiseOr ('A' - 'a') c
else
    c

I think you meant c >= 'A' && c <= 'Z', although I would write it as 'A' <= c && c <= 'Z'.

view this post on Zulip Agus Zubiaga (Dec 20 2024 at 11:38):

You can check but then idk how much better it’s than the original solution

view this post on Zulip Chuck Daniels (Dec 20 2024 at 13:50):

I suspect that using the bitwise operation would show noticeable performance benefit only in the context of performing a large volume of character conversions.

Here's my approach:

asciiLowercaseBit : U8
asciiLowercaseBit = 0b0010_0000

charToLower : U8 -> U8
charToLower = \char ->
    if 'A' <= char && char <= 'Z' then
        char |> Num.bitwiseXor asciiLowercaseBit # Set lowercase bit
    else
        char

charToUpper : U8 -> U8
charToUpper = \char ->
    if 'a' <= char && char <= 'z' then
        char |> Num.bitwiseXor asciiLowercaseBit # Clear lowercase bit
    else
        char

view this post on Zulip Chuck Daniels (Dec 20 2024 at 14:36):

Or this, for some additional functions:

asciiLowercaseBit : U8
asciiLowercaseBit = 0b0010_0000

isUpper : U8 -> Bool
isUpper = \char -> 'A' <= char && char <= 'Z'

isLower : U8 -> Bool
isLower = \char -> 'a' <= char && char <= 'a'

toggleCase : U8 -> U8
toggleCase = \char -> char |> Num.bitwiseXor asciiLowercaseBit # Toggle lowercase bit

charToLower : U8 -> U8
charToLower = \char -> if char |> isUpper then char |> toggleCase else char

charToUpper : U8 -> U8
charToUpper = \char -> if char |> isLower then char |> toggleCase else char

view this post on Zulip Brendan Hansknecht (Dec 20 2024 at 17:17):

I suspect that using the bitwise operation would show noticeable performance benefit only in the context of performing a large volume of character conversions.

If llvm can manage to optimize it to simd, bitwise would likely be significantly faster (it probably will fail if you use a conditional instead of doing fully bitwise math).

Otherwise, they should be useing the exact same processor units, so no perf diff (I think they both should be single cycle and limited by ram loads and stores in a loop, which should be pipelined)

view this post on Zulip Brendan Hansknecht (Dec 20 2024 at 17:19):

So the fast version would be something like:

(Num.bitwiseAnd ('A' <= char) (char <= 'Z'))
    |> Num.shiftLeftBy 5
    |> Num.bitwiseXor char

view this post on Zulip Brendan Hansknecht (Dec 20 2024 at 17:19):

llvm should turn that into a simd loop

view this post on Zulip Brendan Hansknecht (Dec 20 2024 at 17:20):

oh, probably need to Num.toU8 the conditionals

view this post on Zulip Brendan Hansknecht (Dec 20 2024 at 17:24):

oh yeah, we can't Num.toU8 a bool... that's annoying

view this post on Zulip Brendan Hansknecht (Dec 20 2024 at 17:28):

So I guess a working form of the more optimized version would be to take the helpers from @Chuck Daniels above, but write the core functions as:

charToLower : U8 -> U8
charToLower = \char ->
    toggle = if isUpper char then asciiLowercaseBit else 0
    Num.bitwiseXor char toggle

charToUpper : U8 -> U8
charToUpper = \char ->
    toggle = if isLower char then asciiLowercaseBit else 0
    Num.bitwiseXor char toggle

view this post on Zulip Brendan Hansknecht (Dec 20 2024 at 17:29):

That should turn into a hot simd loop if used within List.map or similar

view this post on Zulip Richard Feldman (Dec 20 2024 at 17:35):

Brendan Hansknecht said:

oh yeah, we can't Num.toU8 a bool... that's annoying

we could offer Bool.toNum : Bool -> Num *

view this post on Zulip Sam Mohr (Dec 20 2024 at 19:34):

https://github.com/roc-lang/roc/issues/7395


Last updated: Jul 06 2025 at 12:14 UTC