✔ case-insensitive string comparison · beginners

Stream: beginners

Topic: ✔ case-insensitive string comparison

Chuck Daniels (Dec 19 2024 at 22:41):

Hi there! I'm new to Roc, and I'm having fun with it on Exercism. Thanks for the nice language!

Is there a convenient way to perform case-insensitive string comparisons using the stdlib? If not, is there some reasonably idiomatic way to do so?

Luke Boswell (Dec 19 2024 at 22:47):

If you know you only have ASCII then it's easy to roll you're own using the builtins.

If you have utf-8/unicode then there isn't an easy way to do it just with the builtins.

Luke Boswell (Dec 19 2024 at 22:48):

If you can use a package then I'd recommend you checkout @Hannes's https://github.com/Hasnep/roc-ascii

Luke Boswell (Dec 19 2024 at 22:48):

The longer term plan is for working with unicode is https://github.com/roc-lang/unicode

Chuck Daniels (Dec 19 2024 at 22:49):

Yep, just ASCII. What would be an idiomatic way to do that?

Luke Boswell (Dec 19 2024 at 22:49):

The unicode package is a WIP and doesn't have the Str comparison functions your looking for

Luke Boswell (Dec 19 2024 at 22:50):

Can you use a package in exercism? @Isaac Van Doren would know

Luke Boswell (Dec 19 2024 at 22:50):

I'll make an example for you, 1 sec

Sam Mohr (Dec 19 2024 at 22:51):

I'd say Ascii.toLowercase left == Ascii.toLowercase right

Chuck Daniels (Dec 19 2024 at 22:51):

I don't know if I can use a package in exercism.

Sam Mohr (Dec 19 2024 at 22:51):

For a simple approach (ASCII only)

Sam Mohr (Dec 19 2024 at 22:53):

toLowercase : Str -> List U8
toLowercase = \str ->
    casingDelta = 'a' - 'A'
    Str.toUtf8 str
    |> List.map \byte ->
        if byte >= 'A' && byte <= 'Z' then
            byte + casingDelta
        else
            byte

insensitiveEqual : Str, Str -> Bool
insensitiveEqual = \left, right ->
    toLowercase left == toLowercase right

Isaac Van Doren (Dec 19 2024 at 22:53):

You can use packages in Exercism but they have to be explicitly added to that exercise in the language track. If it seems like many users of an exercise would want a particular package we can certainly add one

Isaac Van Doren (Dec 19 2024 at 22:54):

In this case hand-rolling might be straightforward enough

Notification Bot (Dec 19 2024 at 22:58):

Chuck Daniels has marked this topic as resolved.

Luke Boswell (Dec 19 2024 at 22:59):

Looks like @Sam Mohr beat me to it...

module [to_upper, to_lower]

to_upper : Str -> Str
to_upper = \str ->
    str
    |> Str.toUtf8
    |> List.map to_upper_help
    |> Str.fromUtf8
    |> unwrap

to_lower : Str -> Str
to_lower = \str ->
    str
    |> Str.toUtf8
    |> List.map to_lower_help
    |> Str.fromUtf8
    |> unwrap

expect to_upper "hello" == "HELLO"
expect to_upper "Hello" == "HELLO"

expect to_lower "HELLO" == "hello"
expect to_lower "Hello" == "hello"

to_upper_help = \c -> if c >= 'a' && c <= 'z' then c - delta else c
to_lower_help = \c -> if c >= 'A' && c <= 'Z' then c + delta else c
delta = 'a' - 'A'

unwrap : Result a _ -> a
unwrap = \result ->
    when result is
        Ok a -> a
        Err _ -> crash "quick and dirty error handling"

$ roc test conversion.roc
0 failed and 4 passed in 523 ms.

Sam Mohr (Dec 19 2024 at 23:00):

A cleaner quick-and-dirty solution

Luke Boswell (Dec 20 2024 at 00:06):

I've added a Good First Issue for someone to add this to our Examples repo

https://github.com/roc-lang/examples/issues/222

Agus Zubiaga (Dec 20 2024 at 03:33):

Nothing wrong with that solution, but I'd like to suggest an alternative. ASCII letters are conveniently arranged so that only one bit flips between lowercase and uppercase. So you can actually do this branchless with a bitwise operation:

to_lower_help = \c -> Num.bitwiseOr 32 c

# or:

to_lower_help = \c -> Num.bitwiseOr ('A' - 'a') c

Agus Zubiaga (Dec 20 2024 at 03:46):

Actually, now that I think about it, this probably shouldn’t be used in the examples. Some non-letter characters would become different ones :sweat_smile:

Agus Zubiaga (Dec 20 2024 at 03:47):

So it’s only safe if you know you’re only working with a restricted subset. Probably just an interesting trick then.

Richard Feldman (Dec 20 2024 at 05:14):

you can check first, e.g.

if c >= 'a' && c <= 'Z' then
    Num.bitwiseOr ('A' - 'a') c
else
    c

Chuck Daniels (Dec 20 2024 at 11:21):

Richard Feldman said:

you can check first, e.g.

if c >= 'a' && c <= 'Z' then
    Num.bitwiseOr ('A' - 'a') c
else
    c

I think you meant c >= 'A' && c <= 'Z', although I would write it as 'A' <= c && c <= 'Z'.

Agus Zubiaga (Dec 20 2024 at 11:38):

You can check but then idk how much better it’s than the original solution

Chuck Daniels (Dec 20 2024 at 13:50):

I suspect that using the bitwise operation would show noticeable performance benefit only in the context of performing a large volume of character conversions.

Here's my approach:

asciiLowercaseBit : U8
asciiLowercaseBit = 0b0010_0000

charToLower : U8 -> U8
charToLower = \char ->
    if 'A' <= char && char <= 'Z' then
        char |> Num.bitwiseXor asciiLowercaseBit # Set lowercase bit
    else
        char

charToUpper : U8 -> U8
charToUpper = \char ->
    if 'a' <= char && char <= 'z' then
        char |> Num.bitwiseXor asciiLowercaseBit # Clear lowercase bit
    else
        char

Chuck Daniels (Dec 20 2024 at 14:36):

Or this, for some additional functions:

asciiLowercaseBit : U8
asciiLowercaseBit = 0b0010_0000

isUpper : U8 -> Bool
isUpper = \char -> 'A' <= char && char <= 'Z'

isLower : U8 -> Bool
isLower = \char -> 'a' <= char && char <= 'a'

toggleCase : U8 -> U8
toggleCase = \char -> char |> Num.bitwiseXor asciiLowercaseBit # Toggle lowercase bit

charToLower : U8 -> U8
charToLower = \char -> if char |> isUpper then char |> toggleCase else char

charToUpper : U8 -> U8
charToUpper = \char -> if char |> isLower then char |> toggleCase else char

Brendan Hansknecht (Dec 20 2024 at 17:17):

I suspect that using the bitwise operation would show noticeable performance benefit only in the context of performing a large volume of character conversions.

If llvm can manage to optimize it to simd, bitwise would likely be significantly faster (it probably will fail if you use a conditional instead of doing fully bitwise math).

Otherwise, they should be useing the exact same processor units, so no perf diff (I think they both should be single cycle and limited by ram loads and stores in a loop, which should be pipelined)

Brendan Hansknecht (Dec 20 2024 at 17:19):

So the fast version would be something like:

(Num.bitwiseAnd ('A' <= char) (char <= 'Z'))
    |> Num.shiftLeftBy 5
    |> Num.bitwiseXor char

Brendan Hansknecht (Dec 20 2024 at 17:19):

llvm should turn that into a simd loop

Brendan Hansknecht (Dec 20 2024 at 17:20):

oh, probably need to Num.toU8 the conditionals

Brendan Hansknecht (Dec 20 2024 at 17:24):

oh yeah, we can't Num.toU8 a bool... that's annoying

Brendan Hansknecht (Dec 20 2024 at 17:28):

So I guess a working form of the more optimized version would be to take the helpers from @Chuck Daniels above, but write the core functions as:

charToLower : U8 -> U8
charToLower = \char ->
    toggle = if isUpper char then asciiLowercaseBit else 0
    Num.bitwiseXor char toggle

charToUpper : U8 -> U8
charToUpper = \char ->
    toggle = if isLower char then asciiLowercaseBit else 0
    Num.bitwiseXor char toggle

Brendan Hansknecht (Dec 20 2024 at 17:29):

That should turn into a hot simd loop if used within List.map or similar

Richard Feldman (Dec 20 2024 at 17:35):

Brendan Hansknecht said:

oh yeah, we can't Num.toU8 a bool... that's annoying

we could offer Bool.toNum : Bool -> Num *

Sam Mohr (Dec 20 2024 at 19:34):

https://github.com/roc-lang/roc/issues/7395

Last updated: Aug 17 2025 at 12:14 UTC