Hi there! I'm new to Roc, and I'm having fun with it on Exercism. Thanks for the nice language!
Is there a convenient way to perform case-insensitive string comparisons using the stdlib? If not, is there some reasonably idiomatic way to do so?
If you know you only have ASCII then it's easy to roll your own using the builtins.
If you have utf-8/unicode then there isn't an easy way to do it just with the builtins.
If you can use a package then I'd recommend you check out @Hannes's https://github.com/Hasnep/roc-ascii
The longer-term plan for working with Unicode is https://github.com/roc-lang/unicode
Yep, just ASCII. What would be an idiomatic way to do that?
The unicode package is a WIP and doesn't have the Str comparison functions you're looking for
Can you use a package in exercism? @Isaac Van Doren would know
I'll make an example for you, 1 sec
I'd say Ascii.toLowercase left == Ascii.toLowercase right
I don't know if I can use a package in exercism.
For a simple approach (ASCII only)
toLowercase : Str -> List U8
toLowercase = \str ->
    casingDelta = 'a' - 'A'

    Str.toUtf8 str
    |> List.map \byte ->
        if byte >= 'A' && byte <= 'Z' then
            byte + casingDelta
        else
            byte

insensitiveEqual : Str, Str -> Bool
insensitiveEqual = \left, right ->
    toLowercase left == toLowercase right
You can use packages in Exercism but they have to be explicitly added to that exercise in the language track. If it seems like many users of an exercise would want a particular package we can certainly add one
In this case hand-rolling might be straightforward enough
Chuck Daniels has marked this topic as resolved.
Looks like @Sam Mohr beat me to it...
module [to_upper, to_lower]

to_upper : Str -> Str
to_upper = \str ->
    str
    |> Str.toUtf8
    |> List.map to_upper_help
    |> Str.fromUtf8
    |> unwrap

to_lower : Str -> Str
to_lower = \str ->
    str
    |> Str.toUtf8
    |> List.map to_lower_help
    |> Str.fromUtf8
    |> unwrap

expect to_upper "hello" == "HELLO"
expect to_upper "Hello" == "HELLO"
expect to_lower "HELLO" == "hello"
expect to_lower "Hello" == "hello"

to_upper_help = \c -> if c >= 'a' && c <= 'z' then c - delta else c
to_lower_help = \c -> if c >= 'A' && c <= 'Z' then c + delta else c

delta = 'a' - 'A'

unwrap : Result a _ -> a
unwrap = \result ->
    when result is
        Ok a -> a
        Err _ -> crash "quick and dirty error handling"
$ roc test conversion.roc
0 failed and 4 passed in 523 ms.
A cleaner quick-and-dirty solution
I've added a Good First Issue for someone to add this to our Examples repo
https://github.com/roc-lang/examples/issues/222
Nothing wrong with that solution, but I'd like to suggest an alternative. ASCII letters are conveniently arranged so that only one bit flips between lowercase and uppercase. So you can actually do this branchless with a bitwise operation:
to_lower_help = \c -> Num.bitwiseOr 32 c
# or:
to_lower_help = \c -> Num.bitwiseOr ('a' - 'A') c
Actually, now that I think about it, this probably shouldn’t be used in the examples. Some non-letter characters would become different ones :sweat_smile:
So it’s only safe if you know you’re only working with a restricted subset. Probably just an interesting trick then.
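To make the pitfall concrete, here is a minimal sketch. The name `naive_to_lower` is made up for illustration, and it assumes the same `Num.bitwiseOr` builtin used above. It sets bit 5 (value 32) on every byte, which is right for letters but also rewrites some punctuation:

```roc
module [naive_to_lower]

# Hypothetical helper: sets bit 5 (value 32) unconditionally.
naive_to_lower : U8 -> U8
naive_to_lower = \c -> Num.bitwiseOr 32 c

# Letters behave as intended.
expect naive_to_lower 'A' == 'a'
expect naive_to_lower 'z' == 'z'

# Bytes outside A-Z that have bit 5 clear are changed too:
# '[' is 91, and 91 | 32 = 123, which is '{'.
expect naive_to_lower '[' == '{'
# ']' is 93, and 93 | 32 = 125, which is '}'.
expect naive_to_lower ']' == '}'
```

So the unconditional form is only safe when the input is already known to contain nothing but ASCII letters.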
you can check first, e.g.
if c >= 'a' && c <= 'Z' then
    Num.bitwiseOr ('a' - 'A') c
else
    c
Richard Feldman said:
you can check first, e.g.
if c >= 'a' && c <= 'Z' then Num.bitwiseOr ('a' - 'A') c else c
I think you meant c >= 'A' && c <= 'Z', although I would write it as 'A' <= c && c <= 'Z'.
You can check, but then idk how much better it is than the original solution
I suspect that using the bitwise operation would show noticeable performance benefit only in the context of performing a large volume of character conversions.
Here's my approach:
asciiLowercaseBit : U8
asciiLowercaseBit = 0b0010_0000

charToLower : U8 -> U8
charToLower = \char ->
    if 'A' <= char && char <= 'Z' then
        char |> Num.bitwiseXor asciiLowercaseBit # Set lowercase bit
    else
        char

charToUpper : U8 -> U8
charToUpper = \char ->
    if 'a' <= char && char <= 'z' then
        char |> Num.bitwiseXor asciiLowercaseBit # Clear lowercase bit
    else
        char
Or this, for some additional functions:
asciiLowercaseBit : U8
asciiLowercaseBit = 0b0010_0000

isUpper : U8 -> Bool
isUpper = \char -> 'A' <= char && char <= 'Z'

isLower : U8 -> Bool
isLower = \char -> 'a' <= char && char <= 'z'

toggleCase : U8 -> U8
toggleCase = \char -> char |> Num.bitwiseXor asciiLowercaseBit # Toggle lowercase bit

charToLower : U8 -> U8
charToLower = \char -> if char |> isUpper then char |> toggleCase else char

charToUpper : U8 -> U8
charToUpper = \char -> if char |> isLower then char |> toggleCase else char
I suspect that using the bitwise operation would show noticeable performance benefit only in the context of performing a large volume of character conversions.
If llvm can manage to optimize it to simd, bitwise would likely be significantly faster (it probably will fail if you use a conditional instead of doing fully bitwise math).
Otherwise, they should be using the exact same processor units, so no perf diff (I think they both should be single cycle and limited by RAM loads and stores in a loop, which should be pipelined)
So the fast version would be something like:
(Num.bitwiseAnd ('A' <= char) (char <= 'Z'))
|> Num.shiftLeftBy 5
|> Num.bitwiseXor char
llvm should turn that into a simd loop
oh, probably need to Num.toU8 the conditionals
oh yeah, we can't Num.toU8 a bool... that's annoying
So I guess a working form of the more optimized version would be to take the helpers from @Chuck Daniels above, but write the core functions as:
charToLower : U8 -> U8
charToLower = \char ->
    toggle = if isUpper char then asciiLowercaseBit else 0
    Num.bitwiseXor char toggle

charToUpper : U8 -> U8
charToUpper = \char ->
    toggle = if isLower char then asciiLowercaseBit else 0
    Num.bitwiseXor char toggle
That should turn into a hot simd loop if used within List.map or similar
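For anyone who wants to try this end to end, here is a self-contained sketch that combines the helpers with the toggle-based functions and maps them over a string's bytes. The `strToLower` wrapper and the `expect` lines are illustrative additions rather than part of the code above, and whether the loop actually vectorizes is not something this sketch checks:

```roc
module [charToLower, charToUpper, strToLower]

asciiLowercaseBit : U8
asciiLowercaseBit = 0b0010_0000

isUpper : U8 -> Bool
isUpper = \char -> 'A' <= char && char <= 'Z'

isLower : U8 -> Bool
isLower = \char -> 'a' <= char && char <= 'z'

# XOR with 0 leaves the byte unchanged; XOR with the case bit flips it.
charToLower : U8 -> U8
charToLower = \char ->
    toggle = if isUpper char then asciiLowercaseBit else 0
    Num.bitwiseXor char toggle

charToUpper : U8 -> U8
charToUpper = \char ->
    toggle = if isLower char then asciiLowercaseBit else 0
    Num.bitwiseXor char toggle

# Hypothetical wrapper: lowercases only the ASCII letters in a Str.
# Bytes outside A-Z are untouched, so valid UTF-8 stays valid;
# the fallback to the original string is only a safety net.
strToLower : Str -> Str
strToLower = \str ->
    str
    |> Str.toUtf8
    |> List.map charToLower
    |> Str.fromUtf8
    |> Result.withDefault str

expect strToLower "Hello, World!" == "hello, world!"
expect charToUpper 'q' == 'Q'
expect charToLower '[' == '['
```

A case-insensitive ASCII comparison then becomes `strToLower left == strToLower right`.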
Brendan Hansknecht said:
oh yeah, we can't Num.toU8 a bool... that's annoying
we could offer Bool.toNum : Bool -> Num *
https://github.com/roc-lang/roc/issues/7395
Last updated: Jul 06 2025 at 12:14 UTC