I know roc is pre-pre-alpha, but really? You can't compare two strings ... ?
"abc" < "def"
This 1st argument to > has an unexpected type:
4│ "abc" > "dev"
^^^^^
The argument is a string of type:
Str
But > needs its 1st argument to be:
Num a
What would you expect the result to be? What about if there are two different emojis?
I think this is similar to comparing floats, we can't compare NaN, so we can't provide a "correct" comparison impl for floats
Unless we just panic on totally normal values
The same applies to strings, unfortunately
Yeah, this was a deliberate choice. Sorting can vary by language and even by country.
Unicode is just a horrible rat's nest, and we've pushed for correctness over simplicity pretty much every time, even if 99% of languages don't care about this
I would just say that excluding a feature that every other computer language in the known universe seems to provide without causing even a single computer to implode is a much bigger issue than putting people off because of indent blocks.
indent blocks?
By the way, if your application is only working with ascii strings you can use the roc-ascii package
That was the only term I could think of ... isn't there a very long discussion going on about using braces to define a code block?
We're aware of how inconvenient some of these things are, it's just very easy for subtle bugs to sneak into vital software if we don't do this.
Anton said:
By the way, if your application is only working with ascii strings you can use the roc-ascii package
is there a manual / tutorial somewhere?
I assume you mean for roc-ascii specifically?
It would indeed be good to add some examples to the roc-ascii repo's README.
You probably want to use these functions:
https://hasnep.github.io/roc-ascii/Ascii/#from_str
https://hasnep.github.io/roc-ascii/Ascii/#to_str
https://hasnep.github.io/roc-ascii/Ascii/#sort_asc
https://hasnep.github.io/roc-ascii/Ascii/#sort_desc
This code uses the old Roc syntax but may still be helpful for you
Jonathan Kelly said:
I would just say that excluding a feature that every other computer language in the known universe seems to provide without causing even a single computer to implode is a much bigger issue than putting people off because of indent blocks.
I understand you are frustrated, but please refrain from adding accusatory hyperbole.
Roc restricts users from doing many things. String comparison is one of them. Roc is being "friendly" by reducing the design space and trying to remove common pitfalls. Strings are full of common pitfalls that developers get burned by all the time. Not having string comparison, string byte indexing, capitalization, character splitting, etc. directly in the language has led to many productive discussions with developers about common pitfalls, workarounds, and future plans.
Long term, we do want an official Unicode package that has all the information necessary to make the correct decisions. On top of that, we will likely add some methods to the standard library that do the naive things full of pitfalls (those methods will just be very explicitly named around the pitfall, have strong docs explaining the issues, and link to the Unicode library function that can handle the problem without pitfalls). We just haven't reached this point yet because wrangling Unicode and localizations is a lot of work and we have many things to get done.
We should probably add a special error message explaining all this if someone tries to use the number comparison operators on strings, but that's probably a longer term thing
If anyone is interested in one person's experience, which will also explain the level of frustration that led to the hyperbole: the fact that this "feature" of roc is nowhere to be found on the roc-lang.org website feels deceptive. Or is the word disingenuous?
I read the tutorial and was impressed enough to give roc a try. Would I have invested the last 3 days of my free time if I'd encountered this fact in the tutorial, or the Str manual page, which I had read from top to bottom? I don't know, but at least I would have known up front, and had some time to consider the implications before the frustration set in. And I would NOT have wasted the better part of an hour trying to work out what I had done wrong to make the compiler think one of the elements in my program should be a Num *.
... and no, I didn't actually audit the entire website .. I asked chatgpt, and then asked it to check again.
I couldn't find an explicit statement on the Roc Programming Language's website
about the absence of built-in functions for lexicographical string comparisons using
less than or greater than operators. However, the Str module documentation lists
the available string operations, and functions for direct lexicographical comparisons
are not included. This omission indicates that such comparisons are not currently
supported in Roc's standard library.
I think the answer here is that we should special-case the compiler error message for this to explain not just why it's gone, but also why it's the best design
also once we have the Sort module, the error message can suggest something from there as an alternative
Or, you could be transparent, and put it in the Tutorial. Or the FAQ. Or the Str reference page.
I understand that you're upset about this. Message received. You don't need to keep hammering that point home. :smile:
it definitely seems like a good idea to put this in the Str docs explicitly
I think the tutorial needs to stay focused on "here's how to get to the next step" and not go on too many tangents, and this feels like it would be too much of a tangent
I could also see FAQ, maybe under a broader umbrella of "things Roc intentionally does differently when it comes to strings" - and maybe just link to the Str docs
I'll make those additions
@Jonathan Kelly
without causing even a single computer to implode
LOL. Literally the source of all bugs is a discrepancy between the mental model of the developer of the language's design, the mental model of the developer using the language, any unaddressed loose ends or corner cases, and how the language actually behaves. Some of these discrepancies can be subtle, and those are some of the worst. I've literally spent entire developer-months trying to run down bugs based on this sort of thing. So experienced developers have learned, the hard way, that the more explicit you are, the better, simply because it reduces the potential surface area of the aforementioned things.
Look up "Postel's Law" and how it's come to be seen over the past 20+ years to understand why the Roc designers probably made this (IMHO correct, if a bit frustrating) choice. And you can always just write a function to encode the strings you're trying to compare into any ordinal numeric pattern you want and then compare those numerically. But even then, that would expose cases you might not have considered. For example, which word is "less than" the other in the following two words: "auto" and "automatic"? What about "auto" and "áüto"? (Do you ignore foreign diacritics in comparisons, or do you sort them differently, and if so, in what order? Etc.) Is a capital "less than" or "greater than" its equivalent lowercase?
I would much rather have the language force me to explicitly define these textual relationships than stick in its own possibly flawed interpretation of them (which might end up surprising me on certain inputs, thus causing bugs).
Thanks, Roc language designers. Keep going. ;)
I would perhaps agree that a dedicated error message WHEN someone tries to directly compare two strings (in an ordinal fashion, not a direct equivalence test) might be helpful to beginners.
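To make the ordering questions above concrete, here is a minimal Roc sketch of the numeric values a naive comparison would see. It uses only Str.to_utf8, Num.compare, and expect, all of which appear elsewhere in this thread. It assumes the "á" in the source file is the single precomposed code point U+00E1; a decomposed "a" plus combining accent would produce different bytes, which is one more pitfall of its own.

```roc
# "Z" (90) is numerically less than "a" (97), so a naive byte comparison
# puts every ASCII capital letter before every ASCII lowercase letter.
expect Str.to_utf8("Z") == [90]
expect Str.to_utf8("a") == [97]
expect Num.compare(90, 97) == LT

# Assumption: "á" is the precomposed code point U+00E1, encoded in UTF-8
# as the two bytes 195 and 161. Its first byte is greater than the byte
# for "z" (122), so "áüto" would sort after "auto" and also after "zebra".
expect Str.to_utf8("á") == [195, 161]
expect Num.compare(195, 122) == GT
```

Whether either of those results counts as "correct" depends entirely on the application, which is the point being made above.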
without causing even a single computer to implode
This video shows why it's important to properly handle unicode and string length: https://www.youtube.com/watch?v=rgsIkZkflMw
No computers imploded but Chinese hackers did get access to a US treasury database.
Just watched that not 2 hours ago after it was recommended in my YT feed.
Really astonishing how string handling almost seems like it requires strict typing in any legitimate security context. Especially where UTF-8 is concerned. (Notably, MAYBE, this bug wouldn't have occurred in a UTF-16 context, which is possibly why Microsoft chose that route instead...)
I'd normally bash PHP, but they were relying on the assumption that Postgres' own string-escape functions were not buggy or security-hole-ridden and calling into them...
Richard Feldman said:
I could also see FAQ, maybe under a broader umbrella of "things Roc intentionally does differently when it comes to strings" - and maybe just link to the Str docs
As someone who has been tinkering with roc fairly casually for a bit, and who was also taken aback by how complicated string manipulation turned out to be (relative to my expectations, at least), I think this would be really nice.
I appreciate the reasoning for it and I'm not super mad about it, but I think currently there's a bit of friction that arises from the difference between the use case for which this is a good design decision (production apps with many users who might not all be in the same place) and what I imagine is the median use case for Roc as a language that is still in alpha (quick and dirty apps for fun or personal use, written in Roc because the programmer was curious about the language).
It might be a little less frustrating to encounter if there's a warning and explanation somewhere easily found.
I think currently there's a bit of friction that arises from the difference between the use case for which this is a good design decision (production apps with many users who might not all be in the same place) and what I imagine is the median use case for Roc as a language that is still in alpha (quick and dirty apps for fun or personal use, written in Roc because the programmer was curious about the language).
This has been added to the FAQ :)
An easily discoverable tip will be added to the new compiler when it is ready.
Just a quick follow-up for people who are stuck on this issue:
If you really don't care about language-specific alphabetical ordering (which is super complicated and probably requires a full-blown internationalization library), you can implement your own string_compare function by converting the Str values to List U8 using the Str.to_utf8 function, and then recursively walking the two lists to compare the U8 values at each index. Something like this:
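# Compare two UTF-8 byte lists lexicographically, one byte at a time.
compare_utf8 : List U8, List U8 -> [EQ, LT, GT]
compare_utf8 = |bytes_a, bytes_b|
    when (bytes_a, bytes_b) is
        ([], []) -> EQ
        # Only one list has bytes left, so the longer one sorts after its prefix.
        ([_, ..], []) -> GT
        ([], [_, ..]) -> LT
        ([head_a, .. as tail_a], [head_b, .. as tail_b]) ->
            when Num.compare(head_a, head_b) is
                EQ -> compare_utf8(tail_a, tail_b)
                res -> res

# Compare two strings by their raw UTF-8 bytes (not by any alphabet).
compare : Str, Str -> [EQ, LT, GT]
compare = |str_a, str_b|
    bytes_a = Str.to_utf8(str_a)
    bytes_b = Str.to_utf8(str_b)
    compare_utf8(bytes_a, bytes_b)

expect compare("abc", "abc") == EQ
expect compare("abc", "abcd") == LT
expect compare("abcd", "abc") == GT
expect compare("abc", "abd") == LT
expect compare("abd", "abc") == GT
# Strings whose lengths differ by more than one byte.
expect compare("abcde", "abc") == GT
expect compare("abc", "abcde") == LT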
This will probably give you what you expect for most English language UTF-8 strings, but it definitely doesn't handle any edge cases. Still, it's good enough if you're implementing algorithms and data structures that don't require true alphabetical ordering.
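As a usage sketch, the compare function above returns the [EQ, LT, GT] shape that a comparison-based sort needs, so it can drive a sort directly. The helper name sort_bytewise is made up for this illustration, and List.sort_with is assumed to be the current name of the builtin (List.sortWith in the old syntax); check the List docs for your Roc version.

```roc
# Hypothetical helper: sort strings by raw UTF-8 byte order.
# Assumes the `compare` function defined above is in scope.
sort_bytewise : List Str -> List Str
sort_bytewise = |strs|
    List.sort_with(strs, compare)

# "C" is byte 67, "a" is 97, "b" is 98, so the capitalized word comes first.
expect sort_bytewise(["banana", "Cherry", "apple"]) == ["Cherry", "apple", "banana"]
```

The expect shows one of the edge cases mentioned above: byte order puts "Cherry" ahead of "apple", which is not what a dictionary would do.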
Last updated: Jul 06 2025 at 12:14 UTC