Stream: beginners

Topic: Friendly ... ?


Jonathan Kelly (Mar 04 2025 at 11:10):

I know roc is pre-pre-alpha, but really? You can't compare two strings ... ?

"abc" < "def"

This 1st argument to > has an unexpected type:

4│ "abc" > "dev"
   ^^^^^
The argument is a string of type:
    Str
But > needs its 1st argument to be:
    Num a

Sam Mohr (Mar 04 2025 at 11:24):

What would you expect the result to be? What about if there are two different emojis?

Sam Mohr (Mar 04 2025 at 11:25):

I think this is similar to comparing floats, we can't compare NaN, so we can't provide a "correct" comparison impl for floats
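
As a rough illustration of the NaN point, here is a small sketch in Python rather than Roc (Python is assumed only because its floats follow IEEE 754; nothing here is Roc API). Every ordered comparison involving NaN is false, so no answer a comparison function gives for NaN is consistent with a total order.

```python
import math

nan = math.nan

# Under IEEE 754, every ordered comparison involving NaN is False,
# and NaN is not even equal to itself.
nan_less = nan < 1.0
nan_greater = nan > 1.0
nan_equal = nan == nan

print(nan_less, nan_greater, nan_equal)  # False False False

# Sorting assumes a consistent ordering, so a list containing NaN
# is not guaranteed to come back in a meaningful order.
values = [3.0, nan, 1.0]
print(sorted(values))
```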

Sam Mohr (Mar 04 2025 at 11:25):

Unless we just panic on totally normal values

Sam Mohr (Mar 04 2025 at 11:25):

The same applies to strings, unfortunately

Anton (Mar 04 2025 at 11:25):

Yeah, this was a deliberate choice. Sorting can vary by language and even by country.
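
To make the "varies by language" point concrete, here is a simplified sketch in Python (not Roc). The two key functions are hypothetical stand-ins written for this example, not real collation rules from any library: Swedish treats å, ä, ö as separate letters after z, while German dictionary ordering treats ä like a. The same three words sort differently under each rule.

```python
import unicodedata

# Hypothetical, simplified alphabet: Swedish places å, ä, ö after z.
SWEDISH_ALPHABET = unicodedata.normalize(
    "NFC", "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"
)


def swedish_key(word):
    # Rank each letter by its position in the Swedish alphabet.
    normalized = unicodedata.normalize("NFC", word.lower())
    return [SWEDISH_ALPHABET.index(ch) for ch in normalized]


def german_key(word):
    # Simplified dictionary-style rule: strip the umlaut so that
    # "ä" sorts like "a".
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["zon", "\u00e4pple", "apa"]  # "\u00e4pple" is "äpple"

swedish_order = sorted(words, key=swedish_key)  # apa, zon, äpple
german_order = sorted(words, key=german_key)    # apa, äpple, zon
```

A real application would use a proper collation library instead of hand-written keys like these.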

Sam Mohr (Mar 04 2025 at 11:26):

Unicode is just a horrible rat's nest, and we've pushed for correctness over simplicity pretty much every time, even if 99% of languages don't care about this

Jonathan Kelly (Mar 04 2025 at 11:32):

I would just say that excluding a feature that every other computer language in the known universe seems to provide without causing even a single computer to implode is a much bigger issue than putting people off because of indent blocks.

Anton (Mar 04 2025 at 11:36):

indent blocks?

Anton (Mar 04 2025 at 11:37):

By the way, if your application is only working with ascii strings you can use the roc-ascii package

Jonathan Kelly (Mar 04 2025 at 11:37):

That was the only term I could think of ... isn't there a very long discussion going on about using braces to define a code block?

Anton (Mar 04 2025 at 11:39):

We're aware of how inconvenient some of these things are, it's just very easy for subtle bugs to sneak into vital software if we don't do this.

Jonathan Kelly (Mar 04 2025 at 11:41):

Anton said:

By the way, if your application is only working with ascii strings you can use the roc-ascii package

is there a manual / tutorial somewhere?

Anton (Mar 04 2025 at 11:46):

I assume you mean for roc-ascii specifically?
It would indeed be good to add some examples to the roc-ascii repo's README.

You probably want to use these functions:
https://hasnep.github.io/roc-ascii/Ascii/#from_str
https://hasnep.github.io/roc-ascii/Ascii/#to_str
https://hasnep.github.io/roc-ascii/Ascii/#sort_asc
https://hasnep.github.io/roc-ascii/Ascii/#sort_desc

Anton (Mar 04 2025 at 11:49):

This code uses the old Roc syntax but may still be helpful for you

Brendan Hansknecht (Mar 04 2025 at 17:26):

Jonathan Kelly said:

I would just say that excluding a feature that every other computer language in the known universe seems to provide without causing even a single computer to implode is a much bigger issue than putting people off because of indent blocks.

I understand you are frustrated, but please refrain from adding accusatory hyperbole.


Roc restricts users from doing many things. String comparison is one of them. Roc is being "friendly" by reducing the design space and trying to remove common pitfalls. Strings are full of common pitfalls that developers get burned by all the time. Not having string comparison, string byte indexing, capitalization, character splitting, etc. directly in the language has led to many productive discussions with developers about common pitfalls, workarounds, and future plans.

Long term, we do want an official unicode package that has all the information necessary to make the correct decisions. On top of that, we likely will add some methods to the standard library that do the naive things full of pitfalls (those methods will just be very explicitly named around the pitfall, have strong docs explaining the issues, and link to the unicode library function that can handle the problem without pitfalls). We just haven't reached this point yet cause wrangling unicode and localizations is a lot of work and we have many things to get done.
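
For a feel of the pitfalls listed above (byte indexing, capitalization, character splitting), here is a short sketch in Python rather than Roc. It only uses built-in `str` behaviour. The variable names are made up for the example.

```python
# Byte indexing: one visible character can span several UTF-8 bytes.
e_acute = "\u00e9"  # "é"
e_acute_bytes = len(e_acute.encode("utf-8"))  # 2 bytes for 1 character

# Capitalization: upper-casing can change the length of a string.
street = "stra\u00dfe"  # "straße"
street_upper = street.upper()  # "STRASSE", one character longer

# Character splitting: one emoji on screen can be several code points.
# This is man + zero-width joiner + woman + zero-width joiner + girl.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
family_code_points = len(family)              # 5 code points
family_bytes = len(family.encode("utf-8"))    # 18 bytes

print(e_acute_bytes, len(street), len(street_upper))
print(family_code_points, family_bytes)
```

Slicing any of these strings at a byte or code point boundary can cut a visible character in half, which is the kind of surprise the explicit APIs are meant to prevent.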

Hannes (Mar 04 2025 at 18:11):

We should probably add a special error message explaining all this if someone tries to use the number comparison operators on strings, but that's probably a longer term thing

Jonathan Kelly (Mar 04 2025 at 20:08):

If anyone is interested in one person's experience, which will also explain the level of frustration that led to the hyperbole: the fact that this "feature" of roc is nowhere to be found on the roc-lang.org website feels deceptive. Or is the word disingenuous?

I read the tutorial and was impressed enough to give roc a try. Would I have invested the last 3 days of my free time if I'd encountered this fact in the tutorial, or the Str manual page, which I had read from top to bottom? I don't know, but at least I would have known up front, and had some time to consider the implications before the frustration set in. And I would NOT have wasted the better part of an hour trying to work out what I had done wrong to make the compiler think one of the elements in my program should be a Num *.

Jonathan Kelly (Mar 04 2025 at 20:10):

... and no, I didn't actually audit the entire website ... I asked ChatGPT, and then asked it to check again.

I couldn't find an explicit statement on the Roc Programming Language's website
about the absence of built-in functions for lexicographical string comparisons using
less than or greater than operators. However, the Str module documentation lists
the available string operations, and functions for direct lexicographical comparisons
are not included. This omission indicates that such comparisons are not currently
supported in Roc's standard library.

Richard Feldman (Mar 04 2025 at 20:17):

I think the answer here is that we should special-case the compiler error message for this to explain not just why it's gone, but also why it's the best design

Richard Feldman (Mar 04 2025 at 20:19):

also once we have the Sort module, the error message can suggest something from there as an alternative

Jonathan Kelly (Mar 04 2025 at 20:19):

Or, you could be transparent, and put it in the Tutorial. Or the FAQ. Or the Str reference page.

Richard Feldman (Mar 04 2025 at 20:24):

I understand that you're upset about this. Message received. You don't need to keep hammering that point home. :smile:

Richard Feldman (Mar 04 2025 at 20:25):

it definitely seems like a good idea to put this in the Str docs explicitly

Richard Feldman (Mar 04 2025 at 20:26):

I think the tutorial needs to stay focused on "here's how to get to the next step" and not go on too many tangents, and this feels like it would be too much of a tangent

Richard Feldman (Mar 04 2025 at 20:27):

I could also see FAQ, maybe under a broader umbrella of "things Roc intentionally does differently when it comes to strings" - and maybe just link to the Str docs

Anton (Mar 05 2025 at 09:21):

I'll make those additions

Anton (Mar 05 2025 at 15:59):

#7667
PR#7666

Peter Marreck (Mar 10 2025 at 15:50):

@Jonathan Kelly

without causing even a single computer to implode

LOL. Literally the source of all bugs is a discrepancy between the mental model of the language's designer, the mental model of the developer using the language, any unaddressed loose ends or corner cases, and how the language actually behaves. Some of these discrepancies can be subtle, and those are some of the worst. I've literally spent entire developer-months trying to run down bugs based on this sort of thing. So experienced developers have learned, the hard way, that the more explicit you are, the better, simply because it reduces the potential surface area of the aforementioned things.

Look up "Postel's Law" and how it has come to be seen over the past 20+ years to understand why the Roc designers probably made this (IMHO correct, if a bit frustrating) choice. And you can always just write a function to encode the strings you're trying to compare into any ordinal numeric pattern you want and then compare those numerically. But even then, that would expose cases you might not have considered. For example, which word is "less than" the other in the following two words: "auto" and "automatic"? What about "auto" and "áüto"? (Do you ignore foreign diacritics in comparisons, or do you sort them differently, and if so, in what order? Etc.) Is a capital "less than" or "greater than" its equivalent lowercase?
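
For comparison, this is how one language that does ship a built-in string `<` answers those questions. The sketch is in Python, not Roc. Python's `<` on `str` compares code point by code point, so each answer below falls out of the numeric code point values rather than any alphabetical rule.

```python
# A shorter string that is a prefix of a longer one sorts first.
prefix_first = "auto" < "automatic"        # True

# "á" is U+00E1, which is greater than "a" (U+0061), so every
# accented word here sorts after every unaccented lowercase word.
accent_after = "auto" < "\u00e1\u00fcto"   # True ("áüto")
accent_after_z = "zebra" < "\u00e1\u00fcto"  # True as well

# Uppercase ASCII letters have smaller code points than lowercase ones.
capital_first = "Auto" < "auto"            # True
z_before_a = "Zebra" < "apple"             # True

print(prefix_first, accent_after, accent_after_z, capital_first, z_before_a)
```

Whether "Zebra" before "apple" or "zebra" before "áüto" is acceptable depends entirely on the application, which is the ambiguity being described.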

Peter Marreck (Mar 10 2025 at 15:58):

I would much rather have the language force me to explicitly define these textual relationships than stick in its own possibly flawed interpretation of them (which might end up surprising me on certain inputs, thus causing bugs).
Thanks, Roc language designers. Keep going. ;)
I would perhaps agree that a dedicated error message WHEN someone tries to directly compare two strings (in an ordinal fashion, not a direct equivalence test) might be helpful to beginners.

Anton (Mar 11 2025 at 11:03):

without causing even a single computer to implode

This video shows why it's important to properly handle unicode and string length: https://www.youtube.com/watch?v=rgsIkZkflMw
No computers imploded but Chinese hackers did get access to a US treasury database.

Peter Marreck (Mar 11 2025 at 21:00):

Just watched that not 2 hours ago after it was recommended in my YT feed.
Really astonishing how string handling almost seems like it requires strict typing in any legitimate security context. Especially where UTF-8 is concerned. (Notably, MAYBE, this bug wouldn't have occurred in a UTF-16 context, which is possibly why Microsoft chose that route instead...)
I'd normally bash PHP, but they were relying on the assumption that Postgres' own string-escape functions were not buggy or security-hole-ridden and calling into them...

Aline Thome (Mar 11 2025 at 21:09):

Richard Feldman said:

I could also see FAQ, maybe under a broader umbrella of "things Roc intentionally does differently when it comes to strings" - and maybe just link to the Str docs

As someone who has been tinkering with roc fairly casually for a bit, and who was also taken aback by how complicated string manipulation turned out to be (relative to my expectations, at least), I think this would be really nice.

I appreciate the reasoning for it and I'm not super mad about it, but I think currently there's a bit of friction that arises from the difference between the use case for which this is a good design decision (production apps with many users who might not all be in the same place) and what I imagine is the median use case for Roc as a language that is still in alpha (quick and dirty apps for fun or personal use, written in Roc because the programmer was curious about the language).

It might be a little less of a frustrating thing to encounter if there's a warning and explanation somewhere easily found.

Anton (Mar 12 2025 at 09:59):

I think currently there's a bit of friction that arises from the difference between the use case for which this is a good design decision (production apps with many users who might not all be in the same place) and what I imagine is the median use case for Roc as a language that is still in alpha (quick and dirty apps for fun or personal use, written in Roc because the programmer was curious about the language).

This has been added to the FAQ :)
An easily discoverable tip will be added to the new compiler when it is ready.

Austin Davis (Mar 18 2025 at 22:25):

Just a quick follow-up for people who are stuck on this issue:

If you really don't care about language-specific alphabetical ordering (which is super complicated and probably requires a full-blown internationalization library), you can implement your own string comparison function by converting the Str values to List U8 using the Str.to_utf8 function, and then recursively walking the two lists to compare the U8 values at each index. Something like this:

# Compare two UTF-8 byte lists lexicographically, one byte at a time.
compare_utf8 : List U8, List U8 -> [EQ, LT, GT]
compare_utf8 = |bytes_a, bytes_b|
    when (bytes_a, bytes_b) is
        ([], []) -> EQ
        # `[_, ..]` matches any non-empty list, so the longer input wins
        # no matter how many bytes remain (`[_a]` only matches one element).
        ([_, ..], []) -> GT
        ([], [_, ..]) -> LT
        ([head_a, .. as tail_a], [head_b, .. as tail_b]) ->
            when Num.compare(head_a, head_b) is
                EQ -> compare_utf8(tail_a, tail_b)
                res -> res

# Compare two strings by their UTF-8 bytes.
compare : Str, Str -> [EQ, LT, GT]
compare = |str_a, str_b|
    bytes_a = Str.to_utf8(str_a)
    bytes_b = Str.to_utf8(str_b)
    compare_utf8(bytes_a, bytes_b)

expect compare("abc", "abc") == EQ
expect compare("abc", "abcd") == LT
expect compare("abcd", "abc") == GT
expect compare("abcde", "abc") == GT
expect compare("abc", "abd") == LT
expect compare("abd", "abc") == GT

This will probably give you what you expect for most English language UTF-8 strings, but it definitely doesn't handle any edge cases. Still, it's good enough if you're implementing algorithms and data structures that don't require true alphabetical ordering.
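
The edge cases are easy to poke at with the same byte-wise idea sketched in Python (the function names mirror the Roc snippet above but are otherwise just stand-ins for this example). The last comparison shows one of the surprises: two strings that render identically can compare as different because they use different Unicode normalization forms.

```python
def compare_utf8(bytes_a: bytes, bytes_b: bytes) -> str:
    # Walk both byte sequences in step; the first differing byte decides.
    for byte_a, byte_b in zip(bytes_a, bytes_b):
        if byte_a != byte_b:
            return "LT" if byte_a < byte_b else "GT"
    # One input is a prefix of the other: the shorter one sorts first.
    if len(bytes_a) == len(bytes_b):
        return "EQ"
    return "LT" if len(bytes_a) < len(bytes_b) else "GT"


def compare(str_a: str, str_b: str) -> str:
    return compare_utf8(str_a.encode("utf-8"), str_b.encode("utf-8"))


print(compare("abc", "abc"))    # EQ
print(compare("abcde", "abc"))  # GT
print(compare("abc", "abd"))    # LT

# "é" as one code point (U+00E9) versus "e" plus a combining accent
# (U+0065 U+0301). Both render the same, but the bytes differ.
composed = "\u00e9"
decomposed = "e\u0301"
print(compare(composed, decomposed))  # GT
```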


Last updated: Jul 06 2025 at 12:14 UTC