ordering/sorting ability · ideas

there's a case to be made that we should have a built-in sorting/ordering ability (e.g. Ord in Rust or Haskell) that's auto-derived in the same way Eq and Hash are - e.g. for use cases like tree data structures.

I've always kind of assumed this was a good idea, but now I'm second-guessing that because I realized that sorting strings in an auto-derived way is subtly error-prone.

the specific thing I'm concerned about is sorting strings. If we want to make data structures as fast as possible, we should presumably sort by UTF-8 bytes without regard to the semantics of what those bytes represent.

In other words, sorting strings for purposes of figuring out where they go in an OrdMap data structure should be the same as deciding which of two List U8s goes before the other one - because that's the fastest thing we could possibly do, and in that use case all we care about is making the traversal faster than if we didn't sort them.

Richard Feldman (Oct 08 2023 at 12:55):

however, it's a different story if we want to sort strings "alphabetically" because the rules for what that means vary by locale. For example, in Danish, the sequence "aa" is sorted as a single character, and it comes after "z"

Richard Feldman (Oct 08 2023 at 12:57):

my concern is that if we auto-derive the "sort as fast as possible" way for strings, it will work fine for data structures, but it will have the unfortunately coincidental property of also looking like it's a correct "alphabetical" sorting implementation to English speakers, because ASCII happens to sort that way

Richard Feldman (Oct 08 2023 at 13:03):

so there will be a lot of cases of people using stuff like List.sort names and not realizing the edge cases that are being handled incorrectly

Richard Feldman (Oct 08 2023 at 13:03):

Richard Feldman (Oct 08 2023 at 13:10):

so, certainly having a Locale module (in a separate, non-builtins package because the rules for these change all the time and having to couple those updates to language updates would not be a good design) which offers locale-aware alphabetical sorting is a way to prevent those bugs, but the problem is that if there's a builtin alternative that looks correct, it'll definitely happen all the time that people will use the incorrect one instead because it's easier to reach for

Richard Feldman (Oct 08 2023 at 13:11):

and I don't see a way to make the correct version equivalently easy to reach for (let alone easier) because of other design constraints like needing to take an explicit argument of which locale you want, plus the fact that it shouldn't be a builtin so that it can be updated independently from the language

Richard Feldman (Oct 08 2023 at 13:12):

one idea is to just not auto-derive that ability for Str, similarly to how we don't auto-derive Eq for F64 and F32

Richard Feldman (Oct 08 2023 at 13:13):

but of course that would make strings a lot more annoying to use in data structures that use sorting, especially if the string is nested inside another data structure (e.g. records and lists) that could otherwise auto-derive it

Richard Feldman (Oct 08 2023 at 13:15):

but maybe that's okay, because in general hashing data structures will be faster and should be encouraged as the preferred way to store things unless you have a very specific use case? :thinking:

Sky Rose (Oct 08 2023 at 13:48):

Maybe there's two different kinds of Ords: A semantic order (for numbers, user-defined implementations, and not too much else) and a meaningless but consistent arbitrary order (probably literally everything).

Sky Rose (Oct 08 2023 at 13:57):

And there are different functions that depend on each of them according to what they need.

There could be a List.semanticSort that requires SemanticOrd, and doesn't work for strings unless you specify your Locale. Or maybe it doesn't use the ability and you have specify how to do it, like List.sortBy and you pass in an argument that's like Builtins.numbersDecreasing or (Locale.alphabetical Turkish).

And then a List.meaninglessSort if you only care about the order programmatically, that works on everything.

The hard part is then choosing names+interfaces to make that distinction easy to do the right thing. What if there wasn't a "List.sort" function to force people to think about it, or the compiler error for using "List.sort" on something without SemanticOrd explained that maybe they want List.meaninglessSort.

Brendan Hansknecht (Oct 08 2023 at 15:04):

Personally, I would just define ord, not put it on strings and add a custom compiler error message they describes the problem.

On top of that, I would add a sorting builtin that can take a lambda. Then someone can easily define the naive way if that is what they want. We could even make a built-in (Str.naiveCompareBytes) and let people know about it in the error message. I think that would enable quick scripting and most people that don't want to deal with UTF-8 intricacies, but it would also stick out like a sore thumb when used in code.

Brendan Hansknecht (Oct 08 2023 at 15:07):

That said, there is definitely merit in use case to having two forms of ord as @Sky Rose mentioned. In many cases you just need a full ordering for data structure function, but it doesn't matter what the ordering is as long as it is consistent.

Brendan Hansknecht (Oct 08 2023 at 15:09):

That said, I don't like the sound of having two versions where the more correct one is the one that can be used in less places. I think people would default to meaningless ord unless it was intentionally weird such that it could really only be used with data structures

Brendan Hansknecht (Oct 08 2023 at 15:10):

Kinda like go dictionaries and randomizing order intentionally to avoid accidental dependency on ordering.

Richard Feldman (Oct 08 2023 at 15:46):

yeah although it seems like doing that would probably require sacrificing some amount of performance unfortunately

Brendan Hansknecht (Oct 08 2023 at 22:45):

Richard Feldman (Oct 09 2023 at 01:15):

Brendan Hansknecht (Oct 09 2023 at 01:18):

I don't think so, just order starting with the last character and going towards the first

Brendan Hansknecht (Oct 09 2023 at 01:18):

Declan Joseph Maguire (Oct 10 2023 at 14:00):

Perhaps you could call it something like rawOrder. It implies something odd ("Hey, what's a 'raw' ordering?"), while also implying something about how it works (it orders based on some internal data, or how it's represented, not on some human readable meaning). A description could start with a phrase like "uses the internal bitstrings to give a unique ordering to a list" or something. Y'know, scary enough to let newbies know that it's definitely not alphabetical, but still simple enough that you could have the second sentence be "Useful for when you need to order a type uniquely, without caring about how", or somesuch, and people would mostly Get It.

Declan Joseph Maguire (Oct 10 2023 at 14:02):

Maybe "baseOrder", as it'd be some deeper thing that works for many types (all?). Point is, if it has a weird adjective on it, I'm gonna look at it funny, and so I probably won't make a mistake, and if I do, I'll easily remember it next time.

Declan Joseph Maguire (Oct 10 2023 at 14:05):

Declan Joseph Maguire (Oct 10 2023 at 14:22):

In fact, I think there's a second layer to the whole "pit of success" thing, which is signifiers. You can never idiot-proof something against the most advanced idiots yet manufactured, or in practice, that 1 in a 1000 brain fart. What can be useful however, is to signpost the mistakes. Nothing makes a concept stick in your brain harder than fucking up, blaming something else, only to realise YOU were the one to make that mistake. In that way, intentionally tricking newbies can be a useful tutorialisation. The final bit to make that work, however, is they need to be able to see that mistake coming in the future, some prompt that might trigger their memory. "sort" doesn't do that, but "rawSort" or "baseSort" or whatever definitely would. After making my mistakes, I'd look at that highly conspicuous adjective and go "yeah, I should have seen that coming", and when I saw it again I'd be reminded of that specific distinction that I'd previously neglected.

Brendan Hansknecht (Oct 10 2023 at 14:56):

The issue with adding an ability in general is that an end user may never see it. It will be auto serviced on a type and they will pass it to a function or data structure that automatically uses the ability without the end user ever realizing that RawOrder was going to be a foot gun.

When someone writes MyLib.SuperFastSort that depend añon RawOrd because it is faster, users will flock to it and not realizes that ramifications until they have to deal with locale related letter ordering or something. Very easy for these kinds of minor issues or spread across libraries and essentially infect swaths of code.

Declan Joseph Maguire (Oct 10 2023 at 15:15):

I might be misunderstanding, but I don't quite see the issue? Like, in that case shouldn't the library writer be held responsible for making a mistake that had been clearly signposted to them?

By this standard, isn't any builtin doomed because someone could wrap it in some other names, misuse it, and pretend it's good? This failure seems more accurately attributable to a bad library author and not a language flaw, and that seems like a class of mistake that we cannot stop hurting people. We can try to stop authors making mistakes, but we cannot stop people trusting their bad libraries once written. How much responsibility do we want to take on for library authors doing the wrong thing, despite giving them every contrary hint? We want to dissuade mistakes, but there's a fundamental floor below which no amount of good design can compensate flagrant misuse.

This may be another argument for making it obviously broken for anything other than arbitrary ordering. Maybe it's something like "byteOrder", where everything can be ordered by its bitstring representation, and we make sure it's functionally useless for anything other than just generating orders. Then all the inbuilts could have a natural order function in their namespace EXCEPT for strings/problematic cases, in which case you'd need to define the given order function. After all, if the function orders arbitrary types without natural orderings, it must be getting that information from somewhere, and a name that indicates "we are using the underlying bitstring" would be hard to accidentally misuse.

Brendan Hansknecht (Oct 10 2023 at 15:30):

Brendan Hansknecht (Oct 10 2023 at 15:32):

Also, I think this is a language and not library author problem because having RawOrder is equivalent to releasing a partially correct api that is meant for a very specific use case, but is super convenient to use in a lot of wrong places.

As such, it will very naturally proliferate. I mean it literally would be the only way to order strings directly exposed by the standard library.

Declan Joseph Maguire (Oct 10 2023 at 15:45):

If so, maybe a stranger ordering is needed, as was mentioned earlier. Is there a way to bitshift where the bits loop? I assume so, and a couple bitshifts would surely be very performant while still being too random to be useful. Maybe this is another example of why we should allow types that exclude other types. Like, define an order function that takes anything that ISN'T a string, with some useful compiler messages if you pass one. You'd probably want to include the string one in the namespace too, but called byteOrder or rawOrder etc so you know what you're doing. Or I guess include that one in all the namespaces, given it ought work on everything.

It's late here in australia, so I need to sleep and think. I feel there's a deeper design issue here worth surfacing, but I'm not in the mind to do it.

Brendan Hansknecht (Oct 10 2023 at 15:57):

Yeah, definitely there are options. As I suggested earlier, just sort starting from the end of the string. Preferably in large chunk based comparison for more perf and arbitrariness of the end output.

Kevin Gillette (Oct 11 2023 at 12:38):

In the example of Danish sorting, to understand that better, is it definitely a locale thing, where the Danish aa and z are in the ASCII range, yet have a different sort order when specific locale rules are applied, or are those Danish [language family] Unicode graphemes that happen to look like ASCII-range grapheme sequences, yet already have the locale-aware sorting semantics determined by the Unicode without needing specific locale-awareness?

Kevin Gillette (Oct 11 2023 at 12:44):

Regarding the topic overall, it seems there's a tension between determining whether it's safe or reasonable to compare two values for relative order, and whether it's safe or reasonable to sort a collection of those values.

Kevin Gillette (Oct 11 2023 at 12:46):

For floats, it seems quite necessary and common to compare two values for relative order, yet it seems relatively rare to need to sort a list of floats.

Kevin Gillette (Oct 11 2023 at 12:48):

For strings, I've used direct ordering comparison more commonly in low level algorithms, but have used sorting quite commonly in higher level code.

Kevin Gillette (Oct 11 2023 at 12:52):

I think being able to compare and support something holding string-like data is critical in a lot of cases. There are many flavors of algorithm that do not need to sort strings for _presentation_ purposes, yet absolutely are performing that sort for storage or other purposes.

Kevin Gillette (Oct 11 2023 at 13:00):

Consider /usr/share/dict/words: it needs a consistent, fast order for direct lookup, and that order really doesn't need to be locale aware. And because Unicode is very unlikely to reassign (to reorder) codepoints after release, the dict-words maintainers can minimize their maintenance work while retaining correctness if they specifically _ignore_ locale ordering rules: if the locale ordering gets a fix, the word list for a language doesn't need to change. Ultimately that's because ordering is only used for binary search, and operations on the word list typically just yield a single string (thus ordering is internal and not presentational).

Kevin Gillette (Oct 11 2023 at 13:02):

There are vastly many equivalent cases, and removing string ordering altogether (or forcing use of a "machine locale", i.e. "C") brings the risk of scaring people away and greatly increasing the community support burden.

Kevin Gillette (Oct 11 2023 at 13:04):

I think having easily accessible, well-named and well documented functions for these riskier cases is better than forcing all cases through a locale module.

Kevin Gillette (Oct 11 2023 at 13:15):

I think a separate named type for this kind of string case would also be a mistake. I don't recall if List U8 is comparable, but if so, that would probably serve fine for machine-sort cases, and you could just have a compiler error or warning if a Str is converted to List U8 just to compare for order but is discarded, or if a List Str is converted to a List (List U8) just to sort it and convert it back to a List Str again.

Kevin Gillette (Oct 11 2023 at 13:15):

Or perhaps we can introduce a new set of operators (or an operator qualifier sigil) for machine oriented comparisons in cases where the result may not always be intuitive to humans. This is probably not a great idea though.

Kevin Gillette (Oct 11 2023 at 13:22):

In any case, sourcing correct locales has historically been difficult, confusing, and messy. LOCALE on the CLI isn't as bad, and Roc CLI platforms may be able to yield decent results for little work on behalf of the app programmer, but afaict, the situation hasn't gotten much better for network services in the last few decades.

Kevin Gillette (Oct 11 2023 at 13:26):

For a CLI, such as sort, you don't need a Danish translation of the utility itself to provide correct results when given Danish input, provided the user has set their LOCALE correctly (which is likely). For a website, iiuc, there's rarely much point in being aware of a locale unless you also have a translation of the corresponding language.

Kevin Gillette (Oct 11 2023 at 13:36):

If we want to handle locales holistically in a way that application programmers won't mess up, I think we need platforms to provide a locale value (possibly opaque) that the app programmer can just pass along to the relevant string functions, and really have no other way to synthesize that value.

Kevin Gillette (Oct 11 2023 at 13:39):

Likewise, for cross-locale cases (you supplied your name in a Danish locale context, but I need to store many names independent of locale) we'd need a convention of machine-oriented operations either taking List U8; or having a separate set of functions that clearly are named and documented to not be for direct human consumption; or having high level functionality (like a database module) just doing the right thing internally (i.e. taking strings, converting to List U8, then never converting back prior to persisting the data).

Kevin Gillette (Oct 11 2023 at 13:43):

A good website platform would thus parse the request locale and language from headers, determine what can be satisfied using stored/registered translations, and then pass that opaque value to the app code. The app code simply wouldn't be able to use string functions without passing the value all the way down the call stack whenever string operations are involved.

Kevin Gillette (Oct 11 2023 at 13:44):

Though alternatively, perhaps we could have those stdlib string functions receive an implicit locale from the platform via hook.

Declan Joseph Maguire (Oct 11 2023 at 13:53):

I like the idea of having the platform pass it in. Fits well with the underlying metaphor where the platform effectively represents the whole context for the app's operation - and what would be more contextual than the language of the person running the program? But at that point, I think you'd want to generalise that concept a bit to make it less arbitrary, and then you allow the application to pass more and more in, and the distinction between app and platform begins to blur...

Brendan Hansknecht (Oct 11 2023 at 14:29):

Brendan Hansknecht (Oct 11 2023 at 14:30):

Also, I guess that fact that Strings can for the most part get seamlessly converted to List U8, does give people a very easy way to opt into naive sorting.

Brendan Hansknecht (Oct 11 2023 at 14:31):

It also is very honest about the fact that we aren't considering this a string with locale anymore, just a list of bytes. I think it is quite clear about the ordering intentions.

Brendan Hansknecht (Oct 11 2023 at 14:34):

Also, List U8 to Str conversion is essentially free, but Str to List U8 may require some cost due to List not supporting a small string style optimization.

Declan Joseph Maguire (Oct 11 2023 at 14:34):

Going back to the original posts, is there some way to restrict which types something like List.sortNames could accept to those with a canonical compare operation, and having strings be excluded from that? You could have some separate hidden function that is used for all the data-structure compares, where you're not actually exposing the ordering to the user (there might be a fancy type way of capturing this? If you can give a type to ordering functions in general, maybe the needed property is that the outputs of functions that accept an ordering are independent of that ordering).

Maybe you just shouldn't be using functions like "sort" unless it's 100% clear what you mean, and if it's not, the user supplies the function. Perhaps this internal function is exposed enough to use when desired, but it has the super clunky names that sound scary and non-obvious, or better, obviously not what you might naively expect

Brendan Hansknecht (Oct 11 2023 at 14:36):

Brendan Hansknecht (Oct 11 2023 at 14:37):

Type restriction is done through abilities (basically traits). This we could add Ord to everything but Str and it would not work with the default sort algorithm.

Declan Joseph Maguire (Oct 11 2023 at 14:41):

I was just playing around with generalising a concept, only to realise it started to undermine some of Roc's core design philosophy, or seemed to at least. Following a path to demonstrate it ends at a cliff.

Brendan Hansknecht (Oct 11 2023 at 15:13):

I get that, I more meant why? What exactly is the cliff? I want to understand that more.

Declan Joseph Maguire (Oct 13 2023 at 08:03):

God this thread is getting tangly, I'm having trouble tracing back through the arguments to figure out where everything came from. Anyways,

I stand by my idea that the platform handing the locale in is a good idea, and I realised that I'd misremembered bits of how importing worked (look I wrote my previous message late at night, I was a bit loopy). The present import structure is more than rich enough to allow passing some record embodying locale, and doing so could save a lot of headaches.

I see two potential issues with this. Firstly, locale norms aren't stable - what if a country changes its orthography? I think that you could handle that by making it an import that is separate to the platform.

Secondly, a locale record would probably be very complex, which worries me slightly given the record would almost certainly be heterogenous. Perhaps the tutorial is out of date, but the syntax of unpacking a payload from a tag is not well suited to simply using dot notation, it's all about pattern matching. This makes sense when passing tags from one function to another, but not when trying to pull a payload from a tag in a record.

Either we end up passing a monolithic locale record into everything that might use it, or we have to do some clunky unintuitive stuff just to pull out our desired functions, records, etc. We could just introduce some notation to make referencing a tagged object easier, which sounds like the sort of thing you should have anyway (if it hasn't already been done). And maybe passing around a monolithic object isn't a problem because the compiler can just optimise everything away? But having such a complex object be so core to calling inbuilt functions makes me nervous for some reason, like there's edge cases we couldn't consider. I don't have that reaction when imagining the locale having an alphabetise/alphabetize (lol) function that we can just pull out like locale.alphabetise and feed to a sort function.

Declan Joseph Maguire (Oct 13 2023 at 08:07):

Of course none of that solves the problem of how to name a sort function that works on strings and is purely computational rather than cultural, but at least we're closer to understanding how to handle locale dependency independent of platform dependency

Declan Joseph Maguire (Oct 21 2023 at 09:54):

It suddenly occured to me that all the things we talked about in this thread about how to handle localisation also applies to integer string representation. In India, they often don't use thousands and millions, they use "crore" and "lakh". Which might be an issue for string interpolation.

Hannes (Oct 22 2023 at 06:51):

An even more common split is which thousands separator and decimal point symbol to use, but it feels like a library would be the best place to put number serialisation

Declan Joseph Maguire (Oct 22 2023 at 06:54):

Do you reckon this should be another thread in ideas, if one such does not already exist? You sound like you'd know more about it than me, but string representations of numbers (especially in the context of string interpolation) seems like a topic worth discussing.

Luke Boswell (Oct 22 2023 at 07:04):

Luke Boswell (Oct 22 2023 at 07:05):

If we have a library for this then whenever unicode publishes a new version, there is little impact on the base Roc library or applications

Luke Boswell (Nov 23 2023 at 21:30):

This topic has taken somewhat of a detour, but just drafting up something on this. Looking for feedback on this gist which is a brief summary I have made of the current status of this proposal.

# Sort.roc
Sort implements
    compare : a, a -> [LessThan, GreaterThan] where a implements Sort

Richard Feldman (Nov 23 2023 at 21:35):

looks like a good start! I think it needs to return Equals as one of the options

Richard Feldman (Nov 23 2023 at 21:50):

of note, for integers, comparing them and getting 0 for equals, 1 for greater than, and 2 for less than (which is how the tags will line up due to alphabetical order) can be done like this:

int compare(int a, int b) {
    return (a > b) + 2 * (a < b);
}

Brian Carroll (Nov 24 2023 at 06:18):

I did a similar trick in the Wasm back end! Here's how I wrote it in the code comment:

// (x != y) as u8 + (x < y) as u8

Kasper Møller Andersen (May 17 2024 at 19:46):

I just want to add a couple of points here on the actual signature of the compare function used for sorting. Basically, instead of

compare : a, a -> [LessThan, Equal, GreaterThan] where a implements Sort

compare : { a : a, b : a } -> [ABeforeB, Equal, BBeforeA] where a implements Sort

The underlying reason is that, after years of doing Elm dev, I still cannot grok the use of LT/EQ/GT when it comes to sorting, and the previously suggested signature has the same problems I think. These are the major issues:

My proposal is basically to force the two inputs being compared to be named a and b, and then tailoring the return values to say if either a or b comes first. That way I think the signature becomes less abstract, and more tailored to sorting. Which also seems like this signature is being applied to here.

Isaac Van Doren (May 17 2024 at 22:35):

I’ve found LT/EQ/GT to be confusing also so I like this direction! I wonder if there is a way to make the API slightly simpler. Regardless, I think it is an improvement

Kasper Møller Andersen (May 18 2024 at 05:09):

Anton (May 20 2024 at 10:08):

Brendan Hansknecht (May 20 2024 at 11:04):

Not one you should think about/be worried about. Especially not for a small record like this.

Anton (May 20 2024 at 11:30):

I like this proposal in general, I would however use e.g. x for the type variable to avoid having two a's with different meanings:
compare : { a : x, b : x } -> [ABeforeB, Equal, BBeforeA] where x implements Sort

Kasper Møller Andersen (May 20 2024 at 11:38):

Is there a convention for how to choose the type variable for generics in Roc? I feel like I’ve seen a used the most, so if that’s the convention, maybe we should find different names for the two items being sorted instead. x and y for example? Then a can still be used for the type.

Kiryl Dziamura (May 20 2024 at 11:44):

It can be compare : a, a -> [LeftFirst, RightFirst, Equal] where a implements Sort

Kasper Møller Andersen (May 20 2024 at 11:50):

I do like that that signature is simpler, but I also really like that the return type names which input it’s referring to more directly :thinking:

Kasper Møller Andersen (May 20 2024 at 12:08):

Okay, so typing out a more involved example like so, where the return type either refers to the input by name or by position:

myComparison = \this that ->
    when (this, that) is
        (Unavailable, Unavailable) ->
            Equal
        (Unavailable, _) ->
            ThatFirst
        (_, Unavailable) ->
            ThisFirst
        (Available thisData, Available thatData) ->
            when (thisData, thatData) is
                (Ok this_, Ok that_) ->
                    Num.compare this_ that_
                (Err _, Err _) ->
                    Equal
                (Err _, _) ->
                    ThatFirst
                (_, Err _) ->
                    ThisFirst

myComparison = \this that ->
    when (this, that) is
        (Unavailable, Unavailable) ->
            Equal
        (Unavailable, _) ->
            RightFirst
        (_, Unavailable) ->
            LeftFirst
        (Available thisData, Available thatData) ->
            when (thisData, thatData) is
                (Ok this_, Ok that_) ->
                    Num.compare this_ that_
                (Err _, Err _) ->
                    Equal
                (Err _, _) ->
                    RightFirst
                (_, Err _) ->
                    LeftFirst

Brendan Hansknecht (May 20 2024 at 15:05):

I am very much not a fan of the ABeforeB stuff. LeftFirst and such sound fine.

Jasper Woudenberg (May 20 2024 at 17:29):

In either the ABeforeB or LeftFirst varieties, should we change terms like 'ascending' and 'descending' too? I think 'ascending' and 'descending' become ambiguous when we move away from using 'less than' and 'greater than' to describe ordering of elements, no?

Anton (May 20 2024 at 17:31):

Kasper Møller Andersen (May 20 2024 at 17:46):

What's the value of ascending and descending being baked into the sorting API though?

I think it's fine to have, say, Num.compareAscending and Num.compareDescending functions, which are implemented in terms of LeftFirst and friends. But the "sorting primitives" don't need to encode that information in my mind. First off, the code can't probe a given sorting of it's ascending or descending based on this information anyway, so it's only there to aid the developer. But as I wrote previously, I think they hurt more than they help in this regard.

Richard Feldman (May 20 2024 at 18:04):

so like if I call it with foo and bar as the argument, Keep means keep them in that order, and Swap means swap the order

Richard Feldman (May 20 2024 at 18:04):

I guess that misses the semantic distinction of "they are equal, so do whatever you want" versus "they must be swapped"

Richard Feldman (May 20 2024 at 18:05):

Richard Feldman (May 20 2024 at 18:08):

areSorted : a, a -> [Yes, No, Same]
    where a implements Sort

Jasper Woudenberg (May 20 2024 at 18:25):

Not really, but I thought it'd be a downside to moving away from LessThan / GreaterThan if we couldn't come up with alternatives for ascending and descending, as we'd remove ambiguity in one place at the cost of adding it somewhere else.

areSortedAsc : a, a -> [Yes, No, Same]

pickSmallest : a, a -> [Left, Right, Same]

Richard Feldman (May 20 2024 at 18:33):

areSortedAsc : a, a -> [Yes, No, AreEqual]
    where a implements Sort

Richard Feldman (May 20 2024 at 18:33):

I'd like to avoid "left" and "right" as an accessibility consideration, because some people mix up left and right

timotree (May 20 2024 at 18:36):

I don't think "ascending" and "descending" are helpful terms for a general type that implements sort. e.g. I could make an opaque wrapper around U32 which sorts higher numbers before lower ones, and now "ascending" sort for my type results in a list of numbers in descending order. Maybe instead we could have descriptive specialized variants like List.sortAsc : List (Num a) -> List (Num a) or List.sortAsciibetical : List Str -> List Str

Kiryl Dziamura (May 20 2024 at 19:34):

What I like about @Kasper Møller Andersen proposal is that it explicitly rejects ordering in arguments. Instead of implicitly ordered (a, b) he suggests explicitly unordered {a, b} so it’s much simpler now to answer the question “how these unordered items are related?”: the answer does not require any hidden information from arguments ordering then. Asc/desc, less/greater, areSorter - all of them imply the args order is important.

In my proposal, I tried to get rid of the layer of indirection introduced by a/b. So LeftFirst means “left argument is first”. A11y makes sense, but FirstFirst and SecondFirst would be much worse :grinning_face_with_smiling_eyes:

Richard Feldman (May 20 2024 at 19:49):

a reason I prefer something like areSortedAsc is that it's easier for me to see the sorting visually, e.g. if a and b are numbers then:

areSortedAsc = \a, b ->
    if a |> isLessThan b then
        Yes
    else if a == b then
        AreEqual
    else
        No

Richard Feldman (May 20 2024 at 19:50):

compare = \{ a, b } ->
    if a |> isLessThan b then
        ABeforeB
    else if a == b then
        AreEqual
    else
        BBeforeA

Kasper Møller Andersen (May 20 2024 at 19:50):

I would also like to keep ascending/descending completely out of the API if possible, as I think that will become confusing quickly for anything that's not numbers.

I think it's worth asking what kind of sorts people will write out there. Presumably the basic sorting of numbers and strings will be covered by the standard library/some popular package. Skimming over our Elm code base at work (which of course biases uses towards web apps), the main uses for custom sorting functions goes something like:

Which is why I previously wrote this example as a representative custom sorting:

myComparison = \this that ->
    when (this, that) is
        (Unavailable, Unavailable) ->
            Equal
        (Unavailable, _) ->
            RightFirst
        (_, Unavailable) ->
            LeftFirst
        (Available thisData, Available thatData) ->
            when (thisData, thatData) is
                (Ok this_, Ok that_) ->
                    Num.compare this_ that_
                (Err _, Err _) ->
                    Equal
                (Err _, _) ->
                    RightFirst
                (_, Err _) ->
                    LeftFirst

What I like about having LeftFirst and RightFirst here is that I usually don't have to relate back to function signature while I'm pattern matching, because looking at branches like:

    when (this, that) is
        (Unavailable, Unavailable) ->
            Equal
        (Unavailable, _) ->
            RightFirst

the pattern match itself also tells me the order. I.e. I've pattern matched that the left item is Unavailable, so using RightFirst makes very good sense without reading anything else. In this case I also prefer LeftFirst and RightFirst over being ABeforeB and BBeforeA, because whether I'm looking at a or b gets lost when I only look at my pattern match branch. I have to go to the top of my pattern match to read which one is a and which one is b.

If we want to avoid using left and right for accessibility (which is fair), then I think I would revert to preferring my original proposal.

Kasper Møller Andersen (May 20 2024 at 19:56):

(also, @Richard Feldman, what language do you specify in your code blocks to get good syntax highlighting for Roc code? :smiley:)

Richard Feldman (May 20 2024 at 19:58):

I alternate between perl and haskell depending on whether there are comments :laughing:

Richard Feldman (May 20 2024 at 19:58):

I wonder how we get more syntax highlighting in there...we got it on GitHub, should be possible to get it on Zulip somehow, right? :grinning_face_with_smiling_eyes:

Jasper Woudenberg (May 20 2024 at 20:21):

If we want to remove the asceding/descending terminology, maybe going back to Richard's areSorted (without the Asc):

areSorted : a, a -> [Yes, No, AreEqual]

sort : List a -> List a

sortReverse : List a -> List a

Kasper Møller Andersen (May 20 2024 at 21:01):

One more thing: I know we are discussing implementing a function as part of an ability here, which means it will have a name when users implement it too. But are users able to pass a sorting as just a closure? Because if so, we can't rely too much on the function name to convey meaning to users. Because they either won't give it a name at all, or they will give it whatever name they think makes sense for them at least.

So something like areSorted : a, a -> [Yes, No, AreEqual] won't work well for examples like:

List.sortWith myList \this that ->
    when (this, that) is
        (Unavailable, Unavailable) ->
            AreEqual
        (Unavailable, _) ->
            No
        (_, Unavailable) ->
            Yes
        (Available thisData, Available thatData) ->
            when (thisData, thatData) is
                (Ok this_, Ok that_) ->
                    Num.compare this_ that_
...

because Yes and No only make sense when you have a function named areSorted attached to them. So in that sense, Keep and Swap are definitely better, because they imply an ordering without needing a function name to convey context.

Maybe the return type should be [AlreadyOrdered, SwapOrder, EquallyOrdered] to convey enough context without a function name?

Richard Feldman (May 20 2024 at 22:51):

Richard Feldman (May 20 2024 at 22:52):

areSorted : a, a -> [AreSortedForwards, AreSortedBackwards, AreEqual]

kinda wordy, but these functions get written by hand extremely rarely, so maybe fine

Kiryl Dziamura (May 21 2024 at 05:29):

areSorted : a, a -> [Sorted, Unsorted, Equal]

areOrdered : a, a -> [Ordered, Unordered, Equal]

Kiryl Dziamura (May 21 2024 at 06:04):

In this case, it‘ much simpler to think about an infix operation, so the LT, GT makes sense when to put them between the args (a GT b). Abstract alternative naming for tags might be [Precedes, Succeedes, Equals]

Here, we check the order status of the pair as a collection of two items. So the possible tags might be [Ordered, Unordered, Equal]

Jasper Woudenberg (May 21 2024 at 06:15):

Another idea, we could make sortWith take a record so we can use areSorted to name the compare function:

sortWith : { items: List a, areSorted: a, a -> [Yes, No, AreEqual] } -> List a

Kasper Møller Andersen (May 21 2024 at 15:44):

I like thinking about it this way, but I think there's a third question we could also ask:

Granted, this could also be considered a sub-question of the first one, but I also think it's the simplest question we can actually ask. Breaking down the material we have so far, I think we can categorize them something like:

I've also sorted ( :drum:) them by what I personally consider their mental load to be. I put the "point to an element" at the top, both because it requires the least amount of context to decide what's right, and also because most of the sorting implementations I've seen fall very nicely into that category.

I'm on board with dropping LeftFirst and RightFirst for accessibility, but I really want to try and find a different solution which could in that category instead. I would also love to hear from anyone who disagrees with my ranking of mental load, since that's very biased towards what I've personally experienced and seen my coworkers do.

Kasper Møller Andersen (May 21 2024 at 15:59):

sorter : a, a -> [BumpFirstElement, BumpSecondElement, Equal]

List.sortWith myList \a, b ->
    when (a, b) is
        (NotPresent, NotPresent) ->
            Equal
        (NotPresent, _) ->
            BumpSecondElement
        (_, NotPresent) ->
            BumpFirstElement
        (Present _a, Present _b) ->
            Str.sorter _a _b

sorter= \a, b  ->
    if a |> isLessThan b then
        BumpSecondElement
    else if a == b then
        Equal
    else
        BumpFirstElement

I also still think that very few people will ever actually write a numeric sort, as they will just use what's in the standard library. Sorting on custom types in a pattern match is around 90% of the sort implementations I've seen in Elm (on the 20-30 different front end sortings we have implemented in our code base)

Kasper Møller Andersen (May 21 2024 at 16:18):

Although definitely this isn’t the most important API to work on necessarily, and I think most suggestions so far have been a big step up from the old API. So it’s not a hill I’m gonna die on :smiley:

Norbert Hajagos (May 22 2024 at 09:41):

I don't really get the Bump word in this context, despite hearing it many times. I'm not a native English speaker. To me it means "make that element higher up", but that assumes higher up means ...err.. later in the sorted sequence? From the example implementation that is correct, but I wouldn't have figured that out.

areSorted : a, a -> [Sorted, Unsorted, Equal]

I like it the most when the Tag directly describes the relation between the two value, but the use of a record to name the two vars feel too meticulous. The user may also want to name the vars differently especially when the sort function is defined not as an anonymous fn like with List.sortWith. For an opaque User type, I prefer

sort = \user1, user2 -> ...`

sort = \{a: userA, b: userB} -> ...

How do you like [FirstBeforeSecond, SecondBeforeFirst, Equal]?
It would keep the clarity of [ABeforeB, ...], no Asc / Desc, all without the extra record. It would be understandable in an anonymous fn as well as in an ability implementation.

sort= \enemy1, enemy2 ->
    if enemy1.health |> isLessThan enemy2.health then
        FirstBeforeSecond
    else if enemy1.health == enemy2.health then
        Equal
    else
        SecondBeforeFirst

Kasper Møller Andersen (May 22 2024 at 10:17):

Yeah, with Bump I was looking for a verb that describes moving something forward. I was toying with Promote too, but also felt it was too far away from sorting to be good.

Kasper Møller Andersen (May 22 2024 at 10:18):

Brendan Hansknecht (May 23 2024 at 11:06):

...
Obviously the spaceship operator <=> is the only correct way to do this :stuck_out_tongue_wink:.

Kasper Møller Andersen (May 23 2024 at 13:57):

Could you write an example of how the operators would be used? I’m not really following how you want to apply them :smile:

Kiryl Dziamura (May 23 2024 at 14:00):

My concern is that despite LT and GT being common, they're not the best for reasoning about order. Especially considering comparison operators which are perfect for numbers comparison but confusing for anything else. I mean, in terms of friendliness, it's not obvious what a < b means if a and b aren't numbers, you have to be aware of the ability implementation to use the operators.
But I agree, other might be odd for experienced programmers

Kasper Møller Andersen (May 24 2024 at 08:04):

I do agree it's valuable to let people reuse their knowledge from elsewhere, but I would say the LessThan/GreaterThan idiom is not as common as people might think, nor as learnable. Of course, the big elephant in this API room is the use of -1, 0, and 1 to mean these things (or just a boolean, if you're C++).

Using these integers is extremely common of course, (JavaScript, Python, Java, C, C#, Kotlin, etc.), and probably covers the vast majority of code running in production today.

It's not totally clear how much use each language has, but I don't think it's controversial to say JS is the biggest. And it does not use the Lesser/Greater terminology to describe these integers. Instead, MDN says:

Which I think aligns much better with something like FirstBeforeSecond for example than with LessThan.

Secondly, if the API was to be tags named something like LT/EQ/GT, then I think I would personally prefer having the integers actually. Aside from aligning closer with existing languages, I think it also has the advantage that it is completely unreadable if you don't already know about it, which basically forces you to read the documentation.

Having said that, I still agree with @Kiryl Dziamura that the whole Lesser/Greater thing is just generally not that great for describing orderings:

I agree we've gone over a wide selection of API possibilities here, but I think we would benefit from just trying something out at this point. The language is still unstable, so maybe just pick one, and change it if it doesn't work out?

For my money, it seems something like [FirstBeforeSecond, SecondBeforeFirst, Equal] addresses most concerns out there, so I would pick that personally.

Kasper Møller Andersen (May 26 2024 at 07:01):

Sorry to keep banging on this drum, I'm just writing down my thoughts to get them out of my head here :smile:

Most languages have the semantic of their sorting mean "first arg is lesser/greater than second arg", and their return types should basically be viewed as an infix operator to make sense. But since Roc doesn't allow people to make/override infix operators, designing a return type a la LessThan which should be considered as an infix operator doesn't rely mesh well with the language.
.
.

Kasper Møller Andersen (May 26 2024 at 07:25):

I worry that we are generally waaay overthinking this change, and we're getting stuck in analysis-paralysis here with little benefit to come out of it, so I want to try and summarize what I think is for and against:

I still think [FirstBeforeSecond, SecondBeforeFirst, Equal] fits the bill perfectly. The overall shape of the function signature remains the same as everywhere else, it translates easily to the concepts people already have from other languages, and it gets rid of the ambiguities of "lesser" and "greater" as pertaining to sorting.

If anybody has arguments against [FirstBeforeSecond, SecondBeforeFirst, Equal], I would love to hear them. Because right now it just feels we are worried about making this choice, and I can only speculate why. So I'd love to hear what is holding people back here :blush:

Brendan Hansknecht (May 26 2024 at 14:03):

Though a bit verbose. I agree with your selection. I think it is the easiest to just read and understand without needing to think about it. That said, I don't think we should use Equal. I think we should use Equivalent. The values might fail an a == b test despite getting that tag result.

Brendan Hansknecht (May 26 2024 at 14:04):

Also, we aren't against custom infix as a whole. For example, you can override == for an opaque type. That said, <= and friends make a lot less sense to override on most types that aren't numerical.

Brendan Hansknecht (May 26 2024 at 14:06):

So we are a lot more hesitant to enable those infix operators to be overridden. If we did allow them to be overridden, it would probably be via some other ability that forces implementing the full set of numeric math operations or similar. I think this is quite unlikely though cause it leads to a lot of confusing code when used on non-numeric types.

Richard Feldman (May 26 2024 at 20:36):

I'm personally okay with trying this out (I agree with Brendan's point about Equivalent), but I heard a good rule from a language designer (Guido maybe?) which is basically "if you can't decide how to proceed, always default to status quo even if it's not what anyone wants individually"

Richard Feldman (May 26 2024 at 20:42):

the reasoning is basically that changes always have a long term cost - either accumulated API surface area if you don't want to break backwards compatibility, or else accumulated breaking change fatigue

Richard Feldman (May 26 2024 at 20:44):

I think it's fine in this particular case to give it a shot, but I wanted to note that in general I think it's a good idea to be mindful that there's a good argument for staying with the status quo in the face of indecision as to which specific alternative to try

Ayaz Hafiz (May 26 2024 at 23:11):

My 2c is the well known thing (Lt/Gt/Eq) here is better. it’s familiar and I dont think it’s ambiguous (I’m not sure how else you could interpret a, b -> Lt/Gt/Eq other than as a relationship between a and b). Sorting/ordering inherently implies an order, and to me every other option seems like a different set of names for these kinds ordering binary operators.

Zeljko Nesic (May 27 2024 at 09:52):

Kasper Møller Andersen (May 27 2024 at 11:15):

The fact that LT/GT/EQ really wants to resemble </>/== here is a detriment to how easy it is to understand in my book.

You're right that it's just a different set of names, but that's a feature! :smiley: Everything still works as expected, but rather than the API relying on being a metaphor for mathematic ordering, which works for some but causes confusion for others, the new names just makes everything more explicit.

I find it similar to how Roc doesn't have a List.filter function, but List.keepIf instead. It's "just" a rename of course, but it's much clearer still. As someone who has been programming with functional programming of various forms for 10+ years, I still find myself regularly looking up whether returning True in my filter predicate means the element is kept or discarded.

Ben Plotke (Jun 01 2024 at 11:13):

Sorry to be late to the party. I really like the [LessThan, GreterThan, Equals] tags. To me at least, I do not feel there is a mental load in knowing that means "a is 'LessThan' b". In languages with compare returning -1, 0, 1, I translate in my head to LessThan, Equal, GreaterThan. I completely agree on the using Equivalent instead of Equals.

Eric Rogstad (Jun 05 2024 at 23:28):

Just want to suggest that if we ever go with the [Sorted, Unsorted, Equivalent] idea, we might want to consider Reversed instead of Unsorted.

Eric Rogstad (Jun 05 2024 at 23:29):

To me "unsorted" suggests a sort of neutral state of not having been considered yet. Rather than that the items are out of order.

Kasper Møller Andersen (Jan 14 2026 at 14:26):

I don't know if this was ever implemented or not, but for anyone who doesn't know or remember the discussion, there was some consensus to try replacing the [LessThan, GreaterThan, Equals] ordering tags with [FirstBeforeSecond, SecondBeforeFirst, Equivalent] as an experiment, but many people still prefer the old names. There's also a slightly longer summary for the curious, as well as my thorough description of why I find the LT/GT/EQ API poor.

Anyway, the reason I'm returning to this thread is because I had an experience that just reiterated my desire for this change:

When I started this thread, I found the LT/GT/EQ API confusing. Going through the discussion, I learned all the ins and outs of it though, and though I still stood by my points, I at least understood the old API properly. However, some months later, I had to review an Elm PR where a new ordering had been implemented with LT/GT/EQ, and I found myself confused again. I couldn't recollect the understanding I had built up previously.

Which reminded me of a point a lecturer at university really wanted to hammer home: bad UX is bad forever. That is, they had conducted studies on people using software with poor UX, and found the users had barely improved their ability to overcome the poor UX a year later. The users weren't simply getting used to the poor UX and working around it, they were continuously penalized by it. And I think the same thing happened to me with this API.

Anton (Jan 14 2026 at 14:38):

Rick Hull (Jan 20 2026 at 21:14):

@Kasper Møller Andersen This seems to me like the equivalent of a "code smell", maybe a "concept smell" or "design smell", and should thus have some attention paid to it.

Anton (Jan 21 2026 at 09:20):

Rick Hull (Jan 21 2026 at 14:37):

Just affirming that when this happens, it's worth considering if there isn't a nicer or more intuitive design hiding somewhere. Matching user intuition is not the highest goal, and there are always tradeoffs.

Stream: ideas

Topic: ordering/sorting ability

Richard Feldman (Oct 08 2023 at 12:53):

Richard Feldman (Oct 08 2023 at 12:55):

Richard Feldman (Oct 08 2023 at 12:57):

Richard Feldman (Oct 08 2023 at 13:03):

Richard Feldman (Oct 08 2023 at 13:03):

Richard Feldman (Oct 08 2023 at 13:10):

Richard Feldman (Oct 08 2023 at 13:11):

Richard Feldman (Oct 08 2023 at 13:12):

Richard Feldman (Oct 08 2023 at 13:13):

Richard Feldman (Oct 08 2023 at 13:15):

Sky Rose (Oct 08 2023 at 13:48):

Sky Rose (Oct 08 2023 at 13:57):

Brendan Hansknecht (Oct 08 2023 at 15:04):

Brendan Hansknecht (Oct 08 2023 at 15:07):

Brendan Hansknecht (Oct 08 2023 at 15:09):

Brendan Hansknecht (Oct 08 2023 at 15:10):

Richard Feldman (Oct 08 2023 at 15:46):

Brendan Hansknecht (Oct 08 2023 at 22:45):

Richard Feldman (Oct 09 2023 at 01:15):

Brendan Hansknecht (Oct 09 2023 at 01:18):

Brendan Hansknecht (Oct 09 2023 at 01:18):

Brendan Hansknecht (Oct 09 2023 at 01:18):

Brendan Hansknecht (Oct 09 2023 at 01:18):

Declan Joseph Maguire (Oct 10 2023 at 14:00):

Declan Joseph Maguire (Oct 10 2023 at 14:02):

Declan Joseph Maguire (Oct 10 2023 at 14:05):

Declan Joseph Maguire (Oct 10 2023 at 14:22):

Brendan Hansknecht (Oct 10 2023 at 14:56):

Declan Joseph Maguire (Oct 10 2023 at 15:15):

Brendan Hansknecht (Oct 10 2023 at 15:30):

Brendan Hansknecht (Oct 10 2023 at 15:32):

Declan Joseph Maguire (Oct 10 2023 at 15:45):

Brendan Hansknecht (Oct 10 2023 at 15:57):

Kevin Gillette (Oct 11 2023 at 12:38):

Kevin Gillette (Oct 11 2023 at 12:44):

Kevin Gillette (Oct 11 2023 at 12:46):

Kevin Gillette (Oct 11 2023 at 12:48):

Kevin Gillette (Oct 11 2023 at 12:52):

Kevin Gillette (Oct 11 2023 at 13:00):

Kevin Gillette (Oct 11 2023 at 13:02):

Kevin Gillette (Oct 11 2023 at 13:04):

Kevin Gillette (Oct 11 2023 at 13:15):

Kevin Gillette (Oct 11 2023 at 13:15):

Kevin Gillette (Oct 11 2023 at 13:22):

Kevin Gillette (Oct 11 2023 at 13:26):

Kevin Gillette (Oct 11 2023 at 13:36):

Kevin Gillette (Oct 11 2023 at 13:39):

Kevin Gillette (Oct 11 2023 at 13:43):

Kevin Gillette (Oct 11 2023 at 13:44):

Declan Joseph Maguire (Oct 11 2023 at 13:53):

Brendan Hansknecht (Oct 11 2023 at 14:29):

Brendan Hansknecht (Oct 11 2023 at 14:30):

Brendan Hansknecht (Oct 11 2023 at 14:31):

Brendan Hansknecht (Oct 11 2023 at 14:34):

Declan Joseph Maguire (Oct 11 2023 at 14:34):

Brendan Hansknecht (Oct 11 2023 at 14:36):

Brendan Hansknecht (Oct 11 2023 at 14:37):

Declan Joseph Maguire (Oct 11 2023 at 14:41):

Brendan Hansknecht (Oct 11 2023 at 15:13):

Declan Joseph Maguire (Oct 13 2023 at 08:03):

Declan Joseph Maguire (Oct 13 2023 at 08:07):

Declan Joseph Maguire (Oct 21 2023 at 09:54):

Hannes (Oct 22 2023 at 06:51):

Declan Joseph Maguire (Oct 22 2023 at 06:54):

Luke Boswell (Oct 22 2023 at 07:04):

Luke Boswell (Oct 22 2023 at 07:04):

Luke Boswell (Oct 22 2023 at 07:05):

Luke Boswell (Nov 23 2023 at 21:30):

Richard Feldman (Nov 23 2023 at 21:35):

Richard Feldman (Nov 23 2023 at 21:50):

Brian Carroll (Nov 24 2023 at 06:18):

Kasper Møller Andersen (May 17 2024 at 19:46):

Isaac Van Doren (May 17 2024 at 22:35):

Kasper Møller Andersen (May 18 2024 at 05:09):

Anton (May 20 2024 at 10:08):

Brendan Hansknecht (May 20 2024 at 11:04):

Anton (May 20 2024 at 11:30):

Kasper Møller Andersen (May 20 2024 at 11:38):