there's a case to be made that we should have a built-in sorting/ordering ability (e.g. Ord in Rust or Haskell) that's auto-derived in the same way Eq and Hash are - e.g. for use cases like tree data structures.
I've always kind of assumed this was a good idea, but now I'm second-guessing that because I realized that sorting strings in an auto-derived way is subtly error-prone.
the specific thing I'm concerned about is sorting strings. If we want to make data structures as fast as possible, we should presumably sort by UTF-8 bytes without regard to the semantics of what those bytes represent.
In other words, sorting strings for purposes of figuring out where they go in an OrdMap data structure should be the same as deciding which of two List U8s goes before the other one - because that's the fastest thing we could possibly do, and in that use case all we care about is making the traversal faster than if we didn't sort them.
however, it's a different story if we want to sort strings "alphabetically" because the rules for what that means vary by locale. For example, in Danish, the sequence "aa" is sorted as a single character, and it comes after "z"
my concern is that if we auto-derive the "sort as fast as possible" way for strings, it will work fine for data structures, but it will have the unfortunately coincidental property of also looking like it's a correct "alphabetical" sorting implementation to English speakers, because ASCII happens to sort that way
so there will be a lot of cases of people using stuff like List.sort names and not realizing the edge cases that are being handled incorrectly
so it's a pitfall instead of a pit of success
so, certainly having a Locale module (in a separate, non-builtins package because the rules for these change all the time and having to couple those updates to language updates would not be a good design) which offers locale-aware alphabetical sorting is a way to prevent those bugs, but the problem is that if there's a builtin alternative that looks correct, it'll definitely happen all the time that people will use the incorrect one instead because it's easier to reach for
and I don't see a way to make the correct version equivalently easy to reach for (let alone easier) because of other design constraints like needing to take an explicit argument of which locale you want, plus the fact that it shouldn't be a builtin so that it can be updated independently from the language
one idea is to just not auto-derive that ability for Str, similarly to how we don't auto-derive Eq for F64 and F32
but of course that would make strings a lot more annoying to use in data structures that use sorting, especially if the string is nested inside another data structure (e.g. records and lists) that could otherwise auto-derive it
but maybe that's okay, because in general hashing data structures will be faster and should be encouraged as the preferred way to store things unless you have a very specific use case? :thinking:
Maybe there's two different kinds of Ords: A semantic order (for numbers, user-defined implementations, and not too much else) and a meaningless but consistent arbitrary order (probably literally everything).
And there are different functions that depend on each of them according to what they need.
There could be a List.semanticSort that requires SemanticOrd, and doesn't work for strings unless you specify your Locale. Or maybe it doesn't use the ability and you have specify how to do it, like List.sortBy and you pass in an argument that's like Builtins.numbersDecreasing or (Locale.alphabetical Turkish).
And then a List.meaninglessSort if you only care about the order programmatically, that works on everything.
The hard part is then choosing names+interfaces to make that distinction easy to do the right thing. What if there wasn't a "List.sort" function to force people to think about it, or the compiler error for using "List.sort" on something without SemanticOrd explained that maybe they want List.meaninglessSort.
Personally, I would just define ord, not put it on strings and add a custom compiler error message they describes the problem.
On top of that, I would add a sorting builtin that can take a lambda. Then someone can easily define the naive way if that is what they want. We could even make a built-in (Str.naiveCompareBytes) and let people know about it in the error message. I think that would enable quick scripting and most people that don't want to deal with UTF-8 intricacies, but it would also stick out like a sore thumb when used in code.
That said, there is definitely merit in use case to having two forms of ord as @Sky Rose mentioned. In many cases you just need a full ordering for data structure function, but it doesn't matter what the ordering is as long as it is consistent.
That said, I don't like the sound of having two versions where the more correct one is the one that can be used in less places. I think people would default to meaningless ord unless it was intentionally weird such that it could really only be used with data structures
Kinda like go dictionaries and randomizing order intentionally to avoid accidental dependency on ordering.
yeah although it seems like doing that would probably require sacrificing some amount of performance unfortunately
Doing which thing specifically?
making it intentionally weird
I don't think so, just order starting with the last character and going towards the first
should be essentially the same perf as going the other way
So I don't think perf is the issue
Though still probably a bad idea for other reasons
Perhaps you could call it something like rawOrder. It implies something odd ("Hey, what's a 'raw' ordering?"), while also implying something about how it works (it orders based on some internal data, or how it's represented, not on some human readable meaning). A description could start with a phrase like "uses the internal bitstrings to give a unique ordering to a list" or something. Y'know, scary enough to let newbies know that it's definitely not alphabetical, but still simple enough that you could have the second sentence be "Useful for when you need to order a type uniquely, without caring about how", or somesuch, and people would mostly Get It.
Maybe "baseOrder", as it'd be some deeper thing that works for many types (all?). Point is, if it has a weird adjective on it, I'm gonna look at it funny, and so I probably won't make a mistake, and if I do, I'll easily remember it next time.
Add compiler warnings and errors to taste
In fact, I think there's a second layer to the whole "pit of success" thing, which is signifiers. You can never idiot-proof something against the most advanced idiots yet manufactured, or in practice, that 1 in a 1000 brain fart. What can be useful however, is to signpost the mistakes. Nothing makes a concept stick in your brain harder than fucking up, blaming something else, only to realise YOU were the one to make that mistake. In that way, intentionally tricking newbies can be a useful tutorialisation. The final bit to make that work, however, is they need to be able to see that mistake coming in the future, some prompt that might trigger their memory. "sort" doesn't do that, but "rawSort" or "baseSort" or whatever definitely would. After making my mistakes, I'd look at that highly conspicuous adjective and go "yeah, I should have seen that coming", and when I saw it again I'd be reminded of that specific distinction that I'd previously neglected.
The issue with adding an ability in general is that an end user may never see it. It will be auto serviced on a type and they will pass it to a function or data structure that automatically uses the ability without the end user ever realizing that RawOrder was going to be a foot gun.
When someone writes MyLib.SuperFastSort that depend añon RawOrd because it is faster, users will flock to it and not realizes that ramifications until they have to deal with locale related letter ordering or something. Very easy for these kinds of minor issues or spread across libraries and essentially infect swaths of code.
I might be misunderstanding, but I don't quite see the issue? Like, in that case shouldn't the library writer be held responsible for making a mistake that had been clearly signposted to them?
By this standard, isn't any builtin doomed because someone could wrap it in some other names, misuse it, and pretend it's good? This failure seems more accurately attributable to a bad library author and not a language flaw, and that seems like a class of mistake that we cannot stop hurting people. We can try to stop authors making mistakes, but we cannot stop people trusting their bad libraries once written. How much responsibility do we want to take on for library authors doing the wrong thing, despite giving them every contrary hint? We want to dissuade mistakes, but there's a fundamental floor below which no amount of good design can compensate flagrant misuse.
This may be another argument for making it obviously broken for anything other than arbitrary ordering. Maybe it's something like "byteOrder", where everything can be ordered by its bitstring representation, and we make sure it's functionally useless for anything other than just generating orders. Then all the inbuilts could have a natural order function in their namespace EXCEPT for strings/problematic cases, in which case you'd need to define the given order function. After all, if the function orders arbitrary types without natural orderings, it must be getting that information from somewhere, and a name that indicates "we are using the underlying bitstring" would be hard to accidentally misuse.
byte order is that same as ascii order, so that is normal string sorting.
Also, I think this is a language and not library author problem because having RawOrder is equivalent to releasing a partially correct api that is meant for a very specific use case, but is super convenient to use in a lot of wrong places.
As such, it will very naturally proliferate. I mean it literally would be the only way to order strings directly exposed by the standard library.
If so, maybe a stranger ordering is needed, as was mentioned earlier. Is there a way to bitshift where the bits loop? I assume so, and a couple bitshifts would surely be very performant while still being too random to be useful. Maybe this is another example of why we should allow types that exclude other types. Like, define an order function that takes anything that ISN'T a string, with some useful compiler messages if you pass one. You'd probably want to include the string one in the namespace too, but called byteOrder or rawOrder etc so you know what you're doing. Or I guess include that one in all the namespaces, given it ought work on everything.
It's late here in australia, so I need to sleep and think. I feel there's a deeper design issue here worth surfacing, but I'm not in the mind to do it.
Yeah, definitely there are options. As I suggested earlier, just sort starting from the end of the string. Preferably in large chunk based comparison for more perf and arbitrariness of the end output.
In the example of Danish sorting, to understand that better, is it definitely a locale thing, where the Danish aa and z are in the ASCII range, yet have a different sort order when specific locale rules are applied, or are those Danish [language family] Unicode graphemes that happen to look like ASCII-range grapheme sequences, yet already have the locale-aware sorting semantics determined by the Unicode without needing specific locale-awareness?
Regarding the topic overall, it seems there's a tension between determining whether it's safe or reasonable to compare two values for relative order, and whether it's safe or reasonable to sort a collection of those values.
For floats, it seems quite necessary and common to compare two values for relative order, yet it seems relatively rare to need to sort a list of floats.
For strings, I've used direct ordering comparison more commonly in low level algorithms, but have used sorting quite commonly in higher level code.
I think being able to compare and support something holding string-like data is critical in a lot of cases. There are many flavors of algorithm that do not need to sort strings for _presentation_ purposes, yet absolutely are performing that sort for storage or other purposes.
Consider /usr/share/dict/words: it needs a consistent, fast order for direct lookup, and that order really doesn't need to be locale aware. And because Unicode is very unlikely to reassign (to reorder) codepoints after release, the dict-words maintainers can minimize their maintenance work while retaining correctness if they specifically _ignore_ locale ordering rules: if the locale ordering gets a fix, the word list for a language doesn't need to change. Ultimately that's because ordering is only used for binary search, and operations on the word list typically just yield a single string (thus ordering is internal and not presentational).
There are vastly many equivalent cases, and removing string ordering altogether (or forcing use of a "machine locale", i.e. "C") brings the risk of scaring people away and greatly increasing the community support burden.
I think having easily accessible, well-named and well documented functions for these riskier cases is better than forcing all cases through a locale module.
I think a separate named type for this kind of string case would also be a mistake. I don't recall if List U8 is comparable, but if so, that would probably serve fine for machine-sort cases, and you could just have a compiler error or warning if a Str is converted to List U8 just to compare for order but is discarded, or if a List Str is converted to a List (List U8) just to sort it and convert it back to a List Str again.
Or perhaps we can introduce a new set of operators (or an operator qualifier sigil) for machine oriented comparisons in cases where the result may not always be intuitive to humans. This is probably not a great idea though.
In any case, sourcing correct locales has historically been difficult, confusing, and messy. LOCALE on the CLI isn't as bad, and Roc CLI platforms may be able to yield decent results for little work on behalf of the app programmer, but afaict, the situation hasn't gotten much better for network services in the last few decades.
For a CLI, such as sort, you don't need a Danish translation of the utility itself to provide correct results when given Danish input, provided the user has set their LOCALE correctly (which is likely). For a website, iiuc, there's rarely much point in being aware of a locale unless you also have a translation of the corresponding language.
If we want to handle locales holistically in a way that application programmers won't mess up, I think we need platforms to provide a locale value (possibly opaque) that the app programmer can just pass along to the relevant string functions, and really have no other way to synthesize that value.
Likewise, for cross-locale cases (you supplied your name in a Danish locale context, but I need to store many names independent of locale) we'd need a convention of machine-oriented operations either taking List U8; or having a separate set of functions that clearly are named and documented to not be for direct human consumption; or having high level functionality (like a database module) just doing the right thing internally (i.e. taking strings, converting to List U8, then never converting back prior to persisting the data).
A good website platform would thus parse the request locale and language from headers, determine what can be satisfied using stored/registered translations, and then pass that opaque value to the app code. The app code simply wouldn't be able to use string functions without passing the value all the way down the call stack whenever string operations are involved.
Though alternatively, perhaps we could have those stdlib string functions receive an implicit locale from the platform via hook.
I like the idea of having the platform pass it in. Fits well with the underlying metaphor where the platform effectively represents the whole context for the app's operation - and what would be more contextual than the language of the person running the program? But at that point, I think you'd want to generalise that concept a bit to make it less arbitrary, and then you allow the application to pass more and more in, and the distinction between app and platform begins to blur...
@Kevin Gillette Thanks for the great write up here!
Also, I guess that fact that Strings can for the most part get seamlessly converted to List U8, does give people a very easy way to opt into naive sorting.
It also is very honest about the fact that we aren't considering this a string with locale anymore, just a list of bytes. I think it is quite clear about the ordering intentions.
Also, List U8 to Str conversion is essentially free, but Str to List U8 may require some cost due to List not supporting a small string style optimization.
Going back to the original posts, is there some way to restrict which types something like List.sortNames could accept to those with a canonical compare operation, and having strings be excluded from that? You could have some separate hidden function that is used for all the data-structure compares, where you're not actually exposing the ordering to the user (there might be a fancy type way of capturing this? If you can give a type to ordering functions in general, maybe the needed property is that the outputs of functions that accept an ordering are independent of that ordering).
Maybe you just shouldn't be using functions like "sort" unless it's 100% clear what you mean, and if it's not, the user supplies the function. Perhaps this internal function is exposed enough to use when desired, but it has the super clunky names that sound scary and non-obvious, or better, obviously not what you might naively expect
But at that point, I think you'd want to generalise that concept a bit to make it less arbitrary, and then you allow the application to pass more and more in, and the distinction between app and platform begins to blur...
What do you mean?
Type restriction is done through abilities (basically traits). This we could add Ord to everything but Str and it would not work with the default sort algorithm.
Brendan Hansknecht said:
But at that point, I think you'd want to generalise that concept a bit to make it less arbitrary, and then you allow the application to pass more and more in, and the distinction between app and platform begins to blur...
What do you mean?
I was just playing around with generalising a concept, only to realise it started to undermine some of Roc's core design philosophy, or seemed to at least. Following a path to demonstrate it ends at a cliff.
I get that, I more meant why? What exactly is the cliff? I want to understand that more.
God this thread is getting tangly, I'm having trouble tracing back through the arguments to figure out where everything came from. Anyways,
I stand by my idea that the platform handing the locale in is a good idea, and I realised that I'd misremembered bits of how importing worked (look I wrote my previous message late at night, I was a bit loopy). The present import structure is more than rich enough to allow passing some record embodying locale, and doing so could save a lot of headaches.
I see two potential issues with this. Firstly, locale norms aren't stable - what if a country changes its orthography? I think that you could handle that by making it an import that is separate to the platform.
Secondly, a locale record would probably be very complex, which worries me slightly given the record would almost certainly be heterogenous. Perhaps the tutorial is out of date, but the syntax of unpacking a payload from a tag is not well suited to simply using dot notation, it's all about pattern matching. This makes sense when passing tags from one function to another, but not when trying to pull a payload from a tag in a record.
Either we end up passing a monolithic locale record into everything that might use it, or we have to do some clunky unintuitive stuff just to pull out our desired functions, records, etc. We could just introduce some notation to make referencing a tagged object easier, which sounds like the sort of thing you should have anyway (if it hasn't already been done). And maybe passing around a monolithic object isn't a problem because the compiler can just optimise everything away? But having such a complex object be so core to calling inbuilt functions makes me nervous for some reason, like there's edge cases we couldn't consider. I don't have that reaction when imagining the locale having an alphabetise/alphabetize (lol) function that we can just pull out like locale.alphabetise and feed to a sort function.
Of course none of that solves the problem of how to name a sort function that works on strings and is purely computational rather than cultural, but at least we're closer to understanding how to handle locale dependency independent of platform dependency
It suddenly occured to me that all the things we talked about in this thread about how to handle localisation also applies to integer string representation. In India, they often don't use thousands and millions, they use "crore" and "lakh". Which might be an issue for string interpolation.
An even more common split is which thousands separator and decimal point symbol to use, but it feels like a library would be the best place to put number serialisation
Do you reckon this should be another thread in ideas, if one such does not already exist? You sound like you'd know more about it than me, but string representations of numbers (especially in the context of string interpolation) seems like a topic worth discussing.
It's standardised in the https://cldr.unicode.org/index
I think the first step is to build the primitives in https://github.com/roc-lang/unicode/tree/main
If we have a library for this then whenever unicode publishes a new version, there is little impact on the base Roc library or applications
This topic has taken somewhat of a detour, but just drafting up something on this. Looking for feedback on this gist which is a brief summary I have made of the current status of this proposal.
In particular does this seem like a reasonable definition?
# Sort.roc
Sort implements
compare : a, a -> [LessThan, GreaterThan] where a implements Sort
looks like a good start! I think it needs to return Equals as one of the options
of note, for integers, comparing them and getting 0 for equals, 1 for greater than, and 2 for less than (which is how the tags will line up due to alphabetical order) can be done like this:
int compare(int a, int b) {
return (a > b) + 2 * (a < b);
}
(the multiply by 2 will become a bit shift)
I did a similar trick in the Wasm back end! Here's how I wrote it in the code comment:
// (x != y) as u8 + (x < y) as u8
https://github.com/roc-lang/roc/blob/main/crates/compiler/gen_wasm/src/low_level.rs#L1190-L1194
I just want to add a couple of points here on the actual signature of the compare function used for sorting. Basically, instead of
compare : a, a -> [LessThan, Equal, GreaterThan] where a implements Sort
I would propose something like
compare : { a : a, b : a } -> [ABeforeB, Equal, BBeforeA] where a implements Sort
The underlying reason is that, after years of doing Elm dev, I still cannot grok the use of LT/EQ/GT when it comes to sorting, and the previously suggested signature has the same problems I think. These are the major issues:
LT and GT don't say anything about which element is lesser or greater than the other. This must be read in documentation or inferred from context.LT and GT definitely expose an ordering, but they also do so in a very abstract way. For example, do you assume that a given ordering starts low and goes high or the other way? Depending on that, LT and GT may be the reverse of what you think they are. I think this also depends somewhat on whether people mentalise their lists as going from top to bottom, left to right, etc.LT and GT even mean in relation to strings and other types? Similar to how you can't use the < and > operators on strings in Roc, using LT and GT quickly becomes confusing on types where "lesser" and "greater" don't have inherent meaning.My proposal is basically to force the two inputs being compared to be named a and b, and then tailoring the return values to say if either a or b comes first. That way I think the signature becomes less abstract, and more tailored to sorting. Which also seems like this signature is being applied to here.
I’ve found LT/EQ/GT to be confusing also so I like this direction! I wonder if there is a way to make the API slightly simpler. Regardless, I think it is an improvement
I’m happy to see even better suggestions if anyone’s got some :smiley:
Would a record argument ({ a : a, b : a }) have a performance cost?
Not one you should think about/be worried about. Especially not for a small record like this.
I like this proposal in general, I would however use e.g. x for the type variable to avoid having two a's with different meanings:
compare : { a : x, b : x } -> [ABeforeB, Equal, BBeforeA] where x implements Sort
Is there a convention for how to choose the type variable for generics in Roc? I feel like I’ve seen a used the most, so if that’s the convention, maybe we should find different names for the two items being sorted instead. x and y for example? Then a can still be used for the type.
It can be compare : a, a -> [LeftFirst, RightFirst, Equal] where a implements Sort
I do like that that signature is simpler, but I also really like that the return type names which input it’s referring to more directly :thinking:
Okay, so typing out a more involved example like so, where the return type either refers to the input by name or by position:
myComparison = \this that ->
when (this, that) is
(Unavailable, Unavailable) ->
Equal
(Unavailable, _) ->
ThatFirst
(_, Unavailable) ->
ThisFirst
(Available thisData, Available thatData) ->
when (thisData, thatData) is
(Ok this_, Ok that_) ->
Num.compare this_ that_
(Err _, Err _) ->
Equal
(Err _, _) ->
ThatFirst
(_, Err _) ->
ThisFirst
myComparison = \this that ->
when (this, that) is
(Unavailable, Unavailable) ->
Equal
(Unavailable, _) ->
RightFirst
(_, Unavailable) ->
LeftFirst
(Available thisData, Available thatData) ->
when (thisData, thatData) is
(Ok this_, Ok that_) ->
Num.compare this_ that_
(Err _, Err _) ->
Equal
(Err _, _) ->
RightFirst
(_, Err _) ->
LeftFirst
I think I actually prefer your suggestion @Kiryl Dziamura :smiley:
I am very much not a fan of the ABeforeB stuff. LeftFirst and such sound fine.
In either the ABeforeB or LeftFirst varieties, should we change terms like 'ascending' and 'descending' too? I think 'ascending' and 'descending' become ambiguous when we move away from using 'less than' and 'greater than' to describe ordering of elements, no?
do you have any ideas for alternatives?
What's the value of ascending and descending being baked into the sorting API though?
I think it's fine to have, say, Num.compareAscending and Num.compareDescending functions, which are implemented in terms of LeftFirst and friends. But the "sorting primitives" don't need to encode that information in my mind. First off, the code can't probe a given sorting of it's ascending or descending based on this information anyway, so it's only there to aid the developer. But as I wrote previously, I think they hurt more than they help in this regard.
what about [Keep, Swap]?
so like if I call it with foo and bar as the argument, Keep means keep them in that order, and Swap means swap the order
I guess that misses the semantic distinction of "they are equal, so do whatever you want" versus "they must be swapped"
so maybe like [Keep, Swap, Same] or something?
or like:
areSorted : a, a -> [Yes, No, Same]
where a implements Sort
Anton said:
do you have any ideas for alternatives?
Not really, but I thought it'd be a downside to moving away from LessThan / GreaterThan if we couldn't come up with alternatives for ascending and descending, as we'd remove ambiguity in one place at the cost of adding it somewhere else.
I kind of like Kasper's idea though, of building it into the function name.
Applying that to Richard's proposal:
areSortedAsc : a, a -> [Yes, No, Same]
or a variation:
pickSmallest : a, a -> [Left, Right, Same]
maybe "are equal" is clearer than "same"
areSortedAsc : a, a -> [Yes, No, AreEqual]
where a implements Sort
I'd like to avoid "left" and "right" as an accessibility consideration, because some people mix up left and right
I don't think "ascending" and "descending" are helpful terms for a general type that implements sort. e.g. I could make an opaque wrapper around U32 which sorts higher numbers before lower ones, and now "ascending" sort for my type results in a list of numbers in descending order. Maybe instead we could have descriptive specialized variants like List.sortAsc : List (Num a) -> List (Num a) or List.sortAsciibetical : List Str -> List Str
What I like about @Kasper Møller Andersen proposal is that it explicitly rejects ordering in arguments. Instead of implicitly ordered (a, b) he suggests explicitly unordered {a, b} so it’s much simpler now to answer the question “how these unordered items are related?”: the answer does not require any hidden information from arguments ordering then. Asc/desc, less/greater, areSorter - all of them imply the args order is important.
In my proposal, I tried to get rid of the layer of indirection introduced by a/b. So LeftFirst means “left argument is first”. A11y makes sense, but FirstFirst and SecondFirst would be much worse :grinning_face_with_smiling_eyes:
a reason I prefer something like areSortedAsc is that it's easier for me to see the sorting visually, e.g. if a and b are numbers then:
areSortedAsc = \a, b ->
if a |> isLessThan b then
Yes
else if a == b then
AreEqual
else
No
compared to:
compare = \{ a, b } ->
if a |> isLessThan b then
ABeforeB
else if a == b then
AreEqual
else
BBeforeA
I would also like to keep ascending/descending completely out of the API if possible, as I think that will become confusing quickly for anything that's not numbers.
I think it's worth asking what kind of sorts people will write out there. Presumably the basic sorting of numbers and strings will be covered by the standard library/some popular package. Skimming over our Elm code base at work (which of course biases uses towards web apps), the main uses for custom sorting functions goes something like:
Which is why I previously wrote this example as a representative custom sorting:
myComparison = \this that ->
when (this, that) is
(Unavailable, Unavailable) ->
Equal
(Unavailable, _) ->
RightFirst
(_, Unavailable) ->
LeftFirst
(Available thisData, Available thatData) ->
when (thisData, thatData) is
(Ok this_, Ok that_) ->
Num.compare this_ that_
(Err _, Err _) ->
Equal
(Err _, _) ->
RightFirst
(_, Err _) ->
LeftFirst
What I like about having LeftFirst and RightFirst here is that I usually don't have to relate back to function signature while I'm pattern matching, because looking at branches like:
when (this, that) is
(Unavailable, Unavailable) ->
Equal
(Unavailable, _) ->
RightFirst
the pattern match itself also tells me the order. I.e. I've pattern matched that the left item is Unavailable, so using RightFirst makes very good sense without reading anything else. In this case I also prefer LeftFirst and RightFirst over being ABeforeB and BBeforeA, because whether I'm looking at a or b gets lost when I only look at my pattern match branch. I have to go to the top of my pattern match to read which one is a and which one is b.
If we want to avoid using left and right for accessibility (which is fair), then I think I would revert to preferring my original proposal.
(also, @Richard Feldman, what language do you specify in your code blocks to get good syntax highlighting for Roc code? :smiley:)
I alternate between perl and haskell depending on whether there are comments :laughing:
I wonder how we get more syntax highlighting in there...we got it on GitHub, should be possible to get it on Zulip somehow, right? :grinning_face_with_smiling_eyes:
If we want to remove the asceding/descending terminology, maybe going back to Richard's areSorted (without the Asc):
areSorted : a, a -> [Yes, No, AreEqual]
sort : List a -> List a
sortReverse : List a -> List a
areSorted is self-explanatoryOne more thing: I know we are discussing implementing a function as part of an ability here, which means it will have a name when users implement it too. But are users able to pass a sorting as just a closure? Because if so, we can't rely too much on the function name to convey meaning to users. Because they either won't give it a name at all, or they will give it whatever name they think makes sense for them at least.
So something like areSorted : a, a -> [Yes, No, AreEqual] won't work well for examples like:
List.sortWith myList \this that ->
when (this, that) is
(Unavailable, Unavailable) ->
AreEqual
(Unavailable, _) ->
No
(_, Unavailable) ->
Yes
(Available thisData, Available thatData) ->
when (thisData, thatData) is
(Ok this_, Ok that_) ->
Num.compare this_ that_
...
because Yes and No only make sense when you have a function named areSorted attached to them. So in that sense, Keep and Swap are definitely better, because they imply an ordering without needing a function name to convey context.
Maybe the return type should be [AlreadyOrdered, SwapOrder, EquallyOrdered] to convey enough context without a function name?
that's interesting!
maybe this?
areSorted : a, a -> [AreSortedForwards, AreSortedBackwards, AreEqual]
kinda wordy, but these functions get written by hand extremely rarely, so maybe fine
What about
areSorted : a, a -> [Sorted, Unsorted, Equal]
Or
areOrdered : a, a -> [Ordered, Unordered, Equal]
From what I understand, there are two possible questions to ask:
In this case, it‘ much simpler to think about an infix operation, so the LT, GT makes sense when to put them between the args (a GT b). Abstract alternative naming for tags might be [Precedes, Succeedes, Equals]
Here, we check the order status of the pair as a collection of two items. So the possible tags might be [Ordered, Unordered, Equal]
I suppose we’re leaning toward the second question in this discussion, right?
Oh, good thought considering sortWith.
Another idea, we could make sortWith take a record so we can use areSorted to name the compare function:
sortWith : { items: List a, areSorted: a, a -> [Yes, No, AreEqual] } -> List a
Though it means you can't use sortWith in a pipeline anymore, so meh :/
Kiryl Dziamura sagde:
From what I understand, there are two possible questions to ask:
How does the first arg relate to the second one?
...What is the order status in this pair of args?
I like thinking about it this way, but I think there's a third question we could also ask:
Granted, this could also be considered a sub-question of the first one, but I also think it's the simplest question we can actually ask. Breaking down the material we have so far, I think we can categorize them something like:
LeftFirst)Swap, Ordered, AreSortedForwards, etc.)LessThan, GT, etc.)I've also sorted ( :drum:) them by what I personally consider their mental load to be. I put the "point to an element" at the top, both because it requires the least amount of context to decide what's right, and also because most of the sorting implementations I've seen fall very nicely into that category.
I'm on board with dropping LeftFirst and RightFirst for accessibility, but I really want to try and find a different solution which could in that category instead. I would also love to hear from anyone who disagrees with my ranking of mental load, since that's very biased towards what I've personally experienced and seen my coworkers do.
One potential solution I thought of is:
sorter : a, a -> [BumpFirstElement, BumpSecondElement, Equal]
And in use, it could look like:
List.sortWith myList \a, b ->
when (a, b) is
(NotPresent, NotPresent) ->
Equal
(NotPresent, _) ->
BumpSecondElement
(_, NotPresent) ->
BumpFirstElement
(Present _a, Present _b) ->
Str.sorter _a _b
Or for a numeric sort:
sorter= \a, b ->
if a |> isLessThan b then
BumpSecondElement
else if a == b then
Equal
else
BumpFirstElement
I also still think that very few people will ever actually write a numeric sort, as they will just use what's in the standard library. Sorting on custom types in a pattern match is around 90% of the sort implementations I've seen in Elm (on the 20-30 different front end sortings we have implemented in our code base)
Although definitely this isn’t the most important API to work on necessarily, and I think most suggestions so far have been a big step up from the old API. So it’s not a hill I’m gonna die on :smiley:
I don't really get the Bump word in this context, despite hearing it many times. I'm not a native English speaker. To me it means "make that element higher up", but that assumes higher up means ...err.. later in the sorted sequence? From the example implementation that is correct, but I wouldn't have figured that out.
Out of these, I like Kiryl's suggestion the most
areSorted : a, a -> [Sorted, Unsorted, Equal]
I like it the most when the Tag directly describes the relation between the two value, but the use of a record to name the two vars feel too meticulous. The user may also want to name the vars differently especially when the sort function is defined not as an anonymous fn like with List.sortWith. For an opaque User type, I prefer
sort = \user1, user2 -> ...`
compared to
sort = \{a: userA, b: userB} -> ...
How do you like [FirstBeforeSecond, SecondBeforeFirst, Equal]?
It would keep the clarity of [ABeforeB, ...], no Asc / Desc, all without the extra record. It would be understandable in an anonymous fn as well as in an ability implementation.
sort= \enemy1, enemy2 ->
if enemy1.health |> isLessThan enemy2.health then
FirstBeforeSecond
else if enemy1.health == enemy2.health then
Equal
else
SecondBeforeFirst
Yeah, with Bump I was looking for a verb that describes moving something forward. I was toying with Promote too, but also felt it was too far away from sorting to be good.
Norbert Hajagos said:
How do you like
[FirstBeforeSecond, SecondBeforeFirst, Equal]?
A lot :smiley:
Just caught up with this thread....man naming is hard....
Also, I have 2 big thoughts here:
<, ==, >=, etc. Those operators are pretty much always clear. Of course they can be more verbose depending on the use. Cause you would have multiple checks instead of a single tag. Though they also would be tailored to your exact use. ...
Obviously the spaceship operator <=> is the only correct way to do this :stuck_out_tongue_wink:.
Brendan Hansknecht said:
- A lot of this discussion makes me feel like we should avoid an API that uses tags at all and instead only allow comparison operators like
<,==,>=, etc. Those operators are pretty much always clear. Of course they can be more verbose depending on the use. Cause you would have multiple checks instead of a single tag. Though they also would be tailored to your exact use.
Could you write an example of how the operators would be used? I’m not really following how you want to apply them :smile:
My concern is that despite LT and GT being common, they're not the best for reasoning about order. Especially considering comparison operators which are perfect for numbers comparison but confusing for anything else. I mean, in terms of friendliness, it's not obvious what a < b means if a and b aren't numbers, you have to be aware of the ability implementation to use the operators.
But I agree, other might be odd for experienced programmers
Brendan Hansknecht sagde:
- Some of these apis are getting pretty odd which may be jarring and quite confusing to must users. Despite LessThan and Greater than not being great terms, they are common and learnable.
I do agree it's valuable to let people reuse their knowledge from elsewhere, but I would say the LessThan/GreaterThan idiom is not as common as people might think, nor as learnable. Of course, the big elephant in this API room is the use of -1, 0, and 1 to mean these things (or just a boolean, if you're C++).
Using these integers is extremely common of course, (JavaScript, Python, Java, C, C#, Kotlin, etc.), and probably covers the vast majority of code running in production today.
It's not totally clear how much use each language has, but I don't think it's controversial to say JS is the biggest. And it does not use the Lesser/Greater terminology to describe these integers. Instead, MDN says:
- A negative value indicates that
ashould come beforeb.- A positive value indicates that
ashould come afterb.- Zero or
NaNindicates thataandbare considered equal.
Which I think aligns much better with something like FirstBeforeSecond for example than with LessThan.
Secondly, if the API was to be tags named something like LT/EQ/GT, then I think I would personally prefer having the integers actually. Aside from aligning closer with existing languages, I think it also has the advantage that it is completely unreadable if you don't already know about it, which basically forces you to read the documentation.
Having said that, I still agree with @Kiryl Dziamura that the whole Lesser/Greater thing is just generally not that great for describing orderings:
<, LT, etc.) for ordering actually makes them ambiguous. That's because, in programming, they are not assertions about the actual ordering of the things being compared, but they are boolean tests on a given ordering. But when we use them to declare ordering, they must be assertions. So outside of sorting, they mean one thing, and inside sorting another.I agree we've gone over a wide selection of API possibilities here, but I think we would benefit from just trying something out at this point. The language is still unstable, so maybe just pick one, and change it if it doesn't work out?
For my money, it seems something like [FirstBeforeSecond, SecondBeforeFirst, Equal] addresses most concerns out there, so I would pick that personally.
Sorry to keep banging on this drum, I'm just writing down my thoughts to get them out of my head here :smile:
Most languages have the semantic of their sorting mean "first arg is lesser/greater than second arg", and their return types should basically be viewed as an infix operator to make sense. But since Roc doesn't allow people to make/override infix operators, designing a return type a la LessThan which should be considered as an infix operator doesn't rely mesh well with the language.
.
.
Summary of entire discussion
I worry that we are generally waaay overthinking this change, and we're getting stuck in analysis-paralysis here with little benefit to come out of it, so I want to try and summarize what I think is for and against:
-1, 0, 1 as return types for sorting is fairly ubiquitous in most programming languages to mean something like "a less than b", "equal", and "b less than a"I still think [FirstBeforeSecond, SecondBeforeFirst, Equal] fits the bill perfectly. The overall shape of the function signature remains the same as everywhere else, it translates easily to the concepts people already have from other languages, and it gets rid of the ambiguities of "lesser" and "greater" as pertaining to sorting.
If anybody has arguments against [FirstBeforeSecond, SecondBeforeFirst, Equal], I would love to hear them. Because right now it just feels we are worried about making this choice, and I can only speculate why. So I'd love to hear what is holding people back here :blush:
Though a bit verbose. I agree with your selection. I think it is the easiest to just read and understand without needing to think about it. That said, I don't think we should use Equal. I think we should use Equivalent. The values might fail an a == b test despite getting that tag result.
Also, we aren't against custom infix as a whole. For example, you can override == for an opaque type. That said, <= and friends make a lot less sense to override on most types that aren't numerical.
So we are a lot more hesitant to enable those infix operators to be overridden. If we did allow them to be overridden, it would probably be via some other ability that forces implementing the full set of numeric math operations or similar. I think this is quite unlikely though cause it leads to a lot of confusing code when used on non-numeric types.
Kasper Møller Andersen said:
If anybody has arguments against
[FirstBeforeSecond, SecondBeforeFirst, Equal], I would love to hear them. Because right now it just feels we are worried about making this choice, and I can only speculate why. So I'd love to hear what is holding people back here :blush:
I'm personally okay with trying this out (I agree with Brendan's point about Equivalent), but I heard a good rule from a language designer (Guido maybe?) which is basically "if you can't decide how to proceed, always default to status quo even if it's not what anyone wants individually"
the reasoning is basically that changes always have a long term cost - either accumulated API surface area if you don't want to break backwards compatibility, or else accumulated breaking change fatigue
I think it's fine in this particular case to give it a shot, but I wanted to note that in general I think it's a good idea to be mindful that there's a good argument for staying with the status quo in the face of indecision as to which specific alternative to try
My 2c is the well known thing (Lt/Gt/Eq) here is better. it’s familiar and I dont think it’s ambiguous (I’m not sure how else you could interpret a, b -> Lt/Gt/Eq other than as a relationship between a and b). Sorting/ordering inherently implies an order, and to me every other option seems like a different set of names for these kinds ordering binary operators.
I second conservative naming.
Agreed on Equivalent over Equal as well :smile:
Ayaz Hafiz sagde:
I’m not sure how else you could interpret a, b -> Lt/Gt/Eq other than as a relationship between a and b
There's a few different things which make them confusing:
a < b, where LT and co. mimick the mathematical operators, but I find that confusing still. It means that a sorting function is basically a function which takes two arguments, and determines which infix operator to apply to the arguments based on their values, which is not a pattern I recall having seen anywhere else.< operator is usually a boolean operation, which tests if a given relationship is true or false. But with LT, we want it to mean that we are asserting a given relationship, rather than testing it.LT is supposed to be read, it's not clear to me that "less" means "forward" in a sorting. If I want to sort in descending order, I basically have to state that 5 LT 4, which flies in the face of how the normal < operator works.The fact that LT/GT/EQ really wants to resemble </>/== here is a detriment to how easy it is to understand in my book.
Ayaz Hafiz sagde:
to me every other option seems like a different set of names for these kinds ordering binary operators.
You're right that it's just a different set of names, but that's a feature! :smiley: Everything still works as expected, but rather than the API relying on being a metaphor for mathematic ordering, which works for some but causes confusion for others, the new names just makes everything more explicit.
I find it similar to how Roc doesn't have a List.filter function, but List.keepIf instead. It's "just" a rename of course, but it's much clearer still. As someone who has been programming with functional programming of various forms for 10+ years, I still find myself regularly looking up whether returning True in my filter predicate means the element is kept or discarded.
Sorry to be late to the party. I really like the [LessThan, GreterThan, Equals] tags. To me at least, I do not feel there is a mental load in knowing that means "a is 'LessThan' b". In languages with compare returning -1, 0, 1, I translate in my head to LessThan, Equal, GreaterThan. I completely agree on the using Equivalent instead of Equals.
Just want to suggest that if we ever go with the [Sorted, Unsorted, Equivalent] idea, we might want to consider Reversed instead of Unsorted.
To me "unsorted" suggests a sort of neutral state of not having been considered yet. Rather than that the items are out of order.
I don't know if this was ever implemented or not, but for anyone who doesn't know or remember the discussion, there was some consensus to try replacing the [LessThan, GreaterThan, Equals] ordering tags with [FirstBeforeSecond, SecondBeforeFirst, Equivalent] as an experiment, but many people still prefer the old names. There's also a slightly longer summary for the curious, as well as my thorough description of why I find the LT/GT/EQ API poor.
Anyway, the reason I'm returning to this thread is because I had an experience that just reiterated my desire for this change:
When I started this thread, I found the LT/GT/EQ API confusing. Going through the discussion, I learned all the ins and outs of it though, and though I still stood by my points, I at least understood the old API properly. However, some months later, I had to review an Elm PR where a new ordering had been implemented with LT/GT/EQ, and I found myself confused again. I couldn't recollect the understanding I had built up previously.
Which reminded me of a point a lecturer at university really wanted to hammer home: bad UX is bad forever. That is, they had conducted studies on people using software with poor UX, and found the users had barely improved their ability to overcome the poor UX a year later. The users weren't simply getting used to the poor UX and working around it, they were continuously penalized by it. And I think the same thing happened to me with this API.
Thanks for coming to my TED talk :smile:
I don't know if this was ever implemented or not
I don't think it was. [FirstBeforeSecond, ... is good for me.
@Kasper Møller Andersen This seems to me like the equivalent of a "code smell", maybe a "concept smell" or "design smell", and should thus have some attention paid to it.
Can you elaborate @Rick Hull?
However, some months later, I had to review an Elm PR where a new ordering had been implemented with LT/GT/EQ, and I found myself confused again. I couldn't recollect the understanding I had built up previously.
Just affirming that when this happens, it's worth considering if there isn't a nicer or more intuitive design hiding somewhere. Matching user intuition is not the highest goal, and there are always tradeoffs.
Last updated: Jun 16 2026 at 16:19 UTC