Sort ability for Unions · compiler development

Stream: compiler development

Topic: Sort ability for Unions

Ben Plotke (Jul 21 2024 at 21:29):

I am working with @Jasper Woudenberg on implementing the Sort ability. We were wondering, when sorting Unions, what should the behavior be when the tags are different? Jasper suggested sorting by tag index, which corresponds to alphabetical order. I was thinking we should put empty tags first, but then sorting the rest alphabetically feels inconsistent to me. Does anyone have an opinion on this?

On a separate note, I was under the impression that we have a different back-end for dev builds, so I was surprised to see I could run roc test someCode.roc and the LLVM code gen I had written appeared to work. Does the dev build plug into LLVM? I am correct that we will only need to implement code gen for LLVM and WASM, but not the dev build?

Brendan Hansknecht (Jul 21 2024 at 21:36):

Tags maybe shouldn't autoderive sort?

Brendan Hansknecht (Jul 21 2024 at 21:36):

That or alphabetical is our general opinion of how to expose tags

Brendan Hansknecht (Jul 21 2024 at 21:37):

We have 2 dev backends. One for wasm and one for native

Ben Plotke (Jul 21 2024 at 21:37):

Not auto deriving sort would definitely be easier from an implementation perspective

Brendan Hansknecht (Jul 21 2024 at 21:37):

native dev is used in the repl by default

Brendan Hansknecht (Jul 21 2024 at 21:38):

wasm dev in the web repl

Brendan Hansknecht (Jul 21 2024 at 21:38):

otherwise, everything is llvm currently

Brendan Hansknecht (Jul 21 2024 at 21:38):

Also, for the ability, I think it should autoderive at the mono level. So no need to implement in any of the backends

Ben Plotke (Jul 21 2024 at 21:41):

@Brendan Hansknecht Thank you for taking the time to respond. Unfortunately I am now more confused. We only need to implement for LLVM, and all the other backends get it automatically?

Ben Plotke (Jul 21 2024 at 21:42):

Brendan Hansknecht said:

native dev is used in the repl by default

Does roc dev ... not run the dev build?

Brendan Hansknecht (Jul 21 2024 at 21:45):

nope. just enables expects and dbg statements. roc == roc dev

Brendan Hansknecht (Jul 21 2024 at 21:45):

Eventually it should, but we haven't wired everything up yet

Brendan Hansknecht (Jul 21 2024 at 21:47):

We only need to implement for LLVM, and all the other backends get it automatically?

We need to implement for mono and all backends get it. That is the level before all backends. Same way that Hash or Encode is implemented.

https://github.com/roc-lang/roc/tree/main/crates/compiler/derive/src

Brendan Hansknecht (Jul 21 2024 at 21:47):

I think that should work, but maybe I am missing something. I guess Eq doesn't use derive, so maybe it is special...not sure

Brian Carroll (Jul 21 2024 at 21:49):

There's a diagram of the compiler stages at the bottom of this readme
https://github.com/roc-lang/roc/blob/main/crates%2FREADME.md

Brian Carroll (Jul 21 2024 at 21:50):

Everything goes through mono before it splits into the dev / LLVM backends

Richard Feldman (Jul 21 2024 at 22:28):

Ben Plotke said:

I am working with Jasper Woudenberg on implementing the Sort ability. We were wondering, when sorting Unions, what should the behavior be when the tags are different? Jasper suggested sorting by tag index, which corresponds to alphabetical order. I was thinking we should put empty tags first, but then sorting the rest alphabetically feels inconsistent to me. Does anyone have an opinion on this?

:thinking: what would be the advantage of sorting empty tags first?

Richard Feldman (Jul 21 2024 at 22:29):

I expect the main use for giving Sort to a tag union would be that it would let you put it in a data structure that relies on knowing how things are ordered (e.g. a tree)

Richard Feldman (Jul 21 2024 at 22:29):

so the fastest thing would be beneficial there, and the fastest thing would be to go by index because it's an integer that is right there already! :big_smile:

Ayaz Hafiz (Jul 21 2024 at 23:32):

sorting by index would be really confusing

Ayaz Hafiz (Jul 21 2024 at 23:32):

that would result in two different sort derivations for [A U8, B U8] vs [B U8, A U8] even though those types are equivalent

Ayaz Hafiz (Jul 21 2024 at 23:33):

it gets worse because in the implementation the order is not guaranteed to be consistent across module boundaries

Ayaz Hafiz (Jul 21 2024 at 23:33):

the benefit of sorting alphabetically is that your program semantics also do not change if you refactor the order of your tag union

Brendan Hansknecht (Jul 21 2024 at 23:43):

Don't we sort tags to generate the index? (Otherwise, how would platforms use tags)

Brendan Hansknecht (Jul 21 2024 at 23:44):

Either the exception of null cases of executive types that get the null ptr directly

Richard Feldman (Jul 22 2024 at 01:31):

Ayaz Hafiz said:

the benefit of sorting alphabetically is that your program semantics also do not change if you refactor the order of your tag union

that's a great point!

Richard Feldman (Jul 22 2024 at 01:33):

Brendan Hansknecht said:

Don't we sort tags to generate the index? (Otherwise, how would platforms use tags)
Either the exception of null cases of executive types that get the null ptr directly

that's the default, but yeah like you noted - there are some optimizations we do where we no longer store a discriminant for one tag, which means the others get different indices because there's one missing compared to before

Richard Feldman (Jul 22 2024 at 01:34):

but the "refactoring tag payloads shouldn't change tag sort order in a distant part of the program" is a strong argument to me; that's enough to convince me we should do it strictly alphabetically

Brendan Hansknecht (Jul 22 2024 at 01:42):

Sounds good

Brendan Hansknecht (Jul 22 2024 at 01:43):

Also, alphabetically will only be minorly slower. It will be a switch (probably optimize to a lookup table or no-op) on the tag returning an alphabetic index

Ben Plotke (Jul 23 2024 at 04:44):

Thank you everyone for you responses! Alphabetical it is.

Last updated: Jul 26 2025 at 12:14 UTC