exhaustive record destructuring · ideas

in Rust, if I destructure a struct, by default I get an error if I leave off any fields. I can opt out of that error by adding .. as one of the fields, at which point the destructure works like how Roc's record destructures work.

I've found this to be an annoying default, but I do occasionally want it. For example, I'm about to write a function that returns the total length in bytes of all the fields in a struct (they're all collections) so I can know how much space to preallocate for copying them. In that case, the exhaustive destructuring is a nice way to make sure that if I add a new field, I don't forget to add it to that calculation.
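A minimal Rust sketch of that use case, for illustration only: the `Buffers` struct, its fields, and `total_len_bytes` are hypothetical names, not code from the project being described. Because the pattern has no `..`, adding a field to the struct stops the function from compiling until the new field is listed.

```rust
/// Hypothetical struct whose fields are all collections.
struct Buffers {
    names: Vec<String>,
    ids: Vec<u32>,
    payload: Vec<u8>,
}

/// Total size in bytes of the elements stored in every field.
fn total_len_bytes(buffers: &Buffers) -> usize {
    // No `..` in this pattern, so it must name every field.
    // Adding a field to `Buffers` turns this line into a compile
    // error until the new field is included in the sum below.
    let Buffers { names, ids, payload } = buffers;

    names.iter().map(|s| s.len()).sum::<usize>()
        + ids.len() * std::mem::size_of::<u32>()
        + payload.len()
}
```

Writing `let Buffers { names, ids, .. } = buffers;` instead would compile even after a new field is added, which is the silent omission the exhaustive form guards against.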

Richard Feldman (Dec 16 2023 at 23:35):

{ foo, bar, ..{} }

Richard Feldman (Dec 16 2023 at 23:36):

David Mell (Dec 16 2023 at 23:45):

Would this also work with tuples? (I guess the syntax would be ..() in that case.)

Agus Zubiaga (Dec 17 2023 at 00:08):

Yes! This would be great. Part of me wants it to be opt-out like in Rust, but I’m happy to get it either way :big_smile:

Agus Zubiaga (Dec 17 2023 at 00:17):

Here is an example of a function where I need to do something with all the fields of a record

Agus Zubiaga (Dec 17 2023 at 00:19):

I’d also use it in a lot of other situations where I’m likely to have to add code if I add a new field, even if I don’t need all of them

Richard Feldman (Dec 17 2023 at 00:20):

the thing is, in Rust I am writing .. almost always, like 95% of the time - it's really annoying :sweat_smile:

Agus Zubiaga (Dec 17 2023 at 00:23):

Yeah, I get it. When it’s manageable, I like to do { usedField, unusedField: _ } instead of .. because it gives me peace of mind to know the compiler will let me know if I add a new field. I know it’s not for everyone, though :grinning:
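For comparison, the two styles might look like this in Rust. This is only a sketch: `Config` and both functions are made-up names for illustration.

```rust
/// Hypothetical record with one field the functions below do not use.
struct Config {
    host: String,
    port: u16,
    verbose: bool,
}

/// Ignores everything else with `..`; this keeps compiling
/// unchanged if `Config` later gains another field.
fn address_lenient(config: &Config) -> String {
    let Config { host, port, .. } = config;
    format!("{host}:{port}")
}

/// Names the unused field and discards it with `_`; the pattern
/// stays exhaustive, so a new field in `Config` is a compile error here.
fn address_strict(config: &Config) -> String {
    let Config { host, port, verbose: _ } = config;
    format!("{host}:{port}")
}
```

Both functions return the same string today; they differ only in whether the compiler flags them when the struct grows.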

Agus Zubiaga (Dec 17 2023 at 00:34):

It depends a lot on the use case. You’re probably right about the best default.

Brendan Hansknecht (Dec 17 2023 at 01:21):

Pearce Keesling (Dec 17 2023 at 04:28):

I think defaulting to non exhaustive makes more sense in roc where type inference and structural typing are the norm. In rust since it is all concrete types you know exactly what fields to expect in a struct and it is unlikely to grow/shrink as much up the stack

Eli Dowling (Dec 18 2023 at 06:14):

{a,b} :=  {a:1,b:2,c:3} # error: record not fully destructured
{a,b} =  {a:1,b:2,c:3} # all good

If you did go for a symbol, I'd definitely prefer a simpler symbol. I think adding another record like syntax into the mix is confusing, and looks a bit weird inside tuples
Eg:

{a,b,!}

{a,b,~}

Basically the symbol would be the opposite of "_" instead of "and then the rest" it means "and there is no more" I think it's quite intuitive actually.

Anton (Dec 18 2023 at 09:49):

Anton (Dec 18 2023 at 09:50):

It's very unlikely that you want a variable named nothingElse :p so it's fine in that respect

Kevin Gillette (Dec 22 2023 at 22:20):

I think keywords are more elegant when they're uniform-case and avoid underscores.

nothing-else looks a lot better to me than nothingElse or nothing_else, but it's still suspect. If else weren't already a keyword, then nothing-else would be precariously similar to a subtraction expression.

Kevin Gillette (Dec 22 2023 at 22:26):

I like the suggestion of ! the most so far since it has well understood meaning and the behavior has a decent chance of being inferred by the reader even if they don't know about that feature.

Without explanation, I'd have no idea what ..{} means. {} as a type param to close a record is perhaps self-consistent, but imo not all that intuitive, so I don't believe we should expand use of that syntax into more areas.

Richard Feldman (Jan 02 2025 at 22:10):

one potentially interesting design: we could make it so that structural record destructures are non-exhaustive, but custom record destructures work the way they do in Rust

Richard Feldman (Jan 02 2025 at 22:11):

{ x, y } = # not exhaustive, like today

Point.{ x, y } = # exhaustive, like Rust

Point.{ x, y, .. } = # not exhaustive, like Rust

Richard Feldman (Jan 02 2025 at 22:13):

then if desired, we could do something like this for exhaustive structural records:

{ x, y, ..{} } = # exhaustive

Richard Feldman (Jan 02 2025 at 22:13):

which doesn't look the prettiest, but also seems like it would be extremely rare to want in practice

Anthony Bullard (Jan 02 2025 at 22:46):

Richard Feldman (Jan 02 2025 at 22:50):

Kilian Vounckx (Jan 03 2025 at 07:36):

Dawid Danieluk (Jan 16 2025 at 15:50):

Another idea, use ellipsis ... operator (there were some discussions about having it right?).
Want to use rest? { x, y, ..rest }
Don't care about it? { x, y, ... }

If ... were introduced, then I think this would be a pretty nice use of it (as in "I don't care about the 'rest' right now"), so it's conceptually similar to todo!(), with the nice side effect that changing ... into ..rest requires fewer keystrokes and looks similar.

It doesn't introduce new concepts and would reuse something already in the language (assuming that ellipsis will be added).

Sam Mohr (Jan 16 2025 at 18:13):

We already plan on having { x, y, .. } meaning open record and { x, y } meaning closed record

Anthony Bullard (Jan 16 2025 at 19:13):

I think Dawid is talking about taking the "rest" of the open record and doing something with it

Anthony Bullard (Jan 16 2025 at 19:14):

Sam Mohr (Jan 16 2025 at 19:28):

Anthony, I don't understand how this would help that. Could you give an example?

Sam Mohr (Jan 16 2025 at 19:29):

Also, I think supporting .. and ... in the same location could lead to some very surprising code breaks

Sam Mohr (Jan 16 2025 at 19:30):

Though hopefully the presence of a warning saying "you wrote a ..., remove it eventually" would help

Sam Mohr (Jan 16 2025 at 19:30):

Anthony Bullard (Jan 16 2025 at 22:29):

I think the idea is, ... means "there might be other stuff here, but I don't care about it", and ..<IDENT> means "there might be other stuff, and if so, put that other stuff into a record and assign it to the variable IDENT".

Sam Mohr (Jan 16 2025 at 22:30):

{ x, y, .. } = { x: 123, y: 456, z: 789, foo: "bar" }

Anthony Bullard (Jan 16 2025 at 22:31):

Anthony Bullard (Jan 16 2025 at 22:32):

{ x, y, ..rest } = { x: 123, y: 456, z: 789, foo: "bar" }
expect x == 123
expect y == 456
expect rest == { z: 789, foo: "bar" }

Anthony Bullard (Jan 16 2025 at 22:32):

{ x, y, ... } = { x: 123, y: 456, z: 789, foo: "bar" }

Sam Mohr (Jan 16 2025 at 22:33):

{ x, y, ..rest } = { x: 123, y: 456, z: 789, foo: "bar" }
expect x == 123
expect y == 456
expect rest == { x: 123, y: 456, z: 789, foo: "bar" }

We don't want to have rest only contain the uncaptured fields because that requires us to create a new record, which is inefficient if done a lot

Anthony Bullard (Jan 16 2025 at 22:33):

But that doesn't really jibe with how similar features (in the few languages that have it) work

Sam Mohr (Jan 16 2025 at 22:34):

Anthony Bullard (Jan 16 2025 at 22:34):

Anthony Bullard (Jan 16 2025 at 22:35):

But I would be INCREDIBLY surprised that ..rest didn't only give me back a new struct with the uncaptured fields

Anthony Bullard (Jan 16 2025 at 22:36):

{ x, y, ..rest } = { x: 123, y: 456, z: 789, foo: "bar" }
expect x == 123
expect y == 456
expect rest == { x: 123, y: 456, z: 789, foo: "bar" }

{ x, y, ..rest } = some_func()
expect x == 123
expect y == 456
expect rest == { z: 789, foo: "bar" }

Sam Mohr (Jan 16 2025 at 22:36):

You're right that it's different for us to have rest capture everything (a.k.a. be a reference to the original record), but if the default from other languages makes it easy to write inefficient code, we should help people write more efficient code

Anthony Bullard (Jan 16 2025 at 22:37):

Sam Mohr (Jan 16 2025 at 22:37):

Anthony Bullard (Jan 16 2025 at 22:37):

Sam Mohr (Jan 16 2025 at 22:37):

Anthony Bullard (Jan 16 2025 at 22:37):

Sam Mohr (Jan 16 2025 at 22:38):

Anthony Bullard (Jan 16 2025 at 22:38):

Anthony Bullard (Jan 16 2025 at 22:39):

So if I call Str.split_first I have to understand there's a new stack-allocated struct (tuple), and probably two new heap-allocated strings

Sam Mohr (Jan 16 2025 at 22:39):

Okay, just to make sure we're on the same page, if { x, y, ..rest } returned only the other fields into rest, you think that .. would handle what we want instead of ...?

Anthony Bullard (Jan 16 2025 at 22:39):

Sam Mohr (Jan 16 2025 at 22:40):

There aren't heap-allocated strings, I think. We should just take references to the slices we want

Anthony Bullard (Jan 16 2025 at 22:40):

Richard Feldman (Jan 16 2025 at 22:40):

both strings actually share references to the original allocation, so that one happens to be efficient :big_smile:

Anthony Bullard (Jan 16 2025 at 22:40):

Really? We just create a new seamless slice over the original? Basically a new view?

Anthony Bullard (Jan 16 2025 at 22:41):

Sam Mohr (Jan 16 2025 at 22:41):

If you wanted to make sure that rest was empty, you could do { x, y, ..{} }, but { x, y } does that, so no need for .._ or ..{}

Anthony Bullard (Jan 16 2025 at 22:41):

Anthony Bullard (Jan 16 2025 at 22:42):

Sam Mohr (Jan 16 2025 at 22:43):

Another reason I think we shouldn't do ... here is because I think it should unambiguously refer to "code I haven't written yet"

Sam Mohr (Jan 16 2025 at 22:43):

Anthony Bullard (Jan 16 2025 at 22:43):

Anthony Bullard (Jan 16 2025 at 22:44):

Sam Mohr (Jan 16 2025 at 22:45):

Brendan Hansknecht (Jan 17 2025 at 20:39):

I was the original person to really push against struct mutation syntaxes (that actually change the fields contained) because they are inefficient. I honestly think that was likely a mistake at this point. Except for very large structs or very hot loops, no programmer will actually care about the efficiency here. Same with tag conversions from one union to another.

I'm not saying we should just enable these things, but I really think we have started to weigh them too heavily. Yes, small hits across an entire program can really hurt. Same in a hot loop. But being able to just use the language and get things done is more important to most people.

The big question in my mind is: when a user has completed their app and goes to optimize, will they feel that they need to remove the feature in many locations, or that using it was a mistake?

For most of these features, I think that almost always the user will not care and these features are unlikely to be at the bottleneck. If they are a bottleneck it will be in a few limited hot loops.

That said, still really hard to gauge. A single large struct destructuring might lead to tons of data copying due to destructuring and a ton of extra refcount updates. So it could be very heavy.

Brendan Hansknecht (Jan 17 2025 at 20:41):

At the same time, most low level languages will never hit this class of issues due to only having nominal types. It is structural types that specifically opt into these kinds of questions and problems.

Brendan Hansknecht (Jan 17 2025 at 20:42):

Really hard to pick a balance cause roc is trying to live in two different worlds. One where the feature is a no brainer and another where the feature is questionable at best.

Richard Feldman (Jan 17 2025 at 21:25):

another factor is that LLVM in a lot of cases may end up breaking up the structs anyway

Richard Feldman (Jan 17 2025 at 21:25):

like it's not as if we are definitely going to end up making an actual whole new struct, after all the optimization passes have happened

Brendan Hansknecht (Jan 18 2025 at 00:27):

I'm not sure how likely this is in practice, but yeah, theoretically a completely local struct could be split into many separate variables. I'm not sure I've ever seen it in our optimized IR though.

Brendan Hansknecht (Jan 18 2025 at 00:29):

Brendan Hansknecht (Jan 18 2025 at 00:30):

but also, what is the actual cost? It is just moving a handful of bytes from one stack offset to another

Sam Mohr (Jan 18 2025 at 00:41):

I think the perf cost is not really a thing, I'm more worried about the specialization cost (more records, longer compile times). But that also should be negligible

Brendan Hansknecht (Jan 18 2025 at 00:41):

Oh, actually, I think I am wrong here and the promotion happens more often than I realize. It just does it in a weird way that still leaves around a lot of allocas even if structs are broken up and mostly treated as scalars.

Brendan Hansknecht (Jan 18 2025 at 00:41):

Brendan Hansknecht (Jan 18 2025 at 00:42):

At a minimum, structs that are local to a function will be split into n alloca instructions, one for each field. That should make all of this data movement free as long as we don't cross the function boundary.

Brendan Hansknecht (Jan 18 2025 at 00:42):

That said, sometimes llvm gets confused by data movement that involves pointers and allocas.

Brendan Hansknecht (Jan 18 2025 at 00:43):

Not the same for tags though. They are opaque to llvm due to being unions and a major cause of most alloca and data movement that sticks around.

Brendan Hansknecht (Jan 18 2025 at 00:47):

I think generally that won't be an issue. You likely will just get one specialization to a function. The only special case will be if a record is open going into a function and then you return rest still leaving it open. That will then specialize per record type passed in.

fn : {a : Str, ..rest } -> { ..rest }
fn = \{a, ..rest} ->
    dbg a
    rest

Brendan Hansknecht (Jan 18 2025 at 00:49):

Though I guess any function that takes an open record (which is all of them) is already susceptible to this. So no real change.

This will actually specialize just as much as the function above. For every different shaped record passed in, it is a new specialization.

fn : {a : Str} -> Str
fn = \{a} ->
    a

Sam Mohr (Jan 18 2025 at 00:50):

Brendan Hansknecht (Jan 18 2025 at 00:53):

Might even be worth reconsidering record update to allow adding fields (though that has more weird consequences about exactly how it will work)

Richard Feldman (Jan 18 2025 at 01:06):

yeah in general I think we can plan to revisit record features sometime after 0.1.0

Richard Feldman (Jan 18 2025 at 01:07):

definitely not urgent and they can all be nonbreaking changes as long as we've already switched to the .. syntax

Sam Mohr (Jan 18 2025 at 01:09):

Yeah, that last part is the thing I'm gonna try to fix this weekend as the last syntax push. I think { x, y } now being closed instead of open ({ x, y, .. } is now open) could break stuff, so it'd be nice to get that in
