Advanced Record Manipulation · ideas

I've been thinking a bit about which record utility functions and operators would be worthwhile for manipulating records. I documented some of my thoughts below and would love to hear others' perspectives on these -- and on other valuable record utility functions, if there are any. I believe strongly in the value of being able to merge multiple records together but am unclear on the value of the rest of these, especially given the constraints around union types in Roc.

I'm quite inexperienced in Roc and its related languages (like Elm) so would be happy to learn more about why I am thinking about any of these things wrong if I am!

Merge multiple records

It can often be useful to merge many records together, e.g. if you are pulling data from many sources and want to reconcile it into a single record.

» record = { a: 1, b: 2 }
… updates = { a: 3 }
… { record & updates }
{ a: 3, b: 2 } : { a : Num *, b: Num * }

This syntax fits very naturally with the existing record update syntax: { record & a: 3 }. In fact, you might assume the proposed syntax Just Works given the record update syntax (I did!). Beyond the utility of merging records, I think there's value to the proposed syntax just in virtue of the principle of least surprise.

An open question: Should the operands be able to be records of any shape, or should the second operand be a subset of the first operand? E.g., should this work?

» record = { a: 1, b: 2 }
… updates = { a: 3, c: 4 }
… { record & updates }
{ a: 3, b: 2, c: 4 } : { a : Num *, b: Num *, c: Num * }

» record = { a: 1, b: 2 }
… updates = { a: "string", c: 4 }
… { record & updates }
{ a: "string", b: 2, c: 4 } : { a : Str, b: Num *, c: Num * }

I suspect these latter examples have use cases but I haven't thought of any. Since the current record update syntax does not allow you to change the type of fields or add new fields, the proposed syntax probably shouldn't either. But maybe we should eventually include a method in the standard library to do this, like

» Record.merge { a: 1, b: 2 } { a: "string", c: 4 }
{ a: "string", b: 2, c: 4 } : { a : Str, b: Num *, c: Num * }
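As a rough point of comparison, the two behaviours discussed above (a strict update that only touches existing fields, and a permissive merge that may add fields) can be sketched in TypeScript. The names `update` and `merge` are hypothetical helpers made up for this illustration; they are not Roc builtins or TypeScript library functions, and the sketch says nothing about how Roc would implement either one.

```typescript
// Hypothetical strict update: `updates` may only mention fields that already
// exist on `record`, with the same types (the analogue of { record & a: 3 }).
function update<A extends object>(record: A, updates: Partial<A>): A {
  return { ...record, ...updates };
}

// Hypothetical permissive merge: any two shapes are accepted, and at runtime
// fields from `updates` win when both records have the same field.
function merge<A extends object, B extends object>(record: A, updates: B): A & B {
  return { ...record, ...updates };
}

const record = { a: 1, b: 2 };

// First example from the post: same shape, one field replaced.
const updated = update(record, { a: 3 });

// Second example from the post: a new field `c` is added.
const widened = merge(record, { c: 4 });

console.log(updated, widened);
```

Both helpers build a new object and leave `record` untouched, which mirrors the copying concern raised later in the thread.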

Remove fields from a record

» Record.except { a: 1, b: 2, c: 3 } ["b", "c"]
{ a: 1 } : { a : Num * }

Should the list be stringified keys? I think it's natural to want the list to be [.b, .c] but I don't think that works since .b and .c have different types.

Pick fields from a record

» Record.pick { a: 1, b: 2, c: 3 } ["a"]
{ a: 1 } : { a : Num * }
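On the question of stringified keys: for comparison, TypeScript accepts string keys but checks them against the record type at compile time, so a misspelled key is rejected. The sketch below is only an analogy, and `pick` is a hypothetical helper written for this illustration, not an existing library function.

```typescript
// Hypothetical helper: keys are written as strings, but K is constrained to
// the keys of T, so only real field names are accepted.
function pick<T extends object, K extends keyof T>(record: T, keys: readonly K[]): Pick<T, K> {
  const result = {} as Pick<T, K>;
  for (const key of keys) {
    result[key] = record[key];
  }
  return result;
}

const record = { a: 1, b: 2, c: 3 };

// Analogue of Record.pick record ["a"]; the result is typed as { a: number }.
const onlyA = pick(record, ["a"]);

// Analogue of Record.except record ["b", "c"], using rest destructuring:
// b and c are pulled out and everything else lands in withoutBC.
const { b: _b, c: _c, ...withoutBC } = record;

console.log(onlyA, withoutBC);
```

In both cases a new object is built, so the picked or reduced record is a copy rather than a view of the original.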

Get the keys from a record

» Record.keys { a: 1, b: 2, c: 3 }
["a", "b", "c"] : List Str

Does it make sense to represent the keys as strings? Is this function valuable at all without the ability to convert those keys into values (see below)?

Get the key fns from a record

» Record.keyFns { a: 1, b: 2, c: 3 }
[.a, .b, .c] : List ?

I'm guessing this one doesn't work since .a, .b, and .c all have different types ({ a : a }* -> a, { b : a }* -> a, and { c : a }* -> a respectively), so they can't share a single list element type.

Get the values from a record

This is tricky since the values can be of different types. One option would be something like

» Record.values { a: 1, b: "2", c: 3 }
[ Num 1, Str "2" , Num 3 ] : List [ Num (Num *), Str Str ]

This seems to break down for records that contain nested records, such as

» Record.values { a: 1, b: "2", c: { d: 3 }, e: { f: 5 } }

since AFAIK there is no way to have a tag which generalizes over records, so you would need a different tag for each nested record. (Is there a strategy I'm missing here?)
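To make the shape of the tagged result concrete, here is a minimal TypeScript sketch. It relies on runtime reflection (`typeof`, `Object.keys`), so it is not a strategy for Roc; it only shows what a list of tagged values could look like if nested records were folded under one recursive tag. The `Value` type, the tag names, and the `values` function are all assumptions made for this example.

```typescript
// Hypothetical tagged representation. The "Rec" tag is recursive, so every
// nested record reuses the same tag instead of needing a new one per shape.
type Value =
  | { tag: "Num"; value: number }
  | { tag: "Str"; value: string }
  | { tag: "Rec"; value: Value[] };

// Hypothetical helper: tags each field by inspecting it at runtime.
function values(record: { [key: string]: unknown }): Value[] {
  return Object.keys(record).map((key): Value => {
    const field = record[key];
    if (typeof field === "number") return { tag: "Num", value: field };
    if (typeof field === "string") return { tag: "Str", value: field };
    if (typeof field === "object" && field !== null) {
      return { tag: "Rec", value: values(field as { [key: string]: unknown }) };
    }
    throw new Error(`unsupported field type for key ${key}`);
  });
}

const flat = values({ a: 1, b: "2", c: 3 });
const nested = values({ a: 1, c: { d: 3 } });

console.log(JSON.stringify(flat), JSON.stringify(nested));
```

Note that the field names of nested records are lost in this representation, which is part of why a dictionary-like encoding keeps coming up in the discussion.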

FWIW, it seems like it is likely to be a common pattern in Roc to want to tag values with their type as the tag name. This shows up in both the tutorial and in some tools people are already building, like strify (https://github.com/JanCVanB/Strify/blob/main/Strify.roc). Maybe this will be less common once abilities land and you can just encode values? If it is common it seems reasonable to have a convention for naming the tags (Str, StrValue, and StrElem are all possible candidates for tagging strings, for example).

Convert a record into a dictionary

This runs into the same problems as getting the values from a record but would be handy and would also remove the need for many of the above functions since you could just convert into a dictionary and then get the dictionary's keys, for instance.

Brendan Hansknecht (Feb 18 2022 at 18:33):

Merge multiple records

I totally like the first example. The second example of adding a field seems ok, but I would worry that in many cases it would be a sign of a bug. It also may lead to more data copying: instead of making the record big enough to hold all values from the beginning (with default values or optional fields), you have to generate a new record. I think the third example should be a type mismatch. I don't think we should even add Record.merge for it. I think that kind of change should always be explicit.

Remove fields from a record

I find this one really interesting. I think supporting something like this would be reasonable, but I definitely don't like it taking string keys. So I agree that [.b, .c] looks nicer. That being said, records are not dictionaries with string keys. They are a chunk of data that happens to contain named references to subsets of the data. We also already essentially support this via open records. I feel like that should likely be the official way to remove a field, but if you only want to remove a single field, it would be a hassle. Do you have a specific use case in mind for this where you want to remove record fields and don't want a dictionary? It just feels like a misuse of a record to me.

Pick fields from a record

I would again point to open records here. Just pass to a function that takes an open record. In the case you don't want to pass to a function then just write: new_record = { a: old_record.a }. I think that pick shouldn't be needed, but maybe I just am not understanding the use case.

Added note: I guess if we add key fns from below, I think this would just become applying a key fn on a record. I don't think it would make sense to have an explicit function Record.pick

Get keys from a record

I don't think this makes sense: a record is not a dictionary, and we don't have a way at runtime to map from a string to a record field. I guess it may be useful for a stringify library though.

Get the key fns from a record

This seems like it could lead to some really interesting code. I would definitely be interested in trying it out. The big problem I see is that the functions might work on one record and not another, so it might get confusing. For one record, .a might be an I32, for another an I64. Those create 2 different functions. Probably not a big issue though, it just may lead to more confusing errors in some cases.

Get the values from a record

I think this would essentially be runtime reflection, except generated at compile time. For the tag over a record, it could maybe be a tag containing a list of key fns, but that probably doesn't actually work either. Either way, I really hope that abilities fix this and we don't need to add something like this.

Later thought: This being hard might be a good thing. We don't want Roc to be a dynamically typed language. Making Roc act like one might be an antifeature 99% of the time. Stringifying being an exception.

Convert a record into a dictionary

Should be solvable with abilities, I think. I think having it as a function otherwise doesn't make much sense given everything is typed in Roc and that would essentially be requesting dynamic types.

Tommy Graves (Feb 18 2022 at 19:30):

@Brendan Hansknecht Thanks for your thoughts! I think I largely agree with you. Definitely agree that the third merge example seems like it should be a type mismatch and I can't think of a use case where you would actually want it to override the type of the field.

For removing and picking fields from records -- I think the main use case that comes to mind is elegantly encoding into JSON (or any other encode target). E.g. if I have a record record = {a: 1, b: 2, c: 3} and want to encode it into JSON for transmitting to some third party that requires me to specify only {a: 1, b: 2}, then this line:

Encode.encode JSON.encoder (Record.except record [.c])

can be compared with:

Encode.encode JSON.encoder { a: record.a, b: record.b }

Likewise, this line:

Encode.encode JSON.encoder (Record.pick record [.a, .b])

is not dramatically different from:

Encode.encode JSON.encoder { a: record.a, b: record.b }

even for records with 10+ fields. But that said I think it still looks slightly better and is easier to write?

In some languages -- especially dynamic ones -- pick/except or their variants are pretty frequently used. I want to take some time to look at applications in those languages to see whether their use cases can help motivate inclusion beyond encoding.

On making Roc act like a dynamically typed language being an antifeature: I wonder about this. One of the advantages of dynamically typed languages is that you rarely are fighting with the language when trying to implement an idea, which is why for proofs of concept and MVPs they tend to allow you to be extremely productive. I think it might be the case that, as long as the type system remains sound, there are some advantages to making the language support relatively dynamic features. (Though at the same time I know software developers love to build unnecessary abstractions that make everything harder, and the more dynamic features your language has the easier it is to build such abstractions.)

I do hope that abilities (specifically the encode ability) provide a really elegant way to achieve most of this functionality. I'm not sure what it looks like to build an encoder like JSON.encoder or Encode.str. @Richard Feldman do you have an example of what the implementation of encoders would look like?

jan kili (Feb 18 2022 at 19:46):

Here are some possibly-outdated links about abilities and Encode.str, which I hope will make my Strify library utterly useless :)

jan kili (Feb 18 2022 at 19:48):

Ayaz Hafiz (Feb 19 2022 at 04:08):

I think the merge operator is a good idea. It would require a change in the current semantics since today record updates only apply for existing fields, but I agree it's a natural extension and (to me) it seems to align with the broad goals of the language (as far as I understand them, anyway, which may be wrong :) ). The change to semantics wouldn't be difficult either in implementation, or more importantly, in teaching.

I also really like the "pick" and "remove" operators. I think techniques like these make it really easy and nice to express certain ideas, and help with flow during rapid development. And they fit especially naturally in languages with anonymous unions. IMO the Pick and Omit generic types in TypeScript are two of the most powerful ones (for context, Pick<{a: 1, b: 2, c: 3}, 'a'|'b'> = {a: 1, b: 2} and Omit<{a: 1, b: 2, c: 3}, 'a'|'b'> = {c: 3}). Another nice thing is that if we are clever in the implementation, these are zero-cost operations - they need not induce any runtime overhead, living only in the type system.
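For readers unfamiliar with the TypeScript types mentioned here, a small sketch of how Pick and Omit live only in the type system follows. The type and variable names are made up for the example. It shows TypeScript's behaviour only; whether the same could hold in Roc is exactly what the rest of the thread debates.

```typescript
type Full = { a: number; b: number; c: number };

// Type-level narrowing: these aliases exist only at compile time.
type AB = Pick<Full, "a" | "b">;    // { a: number; b: number }
type OnlyC = Omit<Full, "a" | "b">; // { c: number }

const full: Full = { a: 1, b: 2, c: 3 };

// Assigning to the narrower types does not copy anything: `ab` and `onlyC`
// are the same object as `full`, viewed through a smaller type.
const ab: AB = full;
const onlyC: OnlyC = full;

// The hidden fields are still present at runtime; only the type forgets them.
console.log(Object.is(ab, full), Object.is(onlyC, full), "c" in ab);
```

This works in TypeScript because every object is a reference with the same runtime representation, so narrowing the type never requires a different memory layout.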

That said, I would prefer these to be type/syntax-level operators (e.g. rcd^{a, b} and rcd\{a, b}; not saying this should be the syntax, just to illustrate) rather than things implemented in the language stdlib itself. My reasoning is that

Brendan Hansknecht (Feb 19 2022 at 05:58):

I don't think you are accurate about them being "zero-cost operations". I think some (most?) of the time they will get optimized to have no cost, but other times they will be forced to incur a cost, specifically around function call boundaries.

some_record: {a: BigType, b: BigType, c: BigType}
some_record = {a: ..., b: ..., c: ...}

doSomething: {a: BigType, b: BigType, c: BigType} -> U64
doSomething = \r ->
    (someComp r.a) + (someComp r.c)

doSomething some_record

When calling doSomething some_record here, the record will be passed by reference. This is a single push of a memory address. Then the computation is run.

some_record: {a: BigType, b: BigType, c: BigType}
some_record = {a: ..., b: ..., c: ...}

doSomething: {a: BigType, c: BigType} -> U64
doSomething = \r ->
    (someComp r.a) + (someComp r.c)

doSomething (Pick some_record [ .a, .c])

Now we have a memory problem. doSomething expects a and c to be contiguous in memory. In some_record, they are not contiguous in memory. As such, we need to allocate a new chunk of stack space, copy over a and c, and then pass the reference of that chunk of stack space to doSomething. This is potentially a rather hefty performance hit.
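The contiguity argument can be modelled with typed arrays, where a record is a flat buffer and each field is a fixed-size block. This is only an analogy for the layout issue, not a description of how Roc lays out records; the field size and the helper names `pickAC` and `viewBC` are assumptions for the sketch.

```typescript
// Model: record { a, b, c } stored as three adjacent blocks in one buffer.
const FIELD_LEN = 4; // assumed size of each "BigType" field, in elements

const someRecord = new Float64Array(3 * FIELD_LEN);
someRecord.fill(1, 0, FIELD_LEN);             // field a
someRecord.fill(2, FIELD_LEN, 2 * FIELD_LEN); // field b
someRecord.fill(3, 2 * FIELD_LEN);            // field c

// Picking { a, c }: the two blocks are not adjacent, so producing a
// contiguous { a, c } means allocating new storage and copying both blocks.
function pickAC(rec: Float64Array): Float64Array {
  const out = new Float64Array(2 * FIELD_LEN);
  out.set(rec.subarray(0, FIELD_LEN), 0);
  out.set(rec.subarray(2 * FIELD_LEN, 3 * FIELD_LEN), FIELD_LEN);
  return out;
}

// Picking { b, c }: the blocks are already adjacent, so a view onto the
// original buffer is enough and nothing is copied.
function viewBC(rec: Float64Array): Float64Array {
  return rec.subarray(FIELD_LEN, 3 * FIELD_LEN);
}

const copied = pickAC(someRecord);
const viewed = viewBC(someRecord);

console.log(copied.buffer === someRecord.buffer, viewed.buffer === someRecord.buffer);
```

The copy grows with the size of the picked fields, while the view costs the same regardless of field size.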

Brendan Hansknecht (Feb 19 2022 at 05:59):

Note: with a smart enough compiler and changing doSomething to take an open record, I think you could remove the cost, but then you are requiring users to know to use an open record there or take a performance hit.

Brendan Hansknecht (Feb 19 2022 at 06:01):

If instead a user calling doSomething had to write doSomething {a: some_record.a, c: some_record.c}, I think they would immediately see that they are copying data. They also would get annoyed at typing out the record fields. Those together would likely prompt the user to change doSomething to take an open record for the nicer syntax of doSomething some_record.

Ayaz Hafiz (Feb 19 2022 at 06:19):

you’re right, what i should have said is “if the original and reduced type are instantiations of what the value is used as, no copy is needed”. actually it’s never zero cost since you might need to bump the reference count. but anyway i don’t think this should be a huge consideration either for the merits or lack thereof of the feature

Ayaz Hafiz (Feb 19 2022 at 06:22):

there’s a spectrum of how flexible record manipulation can be/is in a language; I agree that I don’t think it should be as flexible as treating records as dictionaries (for Roc’s use case) but I think having ad hoc record updates/expansions/contractions is natural given that there are ad hoc records

Brendan Hansknecht (Feb 19 2022 at 06:25):

For sure! I like a lot of the ideas behind these features. I just am trying to give a fuller picture. I just feel that depending on how some of these are implemented, they could lead to a number of situations where performance is suddenly terrible and it is hard to tell why.

Brendan Hansknecht (Feb 19 2022 at 06:27):

If we simply force Pick to return an open record that can't be passed as a closed record, I think that might already fix most of the performance related concerns, but it would probably confuse users. Pick record [.a, .b] is not of type {a : Type, b: Type}.

Ayaz Hafiz (Feb 19 2022 at 06:36):

yeah, that would be confusing. it also would require a copy if you pass it to something that takes a “{a: …, b:…}”, unless i’m missing something obvious (sorry for the poor formatting, on my phone)

Brendan Hansknecht (Feb 19 2022 at 06:41):

Brendan Hansknecht (Feb 19 2022 at 06:43):

Note: you could also theoretically optimize to avoid the copy in more cases. If the closed record that is wanted is {b: ..., c: ...}, and b is properly aligned, you could also load the address of b and avoid the copy.

Brendan Hansknecht (Feb 19 2022 at 06:43):

Ayaz Hafiz (Feb 19 2022 at 06:45):

Right but we don’t need to return an open record from “pick” to do that. We can just store the original layout as a “shadow”. At type inference time we just check that the original and the smaller layout are consistent with all usages, and during code generation we discard the smaller layout and just always use the shadow

Brendan Hansknecht (Feb 19 2022 at 06:49):

I guess so, I think the important part is that pick doesn't return a regular record type. It is somehow propagating the original record information.

Ayaz Hafiz (Feb 19 2022 at 06:53):

yeah exactly. that’s why the type would have to be somewhat magical, or at least different than other types we have currently (effective dual of row variables), so I think it may be better as a syntactic/language feature rather than a stdlib function

Richard Feldman (Feb 22 2022 at 03:35):

one interesting use case for merge: a function to model a JOIN in a query builder for a relational database

Richard Feldman (Feb 22 2022 at 03:36):

like if I want to say "take this query I've been building up, which will return rows of this record type, and join in a table which will return rows of this other record type, then I should get back a query whose rows will have a type that's the union of those two records' fields"

Richard Feldman (Feb 22 2022 at 03:41):

the alternative would be to specify a translation function that let you combine the two rows in a wrapper (e.g. { foo: { columns from first table go in here }, bar: { columns from second table go in here } })
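The two options for typing a JOIN (a merged row whose type is the union of both records' fields, versus a wrapper that keeps each table's columns separate) can be sketched in TypeScript as follows. The function names, table shapes, and data are hypothetical and exist only for this illustration; a real query builder would not join in memory like this.

```typescript
// Option 1 (hypothetical): merged rows. The row type has the fields of both
// inputs; fields with the same name collapse into one, and the right-hand
// value wins at runtime.
function joinMerged<A extends object, B extends object>(
  left: readonly A[],
  right: readonly B[],
  matches: (l: A, r: B) => boolean,
): (A & B)[] {
  const rows: (A & B)[] = [];
  for (const l of left) {
    for (const r of right) {
      if (matches(l, r)) rows.push({ ...l, ...r });
    }
  }
  return rows;
}

// Option 2 (hypothetical): wrapped rows. Each table's columns stay in their
// own nested record, so same-named columns cannot collide.
function joinNested<A, B>(
  left: readonly A[],
  right: readonly B[],
  matches: (l: A, r: B) => boolean,
): { left: A; right: B }[] {
  const rows: { left: A; right: B }[] = [];
  for (const l of left) {
    for (const r of right) {
      if (matches(l, r)) rows.push({ left: l, right: r });
    }
  }
  return rows;
}

const users = [{ userId: 1, name: "Ada" }, { userId: 2, name: "Bob" }];
const orders = [{ orderId: 10, userId: 1, total: 25 }, { orderId: 11, userId: 1, total: 5 }];

const merged = joinMerged(users, orders, (u, o) => u.userId === o.userId);
const nested = joinNested(users, orders, (u, o) => u.userId === o.userId);

console.log(merged, nested);
```

The merged form is more convenient to consume, while the nested form makes it explicit which table each column came from.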

Richard Feldman (Feb 22 2022 at 03:41):

Tommy Graves (Feb 22 2022 at 20:27):

Figuring out the removal / pick syntax seems important if we want to seriously consider those features. I toyed around with some possibilities but didn't feel like anything read clearly at all.

Ayaz Hafiz (Feb 22 2022 at 20:40):

Yorye Nathan (Feb 23 2022 at 08:35):

I like this syntax {existingRecord1..{a, b}, existingRecord2..{c}} for picking and combining. (order matters for the combining)
Can say record..{x} takes field x, and record..{} or record..{*} takes all fields from the record.
Removal can look similar record..~{x, y}

Richard Feldman (Mar 17 2022 at 13:01):

interesting thought: if we had a "take this record and remove a field" syntax, that could be handy if floats end up not supporting equality.

Richard Feldman (Mar 17 2022 at 13:02):

you could do (making up syntax here) { myRecord -x -y } == { otherRecord -x -y } to see if the records are equal when excluding the float fields
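The "compare while excluding some fields" idea can be sketched in TypeScript as a helper that skips a list of ignored keys. `equalExcept` is a hypothetical name; the comparison is shallow and assumes both records have the same set of fields.

```typescript
// Hypothetical helper: shallow equality over every field of `left` except the
// ones listed in `ignored`. Assumes `left` and `right` have the same fields.
function equalExcept<T extends object>(
  left: T,
  right: T,
  ignored: readonly (keyof T)[],
): boolean {
  const skip = new Set<keyof T>(ignored);
  const keys = Object.keys(left) as (keyof T)[];
  return keys.every((key) => skip.has(key) || left[key] === right[key]);
}

// x holds float results that differ by rounding error; id and label are exact.
const myRecord = { id: 1, label: "p", x: 0.1 + 0.2 };
const otherRecord = { id: 1, label: "p", x: 0.3 };

const sameIgnoringX = equalExcept(myRecord, otherRecord, ["x"]);
const sameIncludingX = equalExcept(myRecord, otherRecord, []);

console.log(sameIgnoringX, sameIncludingX);
```

Here the ignored keys are checked against the record type, so listing a field that does not exist would be rejected at compile time.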

Richard Feldman (Mar 17 2022 at 13:02):

Tommy Graves (Mar 17 2022 at 15:31):

That'd be useful if you ever had a record with a function value, too, although maybe doing that is an antipattern.

Richard Feldman (Mar 17 2022 at 15:32):

Richard Feldman (Mar 17 2022 at 15:33):

like if it's just a thunk to delay a computation for performance reasons, you could evaluate it afterwards and do == on its return value, similar to the float approach

Kevin Gillette (Mar 17 2022 at 16:24):

For the [JSON] encoding example, while it may not apply in the same way for all languages or in Roc, in my daily work I've found code is easier to reason about if the internal and external representations receive their own, complete record types, with functions translating between the types.

I believe this separation is good even if those representations happen to be the same, because they will likely diverge at some point. For example, an internal representation should be minimal (keep impossible states impossible) while external representations will often have convenience fields or omit internal details, and also have different names to retain compatibility with earlier contracts.

Using the same record type for both use cases creates inflexibility (if I add this field to the response, then all my persistence layer tests break!), and if one type is primarily defined in terms of another (the response is just the internal record minus this one and plus these 5), then you risk changing your contract when you change your internal implementation details, and in any case that's a kind of "spaghetti typing" and thus you need to jump between multiple definitions to even understand what the response contract looks like in isolation.

That said, the adding-or-removing-fields approach would be useful in those conversion functions.
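A minimal TypeScript sketch of the separation described above, with a complete internal type, a complete external type, and one translation function between them. All type names, field names, and the URL format are invented for the example.

```typescript
// Internal representation (hypothetical): minimal, includes private details.
type User = { id: number; name: string; passwordHash: string };

// External representation (hypothetical): its own complete definition, with a
// renamed field and a convenience field, and without the internal details.
type UserResponse = { id: number; displayName: string; profileUrl: string };

// The only place that knows how the two representations relate.
function toResponse(user: User): UserResponse {
  return {
    id: user.id,
    displayName: user.name,
    profileUrl: `/users/${user.id}`,
  };
}

const user: User = { id: 7, name: "Ada", passwordHash: "not-a-real-hash" };
const response = toResponse(user);

console.log(JSON.stringify(response));
```

Because `UserResponse` is spelled out in full, the response contract can be read in isolation, and changing `User` cannot silently change what is sent.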

Kevin Gillette (Mar 17 2022 at 16:27):

Regarding database joins, how would natural joins and USING be handled cleanly (when the same field has the same name in multiple tables and they can be collapsed together)? How would name collisions be handled (when you _don't_ want the same field name from multiple tables to collapse into a single Roc field)?

Stream: ideas

Topic: Advanced Record Manipulation

Tommy Graves (Feb 18 2022 at 16:59):
