Advanced Record Manipulation
I've been thinking a bit about what record utility functions and operators would be worthwhile for manipulating records. I documented some of my thoughts below and would love to hear others' perspectives on these -- and on other valuable record utility functions, if there are any. I believe strongly in the value of being able to merge multiple records together but am unclear on the value of the rest of these, especially given the constraints around union types in Roc.
I'm quite inexperienced in Roc and its related languages (like Elm) so would be happy to learn more about why I am thinking about any of these things wrong if I am!
It can often be useful to merge many records together, e.g. if you are pulling data from many sources and want to reconcile it into a single record.
Proposed syntax:
» record = { a: 1, b: 2 }
… updates = { a: 3 }
… { record & updates }
{ a: 3, b: 2 } : { a : Num *, b: Num * }
This syntax fits very naturally with the existing record update syntax: { record & a: 3 }. In fact, you might assume the proposed syntax Just Works given the record update syntax (I did!). Beyond the utility of merging records, I think there's value to the proposed syntax just in virtue of the principle of least surprise.
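For readers more familiar with TypeScript, the merge proposed above behaves roughly like object spread. This is only an analogy in another language, not Roc syntax, and the names `base`, `updates`, and `merged` are made up for illustration:

```typescript
// TypeScript analogy of the proposed `{ record & updates }` merge.
const base = { a: 1, b: 2 };
const updates = { a: 3 };

// Later spreads win, so fields of `updates` override same-named fields of `base`.
const merged = { ...base, ...updates };

console.log(merged); // { a: 3, b: 2 }
```

As in the proposal, the right-hand operand takes priority for any field both records share.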
An open question: Should the operands be able to be records of any shape, or should the second operand be a subset of the first operand? E.g., should this work?
» record = { a: 1, b: 2 }
… updates = { a: 3, c: 4 }
… { record & updates }
{ a: 3, b: 2, c: 4 } : { a : Num *, b: Num *, c: Num * }
What about this?
» record = { a: 1, b: 2 }
… updates = { a: "string", c: 4 }
… { record & updates }
{ a: "string", b: 2, c: 4 } : { a : Str, b: Num *, c: Num * }
I suspect these latter examples have use cases but I haven't thought of any. Since the current record update syntax does not allow you to change the type of fields or add new fields, the proposed syntax probably shouldn't either. But maybe we should eventually include a method in the standard library to do this, like
» Record.merge { a: 1, b: 2 } { a: "string", c: 4 }
{ a: "string", b: 2, c: 4 } : { a : Str, b: Num *, c: Num * }
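As a point of comparison, a type-changing merge like this can be typed in TypeScript. The sketch below is an analogy only: `mergeRecords` and the `Merged` alias are hypothetical names, and nothing here describes how Roc would implement `Record.merge`.

```typescript
// Fields of B replace same-named fields of A, including their types.
type Merged<A, B> = Omit<A, keyof B> & B;

// Hypothetical helper: the right operand wins and may add or retype fields.
function mergeRecords<A extends object, B extends object>(
  left: A,
  right: B
): Merged<A, B> {
  // The cast goes through `unknown` because the compiler cannot prove
  // the spread result matches `Merged<A, B>` for arbitrary A and B.
  return { ...left, ...right } as unknown as Merged<A, B>;
}

const widened = mergeRecords({ a: 1, b: 2 }, { a: "string", c: 4 });

// The type of `a` changed from number to string, and `c` was added.
const aField: string = widened.a;
const cField: number = widened.c;
```

Note how the result type has to spell out "left minus the overlapping keys, plus right", which hints at why this is more involved than the plain update syntax.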
» Record.except { a: 1, b: 2, c: 3 } ["b", "c"]
{ a: 1 } : { a : Num * }
Should the list be stringified keys? I think it's natural to want the list to be [.b, .c] but I don't think that works since .b and .c have different types.
» Record.pick { a: 1, b: 2, c: 3 } ["a"]
{ a: 1 } : { a : Num * }
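On the question of stringified keys: TypeScript takes that route but checks the strings against the record's field names at compile time. A minimal sketch of that approach follows, with `pickFields` and `exceptFields` as hypothetical stand-ins for `Record.pick` and `Record.except`; it says nothing about what is feasible in Roc's type system.

```typescript
// Keep only the listed fields. K is constrained to real field names of T.
function pickFields<T extends object, K extends keyof T>(source: T, keys: K[]): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const key of keys) {
    out[key] = source[key];
  }
  return out;
}

// Drop the listed fields by copying the record and deleting them from the copy.
function exceptFields<T extends object, K extends keyof T>(source: T, keys: K[]): Omit<T, K> {
  const copy: Partial<T> = { ...source };
  for (const key of keys) {
    delete copy[key];
  }
  return copy as unknown as Omit<T, K>;
}

const abc = { a: 1, b: 2, c: 3 };
const onlyA = pickFields(abc, ["a"]);            // { a: 1 }
const withoutBC = exceptFields(abc, ["b", "c"]); // { a: 1 }
// pickFields(abc, ["z"]) would be rejected by the type checker.
```

Both helpers allocate a new object, so in this form they are runtime copies rather than type-level views.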
» Record.keys { a: 1, b: 2, c: 3 }
["a", "b", "c"] : List Str
Does it make sense to represent the keys as strings? Is this function valuable at all without the ability to convert those keys into values (see below)?
» Record.keysFns { a: 1, b: 2, c: 3 }
[.a, .b, .c] : List ?
I'm guessing this one doesn't work since .a, .b, and .c all have different types ({ a : a }* -> a, { b : a }* -> a, and { c : a }* -> a respectively)
This is tricky since the values can be of different types. One option would be something like
» Record.values { a: 1, b: "2", c: 3 }
[ Num 1, Str "2" , Num 3 ] : List [ Num (Num *), Str Str ]
Unfortunately this doesn't work for nested records like
Record.values { a: 1, b: "2", c: { d: 3 }, e: { f: 5 } }
since AFAIK there is no way to have a tag which generalizes over records, so you would need a different tag for each nested record. (Is there a strategy I'm missing here?)
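For comparison, here is how the tagged-values idea looks in TypeScript when nested records get a single recursive tag holding key/value pairs. This is a sketch in a different language that relies on runtime type inspection; the names `Tagged`, `tagValue`, and `taggedValues` are hypothetical, and whether an equivalent is expressible in Roc is exactly the open question above.

```typescript
// One tag per primitive type, plus a recursive tag for nested records.
type Tagged =
  | { tag: "Num"; value: number }
  | { tag: "Str"; value: string }
  | { tag: "Rec"; fields: [string, Tagged][] };

// Inspect a value at runtime and wrap it in the matching tag.
function tagValue(input: unknown): Tagged {
  if (typeof input === "number") return { tag: "Num", value: input };
  if (typeof input === "string") return { tag: "Str", value: input };
  if (typeof input === "object" && input !== null) {
    const fields: [string, Tagged][] = [];
    for (const key of Object.keys(input)) {
      fields.push([key, tagValue((input as { [k: string]: unknown })[key])]);
    }
    return { tag: "Rec", fields };
  }
  throw new Error("unsupported value");
}

// Analogue of the proposed Record.values: one tagged value per field.
function taggedValues(rec: object): Tagged[] {
  return Object.keys(rec).map((key) => tagValue((rec as { [k: string]: unknown })[key]));
}

const sample = taggedValues({ a: 1, b: "2", c: { d: 3 } });
```

The nested record `c` becomes a `Rec` tag whose `fields` list contains the pair `["d", { tag: "Num", value: 3 }]`, so one tag covers records of any shape at the cost of losing their static field types.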
FWIW, it seems like it is likely to be a common pattern in Roc to want to tag values with their type as the tag name. This shows up in both the tutorial and in some tools people are already building, like strify (https://github.com/JanCVanB/Strify/blob/main/Strify.roc). Maybe this will be less common once abilities land and you can just encode values? If it is common it seems reasonable to have a convention for naming the tags (Str, StrValue, and StrElem are all possible candidates for tagging strings, for example).
This runs into the same problems as getting the values from a record but would be handy and would also remove the need for many of the above functions since you could just convert into a dictionary and then get the dictionary's keys, for instance.
Very interesting ideas. My rough comments/questions:
I totally like the first example. The second example of adding a field seems ok, but I would worry that in many cases it would be a sign of a bug. It also may lead to more data copying. Instead of making the record big enough to hold all values from the beginning (with default values or optional fields), you have to generate a new record. I think the third example should be a type mismatch. I don't think we should even add Record.merge for it. I think that kind of change should always be explicit.
I find this one really interesting. I think supporting something like this would be reasonable, but I definitely don't like it taking string keys. So I agree that [.b, .c] looks nicer. That being said, records are not dictionaries with string keys. They are a chunk of data that happens to contain named references to subsets of the data. We also already essentially support this via open records. I feel like that should likely be the official way to remove a field, but if you only want to remove a single field, it would be a hassle. Do you have a specific use case in mind for this where you want to remove record fields and don't want a dictionary? Just feels like a misuse of a record to me.
I would again point to open records here. Just pass to a function that takes an open record. In the case you don't want to pass to a function then just write: new_record = { a: old_record.a }. I think that pick shouldn't be needed, but maybe I just am not understanding the use case.
Added note: I guess if we add key fns from below, I think this would just become applying a key fn on a record. I don't think it would make sense to have an explicit function Record.pick
I don't think this makes sense, a record is not a dictionary and we don't have a way at runtime to map from a string to a record field. I guess it may be useful for a stringify library though.
This seems like it could lead to some really interesting code. I would definitely be interested in trying it out. The big problem I see is that the functions might work on one record and not another, so it might get confusing. For one record, .a might be an I32, for another an I64. Those create 2 different functions. Probably not a big issue though, just may lead to more confusing errors in some cases.
I think this would essentially be runtime reflection, except generated at compile time. For the tag over a record, it could maybe be a tag containing a list of key fns, but that probably doesn't actually work either. Either way, I really hope that abilities fix this and we don't need to add something like this.
Later thought: This being hard might be a good thing. We don't want Roc to be a dynamically typed language. Making Roc act like one might be an antifeature 99% of the time. Stringifying being an exception.
Should be solvable with abilities, I think. I think having it as a function otherwise doesn't make much sense given everything is typed in Roc and that would essentially be requesting dynamic types.
@Brendan Hansknecht Thanks for your thoughts! I think I largely agree with you. Definitely agree that the third merge example seems like it should be a type mismatch and I can't think of a use case where you would actually want it to override the type of the field.
For removing and picking fields from records -- I think the main use case that comes to mind is elegantly encoding into JSON (or any other encode target). E.g. if I have a record record = {a: 1, b: 2, c: 3} and want to encode into JSON for transmitting to some third party that requires me to specify only {a: 1, b: 2} then this line:
Encode.encode JSON.encoder (Record.except record [.c])
is a lot nicer in some cases than:
Encode.encode JSON.encoder { a: record.a, b: record.b }
especially as the record grows in size.
For picking I think there is not as much of a benefit since
Encode.encode JSON.encoder (Record.pick record [.a, .b])
is pretty similar in structure to
Encode.encode JSON.encoder { a: record.a, b: record.b }
even for records with 10+ fields. But that said I think it still looks slightly better and is easier to write?
In some languages -- especially dynamic ones -- pick/except or their variants are pretty frequently used. I want to take some time to look at applications in those languages and see whether their use cases help motivate inclusion beyond encoding.
We don't want Roc to be a dynamically typed language. Making Roc act like one might be an antifeature 99% of the time.
I wonder about this. One of the advantages of dynamically typed languages is that you rarely are fighting with the language when trying to implement an idea, which is why for proofs of concepts and MVPs they tend to allow you to be extremely productive. I think it might be the case that, as long as the type system remains sound there are some advantages to making the language support relatively dynamic features. (Though at the same time I know software developers love to build unnecessary abstractions that make everything harder, and the more dynamic features your language has the easier it is to build such abstractions).
I do hope that abilities (specifically the encode ability) provide a really elegant way to achieve most of this functionality. I'm not sure what it looks like to build an encoder like JSON.encoder or Encode.str. @Richard Feldman do you have an example of what the implementation of encoders would look like?
Here are some possibly-outdated links about abilities and Encode.str, which I hope will make my Strify library utterly useless :)
... Encode.str which encodes values as strings, and which can be used like toStr except you have to actually call (Encode.encode Encode.str value) instead of toStr. This would mean that although it's an option, it's (by design!) less ergonomic than a flexible function like Num.fromStr, which means the path of least resistance (and least error-proneness) is to use Num.fromStr instead of this.
I think the merge operator is a good idea. It would require a change in the current semantics since today record updates only apply for existing fields, but I agree it's a natural extension and (to me) it seems to align with the broad goals of the language (as far as I understand them, anyway, which may be wrong :) ). The change to semantics wouldn't be difficult either in implementation, or more importantly, in teaching.
I also really like the "pick" and "remove" operators. I think techniques like these make it really easy and nice to express certain ideas, and help with flow during rapid development. And they fit especially naturally in languages with anonymous unions. IMO the Pick and Omit generic types in TypeScript are two of the most powerful ones (for context, Pick<{a: 1, b: 2, c: 3}, 'a'|'b'> = {a: 1, b: 2} and Omit<{a: 1, b: 2, c: 3}, 'a'|'b'> = {c: 3}). Another nice thing is that if we are clever in the implementation, these are zero-cost operations: they need not induce any runtime overhead, living only in the type system.
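To make the TypeScript reference concrete, here is a small sketch of `Pick` and `Omit` used purely as type-level views. The type and variable names are illustrative. In TypeScript the narrowed values are the same object as the original, which is the sense of "living only in the type system"; this does not by itself say anything about Roc's memory layout.

```typescript
type Abc = { a: number; b: number; c: number };
type OnlyAB = Pick<Abc, "a" | "b">; // { a: number; b: number }
type OnlyC = Omit<Abc, "a" | "b">;  // { c: number }

const whole: Abc = { a: 1, b: 2, c: 3 };

// Assigning to the narrower types only changes what the checker lets us access.
const viewAB: OnlyAB = whole;
const viewC: OnlyC = whole;

// No copy was made: both views refer to the very same object.
const sameObject = (viewAB as object) === (whole as object);
console.log(sameObject); // true
```

Accessing `viewAB.c` would be a compile-time error even though the field is still physically present.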
That said, I would prefer these to be type/syntax-level operators (e.g. rcd^{a, b} and rcd\{a, b}; not saying this should be the syntax, just to illustrate) rather than things implemented in the language stdlib itself. My reasoning is that
Record.pick. It would probably need to look something like {}a, List (Partial<b> (KeysOf a)) -> Partial<b> a ... but now Partial<b> and KeysOf are somewhat "magical" types. Partial<b> says that we need to know exactly what keys to include at the time that pick is called, and its appearance in the return type signifies that those keys are exactly the ones in the record being returned. And KeysOf is a magical type that would tell the compiler "treat the keys of this record as values", but which (in my opinion) should not have a runtime representation.
Partial<b> themselves usefully, because the type would be entirely opaque to them (what <b> is could not be resolved until typechecking). All this is to say that even if we make them look like stdlib functions I think they would really always have to be language/compiler builtins - which isn't a bad thing, there is precedent for that already, but might make it harder to reason about, "okay, why can that function do that, but I can't implement the same thing?"
I don't think you are accurate about them being "zero-cost operations". I think some (most?) of the time they will get optimized to have no cost, but other times they will be forced to incur a cost. Specifically around function call boundaries.
some_record: {a: BigType, b: BigType, c: BigType}
some_record = {a: ..., b: ..., c: ...}
doSomething: {a: BigType, b: BigType, c: BigType} -> U64
doSomething = \r ->
    (someComp r.a) + (someComp r.c)
doSomething some_record
When calling doSomething some_record here, the record will be passed by reference. This is a single push of a memory address. Then the computation is run.
If instead we use pick and the code looks like this:
some_record: {a: BigType, b: BigType, c: BigType}
some_record = {a: ..., b: ..., c: ...}
doSomething: {a: BigType, c: BigType} -> U64
doSomething = \r ->
    (someComp r.a) + (someComp r.c)
doSomething (Pick some_record [ .a, .c])
Now we have a memory problem. doSomething expects a and c to be contiguous in memory. In some_record, they are not contiguous in memory. As such, we need to allocate a new chunk of stack space, copy over a and c, and then pass the reference of that chunk of stack space to doSomething. This is potentially a rather hefty performance hit.
Note: with a smart enough compiler and changing doSomething to take an open record, I think you could remove the cost, but then you are requiring users to know to use an open record there or take a performance hit.
If instead a user calling doSomething had to write doSomething {a: some_record.a, c: some_record.c}, I think they would immediately see that they are copying data. They would also get annoyed at typing out the record fields. Together, those would likely prompt the user to change doSomething to take an open record for the nicer syntax of doSomething some_record.
you’re right, what i should have said is “if the original and reduced type are instantiations of what the value is used as, no copy is needed”. actually it’s never zero cost since you might need to bump the reference count. but anyway i don’t think this should be a huge consideration either for the merits or lack thereof of the feature
there’s a spectrum of how flexible record manipulation can be/is in a language; I agree in that I don’t think it should be as flexible as treating records as dictionaries (for Roc’s use case) but I think having adhoc record updates/expansions/contractions is natural given that there are ad hoc records
For sure! I like a lot of the ideas behind these features. I just am trying to give a fuller picture. I just feel that depending on how some of these are implemented, they could lead to a number of situations where performance is suddenly terrible and it is hard to tell why.
If we simply force Pick to return an open record that can't be passed as a closed record, I think that might already fix most of the performance related concerns, but it would probably confuse users. Pick record [.a, .b] is not of type {a : Type, b: Type}.
yeah, that would be confusing. it also would require a copy if you pass it to something that takes a “{a: …, b:…}”, unless i’m missing something obvious (sorry for the poor formatting, on my phone)
So I think we can track and minimize that:
{a: ..., b: ..., c: ...} and function wants {a: ..., c: ...}). A copy was needed anyway. So do the copy. Though changing to receiving an open record {a: ..., c: ...}* would be even better.
Note: you could also theoretically optimize to avoid the copy in more cases. If the function wanted the closed record {b: ..., c: ...}, and b was properly aligned, you could also load the address of b and avoid the copy.
But if you mess up and alignment is wrong you might get a segfault.
Right but we don’t need to return an open record from “pick” to do that. We can just store the original layout as a “shadow”. At type inference time we just check that the original and the smaller layout are consistent with all usages, and during code generation we discard the smaller layout and just always use the shadow
I guess so, I think the important part is that pick doesn't return a regular record type. It is somehow propagating the original record information.
yeah exactly. that’s why the type would have to be somewhat magical, or at least different than other types we have currently (effective dual of row variables), so I think it may be better as a syntactic/language feature rather than a stdlib function
one interesting use case for merge: a function to model a JOIN in a query builder for a relational database
like if I want to say "take this query I've been building up, which will return rows of this record type, and join in a table which will return rows of this other record type, then I should get back a query whose rows will have a type that's the union of those two records' fields"
the alternative would be to specify a translation function that let you combine the two rows in a wrapper (e.g. { foo: { columns from first table go in here }, bar: { columns from second table go in here } })
which might be necessary for LEFT JOIN anyway
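The two options described above (merging row fields versus wrapping each row) can be sketched in TypeScript over in-memory arrays. This is an illustration of the resulting row types only, not a query builder; `innerJoin`, `leftJoin`, and the row types are hypothetical names.

```typescript
type UserRow = { userId: number; userName: string };
type OrderRow = { orderId: number; userId: number; total: number };

// Inner join: each output row carries the union of both rows' fields.
function innerJoin<L extends object, R extends object>(
  lefts: L[],
  rights: R[],
  on: (l: L, r: R) => boolean
): (L & R)[] {
  const rows: (L & R)[] = [];
  for (const l of lefts) {
    for (const r of rights) {
      if (on(l, r)) rows.push({ ...l, ...r });
    }
  }
  return rows;
}

// Left join: rows are wrapped, so a missing right side can be represented as null.
function leftJoin<L, R>(
  lefts: L[],
  rights: R[],
  on: (l: L, r: R) => boolean
): { left: L; right: R | null }[] {
  const rows: { left: L; right: R | null }[] = [];
  for (const l of lefts) {
    const hits = rights.filter((r) => on(l, r));
    if (hits.length === 0) {
      rows.push({ left: l, right: null });
    }
    for (const r of hits) {
      rows.push({ left: l, right: r });
    }
  }
  return rows;
}

const users: UserRow[] = [
  { userId: 1, userName: "ada" },
  { userId: 2, userName: "bob" },
];
const orders: OrderRow[] = [{ orderId: 10, userId: 1, total: 5 }];

const joined = innerJoin(users, orders, (u, o) => u.userId === o.userId);
const leftJoined = leftJoin(users, orders, (u, o) => u.userId === o.userId);
```

The merged form silently collapses the shared `userId` field, while the wrapped form keeps both sides separate and leaves room for the unmatched "bob" row.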
Figuring out the removal / pick syntax seems important if we want to seriously consider those features. I toyed around with some possibilities but didn't feel like anything read clearly at all.
Perhaps {rcd \ a, b } to say "remove fields a and b from rcd"?
I like this syntax {existingRecord1..{a, b}, existingRecord2..{c}} for picking and combining. (order matters for the combining)
Can say record..{x} takes field x, and record..{} or record..{*} takes all fields from the record.
Removal can look similar record..~{x, y}
{r1..{}, r2..{z}, w: 0} can avoid intermediate copies and just create one combined object
interesting thought: if we had a "take this record and remove a field" syntax, that could be handy if floats end up not supporting equality.
you could do (making up syntax here) { myRecord -x -y } == { otherRecord -x -y } to see if the records are equal when excluding the float fields
and then separately compare x and y using something like Num.isApproxEq
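A rough TypeScript rendering of that idea: split off the float fields, compare the remainder exactly, and compare the floats with a tolerance. The `Point` type, the local `isApproxEq` helper, and its default tolerance are all assumptions made for this sketch.

```typescript
type Point = { label: string; count: number; x: number; y: number };

// Tolerance-based float comparison; the default tolerance is an arbitrary choice.
function isApproxEq(p: number, q: number, tolerance: number = 1e-9): boolean {
  return Math.abs(p - q) <= tolerance;
}

function pointsEqual(m: Point, n: Point): boolean {
  // Separate the float fields from the rest, like `{ myRecord -x -y }`.
  const { x: mx, y: my, ...mRest } = m;
  const { x: nx, y: ny, ...nRest } = n;

  // Exact comparison for the non-float fields.
  const restEqual = mRest.label === nRest.label && mRest.count === nRest.count;

  // Approximate comparison for the float fields.
  return restEqual && isApproxEq(mx, nx) && isApproxEq(my, ny);
}

const p1: Point = { label: "p", count: 1, x: 0.1 + 0.2, y: 1 };
const p2: Point = { label: "p", count: 1, x: 0.3, y: 1 };
console.log(pointsEqual(p1, p2)); // true, although 0.1 + 0.2 !== 0.3
```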
That'd be useful if you ever had a record with a function value, too, although maybe doing that is an antipattern.
depends on what the function does, I suppose!
like if it's just a thunk to delay a computation for performance reasons, you could evaluate it afterwards and do == on its return value, similar to the float approach
For the [JSON] encoding example, while it may not apply in the same way for all languages or in Roc, in my daily work I've found code is easier to reason about if the internal and external representations receive their own, complete record types, with functions translating between the types.
I believe this separation is good even if those representations happen to be the same, because they will likely diverge at some point. For example, an internal representation should be minimal (keep impossible states impossible) while external representations will often have convenience fields or omit internal details, and also have different names to retain compatibility with earlier contracts.
Using the same record type for both use cases creates inflexibility (if I add this field to the response, then all my persistence layer tests break!), and if one type is primarily defined in terms of another (the response is just the internal record minus this one and plus these 5), then you risk changing your contract when you change your internal implementation details, and in any case that's a kind of "spaghetti typing" and thus you need to jump between multiple definitions to even understand what the response contract looks like in isolation.
That said, the adding-or-removing-fields approach would be useful in those conversion functions.
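As a small illustration of that pattern, the sketch below gives the internal and external representations their own complete types and uses field removal plus field addition only inside the conversion function. All type and field names (`InternalUser`, `UserResponse`, `displayName`, and so on) are invented for the example.

```typescript
// Internal representation: includes details that must never leave the service.
type InternalUser = { id: number; email: string; passwordHash: string };

// External contract: spelled out in full, independent of the internal type.
type UserResponse = { id: number; email: string; displayName: string };

function toResponse(user: InternalUser): UserResponse {
  // Remove the internal-only field, then add the convenience field.
  const { passwordHash, ...publicFields } = user;
  return { ...publicFields, displayName: "User #" + user.id };
}

const response = toResponse({ id: 7, email: "a@b.c", passwordHash: "x" });
console.log(response); // { id: 7, email: 'a@b.c', displayName: 'User #7' }
```

Because `UserResponse` is written out rather than derived from `InternalUser`, adding an internal field later does not change the response contract unless `toResponse` is edited to expose it.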
Regarding database joins, how would natural joins and using be handled cleanly (when the same field has the same name in multiple tables and they can be collapsed together)? How would name collisions be handled (when you _don't_ want the same field name from multiple tables to collapse into a single Roc field)?
Last updated: Jun 16 2026 at 16:19 UTC