I was just looking into how serde handles this (our encode ability is based on that), and I do think we may have a gap in our encoder. Which may argue for something like Color::Red being clearly specified.
We currently have tag : Str, List (Encoder fmt) -> Encoder fmt | fmt has EncoderFormatting for encoding tags.
Serde has an extra piece of information. When encoding tags, it also passes in the variant index.
So the function would be tag : Str, Nat, List (Encoder fmt) -> Encoder fmt | fmt has EncoderFormatting
This enables encoding to protobuf and other formats that want to just use dense index forms.
To get full functionality here, we need the index. To get the index, we need to know the full type, not just [Red]*.
Even if we require opaque types here, there is still no way to encode a tag densely. At a minimum, we would need to add a function that knows the tag is final. That way it can use the index instead of the name.
It sounds like you're really on to something there! :grinning_face_with_smiling_eyes: (details are over my head)
Basically, if we want to support encoding into dense formats instead of always encoding enums/tags as strings, we need the index. The index can only be calculated if we know the entire tag.
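To make the name-vs-index distinction concrete, here is a minimal Python sketch (not Roc, and not the real Encode API). The names encode_tag_text and encode_tag_dense are hypothetical, and the one-byte layout is made up for illustration; it is not protobuf's actual wire format.

```python
import json

def encode_tag_text(name, index, payloads):
    """Text-style format: ignores the index and writes the tag name."""
    return json.dumps({"tag": name, "payloads": payloads}).encode("utf-8")

def encode_tag_dense(name, index, payloads):
    """Dense format (made-up layout): one byte for the index, then payload bytes.

    Assumes at most 256 tags and payloads that are small ints (0..255).
    The tag name is never written, so the caller must supply the index.
    """
    if not 0 <= index < 256:
        raise ValueError("index does not fit in one byte")
    return bytes([index]) + bytes(payloads)

text = encode_tag_text("Red", 2, [])
dense = encode_tag_dense("Red", 2, [])
print(len(text), len(dense))
```

The dense encoder has nothing to work with unless something hands it the index, which is the gap being described.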
5 messages were moved here from #ideas > expose Bool.true as True by Richard Feldman.
after monomorphization, we should know this though
but this is actually a deeper rabbit hole I think!
it's a great point that we should probably provide the index so you can serialize to something like protobuf, and I totally missed that
and I do think monomorphization will take care of this so that it won't be a problem in practice
(that is, we'll always have the index)
however, the rabbit hole is: today, there is no way to go from a tag union to an index at runtime
this would make that possible for the first time
what are the implications of that? :thinking:
similarly, what are the implications for decoding?
like if we can encode a tag into a raw index integer, then we'll need to be able to do the reverse to decode it: go from a raw index integer into a tag union...so what are the implications of being able to go from index to tag?
We won't know the index if you do:
Encode.encode Red
We will think the index is 0 cause the tag is [Red]*. This is the same issue we described with bool essentially.
I think that's correct though, in this case
like you are in fact giving it an enumeration with one value in it
But that isn't what a user will want in practice
They will have Color : [Red, Green, Blue]
the difference is that if I'm encoding something into JSON (for example, but most formats have a distinction between bool and other enums), there is a totally separate encoding for booleans vs others
But maybe not an issue in practice cause encode and decode generally need to be used with types.
I get the separate encoding problem, but I am more talking about resolving to an unexpected type in the user's eyes and not having a good way to correct the type.
yeah the difference is:
to me, a very important part of that distinction is that there's already a way to solve the former: as you noted earlier, you can name the variable and put a type annotation on it
so the concern is "this might be inconvenient in practice" as opposed to "this is ambiguous and there's no way for the compiler to help you other than guessing based on a heuristic"
Also, i think we could hide the tag value mostly from user land.
Make a special encode function that takes a tag value and then correctly dispatches to the current tag encode function.
So then the tag id would only ever be used or seen by people who implement encoders and nowhere else
No built-in to extract the tag
I do think there's an interesting question about encoding literals though
like let's say I do this:
Encode.encode {
    firstName: "Sam",
    lastName: "Sample",
    role: Guest,
}
that role encoder is going to see the type [Guest]
and I think the problem here is more fundamental than "just allow Role::Guest syntax"
Really?
yeah, I mean that's just a record literal
it just has an inferred type
it's not a User or anything, I could have just put that entire literal directly into the repl
so even if there's a Role::Guest syntax, I still have to remember to do that here
the problem is that if I wanted a Role : [Guest, Admin, Moderator] or whatever, I didn't think to specify that
Oh sure.
You have to use the syntax
I just put Guest because that's what I'd normally do, and it will work out fine everywhere else except specifically when passing a value to Encode.encode
But i get the point that it may be easy to miss
yeah this definitely seems like an easy mistake to make
Yeah
the obvious solution is to not give the Encode or Decode abilities to tag unions
That is kinda an argument for encode only working on opaque types in order to ensure proper typing always.
so then you have to be explicit about how you want to encode or decode them
That's fair, but i don't think the current other encode functions are powerful enough to encode tags
hm, as in?
*they are powerful enough but very inconvenient. Look at serde. It has multiple functions around variant encoding.
As in a user doesn't want to manually generate a different struct for each variant.
yep
Oh, i guess we just can remove auto derived, but leave in the function for encoding a tag.
also it would create a disincentive to use tag unions in any data structure you might want to serialize, which is definitely the wrong incentive to create
Then the user could pick the index and it would avoid exposing the internal tag index
well the problem is, let's say today I have User : { ... } - I can just serialize and deserialize that right away without writing any more code
as soon as I put a tag union in User, if tag unions no longer have Encode and Decode, now I have to customize how to do that
What about tracking if there is a closed value that it could be referring to, and if there is, then it's a warning or something.
interesting!
certainly we can have a warning if you give values of certain types to Encode.encode
So warnings plus Color::Red or similar syntax?
That sounds like it should work
to be honest, I'd just start with the warning first and then see if there's demand for the syntax in practice
like maybe it comes up very rarely and there isn't actually demand in practice
e.g. tests (for serialization formats, for example) are one scenario where I can imagine this coming up, but also I can imagine tests writing helper functions to generate data, and those helper functions would have type annotations which would take care of this automatically
my main concern is not inconvenience, but rather mistakes going undetected
(because the inconvenience seems likely to be rare in practice)
I'm not sure how the warning would interact with tag union polarity though :thinking:
@Ayaz Hafiz would probably have the best intuition around this
specifically the question of: suppose I write this
Encode.encode {
    firstName: "Sam",
    lastName: "Sample",
    role: Guest,
}
...and we want to give a compiler warning like "hey you're giving Encode.encode a tag with the type [Guest] - this is probably not what you want, can you give that an explicit type annotation if it really is what you want?"
I'm not sure if that might have false positives
hm, actually - maybe the warning heuristic we want is that you're trying to encode a single-tag union?
nm it could still happen with a conditional
where one branch of the conditional had Guest and another had Moderator, so the type given to Encode.encode was [Guest, Moderator] - which has different indices than if Admin is in the mix
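A quick Python sketch of why the inferred union matters. It assumes the index is the tag's position in the alphabetically sorted list of tag names (an assumption of this sketch); tag_index is a hypothetical helper, not a Roc builtin.

```python
def tag_index(union_tags, tag):
    """Index of `tag` within a union, assuming alphabetical ordering of tag names."""
    return sorted(union_tags).index(tag)

# The same tag gets a different index depending on which union was inferred.
print(tag_index(["Guest"], "Guest"))
print(tag_index(["Guest", "Moderator"], "Moderator"))
print(tag_index(["Admin", "Guest", "Moderator"], "Moderator"))
```

So a value that type-checks fine can still be written out with an index that only makes sense for the smaller, inferred union.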
I gotta run some errands, will be back later...but great catch noticing this scenario! Glad to have found it before people started tripping over it in production :sweat_smile:
oh one more question I just thought of, before I head out: what size would the index be?
it'll almost always fit in U8, but we have logic to upgrade to U16 if you have more than 256 tags in the union
I guess we could always give U64 just to be safe, and then have a separate U64 argument for how many total tags there are, which would let you infer the smallest integer you could use to represent the discriminant (e.g. in protobuf)
ok nm guess that's not a hard one to solve :laughing:
Serde just went with u32 for reference
I can sadly imagine a tag that eventually is bigger than u16, but I think u32 is super safe size.
But yeah, i think giving the max is even better. Then you get extra compression
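For what it's worth, inferring the smallest discriminant from the total tag count is only a few lines. This is a Python sketch under the assumption that a format picks from 1, 2, 4, or 8 byte integers; discriminant_bytes is a hypothetical name.

```python
def discriminant_bytes(total_tags):
    """Smallest width in bytes (1, 2, 4, or 8) that can hold every index 0..total_tags-1."""
    if total_tags < 1:
        raise ValueError("a union needs at least one tag")
    max_index = total_tags - 1
    for width in (1, 2, 4, 8):
        if max_index < 2 ** (8 * width):
            return width
    raise ValueError("too many tags for a 64-bit index")

for count in (3, 256, 257, 70000):
    print(count, discriminant_bytes(count))
```

An encoder given both the index and the total count could pick the width per union instead of always writing a fixed-size integer.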
just to put it out there in case we forget: another way to address the error-prone situation is to leave the API as-is, which would have the tradeoffs of:
(not saying it's the best option, but it is an option)
For sure
An extra note, we could keep the current form, but add an extra field to Encoder. Instead of just tag, add indexedTag that uses an integer instead of the name. That would never be used by auto derived, but it could be used by an opaque type. That way, if you want the denser encoding, you could do it manually with an opaque type.
This also means we don't have to expose the tag index. Instead, the opaque type's encode would define its own when to get the correct integer. So user-defined, as opposed to the internal compiler index.
Something like:
Color := [Red,Green,Blue] has Encode {encode}
encode = \encoder, color ->
    when color is
        Red -> Encode.indexedTag encoder 0 []
        Green -> Encode.indexedTag encoder 1 []
        Blue -> Encode.indexedTag encoder 2 []
Now color has dense encoding with 3 int values.
And if you don't really want the opaque type for regular code, just unwrap it right after encoding/decoding.
I like that because the encoding does not depend on the definition order
It would be alphabetical, not definition order
So that isn't an issue (if we exposed the internal tag id, i mean)
Ah, true. I guess that’d be even worse though because a protocol won’t likely use that indexing.
Protocol would just see it as an integer? I don't think protocol has any control over that value
I mean protocols/contracts usually have enum definitions and the integer associated to each value won't be based on alphabetical order
Yeah, if you want a special order, no matter what, you would have to fall back on opaque types and explicit definitions. The question is if we want to be able to auto derived alphabetical order.
I can only see that working if you're encoding data that only your Roc program is going to decode
So I think I'd prefer the compiler not to auto-derive this
yeah - imo deriving index based encoder/decoders seems really error-prone when doing anything that doesn't live entirely inside Roc. and even then, can be a mess if you think about a live system with different versions living together sometimes.
the explicit approach you mentioned seems pretty nice from my perspective
So yeah, maybe all tags should be strings by default and opaque is required for integer indices.
Just need to expand encode to support it
I guess the standard in a language like rust would be definition order for these things....but that sounds like a mistake in roc due to how tags reorder by default. Yeah, i like the opaque way thinking about this more.
this is an option, but I can imagine people being unhappy about the ergonomics in practice
I think it'll be pretty common to want to store (non-opaque) tag unions in data structures, e.g. enumerations like [Admin, Moderator, Guest]
and if you've done that a bunch and are suddenly like "oh we've decided to change serialization formats from JSON to protobuf, now I have to go back and rewrite a ton of code to deal with opaque wrappers solely because that's the only viable way to go to protobuf," that's a really unpleasant experience :grimacing:
separately, it bothers me that this would mean less efficient serialization formats would have better ergonomics than more efficient ones - e.g. "just always use JSON, that way you won't have to deal with opaque type wrappers" (or, alternatively, "always use Bool over tag unions, because those Just Work with protobuf and you don't have to write custom serialization code for them" - which to me would be even worse!)
ideally we could make the ergonomics of JSON about the same as for binary formats that use (typically) 1-byte discriminants for tags
none of that makes the footgun go away, of course, but it makes me feel motivated to try to find another way to remove it :big_smile:
Very true, though ordering will be a strange question. Since declaration order doesn't seem well suited for roc and always using alphabetical may be rough in cases
hm, what would be the issue with alphabetical?
the nice thing about alphabetical is that if you know the tag names in the union, you know the index
which in turn means the index doesn't change if the code around it changes (which is a downside of designs like having a global index across all tag names that are used in the program: you can add a tag name somewhere else and have it change other tags' indices and break things)
the warning idea seems promising to me, because this is something that we expect to come up rarely in practice (but be an easy-to-miss error if it does come up); if that ends up being true, then most people won't even see it
but if they do see it, it would hopefully be because they were actually about to make a mistake (as opposed to a false positive)
You could only use an index based on alphabetical order if you own the protobuf schema you’re talking to
Otherwise, I think you just won’t be able to use auto derived encoders because they won’t match up
Let's say you have this protobuf enum:
enum Role {
ROLE_GUEST = 0;
ROLE_MEMBER = 1;
ROLE_ADMIN = 2;
}
and this Roc type:
Role : [Guest, Member, Admin]
If I understand correctly:
Admin would encode to 0
Guest would encode to 1
Member would encode to 2
oh, true
yeah I was thinking about the case where you control both sides :thinking:
yeah I don't see a way around it in the case where somebody else is in charge of that mapping
auto-derived just couldn't possibly work in that scenario
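Here is that mismatch spelled out as a small Python sketch. SCHEMA mirrors the protobuf enum above; alphabetical_indices is a hypothetical stand-in for what an auto-derived, alphabetically ordered encoder would produce.

```python
# Numbers assigned by the protobuf enum in the example above.
SCHEMA = {"Guest": 0, "Member": 1, "Admin": 2}

def alphabetical_indices(tags):
    """Index each tag by its position in alphabetical order (assumed derive behavior)."""
    return {tag: i for i, tag in enumerate(sorted(tags))}

derived = alphabetical_indices(["Guest", "Member", "Admin"])

# Collect every tag whose derived index disagrees with the schema: (derived, schema).
mismatches = {
    tag: (derived[tag], SCHEMA[tag])
    for tag in SCHEMA
    if derived[tag] != SCHEMA[tag]
}
print(mismatches)
```

In this particular example every tag disagrees with the schema, so nothing would decode correctly on the other side.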
although in that case, you'd probably use a code generator anyway
to generate explicit Roc encoders/decoders from the schema you're using as a source of truth
Yeah, protobuf implementations almost always use codegen
I guess I just don't see many cases where this could work apart from apps that want to dump their own data to disk or something
hm, so is the status quo actually a good option after all? :thinking:
as in, don't expose index
Richard Feldman said:
separately, it bothers me that this would mean less efficient serialization formats would have better ergonomics than more efficient ones - e.g. "just always use JSON, that way you won't have to deal with opaque type wrappers" (or, alternatively, "always use Bool over tag unions, because those Just Work with protobuf and you don't have to write custom serialization code for them" - which to me would be even worse!)
This is a really good point, and the only solution that comes to mind is being able to implement abilities on non-opaque types, but I don't know if that's even possible
yeah the original motivation for adding opaque types to the language was that abilities wouldn't work otherwise :big_smile:
I suspected so
so I wonder what use cases exist where both of the following are true:
Any roc web app where you control the frontend?
Also, even if we keep status quo, we still need a new method for generating encoding tags via index in the Encoder trait.
Just it would be via explicit index.
I agree re. Brendan's point that you may want this for something like a front-end/back-end that you control, where you're okay with an auto-derived impl for something like protobuf
I think we should add the tag index to both the encoding and decoding APIs. The auto-derived implementation would work as-is based on definition order (or any other order) and encoding formats could implement this optimization as desired. Concretely, the Encoding api would change to
# `tag {name, index} payloads` encodes a tag of `name` at `index` in its definition and a list of its payloads.
tag : {name: Str, index: U64}, List (Encoder fmt) -> Encoder fmt | fmt has EncoderFormatting
And the Decoding api would become (note that we haven't implemented auto derived decoding for tags yet!!)
## `discriminant {tagNames, maxIndex}` decodes the index of a tag given the names of the tags and the number of tags in the definition.
discriminant : {tagNames: List (List U8), maxIndex: U64} -> Decoder U64 fmt | fmt has DecoderFormatting
## `sequence state stepElem finalizer` decodes a possibly-heterogenous sequence representation into `state`.
sequence : state, (state -> [Keep (Decoder state fmt), Skip]), (state -> Result val DecodeError) -> Decoder val fmt | fmt has DecoderFormatting
The cost is one more load of a compile-time-known U64, which seems marginal.
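As a rough illustration of the round trip this enables (Python, not the proposed Roc signatures): the derived code supplies the tag names in some fixed order, a dense format writes only the index, and decoding maps the index back while rejecting anything out of range. The names encode_index and decode_discriminant are hypothetical.

```python
def encode_index(tag_names, name):
    """Dense formats write only the position of `name` in the supplied order."""
    return tag_names.index(name)

def decode_discriminant(tag_names, raw):
    """Map a raw index back to a tag name, rejecting indices outside the union."""
    if not 0 <= raw < len(tag_names):
        raise ValueError(f"index {raw} is not a tag of this union")
    return tag_names[raw]

# The order here is whatever the derived implementation supplies (assumed fixed).
names = ["Admin", "Guest", "Moderator"]
raw = encode_index(names, "Guest")
print(raw, decode_discriminant(names, raw))
```

A string-based format can ignore the index entirely and keep matching on names, so both kinds of format fit behind the same interface.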
@Ayaz Hafiz what do you think about the footgun mentioned earlier? e.g. that this:
Encode.encode {
    firstName: "Sam",
    lastName: "Sample",
    role: Guest,
}
will always put role in as 0
because the type encode sees is [Guest], even if the intention is something else like [Guest, Moderator, Admin]
I think my bias is that I don't think that would happen in practice. Like, it seems you would catch that very quickly - if you are writing test code, you would probably try to deserialize it and see that it's a problem. Otherwise, you are likely passing bound, typed variables (not literals) and are less likely to run into this.
I predict it would happen rarely in practice, but I think if it does happen it could be very nasty
like that code looks totally normal and correct, and I think would not be likely to be caught in code review
and it type-checks
With the warning about open tags and Role::Guest I think that would fix most of the nastiness, right?
I think the warning by itself could be sufficient (if we're right that it happens very rarely in practice)
but my concern there is that the warning would give too many false positives
(I'm not sure how polarity factors in there)
That is fair. Warning is definitely enough of a start.
the warning idea is basically "if Encode.encode tries to encode a tag union that's still open, give a compile-time warning"
yeah
which would catch both the situation above, as well as the situation where I had role: set to a conditional where each branch returned a different tag, but the actual union I wanted had more than those two in it
but it wouldn't fire for any function with the return type User, where User is a type alias for that record which includes { ..., role : [Admin, Guest, Moderator] } - which would be closed, and therefore prevent the warning
so as long as I'm giving Encode.encode a User value that I made from a function like that, or if I'm manually annotating a variable (e.g. to get around the warning), I won't get a warning
well it would fire in that case
all tag unions are open in output position
Even if defined closed or inside another type?
You cannot define tag unions as closed unless they are under an opaque type or you take a tag union as an input to a function and return the same union (same as in same type)
yeah I was afraid of something like that :laughing:
so another heuristic I had an idea for is: give a warning for a single-tag union
or I guess maybe say single-tag unions don't have Encode
that seems more reasonable IMO. i don’t really see how you run into this unless you’re writing literals in test code. in every other case it seems like you’d flag this in code review given how structurally typed Roc is.
the single tag union warning, that is
which would have at least these tradeoffs:
yeah tests do seem like the most likely place for it to come up, agreed
I can easily see this happen...right?
The entire message doesn't need to be constant. Just one field.
So if I am in the admin branch of my code and go: role = Admin. Then eventually put role in a struct, that would lead to this issue, right?
if you had a conditional with different tags in each branch, it wouldn't catch that
I genuinely have a hard time imagining this ever coming up. Like you'd have to write either something like role: when ... or else have role assigned to an un-annotated variable that was a conditional, and even then it would only be a problem if you wrote out the whole literal like this, and it never got unified with anything that made the tag union have all the tags
Brendan Hansknecht said:
So if I am in the admin branch of my code and go: role = Admin. Then eventually put role in a struct, that would lead to this issue, right?
it depends :big_smile:
there is another side that’s a problem here, which is the decoding side - where the problem is far more likely. You are probably going to run into this there more than you are on the encoding side unless you have annotations (again I think mostly in test code, but I think the scenarios are easier to come up with- for example you match on the expectation of only one tag you want to see appear, and the rest falls into the wildcard and is not seen by the type system)
so it won't be an issue if something else is causing that role field to have the bigger type, for example:
role (e.g. instead of passing an anonymous literal to Encode.encode { ... } you have Encode.encode (makeUser ...) and makeUser is annotated to return a User, which specifies all the tags Role has)
what if we had auto derivers for opaque types pass the index, but derivations for the structural types do not
(I gotta run for a bit, back later!)
re: opaques, briefly - I mentioned that earlier:
Richard Feldman said:
this is an option, but I can imagine people being unhappy about the ergonomics in practice
If we want to fully remove the footgun, I believe the only option is opaque types - there is no other way to force a closed union.
I feel like this overall seems to suggest that we really want encode and decode to require typing info. Like never autoderived and always explicit (which as you just said above, is currently done in roc via opaque types).
As mentioned I’m not sure that we should try to fully avoid it.
I disagree. I think having it autoderived for structural types is a huge productivity boost for things like JSON over web services and prototyping
The challenge is balancing those kinds of use cases with the optimal cases like protobuf as you describe where you want the schema to be strict.
That's fair. That was why I suggested a separate index-based tag encoder and string-based tag encoder.
String-based would autoderive (exactly like current roc). If you need indices, you need an opaque type and to specify them explicitly (in user land code, not alphabetical). Just make it an opaque type where you expose wrap and unwrap. Then you just need to wrap when throwing it in the final struct. Or you use the same as Bool.true for your type.
That does make it less convenient to use the optimized version, but if you are using the optimized version, you are probably clearly defining all your types. So you just need to add a function that converts from the json friendly version of the type to the proto friendly version. That doesn't seem too hard if you want the perf gain from proto.
# Json version
Role: [User, Admin, Guest]
User: {id: U64, firstName: Str, lastName: Str, role: Role}
main =
    ....
    # Constants always correct. Tags encode as strings.
    Encode.encode {id: 3, firstName: "John", lastName: "Doe", role: Admin}
# Switch to proto (Still keep json types in code, but add proto type for boundaries)
# This is probably in its own module.
ProtoRole := [User, Admin, Guest] has Encode {encode: encodeRole}
encodeRole = \encoder, role ->
    when role is
        Admin -> Encode.indexedTag encoder 0 []
        User -> Encode.indexedTag encoder 1 []
        Guest -> Encode.indexedTag encoder 2 []
ProtoUser: {id: U64, firstName: Str, lastName: Str, role: ProtoRole}
fromUser = \{id, firstName, lastName, role} -> {id, firstName, lastName, role: @ProtoRole role}
main =
    ....
    Encode.encode (Proto.fromUser {id: 3, firstName: "John", lastName: "Doe", role: Admin})
Agreed. Probably the main consideration in this case is what Richard mentioned, what is the cost of moving from a Json-based encoding to Protobuf based encoding since you’d need to perform this transformation globally?
This really is not a big deal if you want perf gain from proto or whatever other format, but maybe for some formats it could be painful. Like if a format wants flexibility, but is limited, thus must use the numbered indices (not sure if such formats exist).
Theoretically, fromUser doesn't actually need to do anything, but roc probably doesn't know that.
Also, my point was that you can just do this at the encode edges and avoid changing your main code.
That way it isn't global.
true
It is just adding one method call after searching for each Encode.encode and Decode.decode
we could also have a tool that refactors all named structural types to be opaque types for you, I can imagine how that analysis is done
somewhat related, have we talked at all about whether we want the Roc tool chain to enforce semver for encode/decode, and if so how we do that? this conversation would be a part of that, but we also should discuss that (in a separate context) for what happens if a library changes how it encodes an opaque type.
not to sidetrack this conversation, just a note for later (can’t figure out how to make a new thread on my phone ): )
what if we had a warning regarding type annotations?
like we trace whether a type has unified with an annotation, right?
so we could say "hey you're using a tag union with encode/decode, you should really annotate that to make sure it's doing the thing you expect"
so you don't need to stop using structural types, just make sure to use an annotation sometime before you give them to encode or decode to make sure it's clear what you actually want to encode/decode
and it's just a warning bc we can of course do it without the annotation, so you're not blocked if you're e.g. doing some quick and dirty JSON prototyping
A use case where someone might want a custom order: Adding a new tag to something in a backwards-compatible way. Any automatic order (like alphabetical) could cause pre-existing data to get decoded incorrectly if the new tag doesn't happen to go to the end of the list.
All of the automatic orders seem magic and would make me afraid that it'd bite me sometime. Even the strictest way of automatically generating indices (definition order, only in places with annotations) seems like laying a trap. I want to be able to refactor and reorder my type definitions without having to worry about whether there's serialized data out there that would become incompatible. It seems helpful for quick and dirty JSON prototyping, but for any production code, I think we should push people towards writing explicit mappings from Tag to integer.
(A compiler warning for times when you prefer the convenience of automatically generated indices and accept the tradeoffs seems like a good way to allow that use case without endorsing it.)
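The backwards-compatibility trap can be shown in a few lines of Python. This assumes alphabetical indexing; the Editor tag and both helper functions are hypothetical, made up for this sketch.

```python
def encode(tags, tag):
    """Write a tag as its position in alphabetical order (assumed derive behavior)."""
    return sorted(tags).index(tag)

def decode(tags, index):
    """Read a tag back from its alphabetical position."""
    return sorted(tags)[index]

old_union = ["Admin", "Guest"]
new_union = ["Admin", "Editor", "Guest"]  # "Editor" is added in a later release

stored = encode(old_union, "Guest")   # written by the old version of the program
reread = decode(new_union, stored)    # read by the new version
print(stored, reread)
```

The stored data decodes without any error, just to the wrong tag, which is what makes this kind of breakage hard to notice.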
Compiler warnings are made to block CI, so i don't think that is great for this case.
If we auto derived order, we should just document what it is. This is super common in many languages and generally not a problem. Serde does it and i have never heard anyone complain about it. I do think it is slightly different in roc because you can not specify enum ordering where it is defined, but most enums in languages like rust just have implicit declaration order.
Also, json wouldn't use indices by default, so it doesn't apply at all.
That said, i also think it would be totally reasonable to just not auto-derive indices and always make it explicit. So if you use auto derived you just get strings. I think that is clean and would work for most things. Also, it wouldn't block something like proto, it would just make for a worse API without adding opaque types (due to using strings instead of enums).
yeah that's a good point
if we only auto-derived strings but not indices, then:
and then of course protobuf itself (and similar encodings) would use their own schemas as the source of truth anyway, and not auto-derived encoders/decoders
another thing I realized: it's the same situation when it comes to records
that is to say: the dense encoding also uses field indices instead of string names for fields
and it has the same backwards-compatibility concern (although not the same "what if you do a single-tag union" thing)
which is to say: if we're storing field indices instead of string labels, and I add a new field whose name doesn't happen to be alphabetically later than all the others, then its index will be somewhere in the middle, and now if I receive an older version of this type (using the same serialization format), it might successfully decode erroneously rather than giving an error
so it seems like in the case of both tags and records, if we want to offer auto-generated encoders and decoders, using string field and tag names is significantly less error-prone even in a binary representation (although it is of course significantly less compact than indices in both cases)
something like MessagePack
I think it would be safe to auto-generate encoders and decoders for MessagePack without concerns about backwards-compatibility potentially causing erroneous decodings or needing new compiler warnings
Oh, so for record fields we also need two versions. 1 auto derived with strings and another that can be explicitly implemented with indices?
That sucks. This feels like it is getting more edge case filled, less efficient, and less convenient overall.
or we just say auto-deriving is strings-only
and say if you want indices, you need to go to an explicit schema (e.g. via opaque types) and manage backwards-compatibility yourself
and auto-deriving is just never going to be an option there
Yeah, so less efficient, less convenient, and more chances people will need opaque types.
er sorry, which is less efficient and less convenient? :sweat_smile:
Records aren't like tags. You won't get a nice error if you miss a field like you will if you miss a tag variant.
sorry, I still don't follow - do you mean that offering record indices is a good idea? bad idea? something else?
I'm pretty ignorant on typical use cases for e.g. ProtoBuf, so I might be asking silly questions here. What are the situations where you would even want to auto-derive an encoder for something like ProtoBuf? Most often you would be generating Roc types and encoders/decoders from a protobuf definition file right?
But perhaps you're building a server and want to auto-derive an encoder/definition file that you can send to the team building the client application, skipping the step of writing a definition file and generating types. Is this the sort of use case we are thinking of here?
Is this discussion more focused on a use case where you need some compact binary format to be used by the application you are writing, either on the same machine or a different machine? Save files, multiplayer interactions or collaboration, etc. In this case the format is a bit arbitrary, as your application is the only application that cares about the format. (A prime candidate for auto-derivation)
Trying to understand the problem space so I can follow the conversation a bit better :sweat_smile:
do you mean that offering record indices is a good idea? bad idea? something else?
Let's say that record and tags always auto-generate strings. When writing a tag encoder manually to use indices, you write:
ProtoRole := [User, Admin, Guest] has Encode {encode: encodeRole}
encodeRole = \encoder, role ->
    when role is
        Admin -> Encode.indexedTag encoder 0 []
        User -> Encode.indexedTag encoder 1 []
        Guest -> Encode.indexedTag encoder 2 []
When you have a record, you write:
ProtoUser := {id: U64, firstName: Str, lastName: Str, role: ProtoRole}
encodeUser = \encoder, user ->
    encoder
    |> Encode.indexedRecord 0 user.id
    |> Encode.indexedRecord 1 user.firstName
    |> Encode.indexedRecord 2 user.lastName
    |> Encode.indexedRecord 3 user.role
Now imagine that you need to add a variant to the tag or a field to the record. In the tag case, you get a compiler error in the encode function. In the record case, you just silently miss data.
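To make the tag-versus-record asymmetry above concrete, here is a minimal sketch in Rust (the language the thread compares against). All names (`Role`, `User`, `role_index`, `encode_user_lossy`, `encode_user_checked`) are hypothetical stand-ins, not Roc or serde APIs. It also shows one way a language can recover the compile error for records, which is an assumption about what a fix could look like rather than anything Roc does today.

```rust
// Hypothetical stand-ins for the ProtoRole / ProtoUser types above.
#[derive(Clone, Copy)]
enum Role {
    User,
    Admin,
    Guest,
}

struct User {
    id: u64,
    first_name: String,
    last_name: String,
    role: Role,
}

// Tag case: `match` is exhaustive, so adding a variant to `Role`
// is a compile error here until the new variant is given an index.
fn role_index(role: Role) -> u32 {
    match role {
        Role::Admin => 0,
        Role::User => 1,
        Role::Guest => 2,
    }
}

// Record case, written the obvious way: fields are read one at a time,
// so a newly added field is silently left out of the output.
fn encode_user_lossy(user: &User) -> Vec<(u32, String)> {
    vec![
        (0, user.id.to_string()),
        (1, user.first_name.clone()),
        (2, user.last_name.clone()),
        (3, role_index(user.role).to_string()),
    ]
}

// Record case with a full destructuring pattern (no `..`): adding a field
// to `User` now fails to compile here, just like the tag case.
fn encode_user_checked(user: &User) -> Vec<(u32, String)> {
    let User { id, first_name, last_name, role } = user;
    vec![
        (0, id.to_string()),
        (1, first_name.clone()),
        (2, last_name.clone()),
        (3, role_index(*role).to_string()),
    ]
}
```

Both record encoders produce the same output today; the difference only shows up when `User` gains a field.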
Most often you would be generating Roc types and encoders/decoders from a protobuf definition file right?
Proto is reasonable to talk about because it is a format edge case, but you are correct that long term, proto should be auto-generated from a definition file. That said, for proto to be generated, we need to define how we could support it; the first uses in Roc will be hand-written; and there are other formats without generators that have some or all of proto's complexities. Maybe BSON would technically be a better format to talk about. If you are using BSON, you probably want more density, but you may be using it as just a faster JSON and not really care much about the exact encoding. So auto-derivation would be much more useful to you. Just make the frontend alphabetical as well, and since the message only has a single consumer, both sides can be updated to a new version at once.
which is less efficient and less convenient?
Was realizing that everything is less efficient than I initially thought because also every struct name will be encoded as a string as well. Just was kinda lamenting that that is sad. Also, it would mean that we would be pushing more users towards opaque types. Opaque types are less convenient.
Slightly tangential thought: In Rust, for example, there is no implicit auto-derive. There is only an explicit auto-derive. Are we concerned at all about the security implications of an implicit auto-derive? As in, imagine I have a user record. I just encode it and send it to the frontend. One day, a new engineer adds a feature and as part of it, adds a field to the User type that really should not be public. Because they didn't realize that we encode directly on the User type, we are now encoding and sending that private information to the frontend.
With something like Rust, this could happen, but it is less likely because the fact that a record could be sent to the frontend (and is implicitly going to add new fields to the encoding) lives right above the record: #[derive(Serialize)].
I am honestly starting to lean towards only allowing encode on opaque types at all. I still think we can autoderive, but that should be done something like this:
ProtoRole := [User, Admin, Guest] has Encode {encode: _}
This would mean that:
The big downside is that you either have to use opaque types in many places in your codebase, or you need to make a method to convert from your regular type to your encoding-friendly type. That said, it should just be one method. And if it works out type-wise, it may just be exposing the @MyType method.
Brendan Hansknecht said:
Now imagine that you need to add a variant to the tag or a field to the record. In the tag case, you get a compiler error in the encode function. In the record case, you just silently miss data.
oh yeah, I see what you mean now - yes, that's much worse :sweat_smile:
Brendan Hansknecht said:
Slightly tangential thought: In Rust, for example, there is no implicit auto-derive. There is only an explicit auto-derive. Are we concerned at all about the security implications of an implicit auto-derive? As in, imagine I have a user record. I just encode it and send it to the frontend. One day, a new engineer adds a feature and as part of it, adds a field to the User type that really should not be public. Because they didn't realize that we encode directly on the User type, we are now encoding and sending that private information to the frontend.
With something like Rust, this could happen, but it is less likely because the fact that a record could be sent to the frontend (and is implicitly going to add new fields to the encoding) lives right above the record
#[derive(Serialize)].
yeah I thought about this, but the issue is that if you need to make everything opaque in order to serialize it, then people will just start making everything opaque as a matter of course, without thinking about it, and the same mistake will be about as likely to happen
I think the better solution is to always make sensitive data opaque and don't opt into Encode there (because it is already opt-in on opaque types)
a good practice for that is to have like Sensitive a := a wrapper type, which doesn't have Encode and which overrides Display and Inspect to just return "***" so they don't accidentally end up in logs either
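A minimal sketch of that wrapper idea, written in Rust rather than Roc. The `Sensitive` name comes from the message above; `new` and `expose` are hypothetical helpers added for illustration, and Rust's `Debug`/`Display` stand in for Roc's `Inspect`/`Display`. The wrapper deliberately gets no encoding impl at all, so it cannot be serialized by accident.

```rust
use std::fmt;

// Hypothetical wrapper mirroring the `Sensitive a := a` idea.
// It has no encoding impl, and its formatting impls hide the value.
pub struct Sensitive<T>(T);

impl<T> Sensitive<T> {
    pub fn new(value: T) -> Self {
        Sensitive(value)
    }

    // Access is explicit, so reads of the secret are easy to find in review.
    pub fn expose(&self) -> &T {
        &self.0
    }
}

// Debug output (e.g. `{:?}` in logs) never shows the wrapped value.
impl<T> fmt::Debug for Sensitive<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("***")
    }
}

// Display output (e.g. `{}`) is masked the same way.
impl<T> fmt::Display for Sensitive<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("***")
    }
}
```

Because the impls place no bounds on `T`, the masking works even for payloads that are not themselves printable.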
:thinking:
I'm not convinced. Opaque types are less convenient to use especially if you ever cross module boundaries.
I don't think this would lead to the proliferation of opaque types in most code bases. I think many more people would just add opaque types at the boundaries with a conversion function.
That is a lot less friction to the project as a whole.
I do agree about using opaque types for sensitive data, but I bet many newer users won't even think about that as an option, so I think it will be pretty uncommon.
For more advanced users that would use opaque types for sensitive data and care more about code quality, I think it is pretty easy to suggest to them that encode and its related opaque types should be nicely wrapped in their own module and only be converted to/from at the boundaries of the system.
Brendan Hansknecht said:
I don't think this would lead to the proliferation of opaque types in most code bases. I think many more people would just add opaque types at the boundaries with a conversion function.
oh interesting, I think I misunderstood the idea you were proposing - so you're saying we still support deriving Encode and Decode for structural types, we just don't do it automatically?
in other words, if I make a new opaque type and declare that it has Encode, even if it's a big complicated nested structural type in there, the whole thing will get derived
but I can no longer give a structural type to Encode.encode directly; rather, I have to wrap it in an opaque type first
like I have SerializedUser := User and then I call Encode.encode on SerializedUser
do I have that right?
Yeah
Of course, if your opaque type includes other opaque types, those must also have Encode.
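As a rough illustration of the "opaque type at the boundary plus one conversion function" shape, here is a sketch in Rust. The `Encode` trait is a hypothetical stand-in for Roc's `Encode` ability, and the hand-written JSON is only there to keep the example free of dependencies. Unlike `SerializedUser := User`, this variant lists the boundary fields explicitly, which is an assumption made to show where a secret field would get dropped.

```rust
// Hypothetical trait standing in for Roc's `Encode` ability.
trait Encode {
    fn encode(&self) -> String;
}

// Plain domain type: intentionally has no `Encode` impl,
// so it cannot be passed to an encoder directly.
struct User {
    id: u64,
    name: String,
    password_hash: String, // must never leave the server
}

// Boundary type that lists exactly the fields allowed to be sent out.
struct SerializedUser {
    id: u64,
    name: String,
}

// The single conversion function at the edge of the system.
impl From<&User> for SerializedUser {
    fn from(user: &User) -> Self {
        SerializedUser { id: user.id, name: user.name.clone() }
    }
}

impl Encode for SerializedUser {
    // Toy JSON output; string escaping is omitted for brevity.
    fn encode(&self) -> String {
        format!("{{\"id\":{},\"name\":\"{}\"}}", self.id, self.name)
    }
}
```

A call site then reads `SerializedUser::from(&user).encode()`, and a field added to `User` stays private until someone also adds it to `SerializedUser`.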
hm, but if everybody is just uncritically doing SerializedUser := User as a matter of course, before passing that to Encode.encode, does that give people more pause when it comes to checking whether they're encoding secrets?
or is it just a chore they do before calling Encode.encode
actually, come to think of it - that would probably be an effective solution to the index problem
well, part of the index problem at least (not the backwards-compatibility part)
because SerializedUser := User followed by Encode.encode (@SerializedUser { role: Guest, ... }) will make the Guest have the type [Guest, Admin, Moderator] because of the @SerializedUser
so I guess if we want to support index-based encoding/decoding, that's a possible way we could do it
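For a feel of what index-based encoding and decoding needs from the type, here is a small Rust sketch. It assumes a closed set of variants and uses declaration order as the index; how Roc would actually assign indices is not settled in this thread. `ROLES`, `to_index`, and `from_index` are hypothetical names.

```rust
// Hypothetical closed tag union; declaration order fixes the indices here.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Role {
    Guest,
    Admin,
    Moderator,
}

// Every variant, in index order. Writing this table is only possible
// because the full set of tags is known.
const ROLES: [Role; 3] = [Role::Guest, Role::Admin, Role::Moderator];

// Encoding direction: tag -> dense index.
fn to_index(role: Role) -> u8 {
    role as u8
}

// Decoding direction: index -> tag. An out-of-range index is a decode
// failure, not a crash, so the result is optional.
fn from_index(index: u8) -> Option<Role> {
    ROLES.get(index as usize).copied()
}
```

The decode side is where the backwards-compatibility concern shows up: reordering or removing a variant changes what an already-stored index means.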
Yeah, that is part of why I suggested requiring the opaque types.
For auto derive, something like SerializedUser := User has Encode {encode: _}
And if you need a special derivation, you can change _ to myEncodeFn
But yeah, I guess most of the time when using this, you will just do SerializedType := Type has Encode {encode: _}
So that wouldn't actually help with security at all.
I was initially thinking that the SerializedType would be defined explicitly. But I guess that probably would not be common.
yeah people will definitely take the path of least resistance :big_smile:
Last updated: Jun 16 2026 at 16:19 UTC