Managing tag unions · beginners · Zulip Chat Archive

Hi! I'm trying to figure my way around a few compiler errors. The below is a reproducible full example:

interface Parser
    exposes [
        Parser,
        min,
        max,
    ]
    imports []

Parser a err : List U8 -> Result (State a) err

State a : {
    value : a,
    numRead : Nat, # Number of bytes read in the last parse.
    bytes : List U8,
}

min : Parser a err, Nat -> Parser a [TooShort]err
min = \parser, minRead -> \bytes ->
        record <- parser bytes |> Result.try
        if record.numRead < minRead then
            Err TooShort
        else
            Ok record

max : Parser a err, Nat -> Parser a [TooLong]err
max = \parser, maxRead -> \bytes ->
        result = parser bytes

        when result is
            Err err ->
                result

            Ok record ->
                if record.numRead > maxRead then
                    Err TooLong
                else
                    Ok record

── TYPE MISMATCH ────────────────────────────────────────────────── Parser.roc ─

This 2nd argument to try has an unexpected type:

19│>          record <- parser bytes |> Result.try
20│>          if record.numRead < minRead then
21│>              Err TooShort
22│>          else
23│>              Ok record

The argument is an anonymous function of type:

    { bytes : List U8, numRead : Nat, value : a } -> [Err [TooShort],
    Ok { bytes : List U8, numRead : Nat, value : a }]

But try needs its 2nd argument to be:

    { bytes : List U8, numRead : Nat, value : a } -> Result {
    bytes : List U8, numRead : Nat, value : a } err


── TYPE MISMATCH ────────────────────────────────────────────────── Parser.roc ─

Something is off with the body of the min definition:

17│   min : Parser a err, Nat -> Parser a [TooShort]err
18│>  min = \parser, minRead -> \bytes ->
19│>          record <- parser bytes |> Result.try
20│>          if record.numRead < minRead then
21│>              Err TooShort
22│>          else
23│>              Ok record

The body is an anonymous function of type:

    List U8 -> Result { bytes : List U8, numRead : Nat, value : a
    } err

But the type annotation on min says it should be:

    List U8 -> Result { bytes : List U8, numRead : Nat, value : a
    } [TooShort]err

Tip: The type annotation uses the type variable err to say that this
definition can produce any type of value. But in the body I see that
it will only produce a tag value of a single specific type. Maybe
change the type annotation to be more specific? Maybe change the code
to be more general?


── TYPE MISMATCH ────────────────────────────────────────────────── Parser.roc ─

The 2nd branch of this when does not match all the previous branches:

29│           when result is
30│               Err err ->
31│                   result
32│
33│               Ok record ->
34│>                  if record.numRead > maxRead then
35│>                      Err TooLong
36│>                  else
37│>                      Ok record

The 2nd branch is a value of type:

    [Err [TooLong], Ok (State a)]

But all the previous branches have type:

    Result (State a) err

All branches of a when must have the same type!

────────────────────────────────────────────────────────────────────────────────

3 errors and 1 warning found in 26 ms.

min and max are intended to parser functions which accept any Parser and return an augmented Parser of the same success type which enforces a minimum or maximum consumed length. The error union returned is intended to be the input Parser's error union with the addition of one tag.

Brian Carroll (Dec 23 2022 at 15:33):

I think it's saying that you are returning a closed tag union rather than an open one. This area has been under development recently. Try removing the err type variable from the end of the type signatures of min and max?

Ayaz Hafiz (Dec 23 2022 at 18:24):

f : err -> [TooShort]err
f = \err -> if Bool.true then err else TooShort

the reason this goes wrong is because you cannot treat an err like a [TooShort]err, because those two things are different types, and hence have different shapes. Today Roc doesn't permit implicitly converting from one to the other. Since you don't know anything about what err actually looks like, the only way to resolve this is to instead use the signature [TooShort]err -> [TooShort]err, or in your examples, eg min : Parser a [TooShort]err, Nat -> Parser a [TooShort]err

Brian Carroll (Dec 23 2022 at 19:24):

Right so there's no way to specify, or infer, that err must be a tag union rather than, say, a number or string..

Kevin Gillette (Dec 23 2022 at 19:34):

Ah, I hadn't considered including TooShort in the type, since that seemed less precise, or it seemed like something the compiler could infer (since a "set of tag" necessarily includes TooShort)?

Kevin Gillette (Dec 23 2022 at 19:42):

I think I understand the how of the compiler's refusal to accept this (performance implications), but I'm a bit confused about the why. How is this substantially different than a, b -> c, where the compiler knows little or nothing about the shapes prior to evaluating the calling context? This seems like just another monomorphizing/specialization case, though I don't know what subtle tradeoffs are involved.

Since tag unions are a prominent concept, couldn't the compiler defer making assessments about runtime characteristics until the calling context is known? If the caller passes a Parser Nat [], then it's known that the final shape of the returned type is Parser Nat [TooShort]. If the caller passes a Parser Nat [A X, B Y, C Z], then likewise, the resulting shape can be concretely evaluated (at some point, all abstractions must concretize, otherwise the compiler is dealing with an incomplete program).

I'd expect that, in the worst case, where there may be thousands of parameterized and simple tags in the same union, the compiler would need to arrange for the memory representation to hold some marker to identify the tag, as well as sufficient memory to hold the largest of all the tags in that union? iow, in the worst case, there's just less optimization opportunity, but it's still a tractable problem.

Ayaz Hafiz (Dec 23 2022 at 19:46):

if you have f = \err -> if Bool.true then err else TooShort and you call this with [A] then f needs to give you back a value of shape [A, TooShort]. with roc's runtime model, this involves a conversion in the runtime representation of the value. such conversions aren't done implicitly, which is why this isn't permitted and you get a type error.

as another example if you have f : [A, B] -> [A, B, C] this can't be implemented as f = \x -> x today, instead you must do

f = \x -> when x is
  A -> A
  B -> B

Ayaz Hafiz (Dec 23 2022 at 19:51):

Kevin Gillette (Dec 23 2022 at 20:17):

min : Parser a _, Nat -> Parser a [TooShort]err

min : Parser a _, Nat -> Parser a [TooShort]*

I guess it's a bit strange to me to have a constrained tag union where the constraint type variable (err) only appears in the result. For the _ variant which yields no errors or warnings, does Roc just infer that the _ is [TooShort]err as well?

Is it common to have implied/inferred portions of declarations I intend to expose to other modules?

Ayaz Hafiz (Dec 23 2022 at 20:22):

yes it infers [TooShort]err. in the former case [TooShort]err can be simplified to [TooShort] as well since if it only appears once in a signature, err behaves like *

Ayaz Hafiz (Dec 23 2022 at 20:23):

its not super common but it can be useful e.g. if you want to type the "surface" of a type but there are large nested types you don't care too much about and are fine letting being inferred

Kevin Gillette (Dec 23 2022 at 20:41):

Is the only option for performing explicit conversions this "repacking" of tags that you demonstrated? I think I've seen this happen a bit with Result handling (i.e. using Err err -> Err err instead of _ -> result).

Why aren't/can't implicit conversions be permitted? I've seen a somewhat analogous case in at least one other language (interface itable subsetting in Go) where the caller context, or whole-of-program analysis, where the subset/intersecting entity (let's pretend it's [B, C] in this case) might be encoded in such a way as to be compatible with the superset entity or union of entities (let's pretend it's [A, B, C], though similar would apply for [B, C] -> [A, B]).

In Go, behind the scenes (at least with the main toolchain), if you have an interface { Read(); Write(); Close() } (simplified for brevity), it'll encode an itable like 0:Read|1:Write|2:Close (essentially an array of signature definitions), and anywhere you use a { Read(); Write() } or an { Write(); Close() }, it'll just re-reference different slices into that same itable. Conversely, a { Read(); Close() } would need a different itable entry, but it could be satisfied with a packing like 0:Close|1:Read|2:Write|3:Close. All of this would be arranged during linking, similarly to how a linker might deduplicate identical strings (and substrings) into a packed portion of the readonly segment of the binary using an intermediate suffix tree.

If the Roc compiler determines that the optimal encoding for [B, C] is a single bit, assigning 0 for B and 1 for C, but in practice it must be padded up to at least a byte if not a cpu word, then it's really got 256 or 2^32 or 2^64 possible tags, depending. In such cases, the compiler could determine that both input and output in [B, C] -> [A, B, C] can share the same tag identifier assignments and memory layout without any loss in efficiency, and consistently assign the value of 1 to B. There would only be a potential efficiency loss, for example, when transitioning from a tag union consisting only of unparameterized ("simple") tags to one that has at least one parameterized tag, or growing from at most 256 simple tags to 257+ (or growing past other thresholds), or including a parameterized tag which requires more memory than the prior union of tags.

I'm not sure any of those cases are worth forcing the programmer to micromanage A -> A instead of being able to handle the new unique cases while using x -> xas a fallback case.

Ayaz Hafiz (Dec 23 2022 at 21:24):

Yes, the only option for explicit conversions is the repacking. Implicit conversions can be done but they aren't because they are implicit, and can potentially be very expensive (if you need very deeply nested repacking).

There is no "global analysis" for tags done because at the limit that would require all tags to be represented uniformly (e.g. as an int) which could waste a lot of space for the common case of tags, which survive only between 1-2 function calls (and don't escape outside of that). Also, a global analysis means that tags with different names can collide, at which point you have to either say that you can't have two tags that have the same representation, or change the representation for all tags in the global program. OCaml has both those issues.

Implicit conversions can be done and the design space is wide. I have (much longer) discussions of this in this blog post and this previous message. The design choice Roc has made today is for the sake of optimizing individual representations of tag unions, and for surfacing conversions explicitly to the developer.

Stream: beginners

Topic: Managing tag unions

Kevin Gillette (Dec 23 2022 at 10:51):