Hi! I'm trying to figure my way around a few compiler errors. The below is a reproducible full example:
interface Parser
exposes [
Parser,
min,
max,
]
imports []
Parser a err : List U8 -> Result (State a) err
State a : {
value : a,
numRead : Nat, # Number of bytes read in the last parse.
bytes : List U8,
}
min : Parser a err, Nat -> Parser a [TooShort]err
min = \parser, minRead -> \bytes ->
record <- parser bytes |> Result.try
if record.numRead < minRead then
Err TooShort
else
Ok record
max : Parser a err, Nat -> Parser a [TooLong]err
max = \parser, maxRead -> \bytes ->
result = parser bytes
when result is
Err err ->
result
Ok record ->
if record.numRead > maxRead then
Err TooLong
else
Ok record
Running roc check Parser.roc
produces:
── TYPE MISMATCH ────────────────────────────────────────────────── Parser.roc ─
This 2nd argument to try has an unexpected type:
19│> record <- parser bytes |> Result.try
20│> if record.numRead < minRead then
21│> Err TooShort
22│> else
23│> Ok record
The argument is an anonymous function of type:
{ bytes : List U8, numRead : Nat, value : a } -> [Err [TooShort],
Ok { bytes : List U8, numRead : Nat, value : a }]
But try needs its 2nd argument to be:
{ bytes : List U8, numRead : Nat, value : a } -> Result {
bytes : List U8, numRead : Nat, value : a } err
── TYPE MISMATCH ────────────────────────────────────────────────── Parser.roc ─
Something is off with the body of the min definition:
17│ min : Parser a err, Nat -> Parser a [TooShort]err
18│> min = \parser, minRead -> \bytes ->
19│> record <- parser bytes |> Result.try
20│> if record.numRead < minRead then
21│> Err TooShort
22│> else
23│> Ok record
The body is an anonymous function of type:
List U8 -> Result { bytes : List U8, numRead : Nat, value : a
} err
But the type annotation on min says it should be:
List U8 -> Result { bytes : List U8, numRead : Nat, value : a
} [TooShort]err
Tip: The type annotation uses the type variable err to say that this
definition can produce any type of value. But in the body I see that
it will only produce a tag value of a single specific type. Maybe
change the type annotation to be more specific? Maybe change the code
to be more general?
── TYPE MISMATCH ────────────────────────────────────────────────── Parser.roc ─
The 2nd branch of this when does not match all the previous branches:
29│ when result is
30│ Err err ->
31│ result
32│
33│ Ok record ->
34│> if record.numRead > maxRead then
35│> Err TooLong
36│> else
37│> Ok record
The 2nd branch is a value of type:
[Err [TooLong], Ok (State a)]
But all the previous branches have type:
Result (State a) err
All branches of a when must have the same type!
────────────────────────────────────────────────────────────────────────────────
3 errors and 1 warning found in 26 ms.
min
and max
are intended to parser functions which accept any Parser and return an augmented Parser of the same success type which enforces a minimum or maximum consumed length. The error union returned is intended to be the input Parser's error union with the addition of one tag.
How might I do this while satisfying the compiler?
Thanks!
I think it's saying that you are returning a closed tag union rather than an open one. This area has been under development recently. Try removing the err
type variable from the end of the type signatures of min and max?
I think it's easier to understand if you cut this down to a smaller example, eg
f : err -> [TooShort]err
f = \err -> if Bool.true then err else TooShort
the reason this goes wrong is because you cannot treat an err
like a [TooShort]err
, because those two things are different types, and hence have different shapes. Today Roc doesn't permit implicitly converting from one to the other. Since you don't know anything about what err
actually looks like, the only way to resolve this is to instead use the signature [TooShort]err -> [TooShort]err
, or in your examples, eg min : Parser a [TooShort]err, Nat -> Parser a [TooShort]err
Right so there's no way to specify, or infer, that err must be a tag union rather than, say, a number or string..
Ah, I hadn't considered including TooShort in the type, since that seemed less precise, or it seemed like something the compiler could infer (since a "set of tag" necessarily includes TooShort
)?
Side note: as an experience report, I had also tried, without success:
min : Parser a *, Nat -> Parser a [TooShort]*
min : Parser a *, Nat -> Parser a *
min : Parser a []*, Nat -> Parser a [TooShort]*
_
in place of *
for the above.I think I understand the how of the compiler's refusal to accept this (performance implications), but I'm a bit confused about the why. How is this substantially different than a, b -> c
, where the compiler knows little or nothing about the shapes prior to evaluating the calling context? This seems like just another monomorphizing/specialization case, though I don't know what subtle tradeoffs are involved.
Since tag unions are a prominent concept, couldn't the compiler defer making assessments about runtime characteristics until the calling context is known? If the caller passes a Parser Nat []
, then it's known that the final shape of the returned type is Parser Nat [TooShort]
. If the caller passes a Parser Nat [A X, B Y, C Z]
, then likewise, the resulting shape can be concretely evaluated (at some point, all abstractions must concretize, otherwise the compiler is dealing with an incomplete program).
I'd expect that, in the worst case, where there may be thousands of parameterized and simple tags in the same union, the compiler would need to arrange for the memory representation to hold some marker to identify the tag, as well as sufficient memory to hold the largest of all the tags in that union? iow, in the worst case, there's just less optimization opportunity, but it's still a tractable problem.
if you have f = \err -> if Bool.true then err else TooShort
and you call this with [A]
then f
needs to give you back a value of shape [A, TooShort]
. with roc's runtime model, this involves a conversion in the runtime representation of the value. such conversions aren't done implicitly, which is why this isn't permitted and you get a type error.
as another example if you have f : [A, B] -> [A, B, C]
this can't be implemented as f = \x -> x
today, instead you must do
f = \x -> when x is
A -> A
B -> B
again for the same reason
_ in place of * for the above.
what did you try here? this should work
Hmmmm. I guess it does work with:
min : Parser a _, Nat -> Parser a [TooShort]err
I hadn't tried that, but I had tried it with:
min : Parser a _, Nat -> Parser a [TooShort]*
Which yields a warning.
I guess it's a bit strange to me to have a constrained tag union where the constraint type variable (err
) only appears in the result. For the _
variant which yields no errors or warnings, does Roc just infer that the _
is [TooShort]err
as well?
Is it common to have implied/inferred portions of declarations I intend to expose to other modules?
yes it infers [TooShort]err
. in the former case [TooShort]err
can be simplified to [TooShort]
as well since if it only appears once in a signature, err
behaves like *
its not super common but it can be useful e.g. if you want to type the "surface" of a type but there are large nested types you don't care too much about and are fine letting being inferred
Ayaz Hafiz said:
such conversions aren't done implicitly, which is why this isn't permitted and you get a type error.
Is the only option for performing explicit conversions this "repacking" of tags that you demonstrated? I think I've seen this happen a bit with Result handling (i.e. using Err err -> Err err
instead of _ -> result
).
Why aren't/can't implicit conversions be permitted? I've seen a somewhat analogous case in at least one other language (interface itable subsetting in Go) where the caller context, or whole-of-program analysis, where the subset/intersecting entity (let's pretend it's [B, C]
in this case) might be encoded in such a way as to be compatible with the superset entity or union of entities (let's pretend it's [A, B, C]
, though similar would apply for [B, C] -> [A, B]
).
In Go, behind the scenes (at least with the main toolchain), if you have an interface { Read(); Write(); Close() }
(simplified for brevity), it'll encode an itable like 0:Read|1:Write|2:Close
(essentially an array of signature definitions), and anywhere you use a { Read(); Write() }
or an { Write(); Close() }
, it'll just re-reference different slices into that same itable. Conversely, a { Read(); Close() }
would need a different itable entry, but it could be satisfied with a packing like 0:Close|1:Read|2:Write|3:Close
. All of this would be arranged during linking, similarly to how a linker might deduplicate identical strings (and substrings) into a packed portion of the readonly segment of the binary using an intermediate suffix tree.
If the Roc compiler determines that the optimal encoding for [B, C]
is a single bit, assigning 0
for B
and 1
for C
, but in practice it must be padded up to at least a byte if not a cpu word, then it's really got 256 or 2^32 or 2^64 possible tags, depending. In such cases, the compiler could determine that both input and output in [B, C] -> [A, B, C]
can share the same tag identifier assignments and memory layout without any loss in efficiency, and consistently assign the value of 1
to B
. There would only be a potential efficiency loss, for example, when transitioning from a tag union consisting only of unparameterized ("simple") tags to one that has at least one parameterized tag, or growing from at most 256 simple tags to 257+ (or growing past other thresholds), or including a parameterized tag which requires more memory than the prior union of tags.
I'm not sure any of those cases are worth forcing the programmer to micromanage A -> A
instead of being able to handle the new unique cases while using x -> x
as a fallback case.
Yes, the only option for explicit conversions is the repacking. Implicit conversions can be done but they aren't because they are implicit, and can potentially be very expensive (if you need very deeply nested repacking).
There is no "global analysis" for tags done because at the limit that would require all tags to be represented uniformly (e.g. as an int) which could waste a lot of space for the common case of tags, which survive only between 1-2 function calls (and don't escape outside of that). Also, a global analysis means that tags with different names can collide, at which point you have to either say that you can't have two tags that have the same representation, or change the representation for all tags in the global program. OCaml has both those issues.
Implicit conversions can be done and the design space is wide. I have (much longer) discussions of this in this blog post and this previous message. The design choice Roc has made today is for the sake of optimizing individual representations of tag unions, and for surfacing conversions explicitly to the developer.
Awesome, I'll read through those links, thanks!
Last updated: Jul 06 2025 at 12:14 UTC