I was playing around with single-character literals.
What would be the best way to go from 12363 : Int * back to a Str (or a single character)? Should 'か' default to List U8 instead of Int *?
x = 'か'
12363 : Int *
Str.fromUtf8 [227, 129, 139]
Ok "か" : Result Str
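For reference, the two REPL results describe the same character in two encodings: 12363 is the Unicode code point (scalar value) of か, and [227, 129, 139] is its UTF-8 byte sequence. The sketch below is Python, not Roc, and is only meant to show the arithmetic; `utf8_encode` is a hypothetical helper that assumes its input is already a valid scalar value (it does no validation).

```python
def utf8_encode(scalar: int) -> list[int]:
    """Hypothetical helper: encode one valid Unicode scalar value as UTF-8 bytes."""
    if scalar < 0x80:            # 1 byte: plain ASCII
        return [scalar]
    if scalar < 0x800:           # 2 bytes: 110xxxxx 10xxxxxx
        return [0xC0 | (scalar >> 6), 0x80 | (scalar & 0x3F)]
    if scalar < 0x10000:         # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return [0xE0 | (scalar >> 12),
                0x80 | ((scalar >> 6) & 0x3F),
                0x80 | (scalar & 0x3F)]
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return [0xF0 | (scalar >> 18),
            0x80 | ((scalar >> 12) & 0x3F),
            0x80 | ((scalar >> 6) & 0x3F),
            0x80 | (scalar & 0x3F)]

scalar = ord("か")                                   # the value the 'か' literal evaluates to
print(scalar)                                        # 12363
print(utf8_encode(scalar))                           # [227, 129, 139]
print(bytes(utf8_encode(scalar)).decode("utf-8"))    # か
```

Going from the integer back to text therefore means re-encoding it to bytes first, which is the step Str.fromUtf8 then validates.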
Ah, because we don't have Str.appendScalar anymore.
@Richard Feldman I don't quite understand why we removed Str.appendScalar. It feels like an important primitive for using characters with strings. I don't think it carries the same complexity as other Unicode functions, and it can return a Result to guard against invalid Unicode scalars.
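A rough sketch of what such a primitive could check, written in Python rather than Roc. `append_scalar` is hypothetical, and it mimics a Result by returning a tagged tuple. The validity rule is the standard Unicode one: a scalar value is any code point in 0..=0x10FFFF outside the surrogate range 0xD800..=0xDFFF.

```python
def append_scalar(s: str, scalar: int):
    """Hypothetical append-scalar primitive: ("Ok", new_str) or ("Err", "InvalidScalar")."""
    is_surrogate = 0xD800 <= scalar <= 0xDFFF
    if scalar < 0 or scalar > 0x10FFFF or is_surrogate:
        return ("Err", "InvalidScalar")   # not a Unicode scalar value
    return ("Ok", s + chr(scalar))        # valid: append the character

print(append_scalar("ひら", 12363))   # ('Ok', 'ひらか')
print(append_scalar("ひら", 0xD800))  # ('Err', 'InvalidScalar')
```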
That, or we at least need some way to go from a character literal to a string.
I guess currently the best option would be to store か as a string and use Str.concat.
@Brian Teague what are you actually trying to do?
Nothing actually productive. I'm just trying to learn the specifics of Roc's implementation. I mean, the easiest thing to do is Str.toUtf8 "か", so maybe a better question is: why not treat everything as a Str instead of an Int, unless there is a specific use case for single-character integers?
I guess the larger ones are problematic. The U8 ones are great for pattern matching.
Before, it was useful because you could convert a string into a list of scalars (U32) and then match on any of those values.
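The workflow described here, sketched in Python for illustration only: Python's `ord` stands in for the removed scalar conversion, and `to_scalars` and `classify` are hypothetical names. The string is decoded into scalar values, then the code branches on individual values.

```python
def to_scalars(s: str) -> list[int]:
    """Split a string into its Unicode scalar values (code points)."""
    return [ord(ch) for ch in s]

def classify(scalar: int) -> str:
    # Branch on specific scalar values, the way one would match on them.
    if scalar == 12363:          # 'か'
        return "ka"
    if scalar == ord("a"):
        return "ascii a"
    return "other"

print([classify(c) for c in to_scalars("aかb")])  # ['ascii a', 'ka', 'other']
```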
We just need to add it in roc-lang/unicode. The module is there; it just needs some love.
Part of the motivation for removing it is to make it more obvious that, in practical scenarios, you should be working in terms of either Str or List U8 99.99% of the time, and that doing anything at all with code point integers should be microscopically rare in practice (other than the code points that overlap with ASCII, which come up when parsing textual data formats like JSON and source code; in that case, List U8 is definitely the right thing to reach for!)
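Why bytes are enough for the ASCII case: in UTF-8, every byte belonging to a multi-byte sequence is 0x80 or above, so an ASCII byte such as a comma or a brace can never appear inside the encoding of a non-ASCII character. A minimal Python sketch (the helper `split_on_commas` is hypothetical, not from roc-lang) that splits raw UTF-8 bytes on the ASCII comma without decoding them to text:

```python
def split_on_commas(data: bytes) -> list[bytes]:
    """Split raw UTF-8 bytes on the ASCII comma (0x2C) without decoding to text."""
    fields, start = [], 0
    for i, byte in enumerate(data):
        # Safe: 0x2C never occurs inside a multi-byte UTF-8 sequence.
        if byte == ord(","):
            fields.append(data[start:i])
            start = i + 1
    fields.append(data[start:])
    return fields

raw = "か,き,く".encode("utf-8")
print([f.decode("utf-8") for f in split_on_commas(raw)])  # ['か', 'き', 'く']
```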
Someone pointed out in a comment somewhere (Reddit, I think?) that they weren't sure what Str was encouraging them to do in terms of these different primitives, and I think that criticism was valid.
So I think there's value in not having any Str functions at all that work in terms of code points, and instead having all of that logic live in roc-lang/unicode.
I guess it's just really weird having the 'か' literal, then.
It can't be used with Str or List U8.
We just need to add it in roc-lang/unicode
I don't think that is the issue. Unicode is a power module for special use cases. Most users should never need to touch it. Adding a character to the end of a Str is not a special use case. We need to make sure there is a clear story of how that works.
Note: the clear story may be to remove literals like 'か' and require "か" instead. Then Str.concat just works.
I would use Str.concat mystr "か". It is helpful to have the code point literals, though, so I would rather not lose them.
I'm not sold. It is really strange to have a literal type that doesn't work with anything in the standard library.
I agree that it's strange
So then, would single quotes only accept things that fit in a U8?
(I think it's reasonable to try that and see if there's demand in practice for expanding it; I suspect there would be little or none)
Yeah, I think that would make more sense: 'c' for List U8 and "c" for Str.
Oh, I meant 'c' for U8
(maybe that's what you meant too though!)
Yeah, sorry, I meant 'c' for use with List U8. The value would just be a U8.
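A sketch of the rule being proposed, in Python purely for illustration: `char_literal_to_u8` and `CompileError` are hypothetical stand-ins for what the compiler would do with a single-quoted literal. The literal is accepted only when its code point fits in a U8 (0..=255); anything larger is reported as an error.

```python
class CompileError(Exception):
    """Hypothetical stand-in for a compile-time diagnostic."""

def char_literal_to_u8(literal: str) -> int:
    """Hypothetical check: a single-quoted literal is valid only if its code point fits in a U8."""
    if len(literal) != 1:
        raise CompileError(f"expected exactly one character, got {literal!r}")
    code_point = ord(literal)
    if code_point > 0xFF:
        raise CompileError(f"{literal!r} (code point {code_point}) does not fit in a U8")
    return code_point

print(char_literal_to_u8("c"))   # 99
try:
    char_literal_to_u8("か")
except CompileError as err:
    print("error:", err)
```

One detail the sketch leaves open: code points 128..=255 fit in a U8 but take two bytes in UTF-8, so if the U8 is meant to be compared against bytes in a List U8, a stricter variant might accept only 0..=127.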
Richard Feldman said:
so then would single quotes only accept things that fit in U8?
If I understand you correctly: convert a character literal to a U8 only if it fits, and otherwise report a compile error?
An alternative is converting to List U8 if the character doesn't fit in a U8, but I could see that leading to unexpected outcomes, because single-quote literals could then produce different types.
Yeah, exactly that
With exactly the same concern about different values producing different types.
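For comparison, the alternative raised above, again as a hypothetical Python stand-in (`char_literal_value` is not a real API): the literal yields a U8 when its code point fits and a List U8 of its UTF-8 bytes otherwise. The result's type depends on the value, which is the concern both participants share.

```python
def char_literal_value(literal: str):
    """Hypothetical alternative: a U8 if the code point fits, else the UTF-8 bytes as a list."""
    code_point = ord(literal)
    if code_point <= 0xFF:
        return code_point                    # an int (a U8)
    return list(literal.encode("utf-8"))     # a list of ints (a List U8)

print(char_literal_value("c"))    # 99
print(char_literal_value("か"))   # [227, 129, 139]
```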
Last updated: Jul 06 2025 at 12:14 UTC