What's it called when you can use single quotes ' to directly refer to a byte, like 'a' gives 97 : Int *, etc.? I couldn't find it in the tutorial from a quick scan. I'm not sure where I learnt that from.
The reason I bring this up is that it caught me out for an hour or so today when I used a two-byte utf8 character in this way and Roc didn't complain, so I didn't realise that something fishy was going on.
I'm using the following character. I have no idea why the unicode data files chose to use this (and some other strange ones) for segmenting information in their data files
» Str.toUtf8 "×"
[195, 151] : List U8
» '×'
215 : Int *
Should that second operation using single quotes not be permitted in Roc?
so single quotes compile to the Unicode Code Point, not (necessarily, except by coincidence) utf8
when the code point is less than 128, it'll be one utf8 byte, but any code point of 128 or higher needs 2+ utf8 bytes to represent
so this looks likely correct to me (although I can definitely see why it would be surprising!)
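To make the distinction concrete, here is a minimal sketch in Python (not Roc, used only to check the numbers) comparing the code point of a character with its utf8 bytes. The variable names are illustrative.

```python
# Code point vs. utf8 bytes for the character from the REPL session above.
ch = "×"

code_point = ord(ch)                   # the Unicode code point, a single integer
utf8_bytes = list(ch.encode("utf-8"))  # the utf8 encoding, one or more bytes

print(code_point)   # 215
print(utf8_bytes)   # [195, 151]

# For code points below 128 the two coincide: one byte, same value.
ascii_code_point = ord("a")
ascii_utf8_bytes = list("a".encode("utf-8"))
print(ascii_code_point, ascii_utf8_bytes)  # 97 [97]
```

This matches the two REPL results: '×' gives the code point 215, while Str.toUtf8 "×" gives the bytes [195, 151].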
That's actually much more helpful for what I need. For some reason I thought only ASCII was permitted
Is 215 the correct value though? It's too small to be 2 byte
utf8 encodes anything over 127 in 2+ bytes
Oh, it's not literally the 2-byte value cast to an int?
correct, it's the Unicode Code Point - which is a single integer
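For anyone wondering how 215 turns into [195, 151], here is a rough sketch of the standard two-byte utf8 layout (110xxxxx 10xxxxxx) in Python. The helper name encode_two_byte is made up for illustration and only handles the two-byte range.

```python
def encode_two_byte(code_point):
    """Encode a code point in the range U+0080..U+07FF as two utf8 bytes.

    Hypothetical helper for illustration; real encoders also handle the
    1-, 3- and 4-byte ranges.
    """
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point is outside the two-byte utf8 range")
    lead = 0xC0 | (code_point >> 6)            # 110xxxxx: top 5 bits
    continuation = 0x80 | (code_point & 0x3F)  # 10xxxxxx: low 6 bits
    return [lead, continuation]


two_bytes = encode_two_byte(215)
print(two_bytes)  # [195, 151]
```

So 215 is small enough to fit in one byte as a plain integer, but utf8 still spends two bytes on it because the single-byte form is reserved for values below 128.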
as Luke noted, that tends to be the useful thing in practice :big_smile:
casting the 2 byte value to an int is also a single integer. Would be 50071 in this case. Or 38851 if the bytes need to be reversed due to endian handling
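Those two numbers can be checked with a quick Python sketch that reinterprets the utf8 bytes [195, 151] as one integer in each byte order (variable names are illustrative):

```python
# Reinterpret the two utf8 bytes of "×" as a single integer.
raw = "×".encode("utf-8")  # b'\xc3\x97', i.e. [195, 151]

big_endian = int.from_bytes(raw, "big")        # 195 * 256 + 151
little_endian = int.from_bytes(raw, "little")  # 151 * 256 + 195

print(big_endian)     # 50071
print(little_endian)  # 38851
```

Neither value is the code point 215, which is why the single-quote literal does not give either of them.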
Oh yeah, the code point is the number it would be in utf32, which is different from having the utf8 bytes cast to a single integer. Currently, utf8 is also limited to 4 bytes, so you could just cast it to a U32 to make it a single integer. Though if Unicode expands enough, theoretically, utf8 could need more than 4 bytes.
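A small Python sketch of both points, again only as a check and not Roc code: the utf32 encoding of a character is its code point as a 4-byte integer, and the highest code point Unicode currently allows (U+10FFFF) takes 4 bytes in utf8.

```python
ch = "×"

# "utf-32-be" is big-endian utf32 without a byte order mark.
utf32_value = int.from_bytes(ch.encode("utf-32-be"), "big")
print(utf32_value == ord(ch))  # True (both are 215)

# The largest valid code point today needs 4 utf8 bytes, so any utf8
# sequence for one character fits in a 32-bit integer.
longest_utf8 = len(chr(0x10FFFF).encode("utf-8"))
print(longest_utf8)  # 4
```

The original utf8 design did allow longer sequences; the 4-byte limit comes from Unicode capping code points at U+10FFFF.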
Last updated: Jul 06 2025 at 12:14 UTC