Stream: beginners

Topic: Single Quote Things


view this post on Zulip Luke Boswell (Dec 16 2023 at 09:40):

What's it called when you use single quotes ' to refer directly to a byte, e.g. 'a' gives 97 : Int *? I couldn't find it in the tutorial from a quick scan, and I'm not sure where I learnt it.

The reason I bring this up is that it caught me out for an hour or so today: I used a two-byte UTF-8 character this way and Roc didn't complain, so I didn't realise that something fishy was going on.

I'm using the following character. I have no idea why the Unicode data files chose to use this (and some other strange ones) for segmenting information in their data files.
Screenshot-2023-12-16-at-20.39.11.png

» Str.toUtf8 "×"

[195, 151] : List U8
» '×'

215 : Int *

Should that second operation using single quotes not be permitted in Roc?

view this post on Zulip Richard Feldman (Dec 16 2023 at 11:24):

so single quotes compile to the Unicode code point, not (necessarily, except by coincidence) the UTF-8 bytes

view this post on Zulip Richard Feldman (Dec 16 2023 at 11:25):

when the code point is less than 128, it'll be one utf8 byte, but any code point of 128 or higher needs 2+ utf8 bytes to represent
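A minimal sketch of that distinction, written in Python rather than Roc purely for illustration (the helper name `describe` is hypothetical). It compares a character's code point with the bytes of its UTF-8 encoding:

```python
# Compare a character's Unicode code point with its UTF-8 encoding.
def describe(ch: str) -> tuple[int, list[int]]:
    """Return (code point, UTF-8 bytes) for a single character."""
    return ord(ch), list(ch.encode("utf-8"))

ascii_example = describe("a")   # code point below 128: one UTF-8 byte
times_example = describe("×")   # code point 128 or higher: two UTF-8 bytes

print(ascii_example)   # (97, [97])
print(times_example)   # (215, [195, 151])
```

For 'a' the code point and the single UTF-8 byte coincide; for '×' they do not, which matches the REPL output earlier in the thread.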

view this post on Zulip Richard Feldman (Dec 16 2023 at 11:26):

so this looks likely correct to me (although I can definitely see why it would be surprising!)

view this post on Zulip Luke Boswell (Dec 16 2023 at 11:27):

That's actually much more helpful for what I need. For some reason I thought only ASCII was permitted

view this post on Zulip Brendan Hansknecht (Dec 16 2023 at 15:12):

Is 215 the correct value though? It's too small to be a 2-byte value

view this post on Zulip Richard Feldman (Dec 16 2023 at 15:29):

utf8 encodes anything over 127 in 2+ bytes

view this post on Zulip Brendan Hansknecht (Dec 16 2023 at 15:30):

Oh, it's not literally the 2-byte value cast to an int?

view this post on Zulip Richard Feldman (Dec 16 2023 at 15:40):

correct, it's the Unicode Code Point - which is a single integer

view this post on Zulip Richard Feldman (Dec 16 2023 at 15:40):

as Luke noted, that tends to be the useful thing in practice :big_smile:

view this post on Zulip Brendan Hansknecht (Dec 16 2023 at 15:41):

casting the 2 byte value to an int is also a single integer

view this post on Zulip Brendan Hansknecht (Dec 16 2023 at 15:42):

Would be 50071 in this case

view this post on Zulip Brendan Hansknecht (Dec 16 2023 at 15:43):

Or 38851 if the bytes need to be reversed due to endian handling

view this post on Zulip Brendan Hansknecht (Dec 16 2023 at 15:45):

Oh yeah, the code point is the number it would be in UTF-32, which is different from having the UTF-8 bytes cast to a single integer.
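The three numbers mentioned for '×' can be checked with a short sketch, again in Python as an illustration rather than Roc. The variable names are made up for this example:

```python
# The UTF-8 bytes of '×' are [195, 151], i.e. 0xC3 0x97.
utf8_bytes = "×".encode("utf-8")

# Reinterpret the two bytes as one integer, in either byte order.
cast_big = int.from_bytes(utf8_bytes, "big")        # 0xC397
cast_little = int.from_bytes(utf8_bytes, "little")  # 0x97C3

# The UTF-32 encoding stores the code point itself as a 4-byte integer.
utf32_value = int.from_bytes("×".encode("utf-32-be"), "big")
code_point = ord("×")

print(cast_big, cast_little, utf32_value, code_point)
# 50071 38851 215 215
```

Only the UTF-32 value equals the code point; the two casts of the UTF-8 bytes give unrelated numbers.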

view this post on Zulip Brendan Hansknecht (Dec 16 2023 at 15:47):

'Cause currently UTF-8 is also limited to 4 bytes, so you could just cast it to a U32 to make it a single integer. Though if Unicode expands enough, theoretically UTF-8 could need more than 4 bytes.
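As a rough check of the 4-byte limit, here is a Python sketch (illustration only; `utf8_len` is a hypothetical helper). It measures the UTF-8 length at the boundary code points, up to U+10FFFF, the largest code point Unicode currently defines:

```python
def utf8_len(code_point: int) -> int:
    """Number of bytes UTF-8 uses to encode the given code point."""
    return len(chr(code_point).encode("utf-8"))

# Last code point of each encoded length, and the first of the next.
boundaries = [0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF]
lengths = [utf8_len(cp) for cp in boundaries]
print(lengths)  # [1, 2, 2, 3, 3, 4, 4]

# Even the largest code point's UTF-8 bytes fit in an unsigned 32-bit integer.
packed_max = int.from_bytes(chr(0x10FFFF).encode("utf-8"), "big")
print(packed_max < 2**32)  # True
```

The original UTF-8 design allowed longer sequences, but RFC 3629 restricts it to 4 bytes to match the U+10FFFF ceiling.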


Last updated: Jul 06 2025 at 12:14 UTC