Hi, I've been reading the Str docs but can't seem to find out how to get the first utf8 "slice" of a string. If I have "Hello", I want the "H" in a new string. Is this possible?
Depends on what you want really. Roc doesn't have full Unicode support in the standard library. If you know you're working with ASCII, you can convert to bytes with Str.toUtf8.
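A rough sketch of the byte-based approach, written in Python rather than Roc so as not to guess at Roc APIs beyond the Str.toUtf8 mentioned above. The helper name first_ascii_char is made up for illustration, and the whole thing assumes the input really is ASCII:

```python
def first_ascii_char(text: str) -> str:
    """Return the first character of `text`, assuming it is ASCII.

    Hypothetical helper for illustration only; not part of any library.
    """
    data = text.encode("utf-8")  # analogous to turning the string into UTF-8 bytes
    if not data:
        raise ValueError("empty string has no first character")
    first = data[0]
    if first > 0x7F:
        # A byte above 0x7F belongs to a multi-byte UTF-8 sequence,
        # so taking a single byte would cut a character in half.
        raise ValueError("first character is not ASCII")
    return bytes([first]).decode("ascii")
```

Rejecting anything above 0x7F is the important part: it turns "hoping no one puts a non-ASCII character in there" into an explicit error instead of a corrupted string.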
There is a very much work-in-progress package, 'roc-unicode', which will be able to get the first codepoint or grapheme cluster, but as far as I know, it doesn't have a release yet.
https://github.com/roc-lang/unicode
This is the wip unicode package
Ah, okay. Guess I have to hope no one puts a non-ASCII character in there then :sweat_smile:. Thanks!
Oh, apparently there was a release yesterday, so you can just import the unicode package like usual
Still missing a lot though
Yeah, Unicode has a lot of sharp edges that are easy to miss, and it also gets updated over time. So we don't really want to tie Unicode complexity and releases to standard library/Roc versioning.
Also, keeping it separate leads to people asking questions and getting clarification. For example, it's really hard to define what a "character" is in Unicode. Many sequences of multiple codepoints combine into a single glyph.
Even if we split codepoints or graphemes, it isn't really guaranteed to be correct.
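To make the "character" problem concrete, here is a small illustration in Python (not Roc, and not using the roc-lang/unicode package) that relies only on the standard unicodedata module. It shows one visible glyph that is two codepoints and three UTF-8 bytes:

```python
import unicodedata

# 'e' followed by COMBINING ACUTE ACCENT (U+0301): renders as a single glyph.
decomposed = "e\u0301"
# NFC normalization composes the pair into the single codepoint U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

codepoints_decomposed = len(decomposed)            # number of codepoints
codepoints_composed = len(composed)
bytes_decomposed = len(decomposed.encode("utf-8"))  # number of UTF-8 bytes
bytes_composed = len(composed.encode("utf-8"))

# Taking "the first codepoint" of the decomposed form silently drops the accent.
naive_first = decomposed[0]
```

Both strings look identical on screen, yet they differ in byte length and codepoint count, and a naive first-codepoint slice of the decomposed form gives a plain "e".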
Totally understand. I also read that section in the docs. Was just a bit confused on how to do it, but I have something to roll with now
The text segmentation algorithm in roc-lang/unicode passes the full Unicode test suite. But I'm reasonably confident some strange edge cases exist, so we've left a crash in there to print out an error message and ask you to report the issue instead of silently failing.
Quick plug for roc-ascii if you're only using ASCII characters :)
Hannes said:
Quick plug for roc-ascii if you're only using ASCII characters :)
Just took a look at it for the first time! Looks real cool. I think there is a typo at the end of the Char.roc file though. del is @Char 12, but should be @Char 128. I can make a PR if you want, but I think it is easier for both of us if you just do it :sweat_smile:
Or even @Char 127 :grinning:
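For anyone wondering why 127 rather than 128: ASCII is a 7-bit encoding, so valid values run from 0 to 127, and DEL is the last of them (0x7F). A quick check in Python, used here only as a neutral calculator and not reflecting how roc-ascii defines its values:

```python
DEL = 0x7F  # 127, the last ASCII code point (the DEL control character)

del_is_ascii = chr(DEL).isascii()             # 127 is within the 7-bit range
next_is_ascii = chr(DEL + 1).isascii()        # 128 is outside the ASCII range
max_seven_bit = (1 << 7) - 1                  # largest value that fits in 7 bits
```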
Created a fix via github web interface: https://github.com/Hasnep/roc-ascii/pull/12
Good thing I didn't make the PR
Kiryl Dziamura said:
Created a fix via github web interface: https://github.com/Hasnep/roc-ascii/pull/12
Never used it, but for changes like these it does seem really handy. Maybe I should use it more!
Jamie Neubert Pedersen has marked this topic as resolved.
Last updated: Jul 06 2025 at 12:14 UTC