Stream: beginners

Topic: ✔ Get slice of string


view this post on Zulip Jamie Neubert Pedersen (Jul 11 2024 at 18:42):

Hi, I've been reading the Str docs but can't seem to find out how to get the first utf8 "slice" of a string. If I have "Hello", I want the "H" in a new string. Is this possible?

view this post on Zulip Kilian Vounckx (Jul 11 2024 at 18:46):

Depends on what you want, really. Roc doesn't have full Unicode support in the standard library. If you know you're working with ASCII, you can convert to bytes with Str.toUtf8.

There is a very much work-in-progress package, 'roc-unicode', which will be able to get the first codepoint or grapheme cluster, but as far as I know, it doesn't have a release yet.
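
For the ASCII-only route, here is a minimal sketch of the idea in Python rather than Roc, so treat it as an illustration and not as Roc API. `text.encode("utf-8")` plays the role of Str.toUtf8, and the helper name `first_ascii_char` is made up for this example. It takes the first byte and refuses anything outside the ASCII range instead of silently returning a broken string.

```python
def first_ascii_char(text: str) -> str:
    """Return the first character of `text` as a new string, assuming ASCII input."""
    data = text.encode("utf-8")  # the string as UTF-8 bytes (like Str.toUtf8)
    if not data:
        raise ValueError("empty string")
    first = data[0]
    if first > 127:
        # In UTF-8, a byte above 127 is part of a multi-byte sequence,
        # so one byte alone would not be a complete character.
        raise ValueError("first character is not ASCII")
    return bytes([first]).decode("ascii")


print(first_ascii_char("Hello"))  # H
```

With "Hello" this gives "H". With a string that starts with "é" it raises, which is the case the one-byte approach cannot handle.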

view this post on Zulip Kilian Vounckx (Jul 11 2024 at 18:46):

https://github.com/roc-lang/unicode

view this post on Zulip Kilian Vounckx (Jul 11 2024 at 18:47):

This is the wip unicode package

view this post on Zulip Jamie Neubert Pedersen (Jul 11 2024 at 18:48):

Ah, okay. Guess I have to hope no one puts a non-ASCII character in there then :sweat_smile:. Thanks!

view this post on Zulip Kilian Vounckx (Jul 11 2024 at 18:48):

Oh, apparently there was a release yesterday, so you can just import the unicode package like usual.
Still missing a lot though.

view this post on Zulip Brendan Hansknecht (Jul 11 2024 at 19:25):

Yeah, Unicode has a lot of sharp edges that are easy to miss, and it also gets updated over time. So we don't really want to tie Unicode complexity and releases to standard library/Roc versioning.

view this post on Zulip Brendan Hansknecht (Jul 11 2024 at 19:26):

Also, it leads to people asking questions and getting clarification. For example, it's really hard to define what a "character" is in Unicode. Many sequences of multiple codepoints combine into a single glyph.

view this post on Zulip Brendan Hansknecht (Jul 11 2024 at 19:27):

Even if we split codepoints or graphemes, it isn't really guaranteed to be correct.
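
To make the "character" problem concrete, here is a small sketch in Python (not Roc) using only the standard library. The example strings are chosen for illustration: an "e" followed by a combining accent, and a flag built from two regional-indicator codepoints. Both render as one glyph, yet neither is one codepoint or one byte.

```python
import unicodedata

# 'e' followed by COMBINING ACUTE ACCENT: two codepoints, one glyph on screen.
decomposed = "e\u0301"
# NFC normalization merges the pair into the single codepoint U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))                  # 2 codepoints
print(len(decomposed.encode("utf-8")))  # 3 bytes
print(len(composed))                    # 1 codepoint
print(decomposed[0] == "e")             # True: slicing by codepoint drops the accent

# A flag is two regional-indicator codepoints and has no single-codepoint form.
flag = "\U0001F1E9\U0001F1F0"
print(len(flag))                        # 2 codepoints
print(len(flag.encode("utf-8")))        # 8 bytes
```

Taking the first byte or the first codepoint of either string yields something other than what a reader would call the first character, which is why grapheme segmentation lives in a separate package.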

view this post on Zulip Jamie Neubert Pedersen (Jul 11 2024 at 19:32):

Totally understand. I also read that section in the docs. Was just a bit confused on how to do it, but I have something to roll with now

view this post on Zulip Luke Boswell (Jul 11 2024 at 21:00):

The text segmentation algorithm in roc-lang/unicode passes the full Unicode test suite. But I'm reasonably confident some strange edge cases exist, so we've left a crash in there that prints an error message and asks you to report the issue instead of silently failing.

view this post on Zulip Hannes (Jul 12 2024 at 07:55):

Quick plug for roc-ascii if you're only using ASCII characters :)

view this post on Zulip Kilian Vounckx (Jul 12 2024 at 11:40):

Hannes said:

Quick plug for roc-ascii if you're only using ASCII characters :)

Just took a look at it for the first time! Looks real cool. I think there is a typo at the end of the Char.roc file though. del is @Char 12, but should be @Char 128. I can make a pr if you want, but I think it is easier for both of us if you just do it :sweat_smile:

view this post on Zulip Kiryl Dziamura (Jul 12 2024 at 11:51):

Or even @Char 127 :grinning:
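
A quick check of why 127 is the right value, sketched in Python (the variable names are only for this example): ASCII is a 7-bit encoding, so it has 128 values numbered 0 through 127, and DEL is the last of them. 128 is already outside the range.

```python
ASCII_BITS = 7
ascii_count = 2 ** ASCII_BITS    # 128 possible values
highest_ascii = ascii_count - 1  # they are numbered 0..127
del_code = ord("\x7f")           # code of the DEL control character

print(ascii_count)    # 128
print(highest_ascii)  # 127
print(del_code)       # 127
```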

view this post on Zulip Kiryl Dziamura (Jul 12 2024 at 11:58):

Created a fix via github web interface: https://github.com/Hasnep/roc-ascii/pull/12

view this post on Zulip Kilian Vounckx (Jul 12 2024 at 11:58):

Good thing I didn't make the PR

view this post on Zulip Kilian Vounckx (Jul 12 2024 at 12:00):

Kiryl Dziamura said:

Created a fix via github web interface: https://github.com/Hasnep/roc-ascii/pull/12

Never used it, but for changes like these it does seem really handy. Maybe I should use it more!

view this post on Zulip Notification Bot (Jul 13 2024 at 06:41):

Jamie Neubert Pedersen has marked this topic as resolved.


Last updated: Jul 06 2025 at 12:14 UTC