missing string methods · beginners

Stream: beginners

Topic: missing string methods

dank (Feb 09 2023 at 21:41):

I see Str doesn't have len, reverse, or get functions.
Are they planned?

Wolfgang Schuster (Feb 09 2023 at 21:49):

I can't say for certain, but I'd guess that these are platform dependent and would be provided by the platform. E.g. the length of a string sent to your terminal varies depending on whether you mean the byte length, character length, or display length (width).
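
To make the three notions of length concrete, here is a minimal sketch in Python (not Roc, purely for illustration). The helper names are hypothetical, and `display_width` is only a rough approximation based on East Asian width classes, not what a real terminal computes.

```python
import unicodedata

def byte_length(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))

def code_point_length(s: str) -> int:
    """Number of Unicode code points (what Python's len() counts)."""
    return len(s)

def display_width(s: str) -> int:
    """Approximate terminal columns: wide/fullwidth characters count as 2,
    combining marks as 0, everything else as 1 (an assumption, not a standard)."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

text = "\u65e5\u672c"  # "日本"
print(byte_length(text), code_point_length(text), display_width(text))  # 6 2 4
```

The same string has three different "lengths" here, which is the ambiguity a single `len` function would hide.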

Nick Hallstrom (Feb 09 2023 at 22:43):

You could do "foo" |> Str.graphemes |> List.len. But I do agree that just having something like Str.len and Str.byte_len would be nice.

Nick Hallstrom (Feb 09 2023 at 22:44):

Oh wait, there is Str.countGraphemes :thumbs_up:

Brendan Hansknecht (Feb 09 2023 at 22:49):

I think these functions are in a weird Unicode-related limbo. I think they likely won't be added and will instead have alternatives that care either about bytes or about graphemes.

Brendan Hansknecht (Feb 09 2023 at 22:50):

That said, I don't really understand unicode and the complexities here. I just know that a lot of those functions have sharp edges so we are likely to give alternatives and no default so that people are more likely to read the docs and pick the function that does the right thing.

Brendan Hansknecht (Feb 09 2023 at 22:51):

So no Str.len; instead there are Str.countGraphemes and Str.countUtf8Bytes.

Brendan Hansknecht (Feb 09 2023 at 22:55):

Similar with the other functions: should reverse be on bytes or graphemes? One answer can give wrong results, the other is exceptionally slow.

Brendan Hansknecht (Feb 09 2023 at 22:57):

For get, I think the plan is to have people convert the string to a list of bytes and get the byte from the list. With seamless slices (which will be added one day) this should be doable without copying the string. Though that wouldn't work with graphemes, so I'm not sure how that affects the plan.
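
As a rough analogy only, the byte-oriented `get` idea can be sketched in Python (not Roc). `byte_at` is a hypothetical helper, and `memoryview` stands in for a non-copying slice; it is not how Roc's seamless slices are implemented. Note that `encode` itself copies in Python, which the Roc plan aims to avoid.

```python
from typing import Optional

def byte_at(data: bytes, index: int) -> Optional[int]:
    """Return the byte at index, or None when out of bounds (hypothetical helper)."""
    if 0 <= index < len(data):
        return data[index]
    return None

text = "h\u00e9llo"              # "héllo" with a precomposed é
data = text.encode("utf-8")     # this step copies in Python
view = memoryview(data)[1:3]    # slice over the same buffer, no copy of the bytes

print(byte_at(data, 1))         # 195 (0xC3, first byte of é)
print(byte_at(data, 99))        # None
print(bytes(view).decode("utf-8"))  # é
```

Index 1 lands in the middle of a two-byte character, which is why a byte-level `get` does not map directly onto graphemes.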

Luke Boswell (Feb 09 2023 at 23:05):

I thought there was a plan to have a Unicode package, and not a builtin to provide some of this functionality?

Luke Boswell (Feb 09 2023 at 23:06):

Found Richard's comment on Zulip

Luke Boswell (Feb 09 2023 at 23:07):

I might be misunderstanding as that discussion was around scalars.

dank (Feb 09 2023 at 23:07):

Brendan Hansknecht said:

Similar with the other functions: should reverse be on bytes or graphemes? One answer can give wrong results, the other is exceptionally slow.

when can reverse give wrong answers

Luke Boswell (Feb 09 2023 at 23:10):

I think there are invisible characters in Unicode that make it more complicated than reversing bytes.

Luke Boswell (Feb 09 2023 at 23:19):

Actually I think it might be due to surrogates. I'm not 100% here.

Joshua Warner (Feb 09 2023 at 23:38):

I believe that technically speaking, you're never supposed to see "unpaired surrogate pairs" at the level of unicode codepoints. That's a detail specifically designed for UTF-16 encoding. Decoding the UTF-16 is supposed to eliminate them - but in practice bugs are a thing.
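
A small Python sketch (not Roc, for illustration) of the point about surrogates: a character outside the Basic Multilingual Plane takes two UTF-16 code units, decoding that pair yields a single code point, and a lone surrogate fails strict decoding. The helper names are hypothetical.

```python
def utf16_code_units(s: str) -> int:
    """Number of 16-bit code units in the UTF-16 encoding of s."""
    return len(s.encode("utf-16-le")) // 2

def decodes_as_utf16(data: bytes) -> bool:
    """True if data is well-formed UTF-16 (little endian) under strict decoding."""
    try:
        data.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False

emoji = "\U0001F600"                 # one code point outside the BMP
encoded = emoji.encode("utf-16-le")  # surrogate pair D83D DE00
lone_high = encoded[:2]              # only the high surrogate

print(len(emoji), utf16_code_units(emoji))  # 1 2
print(decodes_as_utf16(encoded))            # True
print(decodes_as_utf16(lone_high))          # False
```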

Brendan Hansknecht (Feb 09 2023 at 23:40):

Luke Boswell said:

I thought there was a plan to have a Unicode package, and not a builtin to provide some of this functionality?

Yeah, a major goal is to enable a lot of this to be done in userland. So a Unicode package in userland is very likely.

Brendan Hansknecht (Feb 09 2023 at 23:43):

dank said:

when can reverse give wrong answers

https://mortoray.com/the-string-type-is-broken/
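
One way reversal goes wrong can be sketched in Python (not Roc, for illustration). Reversing code points moves a combining mark onto the wrong letter, and reversing raw UTF-8 bytes can produce invalid UTF-8. `reverse_clusters` is a hypothetical helper that only groups combining marks with their base character; it is an approximation, not full grapheme segmentation.

```python
import unicodedata

def reverse_code_points(s: str) -> str:
    """Naive reversal by code point."""
    return s[::-1]

def byte_reversal_is_valid_utf8(s: str) -> bool:
    """True if reversing the UTF-8 bytes of s still decodes as UTF-8."""
    try:
        s.encode("utf-8")[::-1].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def reverse_clusters(s: str) -> str:
    """Reverse while keeping combining marks attached to their base character
    (approximation: ignores emoji sequences, Hangul jamo, and so on)."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))

text = "noe\u0308l"  # "noël" spelled as e + COMBINING DIAERESIS

print(reverse_code_points(text) == "l\u0308eon")  # True: diaeresis now sits on the l
print(byte_reversal_is_valid_utf8(text))          # False
print(reverse_clusters(text) == "le\u0308on")     # True: diaeresis stays on the e
```

The cluster-aware version has to scan and group before reversing, which hints at why the grapheme-correct answer costs more than a byte reversal.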

Richard Feldman (Feb 10 2023 at 01:06):

I definitely think we should not have a reverse function

Joshua Warner (Feb 10 2023 at 01:08):

Whatever will kids do if they can't make their computer do funny things by reversing a string and piping it to the say command?

Richard Feldman (Feb 10 2023 at 01:08):

yeah the thing is it's super error prone and I can't think of any use cases I've ever seen for it other than tutorials

Last updated: Aug 17 2025 at 12:14 UTC