I see Str doesn't have len, reverse, or get functions.
Is that planned?
I can't say for certain, but I'd guess that these are platform dependent and would be provided by the platform. E.g. the length of a string sent to your terminal differs depending on whether you're talking about the byte length, character length, or display length (width).
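To make the three notions of "length" above concrete, here is a small Python sketch (Python rather than Roc, purely as an illustration). The grapheme count and the `display_width` helper are rough approximations I am assuming for the example: proper grapheme segmentation follows Unicode's segmentation rules, and real terminals vary in how wide they draw characters.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: renders as a single "é".
s = "e\u0301"

byte_len = len(s.encode("utf-8"))  # number of UTF-8 bytes
code_points = len(s)               # number of Unicode code points

# Rough grapheme estimate (assumption): count code points that are not
# combining marks. This ignores emoji sequences, Hangul jamo, and more.
approx_graphemes = sum(1 for ch in s if unicodedata.combining(ch) == 0)


def display_width(text: str) -> int:
    """Hypothetical helper: estimate terminal columns for `text`.

    Wide and fullwidth East Asian characters count as 2 columns,
    combining marks as 0, everything else as 1.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch) != 0:
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


print(byte_len, code_points, approx_graphemes)  # three different answers
print(display_width("日本"))
```

The same two-code-point string reports three different lengths, which is the ambiguity a single Str.len would have to paper over.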
You could do "foo" |> Str.graphemes |> List.len. But I do agree that just having something like Str.len and Str.byte_len would be nice.
Oh wait, there is Str.countGraphemes
:thumbs_up:
I think these functions are in a weird Unicode-related limbo. I think they likely won't be added and will instead have alternatives that care either about bytes or about graphemes.
That said, I don't really understand Unicode and the complexities here. I just know that a lot of those functions have sharp edges, so we are likely to give alternatives and no default so that people are more likely to read the docs and pick the function that does the right thing.
So no Str.len; instead there are Str.countGraphemes and Str.countUtf8Bytes.
Similarly with the other functions: should reverse operate on bytes or graphemes? One answer can give wrong results, the other is exceptionally slow.
For get, I think the plan is to have people convert the string to a list of bytes and get the byte from the list. With seamless slices (which will be added one day) this should be doable without copying the string. Though that wouldn't work with graphemes, so I'm not sure how that affects the plan.
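A rough Python sketch of the byte-oriented "get" idea described above. This is not Roc's API; the `memoryview` slice is only my loose analogy for a no-copy view like the seamless slices mentioned, and the sample string is made up for the example.

```python
# "héllo" with a precomposed é (U+00E9), which takes two bytes in UTF-8.
s = "h\u00e9llo"
raw = s.encode("utf-8")

# Indexing the bytes is cheap, but index 1 is only the first half of "é".
second_byte = raw[1]

# Indexing by code point gives the whole character instead.
second_char = s[1]

# A memoryview slice refers to the same buffer without copying it
# (assumed here as a stand-in for a no-copy slice of the string's bytes).
view = memoryview(raw)[1:3]
decoded = bytes(view).decode("utf-8")

print(hex(second_byte), second_char, decoded)
```

Byte index 1 and character index 1 point at different things, which is why a plain get without "bytes" or "graphemes" in its name would be easy to misuse.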
I thought there was a plan to have a Unicode package, and not a builtin to provide some of this functionality?
Found Richard's comment on Zulip
I might be misunderstanding as that discussion was around scalars.
Brendan Hansknecht said:
Similarly with the other functions: should reverse operate on bytes or graphemes? One answer can give wrong results, the other is exceptionally slow.
when can reverse give wrong answers
I think there are invisible characters in Unicode that make it more complicated than reversing bytes.
Actually I think it might be due to surrogates. I'm not 100% sure here.
I believe that technically speaking, you're never supposed to see unpaired surrogates at the level of Unicode code points. Surrogates are a detail specifically designed for the UTF-16 encoding. Decoding UTF-16 is supposed to eliminate them, but in practice bugs are a thing.
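As one concrete case of reverse going wrong, independent of surrogates, here is a Python sketch using a combining mark. The `reverse_clusters` helper is hypothetical and only approximates grapheme clusters by attaching combining marks to the preceding base character; a real implementation would need full Unicode segmentation, which is part of why the grapheme-aware version is slow.

```python
import unicodedata

# "noël" spelled with "e" + U+0308 COMBINING DIAERESIS.
s = "noe\u0308l"

# Reversing code points moves the diaeresis onto the "l".
by_code_point = s[::-1]

# Reversing raw UTF-8 bytes splits the two-byte mark and yields invalid UTF-8.
by_bytes = s.encode("utf-8")[::-1]
try:
    by_bytes.decode("utf-8")
    bytes_ok = True
except UnicodeDecodeError:
    bytes_ok = False


def reverse_clusters(text: str) -> str:
    """Hypothetical helper: reverse while keeping combining marks
    attached to the base character before them (an approximation
    of grapheme clusters, not full Unicode segmentation)."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


print(by_code_point)        # diaeresis now sits on the "l"
print(bytes_ok)             # byte reversal did not survive decoding
print(reverse_clusters(s))  # diaeresis stays on the "e"
```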
Luke Boswell said:
I thought there was a plan to have a Unicode package, and not a builtin to provide some of this functionality?
Yeah, a major goal is to enable a lot of this to be done in userland. So a Unicode package in userland is very likely.
dank said:
when can reverse give wrong answers
https://mortoray.com/the-string-type-is-broken/
I definitely think we should not have a reverse function
Whatever will kids do if they can't make their computer do funny things by reversing a string and piping it to the say command?
Yeah, the thing is it's super error-prone and I can't think of any use cases I've ever seen for it other than tutorials.
Last updated: Jul 05 2025 at 12:14 UTC