Stream: API design

Topic: Additional functions for Str / Unicode or the like


Tobias Steckenborn (Nov 26 2025 at 07:15):

Hey there,

Is there a list somewhere of the "functions" being considered for the different "modules"? Looking through Str, for example, I found a few things I might miss, but I didn't know whether they just haven't been added yet or haven't been fully discussed. Examples:

Possibly also for the package concerned with capitalization and the like (which I think the docs say lives somewhere else), though that's likely not too high a priority:

P.S.: I assume len is for length. In the current docs (https://www.roc-lang.org/builtins/alpha4/Str/#len) it likely still has some placeholder: len : Str -> [LearnAboutStringsInRoc Str], described as "A stub function to help people discover how they should handle this in Roc."

Luke Boswell (Nov 26 2025 at 07:21):

We're porting the builtins across from the old compiler:

https://www.roc-lang.org/builtins/alpha4/

Luke Boswell (Nov 26 2025 at 07:22):

If we're making any changes or additions, I would suggest raising a new thread in the ideas channel so we can discuss each change individually.

Luke Boswell (Nov 26 2025 at 07:23):

That other package is https://github.com/roc-lang/unicode, and we haven't ported it to the new compiler yet.

Anton (Nov 26 2025 at 10:33):

> still has some placeholder

For clarification, this is not a temporary placeholder, we do not plan to implement a working Str.len function in the compiler.

Hannes (Nov 26 2025 at 14:50):

And the lack of a Str.len function means that it's impossible to pad a string to a certain length without specifying what you mean by that string's length.
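Hannes's point can be made concrete with a minimal sketch. It is written in Rust purely for illustration; this is not Roc's API. The sketch shows a padding function that forces the caller to say which "length" they mean. The names `Measure`, `measure`, and `pad_end` are hypothetical. Extended grapheme clusters are left out because Rust's standard library has no grapheme segmentation.

```rust
/// Hypothetical: the unit in which the caller wants "length" measured.
#[derive(Clone, Copy)]
enum Measure {
    /// UTF-8 bytes, as `str::len` reports them.
    Utf8Bytes,
    /// Unicode scalar values, as `str::chars().count()` reports them.
    ScalarValues,
}

/// Length of `s` in the chosen unit.
fn measure(s: &str, unit: Measure) -> usize {
    match unit {
        Measure::Utf8Bytes => s.len(),
        Measure::ScalarValues => s.chars().count(),
    }
}

/// Append `fill` to `s` until adding another copy would exceed `target`
/// (measured in `unit`). With a multi-byte `fill` and `Utf8Bytes`,
/// the result may stop short of `target`.
fn pad_end(s: &str, target: usize, fill: char, unit: Measure) -> String {
    // How much one copy of `fill` adds, in the chosen unit.
    let step = match unit {
        Measure::Utf8Bytes => fill.len_utf8(),
        Measure::ScalarValues => 1,
    };
    let mut out = String::from(s);
    let mut len = measure(s, unit);
    while len + step <= target {
        out.push(fill);
        len += step;
    }
    out
}

fn main() {
    // "José" with a precomposed é (U+00E9): 4 scalar values, 5 UTF-8 bytes.
    let name = "Jos\u{e9}";
    let by_bytes = pad_end(name, 6, '.', Measure::Utf8Bytes);
    let by_scalars = pad_end(name, 6, '.', Measure::ScalarValues);
    // Same request ("pad to 6"), two different strings.
    println!("{by_bytes} / {by_scalars}"); // prints "José. / José.."
}
```

The sketch makes the unit a required argument rather than a default, in keeping with the thread's stance that there is no single correct "length".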

Tobias Steckenborn (Nov 26 2025 at 15:03):

Okay, so such functionalities would likely also end up in additional packages?

Richard Feldman (Nov 26 2025 at 15:18):

yeah, padding strings was a trivial operation in the 1970s when Unicode didn't exist yet, but today "pad" is innately ambiguous - https://lobste.rs/s/bokqwe/breaking_provably_correct_leftpad

Richard Feldman (Nov 26 2025 at 15:22):

e.g. when you say you want to pad it to a "length of 5" do you mean 5 Unicode extended grapheme clusters? 5 UTF-8 bytes? 5 Unicode scalar values?

these can all output completely different strings, and different programming languages use different definitions of what "length" refers to; there's no consistency or standard here, so we are taking the stance of "do not offer the footgun"
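The divergence Richard describes can be seen in a small Rust sketch. It counts UTF-8 bytes and Unicode scalar values (Rust `char`s) for three strings that each display as a single character. Grapheme-cluster counts appear only in comments, based on the Unicode text segmentation rules (UAX #29). Computing them in Rust would need a third-party crate such as `unicode-segmentation`, which is not used here. The helper name `lengths` is hypothetical.

```rust
/// Returns (UTF-8 byte count, Unicode scalar value count) for `s`.
fn lengths(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Each sample is one extended grapheme cluster (one user-perceived character).
    let samples = [
        ("precomposed e-acute (U+00E9)", "\u{e9}"),
        ("e + combining acute (U+0065 U+0301)", "e\u{301}"),
        (
            "family emoji (man ZWJ woman ZWJ girl)",
            "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}",
        ),
    ];
    for (label, text) in samples {
        let (bytes, scalars) = lengths(text);
        // Prints 2/1, 3/2, and 18/5 respectively.
        println!("{label}: {bytes} UTF-8 bytes, {scalars} scalar values");
    }
}
```

All three are "length 1" by grapheme clusters. Yet they range from 2 to 18 bytes and from 1 to 5 scalar values, so a pad or truncate built on any one of these definitions gives different results for each.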

Richard Feldman (Nov 26 2025 at 15:25):

also, UTF-8 bytes is the fastest one in terms of runtime performance, but is unlikely what you want to pad to if you're padding for display reasons; extended grapheme clusters (worst perf) are probably what you want to pad to for display purposes but not if you're trying to get an exact fit on a database column length, which will of course be in bytes, etc.

Unicode scalar values (or code points in general) are slower in terms of perf than UTF-8 and incorrect in terms of display, so basically the worst of both worlds, but a lot of languages (including Rust!) make them a first-class thing because they're easier to understand than extended grapheme clusters, which is one reason emojis cause so many bugs in practice :stuck_out_tongue:
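In Rust, `char` is a Unicode scalar value, so operations written in terms of `chars()` look natural but can cut through a grapheme cluster. This minimal sketch illustrates the emoji point: it truncates and reverses by scalar value, and shows how that breaks a ZWJ emoji sequence and a combining accent. The helper names `truncate_scalars` and `reverse_scalars` are hypothetical.

```rust
/// Keep the first `n` Unicode scalar values of `s`.
/// Easy to write, but it can split a grapheme cluster.
fn truncate_scalars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

/// Reverse `s` one scalar value at a time.
/// Combining marks end up attached to a different base character.
fn reverse_scalars(s: &str) -> String {
    s.chars().rev().collect()
}

fn main() {
    // One user-perceived character: man, ZWJ, woman, ZWJ, girl.
    let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}";
    // Keeping "1 character" by scalar value leaves only the man emoji.
    let cut = truncate_scalars(family, 1);
    assert_eq!(cut, "\u{1F468}");

    // "e" + combining acute, then "x": reversing moves the accent onto the "x".
    let reversed = reverse_scalars("e\u{301}x");
    assert_eq!(reversed, "x\u{301}e");

    println!("{cut} {reversed}");
}
```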

Richard Feldman (Nov 26 2025 at 15:28):

anyway, it's not usable yet but https://github.com/roc-lang/unicode is the long-term plan for unicode operations

Anton (Nov 26 2025 at 18:37):

We already have some useful things in the unicode package: https://github.com/roc-lang/unicode/tree/main/examples (for use with the old compiler)


Last updated: Nov 28 2025 at 12:16 UTC