I just implemented functions to convert between lower_case_names and UpperCaseNames, and ran into a few obstacles that I thought might be worth talking about.
For my first attempt at the upperName conversion (the forward direction, in the order I've written them), I tried to split on '_', convert each element to title case, then join again. Turns out there's no way to convert to title case. Nor is there a way to split off the first scalar, convert it to uppercase, and concat it back.
For the lowerName direction, I was originally going to find anything matching /[A-Z]/, insert underscores at each location, then lowercase the whole thing. However, there don't seem to be any functions to locate a pattern in a string (regex or not).
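For comparison, here is roughly what those two first attempts look like in a language that has split/join and regex support. This is a Python sketch, not Roc; the names `upper_name` and `lower_name` are just illustrative, and it assumes the input is plain ASCII identifiers.

```python
import re


def upper_name(name: str) -> str:
    """lower_case_name -> UpperCaseName: split on '_', capitalize each part, join."""
    # part[:1] is safe for empty parts (e.g. from a double underscore).
    return "".join(part[:1].upper() + part[1:] for part in name.split("_"))


def lower_name(name: str) -> str:
    """UpperCaseName -> lower_case_name: underscore before each capital, then lowercase."""
    # (?<!^) skips a capital at the very start so we don't emit a leading '_'.
    return re.sub(r"(?<!^)([A-Z])", r"_\1", name).lower()
```

Both helpers lean on exactly the pieces that were missing: a way to split off the first character, and a way to locate a pattern in a string.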
Eventually I found Str.walkScalars, which worked, and landed on this:

    underscoreScalar = 95 # Yes, I now know I could have just used '_' literals
    aLowerScalar = 97
    zLowerScalar = 122
    aUpperScalar = 65
    zUpperScalar = 90

    # map from a lower_case_name to a UpperCaseName
    upperName : Str -> Str
    upperName = \name ->
        result = Str.walkScalars name {text: "", needUpper: Bool.true} \{text, needUpper}, c ->
            if c == underscoreScalar then
                {text, needUpper: Bool.true}
            else
                newText =
                    # Only shift ASCII lowercase letters; the subtraction would underflow for digits or capitals
                    if needUpper && c >= aLowerScalar && c <= zLowerScalar then
                        Str.appendScalar text (c - aLowerScalar + aUpperScalar) |> orCrash
                    else
                        Str.appendScalar text c |> orCrash
                {text: newText, needUpper: Bool.false}
        result.text

    expect (upperName "hello_world") == "HelloWorld"

    # unwrap a Result that is known to be Ok
    orCrash : Result a e -> a
    orCrash = \result ->
        when result is
            Ok a -> a
            Err _ -> crash "orCrash"

    # map from an UpperCaseName to a lower_case_name
    lowerName : Str -> Str
    lowerName = \name ->
        result = Str.walkScalars name {text: "", needUnder: Bool.false} \{text, needUnder}, c ->
            newText =
                if c >= aUpperScalar && c <= zUpperScalar then
                    if needUnder then
                        text
                        |> Str.appendScalar underscoreScalar
                        |> orCrash
                        |> Str.appendScalar (c - aUpperScalar + aLowerScalar)
                        |> orCrash
                    else
                        text
                        |> Str.appendScalar (c - aUpperScalar + aLowerScalar)
                        |> orCrash
                else
                    Str.appendScalar text c |> orCrash
            {text: newText, needUnder: Bool.true}
        result.text

    expect
        theResult = (lowerName "HelloWorld")
        theResult == "hello_world"

There are of course a few things that are non-ideal about that:
- There are no Scalar.toAsciiUpper and Scalar.toAsciiLower functions that do this for me.
- I wanted a Result.orCrash function, since I 100% know these errors will never happen, but it doesn't appear to exist.
- If there were a Scalar type that's guaranteed to be a valid Unicode code point, then many uses of appendScalar and walkScalars could operate without needing Result.orCrash in the first place. The Scalar module would have functions for converting to/from U32, and _that's_ where we'd return the Err result.
Here's what I would have liked to have written:

    # Scalar.toAsciiUpper and Scalar.toAsciiLower don't exist; this is the wished-for API
    upperName : Str -> Str
    upperName = \name ->
        result = Str.walkScalars name {text: "", needUpper: Bool.true} \{text, needUpper}, c ->
            if c == '_' then
                {text, needUpper: Bool.true}
            else
                newText =
                    if needUpper then
                        Str.appendScalar text (Scalar.toAsciiUpper c)
                    else
                        Str.appendScalar text c
                {text: newText, needUpper: Bool.false}
        result.text

    lowerName : Str -> Str
    lowerName = \name ->
        result = Str.walkScalars name {text: "", needUnder: Bool.false} \{text, needUnder}, c ->
            newText =
                # Only an uppercase letter after the first scalar gets an underscore in front
                if needUnder && c >= 'A' && c <= 'Z' then
                    text
                    |> Str.appendScalar '_'
                    |> Str.appendScalar (Scalar.toAsciiLower c)
                else
                    text |> Str.appendScalar (Scalar.toAsciiLower c)
            {text: newText, needUnder: Bool.true}
        result.text

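To make the Scalar type idea a bit more concrete, here is a minimal sketch in Python (not Roc). Everything in it is hypothetical: `Scalar`, `from_u32`, `to_ascii_upper` and `append_scalar` are made-up names, not an existing API. The point is only that validation happens once, at the conversion from an integer, so appending can't fail afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scalar:
    """Hypothetical wrapper: holds an integer known to be a valid Unicode scalar value."""

    value: int

    @classmethod
    def from_u32(cls, n: int) -> "Scalar":
        # The only fallible step: reject out-of-range values and surrogates.
        if not (0 <= n <= 0x10FFFF) or 0xD800 <= n <= 0xDFFF:
            raise ValueError(f"{n:#x} is not a Unicode scalar value")
        return cls(n)

    def to_ascii_upper(self) -> "Scalar":
        # ASCII-only by design: anything outside a-z is returned unchanged.
        if ord("a") <= self.value <= ord("z"):
            return Scalar(self.value - 32)
        return self


def append_scalar(text: str, scalar: Scalar) -> str:
    # No error path needed here, because a Scalar is already validated.
    return text + chr(scalar.value)
```

With this shape, the orCrash calls in the walk would disappear, and the error handling would live only where raw numbers enter the system.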
can we split this into 2 ideas? :big_smile:
uppercase/lowercase and scalars both have nontrivial design considerations, so I think it'll be helpful to have different threads about them!
I guess this current proposal is really _all_ about scalars
I don't really need Str.toUpper - I need Scalar.toUpper
I suppose we could have a Scalar module without a Scalar type, or just throw these in the Str module as Str.scalarToAsciiUpper or something
so, here are some considerations I've thought about with regard to uppercase:
- Take the letter i: in English, the uppercase version of that letter is I, whereas in Turkish the uppercase version is İ (with a dot on top). So any language that has some equivalent of Str.toUpperCase (read: most of them) either has a built-in locale concept, such that it automatically works differently if the current OS's locale is set to Turkish, or else it has bugs for Turkish users. There are other examples of this category of bug; Turkish is just an easy one to explain.
- Suppose there's a Locale that the function can accept: should that be part of builtins? Probably not, because locale-specific information can change (like, in the real world) often enough that we probably don't want to have to do a language release to update it.
- If the locale-aware version takes extra effort (e.g. you have to add a locale package and then call Locale.toUpperCase or whatever), and there is a Str.toAsciiUppercase available, then people will overwhelmingly choose the path of least resistance and use the Str one so they don't have to use the package or obtain a Locale to provide. That means that while trying to help people write better software, we've ended up creating an incentive structure where in practice Turkish users end up with incorrect uppercasing in programs written in Roc.
- Of course, one more consideration there is the use case you mentioned here: if you're just trying to convert some programmatic text from snake_case to camelCase or PascalCase, it's probably all ASCII anyway and so getting locales involved wouldn't matter.
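The Turkish i problem is easy to reproduce. The sketch below is Python, not Roc, and `to_upper` with its `locale` parameter is a hypothetical helper, not a real library function. It only special-cases the dotted/dotless i pair and ignores every other locale rule, so treat it as an illustration of why the answer depends on locale rather than as a usable implementation.

```python
def to_upper(text: str, locale: str = "en") -> str:
    """Uppercase text, with a special case for Turkish/Azerbaijani dotted and dotless i."""
    if locale in ("tr", "az"):
        # Dotted i uppercases to dotted İ (U+0130); dotless ı (U+0131) uppercases to I.
        text = text.replace("i", "\u0130").replace("\u0131", "I")
    # Python's built-in upper() is locale-independent, so it maps i -> I everywhere.
    return text.upper()
```

Calling `to_upper("istanbul")` gives the English-style result, while `to_upper("istanbul", "tr")` keeps the dot on the capital. A function without the locale argument can only ever produce one of those two.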
scalars are another rabbit hole; basically the more I've learned about Unicode, the more strongly I've become convinced that they should not get special treatment in stdlibs (which is why they don't in Roc), because I think the vast majority of the time they look like the right thing to reach for, they're actually the wrong thing to reach for
(grapheme clusters are)
some examples of why: https://github.com/roc-lang/roc/issues/4780
there is basically no hope that anyone doing string processing in terms of Unicode scalars will implement the correct Unicode semantics for the cases mentioned in that issue
so their best hope is that those cases don't come up (e.g. certain modifiers, emojis, etc.)
whereas if you're doing string processing in terms of graphemes (with each grapheme represented as a Str) we can actually offer correct semantics by default in the stdlib! (such as in that issue)
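A small illustration of the scalar vs. grapheme difference, in Python rather than Roc. `naive_clusters` is a made-up helper that only glues combining marks onto the preceding character; real grapheme segmentation (Unicode UAX #29) also covers emoji sequences, Hangul, and more, so this is a deliberately simplified sketch.

```python
import unicodedata


def naive_clusters(text: str) -> list[str]:
    """Group each combining mark with the character before it (simplified, not UAX #29)."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)
    return clusters


# "café" written with a separate combining acute accent (U+0301)
word = "cafe\u0301"
reversed_by_scalar = word[::-1]  # the accent ends up first, detached from its 'e'
reversed_by_cluster = "".join(reversed(naive_clusters(word)))  # accent stays on the 'e'
```

The word is four user-perceived characters but five scalars, so anything that walks scalars (reverse, truncate, "first letter") can split the accent from its letter.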
Unicode is hard :sweat_smile:
but I want to create a Pit of Success around it if possible, instead of the footguns most stdlibs have!
FWIW this is exactly why I was calling it toAsciiUpper, not toUpper - with the intention that it very clearly only handles ascii.
At work a few years ago we had a meeting room named "turkish i", because of exactly the problem you describe.
yeah, my concern with offering Str.toAsciiUpper as the only one in the stdlib is that a ton of people will reach for it when they shouldn't, because it's convenient, and then Turkish users (among others) will be sad
Under what circumstances would an env var observably change during the running of a process? Windows perhaps? That cannot happen on unixy systems.
Certainly if there are other mechanisms for locale specification, then those could change at runtime, but perhaps the need to react to such changes is niche, such as for long running processes or interactive (gui/tui) apps, but it's likely quite an anti-feature for non-interactive cli apps and batch processing, in which determinism and speed are much more important
like, for example, suppose we had a Locale.toUpper: if you know your text doesn't need to handle anything but ASCII, you just always pass it the same hardcoded locale
I would favor at least Unicode case changing as the basic option, since it does the right thing for ascii (from an ascii interpretation)
That makes code that only wants to deal with ASCII needlessly more complicated, since (for example) upper/lower case can no longer operate over scalars, e.g. because of the German uppercase ß ("Double S").
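The ß case is easy to check in a language whose built-in uppercasing follows the full Unicode case mappings. This is a Python sketch (not Roc); `ascii_upper` is a hypothetical helper written here only for contrast.

```python
def ascii_upper(text: str) -> str:
    """Uppercase a-z one scalar at a time and leave everything else untouched."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)


word = "stra\u00dfe"  # "straße"

# Full Unicode uppercasing expands ß into two letters, so the string gets longer.
unicode_upper = word.upper()

# The ASCII-only version is a strict one-scalar-in, one-scalar-out mapping.
ascii_only = ascii_upper(word)
```

Here `unicode_upper` is seven scalars long while `word` is six, which is why a Unicode-aware uppercase has to be a Str -> Str function and can't be expressed as a per-scalar mapping.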
@Kevin Gillette here's an easier way to see why that path won't work: if we're compiling to wasm which might run on any of:
...which wasm bytecode instructions should we emit to determine the locale in the builtins?
locale handling is exceptionally hard, has been done poorly many, many times, and so when we tackle that, it should be considered a major initiative involving many perspectives, probably closer to a Roc 1.0 release, not something we just have one person roll up their sleeves to "solve"
I totally buy that locale is a big project :+1:
but I'm also pretty sure it's a big project that shouldn't be in the stdlib
@Joshua Warner there might actually be a simple solution here: what if we just intentionally didn't put a toAsciiUppercase in the stdlib, and instead left it for a third party package to provide?
that way the path of least resistance is not to reach for it inappropriately because it's in the stdlib
Sure, that works fine, as long as it's easy to pull in a third party package
(currently it's not!!!!)
it should be now! I just landed that last week
should be able to publish them and import them via URLs just like platforms now
Oh :sweat_smile: I guess I'm behind the times!
I haven't announced it yet bc it's not totally complete yet - there's no documentation for it, and also transitive dependencies don't work yet (e.g. packages that depend on other packages)
but we have a test in the test suite that makes use of the basic functionality now!
I think for this use case it should already work, since it should only need to depend on builtins
> yeah, my concern with offering Str.toAsciiUpper as the only one in the stdlib is that a ton of people will reach for it when they shouldn't, because it's convenient, and then Turkish users (among others) will be sad
On the other hand, if we include it in the stdlib we can include warnings in the docs and perhaps on autocompletion in the editor. I think if it wasn't in the stdlib, users would often write their own toAsciiUpper without being educated about the potential shortcomings.
We could also call it something like toUpperDangerous
We could solve that in documentation by explaining that toUpper does the right thing for ascii as well
We could also have a search function in documentation that supports synonyms or related functionality: you can search for plausible functions which don't actually exist, and it'll show you the closest function to what you want (or an explanation about why it doesn't exist in the stdlib)... A bit like the style used for compiler messages, but you get the assistance even when the compiler isn't involved.
> I think if it wasn't in the stdlib, users would often write their own toAsciiUpper without being educated about the potential shortcomings.
Personally, I think this is totally fine. If the use case does not matter to the user, it doesn't matter. They can write/import whatever library that solves the problem. It may be naive, but most users really only need the naive solution.
Sure, a user may not currently know the shortcomings, but they would learn when they run into them or if they work on internationalization for their company. I think a locale library and a lot of userland experimentation make way more sense than adding something like this to the standard library.
yeah also I think a vanishingly small percentage of people would actually read the documentation for a function like Str.toUpperCase - they'd probably just see the name and reach for it right away.
As evidence for this, there's an easter egg in the docs for Elm's function like this, and I've met very few Elm programmers who know about it:
https://package.elm-lang.org/packages/elm/core/latest/String#toUpper
> They can write/import whatever library that solves the problem.
Asking the user to write something manually or search for a library for such a common case does not feel delightful.
> yeah also I think a vanishingly small percentage of people would actually read the documentation for a function like Str.toUpperCase - they'd probably just see the name and reach for it right away
I'd bet the percentage would be a lot better for toUpperDangerous :)
> Asking the user to write something manually or search for a library for such a common case does not feel delightful.
Fair, I guess. Though it is a super simple function to write in the common case. Even in the single-language Turkish case, which is more complex, it should just be a small when expression. So I really don't think it hurts until the complex i18n cases. I don't think we should add the footguns of ignoring i18n to the language; instead, let libraries decide that individually. If we add anything, I think it should be much later, when we have a full design around locales.
Last updated: Jun 16 2026 at 16:19 UTC