I wrote up a draft of a language reference page for strings in Roc!
it's intended to be fairly comprehensive (although of course there is a tradeoff in terms of overall length), so if you think of any important string details that aren't mentioned in there, let me know
any feedback on it welcome!
You can put any expression you like inside the parentheses, as long as it doesn't contain any newlines
Why no newlines?
In string interpolation expression values
conceptually equivalent strings (like "caf\u(e9)" "caf\u(301)e")
I think that last "e" should be before the Unicode escape rather than after it
oops, fixed!
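For anyone following along, here's a small sketch of why those two spellings are only conceptually equivalent. It's in Python rather than Roc (purely so it can be run as-is), and Python's `\u` escapes and `unicodedata.normalize` are stand-ins here, not Roc API.

```python
import unicodedata

# Precomposed form: "caf" followed by U+00E9 (e with acute accent).
precomposed = "caf\u00e9"
# Decomposed form: "cafe" followed by U+0301 (combining acute accent).
# Note that the plain "e" comes BEFORE the combining mark.
decomposed = "cafe\u0301"

precomposed_bytes = precomposed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# Both render as the same text, but the UTF-8 bytes differ,
# so a byte-wise comparison reports them as different strings.
bytes_equal = precomposed_bytes == decomposed_bytes

# Normalizing both to NFC first makes them compare equal.
nfc_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(len(precomposed_bytes), len(decomposed_bytes), bytes_equal, nfc_equal)
```

The takeaway is just that equality on the raw bytes and equality "as a reader sees it" are different questions.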
Brian Carroll said:
You can put any expression you like inside the parentheses, as long as it doesn't contain any newlines
Why no newlines?
basically a combination of implementation complexity (then we'd have to track and enforce indentation level inside that arbitrary expression, to make sure it's "still inside the multiline string," which would affect parsing for all expressions) and incentives ("if you're writing a multiline expression in string interpolation, you're probably making code that's hard to read - just put it outside and give it a name")
Ooooh do you mean newlines in the source code, as opposed to escaped newlines in the expression value? Because that isn't how I read it!
ohhhhh gotcha! I'll edit that to clarify
A "length" unsigend integer
Typo here in "unsigned"
Richard Feldman said:
ohhhhh gotcha! I'll edit that to clarify
Yeah, I guess on my first reading I am not thinking this far out into edge-case land! I am thinking about strings.
I edited it - how does the current wording look?
Yeah great :+1::smiley:
Each of these three fields is the same size: 64 bits on a 64-bit system, and 32 bits on a 32-bit system. Empty strings do not have heap allocations, so an empty Str on a 64-bit system still takes up 24 bytes on the stack (due to its three 64-bit fields). The actual contents of the string are stored in one contiguous sequence of bytes, encoded as UTF-8.
I think maybe the 2nd and 3rd sentences could be swapped. The last sentence switches back to being about general strings rather than just empty ones and that didn't match my expectation.
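To make the "24 bytes on a 64-bit system" arithmetic concrete, here's a rough sketch in Python using `ctypes`. Only the "length" field is named in the quoted text; the `bytes_ptr` and `capacity` names are my assumptions for illustration, and this models the size only, not Roc's actual implementation.

```python
import ctypes


class StrLayoutSketch(ctypes.Structure):
    # Hypothetical field names: the quoted passage only says there are
    # three fields, each the size of a memory address.
    _fields_ = [
        ("bytes_ptr", ctypes.c_void_p),  # assumed: pointer to the heap bytes
        ("length", ctypes.c_size_t),     # the "length" unsigned integer
        ("capacity", ctypes.c_size_t),   # assumed name for the third field
    ]


# Size of a memory address on the machine running this sketch.
word_size = ctypes.sizeof(ctypes.c_void_p)
struct_size = ctypes.sizeof(StrLayoutSketch)

# An "empty string" in this model: no heap pointer, zero length,
# but the struct itself still occupies three words.
empty = StrLayoutSketch(None, 0, 0)

print(word_size, struct_size)
```

On a 64-bit machine the word size is 8, so the struct comes out to 3 × 8 = 24 bytes; on a 32-bit machine it would be 3 × 4 = 12.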
The reference count is stored on the heap immediately before the first byte of the string's contents, and it has the same size as a memory address.
As users of the language, do we care where it's stored? This implies we do.
I figure someone might care if they're thinking about memory cache locality and things
Fair enough.
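A quick way to picture "immediately before the first byte" is one allocation with a word-sized header in front of the contents. The sketch below models that with a Python `bytearray`; the helper names `allocate_sketch` and `read_refcount` are hypothetical, and the way the count is encoded here (a plain unsigned integer) is an assumption, not a statement about how Roc encodes it.

```python
import struct

# Size of a native size_t, which matches the address size on mainstream platforms.
WORD = struct.calcsize("N")


def allocate_sketch(contents: bytes, refcount: int = 1):
    """Model one heap allocation laid out as [refcount word][string bytes...]."""
    header = struct.pack("N", refcount)
    heap = bytearray(header + contents)
    # The string's pointer would refer to this offset, just past the header.
    contents_offset = len(header)
    return heap, contents_offset


def read_refcount(heap: bytearray, contents_offset: int) -> int:
    """Read the word stored immediately before the first content byte."""
    (value,) = struct.unpack_from("N", heap, contents_offset - WORD)
    return value


heap, offset = allocate_sketch("hello".encode("utf-8"))
print(offset, read_refcount(heap, offset), bytes(heap[offset:]))
```

Because the count and the first bytes of the string sit next to each other, touching one likely pulls the other into the same cache line, which is the locality point mentioned above.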
Great to have this doc, nice work!
This looks really good. :grinning:
Represents is mistyped twice in the performance section. I was also confused about graphemes not being part of Str, because I remember using https://www.roc-lang.org/builtins/Str#graphemes, I assume this means it will be moved, right? Other than that the document is really well written and was a joy to read :big_smile:
sorry haha, yeah the plan is to remove it from Str
:big_smile:
This is great! Very informative and friendly :grinning:
Nice read! I think it would be a very useful read for many programmers :)
I learned so much by reading this.
Maybe you could add \\ as another escape sequence.
What was not obvious to me is that this
"\nVery long string with a newline at the beginning"
|> Str.trim
|> doSomeStrManipulation
will copy the long string to newly allocated memory, even though it looks like it could manipulate the string in place.
Does Str.trim return a seamless slice if the whitespace is at the end, or does it return the original string with a smaller length?
Excellent explanation!
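Not sure about the current implementation, but it should be a seamless slice if the whitespace is at the beginning, and the same Str with a smaller length if it's only at the end.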
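Here's a tiny model of that distinction, written in Python so it can be run directly. A "view" is just a `(start, length)` pair over a shared buffer; `trim_view` is a hypothetical helper, and this mirrors the behavior described in the answer above rather than the actual Roc implementation.

```python
WHITESPACE = b" \t\n\r"


def trim_view(buf: bytes):
    """Return (start, length) of the trimmed region without copying buf."""
    start = 0
    end = len(buf)
    # Skip leading whitespace: this moves the start, so the result is a slice.
    while start < end and buf[start] in WHITESPACE:
        start += 1
    # Skip trailing whitespace: this only shrinks the length.
    while end > start and buf[end - 1] in WHITESPACE:
        end -= 1
    return start, end - start


leading = b"\nVery long string with a newline at the beginning"
trailing = b"Very long string with a newline at the end\n"

leading_view = trim_view(leading)    # start moves past the newline
trailing_view = trim_view(trailing)  # start stays at 0, length shrinks by 1

print(leading_view, trailing_view)
```

In both cases no bytes are copied: trimming at the front yields a view that starts one byte in, and trimming at the back yields the same start with a shorter length.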
Richard Feldman has marked this topic as resolved.
Last updated: Jul 06 2025 at 12:14 UTC