Stream: show and tell

Topic: ✔ language reference: strings


view this post on Zulip Richard Feldman (Dec 29 2023 at 17:50):

I wrote up a draft of a language reference page for strings in Roc!

view this post on Zulip Richard Feldman (Dec 29 2023 at 17:51):

it's intended to be fairly comprehensive (although of course there is a tradeoff in terms of overall length), so if you think of any important string details that aren't mentioned in there, let me know

view this post on Zulip Richard Feldman (Dec 29 2023 at 17:51):

any feedback on it welcome!

view this post on Zulip Brian Carroll (Dec 29 2023 at 19:52):

You can put any expression you like inside the parentheses, as long as it doesn't contain any newlines

Why no newlines?

view this post on Zulip Brian Carroll (Dec 29 2023 at 19:53):

In string interpolation expression values

view this post on Zulip Brian Carroll (Dec 29 2023 at 19:59):

conceptually equivalent strings (like "caf\u(e9)" "caf\u(301)e")

I think that last e should be before the Unicode escape rather than after it
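
For illustration, here is a minimal Python sketch (not Roc, and not taken from the draft) of why these two spellings count as "conceptually equivalent" even though they are different strings. Python writes the escapes as `\u00e9` and `\u0301` where the draft uses `\u(e9)` and `\u(301)`; the variable names are made up for this example.

```python
import unicodedata

# "é" as one precomposed code point (U+00E9)
precomposed = "caf\u00e9"
# "e" followed by a combining acute accent (U+0301); the "e" comes first
decomposed = "cafe\u0301"

# The two strings hold different code points, so a plain comparison fails.
same_code_points = precomposed == decomposed

# Their UTF-8 encodings also differ in length.
byte_lengths = (
    len(precomposed.encode("utf-8")),
    len(decomposed.encode("utf-8")),
)

# After Unicode normalization (NFC here) they compare equal.
same_after_nfc = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(same_code_points, byte_lengths, same_after_nfc)
```

The combining accent attaches to the character before it, which is why the `e` has to precede the escape.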

view this post on Zulip Richard Feldman (Dec 29 2023 at 20:04):

oops, fixed!

view this post on Zulip Richard Feldman (Dec 29 2023 at 20:05):

Brian Carroll said:

You can put any expression you like inside the parentheses, as long as it doesn't contain any newlines

Why no newlines?

basically a combination of implementation complexity (then we'd have to track and enforce indentation level inside that arbitrary expression, to make sure it's "still inside the multiline string," which would affect parsing for all expressions) and incentives ("if you're writing a multiline expression in string interpolation, you're probably making code that's hard to read - just put it outside and give it a name")

view this post on Zulip Brian Carroll (Dec 29 2023 at 20:12):

Ooooh do you mean newlines in the source code, as opposed to escaped newlines in the expression value? Because that isn't how I read it!

view this post on Zulip Richard Feldman (Dec 29 2023 at 20:13):

ohhhhh gotcha! I'll edit that to clarify

view this post on Zulip Brian Carroll (Dec 29 2023 at 20:13):

A "length" unsigend integer

Typo here in "unsigned"

view this post on Zulip Brian Carroll (Dec 29 2023 at 20:15):

Richard Feldman said:

ohhhhh gotcha! I'll edit that to clarify

Yeah, I guess on my first reading I am not thinking this far out into edge-case land! I am thinking about strings.

view this post on Zulip Richard Feldman (Dec 29 2023 at 20:17):

I edited it - how does the current wording look?

view this post on Zulip Brian Carroll (Dec 29 2023 at 20:20):

Yeah great :+1::smiley:

view this post on Zulip Brian Carroll (Dec 29 2023 at 20:22):

Each of these three fields is the same size: 64 bits on a 64-bit system, and 32 bits on a 32-bit system. Empty strings do not have heap allocations, so an empty Str on a 64-bit system still takes up 24 bytes on the stack (due to its three 64-bit fields). The actual contents of the string are stored in one contiguous sequence of bytes, encoded as UTF-8.

I think maybe the 2nd and 3rd sentences could be swapped. The last sentence switches back to being about general strings rather than just empty ones and that didn't match my expectation.
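
A rough Python sketch of the size arithmetic in that quoted paragraph, using `ctypes` to model three equal-sized fields. Only "length" is named in the draft text quoted in this thread; the `ptr` and `capacity` field names and the `Str64`/`Str32` class names are assumptions for illustration.

```python
import ctypes

class Str64(ctypes.Structure):
    # Three 64-bit fields, as on a 64-bit system.
    # "ptr" and "capacity" are assumed names, not from the quoted text.
    _fields_ = [
        ("ptr", ctypes.c_uint64),
        ("length", ctypes.c_uint64),
        ("capacity", ctypes.c_uint64),
    ]

class Str32(ctypes.Structure):
    # The same three fields at 32 bits each, as on a 32-bit system.
    _fields_ = [
        ("ptr", ctypes.c_uint32),
        ("length", ctypes.c_uint32),
        ("capacity", ctypes.c_uint32),
    ]

size_64 = ctypes.sizeof(Str64)  # 3 fields * 8 bytes
size_32 = ctypes.sizeof(Str32)  # 3 fields * 4 bytes
print(size_64, size_32)
```

The struct size is the same whether or not the string has any contents, which is why an empty Str still takes 24 bytes on a 64-bit system.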

view this post on Zulip Brian Carroll (Dec 29 2023 at 20:24):

The reference count is stored on the heap immediately before the first byte of the string's contents, and it has the same size as a memory address.

As users of the language, do we care where it's stored? This implies we do.

view this post on Zulip Richard Feldman (Dec 29 2023 at 20:25):

I figure someone might care if they're thinking about memory cache locality and things
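
To make the locality point concrete, a small Python model of that layout, assuming a 64-bit system. The helper names (`allocate`, `read_refcount`) are hypothetical, and the way the count is encoded here (a plain little-endian integer) is an assumption, not a claim about how Roc actually encodes it.

```python
import struct

WORD = 8  # size of a memory address on a 64-bit system, in bytes

def allocate(contents: str, refcount: int = 1):
    """Model a heap allocation: refcount word, then the UTF-8 contents."""
    data = contents.encode("utf-8")
    heap = bytearray(struct.pack("<Q", refcount) + data)
    contents_offset = WORD  # the string's pointer would point here
    return heap, contents_offset

def read_refcount(heap: bytearray, contents_offset: int) -> int:
    """Read the word that sits immediately before the first content byte."""
    return struct.unpack_from("<Q", heap, contents_offset - WORD)[0]

heap, offset = allocate("héllo")
print(read_refcount(heap, offset), bytes(heap[offset:]).decode("utf-8"))
```

Because the count and the first bytes of the string are adjacent, touching one is likely to bring the other into the same cache line.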

view this post on Zulip Brian Carroll (Dec 29 2023 at 21:10):

Fair enough.

view this post on Zulip Brian Carroll (Dec 29 2023 at 21:11):

Great to have this doc, nice work!

view this post on Zulip Luke Boswell (Dec 29 2023 at 21:29):

This looks really good. :grinning:

view this post on Zulip Domas Tamašauskas (Dec 29 2023 at 22:53):

"Represents" is mistyped twice in the performance section. I was also confused about graphemes not being part of Str, because I remember using https://www.roc-lang.org/builtins/Str#graphemes. I assume this means it will be moved, right? Other than that, the document is really well written and was a joy to read :big_smile:

view this post on Zulip Richard Feldman (Dec 29 2023 at 23:17):

sorry haha, yeah the plan is to remove it from Str :big_smile:

view this post on Zulip Agus Zubiaga (Dec 30 2023 at 02:36):

This is great! Very informative and friendly :grinning:

view this post on Zulip Hannes Nevalainen (Dec 30 2023 at 04:20):

Nice read! I think it would be a very useful read for many programmers :)

view this post on Zulip Oskar Hahn (Dec 30 2023 at 08:10):

I learned so much by reading this.

Maybe you could add \\ as another escape sequence.

What was not obvious to me is that this

"\nVery long string with a newline at the beginning"
|> Str.trim
|> doSomeStrManipulation

will copy the long string to newly allocated memory, even though it looks like it could manipulate the string in place.

Does Str.trim return a seamless slice if the whitespace is at the end, or does it return the original string with a smaller length?

view this post on Zulip Anton (Dec 30 2023 at 12:32):

Excellent explanation!

view this post on Zulip Brendan Hansknecht (Dec 30 2023 at 12:42):

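Not sure about the current impl, but it should be a seamless slice if whitespace is trimmed from the beginning, and the same str with a smaller length if it's only trimmed from the end.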
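
A toy Python model of that expected behavior, as a sketch only: `StrModel` and `trim` are hypothetical names, and this does not reflect Roc's actual implementation. The idea is that trimming can return a view onto the same backing bytes (a different start offset and/or a smaller length) instead of copying them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StrModel:
    buf: bytes   # shared backing bytes (stands in for the heap allocation)
    start: int   # offset of the first byte this string refers to
    length: int  # number of bytes this string refers to

    def text(self) -> str:
        return self.buf[self.start:self.start + self.length].decode("utf-8")

def trim(s: StrModel) -> StrModel:
    # The temporary slice below is only used to measure whitespace;
    # the returned model still points at the original buffer.
    view = s.buf[s.start:s.start + s.length]
    leading = len(view) - len(view.lstrip())
    new_length = len(view.strip())
    return StrModel(s.buf, s.start + leading, new_length)

data = b"\nVery long string with a newline at the beginning"
front = trim(StrModel(data, 0, len(data)))  # slice: start moves past "\n"

tail_data = b"trailing spaces  "
back = trim(StrModel(tail_data, 0, len(tail_data)))  # same start, shorter length

print(front.start, front.text())
print(back.start, back.length)
```

In both cases the result shares `buf` with the input, so no bytes are copied in this model.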

view this post on Zulip Notification Bot (Jan 25 2024 at 23:29):

Richard Feldman has marked this topic as resolved.


Last updated: Jul 06 2025 at 12:14 UTC