Stream: ideas

Topic: unprintable characters in strings


Richard Feldman (Jul 24 2024 at 22:08):

I think we should special-case this in expect if possible :thinking:

Richard Feldman (Jul 24 2024 at 22:09):

we can't do it right now, but when we start showing function calls we can special-case string equality comparisons

Richard Feldman (Jul 24 2024 at 22:09):

to report if the displayed text is the same but the bytes are different
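
For illustration only, here is a minimal sketch of that special case, written in Python rather than Roc. The function name `explain_str_mismatch` and the choice of which Unicode general categories count as "invisible" are assumptions, not anything Roc's `expect` actually does: when two strings are unequal but become equal once invisible characters are removed, it produces an extra hint naming the hidden code points.

```python
import unicodedata
from typing import Optional

# Assumption: "invisible" means general category Cc (control, except tab/newline),
# Cf (format, e.g. U+FEFF, U+200B), Cn (unassigned, incl. noncharacters) or Co (private use).
INVISIBLE_CATEGORIES = {"Cc", "Cf", "Cn", "Co"}
VISIBLE_CONTROLS = {"\n", "\t"}


def is_invisible(ch: str) -> bool:
    return ch not in VISIBLE_CONTROLS and unicodedata.category(ch) in INVISIBLE_CATEGORIES


def strip_invisible(s: str) -> str:
    return "".join(ch for ch in s if not is_invisible(ch))


def explain_str_mismatch(expected: str, actual: str) -> Optional[str]:
    """Extra hint for a failed string equality check, or None if no hint is needed."""
    if expected == actual:
        return None
    if strip_invisible(expected) != strip_invisible(actual):
        return None  # the strings differ visibly, so the normal diff already helps
    hidden = sorted({f"U+{ord(ch):04X}" for ch in expected + actual if is_invisible(ch)})
    return (
        "These strings display the same but their bytes differ; "
        "invisible characters present: " + ", ".join(hidden)
    )


print(explain_str_mismatch("FOO", "\ufeffFOO"))
# prints: These strings display the same but their bytes differ; invisible characters present: U+FEFF
```

The hint is only emitted when the visible text matches, so ordinary mismatches would keep whatever diff `expect` already shows.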

Basile Henry (Jul 25 2024 at 07:36):

I think a better solution is to not allow these non-characters in string literals at all and provide a good error message when parsing fails.
If users really want to insert them into a string, they can use a Unicode escape instead.
Now I guess the difficult part might be figuring out which characters are disallowed, and I think a good starting point (or maybe the whole solution) is this: https://github.com/dhall-lang/dhall-lang/blob/4895b7c760723583fe8c415152001f1ee5d30d58/standard/dhall.abnf#L107-L148
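
A rough sketch of what such a parse-time check could look like, again in Python purely for illustration. The rule is an assumption (it rejects Unicode noncharacters and control characters other than tab, newline and carriage return) and is not a transcription of the linked Dhall grammar; `check_literal` is a hypothetical name. Each error suggests the `\u(...)` escape form mentioned later in this thread.

```python
import unicodedata
from typing import List

ALLOWED_CONTROLS = {"\t", "\n", "\r"}  # assumed to stay legal inside literals


def is_noncharacter(cp: int) -> bool:
    """Unicode noncharacters: U+FDD0..U+FDEF plus U+xxFFFE and U+xxFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_disallowed(ch: str) -> bool:
    if is_noncharacter(ord(ch)):
        return True
    return unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS


def check_literal(body: str) -> List[str]:
    """Return one error message per disallowed character in a string literal body."""
    errors = []
    for offset, ch in enumerate(body):
        if is_disallowed(ch):
            escape = f"\\u({ord(ch):X})"
            errors.append(
                f"offset {offset}: U+{ord(ch):04X} is not allowed in a string literal; "
                f"write {escape} instead"
            )
    return errors


for error in check_literal("\uffffFOO"):
    print(error)
# prints: offset 0: U+FFFF is not allowed in a string literal; write \u(FFFF) instead
```

Keeping the predicate (`is_disallowed`) separate from the reporting would make it easy to swap in whatever character set is eventually agreed on.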

Basile Henry (Jul 25 2024 at 11:02):

If there's interest in my suggestion, I am happy to do the work associated with implementing/documenting it :blush:

Richard Feldman (Jul 25 2024 at 11:17):

I think literals are a separate consideration (but worth starting a thread in #ideas about it!)

Richard Feldman (Jul 25 2024 at 11:18):

in this case a literal happened to be used, but it's still possible to end up reading valid UTF-8 strings from different sources that give a confusing message when expect fails on their equality checks :big_smile:

Basile Henry (Jul 25 2024 at 11:22):

Right, but I think in this case where the weird UTF-8 comes from another source and not a literal in code it would be serialised to "\u(<num>)FOO" in the expect message in order to be a valid literal which would make the issue obvious.

Basile Henry (Jul 25 2024 at 11:24):

I don't think focusing on expect would solve this issue in general, though. As we can see, the small repro's expect output doesn't contain any strings, so by that point it's too late to catch the issue.

Notification Bot (Jul 25 2024 at 12:03):

9 messages were moved here from #beginners > If equality with a Str by Richard Feldman.

Richard Feldman (Jul 25 2024 at 12:06):

Basile Henry said:

Right, but I think in this case where the weird UTF-8 comes from another source and not a literal in code it would be serialised to "\u(<num>)FOO" in the expect message in order to be a valid literal which would make the issue obvious.

I see, so basically in the Inspect implementation for Str, we escape unprintable characters?
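
A minimal sketch of that idea, in Python, assuming the escape form `\u(HEX)` from Basile's message and treating the Unicode general categories Cc, Cf, Cn and Co as "unprintable" (that category list is an assumption; `inspect_str` is a hypothetical stand-in for the Str Inspect implementation, not Roc code):

```python
import unicodedata

# Assumed "unprintable" categories: control, format, unassigned (incl. noncharacters), private use.
ESCAPED_CATEGORIES = {"Cc", "Cf", "Cn", "Co"}


def inspect_str(s: str) -> str:
    """Render s as a quoted literal, escaping characters a reader could not see."""
    out = ['"']
    for ch in s:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch in ("\n", "\t"):
            out.append(ch)  # newline and tab are printed as-is
        elif unicodedata.category(ch) in ESCAPED_CATEGORIES:
            out.append(f"\\u({ord(ch):X})")  # e.g. U+FEFF becomes \u(FEFF)
        else:
            out.append(ch)
    out.append('"')
    return "".join(out)


print(inspect_str("\ufeffFOO"))  # prints "\u(FEFF)FOO"
```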

Richard Feldman (Jul 25 2024 at 12:09):

that could mess up emojis though

Richard Feldman (Jul 25 2024 at 12:10):

might need to be careful about detecting things like "this wouldn't be visible in a diff" and only escaping those
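
To make the emoji concern concrete: multi-person emoji (for example, a family emoji) are built from several code points glued together with U+200D ZERO WIDTH JOINER, whose general category is Cf, so a rule that escapes every format character would print the joiners as escapes. The Python sketch below shows that failure and one possible heuristic, which is an assumption and not a proposal from the thread: keep a joiner when both of its neighbours look like emoji components. All names are hypothetical.

```python
import unicodedata

ESCAPED_CATEGORIES = {"Cc", "Cf", "Cn", "Co"}  # assumed "invisible" categories
ZWJ = "\u200d"  # ZERO WIDTH JOINER, general category Cf


def needs_escape(ch: str) -> bool:
    return ch not in ("\n", "\t") and unicodedata.category(ch) in ESCAPED_CATEGORIES


def is_emoji_component(ch: str) -> bool:
    # Heuristic assumption: emoji pictographs are "Other Symbol" (So), skin-tone
    # modifiers are Sk, and variation selectors such as U+FE0F are Mn.
    return unicodedata.category(ch) in {"So", "Sk", "Mn"}


def escape_invisible(s: str, keep_emoji_joiners: bool = True) -> str:
    out = []
    for i, ch in enumerate(s):
        if not needs_escape(ch):
            out.append(ch)
            continue
        joins_emoji = (
            ch == ZWJ
            and 0 < i < len(s) - 1
            and is_emoji_component(s[i - 1])
            and is_emoji_component(s[i + 1])
        )
        if keep_emoji_joiners and joins_emoji:
            out.append(ch)  # probably inside an emoji sequence, so keep it unescaped
        else:
            out.append(f"\\u({ord(ch):X})")
    return "".join(out)


# Man, ZWJ, woman, ZWJ, girl: renders as a single family emoji.
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"
naive = escape_invisible(family, keep_emoji_joiners=False)  # joiners become \u(200D)
careful = escape_invisible(family)  # left unchanged
```

A more robust version would probably segment the string into grapheme clusters (Unicode UAX #29) and only escape clusters that render as nothing; this sketch does not attempt that.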

Basile Henry (Jul 25 2024 at 12:22):

I imagine the logic is that any string character that can be represented without an escape should be. This seems to be the logic for "\n" and "\t" at the moment.
A quick test with the online REPL does show that "hello\nworld" gets shown back to the user with an actual newline. This seems fine, since the user could have used a multiline string literal as well.

Basile Henry (Jul 25 2024 at 12:25):

It looks like this breaks down for splice escapes, where the REPL returns an invalid string literal:

» "splice escape: \$here"

"splice escape: $here" : Str

Basile Henry (Jul 25 2024 at 12:27):

If I copy and paste what looks like valid code, it fails. I presume this is because "here" is not in scope :thinking:

Richard Feldman (Jul 25 2024 at 12:27):

that should be valid

Richard Feldman (Jul 25 2024 at 12:27):

$(here) would be interpolation

Richard Feldman (Jul 25 2024 at 12:27):

but only if there are parens

Basile Henry (Jul 25 2024 at 12:28):

Oh :sweat_smile: then it crashed for other reasons

Richard Feldman (Jul 25 2024 at 12:28):

can you open an issue for that? :big_smile:

Basile Henry (Jul 25 2024 at 12:32):

So I guess the real issue I meant to share, the one relevant to this conversation, is this:

» "splice \$(here)"

"splice $(here)" : Str

My assumption is that what the REPL prints is supposed to be valid code that I could copy and paste back in. Or is that a faulty assumption? In this case it is valid code, but with a different meaning (splicing in a variable).

Richard Feldman (Jul 25 2024 at 12:32):

your assumption is correct!

Richard Feldman (Jul 25 2024 at 12:33):

it should insert the backslash but doesn't know to

Richard Feldman (Jul 25 2024 at 12:33):

we never implemented that case when introducing the $ interpolation syntax
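
The missing case amounts to escaping `$` whenever it is immediately followed by `(`, since (as Richard notes above) only `$(` starts interpolation. A Python sketch of such a printer, with hypothetical names; it handles only `"`, `\` and `$(`, not the invisible-character escapes discussed earlier in the thread. It also includes a check that the printed literal no longer contains an unescaped interpolation:

```python
def escape_str_literal(s: str) -> str:
    """Quote s so that pasting the result back in yields the same string (sketch)."""
    out = ['"']
    for i, ch in enumerate(s):
        if ch in ('"', "\\"):
            out.append("\\" + ch)
        elif ch == "$" and s[i + 1 : i + 2] == "(":
            out.append("\\$")  # "$(" would start interpolation, so write "\$(" instead
        else:
            out.append(ch)  # a "$" not followed by "(" is already a plain character
    out.append('"')
    return "".join(out)


def has_unescaped_interpolation(literal: str) -> bool:
    """True if a quoted literal contains "$(" that is not escaped with a backslash."""
    i = 0
    while i < len(literal):
        if literal[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif literal.startswith("$(", i):
            return True
        else:
            i += 1
    return False


print(escape_str_literal("splice $(here)"))  # prints "splice \$(here)"
```

Printing `"splice escape: $here"` unchanged matches Richard's point that a bare `$` needs no escape.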

Basile Henry (Jul 25 2024 at 12:36):

Alright, I'll open an issue for this (escaped splice printing) as well and look into the printing logic.

Whether to restrict what counts as a valid string literal any further is a separate question, but I think knowing that strings would at least get printed consistently would be a good starting point.

Luke Boswell (Jul 25 2024 at 22:28):

@Basile Henry do you think we still need this issue? https://github.com/roc-lang/roc/issues/6919

I'm thinking of closing it as no longer required, now that we know the root cause.

Basile Henry (Jul 25 2024 at 22:30):

Closing makes sense :+1: I might reference it if/when I propose restricting what is allowed in a string literal :blush:


Last updated: Jun 16 2026 at 16:19 UTC