Stream: ideas

Topic: Str.displayUtf8


view this post on Zulip Richard Feldman (Aug 06 2024 at 17:08):

Should we have this?

Str.displayUtf8 : List U8 -> Str

...where if it encounters any invalid UTF-8 sequences, it replaces them with the Unicode Replacement Character?
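For readers who want to see the proposed behavior concretely, here is a minimal sketch in Rust, since Roc has no such function yet. `display_utf8` is a hypothetical name that mirrors the proposal; it wraps the standard library's `String::from_utf8_lossy`, which is assumed here to be a reasonable analogue of what `Str.displayUtf8` would do.

```rust
/// Hypothetical stand-in for the proposed `Str.displayUtf8 : List U8 -> Str`.
/// Valid UTF-8 passes through unchanged; each invalid sequence becomes
/// U+FFFD, the Unicode Replacement Character.
fn display_utf8(bytes: &[u8]) -> String {
    // `from_utf8_lossy` returns a `Cow<str>`; `into_owned` always yields a `String`.
    String::from_utf8_lossy(bytes).into_owned()
}
```

With this sketch, `display_utf8(b"hello")` gives back `"hello"`, while `display_utf8(&[b'a', 0xFF, b'b'])` gives `"a\u{FFFD}b"`, because `0xFF` can never appear in valid UTF-8.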

view this post on Zulip Richard Feldman (Aug 06 2024 at 17:08):

it's needed for the Path API, but not sure if it would be helpful elsewhere too :thinking:

view this post on Zulip Sam Mohr (Aug 06 2024 at 17:18):

Maybe Str.sanitizeUtf8?

view this post on Zulip Sam Mohr (Aug 06 2024 at 17:22):

displayUtf8 feels like it's just joining the utf-8 together

view this post on Zulip Anton (Aug 06 2024 at 17:30):

Not sure about the name, but I like the function :)

view this post on Zulip Isaac Van Doren (Aug 06 2024 at 18:24):

This would be great

view this post on Zulip Richard Feldman (Aug 06 2024 at 18:52):

Rust sometimes calls this fromUtf8Lossy but I don't like that because it sounds like it's always going to lose something (like lossy JPEG compression) but that's not really what's happening here

view this post on Zulip Brendan Hansknecht (Aug 06 2024 at 23:34):

It does lose something

view this post on Zulip Brendan Hansknecht (Aug 06 2024 at 23:34):

It loses any character that isn't valid utf8

view this post on Zulip Brendan Hansknecht (Aug 06 2024 at 23:35):

So if you go back to utf8 from that string, you will have lost information, and instead have the utf8 replacement character
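The round trip described here can be sketched in Rust, again using `String::from_utf8_lossy` as an assumed analogue of the proposed function. `round_trip` is a hypothetical helper written only for this illustration.

```rust
/// Hypothetical helper: decode bytes with replacement, then re-encode as UTF-8.
/// If the input contained invalid sequences, the output differs from the input.
fn round_trip(bytes: &[u8]) -> Vec<u8> {
    String::from_utf8_lossy(bytes).into_owned().into_bytes()
}
```

For example, `round_trip(&[b'a', 0xFF, b'b'])` yields `[b'a', 0xEF, 0xBF, 0xBD, b'b']`: the single invalid byte comes back as the three-byte encoding of U+FFFD. Distinct invalid inputs such as `[0xFF]` and `[0xFE]` both map to the same output, so the original bytes cannot be recovered.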

view this post on Zulip Brendan Hansknecht (Aug 06 2024 at 23:36):

I definitely think this API should be fromUtf8Something to make it discoverable.

view this post on Zulip Brendan Hansknecht (Aug 06 2024 at 23:38):

You could make it fromUtf8WithReplacement if you want to be really explicit.

view this post on Zulip Richard Feldman (Aug 06 2024 at 23:49):

Brendan Hansknecht said:

It does lose something
It loses any character that isn't valid utf8

yeah I just think of "lossy" meaning "you can expect it to lose something" whereas this is more "in very rare edge cases it can lose something"
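The distinction being drawn, that loss only happens when the input is actually invalid, is visible in Rust's return type. This is a sketch, not a proposal for Roc: `lost_anything` is a hypothetical helper, and it relies on `String::from_utf8_lossy` returning `Cow::Borrowed` for valid input and `Cow::Owned` only when a replacement was made.

```rust
use std::borrow::Cow;

/// Hypothetical helper: reports whether lossy decoding had to replace anything.
/// `Cow::Borrowed` means the bytes were already valid UTF-8 and are returned
/// as-is; `Cow::Owned` means at least one invalid sequence was replaced.
fn lost_anything(bytes: &[u8]) -> bool {
    matches!(String::from_utf8_lossy(bytes), Cow::Owned(_))
}
```

Under these assumptions, `lost_anything("héllo".as_bytes())` is `false`, while `lost_anything(&[0xC3])` is `true`, since a lone `0xC3` is a truncated two-byte sequence.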

view this post on Zulip Luke Boswell (Aug 06 2024 at 23:54):

Str.fromBytesReplaceInvalidUtf8


Last updated: Jun 16 2026 at 16:19 UTC