Stream: ideas

Topic: CStr or Non Refcounted Str type?


view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 01:35):

What are people's thoughts around Roc having a Str type that is always assumed to be truly immutable and without a reference count? My main thought for the use case is host interactions. I feel like it may be relatively common for a host to have a pointer and length pair that represents a string. With current Roc, the host cannot pass that to Roc without incurring a copy. The host must make a copy in order to add the refcount to the beginning of the Str.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 01:35):

This type would avoid that copy and enable Roc to use the Str. It would work with regular string APIs, but would always be treated as constant. Thus it would always be copied on modification.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 01:36):

Note: I think this feature has significant potential overlap with seamless slices.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 01:36):

Though seamless slices still theoretically require a refcount.

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 08:21):

Could you elaborate on why it is not possible to pass a unique string to Roc today?

view this post on Zulip Brian Carroll (Jul 24 2022 at 09:15):

There's no way to guarantee it stays unique. And if any Roc code copies it or drops it, it will try to modify the reference count, which doesn't exist. So that's memory unsafe.

view this post on Zulip Brian Carroll (Jul 24 2022 at 09:17):

You could insert a reference count of "infinity" to make Roc treat it as a constant and not modify the ref count. But the ref count is in front of the character bytes so you have to copy them anyway.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:01):

:point_up: this is the big issue

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:01):

Refcount at beginning and needing to set it means copying the entire string anyway.
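
To make the copy concrete, here is a minimal sketch of the layout being described, assuming a single `usize` refcount stored directly in front of the string bytes in one allocation. The function names `copy_into_refcounted` and `string_bytes` are hypothetical and exist only for this illustration; they are not part of Roc or `roc_std`.

```rust
use std::mem::size_of;

/// Hypothetical layout for illustration: one usize refcount stored
/// immediately in front of the string bytes, in the same allocation.
fn copy_into_refcounted(host_bytes: &[u8], refcount: usize) -> Vec<u8> {
    let mut buf = Vec::with_capacity(size_of::<usize>() + host_bytes.len());
    // The header has to come first, so the host's existing buffer
    // cannot be reused unless it already has room in front of it.
    buf.extend_from_slice(&refcount.to_ne_bytes());
    // This is the copy the thread is trying to avoid.
    buf.extend_from_slice(host_bytes);
    buf
}

/// The string bytes start just past the refcount header.
fn string_bytes(buf: &[u8]) -> &[u8] {
    &buf[size_of::<usize>()..]
}
```

Because the header and the bytes must be contiguous, a host that only holds a pointer and a length has nowhere to write the refcount without allocating a new buffer.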

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 15:02):

Ah! Now it makes sense!

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 15:03):

In that case, I very much agree that a special way to indicate 'this is a refcountless string constant' that does not need this extra copy would be a good idea

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:03):

As a small aside, you can choose to give roc ownership of the string and set the refcount to 1, but that incurs the copy mentioned above

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 15:03):

:thinking: If we can somehow express 'constantness' generally for functions moving between Roc and the platform, then that might be interesting in general

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 15:04):

It is more or less the opposite of uniqueness

view this post on Zulip Brian Carroll (Jul 24 2022 at 15:05):

Well then you have to make sure the host cleans it up too, or else limits the number of such values created. Because it's a leak.

view this post on Zulip Brian Carroll (Jul 24 2022 at 15:05):

But that might work in some cases

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 15:05):

True, it would be a 'view' into something that is fully host-owned

view this post on Zulip Brian Carroll (Jul 24 2022 at 15:06):

And you have no way to know if a Roc structure is referring to it, since we froze the ref count

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 15:07):

:thinking: @Brendan Hansknecht Are there use cases for a type like 'CStr' which would not be covered by e.g. string slices by the way? You already mention that there is a lot of overlap, but maybe they overlap fully?

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:07):

String slices still refer to a refcount

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:07):

So they don't get around the issues

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:07):

At least as currently designed

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:07):

Cause they may need to cleanup the string they are a slice of

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 15:07):

Ah, of course

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:08):

Though I guess you could use them as a hack around the problem even as currently designed

view this post on Zulip Brian Carroll (Jul 24 2022 at 15:08):

If there's no ref count then how can the host know when it's safe to free?

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:08):

Only by context

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:09):

Roc is pure and can't hold a reference to something

view this post on Zulip Brian Carroll (Jul 24 2022 at 15:09):

Right so in a Roc callback called from the host or something

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:09):

Yeah

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:10):

I think it would have to be scoped similar to how we want to scope arena allocation for roc

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:10):

Limited to a request or something similar

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:10):

That or truly constant

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:12):

But yeah if the host received back a closure, a box of some type, or a type where it didn't understand every field, it would not be able to tell if the cstr is still being referenced in roc.

view this post on Zulip Brian Carroll (Jul 24 2022 at 15:13):

Yeah exactly, it's risky stuff. "Safe as long as you're careful"

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 15:13):

Another thing to think about, is that all types currently have the refcount at the start, but if it makes a lot of sense to vary this for e.g. strings, arrays and other dynamically-sized types, then maybe we can special-case them.

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 15:14):

Although I guess that if you have a reference to part of a larger string constant then you still need to reallocate to put the refcount at the end as well :thinking:

view this post on Zulip Brian Carroll (Jul 24 2022 at 15:15):

I get that idea in the abstract but ref counting is complicated enough already!

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 15:16):

You're not wrong

view this post on Zulip Brian Carroll (Jul 24 2022 at 15:17):

It's a nice simplification that it's always in the same place independent of type. And that matters for speed too.

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 15:18):

Definitely!

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 15:19):

Is there currently a difference in Roc between a compiled string constant and a string built at runtime by the way? (Besides small-string optimization)

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:20):

The types you listed are basically all the refcounted types besides box of x. So we would probably change all refcounts instead of just those?

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:20):

And a constant string should just point to a location in the binary and have an infinite refcount.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 15:21):

Well runtime will point to the heap and generally not have an infinite refcount.

view this post on Zulip Brian Carroll (Jul 24 2022 at 15:25):

If a list has ref count at the end, does that cause problems when you grow it?

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 15:26):

Putting the refcount at the end is a bad idea in hindsight. It means that all operations that modify it need to move it and all checks for the refcount need to read the size as well

view this post on Zulip Brian Carroll (Jul 24 2022 at 15:34):

The big issue for me is one we mentioned earlier. Roc can return a struct containing multiple references to the string and the host doesn't always fully know that structure (because it wants to let the user define an app state type). So the use cases need to be limited in some way. And it seems easy to get wrong.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 16:09):

I mean the host already has other potential memory safety issues and that is part of creating a host. We definitely should make it as safe as possible, but there may be limits, especially when it comes to performance related issues. This should minimally affect regular Roc programmers (though hosts can have bugs).

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 16:10):

What prompted this for me was thinking about the web server use case. Where the host may have access to the body as a list of bytes. You really don't want to copy that on every web request if you don't have to.

view this post on Zulip Richard Feldman (Jul 24 2022 at 16:45):

one potentially interesting idea: special-case a capacity of 0 to mean "this is not refcounted and also is not eligible for in-place mutation" and then use it for both this use case as well as (for example) static heap allocations in the binary, like string literals

view this post on Zulip Richard Feldman (Jul 24 2022 at 16:45):

that way for those we can tell they're statically allocated (and don't need refcounting) without having to chase a pointer

view this post on Zulip Richard Feldman (Jul 24 2022 at 16:46):

also saves an extra usize worth of bytes per static allocation

view this post on Zulip Richard Feldman (Jul 24 2022 at 16:46):

(in the binary)
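
A rough sketch of the "capacity of 0" idea, assuming a three-field stack representation of pointer, length, and capacity. The struct name `StrHeader`, its field names, and its methods are assumptions made for illustration, not Roc's actual layout, and `'static` is used only to keep the example in safe Rust.

```rust
/// Hypothetical stack representation: pointer, length, capacity.
/// The zero-capacity convention follows the proposal in the thread.
#[derive(Clone, Copy, Debug)]
struct StrHeader {
    ptr: *const u8,
    len: usize,
    capacity: usize,
}

impl StrHeader {
    /// Wrap static (or host-owned) bytes without copying them.
    fn from_borrowed(bytes: &'static [u8]) -> Self {
        StrHeader { ptr: bytes.as_ptr(), len: bytes.len(), capacity: 0 }
    }

    /// Decided from the stack value alone; no pointer is dereferenced.
    fn is_refcounted(&self) -> bool {
        self.capacity != 0
    }

    /// In-place mutation is only allowed for refcounted, unique values.
    fn may_mutate_in_place(&self, refcount_is_one: bool) -> bool {
        self.is_refcounted() && refcount_is_one
    }
}
```

Under this assumption a string literal in the binary needs no refcount word next to it, which is where the "extra usize per static allocation" saving would come from.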

view this post on Zulip Brian Carroll (Jul 24 2022 at 16:53):

The case I was imagining was one where there's long term state like in the Elm architecture. App returns a type that's opaque to the host, which could contain references to these "host strings", so they have to be kept alive forever.
That wouldn't apply in the web server case. You don't have in-memory state there that outlives a request. But it might apply to GUIs.

view this post on Zulip Brian Carroll (Jul 24 2022 at 16:55):

If it's ok to keep it alive forever then great. If it's not ok to keep it forever then the zero capacity trick doesn't help.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 17:01):

Maybe we don't let the host str type be returned to the host? So make it impossible to outlive request?

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 17:01):

Must copy to return to the host

view this post on Zulip Brian Carroll (Jul 24 2022 at 17:20):

Hmm but how to enforce or check that? If the type is opaque, which it has to be to allow the app to store its own relevant state, then it can contain anything.

view this post on Zulip Brian Carroll (Jul 24 2022 at 17:21):

So I mean the host can't make assumptions about it

view this post on Zulip Brian Carroll (Jul 24 2022 at 17:22):

I think that kind of host just can't use this technique.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 18:22):

I mean that we make the Roc compiler enforce that.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 18:23):

It could do this in the background theoretically, but would maybe be better explicitly.

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 20:45):

Richard Feldman said:

one potentially interesting idea: special-case a capacity of 0 to mean "this is not refcounted and also is not eligible for in-place mutation" and then use it for both this use case as well as (for example) static heap allocations in the binary, like string literals

If I understand the use case correctly, the idea is to be able to pass a 'view' into an existing external string buffer. So this means that adding any refcount or other data surrounding the string is impossible (without first copying it, which is the thing we want to avoid).

view this post on Zulip Richard Feldman (Jul 24 2022 at 20:54):

adding a refcount would be possible - you could Box it

view this post on Zulip Richard Feldman (Jul 24 2022 at 20:55):

so you'd need a separate heap allocation, but I don't see how that would be avoidable :big_smile:

view this post on Zulip Richard Feldman (Jul 24 2022 at 20:55):

basically what Rust's Rc and Arc do

view this post on Zulip Ayaz Hafiz (Jul 24 2022 at 21:12):

I feel like it may be relatively common for a host to have a pointer and length pair that represents a string. With current Roc, the host cannot pass that to Roc without incurring a copy.

Are there applications of the host where a copy of a pointer/length would incur too much latency, relative to what the Roc code is doing? IIRC today you can memcpy a C-style string into a RocStr buffer. but maybe too slow for some use cases?

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 21:15):

There is at least --unless I'm missing something-- a very real disadvantage to adding an extra possible runtime representation for Str (and maybe List x), namely that all functions manipulating them need to check whether they are working on the refcounted or the non-refcounted one. So there are many new branches introduced in common operations

view this post on Zulip Ayaz Hafiz (Jul 24 2022 at 21:18):

If I understand correctly, the non-RC'd Str would have to be separately monomorphized as that in each place it's used, since there would be no flag to tell whether it's reference counted or not, right? Otherwise you wouldn't be able to construct it from a ptr/length without a copy anyway. If that's the case there wouldn't be a runtime cost (other than instruction cache stuff maybe), though there would be a cost to compile time and compiler complexity

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 21:20):

Yeah, if it just is a plain separate type then that would solve some things (but it would mean that either not all of the Str / List API would be available, or the implementation code would need to be copied)

view this post on Zulip Ayaz Hafiz (Jul 24 2022 at 21:21):

yes it would need to be copied

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 21:23):

On the other hand, if it is its own special type, let's name it 'StrView', then we could make Roc programmers more conscious about when it would be copied or not. I.e. only make stuff that does not require copying available, as well as a toStr function.

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 21:24):

It makes me think of C++17-and-newer's std::string_view.

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 21:24):

And I guess it also is very similar to Rust's str but looking at that directly is a bit confusing since you only really can use it as &str.

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:52):

There is at least --unless I'm missing something-- a very real disadvantage to adding an extra possible runtime representation for Str (and maybe List x), namely that all functions manipulating them need to check whether they are working on the refcounted or the non-refcounted one. So there are many new branches introduced in common operations

so in both cases (let's just assume List because it's simpler than Str) you have:

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:53):

the check for "working on refcounted or non-refcounted one" would only occur when incrementing or decrementing the refcount

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:54):

we already have a branching conditional in that logic: check for a refcount of "this is stored in the readonly section, so do not write to it or else you'll segfault!"

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:54):

so instead we would move that branch to before we dereference the refcount

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:55):

so it would be the same number of branches, just in a different place

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:55):

that said, there's another case to consider: what happens when refcount overflows?

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:55):

this can theoretically happen, although it's of course extremely unlikely

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 21:55):

Is the refcount not a saturating add?

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:56):

yes, but the add is not the problem

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:56):

the problem is that once we've overflowed, we need to disable decrementing

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:56):

otherwise we could theoretically end up with a use-after-free

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 21:56):

But if the add is saturating, how can it overflow?

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:57):

I mean once it saturates

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:57):

once we've incremented all the way to the maximum number we can store, we have to never decrement it again

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:57):

because the refcount is no longer necessarily accurate

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:57):

there might actually be more references, but they weren't captured because when they incremented the refcount, it was silently lost in the saturating add

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:58):

so once it saturates, we have to leak

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 21:58):

Once you fill a full 64-bit (32-bit on WASM) number, you have the choice between a use-after-free and a memory leak. I believe we currently say that the refcount is 'infinite' in this situation and choose the memory leak option

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:58):

or a panic

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:58):

right, correct

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:58):

but the way to implement the leak is to have decrement always check to see if the refcount is the saturated number

view this post on Zulip Richard Feldman (Jul 24 2022 at 21:58):

and if it is, then don't actually decrement
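
The saturate-then-leak behaviour described here can be sketched as follows. This models the refcount as a plain `usize` and uses `usize::MAX` as the sentinel; the names `SATURATED`, `increment`, and `decrement` are illustrative and not taken from the Roc code base.

```rust
/// Sentinel meaning "saturated (or read-only): never decrement".
const SATURATED: usize = usize::MAX;

/// Incrementing never wraps; extra references past the maximum are
/// silently lost, which is why the count can no longer be trusted.
fn increment(rc: usize) -> usize {
    rc.saturating_add(1)
}

/// Returns the new count. A saturated count is left alone, so the
/// allocation is leaked rather than risking a use-after-free.
/// Assumes rc >= 1 for any live value.
fn decrement(rc: usize) -> usize {
    if rc == SATURATED { rc } else { rc - 1 }
}
```

The check in `decrement` is the conditional being discussed: once the count has reached the maximum it is sticky, so the value is never freed.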

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 21:59):

Yep. What is the problem there?

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:00):

so today, we use the saturated reference count to be the refcount number that indicates "this is stored in the readonly section of memory"

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:00):

hm, I was thinking this might net us an extra conditional, but actually in retrospect I don't think so

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:00):

in fact I think that case actually gets better, because this can become a non-branching conditional

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:01):

since we can just always write the new refcount back, even if the refcount is unchanged

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:01):

(whereas before we couldn't)

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:01):

and then decrementing or leaking can just be a cmov
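
One way to picture the "always write the refcount back" idea, assuming the read-only case has already been ruled out by a capacity check on the stack. The function `decrement_in_place` and its signature are hypothetical; whether a compiler actually emits a `cmov` for this is not guaranteed.

```rust
/// Sketch: `capacity` is the stack field, `refcount` is the heap slot.
/// For capacity 0 the slot is never read or written, so it may live in
/// read-only memory (or, in the proposal, not exist at all).
/// Assumes the stored count is >= 1 for any live refcounted value.
fn decrement_in_place(capacity: usize, refcount: &mut usize) {
    if capacity == 0 {
        return; // not refcounted: the only branch, taken before any load
    }
    let rc = *refcount;
    // Branch-free select: a saturated count gets its 1 added back,
    // every other count simply goes down by one.
    let keep = (rc == usize::MAX) as usize;
    *refcount = rc - 1 + keep; // unconditional write is safe on the heap
}
```

Since every refcounted value lives in writable memory under this scheme, the store can be unconditional, and only the capacity test remains as a real branch.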

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:01):

ok, never mind - I think this is...actually maybe a net performance win?

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:02):

compared to status quo

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 22:02):

I'm not completely following. What is a better situation compared to what?

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:02):

thinking through it, I tentatively believe that "capacity is 0 means it isn't refcounted" is actually a more performant design than our current design

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:03):

like if we switched over static strings and such to use that design, I think things would run very slightly faster and use a bit less memory

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 22:03):

It might be the case, although 1111111...111 is also a bit pattern for which there often are special instructions (e.g. non-branching conditional changes) available since '-1' is a common thing to keep track of

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:03):

and then as a bonus, host authors could send preallocated memory into Roc as a List or Str

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:04):

same is true of 0 though

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 22:05):

Yeah, I meant 'there probably is no difference in efficiency on the assembly layer'.

But being able to use it with e.g. calloc might be a good reason to pick 0

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:06):

ah, interesting

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 22:06):

It still does not address @Brendan Hansknecht 's original problem of wanting 'zero copy' read-only strings however

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:07):

:thinking: why wouldn't it?

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:08):

Brendan Hansknecht said:

What are people's thoughts around Roc having a Str type that is always assumed to be truly immutable and without a reference count?

that's exactly what the "capacity 0 means no refcount" design would be :smiley:

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 22:08):

:face_palm: And here I thought it was "refcount 0 means no refcount"

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:08):

ah!

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:08):

yeah sorry

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:09):

the key point is that capacity is stored on the stack :big_smile:

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 22:10):

:+1: So you just create a {pointer to start, length in bytes, capacity of 0} thing and that gets put on the stack?

view this post on Zulip Qqwy / Marten (Jul 24 2022 at 22:10):

Yes, that would be a nice way to solve it I think!

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:16):

yeah exactly!

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:17):

the performance win comes because we already need to do a branching conditional to see if it's safe to write the new refcount, and we'd just move that branching conditional to check the capacity on the stack instead of the refcount on the heap

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:17):

and then the heap logic stays the same in both cases

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:21):

So the refcount logic will be: check if capacity is positive, negative, or 0. If negative, this is a seamless slice. If positive, this is a regular list on the heap, so follow the current refcount logic (do we still do a saturating add, or do we check for max and set the capacity to 0?). If zero, do nothing.
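
The three-way check can be sketched like this, treating the capacity field as a signed integer. The enum `Storage` and the function `classify` are names invented for this example.

```rust
/// What the sign of the capacity field would mean under the proposal.
#[derive(Debug, PartialEq)]
enum Storage {
    /// Negative capacity: a seamless slice into another allocation.
    SeamlessSlice,
    /// Positive capacity: a regular heap value with a refcount.
    HeapList,
    /// Zero capacity: static or host-owned, no refcount to touch.
    NotRefcounted,
}

fn classify(capacity: isize) -> Storage {
    match capacity {
        c if c < 0 => Storage::SeamlessSlice,
        0 => Storage::NotRefcounted,
        _ => Storage::HeapList,
    }
}
```

Only the `HeapList` and `SeamlessSlice` cases would go on to load a refcount; the `NotRefcounted` case returns without touching memory.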

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:22):

Note, this still enables @Brian Carroll's concerns. The app could keep reference to a host string and then the host could free it leading to a use after free error. This can only happen if the host has a bug and Roc returns a type that captures the host data.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:24):

I think the main advantage of a totally separate type is that it makes it easy to see when you might be enabling bugs by returning something to the host that you really want to copy first.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:27):

Oh, also, with setting the capacity to zero after saturating the refcount, you still need all of the same checks and saturating refcount changes because you are only setting your local copy to zero. All other copies will not have a zero capacity and will still check the refcount. So I think this is strictly more work in the common case. It is only less work for totally constant strings and leaked strings (but this will probably almost never happen due to refcount size).

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:27):

I think long-term the ratio of application authors to platform authors will be over 100:1, so I think the bar should be super high for features of the form "makes things nicer for platform authors at the expense of making things less nice for application authors" :big_smile:

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:27):

For sure. But I think the explicit type could be better for both

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:28):

hm, how would it be better for application authors? :thinking:

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:28):

it seems like the main selling point is that it makes it easier for host authors to tell the types apart, yeah?

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:28):

(which, incidentally, they can do with an opaque type wrapper themselves in userspace!)

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:29):

The app could keep reference to a host string and then the host could free it leading to a use after free error.

actually, this should probably never happen - in that host authors always need to check for "is it safe for me to free this?" for any string that came from the app

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:29):

Think of it like this, if an app crashes or starts acting wrong due to the use after free, the end user has no easy way to tell what is going wrong. They have no indication that it is related to the Str the host passed into them.

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:30):

because they can't know whether it came from readonly memory, for example

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:30):

so whatever the indicator is (capacity, refcount, etc.) they always need to check at runtime

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:33):

and I think things like roc_std should make this easy to do

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:33):

Think of this host app. It is a web server with an opaque model.
Host gets a web request, passes a "StrView" into the roc app of the request body.
The Roc app then updates its model and returns that to the host.
The web request is now over and the host dumps the data that was backing the "StrView"
It turns out that the roc app held a copy of the "StrView" in its opaque data type.
Next call to the Roc app, it references the "StrView" that it kept around longer than the host expected and the program crashes.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:34):

The host author never thought that a Roc app would keep a reference to the "StrView"
they just expect them to use it to update the model

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:34):

ok, but how does having a separate StrView type help prevent this from happening? :big_smile:

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:34):

The Roc app author has no idea why the app is crashing. They just stored a Str in their model.

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:35):

hm, so is the idea that we add StrView to the language with the expectation that Roc app authors are expected to be distrustful of it? :thinking:

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:36):

like the selling point to app authors is that if their app segfaults, it might be because they used a StrView?

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:37):

I think it has a few possibilities to help:

  1. It doesn't read as Str, so the end user would know it was special somehow. Could just be common knowledge/word of mouth/doc based defense. Not great, but better
  2. We could simply force that types with StrView in them can not be returned to the host. If you are returning it to the host, it must be a Str. Now we will always add the copy if we want to keep the reference.
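
The second option can be approximated in Rust, where lifetimes play the role of the rule the Roc compiler would have to enforce. This is only an analogy under that assumption: `StrView`, `Model`, `handle_request`, and `to_str` are hypothetical names, and nothing here reflects how Roc would implement the check.

```rust
/// Hypothetical view type: borrows host bytes for one request only.
struct StrView<'req> {
    bytes: &'req str,
}

impl<'req> StrView<'req> {
    /// Read-only operations work directly on the borrowed bytes.
    fn len(&self) -> usize {
        self.bytes.len()
    }

    /// The only way to keep the data past the request: an explicit copy.
    fn to_str(&self) -> String {
        self.bytes.to_owned()
    }
}

/// App state handed back to the host. It owns its strings, so it
/// cannot contain a view into a buffer the host is about to free.
struct Model {
    last_body: String,
}

fn handle_request(_previous: Model, body: StrView<'_>) -> Model {
    Model { last_body: body.to_str() }
}
```

Because `Model` has no lifetime parameter, storing the view itself would not type-check; the copy becomes visible in the code at exactly the point where the data escapes the request.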

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:37):

I would much rather authors are careful around using StrView than them feeling like Roc is a broken language cause Str passed in from the host can cause segfaults.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:39):

Of course, there could be cases where it is valid to keep a reference to a StrView for a number of calls, but I think this is a case where we could easily enforce safety and that would trump performance (just too niche of a use case).

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:40):

to be totally honest, I think if application segfaults turn out to be such a big problem in practice that it's beneficial for application authors to get assistance in debugging and avoiding them, then I'd probably conclude that the whole "platforms and applications" design turned out to be a failed experiment :sweat_smile:

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:40):

to me, the whole premise is that platform authors are going to be able to reliably avoid segfaults, using tools like bindgen and roc_std

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:40):

and if it turns out they're unable to provide a segfault-free experience, I really don't think this language is gonna make it

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:41):

I totally agree, but some edge cases are complex.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:41):

I think this will be a common performance issue.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:41):

Passing lists and strings to roc will be super common

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:41):

Always incurring a copy when you can't start with a Roc data type is not great.

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:42):

there doesn't have to be a copy though - you can Box it

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:42):

and then check the refcount on the box if you get it back to determine if it's safe to free

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:42):

What is to stop Roc from trying to access the refcount of the Str?

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:43):

as in, the application?

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:43):

hostStr = Box.unbox boxedHostStr
# hostStr is still a broken Str without a refcount

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:43):

oh

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:43):

well I mean the platform author and the host author are presumably the same person or organization haha

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:44):

they have complete control over both code bases

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:44):

But how do you use a Box Str in Roc without eventually unboxing and accessing the nonexistent refcount?

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:45):

yeah good point

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:45):

well

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:45):

I guess it depends on what you're doing with it

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:46):

but yeah if you offer a way for the application to get at the underlying Str without copying it, that could definitely cause problems

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:46):

Yeah, so I don't think it solves the problem. The goal is to avoid that copy.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:47):

One hack that is safe is to store the refcount out of band (in rare cases it could theoretically fail).

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:47):

well then there has to be a separate pointer to it on the stack, right?

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:47):

and Roc needs to know about it, in order to update it automatically

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:49):

Basically, you pass the string in as a seamless slice, but you set the capacity to point to a refcount that is in some random location that was allocated separately. Not great for caching, but should solve the problem assuming you can always build said slice.

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:49):

oh yeah, I thought about that hack at one point haha

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:50):

although I was thinking about it as a way to force things to be immutable

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:50):

point it to a maxed-out refcount

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:50):

that's clever!

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:50):

Actually, if we treat a negative capacity as a pointer instead of an offset, it should be completely safe.

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:50):

so you'd point it to like a threadlocal or something

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:50):

Yeah, or just usize on the stack

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:50):

*allocated in the heap

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:51):

right - so one trick there that I was trying to figure out is if you can be certain the allocated refcount (whether on the stack or the heap) would have a higher address

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:51):

If it were on the stack you would hit the same lifetime issues.

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:51):

so you could get there via a subtracted offset :big_smile:

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:51):

Yeah, so that is why you instead store the pointer bit-shifted over, because pointers always have some zero bits
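
A minimal sketch of the bit-shift idea in Rust, not Roc's actual layout. It assumes the capacity word is a `usize`, that a set top bit (a "negative capacity") marks the word as a refcount pointer, and that the refcount is an aligned `usize` allocated separately. The names `TAG`, `encode_refcount_ptr`, and `decode_refcount_ptr` are hypothetical.

```rust
use std::mem::align_of;

/// Hypothetical tag: the top bit of the capacity word means
/// "this word holds a refcount pointer, not a capacity".
const TAG: usize = 1 << (usize::BITS - 1);

/// Pack a pointer to an out-of-band refcount into a capacity-sized word.
fn encode_refcount_ptr(rc: *mut usize) -> usize {
    let addr = rc as usize;
    // An aligned usize pointer always has its lowest bit clear,
    // so shifting right by one loses no information and frees the top bit.
    debug_assert_eq!(addr % align_of::<usize>(), 0);
    (addr >> 1) | TAG
}

/// Returns Some(pointer) if the word is tagged, None if it is a plain capacity.
fn decode_refcount_ptr(word: usize) -> Option<*mut usize> {
    if word & TAG != 0 {
        // Shifting left drops the tag bit and restores the original address.
        Some((word << 1) as *mut usize)
    } else {
        None
    }
}
```

Under these assumptions, decoding is one sign check plus one shift, so the refcount can live anywhere in memory rather than at a fixed offset from the bytes.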

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:52):

that should work :thumbs_up:

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:53):

and I think it would take the same number of machine instructions as the "subtract an offset" design

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:53):

cool!

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:54):

well that's another argument for seamless slices haha

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:54):

And since the host can check the refcount, in the case where the slice should never be stored (due to lifetime limitations), the host can add an assert or debug assert that will crash with a message explaining the problem to the Roc app author instead of segfaulting

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:54):

well I think if there's a known lifetime concern, the host just can't pass it to the app

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:55):

I don't think "sometimes I will pass you a value, but make sure not to store it or else it'll crash your app!" is a good platform design :laughing:

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:56):

I think this is mostly a problem when the host does not expect the app to store the entire string, but the app does it for some reason. So the host does not want to copy or pessimize the use case.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:56):

If the app stores the value, it is on the app to copy it.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:57):

No need to pessimize every user due to one user keeping a reference to a Str when it isn't expected.

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:57):

I think the host could be more graceful about it than that

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:58):

for example, if the host checks the refcount and sees that it's still in use, it could opt to leak memory instead of crashing

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:58):

that's the usual symptom if you write an application that holds onto a reference for longer than it should

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:58):

Yeah, that may be a possibility in some cases.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:59):

Also, you need to make sure to never let Roc free these special slices.

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:59):

It would try to free them incorrectly and crash

view this post on Zulip Richard Feldman (Jul 24 2022 at 22:59):

well that's easy enough - just make sure the refcount is incremented one more time before passing it to Roc
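
A rough host-side sketch of that flow in Rust, under loud assumptions: the refcount is modeled as a plain `usize` counter (Roc's real refcount encoding may differ), and `LentStr`, `Release`, and `release` are hypothetical names. The host starts the count at 2, one for itself and one for Roc, so Roc's decrements never reach zero and Roc never tries to free the buffer. After the call, the host inspects the count and either frees the memory or leaks it.

```rust
/// What the host decides to do with its buffer once Roc returns.
#[derive(Debug, PartialEq)]
enum Release {
    Free,
    Leak,
}

/// Hypothetical host-side handle for a string lent to Roc without copying.
struct LentStr {
    bytes: Vec<u8>,
    /// Out-of-band refcount, allocated separately from the bytes.
    refcount: Box<usize>,
}

impl LentStr {
    fn lend(bytes: Vec<u8>) -> Self {
        // 1 for the host itself + 1 for the reference handed to Roc.
        LentStr { bytes, refcount: Box::new(2) }
    }

    /// Pointer and length the host would place in the slice passed to Roc.
    fn as_raw(&self) -> (*const u8, usize) {
        (self.bytes.as_ptr(), self.bytes.len())
    }

    /// Address Roc would use to increment and decrement the count.
    fn refcount_ptr(&mut self) -> *mut usize {
        &mut *self.refcount
    }

    /// Only the host's own reference is left once Roc has dropped all of its own.
    fn after_call(&self) -> Release {
        if *self.refcount == 1 { Release::Free } else { Release::Leak }
    }
}

/// Free the buffer if Roc is done with it; otherwise leak it instead of crashing.
fn release(lent: LentStr) -> Release {
    let decision = lent.after_call();
    if decision == Release::Leak {
        // The app still holds a reference, so keep the memory alive forever.
        std::mem::forget(lent);
    }
    decision
}
```

A stricter host could swap the leak branch for an assert with a message aimed at the app author, as suggested earlier in the thread.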

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 22:59):

true

view this post on Zulip Brendan Hansknecht (Jul 24 2022 at 23:02):

But yeah, I like this method. It is now much less likely to cause any issues. Issues should be checkable by the platform. No extra check for zero capacity (though we could still add that for truly const data if that makes sense). The only extra cost is that the refcount is elsewhere in memory and will be less likely to be in cache, which is probably a fine tradeoff for not copying a theoretically larger Str/List.

view this post on Zulip Richard Feldman (Jul 24 2022 at 23:03):

yeah 0-capacity seems decoupled from this now :smiley:

view this post on Zulip Richard Feldman (Jul 24 2022 at 23:03):

could do it or not, but either way seamless slices can address the use case!

view this post on Zulip Richard Feldman (Jul 24 2022 at 23:04):

super cool!

view this post on Zulip Richard Feldman (Jul 24 2022 at 23:04):

I'll add a note to that issue about the bit shift


Last updated: Jun 16 2026 at 16:19 UTC