Stream: beginners

Topic: API design around non-memory resources


view this post on Zulip Asier Elorz (he/him) (Dec 05 2023 at 08:31):

Let's say I'm designing an API for a platform, and my platform offers access to the file system. Most commonly, we would have two functions:

readFile : Path -> Task (List U8) FileError
writeFile : Path, List U8 -> Task {} FileError

This is great. It is safe, easy and enough for many users.

However, there are use cases when it is not enough. A file may be too big to fit into memory. Or it may be a zip or other kind of archive, and I just want to read the index and maybe parts of the content depending on what the index says, or I may want to patch a few bytes from a very big file without having to load it all to memory, write to it in memory and then write back everything again, which would be very wasteful.

These sorts of things are usually achieved through APIs like fopen in the C standard library, where you get a file handle, you work on that file handle and then you free it (fclose) when you are done. This lets the programmer do much more granular operations on a file than a path-based API, but also poses a new problem: the file handle needs to be freed when it is no longer in use.

Languages like C++ and Rust solve this by encapsulating the resource in a class with a destructor or Drop trait that will automatically run cleanup code when the object goes out of scope. Roc manages memory automatically, but as far as I know (please do correct me if I am wrong) doesn't let the user write arbitrary cleanup procedures.

Which takes me back to the original question. What would be the idiomatic way of designing an API in Roc for such a task? I would very strongly prefer if manual fclose was not the answer, because it is very error prone, and because statically ensuring no leaks is a solved problem since Bjarne Stroustrup made C with classes in 1980, so Roc should be able to be at least as good.

The best I can think of for now is a function that takes another function as a parameter, and this inner function is the only one that will ever have access to the file handle. This way we can open the file, call the user function, close the file, and the handle is never at risk of leaking. This is a common pattern in some Rust APIs, such as std::thread::scope, where the thread scope object is only exposed within the user's callback function and never leaks outside.

So the signature would be something like:

openFile : Path, (FileHandle -> Task a e) -> Task a e

This would sort of work, I think, but I am not sure that this is the ideal solution.
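For comparison, here is a minimal Rust sketch of the same callback-scoped shape. It is only an analogue of the proposed Roc signature, not Roc code, and the names with_file, FileHandle, read_at and write_at are made up for illustration. The handle exists only for the duration of the callback, and the file is closed afterwards regardless of whether the callback succeeded.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical handle type: borrows the file, so it cannot outlive `with_file`.
pub struct FileHandle<'a> {
    file: &'a mut File,
}

impl FileHandle<'_> {
    /// Read exactly `len` bytes starting at byte `offset`.
    pub fn read_at(&mut self, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        let mut buf = vec![0u8; len];
        self.file.seek(SeekFrom::Start(offset))?;
        self.file.read_exact(&mut buf)?;
        Ok(buf)
    }

    /// Overwrite bytes in place starting at byte `offset`.
    pub fn write_at(&mut self, offset: u64, bytes: &[u8]) -> io::Result<()> {
        self.file.seek(SeekFrom::Start(offset))?;
        self.file.write_all(bytes)
    }
}

/// Open `path`, run `body` with a handle, then close the file.
pub fn with_file<T>(
    path: &Path,
    body: impl FnOnce(&mut FileHandle<'_>) -> io::Result<T>,
) -> io::Result<T> {
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;
    let result = body(&mut FileHandle { file: &mut file });
    drop(file); // closed here, whether `body` succeeded or failed
    result
}
```

In Rust the borrow checker is what stops the handle from escaping the callback; Roc has no equivalent, which is why the thread goes on to discuss runtime checks and host-side cleanup.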

I am very interested in what you all think about how you would design such an API.

view this post on Zulip Luke Boswell (Dec 05 2023 at 09:21):

I feel like I've seen a proposal somewhere that has streams in it as an example for a platform-agnostic low-level API. The idea being that package authors can build on top of stream operations as the primitive. I'll dig and see what I can find.

view this post on Zulip Luke Boswell (Dec 05 2023 at 09:26):

There is an example in the Module Params proposal that includes something similar, in the Sandboxing and polyfill section.

view this post on Zulip Luke Boswell (Dec 05 2023 at 09:30):

I think the WASI filesystem design is a very similar API and could be implemented in Roc without any issues.

view this post on Zulip Luke Boswell (Dec 05 2023 at 09:33):

The file descriptor can be an Opaque type, and I guess the platform could handle automatically cleaning things up.

view this post on Zulip Luke Boswell (Dec 05 2023 at 09:53):

I guess another key aspect to this is the effect interpreter. My layman's understanding is that the host builds a runtime that executes the task or effect descriptions from Roc. This part of the platform's host would be responsible for managing resources like file descriptors, which I imagine could be represented by an index into a list of handles in the host, and in the Roc application is just an Opaque type, so the app never accesses the raw file descriptor. Instead, the app creates a Task that passes the descriptor back to the platform, which in turn unwraps the index and returns that to the host. You could provide a Task to close a file, which would destroy the handle, and I imagine any future Tasks like read or write would then fail.

My apologies if I've butchered this description.
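A rough Rust sketch of the host-side bookkeeping described above, under some assumptions: the names Handle, HandleTable and HandleError are hypothetical, a byte buffer stands in for a real file, and a map keyed by a growing counter replaces the "index into a list" so that a closed token is never reused.

```rust
use std::collections::HashMap;

/// Opaque token handed to the application; the real resource stays in the host.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Handle(u32);

#[derive(Debug, PartialEq, Eq)]
pub enum HandleError {
    Closed,
}

/// Host-side table mapping tokens to resources (a byte buffer stands in for a file).
#[derive(Default)]
pub struct HandleTable {
    next: u32,
    live: HashMap<Handle, Vec<u8>>,
}

impl HandleTable {
    /// Register a resource and return the token the app will hold.
    pub fn open(&mut self, contents: Vec<u8>) -> Handle {
        let handle = Handle(self.next);
        self.next += 1;
        self.live.insert(handle, contents);
        handle
    }

    /// Look the token up; fails once the handle has been closed.
    pub fn read(&self, handle: Handle) -> Result<&[u8], HandleError> {
        self.live.get(&handle).map(Vec::as_slice).ok_or(HandleError::Closed)
    }

    /// Destroy the handle; later operations on the same token fail.
    pub fn close(&mut self, handle: Handle) -> Result<(), HandleError> {
        self.live.remove(&handle).map(|_| ()).ok_or(HandleError::Closed)
    }
}
```

Because the app only ever holds the token, a use after close becomes an ordinary error value rather than an access to a stale descriptor.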

view this post on Zulip joshi (Dec 05 2023 at 11:58):

I like openFile : Str, (Handle -> Task a b) -> Task a b as a design, and I think it might fit roc quite well, since you could use backpassing:

fd <- openFile "archive.zip"
entries <- readZipDictionary fd |> Task.await
# ...
# file automatically closed at the end of this scope

The platform can automatically close the file when the callback returns, and check that its refcount at that point is 0, so the user didn't try to make the fd escape. OCaml has (proposed) uniqueness annotations for those kinds of things, but I think roc can do that without any type-level magic! I don't know how hard it is to keep track of those things inside of the Tasks though, especially if Task becomes a built-in type.
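The refcount check can be illustrated with a small Rust analogue using Rc. This is only a sketch of the idea, not how a Roc platform would implement it; with_fd, Fd and ScopeError are hypothetical names. Since the host keeps one reference of its own here, the expected count after the callback is 1 rather than 0.

```rust
use std::rc::Rc;

/// Stand-in for a reference-counted descriptor (hypothetical).
pub struct Fd(pub i32);

#[derive(Debug, PartialEq, Eq)]
pub enum ScopeError {
    HandleEscaped,
}

/// Run `body` with a shared handle, then verify no clone outlived the call.
pub fn with_fd<T>(raw: i32, body: impl FnOnce(Rc<Fd>) -> T) -> Result<T, ScopeError> {
    let fd = Rc::new(Fd(raw));
    let out = body(Rc::clone(&fd));
    // Only the host's own reference should remain once the callback has returned.
    if Rc::strong_count(&fd) == 1 {
        Ok(out) // `fd` is dropped on return; a real host would close the descriptor now
    } else {
        Err(ScopeError::HandleEscaped)
    }
}
```

The check catches both ways of leaking the handle: returning it from the callback and stashing a clone somewhere outside it.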

view this post on Zulip joshi (Dec 05 2023 at 12:19):

but I think roc can do that without any type-level magic

On the other hand, doesn't the compiler already keep track of that information somehow in order to be able to remove refcount increments/decrements? unique ~~ an argument whose refcount is guaranteed to be decremented by the function?

view this post on Zulip timotree (Dec 05 2023 at 13:31):

Maybe this is what Luke was saying, but the callback approach looks the same as a direct task approach openFile : Path -> Task FileHandle e composed with Task.await. Is there a way to put the cleanup logic in Task.await instead so you can have a more uniform API?

view this post on Zulip Richard Feldman (Dec 05 2023 at 13:36):

I think of Str, (Handle -> Task a FileErr) -> Task a FileErr compared to Str -> Task Handle FileErr as sort of "inlining the await" if that makes sense - it means you don't have to |> Task.await that call because the await is baked into the original call

view this post on Zulip Richard Feldman (Dec 05 2023 at 13:40):

I thought about using this in a couple of places, but I decided against it because:

view this post on Zulip Richard Feldman (Dec 05 2023 at 13:42):

Asier Elorz (he/him) said:

Languages like C++ and Rust solve this by encapsulating the resource in a class with a destructor or Drop trait that will automatically run cleanup code when the object goes out of scope. Roc manages memory automatically, but as far as I know (please do correct me if I am wrong) doesn't let the user write arbitrary cleanup procedures.

so it turns out that an equivalent of Drop can be done in the host today through clever use of roc_alloc and roc_dealloc

view this post on Zulip Richard Feldman (Dec 05 2023 at 13:44):

the basic design is that you have some Task that opens a file handle/descriptor/stream (whatever you decide to call it! I'll just call it "fd" here because it's shortest, although it's probably not the best name for a real Roc API) and that fd has a Roc type of Box I32 - meaning, it's the underlying OS file descriptor (the I32) but, importantly, it's boxed on the heap

view this post on Zulip Richard Feldman (Dec 05 2023 at 13:44):

in the host, it doesn't get allocated on the normal heap though. Instead, it gets allocated into a separate region of memory that's dedicated to storing only file descriptors

view this post on Zulip Richard Feldman (Dec 05 2023 at 13:45):

so because it's a Box, Roc will automatically reference count it, and then when the reference count gets to 0, it calls roc_dealloc on it as normal

view this post on Zulip Richard Feldman (Dec 05 2023 at 13:45):

then the host's roc_dealloc function looks at the address it was told to deallocate. If that address is in the range of the special file descriptor range, then it knows "aha, this is a file descriptor Box we have here" and it can go and close the descriptor (close rather than fclose, since it's a raw integer fd) before freeing up the slot in memory

view this post on Zulip Richard Feldman (Dec 05 2023 at 13:47):

so it's not a first-class Drop language feature, but it's a way that you can get the behavior of "file descriptors always get closed as soon as they are no longer referenced anywhere," which is what Drop gets you in that context anyway
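The address-range trick can be sketched in Rust roughly as follows. This is a simplified model under stated assumptions, not real platform code: FdArena, alloc_fd and dealloc are hypothetical names, the actual roc_alloc/roc_dealloc signatures and the Box memory layout are not modelled, and closing is simulated by recording the descriptor in a list.

```rust
const SLOTS: usize = 64;

/// Fixed region that stores only file descriptors (hypothetical host-side arena).
pub struct FdArena {
    slots: Box<[i32; SLOTS]>, // heap allocation, so the address range is stable
    used: [bool; SLOTS],
    pub closed: Vec<i32>, // records what a real host would pass to close()
}

impl FdArena {
    pub fn new() -> Self {
        FdArena { slots: Box::new([0; SLOTS]), used: [false; SLOTS], closed: Vec::new() }
    }

    /// Used instead of the general allocator when boxing a descriptor.
    pub fn alloc_fd(&mut self, fd: i32) -> Option<*mut i32> {
        let i = self.used.iter().position(|&in_use| !in_use)?;
        self.used[i] = true;
        self.slots[i] = fd;
        Some(&mut self.slots[i] as *mut i32)
    }

    /// Index of the slot if `ptr` lies inside the arena's address range.
    fn slot_of(&self, ptr: *const u8) -> Option<usize> {
        let start = self.slots.as_ptr() as usize;
        let end = start + std::mem::size_of::<[i32; SLOTS]>();
        let addr = ptr as usize;
        (start..end).contains(&addr).then(|| (addr - start) / std::mem::size_of::<i32>())
    }

    /// What the dealloc hook might do. Returns true if `ptr` was a descriptor slot.
    pub fn dealloc(&mut self, ptr: *mut u8) -> bool {
        match self.slot_of(ptr) {
            Some(i) if self.used[i] => {
                self.closed.push(self.slots[i]); // real host: close the fd here
                self.used[i] = false;
                true
            }
            _ => false, // not ours: fall through to the normal allocator
        }
    }
}
```

The pointers are only compared, never dereferenced, so the sketch needs no unsafe code; a real host would also have to free ordinary allocations in the fall-through branch.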

view this post on Zulip Asier Elorz (he/him) (Dec 05 2023 at 13:48):

This makes a lot of sense. I hadn't thought about that way of implementing destructors at the platform level. It's quite clever. Thanks for the detailed answer!

view this post on Zulip Richard Feldman (Dec 05 2023 at 13:50):

absolutely, thanks for diving into this! :smiley:

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 14:48):

@Richard Feldman Could roc_dealloc run some Roc code before freeing the resource? This could be useful for protocols implemented in pure Roc (such as Postgres).
The platform only exposes a TCP effect so it knows how to close it, but it doesn't know the protocol-specific graceful termination procedure.
This would work great, if I can give a "cleanup" callback to the initial connect function that allows me to send some messages before shutting it down for real.

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 14:49):

My guess is that roc_dealloc is supposed to be sync, so maybe it couldn't just run a Task, but maybe it can add it to some sort of queue?

view this post on Zulip Brendan Hansknecht (Dec 05 2023 at 17:16):

I would feel exceptionally uncomfortable if roc_dealloc could touch tasks at all

view this post on Zulip Brendan Hansknecht (Dec 05 2023 at 17:16):

It means that tasks could run randomly in pure sections of code cause they drop a value

view this post on Zulip Brendan Hansknecht (Dec 05 2023 at 17:17):

Maybe that would be hidden and ok, but it kinda allows tasks anywhere and feels like it could be abused.

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 19:16):

That shouldn’t affect running pure functions since their input is already established

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 19:20):

I do see your point, though. If misused this could make it hard to track down why an effect is occurring

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 19:21):

That said, closing a file or a connection also is an effect, and with the suggested approach, it would indeed run in pure sections of code cause they drop a value

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 19:25):

If we are ok with one, shouldn’t we be ok with the other?

view this post on Zulip Brendan Hansknecht (Dec 05 2023 at 19:28):

I am ok with that happening on the platform side. Fundamentally the platform can do anything.

What I don't like is that someone could write

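# Hypothetical: a user-defined Drop ability like this does not exist in Roc.
ConsumeLine := {} implements [
    Drop {dropTask}
]

# Would consume a line of stdin whenever a ConsumeLine value is dropped.
dropTask = Stdin.line |> Task.await \_ -> Task.ok {}

somePureFunc : OtherData, ConsumeLine -> Maybe ConsumeLine
somePureFunc = \data, cl ->
    if needsLineCleared data then
        # cl is dropped here, so dropTask would run inside a pure function
        None
    else
        Some cl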

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 19:32):

Do we even need language support for something like this? I’m just thinking you’d give the platform a Roc callback and it would just choose to call it as part of it being able to do anything

view this post on Zulip Brendan Hansknecht (Dec 05 2023 at 19:33):

Oh, when you open a tcp stream, you also pass a graceful closure callback (or at some point you call something to add a graceful closure callback)?

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 19:38):

Yeah, exactly. As part of the original task that opens it.

view this post on Zulip Brendan Hansknecht (Dec 05 2023 at 19:41):

Yeah, I would be totally for that.

I guess I just misunderstood your original prompt.

view this post on Zulip Brendan Hansknecht (Dec 05 2023 at 19:42):

Though I guess you need to be careful of closure capture keeping something alive and stopping it from ever being deallocated

view this post on Zulip Brendan Hansknecht (Dec 05 2023 at 19:47):

Would that work with postgres? do you need to capture anything? assuming the tcp connection that is about to be cleared is passed into the lambda

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 19:49):

I need to send a Terminate message. I need the connection to do that, so the platform needs to delay closing until my task is resolved.

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 19:50):

I wouldn’t be capturing the connection, I guess I would get it as an argument

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 19:51):

I couldn’t even capture it because I would be defining this function before the connection is returned from the “open” task

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 19:54):

In the case of Postgres, the termination message doesn’t need to provide any information that was established during the lifetime of the connection. It’s always the same.
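A small Rust sketch of this arrangement, assuming an in-memory stand-in for the TCP connection; Conn, Managed, connect and pg_terminate are hypothetical names. The cleanup hook is a plain function pointer that receives the connection as an argument, so it cannot capture the connection or anything else, and the host runs it before closing. The Terminate message in the PostgreSQL wire protocol is the byte 'X' followed by a 4-byte length of 4.

```rust
/// In-memory stand-in for a TCP connection: records bytes "sent" to the peer.
#[derive(Default)]
pub struct Conn {
    pub sent: Vec<u8>,
    pub closed: bool,
}

impl Conn {
    pub fn send(&mut self, bytes: &[u8]) {
        assert!(!self.closed, "send after close");
        self.sent.extend_from_slice(bytes);
    }
}

/// Connection plus the cleanup hook supplied at connect time.
pub struct Managed {
    conn: Conn,
    on_close: fn(&mut Conn),
}

/// "Open" a connection and remember the hook to run before closing it.
pub fn connect(on_close: fn(&mut Conn)) -> Managed {
    Managed { conn: Conn::default(), on_close }
}

impl Managed {
    pub fn send(&mut self, bytes: &[u8]) {
        self.conn.send(bytes)
    }

    /// Host-side close: run the hook first, then shut the connection.
    /// Returns the connection only so the sketch can be inspected.
    pub fn close(mut self) -> Conn {
        (self.on_close)(&mut self.conn);
        self.conn.closed = true;
        self.conn
    }
}

/// PostgreSQL Terminate message: tag 'X' followed by Int32 length 4.
pub fn pg_terminate(conn: &mut Conn) {
    conn.send(&[b'X', 0, 0, 0, 4]);
}
```

A connection that is opened and closed without any traffic still sends Terminate in this sketch; avoiding that would need some per-connection state.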

view this post on Zulip Brendan Hansknecht (Dec 05 2023 at 19:56):

Yeah, that last point is my biggest concern, but not sure how common it is.

view this post on Zulip Brendan Hansknecht (Dec 05 2023 at 19:56):

Also, I guess there would be a weird case where you never connect but still send a termination message, but that probably doesn't really matter.

view this post on Zulip Brendan Hansknecht (Dec 05 2023 at 19:58):

I guess if you really need state, eventually we will have Stored and you can give the platform a Task instead of a closure. That task can call Stored, load some state, and then generate a message as needed from that.

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 19:58):

Couldn’t the platform just skip calling the function in that case? Presumably it wouldn’t even store the pointer to it if it fails to connect

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 19:59):

Yeah, Stored would work. Or any other stateful effect that the platform might provide.

view this post on Zulip Brendan Hansknecht (Dec 05 2023 at 19:59):

Couldn’t the platform just skip calling the function in that case? Presumably it wouldn’t even store the pointer to it if it fails to connect

Not sure this truly applies to postgres, I mean some sort of state where you have a working tcp connection but haven't communicated with postgres at all. Then for some reason you drop the connection and the function runs. So from the postgres server side, it would see a tcp connection opening, a termination message, and a tcp connection closing.

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 20:00):

Ah, I see. Yeah, that could happen.

view this post on Zulip Agus Zubiaga (Dec 05 2023 at 20:01):

I think in the case of Postgres that’s totally valid. You are just terminating before authenticating, but yeah, you’d need some sort of state if you had to avoid that.

view this post on Zulip Richard Feldman (Dec 11 2023 at 17:24):

I think this is possible in an effect interpreters world

view this post on Zulip Richard Feldman (Dec 11 2023 at 17:24):

wouldn't need any special language features

view this post on Zulip Richard Feldman (Dec 11 2023 at 17:25):

the basic idea would be to start with the "host runs some code when a particular thing gets dropped" technique (mentioned earlier)

view this post on Zulip Richard Feldman (Dec 11 2023 at 17:25):

and then assume this API:

Agus Zubiaga said:

I’m just thinking you’d give the platform a Roc callback and it would just choose to call it as part of it being able to do anything

so in the platform's API for opening the TCP connection, you specify a Task to run when it's going to get closed

view this post on Zulip Richard Feldman (Dec 11 2023 at 17:28):

the platform holds onto that task, and then when it goes to close the tcp connection, it can run that task - possibly synchronously (as in, go interpret that state machine entry right away, and don't go back to interpreting the main state machine until it's all done), or possibly by having an async pool of things that can run concurrently (presumably desirable for map2 concurrency anyway) and just add it to that
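The two scheduling options can be sketched in Rust like this. It is a toy model with hypothetical names (Host, Mode, on_drop, run_pending): a boxed closure stands in for the cleanup Task, and a log of strings stands in for real I/O.

```rust
use std::collections::VecDeque;

/// A boxed closure stands in for the cleanup Task held by the platform.
type Cleanup = Box<dyn FnOnce(&mut Vec<String>)>;

#[derive(Clone, Copy)]
pub enum Mode {
    Immediate, // interpret the cleanup right away, before anything else
    Deferred,  // queue it and run it later alongside other work
}

pub struct Host {
    mode: Mode,
    pending: VecDeque<Cleanup>,
    pub log: Vec<String>,
}

impl Host {
    pub fn new(mode: Mode) -> Self {
        Host { mode, pending: VecDeque::new(), log: Vec::new() }
    }

    /// Called when a connection is about to be closed.
    pub fn on_drop(&mut self, id: u32, cleanup: Cleanup) {
        // The real close always happens after the user's cleanup has run.
        let job: Cleanup = Box::new(move |log: &mut Vec<String>| {
            cleanup(log);
            log.push(format!("close {id}"));
        });
        match self.mode {
            Mode::Immediate => job(&mut self.log),
            Mode::Deferred => self.pending.push_back(job),
        }
    }

    /// Drain queued cleanups, e.g. between steps of the main task.
    pub fn run_pending(&mut self) {
        while let Some(job) = self.pending.pop_front() {
            job(&mut self.log);
        }
    }
}
```

Either way the close is sequenced after the cleanup; the only difference is whether the main state machine waits for it.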

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 17:35):

Yeah, and with Stored the drop could load extra state if it is needed.


Last updated: Jul 06 2025 at 12:14 UTC