API design around non-memory resources · beginners

Asier Elorz (he/him) (Dec 05 2023 at 08:31):

Let's say I'm designing an API for a platform, and my platform offers access to the file system. Most commonly, we would have two functions:

readFile : Path -> Task (List U8) FileError
writeFile : Path, List U8 -> Task {} FileError

However, there are use cases where this is not enough. A file may be too big to fit into memory. Or it may be a zip or other kind of archive, and I just want to read the index and maybe parts of the content depending on what the index says. Or I may want to patch a few bytes in a very big file without having to load it all into memory, modify it in memory, and then write everything back again, which would be very wasteful.

This sort of thing is usually achieved through APIs like fopen in the C standard library, where you get a file handle, you work on that file handle, and then you release it (fclose) when you are done. This lets the programmer do much more granular operations on a file than a path-based API, but it also poses a new problem: the file handle needs to be released when it is no longer used.

Languages like C++ and Rust solve this by encapsulating the resource in a class with a destructor or Drop trait that will automatically run cleanup code when the object goes out of scope. Roc manages memory automatically, but as far as I know (please do correct me if I am wrong) doesn't let the user write arbitrary cleanup procedures.

Which takes me back to the original question. What would be the idiomatic way of designing an API in Roc for such a task? I would very strongly prefer if manual fclose was not the answer, because it is very error prone, and because statically ensuring no leaks is a solved problem since Bjarne Stroustrup made C with classes in 1980, so Roc should be able to be at least as good.

The best I can think of for now is a function that takes another function as a parameter, where this inner function is the only one that will ever have access to the file handle. This way we can open the file, call the user function, and close the file, and the handle never risks being leaked. This is a common pattern in some Rust APIs, such as std::thread::scope, where the thread scope object is only exposed within the user's callback function and never leaks outside.

openFile : Path, (FileHandle -> Task a e) -> Task a e

This would sort of work, I think, but I am not sure that this is the ideal solution.
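For comparison, here is a minimal sketch of the same callback-scoped shape in Rust, in the spirit of the std::thread::scope pattern mentioned above. The names `with_file` and `patch_bytes` are made up for illustration and are not part of any Roc platform; the sketch only shows that the handle is lent to the callback and closed as soon as the callback returns.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical helper: opens `path` for reading and writing, lends the
/// handle to `body`, and closes the file once `body` has returned.
/// The `&mut File` borrow cannot outlive the call, so the handle cannot leak.
fn with_file<T>(
    path: &Path,
    body: impl FnOnce(&mut File) -> io::Result<T>,
) -> io::Result<T> {
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;
    let result = body(&mut file);
    drop(file); // explicit for clarity; File closes itself when dropped
    result
}

/// Usage: patch a few bytes in place without loading the whole file.
fn patch_bytes(path: &Path, offset: u64, bytes: &[u8]) -> io::Result<()> {
    with_file(path, |file| {
        file.seek(SeekFrom::Start(offset))?;
        file.write_all(bytes)
    })
}
```

The trade-off is the one raised in the question: the API is safe by construction, but every use of the handle has to live inside the callback.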

I am very interested in what you all think about how you would design such an API.

Luke Boswell (Dec 05 2023 at 09:21):

I feel like I've seen a proposal somewhere that has streams in it as an example for a platform-agnostic low-level API. The idea being that package authors can build on top of stream operations as the primitive. I'll dig and see what I can find.

Luke Boswell (Dec 05 2023 at 09:26):

There is an example in the Module Params proposal that includes something similar in the Sandboxing and polyfill section.

Luke Boswell (Dec 05 2023 at 09:30):

I think the WASI filesystem design is a very similar API and could be implemented in Roc without any issues.

Luke Boswell (Dec 05 2023 at 09:33):

The file descriptor can be an Opaque type, and I guess the platform could handle automatically cleaning things up.

Luke Boswell (Dec 05 2023 at 09:53):

I guess another key aspect to this is the effect interpreter. My layman's understanding is that the host builds a runtime that executes the task or effect descriptions from Roc. This part of the platform's host would be responsible for managing resources like file descriptors, which I imagine could be represented by an index into a list of handles in the host. In the Roc application it is just an Opaque type, so the app never accesses the raw file descriptor. Instead the app creates a Task that passes the descriptor back to the platform, which in turn unwraps the index and returns that to the host. You could provide a Task to close a file, which would destroy the handle, and I imagine any future Tasks like read or write on it would fail.
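A rough sketch of what such a host-side table could look like, written in Rust as one possible host language. Everything here is an assumption made for illustration: `HandleTable`, `HandleId` and `HandleError` are invented names, and an in-memory byte buffer stands in for a real open file.

```rust
/// Opaque index handed to the application instead of a raw descriptor.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct HandleId(usize);

#[derive(Debug, PartialEq)]
enum HandleError {
    InvalidHandle,
}

/// Host-side list of handles. A closed slot becomes `None`.
struct HandleTable {
    slots: Vec<Option<Vec<u8>>>, // Vec<u8> stands in for an open file
}

impl HandleTable {
    fn new() -> Self {
        HandleTable { slots: Vec::new() }
    }

    /// "Opens" a resource and returns the index the app will hold.
    fn open(&mut self, contents: Vec<u8>) -> HandleId {
        self.slots.push(Some(contents));
        HandleId(self.slots.len() - 1)
    }

    /// Fails if the handle was never issued or has already been closed.
    fn read(&self, id: HandleId) -> Result<&[u8], HandleError> {
        match self.slots.get(id.0) {
            Some(Some(bytes)) => Ok(bytes.as_slice()),
            _ => Err(HandleError::InvalidHandle),
        }
    }

    /// Destroys the handle; later reads or closes on it report an error.
    fn close(&mut self, id: HandleId) -> Result<(), HandleError> {
        match self.slots.get_mut(id.0) {
            Some(slot) if slot.is_some() => {
                *slot = None;
                Ok(())
            }
            _ => Err(HandleError::InvalidHandle),
        }
    }
}
```

Slots are never reused in this sketch, so a stale index cannot silently refer to a different file; a real host would probably want slot reuse plus a generation counter.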

joshi (Dec 05 2023 at 11:58):

I like openFile : Str, (Handle -> Task a b) -> Task a b as a design, and I think it might fit roc quite well, since you could use backpassing:

fd <- openFile "archive.zip"
entries <- readZipDictionary fd |> Task.await
# ...
# file automatically closed at the end of this scope

The platform can automatically close the file when the callback returns, and check that its refcount at that point is 0, so the user didn't try to make the fd escape. OCaml has (proposed) uniqueness annotations for those kinds of things, but I think Roc can do that without any type-level magic! I don't know how hard it is to keep track of those things inside of the Tasks though, especially if Task becomes a built-in type.
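The "check the refcount when the callback returns" idea can be illustrated with Rust's `Rc`, as a loose analogy only: Roc's reference counting is managed by the compiler, and nothing here reflects how a real platform would inspect it. `with_handle`, `Handle` and `ScopeError` are hypothetical names.

```rust
use std::rc::Rc;

struct Handle {
    id: u32,
}

#[derive(Debug, PartialEq)]
enum ScopeError {
    HandleEscaped,
}

/// Lends a reference-counted handle to `body`, then checks that no extra
/// reference survived the call. Our own reference is the only one allowed
/// to remain, so the expected count is 1.
fn with_handle<T>(id: u32, body: impl FnOnce(Rc<Handle>) -> T) -> Result<T, ScopeError> {
    let handle = Rc::new(Handle { id });
    let result = body(Rc::clone(&handle));
    if Rc::strong_count(&handle) == 1 {
        Ok(result) // nobody kept the handle; safe to close here
    } else {
        Err(ScopeError::HandleEscaped)
    }
}
```

Note that this is a runtime check, whereas uniqueness annotations would reject the escaping program at compile time.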

joshi (Dec 05 2023 at 12:19):

On the other hand, doesn't the compiler already keep track of that information somehow in order to be able to remove refcount increments/decrements? unique ~~ an argument whose refcount is guaranteed to be decremented by the function?

timotree (Dec 05 2023 at 13:31):

Maybe this is what Luke was saying, but the callback approach looks the same as a direct task approach openFile : Path -> Task FileHandle e composed with Task.await. Is there a way to put the cleanup logic in Task.await instead so you can have a more uniform API?

Richard Feldman (Dec 05 2023 at 13:36):

I think of Str, (Handle -> Task a FileErr) -> Task a FileErr compared to Str -> Task Handle FileErr as sort of "inlining the await" if that makes sense - it means you don't have to |> Task.await that call because the await is baked into the original call

Richard Feldman (Dec 05 2023 at 13:40):

I thought about using this in a couple of places, but I decided against it because:

Richard Feldman (Dec 05 2023 at 13:42):

so it turns out that an equivalent of Drop can be done in the host today through clever use of roc_alloc and roc_dealloc

Richard Feldman (Dec 05 2023 at 13:44):

the basic design is that you have some Task that opens a file handle/descriptor/stream (whatever you decide to call it! I'll just call it "fd" here because it's shortest, although it's probably not the best name for a real Roc API) and that fd has a Roc type of Box I32 - meaning, it's the underlying OS file descriptor (the I32) but, importantly, it's boxed on the heap

Richard Feldman (Dec 05 2023 at 13:44):

in the host, it doesn't get allocated on the normal heap though. Instead, it gets allocated into a separate region of memory that's dedicated to storing only file descriptors

Richard Feldman (Dec 05 2023 at 13:45):

so because it's a Box, Roc will automatically reference count it, and then when the reference count gets to 0, it calls roc_dealloc on it as normal

Richard Feldman (Dec 05 2023 at 13:45):

then the host's roc_dealloc function looks at the address it was told to deallocate. If that address is in the special file descriptor range, then it knows "aha, this is a file descriptor Box we have here" and it can go and close the file descriptor (the integer) before freeing up the slot in memory

Richard Feldman (Dec 05 2023 at 13:47):

so it's not a first-class Drop language feature, but it's a way that you can get the behavior of "file descriptors always get closed as soon as they are no longer referenced anywhere," which is what Drop gets you in that context anyway
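To make the address-range check concrete, here is a small Rust sketch of the bookkeeping a host could do. It is not a real host: `FdRegion`, `alloc_fd` and `dealloc` are invented names, the `closed` list stands in for actually closing the descriptor, and the sketch ignores the refcount header that sits next to a Box's payload in a real Roc allocation. A real host would run the same check inside its roc_dealloc and fall back to the normal allocator for every other pointer.

```rust
use std::mem::size_of;

const SLOTS: usize = 64;

/// A dedicated memory region that only ever stores file descriptors.
struct FdRegion {
    slots: Box<[i32; SLOTS]>, // heap block with a fixed address
    used: [bool; SLOTS],
    closed: Vec<i32>, // stand-in for calling close(fd) on the OS
}

impl FdRegion {
    fn new() -> Self {
        FdRegion {
            slots: Box::new([-1; SLOTS]),
            used: [false; SLOTS],
            closed: Vec::new(),
        }
    }

    /// Stores `fd` in a free slot and returns the slot's address,
    /// which plays the role of the boxed value handed to Roc.
    fn alloc_fd(&mut self, fd: i32) -> Option<*mut i32> {
        let index = self.used.iter().position(|used| !*used)?;
        self.used[index] = true;
        self.slots[index] = fd;
        Some(&mut self.slots[index] as *mut i32)
    }

    /// Maps an address to a slot index if it lies inside the region.
    fn slot_index(&self, ptr: *const u8) -> Option<usize> {
        let start = self.slots.as_ptr() as usize;
        let offset = (ptr as usize).checked_sub(start)?;
        if offset < SLOTS * size_of::<i32>() && offset % size_of::<i32>() == 0 {
            Some(offset / size_of::<i32>())
        } else {
            None
        }
    }

    /// Returns true if `ptr` was a live fd slot, in which case the
    /// descriptor is "closed" and the slot is freed.
    fn dealloc(&mut self, ptr: *mut u8) -> bool {
        match self.slot_index(ptr) {
            Some(index) if self.used[index] => {
                self.closed.push(self.slots[index]);
                self.used[index] = false;
                self.slots[index] = -1;
                true
            }
            _ => false, // not ours: a real host would free it normally
        }
    }
}
```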

Asier Elorz (he/him) (Dec 05 2023 at 13:48):

This makes a lot of sense. I hadn't thought about that way of implementing destructors at the platform level. It's quite clever. Thanks for the detailed answer!

Richard Feldman (Dec 05 2023 at 13:50):

Agus Zubiaga (Dec 05 2023 at 14:48):

@Richard Feldman Could roc_dealloc run some Roc code before freeing the resource? This could be useful for protocols implemented in pure Roc (such as Postgres).
The platform only exposes a TCP effect so it knows how to close it, but it doesn't know the protocol-specific graceful termination procedure.
This would work great, if I can give a "cleanup" callback to the initial connect function that allows me to send some messages before shutting it down for real.

Agus Zubiaga (Dec 05 2023 at 14:49):

My guess is that roc_dealloc is supposed to be sync, so maybe it couldn't just run a Task, but maybe it can add it to some sort of queue?

Brendan Hansknecht (Dec 05 2023 at 17:16):

I would feel exceptionally uncomfortable if roc_dealloc could touch tasks at all

Brendan Hansknecht (Dec 05 2023 at 17:16):

It means that tasks could run randomly in pure sections of code cause they drop a value

Brendan Hansknecht (Dec 05 2023 at 17:17):

Maybe that would be hidden and ok, but it kinda allows tasks anywhere and feels like it could be abused.

Agus Zubiaga (Dec 05 2023 at 19:16):

That shouldn’t affect running pure functions since their input is already established

Agus Zubiaga (Dec 05 2023 at 19:20):

I do see your point, though. If misused this could make it hard to track down why an effect is occurring

Agus Zubiaga (Dec 05 2023 at 19:21):

That said, closing a file or a connection also is an effect, and with the suggested approach, it would indeed run in pure sections of code cause they drop a value

Agus Zubiaga (Dec 05 2023 at 19:25):

Brendan Hansknecht (Dec 05 2023 at 19:28):

I am ok with that happening on the platform side. Fundamentally the platform can do anything.

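# Hypothetical: Roc has no Drop ability. This only illustrates the concern.
ConsumeLine := {} implements [
    Drop {dropTask}
]

dropTask = Stdin.line |> Task.await \_ -> Task.ok {}

somePureFunc : OtherData, ConsumeLine -> [Some ConsumeLine, None]
somePureFunc = \data, cl ->
    if needsLineCleared data then
        # Dropping cl here would run dropTask, an effect, inside pure code
        None
    else
        Some cl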

Agus Zubiaga (Dec 05 2023 at 19:32):

Do we even need language support for something like this? I’m just thinking you’d give the platform a Roc callback and it would just choose to call it as part of it being able to do anything

Brendan Hansknecht (Dec 05 2023 at 19:33):

Oh, when you open a tcp stream, you also pass a graceful closure callback (or at some point you call something to add a graceful closure callback)?

Agus Zubiaga (Dec 05 2023 at 19:38):

Brendan Hansknecht (Dec 05 2023 at 19:41):

Brendan Hansknecht (Dec 05 2023 at 19:42):

Though I guess you need to be careful of closure capture keeping something alive and stopping it from ever being deallocated

Brendan Hansknecht (Dec 05 2023 at 19:47):

Would that work with postgres? do you need to capture anything? assuming the tcp connection that is about to be cleared is passed into the lambda

Agus Zubiaga (Dec 05 2023 at 19:49):

I need to send a Terminate message. I need the connection to do that, so the platform needs to delay closing until my task is resolved.

Agus Zubiaga (Dec 05 2023 at 19:50):

Agus Zubiaga (Dec 05 2023 at 19:51):

I couldn’t even capture it because I would be defining this function before the connection is returned from the “open” task

Agus Zubiaga (Dec 05 2023 at 19:54):

In the case of Postgres, the termination message doesn’t need to provide any information that was established during the lifetime of the connection. It’s always the same.

Brendan Hansknecht (Dec 05 2023 at 19:56):

Also, I guess there would be a weird case where you never connect but still send a termination message, but that probably doesn't really matter.

Brendan Hansknecht (Dec 05 2023 at 19:58):

I guess if you really need state, eventually we will have Stored and you can give the platform a Task instead of a closure. That task can call Stored, load some state, and then generate a message as needed from that.

Agus Zubiaga (Dec 05 2023 at 19:58):

Couldn’t the platform just skip calling the function in that case? Presumably it wouldn’t even store the pointer to it if it fails to connect

Agus Zubiaga (Dec 05 2023 at 19:59):

Yeah, Stored would work. Or any other stateful effect that the platform might provide.

Brendan Hansknecht (Dec 05 2023 at 19:59):

Not sure this truly applies to postgres, I mean some sort of state where you have a working tcp connection but haven't communicated with postgres at all. Then for some reason you drop the connection and the function runs. So from the postgres server side, it would see a tcp connection opening, a termination message, and a tcp connection closing.

Agus Zubiaga (Dec 05 2023 at 20:00):

Agus Zubiaga (Dec 05 2023 at 20:01):

I think in the case of Postgres that’s totally valid. You are just terminating before authenticating, but yeah, you’d need some sort of state if you had to avoid that.

Richard Feldman (Dec 11 2023 at 17:24):

Richard Feldman (Dec 11 2023 at 17:25):

the basic idea would be to start with the "host runs some code when a particular thing gets dropped" technique (mentioned earlier)

Richard Feldman (Dec 11 2023 at 17:25):

so in the platform's API for opening the TCP connection, you specify a Task to run when it's going to get closed

Richard Feldman (Dec 11 2023 at 17:28):

the platform holds onto that task, and then when it goes to close the tcp connection, it can run that task - possibly synchronously (as in, go interpret that state machine entry right away, and don't go back to interpreting the main state machine until it's all done), or possibly by having an async pool of things that can run concurrently (presumably desirable for map2 concurrency anyway) and just add it to that
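The shape of that idea, reduced to a Rust sketch under several assumptions: a Rust closure stands in for the Roc Task, Rust's own `Drop` stands in for "the host notices the last reference is gone", and a shared log stands in for the network. `Managed`, `Connection` and `connect` are invented names. The cleanup runs synchronously and the close is delayed until it has finished, which matches the Postgres Terminate case discussed above; if the connection never opens, the callback is simply never stored.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

struct Connection {
    log: Log,
    open: bool,
}

impl Connection {
    /// Records a message as sent, but only while the connection is open.
    fn send(&mut self, msg: &str) {
        if self.open {
            self.log.borrow_mut().push(format!("send {}", msg));
        }
    }
}

/// What the host keeps: the connection plus the cleanup to run before closing.
struct Managed {
    conn: Connection,
    on_close: Option<Box<dyn FnOnce(&mut Connection)>>,
}

/// `reachable` simulates whether the connection attempt succeeds.
fn connect(
    log: Log,
    reachable: bool,
    on_close: impl FnOnce(&mut Connection) + 'static,
) -> Result<Managed, String> {
    if !reachable {
        return Err("connect failed".to_string()); // callback is never stored
    }
    Ok(Managed {
        conn: Connection { log, open: true },
        on_close: Some(Box::new(on_close)),
    })
}

impl Drop for Managed {
    fn drop(&mut self) {
        // Run the cleanup while the connection is still usable...
        if let Some(cleanup) = self.on_close.take() {
            cleanup(&mut self.conn);
        }
        // ...and only then close it for real.
        self.conn.open = false;
        self.conn.log.borrow_mut().push("close".to_string());
    }
}
```

Because the callback receives the connection as an argument rather than capturing it, it avoids the closure-capture problem mentioned earlier in the thread.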

