Let's say I'm designing an API for a platform, and my platform offers access to the file system. Most commonly, we would have two functions:
readFile : Path -> Task (List U8) FileError
writeFile : Path, List U8 -> Task {} FileError
This is great. It is safe, easy and enough for many users.
However, there are use cases where it is not enough. A file may be too big to fit into memory. It may be a zip or other kind of archive, where I just want to read the index and maybe parts of the content depending on what the index says. Or I may want to patch a few bytes in a very big file without having to load it all into memory, modify it there, and then write everything back, which would be very wasteful.
These sorts of things are usually achieved through APIs like fopen in the C standard library, where you get a file handle, you work on that file handle and then you free it (fclose) when you are done. This lets the programmer do much more granular operations on a file than a path-based API, but it also poses a new problem: the file handle needs to be freed when it is no longer used.
Languages like C++ and Rust solve this by encapsulating the resource in a class with a destructor or Drop trait that will automatically run cleanup code when the object goes out of scope. Roc manages memory automatically, but as far as I know (please do correct me if I am wrong) doesn't let the user write arbitrary cleanup procedures.
Which takes me back to the original question. What would be the idiomatic way of designing an API in Roc for such a task? I would very strongly prefer if manual fclose was not the answer, because it is very error prone, and because statically ensuring no leaks is a solved problem since Bjarne Stroustrup made C with classes in 1980, so Roc should be able to be at least as good.
The best I can think of for now is a function that takes another function as a parameter, and this inner function is the only one that will ever have access to the file handle. This way we can open the file, call the user function, close the file, and the handle is never at risk of leaking. This is a common pattern in some Rust APIs, such as std::thread::scope, where the thread scope object is only exposed within the user's callback function and never leaks outside.
So the signature would be something like:
openFile : Path, (FileHandle -> Task a e) -> Task a e
This would sort of work, I think, but I am not sure that this is the ideal solution.
I am very interested in what you all think about how you would design such an API.
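For comparison, here is a minimal Rust sketch of the callback-scoped design described above. The names with_file, FileHandle, and read_at are made up for illustration and are not part of any Roc platform. The handle is only lent to the callback by reference, so the borrow checker stops it from being stored anywhere that outlives the call, and the file is closed as soon as the callback returns.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical handle type; callers only ever see it by reference.
pub struct FileHandle {
    file: File,
}

impl FileHandle {
    /// Reads up to `len` bytes starting at byte `offset`.
    pub fn read_at(&mut self, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        self.file.seek(SeekFrom::Start(offset))?;
        let mut buf = Vec::new();
        self.file.by_ref().take(len as u64).read_to_end(&mut buf)?;
        Ok(buf)
    }
}

/// Opens `path`, runs `body` with a borrowed handle, then closes the file.
pub fn with_file<T>(
    path: &Path,
    body: impl FnOnce(&mut FileHandle) -> io::Result<T>,
) -> io::Result<T> {
    let mut handle = FileHandle { file: File::open(path)? };
    let result = body(&mut handle);
    drop(handle); // the underlying file is closed here, success or failure
    result
}
```

A call then looks like with_file(&path, |h| h.read_at(6, 5)), which mirrors the shape of openFile : Path, (FileHandle -> Task a e) -> Task a e.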
I feel like I've seen a proposal somewhere that has streams in it as an example for a platform-agnostic low-level API. The idea being package authors can build on top of stream operations as the primitive. I'll dig and see what I can find.
There is an example in the Module Params proposal that includes something similar in the Sandboxing and polyfill section.
I think the WASI filesystem design is a very similar API and could be implemented in Roc without any issues.
The file descriptor can be an Opaque type, and I guess the platform could handle automatically cleaning things up.
I guess another key aspect to this is the effect interpreter. My layman's understanding is that the host builds a runtime that executes the task or effect descriptions from Roc. This part of the platform's host would be responsible for managing resources like file descriptors, which I imagine could be represented by an index into a list of handles in the host. In the Roc application the handle is just an Opaque type, so the app never accesses the raw file descriptor. Instead, the app creates a Task that passes the descriptor back to the platform, which in turn unwraps the index and returns that to the host. You could provide a Task to close a file, which would destroy the handle, and I imagine any future Tasks like read or write would then fail.
My apologies if I've butchered this description.
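The host-side handle table described above could be sketched roughly as follows, in Rust. Everything here (HandleTable, Handle, HandleError, and using a byte vector in place of a real OS resource) is an assumption made for illustration. The application only holds an opaque index, and any operation on a closed handle fails.

```rust
/// Opaque to the application: outside this module the index cannot be read or forged.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Handle(usize);

#[derive(Debug, PartialEq)]
pub enum HandleError {
    /// The handle was never opened or has already been closed.
    Closed,
}

/// Host-side table; each occupied slot stands in for a real OS resource.
#[derive(Default)]
pub struct HandleTable {
    slots: Vec<Option<Vec<u8>>>,
}

impl HandleTable {
    /// Registers a resource and hands back an opaque handle to it.
    pub fn open(&mut self, contents: Vec<u8>) -> Handle {
        self.slots.push(Some(contents));
        Handle(self.slots.len() - 1)
    }

    /// Reads through the handle; fails once the handle has been closed.
    pub fn read(&self, h: Handle) -> Result<&[u8], HandleError> {
        self.slots
            .get(h.0)
            .and_then(|slot| slot.as_deref())
            .ok_or(HandleError::Closed)
    }

    /// Destroys the resource; later reads and closes report an error.
    pub fn close(&mut self, h: Handle) -> Result<(), HandleError> {
        match self.slots.get_mut(h.0).and_then(|slot| slot.take()) {
            Some(_) => Ok(()),
            None => Err(HandleError::Closed),
        }
    }
}
```

This sketch never reuses slots, so a stale handle can never alias a newer resource. A real host would probably want slot reuse plus a generation counter.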
I like openFile : Str, (Handle -> Task a b) -> Task a b as a design, and I think it might fit roc quite well, since you could use backpassing:
fd <- openFile "archive.zip"
entries <- readZipDictionary fd |> Task.await
# ...
# file automatically closed at the end of this scope
The platform can automatically close the file when the callback returns, and check that its refcount at that point is 0, so the user didn't try to make the fd escape. OCaml has (proposed) uniqueness annotations for those kinds of things, but I think roc can do that without any type-level magic! I don't know how hard it is to keep track of those things inside of the Tasks though, especially if Task becomes a built-in type
but I think roc can do that without any type-level magic
On the other hand, doesn't the compiler already keep track of that information somehow in order to be able to remove refcount increments/decrements? unique ~~ an argument whose refcount is guaranteed to be decremented by the function?
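The "check the refcount when the callback returns" idea can be illustrated with Rust's Rc. This is only an analogy: with_fd, Fd, and ScopeError are hypothetical names, and Roc's refcounting is managed by the compiler rather than by a library type. The host keeps one reference, and if any other reference survives the callback, the handle has escaped.

```rust
use std::rc::Rc;

/// Stand-in for an open file descriptor.
pub struct Fd {
    pub id: i32,
}

#[derive(Debug, PartialEq)]
pub enum ScopeError {
    /// The callback kept a clone of the handle alive past its own return.
    HandleEscaped,
}

/// Runs `body` with a reference-counted handle, then verifies no clone survives.
pub fn with_fd<T>(id: i32, body: impl FnOnce(Rc<Fd>) -> T) -> Result<T, ScopeError> {
    let fd = Rc::new(Fd { id });
    let result = body(Rc::clone(&fd));
    // After the callback returns, the host's reference must be the only one left.
    match Rc::try_unwrap(fd) {
        Ok(_fd) => Ok(result), // a real host would close the descriptor here
        Err(_) => Err(ScopeError::HandleEscaped),
    }
}
```

In this sketch an escaped handle is reported as an error at runtime. Whether a Roc platform should instead crash, or leave the file open, is a design choice the thread does not settle.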
Maybe this is what Luke was saying, but the callback approach looks the same as a direct task approach openFile : Path -> Task FileHandle e composed with Task.await. Is there a way to put the cleanup logic in Task.await instead so you can have a more uniform API?
I think of Str, (Handle -> Task a FileErr) -> Task a FileErr compared to Str -> Task Handle FileErr as sort of "inlining the await" if that makes sense - it means you don't have to |> Task.await that call because the await is baked into the original call
I thought about using this in a couple of places, but I decided against it because:
- [...] <- should end with a |>" (to Task.await most commonly, but also sometimes things like Result.try), because now some lines need to end that way but other lines need to not end that way, so lines that are missing it don't jump out as much anymore
- [...] await, but in the future we'll have other ways to combine tasks concurrently (e.g. "run these two tasks, and then once they have both finished - in any order - then continue"), and inlining await means that by default you're missing out on those performance benefits, and you actually have to pass in Task.ok to "cancel out" the inlined await and then do the more concurrent thing
Asier Elorz (he/him) said:
Languages like C++ and Rust solve this by encapsulating the resource in a class with a destructor or Drop trait that will automatically run cleanup code when the object goes out of scope. Roc manages memory automatically, but as far as I know (please do correct me if I am wrong) doesn't let the user write arbitrary cleanup procedures.
so it turns out that an equivalent of Drop can be done in the host today through clever use of roc_alloc and roc_dealloc
the basic design is that you have some Task that opens a file handle/descriptor/stream (whatever you decide to call it! I'll just call it "fd" here because it's shortest, although it's probably not the best name for a real Roc API) and that fd has a Roc type of Box I32 - meaning, it's the underlying OS file descriptor (the I32) but, importantly, it's boxed on the heap
in the host, it doesn't get allocated on the normal heap though. Instead, it gets allocated into a separate region of memory that's dedicated to storing only file descriptors
so because it's a Box, Roc will automatically reference count it, and then when the reference count gets to 0, it calls roc_dealloc on it as normal
then the host's roc_dealloc function looks at the address it was told to deallocate. If that address is in the range of the special file descriptor range, then it knows "aha, this is a file descriptor Box we have here" and it can go and fclose the integer before freeing up the slot in memory
so it's not a first-class Drop language feature, but it's a way that you can get the behavior of "file descriptors always get closed as soon as they are no longer referenced anywhere," which is what Drop gets you in that context anyway
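The address-range trick can be modeled in a few lines of Rust. This is a simulation under stated assumptions, not the real host interface: FdRegion, alloc_fd, and dealloc are invented names, the region is a fixed array of eight slots, and "closing" a descriptor is just recorded in a list instead of calling into the OS.

```rust
use std::mem::size_of;

const SLOTS: usize = 8;
const FREE: i32 = -1;

/// Dedicated region that holds nothing but boxed file descriptors.
pub struct FdRegion {
    slots: Box<[i32; SLOTS]>,
    /// Descriptors that were "closed" (stand-in for the real close call).
    pub closed: Vec<i32>,
}

impl FdRegion {
    pub fn new() -> Self {
        FdRegion { slots: Box::new([FREE; SLOTS]), closed: Vec::new() }
    }

    /// Stores `fd` in a free slot and returns that slot's address.
    pub fn alloc_fd(&mut self, fd: i32) -> Option<usize> {
        let index = self.slots.iter().position(|&slot| slot == FREE)?;
        self.slots[index] = fd;
        Some(&self.slots[index] as *const i32 as usize)
    }

    /// Maps an address back to a slot index if it lies inside the region.
    fn slot_of(&self, addr: usize) -> Option<usize> {
        let base = self.slots.as_ptr() as usize;
        let end = base + SLOTS * size_of::<i32>();
        if addr >= base && addr < end {
            Some((addr - base) / size_of::<i32>())
        } else {
            None
        }
    }

    /// What the host's dealloc hook would do; returns true if it handled `addr`.
    pub fn dealloc(&mut self, addr: usize) -> bool {
        match self.slot_of(addr) {
            Some(index) if self.slots[index] != FREE => {
                self.closed.push(self.slots[index]); // real host: close the descriptor
                self.slots[index] = FREE; // slot can be handed out again
                true
            }
            _ => false, // not in the fd region: use the normal allocator path
        }
    }
}
```

The point of the range check is that the dealloc hook receives only an address, so the region an address falls in is the only thing that tells the host what kind of value is being dropped.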
This makes a lot of sense. I hadn't thought about that way of implementing destructors at the platform level. It's quite clever. Thanks for the detailed answer!
absolutely, thanks for diving into this! :smiley:
@Richard Feldman Could roc_dealloc run some Roc code before freeing the resource? This could be useful for protocols implemented in pure Roc (such as Postgres).
The platform only exposes a TCP effect so it knows how to close it, but it doesn't know the protocol-specific graceful termination procedure.
This would work great, if I can give a "cleanup" callback to the initial connect function that allows me to send some messages before shutting it down for real.
My guess is that roc_dealloc is supposed to be sync, so maybe it couldn't just run a Task, but maybe it can add it to some sort of queue?
I would feel exceptionally uncomfortable if roc_dealloc could touch tasks at all
It means that tasks could run randomly in pure sections of code cause they drop a value
Maybe that would be hidden and ok, but it kinda allows tasks anywhere and feels like it could be abused.
That shouldn’t affect running pure functions since their input is already established
I do see your point, though. If misused this could make it hard to track down why an effect is occurring
That said, closing a file or a connection also is an effect, and with the suggested approach, it would indeed run in pure sections of code cause they drop a value
If we are ok with one, shouldn’t we be ok with the other?
I am ok with that happening on the platform side. Fundamentally the platform can do anything.
But what I don't like is that someone could write
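ConsumeLine := {} implements [
    Drop {dropTask}
]

# Hypothetical drop hook: reads (and discards) one line from stdin
dropTask = Stdin.line |> Task.await \_ -> Task.ok {}

# Looks pure, but dropping `cl` in the first branch would trigger dropTask
somePureFunc : OtherData, ConsumeLine -> Maybe ConsumeLine
somePureFunc = \data, cl ->
    if needsLineCleared data then
        None
    else
        Some cl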
Do we even need language support for something like this? I’m just thinking you’d give the platform a Roc callback and it would just choose to call it as part of it being able to do anything
Oh, when you open a tcp stream, you also pass a graceful closure callback (or at some point you call something to add a graceful closure callback)?
Yeah, exactly. As part of the original task that opens it.
Yeah, I would be totally for that.
I guess I just misunderstood your original prompt.
Though I guess you need to be careful of closure capture keeping something alive and stopping it from ever being deallocated
Would that work with postgres? do you need to capture anything? assuming the tcp connection that is about to be cleared is passed into the lambda
I need to send a Terminate message. I need the connection to do that, so the platform needs to delay closing until my task is resolved.
I wouldn’t be capturing the connection, I guess I would get it as an argument
I couldn’t even capture it because I would be defining this function before the connection is returned from the “open” task
In the case of Postgres, the termination message doesn’t need to provide any information that was established during the lifetime of the connection. It’s always the same.
Yeah, that last point is my biggest concern, but not sure how common it is.
Also, I guess there would be a weird case where you never connect but still send a termination message, but that probably doesn't really matter.
I guess if you really need state, eventually we will have Stored and you can give the platform a Task instead of a closure. That task can call Stored, load some state, and then generate a message as needed from that.
Couldn’t the platform just skip calling the function in that case? Presumably it wouldn’t even store the pointer to it if it fails to connect
Yeah, Stored would work. Or any other stateful effect that the platform might provide.
Couldn’t the platform just skip calling the function in that case? Presumably it wouldn’t even store the pointer to it if it fails to connect
Not sure this truly applies to postgres, I mean some sort of state where you have a working tcp connection but haven't communicated with postgres at all. Then for some reason you drop the connection and the function runs. So from the postgres server side, it would see a tcp connection opening, a termination message, and a tcp connection closing.
Ah, I see. Yeah, that could happen.
I think in the case of Postgres that’s totally valid. You are just terminating before authenticating, but yeah, you’d need some sort of state if you had to avoid that.
I think this is possible in an effect interpreters world
wouldn't need any special language features
the basic idea would be to start with the "host runs some code when a particular thing gets dropped" technique (mentioned earlier)
and then assume this API:
Agus Zubiaga said:
I’m just thinking you’d give the platform a Roc callback and it would just choose to call it as part of it being able to do anything
so in the platform's API for opening the TCP connection, you specify a Task to run when it's going to get closed
the platform holds onto that task, and then when it goes to close the tcp connection, it can run that task - possibly synchronously (as in, go interpret that state machine entry right away, and don't go back to interpreting the main state machine until it's all done), or possibly by having an async pool of things that can run concurrently (presumably desirable for map2 concurrency anyway) and just add it to that
Yeah, and with Stored the drop could load extra state if it is needed.
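A rough Rust model of "register the graceful-shutdown step when opening, and let the host run it just before closing" might look like this. Connection, Log, and the string log standing in for bytes on the wire are all assumptions for illustration. The sketch runs the callback synchronously, which is only one of the two options mentioned above.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared log standing in for network traffic and host actions.
pub type Log = Rc<RefCell<Vec<String>>>;

/// Hypothetical connection; `send` just appends to the log.
pub struct Connection {
    log: Log,
    on_close: Option<Box<dyn FnOnce(&mut Connection)>>,
}

impl Connection {
    /// Opens a connection and registers the cleanup callback up front.
    pub fn open(log: Log, on_close: impl FnOnce(&mut Connection) + 'static) -> Self {
        log.borrow_mut().push("tcp open".to_string());
        Connection { log, on_close: Some(Box::new(on_close)) }
    }

    pub fn send(&mut self, message: &str) {
        self.log.borrow_mut().push(format!("send {}", message));
    }
}

impl Drop for Connection {
    fn drop(&mut self) {
        // Run the protocol-specific cleanup first, while the socket is still usable.
        if let Some(callback) = self.on_close.take() {
            callback(self);
        }
        // Only then does the host close the underlying connection.
        self.log.borrow_mut().push("tcp close".to_string());
    }
}
```

Because the callback receives the connection as an argument instead of capturing it, there is no reference cycle keeping the connection alive, which matches the concern about closure capture raised earlier in the thread.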
Last updated: Jul 06 2025 at 12:14 UTC