Stream: ideas

Topic: Temp Dir


view this post on Zulip Luke Boswell (Jul 16 2024 at 08:12):

I'd like to have an effect that provides a path to a temporary directory.

I propose we add an effect to basic-cli and basic-webserver like Path.tmpDir : Task Str []_, which calls std::env::temp_dir.

Here is an implementation:

```roc
# Path.roc
tmpDir : Task Str []_
tmpDir =
    Effect.tmpDir
    |> Effect.map Ok
    |> InternalTask.fromEffect
```

```roc
# Effect.roc
tmpDir : Effect Str
```

```rust
// lib.rs
use roc_std::RocStr;

#[no_mangle]
pub extern "C" fn roc_fx_tmpDir() -> RocStr {
    // `display()` renders the path as text, replacing any non-UTF-8 bytes with U+FFFD.
    std::env::temp_dir().display().to_string().as_str().into()
}
```

view this post on Zulip Luke Boswell (Jul 16 2024 at 08:13):

If this is acceptable, I can tack it onto the refactor-host PRs.

view this post on Zulip Luke Boswell (Jul 16 2024 at 08:14):

Alternatively we might want to put it in Env.tmpDir instead.

view this post on Zulip Luke Boswell (Jul 16 2024 at 09:40):

Ok, added to the branches

view this post on Zulip Kilian Vounckx (Jul 16 2024 at 10:20):

If it is in the Path module, wouldn't it make more sense to have it return Task Path _ instead of Str?

view this post on Zulip Luke Boswell (Jul 16 2024 at 10:22):

I moved it to Env due to an import cycle.

view this post on Zulip Luke Boswell (Jul 16 2024 at 10:25):

I've been staring at basic-cli and basic-webserver a lot lately. I'm feeling too far in the weeds here to give a good objective opinion.

One thing that comes to mind is that we switched APIs such as File to take a Str instead of a Path. I think that was to make them more usable.

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:35):

random thoughts on this!

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:35):

no operating system we target uses UTF-8 for its file names/paths

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:35):

(and Str is UTF-8)

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:36):

for example, valid UNIX paths can contain any byte except 0

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:36):

whereas UTF-8 allows 0 bytes, and also disallows certain sequences of bytes

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:36):

so you can't go from UNIX path to Str (or from Str to UNIX path) without a potential error occurring, which must be checked for
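Both failure directions can be seen with a small Rust sketch that uses only the standard library: `std::str::from_utf8` tries to read raw path bytes as UTF-8, and `std::ffi::CString::new` builds the NUL-terminated string that UNIX calls expect. The helper names are hypothetical and exist only for illustration.

```rust
use std::ffi::{CString, NulError};
use std::str::Utf8Error;

/// UNIX path bytes -> UTF-8 string: fails if the bytes are not valid UTF-8.
fn path_bytes_to_str(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

/// UTF-8 string -> NUL-terminated UNIX path: fails if the string contains a 0 byte.
fn str_to_unix_path(s: &str) -> Result<CString, NulError> {
    CString::new(s)
}
```

For example, the bytes `a`, `0xFF`, `b` form a legal UNIX file name but are rejected by `path_bytes_to_str`, while the string `"notes\0.txt"` is valid UTF-8 but rejected by `str_to_unix_path`.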

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:42):

fortunately, if I give a Str to a UNIX operation that wants a path, and that Str happens to have a 0 byte in it, the outcome will just be that the UNIX operation treats the Str as ending wherever that 0 byte occurred

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:43):

(there's probably a side discussion to be had there about whether an attacker could use that somehow; we could always guard against it by manually checking Str for 0 bytes before sending it to the OS, but that would be costly... although maybe if we implemented the check entirely in Roc code on the platform, with no host involvement, then LLVM would optimize it away when it's used on string literals?)
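The truncation behavior and the guard mentioned above can both be sketched in Rust. `CStr::from_bytes_until_nul` (stable since Rust 1.69) reads a buffer the way a C API does, stopping at the first 0 byte. `path_seen_by_os` and `check_no_nul` are hypothetical helpers, not anything basic-cli currently does; the guard is the linear scan whose cost is being weighed.

```rust
use std::ffi::CStr;

/// Hypothetical model of handing a Str to a UNIX call: the host appends a
/// terminating 0 byte, and the OS only reads up to the first 0 byte it finds.
fn path_seen_by_os(s: &str) -> Vec<u8> {
    let mut buf = s.as_bytes().to_vec();
    buf.push(0);
    // Cannot fail: we just pushed a 0 byte, so one is always present.
    let c = CStr::from_bytes_until_nul(&buf).expect("buffer is NUL-terminated");
    c.to_bytes().to_vec()
}

/// Hypothetical guard: reject a Str containing a 0 byte, reporting its byte index.
fn check_no_nul(s: &str) -> Result<&str, usize> {
    match s.bytes().position(|b| b == 0) {
        Some(index) => Err(index),
        None => Ok(s),
    }
}
```

As one illustration of the concern, a check on the Str `"secret.txt\0.png"` would see a `.png` extension, while the OS would open `secret.txt`; `check_no_nul` rejects it with `Err(10)`.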

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:44):

of course, passing a Str is very convenient, because I like to write File.readUtf8 "foo/bar/baz.txt"

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:44):

so that's why the File APIs exist - convenience

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:44):

however, what happens if I'm reading a directory?

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:44):

e.g. I say "UNIX, give me all the filenames in this directory"

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:45):

those filenames very much might not be valid UTF-8

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:45):

it might not be possible to convert them to a Str, because they contain byte sequences that aren't valid UTF-8 (but which are totally valid in UNIX)
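Rust's standard library exposes exactly this situation: `DirEntry::file_name` returns an `OsString`, and `OsString::into_string` returns `Result<String, OsString>`, handing back the original name unchanged when it is not valid UTF-8. A minimal sketch (the helper name is hypothetical) that sorts a directory's entry names into the two cases:

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;

/// Split a directory's entry names into valid UTF-8 names and raw OS names.
/// Nothing is lost: a failed conversion returns the original OsString.
fn split_entry_names(dir: &Path) -> io::Result<(Vec<String>, Vec<OsString>)> {
    let mut utf8 = Vec::new();
    let mut not_utf8 = Vec::new();
    for entry in fs::read_dir(dir)? {
        match entry?.file_name().into_string() {
            Ok(name) => utf8.push(name),
            Err(raw) => not_utf8.push(raw),
        }
    }
    Ok((utf8, not_utf8))
}
```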

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:48):

so what happens if I want to read all the filenames in a directory, and then read the contents of each of those files?

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:48):

well, if the only API we had was to convert them to Str first, then this just wouldn't be possible to do correctly for some directories

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:49):

we'd get the filename, it wouldn't be convertible to Str, and then we'd be unable to File.readUtf8 it because File.readUtf8 takes a Str (for convenience, but in this case that convenience is preventing us from actually making the program work)

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:49):

this is where Path comes in

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:50):

Path can represent an arbitrary sequence of bytes that we got from the OS, and it also tags it as "I got this from the OS, so I know it's a valid OS path"

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:51):

so when we read the contents of a directory, we always get back Path entries - never Str, because we can't guarantee that what we get back will actually be representable as valid Strs!

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:54):

this is also why Path.display exists - it's so that if you want to display those paths to the user, you can render the Path as a Str with the Unicode Replacement Character (this question mark in a diamond: �) used for any invalid sequences. This Str should only be used for display purposes, because if you give it back to the OS for any operations, they'll fail (or do something different) due to the path itself being different
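The lossy rendering can be illustrated with Rust's `String::from_utf8_lossy`, which replaces each invalid sequence with U+FFFD (Rust's own `Path::display` and `Path::to_string_lossy` behave similarly for paths). The helper name `display_bytes` is hypothetical; the point is that the displayed string's bytes no longer match the original path's bytes.

```rust
/// Render raw path bytes for display only, replacing invalid UTF-8 with U+FFFD.
fn display_bytes(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

For the bytes `a`, `0xFF`, `b` this yields `"a\u{FFFD}b"`, whose UTF-8 encoding is `61 EF BF BD 62` rather than the original `61 FF 62`, so giving it back to the OS would name a different file.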

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:56):

so if I wanted to do "user specifies a directory as a Str, I print out a list of all the files in that directory, and then read their contents" using the current APIs, the way I'd do this is:

  1. Use Dir.list (which takes a Str) to get a List Path of directory entries
  2. Iterate over those and print them all out using Path.display, so the user can see as much of the path as is valid UTF-8, and then Unicode Replacement Characters for the parts that aren't displayable
  3. To read their contents, use Path.readUtf8 instead of File.readUtf8, because in this case what we have are the Paths we got back from Dir.list, not Strs. (If we used the Strs we got back from Path.display, we would have bugs!)
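For comparison, the same three steps look roughly like this in Rust using only `std::fs`; the function name is hypothetical. `DirEntry::path` yields a `PathBuf` rather than a `String`, `Path::display` is used only for printing, and each file is read through its original `PathBuf`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Rough Rust analogue of the three steps (hypothetical helper name).
/// Accepts anything path-like: a &str literal, a PathBuf, and so on.
fn print_and_read_all(dir: impl AsRef<Path>) -> io::Result<Vec<(PathBuf, String)>> {
    let mut results = Vec::new();
    // Step 1: list the directory; `DirEntry::path` returns a PathBuf, not a String.
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if !path.is_file() {
            continue; // skip subdirectories and anything else that is not a file
        }
        // Step 2: for display only; invalid sequences are shown as U+FFFD.
        println!("{}", path.display());
        // Step 3: read using the original PathBuf, never the display string.
        // `read_to_string` fails if the file *contents* are not UTF-8.
        let contents = fs::read_to_string(&path)?;
        results.push((path, contents));
    }
    Ok(results)
}
```

Taking `impl AsRef<Path>` is one way a Rust API gets both properties discussed here: callers can pass a string literal for convenience, or a path obtained from the OS for correctness.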

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:58):

incidentally, I've heard of people running into bugs around these edge cases in practice; e.g. old files that were encoded with the old JIS encoding for Japanese (which predated Unicode) just breaking programs that tried to do anything with them

view this post on Zulip Richard Feldman (Jul 16 2024 at 12:59):

presumably because they contained bytes that weren't valid UTF-8 (or, on Windows, weren't valid UTF-16 wide-char strings)

view this post on Zulip Richard Feldman (Jul 16 2024 at 13:00):

bringing it back to tempdir, I think it needs to produce a Path and not a Str because the OS tempdir could be configured to be a path that isn't valid UTF-8
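On the host side, one way to avoid the lossy `display()` call from the original implementation is to pass the temp dir's raw bytes across instead. This is a UNIX-only sketch using `std::os::unix::ffi::{OsStrExt, OsStringExt}`; the helper names are hypothetical, Windows would need something like `std::os::windows::ffi::OsStrExt::encode_wide` instead, and how the Roc side would wrap these bytes in a Path is not shown here.

```rust
use std::path::PathBuf;

#[cfg(unix)]
use std::ffi::OsStr;
#[cfg(unix)]
use std::os::unix::ffi::{OsStrExt, OsStringExt};

/// Hypothetical host helper: the temp dir as raw bytes, with no lossy
/// `display()` step, so a non-UTF-8 temp dir path survives intact.
#[cfg(unix)]
fn temp_dir_bytes() -> Vec<u8> {
    std::env::temp_dir().into_os_string().into_vec()
}

/// The reverse direction: rebuild exactly the same path from those bytes,
/// e.g. when a Path is handed back to the host for a file operation.
#[cfg(unix)]
fn path_from_bytes(bytes: &[u8]) -> PathBuf {
    PathBuf::from(OsStr::from_bytes(bytes))
}
```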

view this post on Zulip Anton (Jul 16 2024 at 13:05):

I'll make the change

view this post on Zulip Richard Feldman (Jul 16 2024 at 13:08):

it's also normal to be able to specify a suffix or something for the tempdir or tempfile name, which we might want to support

view this post on Zulip Richard Feldman (Jul 16 2024 at 13:08):

but maybe not for the first version; we can always add that later
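If a suffix option is added later, the usual approach is to try candidate names until creating one succeeds. The sketch below (hypothetical function name, standard library only) relies on `fs::create_dir` failing with `ErrorKind::AlreadyExists` instead of reusing an existing directory. Its names are predictable, so unlike POSIX `mkdtemp` or randomized implementations it is not meant to resist an attacker who pre-creates directories.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Hypothetical sketch: create a fresh directory in the OS temp dir whose name
/// ends with `suffix`, retrying with a new name whenever one is already taken.
fn create_temp_dir_with_suffix(suffix: &str) -> io::Result<PathBuf> {
    let base = std::env::temp_dir();
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    for _ in 0..100 {
        // The counter keeps names distinct within this process.
        let n = COUNTER.fetch_add(1, Ordering::Relaxed);
        let candidate = base.join(format!("tmp-{}-{}-{}{}", process::id(), nanos, n, suffix));
        match fs::create_dir(&candidate) {
            Ok(()) => return Ok(candidate),
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => continue,
            Err(e) => return Err(e),
        }
    }
    Err(io::Error::new(
        io::ErrorKind::AlreadyExists,
        "could not find an unused temp dir name",
    ))
}
```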


Last updated: Jun 16 2026 at 16:19 UTC