✔ randomness in Task · ideas · Zulip Chat Archive

one of the most frequent examples of shadowing that comes up is threading around seeds for random numbers. Threading the seeds can be avoided by using a wrapper (e.g. Random.Generator) but this doesn't work very well in conjunction with Task.

something we haven't talked about (that I can remember) is the idea of making Random a builtin, and giving incorporating it into Task, e.g.

Random.generate : Generator a -> Task a *
Random.setSeed : U64 -> Task {} *

we could set an initial seed using the trick @Brendan Hansknecht came up with for Dict's hash flooding defense: use a pointer into the binary and ASLR

Richard Feldman (Dec 10 2023 at 21:21):

this would also probably be easier to use for new users, because it's pretty easy to understand what this is doing:

randomI64 <- Random.generate Random.i64 |> Task.await

Richard Feldman (Dec 10 2023 at 21:22):

and with setSeed you could also make it deterministic for tests or reproducing (as long as you weren't using it in a multithreaded context)

Richard Feldman (Dec 10 2023 at 21:22):

if you needed it to be deterministic given a particular seed across multiple threads, then you'd be back in today's status quo world of tradeoffs, but that seems like it would apply to a very small percentage of use cases

Richard Feldman (Dec 10 2023 at 21:24):

the perf would be slightly worse than doing it today's way because Random.generate and Random.setSeed would need to do atomic loads and stores on the global seed

Richard Feldman (Dec 10 2023 at 21:26):

assuming we did this, it seems like it could noticeably reduce demand for shadowing in application code, although not necessarily in library code

Brendan Hansknecht (Dec 10 2023 at 21:30):

Brendan Hansknecht (Dec 10 2023 at 21:31):

But yeah, would work nice for cases of only needing 1 RNG (which is most cases).

Brendan Hansknecht (Dec 10 2023 at 21:31):

Richard Feldman (Dec 10 2023 at 21:32):

possible, although I think it could get weird because platforms can put Roc callbacks on different threads

Richard Feldman (Dec 10 2023 at 21:32):

so you might get some really surprising behavior around that depending on the platform, especially if libraries did it

Richard Feldman (Dec 10 2023 at 21:32):

e.g. one time in Elm I used task-based randomness in a library to put a UUID on a http request

Brendan Hansknecht (Dec 10 2023 at 21:33):

Brendan Hansknecht (Dec 10 2023 at 21:34):

Richard Feldman (Dec 10 2023 at 21:34):

also I'm not sure if compiling to threadlocals might affect Roc's portability; right now we don't explicitly depend on OS libraries or libc things from a design perspective (I believe we either can or already do compile down to just CPU instructions with no dependencies other than roc_* functions in the host), but I think threadlocals might require some integration there (I'm not sure)

Brendan Hansknecht (Dec 10 2023 at 21:34):

Richard Feldman (Dec 10 2023 at 21:34):

Brendan Hansknecht (Dec 10 2023 at 21:36):

I think thread local is part of the binary. Not a system call. Just 1 memory location per thread. But I don't recall fully. Would need to double check/dig in. I am pretty sure it is dealt with automatically by the loader though.

Richard Feldman (Dec 10 2023 at 21:36):

yeah I just think thread storage is an OS concept that the CPU doesn't know about

Richard Feldman (Dec 10 2023 at 21:36):

Brendan Hansknecht (Dec 10 2023 at 21:37):

Yeah, os concept, but dealt with by the loader when the OS makes the thread. So roc doesn't have to do anything special just mark the memory area as thread local. That at least was my understanding. Basically a flag in the binary for a specific memory region

Richard Feldman (Dec 10 2023 at 21:39):

Richard Feldman (Dec 10 2023 at 21:40):

although I guess if it's a builtin, we could compile it to something different depending on target OS

Brendan Hansknecht (Dec 10 2023 at 21:54):

Brendan Hansknecht (Dec 10 2023 at 21:55):

We would need to update the surgical linker to deal with the thread local storage sections, but otherwise should just work I think.

Brendan Hansknecht (Dec 10 2023 at 21:55):

Brendan Hansknecht (Dec 10 2023 at 21:57):

hmm, though maybe some extra complexities that will be a pain and handled differently depending on the OS. But I think zig would deal with those details and we would just have to deal with proper final linkage.

Isaac Van Doren (Dec 10 2023 at 23:17):

Couldn't you still use the current task-less approach if you didn't want a task?

Richard Feldman (Dec 10 2023 at 23:28):

Brendan Hansknecht (Dec 10 2023 at 23:38):

For this task version, I assume we are going to pick a fast in roc RNG? Do we need to pick a giant or slow one for extra statistical backing or or a small and fast but seems statistically good one fine?

Brendan Hansknecht (Dec 10 2023 at 23:39):

Richard Feldman (Dec 10 2023 at 23:47):

Brendan Hansknecht (Dec 11 2023 at 00:00):

I found the pcg author tends to be less belligerent, but that doesn't say much about the algorithms.

Brendan Hansknecht (Dec 11 2023 at 00:03):

I never use random number generators in a way that this bias matters, but to my understanding, pcg falls in a really nice balance of statically safe, fast, and small.

Brendan Hansknecht (Dec 11 2023 at 00:07):

Note: when reading about issues and debates on these generators, you have to be really careful about which exact version of each generator people are talking about. For example, I think the PCG ext variants are deprecated because they have issues (they were only ever added for more speed at the cost of quality I think), but people who talk against PCG tend to bring them up even though no one recommends them at all.

Kevin Gillette (Dec 11 2023 at 01:04):

Agreed that thread-local might get tricky, but it seems like it could be worth having some kind of opaque Roc context (or a pointer to such data) that is passed to the platform and which is expected for the platform to pass back. Among the data in that context could be a random seed

Kevin Gillette (Dec 11 2023 at 01:07):

It wouldn't be os-thread-local, but could correspond to the semantic flow control of the code.

Richard Feldman (Dec 11 2023 at 01:23):

thinking about it some more, it might be a bad idea to have Random.setSeed at all

Richard Feldman (Dec 11 2023 at 01:24):

because as previously noted, the determinism basically won't be reliable no matter what, because the platform can run Roc functions on different threads

Richard Feldman (Dec 11 2023 at 01:24):

Richard Feldman (Dec 11 2023 at 01:25):

so really it should just be Random.generate which generates you a random thing, and the seed it uses is entirely behind the scenes

Richard Feldman (Dec 11 2023 at 01:29):

at which point we could use a threadlocal or global with atomics (or even change from one to the other) and end users couldn't tell the difference

Brendan Hansknecht (Dec 11 2023 at 01:32):

I like that idea. Also, just an impl detail, but probably a lock if it is a global and not atomics due to the multitude of operations that must be done.

Brendan Hansknecht (Dec 11 2023 at 01:33):

I just imagine two rngs on different threads running at the same time on the atomic state.

Brendan Hansknecht (Dec 11 2023 at 01:33):

Sky Rose (Dec 11 2023 at 02:51):

An observation: PRNG as a Task seems a lot like Stored: It's stateful / an effect that has to be handled with a Task outside the application's pure Roc code, but it's not IO that has to go through the platform, so Roc can handle it itself. Maybe you could even implement it in terms of Stored by setting/getting the seed.

Sky Rose (Dec 11 2023 at 02:52):

That makes me wonder: how is Stored going to handle multiple threads / locking / determinism, and can this PRNG idea and Stored handle threading in the same way?

Brendan Hansknecht (Dec 11 2023 at 03:18):

That a good point. It would be trivial to use Stored to load a prng state. Something like:

generate = \{} ->
    state <- Stored.get prngStateKey |> Task.await
    (next, rand) = step state
    _ <- Stored.store prngStateKey next |> Task.await
    Task.ok rand

Brendan Hansknecht (Dec 11 2023 at 03:21):

With the current plans for Stored, I think it would be planned to be a Single writer multi reader protected global variable.

Brendan Hansknecht (Dec 11 2023 at 03:21):

Brendan Hansknecht (Dec 11 2023 at 03:31):

Also, cause each key require a compile time know unique type, it would be the case the each Stored key maps to exactly 1 global variable that is protected by a multi reader single writer lock of some sort, probably.

Brendan Hansknecht (Dec 11 2023 at 03:32):

Also, I just realized that my example above is broken. If you call generate at the same time in two different threads, they would both grab the same version of the state. Would need something like Stored.update that takes a lambda and holds the lock until the new value is saved.

Brendan Hansknecht (Dec 11 2023 at 03:35):

That would be really terrible perf for random number generation. Just multithreads constantly fighting over the same state. So for this either want a thread local using a custom Random.generate, or a Stored value per thread (not sure how that would be supported in practice)

Richard Feldman (Dec 11 2023 at 03:45):

another problem with Stored is that Stored has no way to be initialized to a random value

Richard Feldman (Dec 11 2023 at 03:46):

you'd still need to come up with a random initial seed, which makes it less ergonomic than Random.generate - which can immediately give you a pseudorandom value the first time you call it, and it should be different when you rerun the program thanks to ASLR

Richard Feldman (Dec 11 2023 at 03:46):

(but also we can make it be deterministic based on e.g. the hash of the source code when you run it in tests, at least in the future when we have tests that can run tasks!)

Richard Feldman (Dec 15 2023 at 13:19):

hm, I realized that if we have Random.generate but not Random.setSeed, then it's not possible to create deterministic simulation tests involving Random.generate

Richard Feldman (Dec 15 2023 at 13:23):

Brendan Hansknecht (Dec 15 2023 at 15:30):

Unless you also force them throughexpect-fx and are required to mock them. Maybe even record and save to mock. That should be fine

Kevin Gillette (Dec 15 2023 at 15:54):

I wasn't able to see anything about random in the builtins or cli/webserver docs. For reference, do we have a link to what we have today?

Brendan Hansknecht (Dec 15 2023 at 15:59):

Today there exists a roc-random package written fully in roc that implements pcg

Brendan Hansknecht (Dec 15 2023 at 16:00):

It would have to be seeded with the time or something cause there is no way to seed (with random data) in pure roc

Kevin Gillette (Dec 15 2023 at 16:01):

Brendan Hansknecht (Dec 15 2023 at 16:07):

Hey Richard, thinking about random more, I don't think we want to seed it the same way as we seed dictionaries. Depending on how the final linking is done with the host, the final binary may not be PIE. (Most likely would happen for any sort of future embedded work, but maybe someone consuming an object file would disable that). Also, not sure if wasm gets ASLR.

For dict, this isn't that big of a deal. Seed is random per compilation which is still a great defense from anyone who doesn't have a copy of your binary. With random on the other hand, your app would do the same thing every run in that case.

Kevin Gillette (Dec 15 2023 at 16:14):

I can see that it might be tricky to use with tasks unless given specific support to that effect, but can we also have a non-task approach for simpler cases (since tasks virally propagate through function signatures, iiuc).

It seems to me that it'd be sufficient for many cases to have a good quality "outer" PRNG that can be periodically asked to be reseeded via effects. That outer PRNG could just essentially provide seeds, via Tasks, to initialize "inner" PRNGs. Those inner PRNGs all have unique initial seeds, thus unique walks, and are the means to prevent any two invocations of the same function from observing the same values.

The inner PRNG would be passed into the functions that use it, and could either be used task-style or output-seed-passing-style.

Technically, an app function could return a seed for use elsewhere, but I believe it'll be more convenient and lead to better code to just create inner PRNGs at the deepest point in the stack in which tasks are already available, and then just pass those down and throw them away (i.e. consider them cheap).

For testing/reproducible debugging, the outer seed could be explicitly set (though ideally the platform would seed it by default).

Brendan Hansknecht (Dec 15 2023 at 16:24):

Yeah, I think the main need for the task api is due to the annoyance of threading PRNGs and Tasks at the same time.

Brendan Hansknecht (Dec 15 2023 at 16:25):

After that, I think either api is fine, but it is much nicer to contain tasks where possible and not leak them down the stack to an arbitrary function that needs a few random numbers

Kevin Gillette (Dec 15 2023 at 16:28):

If the non-task random functions return something like a (newPRNG, generatedValue) tuple, since there's already thought about shadowing sigils and such, we could have something like:

randIntFloat : PRNG -> (I64, F64)
randIntFloat = \prng ->
    int = randInt &prng
    float = randFloat &prng
    (int, float)

Where & (or whatever) means "declare a same-named, shadowed variable using the first tuple result of a twople, passing the second result on."

y = f &x |> g

# ^^ is sugar for:

(x, tmp) = f x
y = g tmp

Brendan Hansknecht (Dec 15 2023 at 16:33):

I think that is very specific and would be a hard sell as it definitely feels very magical, but it would help with threading. To be fair, I think shadowing alone fixes the threading issue.

(prng, x) = randInt prg

y <- SomeTask |> Task.await

(prng, z) = randFloat prg

That said, a major point of this discussion is that it might remove the need for shadowing due to making PRNG nice without it.

Kevin Gillette (Dec 15 2023 at 16:34):

Random per compilation doesn't seem that fantastic for stable/not-actively-developed projects, which does happen (COBOL is still in widespread use after all). Surely we could encourage platforms to opt-into providing some initial entropy, and use it with the compile-specific value via something like a fair-coin algorithm to get something pretty robust, even in less-trusted environments?

Brendan Hansknecht (Dec 15 2023 at 16:37):

Normally, it wouldn't be per compilation, only would be so in special edge cases where ASLR doesn't exist (very rare in the modern day)

Kevin Gillette (Dec 15 2023 at 16:37):

Agreed that shadowing with tuples isn't all that bad. The magic above would help more if pipelines are used to further manipulate or use the generated value (as in the example of passing to g). Certainly nothing we can't do without.

Kevin Gillette (Dec 15 2023 at 16:38):

There do seem to be cases I've read about where ASLR is available but intentionally disabled for whatever reason.

Kevin Gillette (Dec 15 2023 at 16:42):

Brendan Hansknecht (Dec 15 2023 at 16:43):

That said, I am pretty sure wasm doesn't have ASLR, maybe @Brian Carroll knows for sure. And that 100% would be a stopper for this form of seeding.

Brian Carroll (Dec 15 2023 at 16:43):

Kevin Gillette (Dec 15 2023 at 16:52):

Brendan Hansknecht (Dec 15 2023 at 16:53):

yeah, great talk. Points out just how random address space is per compilation and execution.

Brendan Hansknecht (Dec 15 2023 at 16:53):

But yeah, sounds like roc has no way to consitentally randomly seed Random.generate without depending on the platform or making some for of syscall from the builtins.

Richard Feldman (Dec 15 2023 at 17:25):

ok yeah overall seems like the Random.generate idea won't work out after all :big_smile:

Richard Feldman (Dec 15 2023 at 17:26):

which in turn will mean it will be more important to see how the ergonomics of seed threading go in the shadowing experiment!

Brendan Hansknecht (Dec 15 2023 at 17:30):

Richard Feldman (Dec 15 2023 at 17:38):

Richard Feldman (Dec 15 2023 at 17:39):

because Random wouldn't have a module param, and really shouldn't just for a convenience function

Richard Feldman (Dec 15 2023 at 17:39):

Richard Feldman (Dec 15 2023 at 17:40):

I realized that instead of using Stored for simulating tasks, we could have expect.sim (or whatever it ends up being called take a lambda where the first argument is a { read, write }` record

Richard Feldman (Dec 15 2023 at 17:41):

and then read and write are functions that use tasks to let you read and write to the state type of your choice, but only within the confines of that test

Richard Feldman (Dec 15 2023 at 17:42):

so basically serving the same purpose as Stored except just for the test (which was the main motivation for Stored in the first place)

Richard Feldman (Dec 15 2023 at 17:43):

and we could do something similar for tests: add another function to that record, so it's expect.sim \{ read, write, setSeed } -> ...

Stream: ideas

Topic: ✔ randomness in Task

Richard Feldman (Dec 10 2023 at 21:20):

Richard Feldman (Dec 10 2023 at 21:21):

Richard Feldman (Dec 10 2023 at 21:22):

Richard Feldman (Dec 10 2023 at 21:22):

Richard Feldman (Dec 10 2023 at 21:24):

Richard Feldman (Dec 10 2023 at 21:26):

Brendan Hansknecht (Dec 10 2023 at 21:30):

Brendan Hansknecht (Dec 10 2023 at 21:31):

Brendan Hansknecht (Dec 10 2023 at 21:31):

Brendan Hansknecht (Dec 10 2023 at 21:31):

Richard Feldman (Dec 10 2023 at 21:32):

Richard Feldman (Dec 10 2023 at 21:32):

Richard Feldman (Dec 10 2023 at 21:32):

Brendan Hansknecht (Dec 10 2023 at 21:33):

Brendan Hansknecht (Dec 10 2023 at 21:34):

Richard Feldman (Dec 10 2023 at 21:34):

Brendan Hansknecht (Dec 10 2023 at 21:34):

Richard Feldman (Dec 10 2023 at 21:34):

Brendan Hansknecht (Dec 10 2023 at 21:36):

Richard Feldman (Dec 10 2023 at 21:36):

Richard Feldman (Dec 10 2023 at 21:36):

Brendan Hansknecht (Dec 10 2023 at 21:37):

Richard Feldman (Dec 10 2023 at 21:39):

Richard Feldman (Dec 10 2023 at 21:40):

Brendan Hansknecht (Dec 10 2023 at 21:54):

Brendan Hansknecht (Dec 10 2023 at 21:55):

Brendan Hansknecht (Dec 10 2023 at 21:55):

Brendan Hansknecht (Dec 10 2023 at 21:55):

Brendan Hansknecht (Dec 10 2023 at 21:57):

Isaac Van Doren (Dec 10 2023 at 23:17):

Isaac Van Doren (Dec 10 2023 at 23:17):

Richard Feldman (Dec 10 2023 at 23:28):

Brendan Hansknecht (Dec 10 2023 at 23:38):

Brendan Hansknecht (Dec 10 2023 at 23:39):

Brendan Hansknecht (Dec 10 2023 at 23:39):

Richard Feldman (Dec 10 2023 at 23:47):

Richard Feldman (Dec 10 2023 at 23:47):

Richard Feldman (Dec 10 2023 at 23:47):

Brendan Hansknecht (Dec 11 2023 at 00:00):

Brendan Hansknecht (Dec 11 2023 at 00:03):

Brendan Hansknecht (Dec 11 2023 at 00:07):

Kevin Gillette (Dec 11 2023 at 01:04):

Kevin Gillette (Dec 11 2023 at 01:07):

Richard Feldman (Dec 11 2023 at 01:23):

Richard Feldman (Dec 11 2023 at 01:24):

Richard Feldman (Dec 11 2023 at 01:24):

Richard Feldman (Dec 11 2023 at 01:25):

Richard Feldman (Dec 11 2023 at 01:29):

Brendan Hansknecht (Dec 11 2023 at 01:32):

Brendan Hansknecht (Dec 11 2023 at 01:33):

Brendan Hansknecht (Dec 11 2023 at 01:33):

Sky Rose (Dec 11 2023 at 02:51):

Sky Rose (Dec 11 2023 at 02:52):

Brendan Hansknecht (Dec 11 2023 at 03:18):

Brendan Hansknecht (Dec 11 2023 at 03:21):

Brendan Hansknecht (Dec 11 2023 at 03:21):

Brendan Hansknecht (Dec 11 2023 at 03:31):

Brendan Hansknecht (Dec 11 2023 at 03:32):

Brendan Hansknecht (Dec 11 2023 at 03:35):

Richard Feldman (Dec 11 2023 at 03:45):

Richard Feldman (Dec 11 2023 at 03:46):

Richard Feldman (Dec 11 2023 at 03:46):

Richard Feldman (Dec 15 2023 at 13:19):

Richard Feldman (Dec 15 2023 at 13:23):

Brendan Hansknecht (Dec 15 2023 at 15:30):

Kevin Gillette (Dec 15 2023 at 15:54):

Brendan Hansknecht (Dec 15 2023 at 15:59):

Brendan Hansknecht (Dec 15 2023 at 16:00):

Kevin Gillette (Dec 15 2023 at 16:01):

Brendan Hansknecht (Dec 15 2023 at 16:07):

Kevin Gillette (Dec 15 2023 at 16:14):

Brendan Hansknecht (Dec 15 2023 at 16:24):

Brendan Hansknecht (Dec 15 2023 at 16:25):

Kevin Gillette (Dec 15 2023 at 16:28):

Brendan Hansknecht (Dec 15 2023 at 16:33):

Kevin Gillette (Dec 15 2023 at 16:34):

Brendan Hansknecht (Dec 15 2023 at 16:37):

Kevin Gillette (Dec 15 2023 at 16:37):