Stream: ideas

Topic: ✔ randomness in Task


view this post on Zulip Richard Feldman (Dec 10 2023 at 21:20):

one of the most frequent examples of shadowing that comes up is threading around seeds for random numbers. Threading the seeds can be avoided by using a wrapper (e.g. Random.Generator) but this doesn't work very well in conjunction with Task.

something we haven't talked about (that I can remember) is the idea of making Random a builtin, and giving incorporating it into Task, e.g.

Random.generate : Generator a -> Task a *
Random.setSeed : U64 -> Task {} *

we could set an initial seed using the trick @Brendan Hansknecht came up with for Dict's hash flooding defense: use a pointer into the binary and ASLR

view this post on Zulip Richard Feldman (Dec 10 2023 at 21:21):

this would also probably be easier to use for new users, because it's pretty easy to understand what this is doing:

randomI64 <- Random.generate Random.i64 |> Task.await

view this post on Zulip Richard Feldman (Dec 10 2023 at 21:22):

and with setSeed you could also make it deterministic for tests or reproducing (as long as you weren't using it in a multithreaded context)

view this post on Zulip Richard Feldman (Dec 10 2023 at 21:22):

if you needed it to be deterministic given a particular seed across multiple threads, then you'd be back in today's status quo world of tradeoffs, but that seems like it would apply to a very small percentage of use cases

view this post on Zulip Richard Feldman (Dec 10 2023 at 21:24):

the perf would be slightly worse than doing it today's way because Random.generate and Random.setSeed would need to do atomic loads and stores on the global seed

view this post on Zulip Richard Feldman (Dec 10 2023 at 21:26):

assuming we did this, it seems like it could noticeably reduce demand for shadowing in application code, although not necessarily in library code

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 21:30):

Thread local instead of global seed?

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 21:31):

But yeah, would work nice for cases of only needing 1 RNG (which is most cases).

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 21:31):

Though means RNG always has to pull in tasks, which is a tradeoff.

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 21:31):

I like to keep tasks in as limited of places as possible.

view this post on Zulip Richard Feldman (Dec 10 2023 at 21:32):

Brendan Hansknecht said:

Thread local instead of global seed?

possible, although I think it could get weird because platforms can put Roc callbacks on different threads

view this post on Zulip Richard Feldman (Dec 10 2023 at 21:32):

so you might get some really surprising behavior around that depending on the platform, especially if libraries did it

view this post on Zulip Richard Feldman (Dec 10 2023 at 21:32):

e.g. one time in Elm I used task-based randomness in a library to put a UUID on a http request

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 21:33):

Ah. Fair

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 21:34):

Though even with global seed, order is not guaranteed

view this post on Zulip Richard Feldman (Dec 10 2023 at 21:34):

also I'm not sure if compiling to threadlocals might affect Roc's portability; right now we don't explicitly depend on OS libraries or libc things from a design perspective (I believe we either can or already do compile down to just CPU instructions with no dependencies other than roc_* functions in the host), but I think threadlocals might require some integration there (I'm not sure)

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 21:34):

So I don't think it would make a difference from user persepctive

view this post on Zulip Richard Feldman (Dec 10 2023 at 21:34):

true

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 21:36):

I think thread local is part of the binary. Not a system call. Just 1 memory location per thread. But I don't recall fully. Would need to double check/dig in. I am pretty sure it is dealt with automatically by the loader though.

view this post on Zulip Richard Feldman (Dec 10 2023 at 21:36):

yeah I just think thread storage is an OS concept that the CPU doesn't know about

view this post on Zulip Richard Feldman (Dec 10 2023 at 21:36):

I could be wrong!

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 21:37):

Yeah, os concept, but dealt with by the loader when the OS makes the thread. So roc doesn't have to do anything special just mark the memory area as thread local. That at least was my understanding. Basically a flag in the binary for a specific memory region

view this post on Zulip Richard Feldman (Dec 10 2023 at 21:39):

interesting - I wonder if that's a thing in embedded systems

view this post on Zulip Richard Feldman (Dec 10 2023 at 21:40):

although I guess if it's a builtin, we could compile it to something different depending on target OS

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 21:54):

https://maskray.me/blog/2021-02-14-all-about-thread-local-storage

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 21:55):

This seems to have some good detail on how it works.

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 21:55):

We would need to update the surgical linker to deal with the thread local storage sections, but otherwise should just work I think.

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 21:55):

So it is an option.

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 21:57):

hmm, though maybe some extra complexities that will be a pain and handled differently depending on the OS. But I think zig would deal with those details and we would just have to deal with proper final linkage.

view this post on Zulip Isaac Van Doren (Dec 10 2023 at 23:17):

I think this is a great idea

view this post on Zulip Isaac Van Doren (Dec 10 2023 at 23:17):

Though means RNG always has to pull in tasks, which is a tradeoff.

Couldn't you still use the current task-less approach if you didn't want a task?

view this post on Zulip Richard Feldman (Dec 10 2023 at 23:28):

yeah for sure

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 23:38):

For this task version, I assume we are going to pick a fast in roc RNG? Do we need to pick a giant or slow one for extra statistical backing or or a small and fast but seems statistically good one fine?

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 23:39):

Like the traditional option is the mersenne twister

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 23:39):

But it is like 4kb of state and only medium speed.

view this post on Zulip Richard Feldman (Dec 10 2023 at 23:47):

definitely prefer small and fast

view this post on Zulip Richard Feldman (Dec 10 2023 at 23:47):

like pcg or xoshiro I've heard are both good

view this post on Zulip Richard Feldman (Dec 10 2023 at 23:47):

I'm not sure the exact tradeoffs though

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 00:00):

I found the pcg author tends to be less belligerent, but that doesn't say much about the algorithms.

To my understanding, xoshiro:

  1. has a few bits that are less random due to its core being a linear shift register (even if a super well masked large period one).
  2. due to 1, fails some of the really hardcore random number generation tests, but that generally takes 100s of GB or even terrabytes of data generation.
  3. Is easier to predict if you know someone is using it and get access to a few random numbers
  4. Is faster than pcg by a bit

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 00:03):

I never use random number generators in a way that this bias matters, but to my understanding, pcg falls in a really nice balance of statically safe, fast, and small.

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 00:07):

Note: when reading about issues and debates on these generators, you have to be really careful about which exact version of each generator people are talking about. For example, I think the PCG ext variants are deprecated because they have issues (they were only ever added for more speed at the cost of quality I think), but people who talk against PCG tend to bring them up even though no one recommends them at all.

view this post on Zulip Kevin Gillette (Dec 11 2023 at 01:04):

Richard Feldman said:

Brendan Hansknecht said:

Thread local instead of global seed?

possible, although I think it could get weird because platforms can put Roc callbacks on different threads

Agreed that thread-local might get tricky, but it seems like it could be worth having some kind of opaque Roc context (or a pointer to such data) that is passed to the platform and which is expected for the platform to pass back. Among the data in that context could be a random seed

view this post on Zulip Kevin Gillette (Dec 11 2023 at 01:07):

It wouldn't be os-thread-local, but could correspond to the semantic flow control of the code.

view this post on Zulip Richard Feldman (Dec 11 2023 at 01:23):

thinking about it some more, it might be a bad idea to have Random.setSeed at all

view this post on Zulip Richard Feldman (Dec 11 2023 at 01:24):

because as previously noted, the determinism basically won't be reliable no matter what, because the platform can run Roc functions on different threads

view this post on Zulip Richard Feldman (Dec 11 2023 at 01:24):

and they might change that in a patch release

view this post on Zulip Richard Feldman (Dec 11 2023 at 01:25):

so really it should just be Random.generate which generates you a random thing, and the seed it uses is entirely behind the scenes

view this post on Zulip Richard Feldman (Dec 11 2023 at 01:29):

at which point we could use a threadlocal or global with atomics (or even change from one to the other) and end users couldn't tell the difference

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 01:32):

I like that idea. Also, just an impl detail, but probably a lock if it is a global and not atomics due to the multitude of operations that must be done.

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 01:33):

I just imagine two rngs on different threads running at the same time on the atomic state.

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 01:33):

Probably wouldn't work right

view this post on Zulip Sky Rose (Dec 11 2023 at 02:51):

An observation: PRNG as a Task seems a lot like Stored: It's stateful / an effect that has to be handled with a Task outside the application's pure Roc code, but it's not IO that has to go through the platform, so Roc can handle it itself. Maybe you could even implement it in terms of Stored by setting/getting the seed.

view this post on Zulip Sky Rose (Dec 11 2023 at 02:52):

That makes me wonder: how is Stored going to handle multiple threads / locking / determinism, and can this PRNG idea and Stored handle threading in the same way?

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 03:18):

That a good point. It would be trivial to use Stored to load a prng state. Something like:

generate = \{} ->
    state <- Stored.get prngStateKey |> Task.await
    (next, rand) = step state
    _ <- Stored.store prngStateKey next |> Task.await
    Task.ok rand

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 03:21):

With the current plans for Stored, I think it would be planned to be a Single writer multi reader protected global variable.

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 03:21):

Or potentially map to such variables.

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 03:31):

context on stored is in #ideas > Stored ability.

Also, cause each key require a compile time know unique type, it would be the case the each Stored key maps to exactly 1 global variable that is protected by a multi reader single writer lock of some sort, probably.

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 03:32):

Also, I just realized that my example above is broken. If you call generate at the same time in two different threads, they would both grab the same version of the state. Would need something like Stored.update that takes a lambda and holds the lock until the new value is saved.

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 03:35):

That would be really terrible perf for random number generation. Just multithreads constantly fighting over the same state. So for this either want a thread local using a custom Random.generate, or a Stored value per thread (not sure how that would be supported in practice)

view this post on Zulip Richard Feldman (Dec 11 2023 at 03:45):

another problem with Stored is that Stored has no way to be initialized to a random value

view this post on Zulip Richard Feldman (Dec 11 2023 at 03:46):

you'd still need to come up with a random initial seed, which makes it less ergonomic than Random.generate - which can immediately give you a pseudorandom value the first time you call it, and it should be different when you rerun the program thanks to ASLR

view this post on Zulip Richard Feldman (Dec 11 2023 at 03:46):

(but also we can make it be deterministic based on e.g. the hash of the source code when you run it in tests, at least in the future when we have tests that can run tasks!)

view this post on Zulip Richard Feldman (Dec 15 2023 at 13:19):

hm, I realized that if we have Random.generate but not Random.setSeed, then it's not possible to create deterministic simulation tests involving Random.generate

view this post on Zulip Richard Feldman (Dec 15 2023 at 13:23):

that's a pretty big thing to give up

view this post on Zulip Brendan Hansknecht (Dec 15 2023 at 15:30):

Unless you also force them throughexpect-fx and are required to mock them. Maybe even record and save to mock. That should be fine

view this post on Zulip Kevin Gillette (Dec 15 2023 at 15:54):

I wasn't able to see anything about random in the builtins or cli/webserver docs. For reference, do we have a link to what we have today?

view this post on Zulip Brendan Hansknecht (Dec 15 2023 at 15:59):

Today there exists a roc-random package written fully in roc that implements pcg

view this post on Zulip Brendan Hansknecht (Dec 15 2023 at 16:00):

It would have to be seeded with the time or something cause there is no way to seed (with random data) in pure roc

view this post on Zulip Kevin Gillette (Dec 15 2023 at 16:01):

Ah, thanks

view this post on Zulip Brendan Hansknecht (Dec 15 2023 at 16:07):

Hey Richard, thinking about random more, I don't think we want to seed it the same way as we seed dictionaries. Depending on how the final linking is done with the host, the final binary may not be PIE. (Most likely would happen for any sort of future embedded work, but maybe someone consuming an object file would disable that). Also, not sure if wasm gets ASLR.

For dict, this isn't that big of a deal. Seed is random per compilation which is still a great defense from anyone who doesn't have a copy of your binary. With random on the other hand, your app would do the same thing every run in that case.

view this post on Zulip Kevin Gillette (Dec 15 2023 at 16:14):

I can see that it might be tricky to use with tasks unless given specific support to that effect, but can we also have a non-task approach for simpler cases (since tasks virally propagate through function signatures, iiuc).

It seems to me that it'd be sufficient for many cases to have a good quality "outer" PRNG that can be periodically asked to be reseeded via effects. That outer PRNG could just essentially provide seeds, via Tasks, to initialize "inner" PRNGs. Those inner PRNGs all have unique initial seeds, thus unique walks, and are the means to prevent any two invocations of the same function from observing the same values.

The inner PRNG would be passed into the functions that use it, and could either be used task-style or output-seed-passing-style.

Technically, an app function could return a seed for use elsewhere, but I believe it'll be more convenient and lead to better code to just create inner PRNGs at the deepest point in the stack in which tasks are already available, and then just pass those down and throw them away (i.e. consider them cheap).

For testing/reproducible debugging, the outer seed could be explicitly set (though ideally the platform would seed it by default).

view this post on Zulip Brendan Hansknecht (Dec 15 2023 at 16:24):

Yeah, I think the main need for the task api is due to the annoyance of threading PRNGs and Tasks at the same time.

view this post on Zulip Brendan Hansknecht (Dec 15 2023 at 16:25):

After that, I think either api is fine, but it is much nicer to contain tasks where possible and not leak them down the stack to an arbitrary function that needs a few random numbers

view this post on Zulip Kevin Gillette (Dec 15 2023 at 16:28):

If the non-task random functions return something like a (newPRNG, generatedValue) tuple, since there's already thought about shadowing sigils and such, we could have something like:

randIntFloat : PRNG -> (I64, F64)
randIntFloat = \prng ->
    int = randInt &prng
    float = randFloat &prng
    (int, float)

Where & (or whatever) means "declare a same-named, shadowed variable using the first tuple result of a twople, passing the second result on."

y = f &x |> g

# ^^ is sugar for:

(x, tmp) = f x
y = g tmp

view this post on Zulip Brendan Hansknecht (Dec 15 2023 at 16:33):

I think that is very specific and would be a hard sell as it definitely feels very magical, but it would help with threading. To be fair, I think shadowing alone fixes the threading issue.

(prng, x) = randInt prg

y <- SomeTask |> Task.await

(prng, z) = randFloat prg

That said, a major point of this discussion is that it might remove the need for shadowing due to making PRNG nice without it.

view this post on Zulip Kevin Gillette (Dec 15 2023 at 16:34):

Random per compilation doesn't seem that fantastic for stable/not-actively-developed projects, which does happen (COBOL is still in widespread use after all). Surely we could encourage platforms to opt-into providing some initial entropy, and use it with the compile-specific value via something like a fair-coin algorithm to get something pretty robust, even in less-trusted environments?

view this post on Zulip Brendan Hansknecht (Dec 15 2023 at 16:37):

Normally, it wouldn't be per compilation, only would be so in special edge cases where ASLR doesn't exist (very rare in the modern day)

view this post on Zulip Kevin Gillette (Dec 15 2023 at 16:37):

Agreed that shadowing with tuples isn't all that bad. The magic above would help more if pipelines are used to further manipulate or use the generated value (as in the example of passing to g). Certainly nothing we can't do without.

view this post on Zulip Kevin Gillette (Dec 15 2023 at 16:38):

There do seem to be cases I've read about where ASLR is available but intentionally disabled for whatever reason.

view this post on Zulip Kevin Gillette (Dec 15 2023 at 16:42):

https://vstinner.github.io/journey-to-stable-benchmark-average.html (oftentimes done for misguided reasons). I can't recall the other examples, but a few seemed legitimate in isolated cases.

view this post on Zulip Brendan Hansknecht (Dec 15 2023 at 16:43):

That said, I am pretty sure wasm doesn't have ASLR, maybe @Brian Carroll knows for sure. And that 100% would be a stopper for this form of seeding.

view this post on Zulip Brian Carroll (Dec 15 2023 at 16:43):

Nope it doesn't

view this post on Zulip Kevin Gillette (Dec 15 2023 at 16:52):

https://youtu.be/r-TLSBdHe1A?si=myy2EzHafDhf8-gg is an example of manipulating address layout to get reliable/meaningful benchmark results. it was an interesting watch.

view this post on Zulip Brendan Hansknecht (Dec 15 2023 at 16:53):

yeah, great talk. Points out just how random address space is per compilation and execution.

view this post on Zulip Brendan Hansknecht (Dec 15 2023 at 16:53):

But yeah, sounds like roc has no way to consitentally randomly seed Random.generate without depending on the platform or making some for of syscall from the builtins.

view this post on Zulip Richard Feldman (Dec 15 2023 at 17:25):

ok yeah overall seems like the Random.generate idea won't work out after all :big_smile:

view this post on Zulip Richard Feldman (Dec 15 2023 at 17:26):

which in turn will mean it will be more important to see how the ergonomics of seed threading go in the shadowing experiment!

view this post on Zulip Brendan Hansknecht (Dec 15 2023 at 17:30):

I mean it would work if you add a roc_random_seed to the every platform

view this post on Zulip Richard Feldman (Dec 15 2023 at 17:38):

true

view this post on Zulip Richard Feldman (Dec 15 2023 at 17:38):

although we'd still need a way to actually mock it in tests

view this post on Zulip Richard Feldman (Dec 15 2023 at 17:39):

because Random wouldn't have a module param, and really shouldn't just for a convenience function

view this post on Zulip Richard Feldman (Dec 15 2023 at 17:39):

so the usual strategy wouldn't work

view this post on Zulip Richard Feldman (Dec 15 2023 at 17:39):

I can think of a way to do it though

view this post on Zulip Richard Feldman (Dec 15 2023 at 17:40):

I realized that instead of using Stored for simulating tasks, we could have expect.sim (or whatever it ends up being called take a lambda where the first argument is a { read, write }` record

view this post on Zulip Richard Feldman (Dec 15 2023 at 17:41):

so you'd say like expect.sim \{ read, write } -> ...

view this post on Zulip Richard Feldman (Dec 15 2023 at 17:41):

and then read and write are functions that use tasks to let you read and write to the state type of your choice, but only within the confines of that test

view this post on Zulip Richard Feldman (Dec 15 2023 at 17:42):

so basically serving the same purpose as Stored except just for the test (which was the main motivation for Stored in the first place)

view this post on Zulip Richard Feldman (Dec 15 2023 at 17:43):

and we could do something similar for tests: add another function to that record, so it's expect.sim \{ read, write, setSeed } -> ...

view this post on Zulip Richard Feldman (Dec 15 2023 at 17:43):

and of course you could leave off any of those you didn't want to use

view this post on Zulip Notification Bot (Jan 04 2024 at 16:51):

Richard Feldman has marked this topic as resolved.


Last updated: Jun 16 2026 at 16:19 UTC