`Stored` ability · ideas · Zulip Chat Archive

roc-pg could use Stored for auto-prepared statements, a feature drivers in other languages offer

Brendan Hansknecht (Jun 03 2023 at 19:32):

I think Stored is a bad idea. I fundamentally think it is something that will be badly abused and make many libraries much less trustworthy than they should be.

Like I totally get the convenience of it, but I think it will be a huge blow for purity and understandability of roc code, especially without the 3 arg Task. It will lead to a huge swath of functions that could be pure instead returning a Task. It will not be clear if those functions are doing everything under the sun or simply storing a value.

Let's look at random numbers as a concrete idea. Currently, we can make a nice chain-able random number api (still more explorations of a final api should be done, but what is below isn't too bad). Even if we were to force someone to store or pass around the state, what the state is, its initial value, and if it ever gets reused is very clear. Besides requesting the initial seed, random numbers can be represented in completely pure roc code. So there are no Tasks anywhere else in the api. That is an extremely good thing.

state <- Random.seed |> Task.await # this is the only task required
int = Random.int 0 100
x = int state
y = x |> Random.next int

x.value + y.value # the sum of two random numbers from 0 to 100

{} <- Random.seed |> Task.await
int = Random.int 0 100 # This is a task.
x <- int |> Task.await
y <- int |> Task.await

x + y

Cool, these apis get the same thing done and look about the same. The first is probably less convenient because you have a state variable that is passed around. So, why do I think the second one is way worse? The loss of purity through proliferation of Task is a huge security hole.
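To make the contrast concrete outside of Roc, here is a minimal Python sketch of the same two shapes. It is only an illustration: the names `rand_int`, `Gen`, and `HiddenRandom` and the generator constants are hypothetical, not part of any Roc or Random API. The first form threads the state explicitly, the second hides it behind the API the way a Stored-backed library would.

```python
from typing import NamedTuple


class Gen(NamedTuple):
    """Result of one generator step: the value drawn and the next state."""
    value: int
    state: int


# Hypothetical linear congruential constants, chosen only for illustration.
_A, _C, _M = 1103515245, 12345, 2**31


def rand_int(lo: int, hi: int, state: int) -> Gen:
    """Pure: the same state always yields the same value and next state."""
    nxt = (_A * state + _C) % _M
    return Gen(lo + nxt % (hi - lo + 1), nxt)


class HiddenRandom:
    """Stateful: the generator state lives behind the API, out of the caller's sight."""

    def __init__(self, seed: int) -> None:
        self._state = seed

    def rand_int(self, lo: int, hi: int) -> int:
        value, self._state = rand_int(lo, hi, self._state)
        return value


# Explicit threading: every use of the state is visible at the call site.
x = rand_int(0, 100, 42)
y = rand_int(0, 100, x.state)

# Hidden state: same numbers, but nothing in the calls shows what changed.
hidden = HiddenRandom(42)
hx, hy = hidden.rand_int(0, 100), hidden.rand_int(0, 100)
```

Both forms produce the same sequence from the same seed. The difference is only in whether the caller can see where the state goes.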

Each of these functions is now another potential security hole that I need to audit. What do they do? Can they make network calls? Will someone later add code that spies on me to those functions? Who knows, they are just Task. Task is important to roc and will be required in libraries, but fundamentally, we don't want it everywhere. I think that Stored will lead to Task being everywhere.

On top of that, what if I want multiple random number generators? What if I use a package, but also one of my subpackages uses it as well? Will our stored values/secrets clash? How is that managed?

I think we need to put a lot more design work into keeping things pure and monadic before considering Stored. If we do add Stored I think it should be a lot better defined. I also think that likely it should be used with 3 argument Task. I think it will lead to Task being proliferated so much that it will be very important to have the type system verify exactly what effects a function call can use.

My gut feeling is that Stored should be a clunky api only provided by some platforms unless we commit to 3 arg Task. I don't think this is something we want to promote the spread of.

Fábio Beirão (Jun 03 2023 at 23:59):

:thinking: I am way out of my depth here, but I would like to share my 2 cents. I agree with Brendan, that it would be a "wtf" if a library could have such "tentacles" from within to reach out to global state.

I am thinking that Elm has this kind of "key" system for Url (the user receives it on application.init)

I think that the ergonomics of libraries being able to state "hey, I would like to read and store data such and such" in a strong typed way is paramount.

I think ultimately the user (developer) should be in control. Any library function that would need to interact with the Stored api would need to carry with it this key, for both read and write.

To me this means it becomes clear to the user "ah, there's Stored operations going on in this Task". Also, I am thinking that when the user is somehow declaring these keys, this is a place where we can "namespace" the keys to prevent collisions.
Once again this is kind of a stream of thought ramble, but it could be interesting to make it even explicit at the manifest level.

app "hello-world"
  packages { pf: "..." }
  imports []
  provides [main] to pf
  stores [ pg_username, pg_password, aws_api_key, openai_apikey ] # <- this would mean that on `main` we would receive a record like { pg_username : Stored.Key, pg_password: Stored.Key, etc... }

main : { pg_username : Stored.Key, .... } -> Task ....
main = \keys ->
    # the user could now invoke library functions, passing each library the corresponding key

Brendan Hansknecht (Jun 04 2023 at 00:15):

A note as I think about this more. I honestly think we should reconsider 3 arg task. I think 3 arg task will be wanted once we have a real package ecosystem. I think it is only inconvenient now because we mostly have people working on very small apps where it is something to learn.

Brendan Hansknecht (Jun 04 2023 at 00:16):

Knowing exactly what effect types a task can produce is quite important in many library trust related situations.

Georges Boris (Jun 04 2023 at 00:40):

@Brendan Hansknecht do you think 3 arg tasks would improve your perception of Stored? (since we could potentially see what is being stored? unless I got it wrong)

I don't really see how having a common storage effect in all applications would necessarily break purity. wouldn't it be the same as something provided by a platform?

Brendan Hansknecht (Jun 04 2023 at 01:23):

I think the real issue isn't technically purity, the real issue is the pollution of task (especially without knowing what the tasks do). It basically leads to programming in what is equivalent to a totally impure language.

I think 3 arg task helps because you can see that the random library only uses the store random seed effect and no other effects. Without that, it literally is a totally opaque impure task

Brendan Hansknecht (Jun 04 2023 at 01:27):

But i guess this is also technically the largest argument against tasks in libraries in general, but i think fully restricting them would be too draconian and greatly limit the package ecosystem.

Brendan Hansknecht (Jun 04 2023 at 01:28):

I think Stored is just such a large convenience function that it can be extremely problematic. Not only would way more code return tasks, but suddenly tons of code is essentially buckets around mutable state. Instead i think it would be better for a web platform just to provide stored for tokens or something more limited. That would help it not get abused as a mutability escape hatch all over the place

Agus Zubiaga (Jun 04 2023 at 03:43):

The problem with monads is that without HKTs they get more unwieldy the more you combine them.

Luke Boswell (Jun 04 2023 at 04:54):

@Brendan Hansknecht I don't really understand what a three arg task is, I assume something like Task ok err op based on previous discussion. Is the basic idea that each of the Ops or effects are enumerated for the task? I assume this is less ergonomic than the 2-arg task, is that the main tradeoff?

Brendan Hansknecht (Jun 04 2023 at 04:56):

# two arg
# No transparency around what it does. could call any tasks.
printRandomNumber: Task {} []

# three arg
# Can only read random integers and print to stdout
printRandomNumber : Task {} [] [Random.int _, Stdout.line _]

# If you don't care, does not have to be more verbose really
# still three arg, but letting the compiler fill in the op
printRandomNumber : Task {} [] _

# Could even just make an alias in your app if you really don't care
# I think that would work, but I may be wrong.
MyTask ok err : Task ok err _

Important note, even if a library author writes their code always using _ as the third arg, the doc generated type will still fill in the exact args and it can be enforced by the user code.

Main drawback we commented on before was that it is more confusing to teach new users and more verbose. I think that mostly can be alleviated by starting by teaching with it always being _ and saying it is not important yet.

Luke Boswell (Jun 04 2023 at 05:06):

Thank you for that explanation. Is there a difference between Tasks that are composed of other Tasks, and those provided by the platform?

Luke Boswell (Jun 04 2023 at 05:07):

Brendan Hansknecht (Jun 04 2023 at 05:22):

No difference. All tasks would have 3 args. The raw platform tasks would just be directly defined:

# in Stdout.roc in the platform
line : Str -> Task {} [] [Stdout.line _]
line = ...

Brendan Hansknecht (Jun 04 2023 at 05:23):

This is the primitive information needed such that any task chain that calls Stdout.line will be required to specify that it uses the Stdout.line op.

Brendan Hansknecht (Jun 04 2023 at 05:23):

Note: exact naming will be slightly different, but fundamentally it will be the same thing. Like the actual op may be Stdout (Line Str) or similar. This means an end user could write any of _, Stdout _, Stdout (Line _), or Stdout (Line Str), depending on how specific they want to be.

Georges Boris (Jun 04 2023 at 11:20):

but what would op mean in the context of Tasks in packages that are not tied to one specific platform? the packages would either need to use a type variable or define what platforms they are meant to work with, right?

Georges Boris (Jun 04 2023 at 11:20):

Georges Boris (Jun 04 2023 at 11:22):

and tbh when I first read about Stored I loved it because it did seem like a giant bucket of possibilities - so I agree with you that having it will probably be a mistake because people will definitely over use it :sweat_smile:

Georges Boris (Jun 04 2023 at 11:23):

maybe having 3 arg would already be enough for solving the same problem? since then we could have something used for aws tokens for instance using a stored-like effect. we would just need a platform that supports it

Luke Boswell (Jun 04 2023 at 11:35):

@Fábio Beirão I think your concern about "tentacles" is answered by the following if I understand correctly?

Luke Boswell (Jun 04 2023 at 11:38):

So I guess this means that a package can only touch state that is internal; i.e. from inside the module that declares an opaque type which implements the Stored ability.

Fábio Beirão (Jun 04 2023 at 12:18):

What I mean by tentacles, might be a bit like the (very unfamiliar to me!) IO monad in Haskell. As in, I would expect that if a function from a platform wants to "do things", its API signature would make it clear to the user (from an input point of view).
To be honest I don't really have a crystalized opinion on this topic, since I haven't yet gotten to experiment deeper with Effects/Tasks, to see how I feel about them.

Brendan Hansknecht (Jun 04 2023 at 14:45):

I believe it. I wish it wasn't the case, but based on what i have seen, seems like a totally plausible conclusion.

Richard Feldman (Jun 04 2023 at 15:30):

Richard Feldman (Jun 04 2023 at 16:12):

it unlocks similar things to Haskell's IORef, which I wouldn't say is overused

Richard Feldman (Jun 04 2023 at 16:14):

there's another way achieve this with 2-arg Task - but that's the thing I haven't finished writing up yet :sweat_smile:

Richard Feldman (Jun 04 2023 at 16:16):

maybe these proposals are more coupled than I thought, although I didn't want to put everything in one huge proposal either haha

Brendan Hansknecht (Jun 04 2023 at 16:42):

So I guess Stored is IORef, but with implicit keys? So instead of passing around a reference value and needing to store that, Stored hides away all of that. So with IORef, you would know that it is used because it would return the reference. With Stored you have no way to tell.

Brendan Hansknecht (Jun 04 2023 at 16:46):

With IORef, I would make a guess that needing to return and store the reference gets rid of most of the benefits. So you would still hit a lot of the same issues we hit with direct monads. That would mean that it isn't any more convenient to use really. Stored on the other hand is super convenient to slip in anywhere. Just make a new opaque type and init function.

Note: total speculation, have never used IORef in haskell, but it looks like it still has the issue of needing to pass around a state variable. The state variable just happens to be a reference.

Bryce Miller (Jun 04 2023 at 18:57):

How would Stored.read and Stored.write differ fundamentally from, for example, a task that sends an HTTP request that results in a state mutation on a server? The state lives in memory instead of a remote server, and you don't have to serialize or deserialize the data. But they are all Tasks that represent a state mutation somewhere. Am I missing something?

Bryce Miller (Jun 04 2023 at 19:02):

Also, Stored.read and Stored.write seem meaningfully less convenient than mutations in, say, any imperative language. If someone were going to use Stored values everywhere, wouldn't there be a strong incentive for them to just use a different language instead?

Brian Carroll (Jun 04 2023 at 19:53):

They don't differ fundamentally from HTTP, I think that's the point! Normally a functional program mostly consists of pure functions, and only a few effectful functions at the "edges" return tasks. I think the concern is that this feature makes it easy to make effectful stuff in the "middle" of your program. Then it's easier to write Roc in a style where you just mutate lots of stuff and everything returns tasks. Functional programming is meant to guide you away from that kind of thing.

Sky Rose (Jun 04 2023 at 22:09):

An HTTP request can change based on things outside of the program's control. Stored would be controlled by the language, so could only change based on what's in your program. It's still mutability and effects, but it's still a step more trustworthy than the outside world.

Richard Feldman (Jun 04 2023 at 23:39):

you can implement IORef using Stored, although (like IORef) it would only be usable in a single-threaded environment, since in a multithreaded environment there would be race conditions:

IORef a := Nat

IORefStore a := List a
    implements Stored

newIORef : a -> Task (IORef a) *
newIORef = \val ->
    @IORefStore list <- Stored.read |> Task.await

    # Use the length of the list (before we append to it)
    # to identify the index the val will be stored in
    ref = @IORef (List.len list)

    # This would be a race condition in a multithreaded
    # environment, because the (read+write) here are not atomic
    {} <- Stored.write (@IORefStore (List.append list val)) |> Task.await

    Task.succeed ref

readIORef : IORef a -> Task a *
readIORef = \@IORef index ->
    @IORefStore list <- Stored.read |> Task.await

    when List.get list index is
        Ok val -> Task.succeed val
        Err OutOfBounds -> crash "Invalid IORef. This should never happen!"

since the design of Stored is for Stored.read and Stored.write to be locking, you could also use Stored to implement a threadsafe alternative to the above, using a different Stored to lock/unlock the IORefStore, and then block on it using tail recursion...but that implementation would be a lot more complicated of course :big_smile:

Richard Feldman (Jun 04 2023 at 23:41):

as far as I can tell, this is an absolute requirement for simulation tests to be hand editable - there has to be a way to have a primitive which fits this description

Richard Feldman (Jun 04 2023 at 23:41):

for example, let's say I'm trying to create a simulated version of Http.request : Request -> Task Response HttpErr

Richard Feldman (Jun 04 2023 at 23:42):

so I'm trying to create a function with that signature which can be used in place of the real one, in order to write a test which does not have effects

Richard Feldman (Jun 04 2023 at 23:42):

a fact about HTTP requests is that, given the same Request, the server may return a different Response

Richard Feldman (Jun 04 2023 at 23:43):

so I cannot make a simulated version of that function using IORef (and Task.succeed etc), precisely because I would have to pass the IORef through - and I can't do that in a simulation; the simulated function has to be the exact same type as the function it's simulating

Richard Feldman (Jun 04 2023 at 23:43):

Richard Feldman (Jun 04 2023 at 23:44):

as far as I can tell, it's not possible to make a simulated version of Http.request : Request -> Task Response HttpErr which doesn't actually run effects without something that has the same characteristic as Stored—namely, that it lets you read from some modular state using only Task, without having to pass that state through as arguments

Richard Feldman (Jun 04 2023 at 23:46):

so I feel similarly about Stored to how I felt about abilities: I share the concern that it might be overused, but "otherwise there is no possible way to do this in the language" is a very important consideration to balance that against, and the thing we're talking about making possible is being able to simulate effects without actually running the real effect, and I think it's very important that Roc be able to do that!

Richard Feldman (Jun 04 2023 at 23:47):

so I'm totally open to other designs that make it possible to create a simulated version of Http.request : Request -> Task Response HttpErr but I think it's very important that we have a way to do that in the language :big_smile:

Brendan Hansknecht (Jun 04 2023 at 23:48):

:thinking:...that feels like a very specific use case that we could probably deal with via builtins given task itself is a built-in and we control the testing framework. Doesn't feel like it needs Stored. Also, only allowing Stored for simulating tasks in tests is very different than allowing Stored everywhere.

Richard Feldman (Jun 04 2023 at 23:48):

Richard Feldman (Jun 04 2023 at 23:52):

Richard Feldman (Jun 04 2023 at 23:53):

btw I appreciate the push-back on this! I ended up with this design as a way to address a number of known pain points at once, but that doesn't mean there isn't a better solution out there somewhere...and even if we don't find one, I think exploring further will only lead to a better outcome :smiley:

Brendan Hansknecht (Jun 05 2023 at 00:23):

Question on simulation, couldn't the roc effect interpreter just track a state? Have something that has a state and just keeps getting the next task? So when it first sees an Http.request task, it can modify its internal state to return something different next time it is called? So fundamentally, the task interpreter walks over the task and has a state? Am I missing something here?
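A rough Python sketch of that idea, not Roc and not how any Roc interpreter actually works: a task is modelled as a generator that yields effect requests, and the interpreter (here a hypothetical `simulate` function) owns the only mutable state, a queue of canned responses per op. The effect encoding and the names `fetch_twice` and `simulate` are assumptions made up for this example.

```python
from collections import deque
from typing import Any, Deque, Dict, Generator, Tuple

Effect = Tuple[str, Any]  # hypothetical encoding: (op name, argument)
Task = Generator[Effect, Any, Any]


def fetch_twice(url: str) -> Task:
    """A task described as data: it yields effect requests and receives results."""
    first = yield ("Http.request", url)
    second = yield ("Http.request", url)
    return [first, second]


def simulate(task: Task, responses: Dict[str, Deque[Any]]) -> Any:
    """Run a task without real effects; the interpreter owns the mutable state."""
    try:
        op, _arg = next(task)
        while True:
            # Answer each request from the interpreter's own queue for that op.
            op, _arg = task.send(responses[op].popleft())
    except StopIteration as done:
        return done.value


# The same request gets a different response on the second call,
# yet the task itself carries no state.
result = simulate(
    fetch_twice("https://example.com"),
    {"Http.request": deque(["first", "second"])},
)
```

The task code never sees or passes the state; only the interpreter does, which is the property the question is asking about.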

Brendan Hansknecht (Jun 05 2023 at 00:24):

Ayaz Hafiz (Jun 05 2023 at 00:28):

I agree with the concerns for Stored re. inducing implicit mutability. Another concern I have is that it potentially allows arbitrary modules to read away types you’ve stored away, without your knowledge - for example, suppose there is some library that exposes an opaque “PrivateToken” type that implements Stored (for use in testing) and a debug printing function over PrivateToken. Now, I construct some PrivateToken in some code path, and in a later but disjoint code path, use some library that, without my knowledge, Stored.read’s a PrivateToken. Without limiting the scope of the storage API, or capturing the side effects in the type system, you have no knowledge of this potential vulnerability.

Ayaz Hafiz (Jun 05 2023 at 00:33):

fwiw I think there are a lot of benefits to “Task as a builtin” that are orthogonal to the “Stored API builtin”, and they can be considered independently, even if they play well together. So it may be worth breaking out this discussion into separate streams, since so far most of the discussion has been about Stored.

Notification Bot (Jun 05 2023 at 00:52):

Richard Feldman (Jun 05 2023 at 00:54):

Ayaz Hafiz (Jun 05 2023 at 01:08):

I also wonder if the restriction of Stored in tests only is indeed that high. I don't think it would be too onerous to say, here is a module, Stored, that you can use to make testing Tasks easier, but you can only use it in expects. That's just my opinion of course, I am biased.

Richard Feldman (Jun 05 2023 at 01:22):

hm, so I don't quite follow this concern - Task is already a black box that can do anything, including arbitrary reads/writes to/from arbitrary state.

put another way: it's already possible in Roc to do everything Stored can do, except:

Richard Feldman (Jun 05 2023 at 01:23):

I totally get that making it more convenient to be able to access mutable state, but you have to use Task to get at it, would create an incentive to reach for Task more often - and I definitely see that as a downside!

Richard Feldman (Jun 05 2023 at 01:24):

but I don't understand the concern that this would change any fundamental characteristics or guarantees about Task, if that makes sense

Richard Feldman (Jun 05 2023 at 01:27):

well you can always have PrivateToken wrap the actual Stored opaque type, and then expose PrivateToken but not the Stored opaque type it wraps, and then that can't happen anymore.

if we're concerned about that, we could always have a compiler warning if you try to expose an opaque type you've given Stored, and suggest that instead you wrap it and expose the wrapper, and if you really want to give people access to the Stored primitives, then implement wrapper functions and expose those

Georges Boris (Jun 05 2023 at 01:29):

maybe it would be worth it to imagine some scenarios where this could get out of hand? what would be a possible terrible misuse of Stored?

Richard Feldman (Jun 05 2023 at 01:30):

I think the basic shape of misuse is where you misuse it in the way that global mutable variables get misused, because that's basically what it wraps :big_smile:

Richard Feldman (Jun 05 2023 at 01:31):

there are cases where they're genuinely the nicest tool for the job, e.g. @Brendan Hansknecht I think you ran into a case recently where mutable references were desirable?

Richard Feldman (Jun 05 2023 at 01:32):

I'd expect a common way that Stored would be misused is to avoid threading arguments through functions

like "I could pass this new value through all these functions that already return Task, but that would take literally dozens of seconds of my life that I'll never get back, so instead I'll make a new Stored thing, have the first function write to it, have the function at the end of the chain read from it, and that will have been faster to implement, plus then I don't have to look at the extra argument in the type signatures"

I'd consider that worse because now it's harder to tell which code paths might have altered that value. When I'm passing it through as an argument, I can see exactly which functions might possibly have altered that value—they're all right there in the call chain. As soon as I put it in Stored to avoid that, now any function which can run a Task can potentially affect its value.

To me, it would have been better for long-term maintainability to just do the function threading.

Ayaz Hafiz (Jun 05 2023 at 01:34):

My concern is exposing this without any visibility of what may be going on, so getting at needing effects in the type system (a-la a third type parameter as Brendan mentioned). I agree you can do all this today. But to do it today you must opt in since only the platform can provide it; with this proposal on its own, there is no way to see what Storage APIs a library you might want to use accesses.

Ayaz Hafiz (Jun 05 2023 at 01:36):

I’m not sure this is enough if the wrapped type exposes a function to show its representation - for example, an “Inspect” implementation that prints the value representation. I agree we could provide warnings, make a convention, and perhaps it’s not something that would happen that often. But I wonder if there is a better design here, that rules it out altogether, or makes opting into potential uses of these behaviors (as a user of these libraries, not an implementor) more explicit.

Richard Feldman (Jun 05 2023 at 01:40):

like let's say I have PrivateToken := StoredToken and StoredToken has Stored but PrivateToken doesn't

Richard Feldman (Jun 05 2023 at 01:40):

Richard Feldman (Jun 05 2023 at 01:41):

how would the fact that it happens to wrap a Stored type change what anyone outside the module can do with PrivateToken?

Richard Feldman (Jun 05 2023 at 01:43):

I see - so I get the concern conceptually, I'm just not seeing any practical impact. :big_smile:

like for example, let's say I know "this Task can do HTTP" versus "this Task can do HTTP and also potentially read from/write to a global variable" - what is an example of a decision that I make differently based on this knowledge?

Richard Feldman (Jun 05 2023 at 01:43):

Ayaz Hafiz (Jun 05 2023 at 01:44):

maybe I misunderstood your initial example, but I guess I don't see the utility of PrivateToken wrapping StoredToken. When would you use StoredToken then?

Richard Feldman (Jun 05 2023 at 01:44):

Ayaz Hafiz (Jun 05 2023 at 01:44):

Richard Feldman (Jun 05 2023 at 01:44):

Richard Feldman (Jun 05 2023 at 01:45):

I'm taking it as a given that there's some desire to put Stored on PrivateToken

Ayaz Hafiz (Jun 05 2023 at 01:45):

Well then we have the same concern, right? Because then you can Stored.read the PrivateToken

Ayaz Hafiz (Jun 05 2023 at 01:45):

Richard Feldman (Jun 05 2023 at 01:45):

Ayaz Hafiz (Jun 05 2023 at 01:47):

Richard Feldman (Jun 05 2023 at 01:48):

Richard Feldman (Jun 05 2023 at 01:51):

Ayaz Hafiz (Jun 05 2023 at 01:51):

Richard Feldman (Jun 05 2023 at 02:02):

interface Aws
    exposes [SecretKey, storeInS3]
    imports [Http]

SecretKey := Str

TempToken := [Uninitialized, Initialized Str]
    implements Stored

secretKeyFromStr : Str -> SecretKey
secretKeyFromStr = @SecretKey

storeInS3 : SecretKey, DataForS3 -> Task {} Http.Err
storeInS3 = \@SecretKey secretKey, data ->
    tempToken <- getOrInitTempToken secretKey |> Task.await

    # use the temp token to call S3, since that's what S3 requires
    # if the S3 response indicates the temp token was expired,
    # run getOrInitTempToken again and re-run the S3 request
    # with that new token

# note: this is not exposed!
getOrInitTempToken : SecretKey -> Task Str Http.Err
getOrInitTempToken = \@SecretKey secretKey ->
    @TempToken tempToken <- Stored.read |> Task.await

    when tempToken is
        Uninitialized ->
            # assume getNewTempToken has been implemented
            str <- getNewTempToken secretKey |> Task.await
            {} <- Stored.write (@TempToken (Initialized str)) |> Task.await
            Task.succeed str

        Initialized str -> Task.succeed str

Richard Feldman (Jun 05 2023 at 02:03):

so the TempToken opaque type is never exposed, but still allows us to present a public-facing storeInS3 function that looks like a normal API that doesn't require this stateful temporary token concept that AWS has

Richard Feldman (Jun 05 2023 at 02:04):

in other words, it just asks for the secret key and that's it; users of this API don't even need to know that AWS does all this temporary token stuff

Richard Feldman (Jun 05 2023 at 02:04):

btw of note, this exact use case is in my mind the #2 selling point of Stored (#1 being simulation tests)

Richard Feldman (Jun 05 2023 at 02:07):

also note that this is a good example where saying "the host can implement this" is much worse for security than Stored

Richard Feldman (Jun 05 2023 at 02:08):

like yeah a host can offer a key/value store like write : Str, List U8 -> Task {} * and read : Str -> Task (List U8) *

Richard Feldman (Jun 05 2023 at 02:08):

but for Aws.roc to use that, it would have to both say "I need the platform to offer both HTTP as well as a key/value store like this" and then also it would have to pick a specific Str to use as a key for that

Ayaz Hafiz (Jun 05 2023 at 02:09):

Yeah that example makes sense. I agree you can design an API in this manner and it works around the problem.
My concern is, is the current Stored API the pit of success? Like I wonder if there is an alternative here that eliminates the potential to create a less-secure API in libraries like Aws.roc to begin with. In my mind, the first thought in a library designer's mind would be "let me expose SecretKey, and also an implementation of Stored for it, so users of my library can test with an arbitrary token in their unit tests!" - and it takes some knowledge of the best practice/security implications of the API/reading examples like the one you've provided to see that there is a better way.
I wonder if we can make "the best way" the default, or more actively push people in a safe direction, rather than leaving it up to convention that things should be defined this way.

Richard Feldman (Jun 05 2023 at 02:09):

at which point the original concern actually does apply: any other library which requires both HTTP as well as that key/value store can very much call the platform's read passing the same string key that AWS uses

Richard Feldman (Jun 05 2023 at 02:09):

so AWS is an example where it can be done safely and ergonomically using Stored, but I literally do not see a way to have it be both safe and ergonomic in current Roc

Richard Feldman (Jun 05 2023 at 02:10):

(and very unfortunately, the insecure way is more ergonomic, which means there would be demand for such a library despite the insecurity)

Richard Feldman (Jun 05 2023 at 02:11):

yeah that's a great goal! To be honest, the "hey don't expose opaque types that have Stored" warning feels to me like it would accomplish that

Richard Feldman (Jun 05 2023 at 02:11):

Richard Feldman (Jun 05 2023 at 02:13):

just to elaborate on this, I do think it's not the end of the world if Roc doesn't have AWS libraries that are as ergonomic as they are in mainstream languages (but as discussed on other threads, they would really be a lot less ergonomic) but I am genuinely concerned that the degree of painfulness will lead people to reach for insecure solutions that alleviate the user pain at the expense of introducing a vulnerability

Richard Feldman (Jun 05 2023 at 02:15):

so in that sense, if I had to predict, I would guess that in two hypothetical futures which differ only in that one has Stored in Roc (with the warning about exposing types with Stored), I would predict fewer successful security exploits of Roc programs than in the alternate world where everything is the same except there's no Stored

Ayaz Hafiz (Jun 05 2023 at 02:20):

and maybe make it an error instead of a warning - that would push developers to only the secure API :sweat_smile:

Brendan Hansknecht (Jun 05 2023 at 02:37):

I don't think Stored really helps my case. Like, yes, I do need something like a refcounted IORef, but whether I do that with a List { rc : Nat, data : MyType } in pure roc and then just use Nat as my IORef type, or I wrap that all in Stored, it still really is all the exact same logic. Also, currently my code is totally pure with no tasks. So either way, it is adding a stateful wrapper. Either the state is via Stored in Task, or the state is just in my Evaluator type. Really all the same logic and hassle. In both cases, I have to manage a list of manually refcounted data and write code to find and reuse slots.

Richard Feldman (Jun 05 2023 at 02:58):

well, sure - I mean, warnings create a nonzero exit code, so they unavoidably block CI :big_smile:

Richard Feldman (Jun 05 2023 at 02:59):

so really the only distinction is whether roc dev treats them as blockers to running

Richard Feldman (Jun 05 2023 at 03:00):

and this seems like one that shouldn't block roc dev from running in the sense that there's nothing broken, you just shouldn't deploy it like this :smiley:

Richard Feldman (Jun 05 2023 at 03:01):

just to clarify: you mean functions that reference non-exposed types in their types, yeah?

Richard Feldman (Jun 05 2023 at 03:02):

if so, I totally agree - in general, I think we should give a warning if you expose anything whose type annotation includes a type that isn't exposed

Richard Feldman (Jun 05 2023 at 03:02):

Richard Feldman (Jun 11 2023 at 18:30):

another potential use of Stored just occurred to me: platform-agnostic caching

e.g. let's say I want to have a platform-agnostic logging library with different levels you can set at runtime (e.g. through an env var or config file, or maybe even while the program is running). You could implement it to parse an env var on every logging event to see what the current level is, or similarly with reading and parsing a config file, but it would be much faster to have the library cache the log level in (the equivalent of) a global mutable variable so you can access it quickly.

also the same security concerns would apply (e.g. with Stored, other libraries can't mess with the log level, but with the type of raw Str or List U8 key/value store a platform could implement, they totally could), although to be fair there's not much of a security risk to someone maliciously changing the log level...or at least not one I can think of :sweat_smile:

Brendan Hansknecht (Jun 11 2023 at 19:12):

Hmm, but doesn't logging require platform support anyway? Is it logging to a file, stdout, stderr, or some other service, like sending a web request of cached logs?

Brendan Hansknecht (Jun 11 2023 at 19:13):

So setting the log level doesn't matter if you literally have nothing to log to anyway.

Brendan Hansknecht (Jun 11 2023 at 19:14):

And if a platform is adding a logging api, I assume that would require some system for setting logging levels. Likely, you would not want a package to set its own logging level. You would want to be able to configure the logging level for different packages and points in your code from the main app.

Richard Feldman (Jul 05 2023 at 11:18):

Task.simulate : List a -> Task a [SimEnded]

so each time the task runs, it gives the next element in the list, until the list runs out, at which point it gives the SimEnded error
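
A pure sketch of that stepping behaviour, just to pin down the semantics. `step` is a hypothetical helper and not part of any proposed API; the real thing would be a Task that keeps the remaining list somewhere between runs.

```roc
# Hypothetical pure model of the behaviour described for Task.simulate:
# hand back the next element plus the remaining list, or SimEnded when empty.
step : List a -> Result { value : a, rest : List a } [SimEnded]
step = \remaining ->
    when remaining is
        [first, .. as rest] -> Ok { value: first, rest }
        [] -> Err SimEnded

# The first run yields the head; the rest is kept for the next run.
expect step [10, 20] == Ok { value: 10, rest: [20] }
```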

Richard Feldman (Jul 05 2023 at 11:19):

(maybe there's a better name for it, since it doesn't strictly have to be used for simulation, but then again I can't think of another use for it!)

Richard Feldman (Jul 05 2023 at 11:32):

I also separately realized it would be possible to use Stored on opaque types defined in nested scopes, and those couldn't compile to plain global variables.

Richard Feldman (Jul 05 2023 at 11:35):

\arg ->
    Foo := U8 implements Stored { init: 0 }

    @Foo num <- Stored.get |> Task.await
    ...

Richard Feldman (Jul 05 2023 at 11:37):

I guess one possible answer is that it's the same global mutable variable, which might be surprising but maybe also is fine

Brendan Hansknecht (Jul 05 2023 at 15:19):

I thought fundamentally stored was going to compile to global mutable variables. That seems like the best way to make it performant and is fundamentally what Stored is.

Brendan Hansknecht (Jul 05 2023 at 19:45):

Hmm though if roc can be called by multiple threads, it would need to be protected by a mutex or rwlock of some sort.

Brendan Hansknecht (Jul 05 2023 at 19:45):

Richard Feldman (Jul 05 2023 at 19:47):

Richard Feldman (Jul 05 2023 at 19:48):

also threadlocals require libc-like dependencies, which may not always be available

Richard Feldman (Jul 05 2023 at 19:48):

also if it's global and protected by a lock, for some cases (e.g. integers) we can theoretically potentially optimize into single instructions like atomic load/store/etc

Richard Feldman (Dec 15 2023 at 20:36):

thinking about this some more, I think this is actually a subtly error-prone thing to use in the main non-testing use case we've discussed, namely AWS-like APIs

Richard Feldman (Dec 15 2023 at 20:36):

Richard Feldman (Dec 15 2023 at 20:37):

however, as soon as you have multiple AWS accounts, or - much, much worse - different threads want to use different request handlers - then suddenly this design becomes a source of errors and possibly also security vulnerabilities

Richard Feldman (Dec 15 2023 at 20:38):

because it's based on one global for the entire type, not one global value per request handler (or for that matter per thread, but the host is free to put different callbacks even within the same request handler on multiple different threads, so it's not like threadlocals would help here)

Hannes Nevalainen (Dec 15 2023 at 22:09):

Is there a link to this proposal somewhere? I've seen this Stored thingy referenced a few times now and I'm curious :)

Richard Feldman (Dec 15 2023 at 22:11):

Richard Feldman (Dec 16 2023 at 02:03):

relatedly, I had an idea for how to make use cases like AWS SDK more ergonomic without needing Stored

Richard Feldman (Dec 16 2023 at 02:05):

the basic idea is to have the AWS SDK package require two more module params along the lines of:

get : Task token [NotFound] where token implements Decoding,
set : token -> Task {} * where token implements Encoding,

Richard Feldman (Dec 16 2023 at 02:07):

and then server platforms can provide a per-request-handler key/value store where both the keys and values are List U8, and the API exposes them in terms of Encoding and Decoding

Richard Feldman (Dec 16 2023 at 02:07):

so then as the application author, when importing the AWS SDK package, I give it module params which use this key-value store, but I give it functions that I've prepopulated to use my AWS-specific key

Richard Feldman (Dec 16 2023 at 02:08):

that way, the AWS SDK package gets the storage it needs, but - crucially - other (potentially malicious) packages can't access it at all

Richard Feldman (Dec 16 2023 at 02:09):

and the AWS package can't access storage I'm using for other things either, because I'm only passing it sandboxed functions which know how to access my application's AWS keys in the per-request-handler key/value store
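
Roughly what the sandboxing amounts to, modelled in pure Roc with a Dict standing in for the platform's per-request-handler store. Everything here is an assumption for illustration: `Store` and `scoped` are made-up names, and the real version would return Tasks and go through Encoding and Decoding rather than raw bytes.

```roc
# Stand-in for the platform's key/value store (assumption: List U8 keys and values).
Store : Dict (List U8) (List U8)

# Build a get/set pair that is hard-wired to a single key.
# Whoever receives these functions never sees the key, so they can't
# read or write any other entry in the store.
scoped : List U8 -> { get : Store -> Result (List U8) [KeyNotFound], set : Store, List U8 -> Store }
scoped = \key -> {
    get: \store -> Dict.get store key,
    set: \store, value -> Dict.insert store key value,
}

# The application picks the key once and hands only `aws` to the package.
expect
    aws = scoped (Str.toUtf8 "aws-token")
    store = aws.set (Dict.empty {}) [1, 2, 3]
    aws.get store == Ok [1, 2, 3]
```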

Richard Feldman (Dec 16 2023 at 02:10):

so it's very slightly less ergonomic than Stored in that I have to write these two sandboxed one-liner functions once in my entire code base, and then specify them whenever I import the AWS module, but that's still far more convenient than having to thread the AWS temporary token through everywhere

Richard Feldman (Dec 16 2023 at 02:10):

and if I take the ergonomics delta between that and Stored, it's so small I think it would be hard to justify introducing Stored using that as a major motivating factor

Luke Boswell (Dec 16 2023 at 02:30):

I don't quite follow this. I think I'm almost there, but the per-request part has me a bit confused. Is the intent here for a cache that the platform provides to the application as a Task? That token value is going to be the same globally, so if it's set using set then every get will return that same token even across different threads.

Richard Feldman (Dec 16 2023 at 02:30):

Luke Boswell (Dec 16 2023 at 02:31):

Without Stored, what would the equivalent in-memory cache look like if I wanted to store session keys and some meta data like userId or user roles?

Luke Boswell (Dec 16 2023 at 02:33):

If we are using Encoding and Decoding so that we are passing List U8 bytes to the platform for safe storage, I assume we are very quickly going to want that compact binary encoding for "all the things" so I can store any data and recover it quickly.

Richard Feldman (Dec 16 2023 at 02:33):

Richard Feldman (Dec 16 2023 at 02:34):

it would be different in that it wouldn't be per-request-handler, but rather something more global. Kind of a different use case honestly

Richard Feldman (Dec 16 2023 at 03:05):

actually for AWS in particular I can see a lot of applications preferring it to be global. What's cool is that this design works for both use cases, and the library has the same API either way! :smiley:

John Murray (Dec 16 2023 at 03:08):

Could stored be used for something like deno kv or would a more specific platform api be better for that use case?

Brendan Hansknecht (Dec 16 2023 at 08:50):

Luke Boswell (Dec 16 2023 at 09:44):

If we are making a (platform independent) package for KV then I assume it would require a different API, maybe:

get : U64 -> Task value [NotFound] where value implements Decoding,
set : U64, value -> Task {} * where value implements Encoding,

Not sure if the key should just be a U64 or maybe something like where key implements Eq, Ord or something

Richard Feldman (Dec 16 2023 at 11:19):

Richard Feldman (Dec 16 2023 at 11:20):

the platform would expose an API with both keys and values but the AWS package wouldn't care what key you're using (which is actively important for security!)

Luke Boswell (Dec 16 2023 at 11:25):

Is it too early to add an effect like this to the platforms? I'd like this for webserver in particular. I can use JSON for now and use it as a cache, assuming that's faster than starting a child process to call into sqlite from the command line.

Richard Feldman (Dec 16 2023 at 11:33):

Richard Feldman (Dec 16 2023 at 11:34):

I'm not sure about U64 keys though, might not be enough for some use cases :thinking:

Richard Feldman (Dec 16 2023 at 11:34):

Richard Feldman (Dec 16 2023 at 11:35):

I guess worth trying (since it's faster than e.g. keys that just implement Encoding) and seeing if it ends up being a problem in practice :big_smile:

Agus Zubiaga (Dec 16 2023 at 11:53):

I really like the simplicity of this approach, and it sounds wise to try this before something like Stored that might be harder to take back.

Brendan Hansknecht (Dec 16 2023 at 15:21):

Other question... Can we do better than encode? Maybe just a box of anything? Though that isn't type safe.... Hmm

Brendan Hansknecht (Dec 16 2023 at 15:21):

Agus Zubiaga (Dec 16 2023 at 15:51):

Agus Zubiaga (Dec 16 2023 at 15:52):

but I guess you’d have to guarantee that the same key isn’t used twice with different types

Agus Zubiaga (Dec 16 2023 at 15:54):

Agus Zubiaga (Dec 16 2023 at 15:57):

I understand the platform can namespace per request or whatever makes sense, but you might still have conflicts across modules

Brendan Hansknecht (Dec 16 2023 at 16:03):

I mean you could do this, but I'm not sure how the key type having a value somehow fixes any type safety really. Also, it would mean that the key would need to store a dummy value, which is strange. I mean I guess it could be defined like an option and be the nothing case, but still strange

get : Key value -> Task (Box value) [NotFound]
set : Key value, Box value -> Task {} *

Brendan Hansknecht (Dec 16 2023 at 16:04):

Probably wrap get and set when passing them to a module. That way you could make a wrapping key type. Would require a more complex key type than just an integer.

Agus Zubiaga (Dec 16 2023 at 16:11):

import pf.State

key : State.Key (Result Str [Pending])
key = State.init "unique key here" (Err Pending)

Agus Zubiaga (Dec 16 2023 at 16:12):

Agus Zubiaga (Dec 16 2023 at 16:13):

It’d be nice to have the platform/language come up with the key using a sequence or something

Agus Zubiaga (Dec 16 2023 at 16:54):

Luke Boswell (Dec 16 2023 at 18:09):

I'm a little confused where this ended up, are U64 keys ok? I went with that because that's what we use for Hash and figured you could just hash anything on the platform side before passing to the host. So the Roc to Host interface is just U64 and List U8.

Brian Carroll (Dec 17 2023 at 12:40):

Yeah I'm confused about how U64 could possibly not be enough, at least on a 64-bit machine.

Richard Feldman (Dec 17 2023 at 12:49):

I think it could be not enough in a scenario where someone is using arbitrary strings (e.g. domain names) for the keys, but maybe that's just not something that should be supported - like if you want that, store a dictionary in the value under a hardcoded U64 key

Brian Carroll (Dec 17 2023 at 14:53):

Alexander Kiel (Jan 03 2024 at 10:21):

If I understand the AWS example correctly, it would not necessarily use the token fitting the secret key. As I see it, the storeInS3 function returns a Task that will use whichever token is stored at the time the task is executed. That token does not have to match the secret key. The implementation could be fixed by storing a Dict from secret key to token.
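
A minimal sketch of that Dict-based fix, assuming both secret keys and tokens are plain Str values. `Tokens` and `tokenFor` are hypothetical names, not anything from the AWS example.

```roc
# One token per secret key, instead of a single shared token.
Tokens : Dict Str Str

# Look up the token that belongs to this particular secret key.
tokenFor : Tokens, Str -> Result Str [KeyNotFound]
tokenFor = \tokens, secretKey -> Dict.get tokens secretKey

# Two accounts stored side by side don't get each other's token.
expect
    tokens =
        Dict.empty {}
        |> Dict.insert "secret-a" "token-a"
        |> Dict.insert "secret-b" "token-b"
    tokenFor tokens "secret-b" == Ok "token-b"
```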

At the end, as @Brendan Hansknecht said, this impurity of using the wrong token is not reflected in the Task type.

@Richard Feldman Do you know https://zio.dev? I can't find anything about ZIO here. It's an effect system for Scala and, unlike the current Task type, has a third type parameter called Environment. If you want to include state in the execution of your ZIO type, you have to specify a ZState type in the Environment.

So essentially ZIO is an example of a 3-arg Task type @Brendan Hansknecht suggested.

Tobias Steckenborn (Jan 03 2024 at 10:37):

Richard Feldman (Jan 05 2024 at 12:02):

it doesn't need a third argument to Task (which we've had in the past and intentionally decided to remove because it didn't seem to be worth the complexity it introduced) but also doesn't have a concern with token security!
