roc-pg could use Stored for auto-prepared statements, a feature drivers in other languages offer
I think Stored is a bad idea. I fundamentally think it is something that will be badly abused and make many libraries much less trustworthy than they should be.
Like I totally get the convenience of it, but I think it will be a huge blow for purity and understandability of roc code, especially without the 3 arg Task. It will lead to a huge swath of functions that could be pure instead returning a Task. It will not be clear if those functions are doing everything under the sun or simply storing a value.
Let's look at random numbers as a concrete idea. Currently, we can make a nice chainable random number api (still more explorations of a final api should be done, but what is below isn't too bad). Even if we were to force someone to store or pass around the state, what the state is, its initial value, and if it ever gets reused is very clear. Besides requesting the initial seed, random numbers can be represented in completely pure roc code. So there are no Tasks anywhere else in the api. That is an extremely good thing.
here is some simple code with an api achievable in current roc
state <- Random.seed |> Task.await # this is the only task required
int = Random.int 0 100
x = int state
y = x |> Random.next int
x.value + y.value # the sum of two random numbers from 0 to 100
Now let's imagine a world with Stored used for the random module:
{} <- Random.seed |> Task.await
int = Random.int 0 100 # This is a task.
x <- int |> Task.await
y <- int |> Task.await
x + y
Cool, these apis get the same thing done and look about the same. The first is probably less convenient because you have a state variable that is passed around. So, why do I think the second one is way worse? The loss of purity through proliferation of Task is a huge security hole.
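A minimal sketch of the state-passing style from the first example, written in Python rather than Roc since the Roc api above is still exploratory. The names `Generated`, `next_int`, and `sum_of_two` are made up for illustration, and the generator is a plain linear congruential generator picked only because it is short. The point is that every function is pure: the state goes in as an argument and comes back as part of the result, so it is always visible where it is used.

```python
from typing import NamedTuple


class Generated(NamedTuple):
    value: int  # the random number in the requested range
    state: int  # the next generator state, returned instead of mutated


def next_int(state: int, low: int, high: int) -> Generated:
    """Pure step function: same state in, same value and state out."""
    # One step of a 32-bit linear congruential generator.
    new_state = (1664525 * state + 1013904223) % 2**32
    return Generated(low + new_state % (high - low + 1), new_state)


def sum_of_two(seed: int) -> int:
    """Sum of two random numbers from 0 to 100, threading the state by hand."""
    x = next_int(seed, 0, 100)
    y = next_int(x.state, 0, 100)
    return x.value + y.value
```

Only obtaining the seed would need an effect. With a hidden store, `next_int` would take no state argument, and nothing in its signature would say whether it reads or writes anything else.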
Look at the types of the random functions in the second example:
Random.seed: Task {} err
Random.int: I32, I32 -> Task I32 err
Task something err
Each of these functions is now another potential security hole that I need to audit. What do they do? Can they make network calls? Will someone later add code that spies on me to those functions? Who knows, they are just Task. Task is important to roc and will be required in libraries, but fundamentally, we don't want it everywhere. I think that Stored will lead to Task being everywhere.
On top of that, what if I want multiple random number generators? What if I use a package, but also one of my subpackages uses it as well? Will our stored values/secrets clash? How is that managed?
I think we need to put a lot more design work into keeping things pure and monadic before considering Stored. If we do add Stored I think it should be a lot better defined. I also think that likely it should be used with 3 argument Task. I think it will lead to Task being proliferated so much that it will be very important to have the type system verify exactly what effects a function call can use.
My gut feeling is that Stored should be a clunky api only provided by some platforms unless we commit to 3 arg Task. I don't think this is something we want to promote the spread of.
:thinking: I am way out of my depth here, but I would like to share my 2 cents. I agree with Brendan, that it would be a "wtf" if a library could have such "tentacles" from within to reach out to global state.
I am thinking that Elm has this kind of "key" system for Url (the user receives it on application.init)
I think that the ergonomics of libraries being able to state "hey, I would like to read and store data such and such" in a strongly typed way is paramount.
I think ultimately the user (developer) should be in control. Any library function that would need to interact with the Stored api would need to carry with it this key, for both read and write.
To me this means it becomes clear to the user "ah, there's Stored operations going on on this Task". Also, I am thinking that when user is somehow declaring these keys, this is a place where we can "namespace" the keys to prevent collisions.
Once again this is kind of a stream of thought ramble, but it could be interesting to make it even explicit at the manifest level.
app "hello-world"
packages { pf: "..." }
imports []
provides [main] to pf
stores [ pg_username, pg_password, aws_api_key, openai_apikey ] # <- this would mean that on `main` we would receive a record like { pg_username : Stored.Key, pg_password: Stored.Key, etc... }
main : { pg_username : Stored.Key, .... } -> Task ....
main = \keys ->
# the user could now invoke library functions, passing each library the corresponding key
A note as I think about this more. I honestly think we should reconsider 3 arg task. I think 3 arg task will be wanted once we have a real package ecosystem. I think it is only inconvenient now because we mostly have people working on very small apps where it is something to learn.
Knowing exactly what effect types a task can produce is quite important in many library trust related situations.
@Brendan Hansknecht do you think 3 arg tasks would improve your perception of Stored? (since we could potentially see what is being stored? unless I got it wrong)
I don't really see how having a common storage effect in all applications would necessarily break purity. wouldn't it be the same as something provided by a platform?
I think the real issue isn't technically purity, the real issue is the pollution of task (especially without knowing what the tasks do). It basically leads to programming in what is equivalent to a totally impure language.
I think 3 arg task helps because you can see that the random library only uses the store random seed effect and no other effects. Without that, it literally is a totally opaque impure task
But I guess this is also technically the largest argument against tasks in libraries in general, but I think fully restricting them would be too draconian and greatly limit the package ecosystem.
I think stored is just such a large convenience function that it can be extremely problematic. Not only would way more code return tasks, but suddenly tons of code is essentially buckets around mutable state. Instead I think it would be better for a web platform just to provide stored for tokens or something more limited. That would help it not get abused as a mutability escape hatch all over the place
Brendan Hansknecht said:
I think we need to put a lot more design work into keeping things pure and monadic before considering Stored.
The problem with monads is that without HKTs they get more unwieldy the more you combine them.
@Brendan Hansknecht I don't really understand what a three arg task is, I assume something like Task ok err op based on previous discussion. Is the basic idea that each of the Ops or effects are enumerated for the task? I assume this is less ergonomic than the 2-arg task, is that the main tradeoff?
Yeah, that is pretty much it:
# two arg
# No transparency around what it does. could call any tasks.
printRandomNumber: Task {} []
# three arg
# Can only read random integers and print to stdout
printRandomNumber : Task {} [] [Random.int _, Stdout.line _]
# If you don't care, does not have to be more verbose really
# still three arg, but letting the compiler fill in the op
printRandomNumber : Task {} [] _
# Could even just make an alias in your app if you really don't care
# I think that would work, but I may be wrong.
MyTask ok err : Task ok err _
Important note, even if a library author writes their code always using _ as the third arg, the doc generated type will still fill in the exact args and it can be enforced by the user code.
Main drawback we commented on before was that it is more confusing to teach new users and more verbose. I think that mostly can be alleviated by starting by teaching with it always being _ and saying it is not important yet.
Thank you for that explanation. Is there a difference between Tasks that are composed of other Tasks, and those provided by the platform?
Like, does a Task from the platform still have a third argument in this design?
No difference. All tasks would have 3 args. The raw platform tasks would just be directly defined:
# in Stdout.roc in the platform
line : Str -> Task {} [] [Stdout.line _]
line = ...
This is the primitive information needed such that any task chain that calls Stdout.line will be required to specify that it uses the Stdout.line op.
Note: exact naming will be slightly different, but fundamentally it will be the same thing. Like the actual op may be Stdout (Line Str) or similar. This means an end user could write any of _, Stdout _, Stdout (Line _), or Stdout (Line Str), depending on how specific they want to be.
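A rough way to picture what the third argument buys you, sketched in Python with runtime sets standing in for what Roc would check at compile time. Everything here (`Task`, `seq`, `require_ops`, and the op names as strings) is an assumption made up for illustration, not a proposed api. Primitive tasks declare their own op, composing tasks unions the ops, and the caller can state which ops it is willing to allow.

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, Set


@dataclass(frozen=True)
class Task:
    ops: FrozenSet[str]        # stand-in for the third type parameter
    run: Callable[[], object]  # the effect itself


def random_int(low: int, high: int) -> Task:
    # Primitive task: declares exactly one op.
    return Task(frozenset({"Random.int"}), lambda: random.randint(low, high))


def stdout_line(text: str) -> Task:
    # Primitive task: declares exactly one op.
    return Task(frozenset({"Stdout.line"}), lambda: print(text))


def seq(first: Task, second: Task) -> Task:
    """Run first, then second; the result may use the ops of both."""
    def run() -> object:
        first.run()
        return second.run()
    return Task(first.ops | second.ops, run)


def require_ops(task: Task, allowed: Set[str]) -> Task:
    """Plays the role of a user-written annotation: reject undeclared ops."""
    extra = task.ops - allowed
    if extra:
        raise PermissionError(f"task uses undeclared ops: {sorted(extra)}")
    return task
```

Writing `_` for the third argument corresponds to never calling `require_ops`: the ops are still tracked and can be inspected, they are just not restricted.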
but what would op mean in the context of Tasks in packages that are not tied to one specific platform? the packages would either need to use a type variable or define what platforms they are meant to work with, right?
that makes a lot of sense
and tbh when I first read about Stored I loved it because it did seem like a giant bucket of possibilities - so I agree with you that having it will probably be a mistake because people will definitely over use it :sweat_smile:
maybe having 3 arg would already be enough for solving the same problem? since then we could have something used for aws tokens for instance using a stored-like effect. we would just need a platform that supports it
@Fábio Beirão I think your concern about "tentacles" is answered by the following if I understand correctly?
... Stored is secure by default. Each opaque type is only accessible by default inside its own module; you have to go out of your way to grant other modules access to each opaque type individually.
So I guess this means that a package can only touch state that is internal; i.e. from inside the module that declares an opaque type which implements the Stored ability.
What I mean by tentacles might be a bit like the (very unfamiliar to me!) IO monad in Haskell. As in, I would expect that if a function from a platform wants to "do things", its API signature would make it clear to the user (from an input point of view).
To be honest I don't really have a crystalized opinion on this topic, since I haven't yet gotten to experiment deeper with Effects/Tasks, to see how I feel about them.
Agus Zubiaga said:
The problem with monads is that without HKTs they get more unwieldy the more you combine them.
I believe it. I wish it wasn't the case, but based on what i have seen, seems like a totally plausible conclusion.
In that case, I would say that we should:
Stored should be general or specialized.
The problem with monads is that without HKTs they get more unwieldy the more you combine them.
I think this is true even if you have HKTs :big_smile:
Georges Boris said:
and tbh when I first read about Stored I loved it because it did seem like a giant bucket of possibilities - so I agree with you that having it will probably be a mistake because people will definitely over use it :sweat_smile:
it unlocks similar things to Haskell's IORef, which I wouldn't say is overused
Brendan Hansknecht said:
I think the real issue isn't technically purity, the real issue is the pollution of task (especially without knowing what the tasks do). It basically leads to programming in what is equivalent to a totally impure language.
I think 3 arg task helps because you can see that the random library only uses the store random seed effect and no other effects. Without that, it literally is a totally opaque impure task
there's another way to achieve this with 2-arg Task - but that's the thing I haven't finished writing up yet :sweat_smile:
maybe these proposals are more coupled than I thought, although I didn't want to put everything in one huge proposal either haha
So I guess Stored is IORef, but with implicit keys? So instead of passing around a reference value and needing to store that, Stored hides away all of that. So with IORef, you would know that it is used because it would return the reference. With Stored you have no way to tell.
With IORef, I would make a guess that needing to return and store the reference gets rid of most of the benefits. So you would still hit a lot of the same issues we hit with direct monads. That would mean that it isn't any more convenient to use really. Stored on the other hand is super convenient to slip in anywhere. Just make a new opaque type and init function.
Note: total speculation, have never used IORef in haskell, but it looks like it still has the issue of needing to pass around a state variable. The state variable just happens to be a reference.
How would Stored.read and Stored.write differ fundamentally from, for example, a task that sends an HTTP request that results in a state mutation on a server? The state lives in memory instead of a remote server, and you don't have to serialize or deserialize the data. But they are all Tasks that represent a state mutation somewhere. Am I missing something?
Also, Stored.read and Stored.write seem meaningfully less convenient than mutations in, say, any imperative language. If someone were going to use Stored values everywhere, wouldn't there be a strong incentive for them to just use a different language instead?
They don't differ fundamentally from HTTP, I think that's the point! Normally a functional program mostly consists of pure functions, and only a few effectful functions at the "edges" return tasks. I think the concern is that this feature makes it easy to make effectful stuff in the "middle" of your program. Then it's easier to write Roc in a style where you just mutate lots of stuff and everything returns tasks. Functional programming is meant to guide you away from that kind of thing.
How would Stored.read and Stored.write differ fundamentally from, for example, a task that sends an HTTP request that results in a state mutation on a server?
An HTTP request can change based on things outside of the program's control. Stored would be controlled by the language, so could only change based on what's in your program. It's still mutability and effects, but it's still a step more trustworthy than the outside world.
Brendan Hansknecht said:
So I guess Stored is IORef, but with implicit keys? So instead of passing around a reference value and needing to store that, Stored hides away all of that. So with IORef, you would know that it is used because it would return the reference. With Stored you have no way to tell.
you can implement IORef using Stored, although (like IORef) it would only be usable in a single-threaded environment, since in a multithreaded environment there would be race conditions:
IORef a := Nat
IORefStore a := List a
    implements Stored
newIORef : a -> Task (IORef a) *
newIORef = \val ->
    @IORefStore list <- Stored.read |> Task.await
    # Use the length of the list (before we append to it)
    # to identify the index the val will be stored in
    ref = @IORef (List.len list)
    # This would be a race condition in a multithreaded
    # environment, because the (read+write) here are not atomic
    {} <- Stored.write (@IORefStore (List.append list val)) |> Task.await
    Task.succeed ref
readIORef : IORef a -> Task a *
readIORef = \@IORef index ->
    @IORefStore list <- Stored.read |> Task.await
    when List.get list index is
        Ok val -> Task.succeed val
        Err OutOfBounds -> crash "Invalid IORef. This should never happen!"
since the design of Stored is for Stored.read and Stored.write to be locking, you could also use Stored to implement a threadsafe alternative to the above, using a different Stored to lock/unlock the IORefStore, and then block on it using tail recursion...but that implementation would be a lot more complicated of course :big_smile:
Brendan Hansknecht said:
So I guess Stored is IORef, but with implicit keys? So instead of passing around a reference value and needing to store that, Stored hides away all of that.
as far as I can tell, this is an absolute requirement for simulation tests to be hand editable - there has to be a way to have a primitive which fits this description
for example, let's say I'm trying to create a simulated version of Http.request : Request -> Task Response HttpErr
so I'm trying to create a function with that signature which can be used in place of the real one, in order to write a test which does not have effects
a fact about HTTP requests is that, given the same Request, the server may return a different Response
so I cannot make a simulated version of that function using IORef (and Task.succeed etc), precisely because I would have to pass the IORef through - and I can't do that in a simulation; the simulated function has to be the exact same type as the function it's simulating
it has to be a drop-in replacement
this use case is the main motivation behind Stored
as far as I can tell, it's not possible to make a simulated version of Http.request : Request -> Task Response HttpErr which doesn't actually run effects without something that has the same characteristic as Stored—namely, that it lets you read from some modular state using only Task, without having to pass that state through as arguments
so I feel similarly about Stored to how I felt about abilities: I share the concern that it might be overused, but "otherwise there is no possible way to do this in the language" is a very important consideration to balance that against, and the thing we're talking about making possible is being able to simulate effects without actually running the real effect, and I think it's very important that Roc be able to do that!
so I'm totally open to other designs that make it possible to create a simulated version of Http.request : Request -> Task Response HttpErr but I think it's very important that we have a way to do that in the language :big_smile:
:thinking:...that feels like a very specific use case that we could probably deal with via builtins given task itself is a built-in and we control the testing framework. Doesn't feel like it needs Stored. Also, only allowing Stored for simulating tasks in tests is very different than allowing Stored everywhere.
Brendan Hansknecht said:
:thinking:...that feels like a very specific use case that we could probably deal with via builtins given task itself is a built-in and we control the testing framework.
I couldn't think of a way, but I'm open to suggestions! :smiley:
Brendan Hansknecht said:
Also, only allowing Stored for simulating tasks in tests is very different than allowing Stored everywhere.
sure, although:
Task
roc test it works, but if I run roc build or roc run I get an error if I'm using Stored in any code path...but then now I can't colocate my simulation tests inside my modules, like normal? That doesn't seem great...
btw I appreciate the push-back on this! I ended up with this design as a way to address a number of known pain points at once, but that doesn't mean there isn't a better solution out there somewhere...and even if we don't find one, I think exploring further will only lead to a better outcome :smiley:
Question on simulation, couldn't the roc effect interpreter just track a state? Have something that has a state and just keeps getting the next task? So when it first sees an Http.request task, it can modify its internal state to return something different next time it is called? So fundamentally, the task interpreter walks over the task and has a state? Am I missing something here?
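To make the question concrete, here is a small Python sketch of an interpreter that owns the state, so the simulated request needs no extra argument. It assumes tasks are plain data with continuations, which may not match how Roc represents Task; `Done`, `HttpRequest`, `simulate`, and `fetch_twice` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Done:
    value: object  # the task has finished with this value


@dataclass
class HttpRequest:
    url: str
    next: Callable[[str], "Task"]  # continuation receiving the response body


Task = Union[Done, HttpRequest]


def simulate(task: Task, responses: List[str]) -> object:
    """Walk the task without running real effects.

    The interpreter holds the state: a queue of canned responses, so the
    same request can get a different answer each time it is seen.
    """
    remaining = list(responses)
    while isinstance(task, HttpRequest):
        task = task.next(remaining.pop(0))
    return task.value


def fetch_twice(url: str) -> Task:
    # Code under test: sends the same request twice and returns both bodies.
    return HttpRequest(url, lambda a: HttpRequest(url, lambda b: Done((a, b))))
```

Here `simulate(fetch_twice(url), ["first", "second"])` hands back both canned bodies in order. The catch is that the canned responses live in the test harness rather than in a drop-in replacement function with the same type as Http.request.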
Also, really excited to read this when you write it up:
there's another way to achieve this with 2-arg Task - but that's the thing I haven't finished writing up yet :sweat_smile:
I agree with the concerns for Stored re. inducing implicit mutability. Another concern I have is that it potentially allows arbitrary modules to read away types you’ve stored away, without your knowledge - for example, suppose there is some library that exposes an opaque “PrivateToken” type that implements Stored (for use in testing) and a debug printing function over PrivateToken. Now, I construct some PrivateToken in some code path, and in a later but disjoint code path, use some library that, without my knowledge, Stored.read’s a PrivateToken. Without limiting the scope of the storage API, or capturing the side effects in the type system, you have no knowledge of this potential vulnerability.
fwiw I think there are a lot of benefits to “Task as a builtin” that are orthogonal to the “Stored API builtin”, and they can be considered independently, even if they play well together. So it may be worth breaking out this discussion into separate streams, since so far most of the discussion has been about Stored.
56 messages were moved here from #ideas > Task as builtin by Richard Feldman.
done!
I also wonder if the restriction of Stored in tests only is indeed that high. I don't think it would be too onerous to say, here is a module, Stored, that you can use to make testing Tasks easier, but you can only use it in expects. That's just my opinion of course, I am biased.
Ayaz Hafiz said:
I agree with the concerns for Stored re. inducing implicit mutability.
hm, so I don't quite follow this concern - Task is already a black box that can do anything, including arbitrary reads/writes to/from arbitrary state.
put another way: it's already possible in Roc to do everything Stored can do, except:
List U8, making it slower at runtime than Stored would be
so I'm not seeing what Stored would change here :sweat_smile:
I totally get that making mutable state more convenient to access (even though you have to use Task to get at it) would create an incentive to reach for Task more often - and I definitely see that as a downside!
but I don't understand the concern that this would change any fundamental characteristics or guarantees about Task, if that makes sense
Ayaz Hafiz said:
Another concern I have is that it potentially allows arbitrary modules to read away types you’ve stored away, without your knowledge - for example, suppose there is some library that exposes an opaque “PrivateToken” type that implements Stored (for use in testing) and a debug printing function over PrivateToken. Now, I construct some PrivateToken in some code path, and in a later but disjoint code path, use some library that, without my knowledge, Stored.read’s a PrivateToken. Without limiting the scope of the storage API, or capturing the side effects in the type system, you have no knowledge of this potential vulnerability.
well you can always have PrivateToken wrap the actual Stored opaque type, and then expose PrivateToken but not the Stored opaque type it wraps, and then that can't happen anymore.
if we're concerned about that, we could always have a compiler warning if you try to expose an opaque type you've given Stored, and suggest that instead you wrap it and expose the wrapper, and if you really want to give people access to the Stored primitives, then implement wrapper functions and expose those
maybe it would be worth it to imagine some scenarios where this could get out of hand? what would be a possible terrible misuse of Stored?
I think the basic shape of misuse is where you misuse it in the way that global mutable variables get misused, because that's basically what it wraps :big_smile:
there are cases where they're genuinely the nicest tool for the job, e.g. @Brendan Hansknecht I think you ran into a case recently where mutable references were desirable?
I'd expect a common way that Stored would be misused is to avoid threading arguments through functions
like "I could pass this new value through all these functions that already return Task, but that would take literally dozens of seconds of my life that I'll never get back, so instead I'll make a new Stored thing, have the first function write to it, have the function at the end of the chain read from it, and that will have been faster to implement, plus then I don't have to look at the extra argument in the type signatures"
I'd consider that worse because now it's harder to tell which code paths might have altered that value. When I'm passing it through as an argument, I can see exactly which functions might possibly have altered that value—they're all right there in the call chain. As soon as I put it in Stored to avoid that, now any function which can run a Task can potentially affect its value.
To me, it would have been better for long-term maintainability to just do the function threading.
My concern is exposing this without any visibility of what may be going on, so getting at needing effects in the type system (a-la a third type parameter as Brendan mentioned). I agree you can do all this today. But to do it today you must opt in since only the platform can provide it; with this proposal on its own, there is no way to see what Storage APIs a library you might want to use accesses.
well you can always have PrivateToken wrap the actual Stored opaque type, and then expose PrivateToken but not the Stored opaque type it wraps, and then that can't happen anymore.
if we're concerned about that, we could always have a compiler warning if you try to expose an opaque type you've given Stored, and suggest that instead you wrap it and expose the wrapper, and if you really want to give people access to the Stored primitives, then implement wrapper functions and expose those
I’m not sure this is enough if the wrapped type exposes a function to show its representation - for example, an “Inspect” implementation that prints the value representation. I agree we could provide warnings, make a convention, and perhaps it’s not something that would happen that often. But I wonder if there is a better design here, that rules it out altogether, or makes opting into potential uses of these behaviors (as a user of these libraries, not an implementor) more explicit.
Ayaz Hafiz said:
I’m not sure this is enough if the wrapped type exposes a function to show its representation - for example, an “Inspect” implementation that prints the value representation.
hm, how would I use that to get access? :thinking:
like let's say I have PrivateToken := StoredToken and StoredToken has Stored but PrivateToken doesn't
and I expose PrivateToken but not StoredToken
how would the fact that it happens to wrap a Stored type change what anyone outside the module can do with PrivateToken?
Ayaz Hafiz said:
My concern is exposing this without any visibility of what may be going on, so getting at needing effects in the type system (a-la a third type parameter as Brendan mentioned).
I see - so I get the concern conceptually, I'm just not seeing any practical impact. :big_smile:
like for example, let's say I know "this Task can do HTTP" versus "this Task can do HTTP and also potentially read from/write to a global variable" - what is an example of a decision that I make differently based on this knowledge?
I might be missing something, but kinda seems like non-actionable trivia
maybe I misunderstood your initial example, but I guess I don't see the utility of PrivateToken wrapping StoredToken. When would you use StoredToken then?
only inside the module that defines it
what is the utility of that?
well whatever I'd planned to use it for haha :big_smile:
I'm taking it as a given that there's some desire to put Stored on PrivateType
Well then we have the same concern, right? Because then you can Stored.read the PrivateToken
maybe i'm not following
haha I think I should write this out more thoroughly! 1 sec
Richard Feldman said:
I might be missing something, but kinda seems like non-actionable trivia
I'm thinking of the following case:
Aws.roc for this. I store this away for convenience for the lifetime of the request.
Stored.read on the AWS access token because it also imports Aws.roc and the types it exposes. Now library F may be able to read or use my access token, without my explicit consent.
ok great example!
clarification question: who defines the private token type, me or Aws.roc?
(either works, I'm just not sure which to assume)
I think Aws.roc in this example
ok cool, so in that case there's no extra wrapping necessary. Here's Aws.roc:
interface Aws
    exposes [SecretKey, secretKeyFromStr, storeInS3]
    imports [Http]
SecretKey := Str
TempToken := [Uninitialized, Initialized Str]
    implements Stored
secretKeyFromStr : Str -> SecretKey
secretKeyFromStr = \str -> @SecretKey str
storeInS3 : SecretKey, DataForS3 -> Task {} Http.Err
storeInS3 = \secretKey, data ->
    tempToken <- getOrInitTempToken secretKey |> Task.await
    # use the temp token to call S3, since that's what S3 requires
    # if the S3 response indicates the temp token was expired,
    # run getOrInitTempToken again and re-run the S3 request
    # with that new token
# note: this is not exposed!
getOrInitTempToken : SecretKey -> Task Str Http.Err
getOrInitTempToken = \@SecretKey secretKey ->
    @TempToken tempToken <- Stored.read |> Task.await
    when tempToken is
        Uninitialized ->
            # assume getNewTempToken has been implemented
            str <- getNewTempToken secretKey |> Task.await
            {} <- Stored.write (@TempToken (Initialized str)) |> Task.await
            Task.succeed str
        Initialized str -> Task.succeed str
so the TempToken opaque type is never exposed, but still allows us to present a public-facing storeInS3 function that looks like a normal API that doesn't require this stateful temporary token concept that AWS has
in other words, it just asks for the secret key and that's it; users of this API don't even need to know that AWS does all this temporary token stuff
btw of note, this exact use case is in my mind the #2 selling point of Stored (#1 being simulation tests)
also note that this is a good example where saying "the host can implement this" is much worse for security than Stored
like yeah a host can offer a key/value store like write : Str, List U8 -> Task {} * and read : Str -> Task (List U8) *
but for Aws.roc to use that, it would have to both say "I need the platform to offer both HTTP as well as a key/value store like this" and then also it would have to pick a specific Str to use as a key for that
Yeah that example makes sense. I agree you can design an API in this manner and it works around the problem.
My concern is, is the current Stored API the pit of success? Like I wonder if there is an alternative here that eliminates the potential to create a less-secure API in libraries like Aws.roc to begin with. In my mind, the first thought in a library designer's mind would be "let me expose SecretKey, and also an implementation of Stored for it, so users of my library can test with an arbitrary token in their unit tests!" - and it takes some knowledge of the best practice/security implications of the API/reading examples like the one you've provided to see that there is a better way.
I wonder if we can make "the best way" the default, or more actively push people in a safe direction, rather than leaving it up to convention that things should be defined this way.
at which point the original concern actually does apply: any other library which requires both HTTP as well as that key/value store can very much call the platform's read passing the same string key that AWS uses
so AWS is an example where it can be done safely and ergonomically using Stored, but I literally do not see a way to have it be both safe and ergonomic in current Roc
(and very unfortunately, the insecure way is more ergonomic, which means there would be demand for such a library despite the insecurity)
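The string-key problem with a host-provided key/value store can be shown in a few lines of Python. This is only a sketch of the scenario described above: `PlatformStore`, the key name, and both library functions are hypothetical, standing in for the platform's `write : Str, List U8 -> Task {} *` and `read : Str -> Task (List U8) *`.

```python
from typing import Dict


class PlatformStore:
    """Hypothetical host-provided key/value store addressed by string keys."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def write(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def read(self, key: str) -> bytes:
        return self._data[key]


# Inside the (hypothetical) AWS library: it has to pick some string key.
AWS_TEMP_TOKEN_KEY = "aws-temp-token"


def aws_cache_temp_token(store: PlatformStore, token: bytes) -> None:
    store.write(AWS_TEMP_TOKEN_KEY, token)


# Inside an unrelated library that was also given the store: nothing stops
# it from guessing or copying the same key and reading the token back.
def other_library_snoop(store: PlatformStore) -> bytes:
    return store.read("aws-temp-token")
```

With an unexposed opaque type as the key, as in the TempToken example, the second library has no way to name the slot at all.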
Ayaz Hafiz said:
I wonder if we can make "the best way" the default, or more actively push people in a safe direction, rather than leaving it up to convention that things should be defined this way.
yeah that's a great goal! To be honest, the "hey don't expose opaque types that have Stored" warning feels to me like it would accomplish that
I can't really think of a downside to be honest
Richard Feldman said:
so AWS is an example where it can be done safely and ergonomically using Stored, but I literally do not see a way to have it be both safe and ergonomic in current Roc
(and very unfortunately, the insecure way is more ergonomic, which means there would be demand for such a library despite the insecurity)
just to elaborate on this, I do think it's not the end of the world if Roc doesn't have AWS libraries that are as ergonomic as they are in mainstream languages (but as discussed on other threads, they would really be a lot less ergonomic) but I am genuinely concerned that the degree of painfulness will lead people to reach for insecure solutions that alleviate the user pain at the expense of introducing a vulnerability
so in that sense, if I had to predict: given two hypothetical futures which differ only in that one has Stored in Roc (with the warning about exposing types with Stored), I would predict fewer successful security exploits of Roc programs in the world with Stored than in the alternate world where everything is the same except there's no Stored
the "hey don't expose opaque types that have Stored" warning feels to me like it would accomplish that
I think that would work if we also introduce the following restrictions:
and maybe make it an error instead of a warning - that would push developers to only the secure API :sweat_smile:
Richard Feldman said:
there are cases where they're genuinely the nicest tool for the job, e.g. Brendan Hansknecht I think you ran into a case recently where mutable references were desirable?
I don't think Stored really helps my case. Like, yes, I do need something like a refcounted IORef, but whether I do that with a List { rc : Nat, data : MyType } in pure roc and just use Nat as my IORef type, or I wrap that all in Stored, it still really is all the exact same logic. Also, currently my code is totally pure with no tasks. So either way, it is adding a stateful wrapper. Either the state is via Stored in Task, or the state is just in my Evaluator type. Really all the same logic and hassle. In both cases, I have to manage a list of manually refcounted data and write code to find and reuse slots.
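To make the "list of manually refcounted data" concrete, here is a rough Python model of that bookkeeping (not Roc, and not the actual Evaluator code). The names Slot, Evaluator, alloc, incref, decref and read are all made up for illustration; the only thing taken from the message is the shape: a list of { rc, data } entries, with the list index acting as the reference.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Slot:
    rc: int    # manual reference count; 0 means the slot is free
    data: Any  # payload stored in the slot


class Evaluator:
    """Hypothetical holder of the slot list; the index plays the role of the IORef."""

    def __init__(self) -> None:
        self.slots: List[Slot] = []

    def alloc(self, data: Any) -> int:
        # Reuse the first free slot if there is one, otherwise grow the list.
        for index, slot in enumerate(self.slots):
            if slot.rc == 0:
                self.slots[index] = Slot(rc=1, data=data)
                return index
        self.slots.append(Slot(rc=1, data=data))
        return len(self.slots) - 1

    def incref(self, ref: int) -> None:
        self.slots[ref].rc += 1

    def decref(self, ref: int) -> None:
        slot = self.slots[ref]
        slot.rc -= 1
        if slot.rc == 0:
            slot.data = None  # drop the payload so the slot can be reused

    def read(self, ref: int) -> Any:
        return self.slots[ref].data


ev = Evaluator()
a = ev.alloc("x")
b = ev.alloc("y")
ev.decref(a)       # slot a becomes free
c = ev.alloc("z")  # reuses the freed slot instead of growing the list
```

Whether this state lives in an explicit value like Evaluator or behind something like Stored, the slot management itself looks the same, which is the point being made above.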
Ayaz Hafiz said:
and maybe make it an error instead of a warning - that would push developers to only the secure API :sweat_smile:
well, sure - I mean, warnings create a nonzero exit code, so they unavoidably block CI :big_smile:
and neither of them blocks you from running, since that's an explicit goal
so really the only distinction is whether roc dev treats them as blockers to running
and this seems like one that shouldn't block roc dev from running, in the sense that there's nothing broken, you just shouldn't deploy it like this :smiley:
Ayaz Hafiz said:
you cannot expose functions that reference non-exposed types in them
just to clarify: you mean functions that reference non-exposed types in their types, yeah?
if so, I totally agree - in general, I think we should give a warning if you expose anything whose type annotation includes a type that isn't exposed
that's always a mistake I think haha
another potential use of Stored just occurred to me: platform-agnostic caching
e.g. let's say I want to have a platform-agnostic logging library with different levels you can set at runtime (e.g. through an env var or config file, or maybe even while the program is running). You could implement it to parse an env var on every logging event to see what the current level is, or similarly with reading and parsing a config file, but it would be much faster to have the library cache the log level in (the equivalent of) a global mutable variable so you can access it quickly.
also the same security concerns would apply (e.g. with Stored, other libraries can't mess with the log level, but with the type of raw Str or List U8 key/value store a platform could implement, they totally could), although to be fair there's not much of a security risk to someone maliciously changing the log level...or at least not one I can think of :sweat_smile:
Hmm, but doesn't logging require platform support anyway? Is it logging to file, stdout, stderr, or some other service like sending a web request of cached logs?
So setting the log level doesn't matter if you literally have nothing to log to anyway.
And if a platform is adding a logging api, I assume that would require some system for setting logging levels. Likely, you would not want a package to set its own logging level. You would want to be able to configure the logging level for different packages and points in your code from the main app.
I just thought of a way to offer task simulation without Stored:
Task.simulate : List a -> Task a [SimEnded]
so each time the task runs, it gives the next element in the list, until the list runs out, at which point it gives the SimEnded error
(maybe there's a better name for it, since it doesn't strictly have to be used for simulation, but then again I can't think of another use for it!)
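A small Python model of the behaviour described for Task.simulate, as a sketch only. Task.simulate is just a proposal in this thread; the simulate function, the zero-argument callable standing in for a Task, and the SimEnded exception standing in for the error tag are all assumptions made for illustration.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")


class SimEnded(Exception):
    """Stands in for the proposed SimEnded error tag."""


def simulate(values: List[A]) -> Callable[[], A]:
    """Return a 'task' (modelled as a callable) that gives the next value on each run."""
    remaining = iter(list(values))  # copy, so later changes to the input list have no effect

    def run() -> A:
        try:
            return next(remaining)
        except StopIteration:
            # The list ran out: every further run fails with SimEnded.
            raise SimEnded() from None

    return run


read_line = simulate(["first", "second"])
results = [read_line(), read_line()]

try:
    read_line()
    ended = False
except SimEnded:
    ended = True
```

In a test, something like read_line could stand in for a real stdin-reading task, with the list holding the scripted inputs.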
I also separately realized it would be possible to use Stored on opaque types defined in nested scopes, and those couldn't compile to plain global variables.
In fact, I'm not sure what they could compile to :thinking:
e.g.
\arg ->
Foo := U8 implements Stored { init: 0 }
@Foo num <- Stored.get |> Task.await
...
what would that compile to?
I guess one possible answer is that it's the same global mutable variable, which might be surprising but maybe also is fine
I thought fundamentally stored was going to compile to global mutable variables. That seems like the best way to make it performant and is fundamentally what Stored is.
Hmm though if roc can be called by multiple threads, it would need to be protected by a mutex or rwlock of some sort.
Or be thread local
yeah I think it needs to be protected by a lock
if it's threadlocal it limits what use cases it can be used for
also threadlocals require libc-like dependencies, which may not always be available
also if it's global and protected by a lock, for some cases (e.g. integers) we can theoretically potentially optimize into single instructions like atomic load/store/etc
thinking about this some more, I think this is actually a subtly error-prone thing to use in the main non-testing use case we've discussed, namely AWS-like APIs
if your entire webserver has exactly one AWS account, then it's fine
however, as soon as you have multiple AWS accounts, or - much, much worse - different threads want to use different request handlers - then suddenly this design becomes a source of errors and possibly also security vulnerabilities
because it's based on one global for the entire type, not one global value per request handler (or for that matter per thread, but the host is free to put different callbacks even within the same request handler on multiple different threads, so it's not like threadlocals would help here)
Is there a link to this proposal somewhere? I've seen this Stored thingy referenced a few times now and I'm curious :)
yes! It's in https://docs.google.com/document/d/1-h9bNNCLuYV2wSvjQA58SsGHOJivH9NHGr4wU_VF5I0/edit?usp=drivesdk
relatedly, I had an idea for how to make use cases like AWS SDK more ergonomic without needing Stored
the basic idea is to have the AWS SDK package require two more module params along the lines of:
get : Task token [NotFound] where token implements Decoding,
set : token -> Task {} * where token implements Encoding,
and then server platforms can provide a per-request-handler key/value store where both the keys and values are List U8, and the API exposes them in terms of Encoding and Decoding
so then as the application author, when importing the AWS SDK package, I give it module params which use this key-value store, but I give it functions that I've prepopulated to use my AWS-specific key
that way, the AWS SDK package gets the storage it needs, but - crucially - other (potentially malicious) packages can't access it at all
and the AWS package can't access storage I'm using for other things either, because I'm only passing it sandboxed functions which know how to access my application's AWS keys in the per-request-handler key/value store
so it's very slightly less ergonomic than Stored in that I have to write these two sandboxed one-liner functions once in my entire code base, and then specify them whenever I import the AWS module, but that's still far more convenient than having to thread the AWS temporary token through everywhere
and if I take the ergonomics delta between that and Stored, it's so small I think it would be hard to justify introducing Stored using that as a major motivating factor
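Here is a rough Python model of the sandboxing idea (not Roc module params, which can't be shown runnable here). Everything named below is hypothetical: RequestStore stands in for the platform's per-request-handler bytes-to-bytes store, sandboxed builds the two one-liner functions the application author would write, AwsSdk stands in for the package that receives them, and JSON stands in for Encoding/Decoding.

```python
import json
from typing import Any, Callable, Dict, Optional, Tuple


class NotFound(Exception):
    """Stands in for the NotFound error tag."""


class RequestStore:
    """Hypothetical platform store: keys and values are both raw bytes."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def read(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def write(self, key: bytes, value: bytes) -> None:
        self._data[key] = value


def sandboxed(store: RequestStore, key: bytes) -> Tuple[Callable[[], Any], Callable[[Any], None]]:
    """Build get/set functions that can only ever touch one key chosen by the app."""

    def get() -> Any:
        raw = store.read(key)
        if raw is None:
            raise NotFound()
        return json.loads(raw.decode("utf-8"))

    def set_(token: Any) -> None:
        store.write(key, json.dumps(token).encode("utf-8"))

    return get, set_


class AwsSdk:
    """Hypothetical package: it receives get/set and never sees the store or the key."""

    def __init__(self, get: Callable[[], Any], set_: Callable[[Any], None]) -> None:
        self._get = get
        self._set = set_

    def temp_token(self) -> str:
        try:
            return self._get()
        except NotFound:
            token = "fresh-temp-token"  # placeholder for a real token request
            self._set(token)
            return token


store = RequestStore()
aws_get, aws_set = sandboxed(store, b"my-app/aws-temp-token")
aws = AwsSdk(aws_get, aws_set)
first = aws.temp_token()   # not found yet, so a token is created and stored
second = aws.temp_token()  # served from the store this time
```

The package can cache its token, but since it only holds the two closures it has no way to name any other key, and a different package given different closures can't reach this one.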
I don't quite follow this. I think I'm almost there, but the per-request part has me a bit confused. Is the intent here a cache that the platform provides to the application as a Task? That token value is going to be the same globally, so if it's set using set then every get will return that same token even across different threads.
kinda yeah
Without Stored, what would the equivalent in-memory cache look like if I wanted to store session keys and some meta data like userId or user roles?
If we are using Encoding and Decoding so that we are passing List U8 bytes to the platform for safe storage, I assume we are very quickly going to want that compact binary encoding for "all the things" so I can store any data and recover it quickly.
Stored would be a builtin, it couldn't possibly know about request handlers and couldn't possibly work this way, even though this is the least error-prone way to do it
Luke Boswell said:
Without Stored, what would the equivalent in-memory cache look like if I wanted to store session keys and some meta data like userId or user roles?
it would be different in that it wouldn't be per-request-handler, but rather something more global. Kind of a different use case honestly
actually for AWS in particular I can see a lot of applications preferring it to be global. What's cool is that this design works for both use cases, and the library has the same API either way! :smiley:
Could stored be used for something like deno kv or would a more specific platform api be better for that use case?
Should those get and set functions also take a key?
I assumed that was just a specific API for the fictitious AWS package.
If we are making a (platform independent) package for KV then I assume it would require a different API, maybe:
get : U64 -> Task value [NotFound] where value implements Decoding,
set : U64, value -> Task {} * where value implements Encoding,
Not sure if the key should just be a U64 or maybe something like where key implements Eq, Ord or something
yeah exactly
the platform would expose an API with both keys and values but the AWS package wouldn't care what key you're using (which is actively important for security!)
Is it too early to add an effect like this to the platforms? I'd like this for webserver in particular. I can use JSON for now and use it as a cache, assuming that's faster than starting a child process to call into sqlite from command line.
I think it's fine to add it now! :thumbs_up:
I don't think the platform should expose the encoding
I'm not sure about U64 keys though, might not be enough for some use cases :thinking:
maybe it's fine though?
I guess worth trying (since it's faster than e.g. keys that just implement Encoding) and seeing if it ends up being a problem in practice :big_smile:
I really like the simplicity of this approach, and it sounds wise to try this before something like Stored that might be harder to take back.
Other question... Can we do better than encode? Maybe just a box of anything? Though that isn't type safe... Hmm
Doesn't matter for something that is a string anyway, but feels unnecessary
It could be safe if you have a Key type with a type variable, couldn’t it?
but I guess you’d have to guarantee that the same key isn’t used twice with different types
Are keys global for the whole program or namespaced somehow?
I understand the platform can namespace per request or whatever makes sense, but you might still have conflicts across modules
I mean you could do this, but I'm not sure how the key type having a value type variable somehow fixes any type safety really. Also, it would mean that the key would need to store a dummy value, which is strange. I mean I guess it could be defined like an option and be the nothing case, but still strange
get : Key value -> Task (Box value) [NotFound]
set : Key value, Box value -> Task {} *
I understand the platform can namespace per request or whatever makes sense, but you might still have conflicts across modules
Probably wrap get and set when passing them to a module. That way you could make a wrapping key type. Would require a more complex key type than just an integer.
Yeah, so I’d have an init function that you use like:
import pf.State
key : State.Key (Result Str [Pending])
key = State.init "unique key here" (Err Pending)
and then you can only get/set using the right type
But it would break if the same key is used somewhere else with a different type
It’d be nice to have the platform/language come up with the key using a sequence or something
I might just be reinventing Stored :big_smile:
Using Encode/Decode is probably good enough for this
I'm a little confused where this ended up, are U64 keys ok? I went with that because that's what we use for Hash and figured you could just hash anything on the platform side before passing to the host. So the Roc to Host interface is just U64 and List U8.
Yeah I'm confused about how U64 could possibly not be enough, at least on a 64-bit machine.
I think it could be not enough in a scenario where someone is using arbitrary strings (e.g. domain names) for the keys, but maybe that's just not something that should be supported - like if you want that, store a dictionary in the value under a hardcoded U64 key
Oh I see what you mean.
Richard Feldman said:
ok cool, so in that case there's no extra wrapping necessary. Here's Aws.roc:
interface Aws
    exposes [SecretKey, storeInS3]
    imports [Http]

SecretKey := Str

TempToken := [Uninitialized, Initialized Str] implements Stored

secretKeyFromStr : Str -> SecretKey
secretKeyFromStr = @SecretKey

storeInS3 : SecretKey, DataForS3 -> Task {} Http.Err
storeInS3 = \secretKey, data ->
    tempToken <- getOrInitTempToken secretKey |> Task.await
    # use the temp token to call S3, since that's what S3 requires
    # if the S3 response indicates the temp token was expired,
    # run getOrInitTempToken again and re-run the S3 request
    # with that new token

# note: this is not exposed!
getOrInitTempToken : SecretKey -> Task Str Http.Err
getOrInitTempToken = \@SecretKey secretKey ->
    @TempToken tempToken <- Stored.read |> Task.await
    when tempToken is
        Uninitialized ->
            # assume getNewTempToken has been implemented
            str <- getNewTempToken secretKey |> Task.await
            {} <- Stored.write (@TempToken (Initialized str)) |> Task.await
            Task.succeed str

        Initialized str -> Task.succeed str
If I understand the AWS example correctly, it would not necessarily use the token fitting the secret key. As I see it, the storeInS3 function returns a Task that will use whatever token is stored at the time the task is executed. This token does not necessarily match the secret key. The implementation could be fixed by storing a Dict from secret key to token.
At the end, as @Brendan Hansknecht said, this impurity of using the wrong token is not reflected in the Task type.
@Richard Feldman Do you know https://zio.dev? I can't find anything about ZIO here. It's an effect system for Scala and, unlike the current Task type, has a third type parameter called Environment. If you'd like to include state in the execution of your ZIO type, you have to specify a ZState type in the Environment.
So essentially ZIO is an example of a 3-arg Task type @Brendan Hansknecht suggested.
Same for effect in TypeScript (https://effect.website/)
that example is old - see the thread starting here for the revised idea:
Richard Feldman said:
relatedly, I had an idea for how to make use cases like AWS SDK more ergonomic without needing Stored
it doesn't need a third argument to Task (which we've had in the past and intentionally decided to remove because it didn't seem to be worth the complexity it introduced) but also doesn't have a concern with token security!
Last updated: Jun 16 2026 at 16:19 UTC