Cache a pure function · ideas · Zulip Chat Archive

Stream: ideas

Topic: Cache a pure function

Oskar Hahn (Dec 29 2024 at 20:45):

This year on AoC there where some days, that had to be solved by caching the result of a function.

I would be nice, if this would be easier with Roc. You would guess, that it is easy for pure functions, but it gets difficult on recursive functions.

If you want to cache the results for this function:

foo = \argument, n ->
    if n == 0 then
        argument
    else
        foo (doStuff argument) (n-1)

To cache this, you have to give the cache to the function, and I don't know, if you can write it tail recursive.

foo = \argument, n, cache ->
    when Dict.get cache (argument, n) is
        Ok v -> (v, cache)
        Err KeyNotFound ->
            (v, updatedCache) =
                if n == 0 then
                    argument
                else
                    foo (doStuff argument) (n-1) cache
            newCache = Dict.insert updatedCache (argument, n) v
            (v, cache)

I don't know, what a good syntax would look like. I know, that there are currently no decorators:

foo = @cached \argument, n ->
    if n == 0 then
        argument
    else
        foo (doStuff argument) (n-1)

foo = @|argument, n| ->
    if n == 0 then
        argument
    else
        foo (doStuff argument) (n-1)

And that you don't like for comments to change the functionality:

# !cached
foo = \argument, n ->
    if n == 0 then
        argument
    else
        foo (doStuff argument) (n-1)

But since it is a strong argument for a pure function, that you can cache it, it would be nice, if there was a way to do it in Roc.

Brendan Hansknecht (Dec 29 2024 at 20:54):

Yeah, theoretically, a minor wrapper could be used for non-recursive functions, but for recursive functions there is no good solution that I can think of.

Dawid Danieluk (Dec 29 2024 at 21:03):

Couldn't it be 'special' function that would only cache for params used in callsite?

fib = \number ->
   ...

bar = \_ ->
   val = cached(fib 150)

this would cache fib only for value of 150, if you wanted to cache it recursively you'd have to write fib like this

fib = \number ->
   if number <= 1 then
      number
   else
      cached(fib number-2) + cached(fib number-1)

I think it gives plenty of control and doesn't introduce new keywords for roc users.
Every talk about functional programming mentions how cool pure functions are so having something like that to actually use their advantages would be pretty nice. I don't think roc has optional params, if it did it would be nice to add optional age here aswell.

BTW. I think that this code would make the shortest memoized fibonacci in any language and be pretty cool party trick while solving AOC style problems

Brendan Hansknecht (Dec 29 2024 at 21:13):

This leads to a very big question: can roc apps have global state?

Currently the answer is no. When explicit threading a cache dictionary around the answer is also no. With something like an automatic cache annotation or function the answer is unclear.

Dawid Danieluk (Dec 29 2024 at 21:20):

I think it's pretty similar to opportunistic mutation. Does Roc support mutation? Well, no but also it kind of does. From end user perspective you can think of Roc as immutable.

This would be similar. Does Roc have global state? Well, it could for some features, but it should only be used with pure functions so I think it doesn't matter to the end user, right? There's no difference between calculating value of pure function and getting it from cache.

I really see opportunistic mutation and cache for pure functions as pretty smart optimizations that Roc could do underneath and even if they're there end users could still think of Roc as immutable stateless language (if that makes sense).

Brendan Hansknecht (Dec 29 2024 at 21:21):

This is different though cause global state affects platforms and linking

Brendan Hansknecht (Dec 29 2024 at 21:21):

Also leads to questions around multithreaded apps

Dawid Danieluk (Dec 29 2024 at 21:31):

Yea I guess implementing it can be hard, not going to argue about that. I've no clue how Roc code is structured internally but I thought that this feature should be on Roc side and not on platform side (you probably see some issues with that and I'm not very good low level programmer so I don't have much insight to offer here).

Maybe my mental model is incorrect, but in my mind platforms are responsible for effects and memory allocations. Platforms already need to expose methods to allocate memory anyway so I thought having such cache wouldn't require any changes in platforms.

Initially I've envisioned some sort of actor model with one actor per function cached (so if cached was used on fib and bar roc would spawn 2 actors holding something like Map<Params, Result>).

Brendan Hansknecht (Dec 29 2024 at 21:35):

2 per thread? Like if it was used in basic webserver?

Dawid Danieluk (Dec 29 2024 at 21:39):

each actor in it's own thread (not sure how basic-webserver works, I've only used basic cli for now)

Richard Feldman (Dec 29 2024 at 21:44):

some relevant things to note:

although this is global state, in the context of hot code loading, the fact that it's specifically cache state means we can discard it without affecting correctness at all. This makes it different from other types of global state!
I definitely don't think we should add features to the language if advent of code is the only real-world motivating use case :big_smile:

Dawid Danieluk (Dec 29 2024 at 22:08):

AOC was just the first thing that came to my mind, because most modern apps aren't very much CPU bound. I think it would be generally useful and would actually show that having functions be pure functions has benefits.
I'm pretty sure that you specifically said in one of your talks that one benefit of pure functions is that they're easy to cache. It would be nice to be able to use such cache in pure function land instead of caching them using platform effects (and turning fib into fib!).

It's hard to come up with specific examples because you wouldn't need such cache very often, but if you did it would be really nice to have it.
Some examples of when it could be useful:

path finding
some math algorithms (fib, factorial, probably plenty of bigint math in cryptocurrency land)
chess engines

It doesn't have to be Roc feature (can be moved to platforms), but I think that it would fit well with 'modern functional language' (so language that actually uses advantages of pure functions).

Would it be worth it to spend time on it now though is different question. I think that Roc is undergoing plenty of changes atm and this is much less important than language stability, concurrency etc, so it depends on how hard it would be to achieve and if it's something that maintainers would found at least interesting to explore :-)

Brendan Hansknecht (Dec 29 2024 at 22:20):

I'll have to think more if this has implications for the surgical linker or not.

Also, I'm definitely not sold that we can make a solution that will be nice in both single thread and multithreaded contexts without being problematic.

Oskar Hahn (Dec 29 2024 at 22:52):

A cache is very easy for non recursive functions.

Many recursive functions are written, to simulate a for loop.

In a world where roc supports loops, will it be easy to cache such a function?

Dawid Danieluk (Dec 29 2024 at 22:55):

I think that pretty easy way to hack such feature would be using in-memory RocksDB (which should play nice in both single threaded and multi threaded environments). It would offload many issues with implementation to already battle tested db and such backend could always be changed in the future.
It also has TTL functionality for cache invalidation and Roc using Rocks has some meme potential aswell :P

No idea what implications for linker that would bring though (and it's extra dependency that you might not want).

Brendan Hansknecht (Dec 29 2024 at 22:59):

That is an exceptionally heavy handed solution. Also, anything with a DB will have a lot more overhead than is necessary for a simple cache

Dawid Danieluk (Dec 29 2024 at 23:01):

Just read that rocksdb has pretty high memory usage even for empty DB (so in this case SQlite would be better).
It's true that it's pretty heavy, I was thinking about it more as prototype to hack such feature and test how it feels (and whether it's worth it to have it in the language).

Brendan Hansknecht (Dec 29 2024 at 23:02):

I totally thinking using sqlite or rocks db in roc as a cache is reasonable, but I think that level of cache should definitely be explicit and via platform effects.

Dawid Danieluk (Dec 29 2024 at 23:09):

Yea, not sure why I went to rocksdb, should have proposed some off the shelf cache crate like moka :P

Luke Boswell (Dec 29 2024 at 23:43):

Could this kind of capability be an opt-in thing by platforms?

I imagine when we pass in the allocators (in the future design), there could be a flag/functions passed in for roc to use a generic cache in the platform. The platform is responsible for managing any multithreading etc.

Luke Boswell (Dec 29 2024 at 23:44):

If the platform passes in Chache::NotAvailable or similar then roc just doesn't cache and instead recalculates.

Luke Boswell (Dec 29 2024 at 23:47):

Maybe something like this is provided to roc

roc_set_cache : U64, List U8 -> Result {} [NotAvailable]
roc_get_cache : U64 -> Result (List U8) [NotAvailable]

Brendan Hansknecht (Dec 30 2024 at 02:23):

Quite possibly, but if you don't know how it works under the hood (in mem vs on disk, total size, retention policy) then it isn't really useful except in the most naive cases.

Scally (Dec 31 2024 at 13:46):

I've also ran across this in AoC and it was one of the very few cases where I switched away from Roc to Python because I was pretty confident memoization would very cheaply solve the problem but it was terribly complicated to implement in Roc (due to the recursive nature of the function).

This was especially frustrating since it should be a major advantage of a pure functional language and Richard said in multiple talks that basically a pure function could boil down to a lookup table so it seems rather straight forward to be able to dynamically build that lookup table when running the program.

In terms of purity, immutability and side effects - a memoization would ideally have a mutable datastore in the background and writing to that cache is kind of a side effect, however, in terms of purity the guarantee is still given that the same input results in the same output, just the way to reach that output changes (table lookup vs calculation).
Also the mutable state and the side effect would be internal detail of the implementation and not exposed at all and be not perceivable from anywhere.

Since multithreading was an issue - as far as I understand there should not necessarily be an issue with race conditions etc. since in worst case a function that might already be cached would be still ran again without cache but still yield the same result (although that would make runtime performance a bit unpredictable). In theory every thread could have it's own memoization structure and it should be possible to merge the caches of different threads since with functions being pure they must arrive at the same results.
The only issue I could see is having some data structure that allows parallel writes / merges without breaking the whole thing.

Brendan Hansknecht (Dec 31 2024 at 19:07):

I've been thinking about this more here are a few leading thoughts:

Let's totally ignore the multithreaded case for now. Any caching will be single threaded or thread local. Roc does not yet have any sort of atomic refcounting infrastructure, so there is no safe way to implement multithreaded caching. (This is not completely true. If you cache outside of roc, like in sqlite or rockdb, it can be multithreading safe, but that should be left only to platforms).
If we are adding caching to roc, it almost certainly needs two variants. One with unlimited memory. Another that will limit the number of items cached. (both are possible in roc, but a doubly linked list for a tradition limited size lru cache is not implementable in the normal way)
This could be implemented in pure roc without any sort of magic, but is a bit verbose. It could be implemented nearly automatically for a flat function, but it can not be implemented automatically for recursive functions.
If implemented in pure roc, the cache state will need to be passed around manually and will not be able to live beyond what the platform allows.
If implemented via the compiler, we could automatically apply to any function, the state could be implicitly added, and the state could be a threadlocal static to automatically make it last longer.
This would be the first case of roc holding any sort of state if done automatically by the compiler. Maybe that is ok, but I'm not sold. That definitely falls into the domain of the platform today.

Brendan Hansknecht (Dec 31 2024 at 19:08):

My current thought is that an automatic transform that still requires the user to pass around the cache is likely what fits roc best. We can re-evaluate in the future if it is valuable for roc to implicitly pass around and own the state.

Brendan Hansknecht (Dec 31 2024 at 19:12):

So it would look something like:

cached_foo = some_compiler_magic(foo)

var cache_ = Cache.empty() # or Cache.with_max_capacity(2000)

(cache_, out) = cached_foo(cache_, ... other args for foo)

Brendan Hansknecht (Dec 31 2024 at 19:16):

If cache has no max capacity, it is just a Dict. If it has a max capacity, not sure how we will implement it. I'm not really a fan of traditional lrus with doubly linked lists (and they don't necessarily work in roc). Maybe a bucketed dictionary where each bucket is a mini flat lru...but that is just an implementation detail that slightly changes the cache hit rate.

Dawid Danieluk (Dec 31 2024 at 20:10):

I personally would prefer solution handled by compiler as it can be more ergonomic (no need to carry cache around) and some syntax (or keyword) around it. I think it would be really fitting for modern functional language.
The second thing I'd reach for would be using some platform cache (I'll probably want custom platform for some things anyway so adding cache should be easy).

Passing around cache doesn't seem very fun. Btw in your example couldn't some_compiler_magic just be a higher order function (takes in function and returns function that will first check the cache, then if needed calculate result, store it in cache and also return calculated value as your example)? Is it just 'bad' name or is there some reason it's not possible with regular Roc?
This solution seems pretty 'imperative' (you need to create wrapper functions, pass around cache) and imao it takes away some elegance of Roc code.

That said I'd totally understand implementing Cache as library (maybe even 'official' library) as it:

would not require changes to the language
could be example of making a package (would make a good blog post so people can contribute to the ecosystem once 0.1 lands)
probably would be much faster to implement

I'd probably be happiest with pseudofunction (like cached I've suggested) but that can always be reevaluated in the future (I'm sure there are more pressing issues, maybe it's worth to think about compiler solution after some multithreading support as it could change the design anyway).

Brendan Hansknecht (Dec 31 2024 at 20:23):

takes in function and returns function that will first check the cache, then if needed calculate result, store it in cache and also return calculated value as your example)?

This doesn't correctly work for recursive functions. It will only apply to the topmost calls to the function and not to the recursive calls. So it has to be compiler magic

Brendan Hansknecht (Dec 31 2024 at 20:27):

I personally would prefer solution handled by compiler as it can be more ergonomic (no need to carry cache around)

Sure, I think we should just start with it being explicit and then later re-evaluate making it implicit after we use the explicit version in practice.

Dawid Danieluk (Dec 31 2024 at 21:02):

Hmm, compiler magic that swaps function calls with another variant using cache (if I understood that correctly) might be harder to explain/understand for users than: "we have globally available cache that can store results of pure functions".
At this point I'd rather change the implementation of the function to use cache explicitly

fib = \n, cache ->
    when Dict.get cache n is
        Ok v ->
            (v, cache)

        Err _ ->
            if n < 2 then
                (n, Dict.insert cache n n)
            else
                (fib2, fib2cache) = fib (n - 2) cache
                (fib1, fib1cache) = fib (n - 1) fib2cache
                val = fib1 + fib2

                (val, Dict.insert fib1cache n val)

Unless such swapping functions could be used for other features (like simulating output of effectful functions, which would be totally a killer feature and I think I've seen some talks about it some time ago) idk if cost/benefit ratio of it would be worth it.

Brendan Hansknecht (Dec 31 2024 at 21:15):

Possibly

Brendan Hansknecht (Dec 31 2024 at 21:15):

Though clearly from the comments above on advent of code, people don't want to write that themselves

Brendan Hansknecht (Dec 31 2024 at 21:15):

Also, it would likely have an lru cache on top of just a dictionary to allow limiting the max size.

Brendan Hansknecht (Dec 31 2024 at 21:16):

But I guess that could just use a cache type instead of a dict type and still be equivalent in implementation to that

Brendan Hansknecht (Dec 31 2024 at 21:17):

I don't think it would be harder to explain. It would look just like a higher order function and would do what people expect it to do. It would be equivalent to an @cached decorator in python just with manual pipelining of the cache.

Richard Feldman (Dec 31 2024 at 22:06):

another few random thoughts:

it wouldn't necessarily have to be restricted to pure functions. Obviously if you did it with an effectful function, it wouldn't rerun the effect, but that might actually be exactly what you're looking for in some situations.
I'm generally against decorators and annotations, but even a version of this that was integrated into the compiler wouldn't need that. It could just be a keyword, like dbg or return, which takes an expression. In this case the expression would have to be a function, and it would evaluate to a function of the same type which had caching behavior built in.
I'm generally open to this being something that exists in the compiler someday, but I think there are a lot of questions about how the cache works and what specific problems would be solved by it that need to be sorted out first. For example, how are the function's keys checked for cache hits? Is it based on hashing, or equality? How is the cache bounded? Number of entries? What's the eviction policy? etc.

Brendan Hansknecht (Dec 31 2024 at 22:09):

Ok, I'm looking at the surgical linker side of this now. I think we could make this implicit without complications to the surgical linker. It will need to be expanded, but I think in a minor way.

So we can support creating an empty dictionary constant as a threadlocal and then updating it in place on each call to a function. It does not generate any problematic relocations.

It still has some major complexities around platform interactions, data ownership, and multithreading, but one piece is fine.

A simple example of a complexity is: in a plugin world, who frees the cache when closing one plugin to load another. I guess it would have to be an exposed function that all platforms must call before closing a shared library. Have to call just in case there are any allocations still kept alive by the cache.

Richard Feldman (Dec 31 2024 at 22:18):

another question is whether threadlocal storage is even the desired behavior

Richard Feldman (Dec 31 2024 at 22:19):

for example, if you have a platform that's suspending and resuming work across multiple threads, you could get more cache misses than if they were behind a lock or something

Brendan Hansknecht (Dec 31 2024 at 22:21):

Yeah.....threadlocal is simple but may not be wanted

Brendan Hansknecht (Dec 31 2024 at 22:21):

For example in basic-webserver, you almost certainly want a cache that can be used from many threads at the same time

Brendan Hansknecht (Dec 31 2024 at 22:24):

If I were to prescribe something generic, it would be some sort of sharded cache that can that uses shards to reduces total locking. It also would probably use a smaller buckets for the lru behaviour instead of a global or sharded lru. But bucket size would depend on how often collisions lead to dropping data too early.

Richard Feldman (Dec 31 2024 at 22:28):

one design that answers a bunch of implementation questions in a nice way is:

each function that gets used with this keyword gets a pointer global variable with a lock around it, initialized to null
on first use, it does a roc_alloc to create the cache and store its address in the global var, so the platform knows about the memory being used for it via existing mechanisms.
the cache key consists of not only the arguments but also the capture, so one global var per function is sufficient

Brendan Hansknecht (Dec 31 2024 at 22:28):

Or, I guess the shard size and bucket size could be the same now that I think about it more. And probably would be pretty small overall.

So the workflow would be hash first. Lock a specific buck. Check the bucket for a hit. On hit reorder the bucket into lru order. On miss, insert into the bucket removing the lru item.

This way, only a small number of items are locked at any one time to reduce contention. As long as hashing is roughly random, this should have good perf. The buck size may need to be increased if too many collisions are happening.

Brendan Hansknecht (Dec 31 2024 at 22:29):

the cache key consists of not only the arguments but also the capture, so one global var per function is sufficient

Captures are just arguments in roc once we get to mono. So that seems easy to do.

Brendan Hansknecht (Dec 31 2024 at 22:30):

each function that gets used with this keyword gets a pointer global variable with a lock around it, initialized to null

I think one lock per cache would be a big mistake. Way too much contention

Brendan Hansknecht (Dec 31 2024 at 22:33):

on first use, it does a roc_alloc to create the cache and store its address in the global var, so the platform knows about the memory being used for it via existing mechanisms.

This likely isn't enough without a lot of hassle. The platform generally does not track every allocation and it would have no way to differentiate a cache allocation from other allocations. While it could just have roc drop the model to get roc to free all of its data, it couldn't do the same for the cache.

Brendan Hansknecht (Dec 31 2024 at 22:34):

Also, for any of these ideas to work (expect for threadlocal or manual threading) a lock is not enough. It also needs atomic refcounts for the data stored in the cache.

Richard Feldman (Dec 31 2024 at 22:41):

Brendan Hansknecht said:

each function that gets used with this keyword gets a pointer global variable with a lock around it, initialized to null

I think one lock per cache would be a big mistake. Way too much contention

hm, what else would the lock be on? :thinking:

Brendan Hansknecht (Dec 31 2024 at 22:57):

You hash first, then you only lock a section of the cache instead of the full cache.

Brendan Hansknecht (Dec 31 2024 at 22:57):

That way many queries can happen in parallel

Richard Feldman (Dec 31 2024 at 23:06):

ah, I see...I guess as long as it's fixed size and never needs to resize, that could work!

Brendan Hansknecht (Dec 31 2024 at 23:07):

You can do it for resizable as well. Just each shard resizes separately. (Or new shards are added to resize)

Brendan Hansknecht (Dec 31 2024 at 23:08):

But yeah, works more cleanly with constant size

Richard Feldman (Dec 31 2024 at 23:08):

Brendan Hansknecht said:

on first use, it does a roc_alloc to create the cache and store its address in the global var, so the platform knows about the memory being used for it via existing mechanisms.

This likely isn't enough without a lot of hassle. The platform generally does not track every allocation and it would have no way to differentiate a cache allocation from other allocations. While it could just have roc drop the model to get roc to free all of its data, it couldn't do the same for the cache.

for what it's worth, I don't think this would be significantly different from other things Roc returns to the host. Like the host calls the Roc function, some allocations and deallocations happen, some number of allocations are left over from things returned...if the host doesn't have some way to identify all the allocations that happened inside Roc, then solving both problems feels like it would be the same category of solution

Brendan Hansknecht (Dec 31 2024 at 23:09):

A little different. It is trivial for a platform to expose an extra function that deallocates a model. Just takes a model as an arg and returns nothing.

Brendan Hansknecht (Dec 31 2024 at 23:10):

But yeah, not a big deal to also expose a special function to the platform to clear these caches.

Richard Feldman (Dec 31 2024 at 23:34):

yeah the main point is that from a lifetime perspective, the host has to take action on both at the same time, and the action is essentially the same in both cases :big_smile:

Richard Feldman (Dec 31 2024 at 23:34):

so shouldn't be more complex, at most an extra function call at the time it was already doing other function calls

Brendan Hansknecht (Dec 31 2024 at 23:39):

Yeah

Brendan Hansknecht (Dec 31 2024 at 23:39):

So I guess that is actually fine

Brendan Hansknecht (Dec 31 2024 at 23:39):

Just something host authors have to be aware of

Brendan Hansknecht (Dec 31 2024 at 23:42):

Also, If we add lazy propagation of the atomic bit on data load (for List.get, Box.unbox, and any recursive tag load), I think we could have really solid solution to optionally atomic refcounting. Mixed with #7444 the perf would be really solid even for the single threaded case. If wanted, we can also add a flag to opt into/out of atomic refcounting. Could add a platform header config field to decide if it is needed or not.

Brendan Hansknecht (Dec 31 2024 at 23:43):

So maybe we should think about the possibility for a mulithreaded cache with atomic refcounting of stored data.

Richard Feldman (Jan 01 2025 at 00:20):

yeah definitely sounds interesting!

Richard Feldman (Jan 01 2025 at 00:20):

I do wonder about what retention and hashing policies make sense

Brendan Hansknecht (Jan 01 2025 at 00:34):

I think some form of sharded lru should be fine.

Brendan Hansknecht (Jan 01 2025 at 00:40):

The biggest question in my mind is how sharded. The more sharded the less lock contention but also the shorter the lru history (though if the hash is perfectly distributed, it is still the same theoretically. Not sure how much it veers in practice).

Also, if you make the shard size very small (like shards of 32 elements for example), you would be able to avoid the doubly linked list for the lru cache and instead use a flat list. This would save a solid amount of memory. Though depends on the memcpy cost of the shard size number of indices. It would depend much more on the hash distribution, but might be a lot faster and essentially contention free in practice.

Brendan Hansknecht (Jan 01 2025 at 00:40):

But all of those kinds of optimizations could be tested later.

Brendan Hansknecht (Jan 01 2025 at 00:41):

We can start with a naive single lock or large shard lru cache.

Richard Feldman (Jan 01 2025 at 01:28):

should it be configurable how much history is saved?

Brendan Hansknecht (Jan 01 2025 at 01:29):

I think so.

Brendan Hansknecht (Jan 01 2025 at 01:29):

Otherwise it might be too oom prone depending on the use case.

Last updated: Jul 23 2026 at 13:15 UTC