number ranges without heap allocations · beginners

Can someone explain a way to make a generator for a long array of numbers that would just, well, make numbers? List.range is no good as it seems to allocate the actual list (even if I will only ever walk the thing)

Luke Boswell (Aug 26 2024 at 18:48):

Alexander Pyattaev (Aug 26 2024 at 18:51):

which part of it? it still uses List.range to make the numbers, so it will OOM if I want to loop over 10 billion numbers

Sam Mohr (Aug 26 2024 at 19:02):

If you want to loop over 10 billion numbers in a functional language, you'll want to write a recursive function that is tail call optimized into a "while loop" at runtime.

Alexander Pyattaev (Aug 26 2024 at 19:04):

Well I do not want 10billion, I just want 1 billion, and I get a segfault with this...

app [main] { pf: platform "https://github.com/roc-lang/basic-cli/releases/download/0.14.0/dC5ceT962N_4jmoyoffVdphJ_4GlW3YMhAPyGPr-nU0.tar.br" }

import pf.Stdout
import pf.Task

loop = \i, max ->
    if i < max then
        i + loop (i + 1) max
    else
        i

main =
    x = 1000 * 1000 * 1000
    z = loop 0 x
    # z = List.range {start: At 0, end: Before x} |> List.walk 0 Num.add
    Stdout.line! "Hello, World! $(Num.toStr z)"

Alexander Pyattaev (Aug 26 2024 at 19:05):

and yes, I am fully aware that this not what roc was designed to do, but I think it is still relevant for the language that aims at high performance to not allocate arrays on the heap that will be insta-discarded after 1 use. even python is not that wasteful =)

Alexander Pyattaev (Aug 26 2024 at 19:06):

Sam Mohr (Aug 26 2024 at 19:06):

Well, that's not gonna be tail-call optimized, since the looping happens in the middle of the function

Alexander Pyattaev (Aug 26 2024 at 19:06):

Brendan Hansknecht (Aug 26 2024 at 19:08):

It isn't a tail call cause addition has to happen after the next loop call returns.

Alexander Pyattaev (Aug 26 2024 at 19:09):

Brendan Hansknecht (Aug 26 2024 at 19:09):

loop = \sum, i, max ->
    if i < max then
        loop (sum + i) (i + 1) max
    else
        sum

Sam Mohr (Aug 26 2024 at 19:10):

Yeah, probably not. It might be segfaulting because of an addition overflow? I'd have to gdb it

Brendan Hansknecht (Aug 26 2024 at 19:11):

Sam Mohr (Aug 26 2024 at 19:11):

Sam Mohr (Aug 26 2024 at 19:12):

So something in our custom assembly builder isn't handling either a stack overflow or a number addition failure, probably the first

Alexander Pyattaev (Aug 26 2024 at 19:12):

Alexander Pyattaev (Aug 26 2024 at 19:14):

rustc  count.rs
headhunter@hunter-laptop /tmp> ./count
499999999500000000
headhunter@hunter-laptop /tmp> cat count.rs
fn main(){
   let s:u64 = (0u64..1000*1000*1000).sum();
    println!("{s}");
}

Alexander Pyattaev (Aug 26 2024 at 19:15):

I'm assuming that roc defaults to u64 for num representation here (not sure that is true, and even if it does not the program would just run forever)

Brendan Hansknecht (Aug 26 2024 at 19:15):

Alexander Pyattaev (Aug 26 2024 at 19:16):

loop = \sum, i, max ->
    if i < max then
        loop (sum + i) (i + 1) max
    else
        sum

Alexander Pyattaev (Aug 26 2024 at 19:17):

but I would not even be here if I could have a generator for this sort of thing that I can then walk over

Alexander Pyattaev (Aug 26 2024 at 19:18):

I mean rust has (min..max) notation that just gives you an iterator and you do not even need to think about it, would not be out of place in roc imo

Brendan Hansknecht (Aug 26 2024 at 19:20):

Yeah, iterators/streams have been talked about a few times on Zulip. There are concerns about splitting the ecosystem between lists and streams. But they are the right solution for certain problems.

Alexander Pyattaev (Aug 26 2024 at 19:22):

well its either that or better optimization passes that can detect when list is abused as iterator so it does not get backed by memory.

Alexander Pyattaev (Aug 26 2024 at 19:22):

either way it is a massive gotcha, as I would have never in my life expected List.range to allocate

Brendan Hansknecht (Aug 26 2024 at 19:22):

Brendan Hansknecht (Aug 26 2024 at 19:24):

Today, has gotchas but forces users to be explicit about allocations. Technically iterators can be built up in userland if required. But yeah, it isn't a great state for usability.

Alexander Pyattaev (Aug 26 2024 at 19:25):

Alexander Pyattaev (Aug 26 2024 at 19:26):

this way you can have functions taking in things that have Iterator ability rather than specifically Lists

Brendan Hansknecht (Aug 26 2024 at 19:28):

Alexander Pyattaev (Aug 26 2024 at 19:40):

Brendan Hansknecht (Aug 26 2024 at 20:09):

An ability with a type variable. In the case of an iterator, it is needed to capture the element type.

In roc, you could make an I64Iterator ability, but not a generic Iterator ability

Alexander Pyattaev (Aug 27 2024 at 06:38):

Well in this case maybe some methods on numeric types would be nice to make ranges?

Alexander Pyattaev (Aug 27 2024 at 06:38):

batyoullfly (Aug 27 2024 at 11:13):

It would? You can have type variables in the functions belonging to an ability, and there are iterator interfaces in languages like C#/F# and Java where there are no higher kinded types

batyoullfly (Aug 27 2024 at 11:24):

Iterator implements next : {} -> Result a [NoMoreElements] something like this?

batyoullfly (Aug 27 2024 at 11:26):

Or I guess there’s a rule about the type variables needing to implement the ability

batyoullfly (Aug 27 2024 at 11:34):

batyoullfly (Aug 27 2024 at 11:44):

Oh, well I guess these aren’t methods with implicit access to an iterator object so you would need to have the iterator as an argument even if that wasn’t being enforced. Hmm.
Couldn’t you still do something like
Iterator implements next : iter, a -> { state: iter, elem: Result a [NoMoreElements] } where iter implements Iterator
and just leave it up to the implementation to ensure that a is in fact the element type of the type implementing the ability?

batyoullfly (Aug 27 2024 at 12:07):

I guess then the issue is you couldn’t actually use it very effectively. Is there any reason abilities shouldn’t be able to take type parameters themselves as long as they are concrete types?

Brendan Hansknecht (Aug 27 2024 at 16:55):

Notification Bot (Aug 27 2024 at 18:41):

Richard Feldman (Aug 27 2024 at 18:44):

what if we added this to Num? (the ...etc part is the rest of the configuration record that List.range uses)

Num.walkRange :
    { start: [At (Int a), Before (Int a), After (Int a)], ...etc },
    state,
    (state, Num a -> state)
    -> state

Sam Mohr (Aug 27 2024 at 18:49):

I think this would be a great allocation-free way to achieve the result, but points to an underlying issue that streams would solve in many ways. We have

Because we want to have iterators, but don't want to support higher-kinded types. So having some mechanism to achieve that result would lower the cardinality of our APIs while also avoiding allocations.

Sam Mohr (Aug 27 2024 at 18:50):

I will try to look around this weekend at potential solutions that won't require our language to open the floodgates for type abuse

batyoullfly (Aug 27 2024 at 20:05):

Would allowing abilities to have type parameters be a form of higher ranked types?

batyoullfly (Aug 27 2024 at 20:23):

Would it be rank 2? Because instead of a function in an implementation being able to satisfy a signature of a -> a by accepting a specific type and returning that specific type, it would have to be able to accept any arbitrary type?

Sven van Caem (Aug 27 2024 at 20:23):

I believe, to this end, the plan is for the compiler to eventually support deforestation/stream fusion, which would automatically combine successive list operations to avoid intermediate allocations. This way List.walkBackwards could be replaced by List.reverse |> List.walk with no loss in performance. In the meantime though, having this expanded family of list operations acts as a stop-gap to achieve the desired performance characteristics anyway.

Sam Mohr (Aug 27 2024 at 20:43):

Yes, that was a mechanism proposed by Rich. It would probably be better for users to not need to know the difference between List and Iterators for perf reasons, but there are some spots where we avoid giving "easy ways out", e.g. roc-unicode being a separate package instead of a standard library add with simple Str.len tools.

Sam Mohr (Aug 27 2024 at 20:45):

So yes, I'm okay with the expanded family for now, but it seems like if we could find a way to do iterators, that would be better than what we have. Meaning the issue isn't that iterators are the wrong solution, but that we don't know how to implement them in Roc's current type system.

Richard Feldman (Aug 27 2024 at 21:02):

for what it's worth, I spent a lot of time looking into various iterator designs and my conclusion was that "there are more functions exposed" was a better design than any of the ones I came up with

Richard Feldman (Aug 27 2024 at 21:05):

maybe another way to say it is that I prefer the downside of "this API could be a few functions smaller" to the downside of "there is a new entire chapter's worth of complexity in the tutorial on iterators"

Richard Feldman (Aug 27 2024 at 21:09):

that's not to say we should never do it, just that it's in the "explored various designs and none seemed like they were worth the complexity cost" state rather than the "hasn't really been investigated seriously" state :big_smile:

Sam Mohr (Aug 27 2024 at 21:19):

Yeah, that tracks. If there was an easy way to do it, it would've been in the language a few years ago.

Sam Mohr (Aug 27 2024 at 21:20):

Which is why I think maybe some macro to combine the various walk* functions in the compiler would be the way to go. Duplication in the compiler isn't bad if the stdlib API is nicer to use.

batyoullfly (Aug 27 2024 at 21:39):

Other than the number of functions for dealing with Lists, lack of any way to implement iterators still means there’s no generic way of writing a function that works with an unknown type of collection, right?

Brendan Hansknecht (Aug 27 2024 at 21:41):

Or maybe it is done for the future version of encode and decode that aren't implemented yet

Brendan Hansknecht (Aug 27 2024 at 21:46):

SequenceWalker state seq elem : seq, state, (state, elem -> state) -> state
LengthInfo : [Length U64, UnknownLength]

# In the encode ability for sequences:
    sequence :
        seq,
        LengthInfo,
        SequenceWalker state seq elem,
        (elem -> FutureEncoder state)
        -> FutureEncoder state where state implements FutureEncoderFormatting

Brendan Hansknecht (Aug 27 2024 at 21:46):

You then can pass List.walk or Set.walk into that and it would work. Theoretically could also pass a lazy Iterator.walk.

Brendan Hansknecht (Aug 27 2024 at 21:47):

That said, it is technically missing information. It doesn't know the type of the element in the container. It just knows you have an impl of Container.walk that generates an element type. Technically that walk function could be mapping the element type to something different

batyoullfly (Aug 27 2024 at 21:49):

So I guess similar to what I asked about earlier with having an iterator ability, you could do operations on the collection but you couldn’t, for example, implement a sum function that works for all collections of Nums

Richard Feldman (Aug 27 2024 at 21:51):

it seemed like something that the language "should be able to do" in the abstract, but when I got down to it I honestly had more and more trouble convincing myself it was actually important in practice as opposed to something that sounds important in the abstract :sweat_smile:

Brendan Hansknecht (Aug 27 2024 at 21:51):

Yeah, and you have to be careful about some weird cases. Like if you require elem as an input to specify the input type, that may lead to impossible cases the user doesn't have a way to generate an element type

batyoullfly (Aug 27 2024 at 21:52):

I was kind of thinking the same. Like in my mind it seems frustrating and strange that you wouldn’t be able to operate on generic collections of some specific type, but also I have never needed to do that really so it wouldn’t change anything for me in practice using the language

Brendan Hansknecht (Aug 27 2024 at 21:53):

Aside, isn't sequence walker almost the entire definitely or an iterator? Besides making apis inconvenient cause you have to pass in a input type and a walk function, I think it would work for most things iterators

Brendan Hansknecht (Aug 27 2024 at 21:53):

You could make a version that uses Num a as the element type and implement sum

batyoullfly (Aug 27 2024 at 21:56):

Oh? That’s cool, I think I don’t quite understand the usage of the example you gave then. I guess in situations where you’re writing library code and really anticipate people wanting to swap in their own data structure rather than using lists, you could always provide an alternate API for doing so, even if it’s a little awkward, because most of the time you’ll just use a list anyway

Brendan Hansknecht (Aug 27 2024 at 21:57):

Yeah, if you expect that you need to support multiple container types and you only need to walk the container. You could force the user to pass on the walk function and it would be functional. And they could even define a "container" that is really an iterator and doesn't have backing storage.

Brendan Hansknecht (Aug 28 2024 at 02:28):

So I guess iterables can actually work today in roc. A bit of a strange setup, but should be functional. Also, the typing is a bit jank. It does work though..

Also, I'm sure something nicer could be made from this. Just the first thing I tested really:

Functional iterator

app [main] { pf: platform "https://github.com/roc-lang/basic-cli/releases/download/0.14.0/dC5ceT962N_4jmoyoffVdphJ_4GlW3YMhAPyGPr-nU0.tar.br" }

import pf.Stdout
import pf.Task exposing [Task]

Walker state elem : state, (state, elem -> state) -> state

Iterable implements
    createWalkFn : iter -> Walker state elem where iter implements Iterable

IterList elem := List elem implements [
        Iterable {
            createWalkFn: createIterListWalkFn,
        },
    ]

createIterListWalkFn = \@IterList list ->
    \state, step -> List.walk list state step

Range := { start : I64, stop : I64 } implements [
        Iterable {
            createWalkFn: createRangeWalkFn,
        },
    ]

createRangeWalkFn = \@Range { start, stop } ->
    helper = \state, step, i ->
        if i < stop then
            helper (step state i) step (i + 1)
        else
            state

    \state, step ->
        helper state step start

main =

    a = @IterList [1, 7, 8]
    walkA = createWalkFn a

    printA =
        walkA (Task.ok {}) \prev, elem ->
            prev!

            Num.toStr elem
            |> Stdout.line

    printA!

    b = @Range { start: 7, stop: 22 }
    walkB = createWalkFn b

    printB =
        walkB (Task.ok {}) \prev, elem ->
            prev!

            Num.toStr elem
            |> Stdout.line

    printB!

Brendan Hansknecht (Aug 28 2024 at 02:29):

Since abilities get monomorphize, roc will generate the concrete Walker type that is created by a walk function. Thus, later uses will type check/error correctly based on the concrete final types.

Brendan Hansknecht (Aug 28 2024 at 02:33):

Iterator implements
    next : iter -> (iter, Result elm [NoMoreElements]) where iter implements Iterator

Brendan Hansknecht (Aug 28 2024 at 02:35):

Of course you really need an extra layer of indirection. Cause an iterator is not just a container. It is general a container plus some extra state. But I guess it could be paired with:

Iterable implements
    toIter : iterable -> iterator where iterable implements Iterable, iterator implements Iterator

This definitely is throwing out some type information. Like we have no idea what the element type is from the signature, but I think it would still compile and work in roc.

Stream: beginners

Topic: number ranges without heap allocations

Alexander Pyattaev (Aug 26 2024 at 18:46):

Luke Boswell (Aug 26 2024 at 18:48):

Alexander Pyattaev (Aug 26 2024 at 18:51):

Sam Mohr (Aug 26 2024 at 19:02):

Alexander Pyattaev (Aug 26 2024 at 19:04):

Alexander Pyattaev (Aug 26 2024 at 19:05):

Alexander Pyattaev (Aug 26 2024 at 19:06):

Sam Mohr (Aug 26 2024 at 19:06):

Alexander Pyattaev (Aug 26 2024 at 19:06):

Brendan Hansknecht (Aug 26 2024 at 19:08):

Alexander Pyattaev (Aug 26 2024 at 19:09):

Brendan Hansknecht (Aug 26 2024 at 19:09):

Sam Mohr (Aug 26 2024 at 19:10):

Brendan Hansknecht (Aug 26 2024 at 19:11):

Sam Mohr (Aug 26 2024 at 19:11):

Sam Mohr (Aug 26 2024 at 19:12):

Alexander Pyattaev (Aug 26 2024 at 19:12):

Alexander Pyattaev (Aug 26 2024 at 19:14):

Alexander Pyattaev (Aug 26 2024 at 19:15):

Brendan Hansknecht (Aug 26 2024 at 19:15):

Brendan Hansknecht (Aug 26 2024 at 19:15):

Alexander Pyattaev (Aug 26 2024 at 19:16):

Alexander Pyattaev (Aug 26 2024 at 19:17):

Alexander Pyattaev (Aug 26 2024 at 19:17):

Alexander Pyattaev (Aug 26 2024 at 19:18):

Alexander Pyattaev (Aug 26 2024 at 19:18):

Brendan Hansknecht (Aug 26 2024 at 19:20):

Alexander Pyattaev (Aug 26 2024 at 19:22):

Alexander Pyattaev (Aug 26 2024 at 19:22):

Brendan Hansknecht (Aug 26 2024 at 19:22):

Brendan Hansknecht (Aug 26 2024 at 19:24):

Alexander Pyattaev (Aug 26 2024 at 19:25):

Alexander Pyattaev (Aug 26 2024 at 19:26):

Brendan Hansknecht (Aug 26 2024 at 19:28):

Brendan Hansknecht (Aug 26 2024 at 19:28):

Brendan Hansknecht (Aug 26 2024 at 19:28):

Alexander Pyattaev (Aug 26 2024 at 19:40):

Brendan Hansknecht (Aug 26 2024 at 20:09):

Alexander Pyattaev (Aug 27 2024 at 06:38):

Alexander Pyattaev (Aug 27 2024 at 06:38):

batyoullfly (Aug 27 2024 at 11:13):

batyoullfly (Aug 27 2024 at 11:24):

batyoullfly (Aug 27 2024 at 11:26):

batyoullfly (Aug 27 2024 at 11:34):

batyoullfly (Aug 27 2024 at 11:44):

batyoullfly (Aug 27 2024 at 12:07):

Brendan Hansknecht (Aug 27 2024 at 16:55):

Notification Bot (Aug 27 2024 at 18:41):

Richard Feldman (Aug 27 2024 at 18:44):

Sam Mohr (Aug 27 2024 at 18:49):

Sam Mohr (Aug 27 2024 at 18:50):

batyoullfly (Aug 27 2024 at 20:05):

batyoullfly (Aug 27 2024 at 20:23):

Sven van Caem (Aug 27 2024 at 20:23):

Sam Mohr (Aug 27 2024 at 20:43):

Sam Mohr (Aug 27 2024 at 20:45):

Richard Feldman (Aug 27 2024 at 21:02):

Richard Feldman (Aug 27 2024 at 21:05):

Richard Feldman (Aug 27 2024 at 21:09):

Sam Mohr (Aug 27 2024 at 21:19):

Sam Mohr (Aug 27 2024 at 21:20):

batyoullfly (Aug 27 2024 at 21:39):

Brendan Hansknecht (Aug 27 2024 at 21:41):

Brendan Hansknecht (Aug 27 2024 at 21:41):

Brendan Hansknecht (Aug 27 2024 at 21:46):

Brendan Hansknecht (Aug 27 2024 at 21:46):

Brendan Hansknecht (Aug 27 2024 at 21:47):

batyoullfly (Aug 27 2024 at 21:49):

Richard Feldman (Aug 27 2024 at 21:51):

Richard Feldman (Aug 27 2024 at 21:51):

Richard Feldman (Aug 27 2024 at 21:51):

Brendan Hansknecht (Aug 27 2024 at 21:51):

batyoullfly (Aug 27 2024 at 21:52):

Brendan Hansknecht (Aug 27 2024 at 21:53):

Brendan Hansknecht (Aug 27 2024 at 21:53):

batyoullfly (Aug 27 2024 at 21:56):

Brendan Hansknecht (Aug 27 2024 at 21:57):

Brendan Hansknecht (Aug 28 2024 at 02:28):