Stream: ideas

Topic: ✔ Change List.split to List.splitAt?


view this post on Zulip mdznta (Dec 08 2023 at 03:21):

Hey all, there is an issue #6164 that brings up that Str.split and List.split behave somewhat differently and proposes changing List.split to possibly List.splitAt and then introducing a List.split that more closely resembles Str.split. Does anyone have any strong opinions when it comes to this? I personally like the idea of changing List.split.

view this post on Zulip Isaac Van Doren (Dec 08 2023 at 03:31):

I am very in favor!

view this post on Zulip Anton (Dec 08 2023 at 10:16):

I like the rename for the current List.split to List.splitAt.
For the new List.split implementation, I prefer the List.split : List elem, elem -> List (List elem) signature.

view this post on Zulip Anton (Dec 08 2023 at 11:22):

After reading the topic in contributing I also prefer splitBy instead of split.

view this post on Zulip Richard Feldman (Dec 08 2023 at 11:56):

splitBy or splitOn? :thinking:

view this post on Zulip Anton (Dec 08 2023 at 12:01):

By is more common in other languages I think, but On does seem nicer.

view this post on Zulip Declan Joseph Maguire (Dec 08 2023 at 12:32):

A little bit of ambiguity in the name for me, is whether the split happens before or after the matching spot. I always find myself consulting the docs for things like this if I've been away from a language for a while.

view this post on Zulip LoipesMas (Dec 08 2023 at 12:35):

Maybe this could also use tags, e.g., List.splitAt (After 5). But this doesn't "sound" that nice

view this post on Zulip Declan Joseph Maguire (Dec 08 2023 at 12:37):

Maybe generic List.split could work with that sort of structure, with more specific function names likeList.splitAfter or List.splitBy not having the tag argument as normal

view this post on Zulip Brendan Hansknecht (Dec 08 2023 at 15:53):

SplitBy suggest removal of the element. I don't think SplitOn suggests removal

view this post on Zulip Brendan Hansknecht (Dec 08 2023 at 15:54):

If I "split by" whitespace, I wouldn't expect to see whitespace in the output. If I "split on" whitespace, I would expect it to be captured in a trailing manner with each split.

view this post on Zulip Brendan Hansknecht (Dec 08 2023 at 15:55):

That is definitely how I parse the wording and I think relatively common

view this post on Zulip mdznta (Dec 08 2023 at 23:05):

I am in splitOn camp as well I think.

view this post on Zulip Sky Rose (Dec 09 2023 at 02:01):

splitBy, splitBefore, and splitAfter?

view this post on Zulip Isaac Van Doren (Dec 09 2023 at 03:07):

My preference is List.splitOn

view this post on Zulip Isaac Van Doren (Dec 09 2023 at 03:08):

Whichever way it goes, it would be great if the comparable Str and List functions have the same name

view this post on Zulip Sky Rose (Dec 09 2023 at 16:19):

I just thought of a new complication:
For each split function, we probably want a version that takes an elem and a version that takes a predicate.
List elem, elem -> List (List elem)
List elem, (elem -> Bool) -> List (List elem)

view this post on Zulip Anton (Dec 10 2023 at 12:19):

we probably want a version that takes an elem and a version that takes a predicate.

That seems reasonable at first, but I'm struggling to come up with common enough realistic use cases :thinking:

view this post on Zulip Anton (Dec 12 2023 at 11:47):

@Richard Feldman can you make the final decision on splitBy vs splitOn? That way @mdznta can move forward.

view this post on Zulip Adrian (Apr 11 2024 at 21:06):

I'd like to add a first attempt at parsing multipart/form-data and since basic-webserver would probably prefer not to have List splitting code in the Http module it would be incredibly convenient to move forward with this issue.
If it counts towards anything my vote is on

List.splitAt: List a, U64 -> {before: List a, after: List a}
List.splitOn: List a, List a -> List (List a) where a implements Eq

view this post on Zulip Richard Feldman (Apr 11 2024 at 23:01):

oh yeah I just realized I never responded to this! I'm in favor of that pairing as well :+1:

view this post on Zulip Brendan Hansknecht (Apr 11 2024 at 23:01):

So an important distinction here in @Adrian's version is that List.splitOn matches Str.split. It is skipping entire sequences of elements instead of just a single element.

view this post on Zulip Brendan Hansknecht (Apr 11 2024 at 23:01):

Which I think is fine, but just an important difference from what was talked about above.

view this post on Zulip Richard Feldman (Apr 11 2024 at 23:02):

oh hm I missed that

view this post on Zulip Brendan Hansknecht (Apr 11 2024 at 23:38):

We could always be explicit with it:

List.splitAt: List a, U64 -> {before: List a, after: List a}
List.splitOn: List a, a -> List (List a)
List.splitOnSeq: List a, List a -> List (List a)

view this post on Zulip Kevin Gillette (Apr 12 2024 at 00:36):

Is it worth using tags instead of different function variants for this one?

view this post on Zulip Brendan Hansknecht (Apr 12 2024 at 01:10):

That works to. I think this is a decent proposal:

List.splitAt : List a, U64 -> {before: List a, after: List a}
List.splitOn : List a, [Elem a, Sequence (List a)] -> List (List a)

view this post on Zulip Isaac Van Doren (Apr 12 2024 at 01:55):

I would prefer just passing a single element list if you only want to split on an element. It seems nicer than requiring a tag or having two different functions

view this post on Zulip Brendan Hansknecht (Apr 12 2024 at 01:59):

I personally don't like that cause I really dislike the chains of minor paper cuts that aren't likely to optimize away in api design. Generating a list for a single element then extracting the element to run the more optimized single element split function will not optimize away. There will always be an unnecessary allocation.

view this post on Zulip Brendan Hansknecht (Apr 12 2024 at 01:59):

The tag, should optimize away when the outer function inlines (which should essentially always happen). Even if it doesn't optimize away, no allocation, just an extra int in a register/on stack.

view this post on Zulip Brendan Hansknecht (Apr 12 2024 at 02:00):

With strings it matters a lot less cause we have small string optimization (which at a minimum avoids the allocation and likely also optimizes away). We have no plans to add a small list optimization (it is a much more complex mess due to various sized elements and often requires manual tuning to be beneficial).

view this post on Zulip Isaac Van Doren (Apr 12 2024 at 02:04):

Gotcha, that makes sense!

view this post on Zulip Brendan Hansknecht (Apr 12 2024 at 02:15):

Anyway, doesn't matter to me if it is a tag or 2 functions, but I think it is 100% fine for someone to start implementing this. It would be a very minor tweak to switch from one version to the other. So they can write up 99% of the PR rn.

view this post on Zulip Brendan Hansknecht (Apr 12 2024 at 02:27):

Oh, also, I just had a realization. The fancy algorithm that gets applied with strings can't be applied on lists. You would have to explicitly know the list is a List U8 to use it. As such, I think the linear scan algorithm will be the fastest thing we can safely do.

So I think the impl I wrote here for splitting based on a seq is essentially a valid impl. That said, if written in the standard library, we can rewrite it in a smarter way to avoid more bounds checking. So that will lead to a different design if we want to write the max perf version. I think this should be performant as 100% roc builtin code (bar some future refcounting fixes, but those will come with time and don't need to be worried about now).

view this post on Zulip Adrian (Apr 12 2024 at 07:35):

Should I also add the appropriate First and Last functions?
This is from my current Utils.roc so names are still subject to change.

listSplitFirstSeq : List a, List a -> { before : List a, after : List a } where a implements Eq
listSplitFirstSeq = \list, seq ->
    if List.isEmpty seq then
        { before: [], after: list }
    else
        listSplitFirstSeqHelper list seq 0 0

listSplitFirstSeqHelper = \list, seq, i, j ->
    when (List.get list (i + j), List.get seq j) is
        (Ok x, Ok y) if x == y ->
            listSplitFirstSeqHelper list seq i (j + 1)

        (Ok _, Ok _) ->
            listSplitFirstSeqHelper list seq (i + 1) 0

        (Ok _, Err _) ->
            {
                before: List.sublist list { start: 0, len: i },
                after: List.sublist list { start: i + j, len: (List.len list) - i - j },
            }

        (Err _, Ok _) ->
            { before: List.sublist list { start: 0, len: i + j }, after: [] }

        (Err _, Err _) ->
            { before: List.sublist list { start: 0, len: i }, after: [] }

expect listSplitFirstSeq ("a/b/c" |> Str.toUtf8) ['/'] |> eq { before: "a" |> Str.toUtf8, after: "b/c" |> Str.toUtf8 }
expect listSplitFirstSeq ("/a/b/c" |> Str.toUtf8) ['/'] |> eq { before: [], after: "a/b/c" |> Str.toUtf8 }
expect listSplitFirstSeq ("abc/" |> Str.toUtf8) ['/'] |> eq { before: "abc" |> Str.toUtf8, after: [] }
expect listSplitFirstSeq ("abc" |> Str.toUtf8) ['/'] |> eq { before: "abc" |> Str.toUtf8, after: [] }

expect listSplitFirstSeq ("a//b//c" |> Str.toUtf8) ['/', '/'] |> eq { before: "a" |> Str.toUtf8, after: "b//c" |> Str.toUtf8 }
expect listSplitFirstSeq ("//a//b//c" |> Str.toUtf8) ['/', '/'] |> eq { before: [], after: "a//b//c" |> Str.toUtf8 }
expect listSplitFirstSeq ("abc//" |> Str.toUtf8) ['/', '/'] |> eq { before: "abc" |> Str.toUtf8, after: [] }
expect listSplitFirstSeq ("abc" |> Str.toUtf8) ['/', '/'] |> eq { before: "abc" |> Str.toUtf8, after: [] }
expect listSplitFirstSeq ("/abc/" |> Str.toUtf8) ['/', '/'] |> eq { before: "/abc/" |> Str.toUtf8, after: [] }
expect listSplitFirstSeq ("abc" |> Str.toUtf8) ['/', '/'] |> eq { before: "abc" |> Str.toUtf8, after: [] }

view this post on Zulip Anton (Jun 29 2024 at 11:58):

Someone commented on the github issue for this (#6164), let's finish up the decision here.

The last point of doubt is this standoff:

  1. :
List.splitOn: List a, a -> List (List a)
List.splitOnSeq: List a, List a -> List (List a)

vs 2. :

List.splitOn : List a, [Elem a, Sequence (List a)] -> List (List a)

view this post on Zulip Anton (Jun 29 2024 at 11:58):

I'm heavily in favor of 1. it's just simpler. It doesn't require you to be familiar with tag unions, it's easier to type. I can't find any real benefits of 2 :p

view this post on Zulip Francois Green (Jun 29 2024 at 14:58):

Version 2 is far more elegant and once you abbreviate Sequence to Seq everyone will see it :laughing:

view this post on Zulip Adrian (Jun 29 2024 at 15:44):

Sorry to restir this conversation, but aren't the roc's compiler and llvm able to optimize out List.splitOnSeq _ [element] to List.splitOn? Could we get away with only one function like in the Str module?

view this post on Zulip Anton (Jun 29 2024 at 16:07):

Version 2 is far more elegant ...

Interesting, can you elaborate on that @Francois Green?

view this post on Zulip Anton (Jun 29 2024 at 16:14):

but aren't the roc's compiler and llvm able to optimize out List.splitOnSeq _ [element] to List.splitOn?

That optimization seems plausible, I think it's safe to consider the performance of both equivalent.

Could we get away with only one function like in the Str module?

We definitely could get away with it, can you share why you think it's the best approach?

view this post on Zulip Brendan Hansknecht (Jun 29 2024 at 17:03):

llvm able to optimize out List.splitOnSeq _ [element]

Definitely not. That list requires an allocation. LLVM basically gives up on everything once the heap is involved (not actaully true, but not far from it)

roc's compiler ... able to optimize out List.splitOnSeq _ [element]

We could, but it would be a one off bespoke pattern that affects performance. I am very much not a fan of those.

view this post on Zulip Adrian (Jun 29 2024 at 17:18):

Brendan Hansknecht said:

We could, but it would be a one off bespoke pattern that affects performance. I am very much not a fan of those.

Sorry but the transformation [const, const, ...] -> ListConst doesn't seem like a one off to me. In general I would prefer to avoid allocations if possible so transforming List.single a -> a seeems implementable (perhaps i am mistken about the innerworkings of the compiler) but if lists are mostly passed as slices, than you can construct a list like wrapper around a single element. In general i would experiment with const computation, tho that might get expansive for compile times (and might not avoid as many allocations as i might envision).

view this post on Zulip Adrian (Jun 29 2024 at 17:30):

Anton said:

We definitely could get away with it, can you share why you think it's the best approach?

I would prefer List.splitOn to match Str.splitOn. I think of Str type like a thin wrapper around the List U8 without the direct access to elements to prevent messing up unicode codepoints, so i would like to see as many parallels between the two (it would also be easier to remember).

view this post on Zulip Anton (Jun 29 2024 at 18:02):

I would prefer List.splitOn to match Str.splitOn.

There is a case to be made for consistency. I would expect that most of the time the user would want to split a list by a single element. I would not prefer consistency over accommodating the common use case.

view this post on Zulip Adrian (Jun 29 2024 at 18:58):

A single element list can be made with 2 extra characters. Is it accommodating to skip 2 chars? Why do you need to create a new string if you're spliting on "/"? Sorry but i might be missing your point

view this post on Zulip Francois Green (Jun 29 2024 at 20:30):

@Anton Version 2 gives me function overloading in a manner that embraces a key feature of the language. Maybe it's too clever by half, but I find it a pleasure to read.

view this post on Zulip Brendan Hansknecht (Jun 29 2024 at 20:43):

Going from List.splitOnSeq _ [element] to List.splitOn would be a special transformation specifically for List.splitOnSeq that generates a new function that never builds the original list. This is not general.

Sorry but the transformation [const, const, ...] -> ListConst doesn't seem like a one off to me

These seem like a different problem. Sure, we have constant lists, but they still manifest in a different section of the binary with a pointer indirection. Looking at List.splitOnSeq _ [element] there is no requirement for element to be a constant. As such, this code should be expected to allocate.

view this post on Zulip Brendan Hansknecht (Jun 29 2024 at 20:45):

A single element list can be made with 2 extra characters

Yes, and it should be assumed to require an allocation. We don't have a small list optimization (and we don't have plans to add one).

view this post on Zulip Brendan Hansknecht (Jun 29 2024 at 20:47):

Why do you need to create a new string if you're spliting on "/"?

Because the standard library Str has restrictions related to unicode. We don't have a character* type to guarantee valid unicode input. So requiring a Str as input is what makes most sense for that api. It guarantees we don't split in a way that generates invalid unicode.

* warning: "character" is a very loaded word when talking about unicode.

view this post on Zulip Francois Green (Jul 01 2024 at 12:47):

If List.split is to be renamed List.splitAt, can the decision to return a record and not a tuple also be reevaluated?

view this post on Zulip Anton (Jul 01 2024 at 12:58):

Sure, can you start a new topic (in #ideas) for that?

view this post on Zulip Adrian (Jul 01 2024 at 23:59):

Considering that the options from 1st option won't be equal for a long time and that most people voted for number 1, I see no issues blocking the implementation being added to stdlib.

view this post on Zulip Chris (Jul 03 2024 at 06:58):

Anton said:

List.splitOn : List a, [Elem a, Sequence (List a)] -> List (List a)

If Element is abbreviated to Elem then i believe Sequence also should be abbreviated to Seq, or use full words

view this post on Zulip Kiryl Dziamura (Jul 03 2024 at 07:08):

I think it will look like this from the API perspective?

List.splitOn list (Elem 42)
List.splitOn list (Seq [1, 2, 3])

vs

List.splitOn list 42
List.splitOnSeq list [1, 2, 3]

Also, what’s the advantage of the tag union? How often Elem and Seq be used not directly with the List.splitOn call?

view this post on Zulip Chris (Jul 03 2024 at 07:16):

I'm not sure what is current "policy" for duplicate functions in builtins, but maybe it would be beneficial to have all those signatures?

List.splitOnElem: List a, a -> List (List a)
List.splitOnSeq: List a, List a -> List (List a)
List.splitOn : List a, [Elem a, Seq (List a)] -> List (List a)

view this post on Zulip Anton (Jul 03 2024 at 10:30):

Let's see how things go with the solution we voted on and we can revisit later if needed.

view this post on Zulip Notification Bot (Jul 03 2024 at 10:30):

Anton has marked this topic as resolved.

view this post on Zulip Luke Boswell (Jul 18 2024 at 05:45):

Are these going to be implemented in pure Roc, or are we planning on using something like zig's std.mem.splitSequence?

view this post on Zulip Luke Boswell (Jul 18 2024 at 07:00):

I needed this, so I made something and shared in an Issue
https://github.com/roc-lang/roc/issues/6912

If it looks ok maybe we can add to the builtins?

view this post on Zulip Richard Feldman (Jul 18 2024 at 11:14):

what if we named it List.splitOnList instead of List.splitOnSeq? My first thought when I saw the name (after having not thought about this thread for awhile) was "so I have to give it a Seq? How do I get one of those?" :big_smile:

view this post on Zulip Isaac Van Doren (Nov 14 2024 at 03:34):

I'd like to also rename Str.split to Str.splitOn so that it is consistent with List.splitOn.

For context, once this PR merges, we'll have these list splitting functions:

splitAt : List elem, U64 -> { before : List elem, others : List elem }
splitOn : List a, a -> List (List a) where a implements Eq
splitOnList : List a, List a -> List (List a) where a implements Eq

Last updated: Jun 16 2026 at 16:19 UTC