non-empty lists · ideas · Zulip Chat Archive

Juliano (Aug 28 2022 at 18:38):

Even though Haskell ships with a Data.List.NonEmpty, it's pretty rare for libraries to give you back a NonEmpty for operations that can't return an empty list (using sepBy1 in attoparsec rn made me reflect about this).

Elm doesn't ship with NonEmpty in the standard lib, not sure why. Could be a decision to keep the stdlib as small as possible, could be there are tradeoffs to making NonEmpty pervasive that I'm unaware of.

I noticed Roc doesn't have NonEmpty yet, and was curious: do folks have an opinion about making NonEmpty pervasive/normative in a language's ecosystem, and thoughts on how to go about it in Roc?

Martin Stewart (Aug 28 2022 at 20:07):

I've started on something like this https://github.com/MartinSStewart/roc-nonempty with the hope that we can have a single package handling common nonempty use cases, rather than ending up with functions just returning (a, List a) or having multiple competing packages.

Maybe it would make more sense to have this part of the standard Roc lib though.

Richard Feldman (Aug 28 2022 at 20:14):

so I had an idea around this a while ago, but I wasn't convinced the stdlib complexity would be worth it

Richard Feldman (Aug 28 2022 at 20:15):

to me, one of the natural questions that comes up when you have nonempty lists is: what are the implications for APIs?

Richard Feldman (Aug 28 2022 at 20:15):

like for example, let's say I have Dict.fromList - do I also want Dict.fromNonEmptyList?

Richard Feldman (Aug 28 2022 at 20:16):

arguably not, because if I already have Dict.fromList, I can always call that passing NonEmpty.toList or something like that

Richard Feldman (Aug 28 2022 at 20:16):

but what if I have a function that accepts a List Str and returns a List Str?

Richard Feldman (Aug 28 2022 at 20:17):

now if I pass it a NonEmpty list (via NonEmpty.toList), the returned List has lost its "nonemptiness"

Richard Feldman (Aug 28 2022 at 20:18):

one possible solution to that is to say "if you're in that situation, just offer 2 versions, one for empty lists and one for nonempty lists"

Richard Feldman (Aug 28 2022 at 20:18):

(or in Haskell's case, probably something like Functor f - but I definitely don't want to go down that road for this use case!)
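A rough Rust sketch of the "offer 2 versions" option, for illustration only. The `NonEmpty` type and the `shout` / `shout_non_empty` functions are made-up names, not an existing Roc or Rust API; the point is just that going through a plain list drops the guarantee from the return type, while a dedicated second version keeps it.

```rust
/// A list that always holds at least one element (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct NonEmpty<T> {
    head: T,
    tail: Vec<T>,
}

impl<T> NonEmpty<T> {
    pub fn new(head: T, tail: Vec<T>) -> Self {
        NonEmpty { head, tail }
    }

    /// Always succeeds, so no Option or Result is needed.
    pub fn first(&self) -> &T {
        &self.head
    }

    /// Forget the guarantee and hand back an ordinary Vec.
    pub fn into_vec(self) -> Vec<T> {
        let mut items = Vec::with_capacity(self.tail.len() + 1);
        items.push(self.head);
        items.extend(self.tail);
        items
    }
}

/// Version for ordinary lists: Vec in, Vec out.
pub fn shout(words: Vec<String>) -> Vec<String> {
    words.into_iter().map(|w| w.to_uppercase()).collect()
}

/// Second version that keeps the guarantee in the return type.
pub fn shout_non_empty(words: NonEmpty<String>) -> NonEmpty<String> {
    NonEmpty {
        head: words.head.to_uppercase(),
        tail: shout(words.tail),
    }
}
```

Calling `shout(words.into_vec())` type-checks, but the result is a plain `Vec<String>`, so the caller is back to handling a possibly-empty list; `shout_non_empty(words)` still offers `first()` without any fallback.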

Richard Feldman (Aug 28 2022 at 20:19):

another possible solution is to introduce a new stdlib opaque type called (let's say for the sake of argument) Seq elem emptiness

Richard Feldman (Aug 28 2022 at 20:19):

and then define List and NonEmptyList as type aliases of it, similar to what we do with Num:

List elem : Seq elem [PotentiallyEmpty]

NonEmptyList elem : Seq elem [NonEmpty]

Richard Feldman (Aug 28 2022 at 20:20):

["a"] : Seq Str *

[] : Seq * [PotentiallyEmpty]

Richard Feldman (Aug 28 2022 at 20:20):

so square bracket literals are no longer a List, but rather a Seq which is compatible with either potentially-empty or guaranteed NonEmpty lists

Richard Feldman (Aug 28 2022 at 20:21):

then you can also say, instead of List Str -> List Str, something like Seq Str emptiness -> Seq Str emptiness

Richard Feldman (Aug 28 2022 at 20:21):

and now you can give it either lists or nonempty lists, and it returns whatever you gave it

Richard Feldman (Aug 28 2022 at 20:23):

in this world, I assume it would be recommended to accept Seq Whatever * instead of List Whatever, because it's more flexible; you can give it either a List or a NonEmptyList
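To make the `Seq elem emptiness` idea concrete, here is a minimal Rust sketch using a phantom type parameter. It is only an approximation under stated assumptions: the marker types, constructors and method names are invented for illustration, and Rust has no equivalent of a `["a"]` literal that is compatible with either tag.

```rust
use std::marker::PhantomData;

// Marker types standing in for the [PotentiallyEmpty] and [NonEmpty] tags.
pub struct PotentiallyEmpty;
pub struct NonEmpty;

/// One sequence type; `E` records what is known about emptiness.
pub struct Seq<T, E> {
    items: Vec<T>,
    emptiness: PhantomData<E>,
}

pub type List<T> = Seq<T, PotentiallyEmpty>;
pub type NonEmptyList<T> = Seq<T, NonEmpty>;

impl<T> Seq<T, PotentiallyEmpty> {
    pub fn from_vec(items: Vec<T>) -> Self {
        Seq { items, emptiness: PhantomData }
    }
}

impl<T> Seq<T, NonEmpty> {
    /// Requiring a first element up front is what justifies the tag.
    pub fn new(head: T, mut tail: Vec<T>) -> Self {
        tail.insert(0, head);
        Seq { items: tail, emptiness: PhantomData }
    }

    /// Only available with the NonEmpty tag, so it never fails.
    pub fn first(&self) -> &T {
        &self.items[0]
    }
}

impl<T, E> Seq<T, E> {
    /// Works for either tag and returns the same tag it was given.
    pub fn map<U>(self, f: impl FnMut(T) -> U) -> Seq<U, E> {
        Seq { items: self.items.into_iter().map(f).collect(), emptiness: PhantomData }
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}
```

Here `map` plays the role of `Seq Str emptiness -> Seq Str emptiness`: mapping a `NonEmptyList` gives back a `NonEmptyList`, so `first()` is still available afterwards, while mapping a `List` gives back a `List`.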

Richard Feldman (Aug 28 2022 at 20:24):

also, you could potentially take the same idea further and define Dict and Set to be type aliases of Seq as well

Richard Feldman (Aug 28 2022 at 20:24):

such that (for example) a function that accepts a Seq could also accept either a Dict or a Set

Richard Feldman (Aug 28 2022 at 20:25):

one downside of this is that as soon as you start down that road, much like with Num, there's now a demand for userspace extensions of Seq

Richard Feldman (Aug 28 2022 at 20:26):

e.g. "I made my custom data structure and it doesn't work with Seq; how can I make it Just Work for any functions that accept Seq?" but of course the best you could do would be to offer a toSeq function which converts it into one of the stdlib data structures

Richard Feldman (Aug 28 2022 at 20:26):

Martin Stewart (Aug 28 2022 at 20:26):

I vaguely remember reading about this approach in an earlier version of the roc-for-elm-programmers doc :thinking:

Richard Feldman (Aug 28 2022 at 20:27):

like I said at the beginning...I'm not convinced it's worth the complexity, but I'm also not convinced it's definitely a mistake either, so I'm curious what others think on this topic in general :big_smile:

Martin Stewart (Aug 28 2022 at 20:30):

I agree, I don't think it's worth the complexity. My idea was to just add in nonempty versions of List, Set, and Dict into the standard lib and then add Nonempty* versions of functions like List Str -> List Str where it makes sense. That feels inelegant but I suspect in practice it will be fine.

Richard Feldman (Aug 28 2022 at 20:31):

one thing that it would make pretty different is that you'd start calling stuff like Seq.map instead of List.map, but you'd still have List.get and not Seq.get because the types would be different (depending on whether it was nonempty)

Juliano (Aug 28 2022 at 20:34):

What's an example where people would create a data structure that wouldn't work with Seq? I love the idea of Seq elem [NonEmpty], and I think I'm not seeing the implications :S

Qqwy / Marten (Aug 28 2022 at 20:38):

I think what Richard means is that there are many other sequential collections one can think of besides doubling-in-size vectors and nonempty-doubling-in-size vectors. Like for instance linked lists, RRB vectors, finger trees, numerical ranges, etc.

Qqwy / Marten (Aug 28 2022 at 20:39):

An alternative might be to define non-higher-kinded traits (side-stepping Functor a) by taking inspiration from, for instance,
https://docs.rs/cc-traits/latest/cc_traits/

Qqwy / Marten (Aug 28 2022 at 20:42):

It does depend on associated types, but only having those does not make it possible to implement the Monad stack.

Richard Feldman (Aug 29 2022 at 00:47):

yeah but with associated types we're talking about a whole new language feature in what I consider the riskiest category of new language features from the standpoint of overall ecosystem complexity (namely, making abilities more powerful), so I'd consider the bar for introducing something like that to be very high :big_smile:

Richard Feldman (Aug 29 2022 at 02:00):

Seq.custom : structure, (state, (state, elem -> state) -> state) -> Seq elem [Custom structure] [PotentiallyEmpty]

Seq.customNonEmpty : structure, elem, (state, (state, elem -> state) -> state) ->  Seq elem [Custom structure] [NonEmpty]

Seq.unwrapCustom : Seq * [Custom structure] * -> structure

ConsList.toSeq : ConsList elem -> Seq elem [Custom (ConsList elem)] [PotentiallyEmpty]
ConsList.toSeq = \consList ->
    Seq.custom consList \state, transform ->
        ConsList.walk consList state transform

that way you could still use your custom data structure with functions that work on Seq

Richard Feldman (Aug 29 2022 at 02:01):

it wouldn't be as ergonomic as the builtin ones, but that seems like an ok tradeoff given that builtin data structures always have an ergonomics advantage due to having things like literals in the syntax for them etc
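A loose Rust analogue of the `Seq.custom` / `Seq.unwrapCustom` proposal, as a sketch rather than a faithful translation. Assumptions: the emptiness tag is left out for brevity, the user supplies a "visit every element" function instead of a state-polymorphic walk (which is awkward to store in a Rust struct), and all names (`CustomSeq`, `cons_for_each`, and so on) are hypothetical.

```rust
/// A sequence view over some user-defined structure (hypothetical API).
pub struct CustomSeq<S, T> {
    structure: S,
    // Visits every element in order; supplied by the structure's author.
    for_each: fn(&S, &mut dyn FnMut(&T)),
}

impl<S, T> CustomSeq<S, T> {
    /// Counterpart of Seq.custom.
    pub fn custom(structure: S, for_each: fn(&S, &mut dyn FnMut(&T))) -> Self {
        CustomSeq { structure, for_each }
    }

    /// Counterpart of Seq.walk: fold over the elements.
    pub fn walk<State>(&self, init: State, mut step: impl FnMut(State, &T) -> State) -> State {
        let mut state = Some(init);
        (self.for_each)(&self.structure, &mut |elem: &T| {
            let current = state.take().expect("state is always put back");
            state = Some(step(current, elem));
        });
        state.expect("state is always put back")
    }

    /// Counterpart of Seq.unwrapCustom.
    pub fn unwrap_custom(self) -> S {
        self.structure
    }
}

/// A tiny cons list playing the role of ConsList.
pub enum ConsList<T> {
    Nil,
    Cons(T, Box<ConsList<T>>),
}

fn cons_for_each<T>(list: &ConsList<T>, visit: &mut dyn FnMut(&T)) {
    let mut node = list;
    while let ConsList::Cons(value, rest) = node {
        visit(value);
        node = &**rest;
    }
}

impl<T> ConsList<T> {
    /// Counterpart of ConsList.toSeq.
    pub fn to_seq(self) -> CustomSeq<ConsList<T>, T> {
        CustomSeq::custom(self, cons_for_each::<T>)
    }
}
```

Any function written against `CustomSeq::walk` then works with the cons list, and `unwrap_custom` hands the original structure back when its own operations are needed.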

Qqwy / Marten (Aug 29 2022 at 11:02):

Interesting idea. In a sense, you're building a trait-implementation in the manual way.

Qqwy / Marten (Aug 29 2022 at 11:07):

However, it does not compose. If I wrap my fancy RRB vector using Seq.custom I cannot call any RRBVector.* functions on it until calling Seq.unwrapCustom again.

Qqwy / Marten (Aug 29 2022 at 11:07):

Richard Feldman (Aug 29 2022 at 12:29):

thinking about it more, this Seq idea (with Seq.custom) would be most similar to Rust's Iterator

Richard Feldman (Aug 29 2022 at 12:29):

Richard Feldman (Aug 29 2022 at 12:31):

which is interesting because in Rust I've used that a lot, but so far I can't remember any times (in Rust at least) that I really wanted something that was "a generic collection that supports map" for example

Richard Feldman (Aug 29 2022 at 12:32):

which is relevant because I don't think that custom design could implement a way to provide a map implementation

Folkert de Vries (Aug 29 2022 at 12:33):

you don't need the map on the data structure because you can always use the iterator to map

Qqwy / Marten (Aug 29 2022 at 12:34):

Folkert de Vries (Aug 29 2022 at 12:34):

Qqwy / Marten (Aug 29 2022 at 12:35):

Richard Feldman (Aug 29 2022 at 12:36):

right, and I explicitly want to avoid adding associated types to the language :big_smile:

Qqwy / Marten (Aug 29 2022 at 12:37):

Yes, that's why I mentioned it. But I think Folkert is right that they are not needed.

Richard Feldman (Aug 29 2022 at 12:38):

yeah so you couldn't offer Seq.map itself, but since you'd have Seq.walk, you wouldn't need to

Richard Feldman (Aug 29 2022 at 12:39):

which would make Seq like ExactSizeIterator in Rust. There are pros and cons to that

Richard Feldman (Aug 29 2022 at 13:00):

could also have it ask for walkUntil instead of walk, and then Seq.first would be implementable
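As a small illustration using Rust's standard `Iterator`, where `fold` corresponds roughly to `walk` and `try_fold` to `walkUntil`: a map can be written with only the former, while a cheap "first" needs the early exit of the latter. The two helper function names are made up for this sketch.

```rust
use std::ops::ControlFlow;

/// `map` written only in terms of a fold (the thread's `walk`).
/// The result is a Vec, not the original container's shape.
pub fn map_via_walk<I, U>(items: I, mut f: impl FnMut(I::Item) -> U) -> Vec<U>
where
    I: Iterator,
{
    items.fold(Vec::new(), |mut out, item| {
        out.push(f(item));
        out
    })
}

/// `first` written in terms of an early-exit fold (the thread's `walkUntil`).
/// Breaking on the first element avoids visiting the rest.
pub fn first_via_walk_until<I: Iterator>(mut items: I) -> Option<I::Item> {
    match items.try_fold((), |(), item| ControlFlow::Break(item)) {
        ControlFlow::Break(item) => Some(item),
        ControlFlow::Continue(()) => None,
    }
}
```

With a plain fold, getting the first element would still mean walking the whole structure, which is why asking custom structures for the early-exit variant makes `Seq.first` practical.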

Qqwy / Marten (Aug 29 2022 at 13:20):

You could have multiple versions that stack on top of each other, just like Iterator and ExactSizeIterator
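For reference, this is roughly how the stacking looks on the Rust side: `ExactSizeIterator` is a sub-trait of `Iterator`, so a function can ask for whichever level it needs. The `total` and `average` functions are illustrative names only.

```rust
/// Needs only the base capability: visiting elements one by one.
pub fn total<I: Iterator<Item = u32>>(items: I) -> u32 {
    items.sum()
}

/// Needs the stronger capability: the length is known before iterating.
pub fn average<I: ExactSizeIterator<Item = u32>>(items: I) -> Option<f64> {
    let count = items.len();
    if count == 0 {
        return None;
    }
    Some(f64::from(items.sum::<u32>()) / count as f64)
}
```

A `vec.into_iter()` satisfies both bounds, whereas something like `vec.into_iter().filter(..)` only satisfies `Iterator`, because its length is not known up front; it can be passed to `total` but not to `average`.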

Richard Feldman (Aug 29 2022 at 13:30):

it's possible, but that one seems like a stretch in terms of being worth it :big_smile:

Richard Feldman (Aug 29 2022 at 13:33):

Qqwy / Marten (Aug 29 2022 at 13:35):

Richard Feldman (Aug 29 2022 at 13:36):

Richard Feldman (Aug 29 2022 at 13:37):

like the way I typically use dictionaries and sets, I'm always asking questions like "is this specific key present?" (and in the case of a dictionary, possibly "is this specific value present?")

Richard Feldman (Aug 29 2022 at 13:38):

as opposed to a nonempty list where I sometimes ask questions like "give me the first element from this list, and I know there's one there so I don't want to have to deal with a Result"

Richard Feldman (Aug 29 2022 at 13:39):

so I'm wondering whether that's a use case worth thinking about while exploring this idea!

Ayaz Hafiz (Aug 29 2022 at 13:40):

I use non-empty sets in OCaml a lot, in particular to choose one arbitrary element and do some processing over it, but I want to make sure there are no duplicate elements that I need to process over

Richard Feldman (Aug 29 2022 at 13:41):

Ayaz Hafiz (Aug 29 2022 at 13:42):

Richard Feldman (Aug 29 2022 at 13:43):

léo (Aug 29 2022 at 13:55):

Martin Stewart (Aug 29 2022 at 13:55):

For nonempty sets I've used it to ensure a user has selected at least one item before submitting a form. I didn't use a nonempty list because each item should be unique.

For nonempty dict I've used it for modelling a user account that has one or more "cases" and each case has an id used to reference it. So NonemptyDict CaseId Case. Part of the account creation is to set up a case so there is guaranteed to be at least one.

Richard Feldman (Aug 29 2022 at 13:59):

léo (Aug 29 2022 at 14:09):

type (_, 'a) set =
  | Nil : ([`Empty], 'a) set
  | Cons : 'a * (_, 'a) set -> ([`Nonempty], 'a) set

let hd : ([`Nonempty], 'a) set -> 'a = function
  | Cons (hd, _tl) -> hd

let empty = Nil

let cons hd tl = Cons (hd, tl)

let () =
  let s = cons 1 empty in
  Printf.printf "hd = %d\n" (hd s)

That's just non-empty lists, but if you want to have uniqueness, you can have a smart constructor that ensures uniqueness

Brendan Hansknecht (Aug 29 2022 at 14:36):

How does this work? If a user unselected an item, wouldn't it degrade into a regular set? Do you just block a user from unselecting unless they have at least 2 items selected?

Aside, what is the gain over just adding a length check at the beginning of your processing/submitting step?

I don't think I have ever used non-empty containers, so trying to understand better.

léo (Aug 29 2022 at 14:39):

léo (Aug 29 2022 at 14:41):

The gains are usually performance (no runtime check) and safety (the invariant is encoded in the type).

Brendan Hansknecht (Aug 29 2022 at 14:43):

But if you have to define the type with some sort of recursive structure or union, I would bet that is generally much slower than a linear list/array in memory. Following pointers is generally quite slow.

léo (Aug 29 2022 at 14:52):

type (_, 'a) set =
  | Empty : ([`Empty], 'a) set
  | Nonempty : 'a array -> ([`Nonempty], 'a) set

Then ofc, you'll have to have a smart constructor and hide stuff to prevent users from building a Nonempty variant with an empty array.

léo (Aug 29 2022 at 15:13):

module Non_empty_array : sig
  type (_, 'a) t
  val empty : ([> `Empty], 'a) t
  val cons : 'a -> ([ `Empty | `Nonempty], 'a) t -> ([`Nonempty], 'a) t
  val hd : ([`Nonempty], 'a) t -> 'a
end = struct

  type (_, 'a) t =
    | Empty : ([>`Empty], 'a) t
    | Nonempty : 'a array -> ([>`Nonempty], 'a) t

  let hd : ([`Nonempty], 'a) t -> 'a = function
    | Nonempty a -> a.(0)

  let empty = Empty

  let cons : 'a -> ([ `Empty | `Nonempty], 'a) t -> ([`Nonempty], 'a) t =
    fun hd -> function
    | Empty -> Nonempty ([|hd|])
    | Nonempty a ->
      let len = Array.length a + 1 in
      let f = function
        | 0 -> hd
        | i -> a.(i - 1)
      in
      let a = Array.init len f in
      Nonempty a
end

let () =
  let open Non_empty_array in
  let s = cons 1 empty in
  Printf.printf "hd = %d\n" (hd s)

(cons is terribly inefficient as I'm using non-resizable arrays, but that's another problem)

Martin Stewart (Aug 29 2022 at 15:23):

A normal set is used to store items while the user is still selecting them. The nonempty set is for places where the selection must be validated. For example, showing the selection on the submit success page, or storing the data in the backend.

The advantage of using nonempty set is that it's not possible for me to forget to do the validation when the user presses submit since I need to convert from a set to a nonempty set. Additionally having nonempty set better documents what is intended.
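The form workflow described above could look something like this in Rust. This is a sketch under assumptions: `NonEmptySet`, `confirmation_message` and `submit` are hypothetical names, and the set is backed by `BTreeSet` purely for convenience.

```rust
use std::collections::BTreeSet;

/// A set known to contain at least one element (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct NonEmptySet<T: Ord>(BTreeSet<T>);

impl<T: Ord> NonEmptySet<T> {
    /// The only way in: the emptiness check happens here, once.
    pub fn from_set(set: BTreeSet<T>) -> Option<Self> {
        if set.is_empty() {
            None
        } else {
            Some(NonEmptySet(set))
        }
    }

    /// An arbitrary element; cannot fail because the set is nonempty.
    pub fn any(&self) -> &T {
        self.0.iter().next().expect("NonEmptySet is never empty")
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }
}

/// Code past the validation boundary asks for the stronger type,
/// so it cannot be called with an unchecked selection.
pub fn confirmation_message(selected: &NonEmptySet<String>) -> String {
    format!("You selected {} item(s), including {}", selected.len(), selected.any())
}

/// What a submit handler might do with the in-progress selection.
pub fn submit(selection: BTreeSet<String>) -> Result<String, &'static str> {
    match NonEmptySet::from_set(selection) {
        Some(valid) => Ok(confirmation_message(&valid)),
        None => Err("select at least one item"),
    }
}
```

The ordinary set is used while the user is still toggling items; the conversion in `submit` is the one place where validation can happen, and forgetting it is a type error rather than a runtime bug.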

Richard Feldman (Aug 29 2022 at 16:53):

Richard Feldman (Aug 29 2022 at 17:03):

e.g. for form fields, validating that things like username and email were nonempty

Martin Stewart (Aug 29 2022 at 17:06):

I made a nonempty string package in Elm. My experience after using it for a while is it can be useful* but there's some drawbacks:

unsafeNonemptyString : String -> NonemptyString
unsafeNonemptyString text =
    case NonemptyString.fromString text of
        Just nonempty -> nonempty
        Nothing -> unsafeNonemptyString text -- I hope we never reach this

myString = unsafeNonemptyString "My string"

*One use case I had was creating a package for sending emails via sendgrid. The sendgrid API will return an error if the subject field is empty. So I used NonemptyString to ensure the user won't accidentally write code that does that.

Brendan Hansknecht (Aug 29 2022 at 18:52):

Ah, I guess I wasn't thinking API boundaries and passing data to another function. That makes sense. Now a user has to verify a set is non-empty before even calling certain functions. So just encoding function constraints in type constructors.

Thomas Dwyer (Aug 30 2022 at 01:48):

Non-empty lists are a form of "Parse, don't validate." More common in languages with more powerful type systems, it can still be very useful in languages with expressive datatypes like Roc.

Qqwy / Marten (Aug 30 2022 at 08:00):

Maybe at some point we might improve the constant value story in Roc by simultaneously:

Qqwy / Marten (Aug 30 2022 at 08:03):

urlConstant : String -> Url
urlConstant string =
  case Url.parse string of
    Just url -> url
    Nothing -> absurd

Qqwy / Marten (Aug 30 2022 at 08:04):

But obviously this would require being able to run semi-arbitrary Roc code at compile time

Qqwy / Marten (Aug 30 2022 at 08:05):

I guess it would be nice to indicate in the types of the example that it's a compile-time thing too by the way, otherwise its type signature looks 'too good to be true'

Qqwy / Marten (Aug 30 2022 at 08:14):

urlConstant : CompileTime (String -> Url)
urlConstant string =
  case Url.parse string of
    Just url -> url
    Nothing -> absurd
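For comparison, Rust gets a limited form of this today through `const fn`: a panic during constant evaluation is reported as a compile error. The sketch below uses a hypothetical `NonEmptyStr` type instead of a URL, since URL parsing is not something this example can assume is available in a const context.

```rust
/// A string slice known to be nonempty (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct NonEmptyStr(&'static str);

impl NonEmptyStr {
    /// Runtime entry point: the caller has to handle the empty case.
    pub fn parse(text: &'static str) -> Option<NonEmptyStr> {
        if text.is_empty() {
            None
        } else {
            Some(NonEmptyStr(text))
        }
    }

    /// Constant entry point: when evaluated in a `const` item, the panic
    /// becomes a compile error instead of a runtime failure.
    pub const fn constant(text: &'static str) -> NonEmptyStr {
        if text.is_empty() {
            panic!("NonEmptyStr::constant called with an empty string");
        }
        NonEmptyStr(text)
    }

    pub const fn as_str(&self) -> &'static str {
        self.0
    }
}

// Checked while compiling; changing the literal to "" fails the build.
pub const SUBJECT: NonEmptyStr = NonEmptyStr::constant("Welcome aboard");
```

Note that nothing stops `constant` from also being called at runtime with a value that was not known at compile time, in which case it would panic. That is the same "too good to be true" signature concern raised above, and the reason a marker like `CompileTime` in the type would be useful.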

Martin Stewart (Aug 30 2022 at 08:15):

I think this was discussed a while back (I don't remember which thread). I would love to have this feature but there are some challenges.

Martin Stewart (Aug 30 2022 at 08:19):

{-| Be very careful when using this!
-}
url : String -> Url
url urlText =
    case Url.fromString urlText of
        Just url_ ->
            url_

        Nothing ->
            unreachable ()


{-| Be very careful when using this!
-}
message : String -> Message
message text =
    case Message.fromString text of
        Ok ok ->
            ok

        Err _ ->
            unreachable ()


{-| Be very careful when using this!
-}
emailAddress : String -> EmailAddress
emailAddress text =
    case EmailAddress.fromString text of
        Just emailAddress_ ->
            emailAddress_

        Nothing ->
            unreachable ()


{-| Be very careful when using this!
-}
unreachable : () -> a
unreachable () =
    let
        _ =
            causeStackOverflow 0
    in
    unreachable ()


causeStackOverflow : Int -> Int
causeStackOverflow value =
    -- Don't TCO
    causeStackOverflow value + 1

In practice this has never given me any trouble*. So maybe this approach is good enough even without any compiler support

*I've only done this on solo projects so far. Maybe with multiple people, the odds are higher that someone will misuse these functions (i.e. Unsafe.url urlDeterminedAtRuntime)

Qqwy / Marten (Aug 30 2022 at 08:23):

Yes, you would need to work with what the cool kids nowadays call a 'gas meter'.
It's a simple trade-off between power/safety and compilation-speed and I do not think there is any way around it.

Qqwy / Marten (Aug 30 2022 at 08:23):

For the vast majority of things we want, the amount of time that needs to be spent during compilation will be very low (e.g. not recursive at all), of course.
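A toy illustration of the "gas meter" idea in Rust, not a description of how Roc or any real compiler implements it: the evaluator charges one unit per step and gives up once the budget is spent. The `Expr` language, `OutOfGas` and `eval` are all invented for this sketch.

```rust
/// A toy expression language standing in for "code run at compile time".
pub enum Expr {
    Num(u64),
    Add(Box<Expr>, Box<Expr>),
    /// Evaluate the body `times` times, summing the results.
    Repeat { times: u64, body: Box<Expr> },
}

#[derive(Debug, PartialEq)]
pub struct OutOfGas;

/// Evaluate `expr`, charging one unit of gas per node visited.
pub fn eval(expr: &Expr, gas: &mut u64) -> Result<u64, OutOfGas> {
    if *gas == 0 {
        return Err(OutOfGas);
    }
    *gas -= 1;
    match expr {
        Expr::Num(n) => Ok(*n),
        Expr::Add(left, right) => Ok(eval(left, gas)? + eval(right, gas)?),
        Expr::Repeat { times, body } => {
            let mut total = 0;
            for _ in 0..*times {
                total += eval(body, gas)?;
            }
            Ok(total)
        }
    }
}
```

Small, non-recursive constants finish well within any reasonable budget, while a runaway computation is cut off with an error instead of hanging the build; the budget size is the power versus compilation-speed trade-off mentioned above.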

lue (Sep 03 2022 at 06:27):

type List element emptyTag_ emptiable =
      Empty emptiable
    | Append element (List element Empty emptiable)

-- same with Set, ...
append :
       element
    -> List element Empty emptiable_
    -> List element Empty never_

-- same with Set, ...
top  : List element Empty Never -> element

Set.fromList : List element Empty emptiable -> Set element Empty emptiable
