package naming and search · ideas

Stream: ideas

Topic: package naming and search

Richard Feldman (Oct 04 2024 at 19:47):

I've been trying to finish up the design doc for a centralized package index, but I've gotten stuck on a fairly fundamental problem (how naming and search ordering should work), and I think talking through it might help.

Here's a summary of some things I've thought through and considered: https://docs.google.com/document/d/1mnBFnvFQ2wEkTLIKlU60BO7BQKQWp_n7dRXJaVN6pDQ/edit?usp=sharing

I really don't have a clear frontrunner design here. The tradeoffs are all over the place. I'm curious to hear anyone's thoughts on this, including random ideas and more data points on things you've seen work well or not so well.

Luke Boswell (Oct 04 2024 at 21:14):

This is my jam. :heart:

Here's a quick brain dump of the approach I would take.

https://docs.google.com/document/d/1L-uA8DycSHjSUFHPhYf0Hoj5vHA6K4s6J9m40s888Hw/edit?usp=sharing

edit - I should add that here I've assumed namespaced and alphabetical.

Jared Cone (Oct 04 2024 at 21:27):

Richard's doc gives a good argument for namespaces being superior to flat. It prevents squatting. Having the company/creator in the package path gives some comfort knowing the package is coming from a source I consider reputable. Like when I'm browsing for IDE extensions, it's nice seeing something like C/C++ from Microsoft. And maybe typo name-trickery is a little less likely, since you'd have to typo the namespace and the package. Also, Roc has some extra security with its Task/Effect system: "hmmm, why does this png decoder need to make http requests..."

For sorting, I'm not sure if the sort order is as important as the information that's presented with the sort. Again thinking of when I select an extension in an IDE, it shows the package name, creator, # of downloads, # of reviews, # of stars, and verified ownership of the domain. I often use that when trying to decide which extension to install.

Isaac Van Doren (Oct 05 2024 at 00:33):

I'm a big fan of namespaced. Squatting is definitely not something I want to have. The opposite problem might also exist in a flat solution where an author could be reluctant to choose a descriptive name for their package like roc-time because they aren't sure about taking such a good name out of circulation.

Nathan Kramer (Oct 05 2024 at 01:04):

Namespacing seems key to me. Another positive reason to namespace is that it means you don't have to give packages "quirky" names to make them unique, you can just name it what it does without worrying about collisions.

Sky Rose (Oct 05 2024 at 02:37):

I don't think that a sorting method needs to be decided on right now. Some things, like naming schemes, will be hard to change later, so we want to get them right the first time. But sorting is something that can be easily iterated on by the package index without having to change any community contributions or code. I think it'd be enough to pick a reasonable one (random would probably be easiest) to start with and then wait until package sorting becomes a problem before trying to pick a better method.

Sky Rose (Oct 05 2024 at 02:42):

That said, a comment on Elm's approach of a manual list of authors: It doesn't have to be tied to conference talks. You could have a handful of trusted curators who manually tag high-quality packages and authors, for whatever sense of "high quality" they think makes sense. Tying it to conference talks gives a convenient way to claim that it's impartial and avoid having to make decisions about what should be included, but if curators want to have more editorial power over what's included in the list, that's just as technically feasible.

Sky Rose (Oct 05 2024 at 02:49):

I agree that namespacing would be good.

An anecdote from Gleam: They don't have namespacing, so they discourage the community from naming their packages straightforwardly like "gleam-tcp" because that implies that they're official Gleam packages. Instead, people make creative names like "glisten" ("gleam" + "listen", cuz you listen to tcp sockets). Which is fun, but causes all sorts of naming problems.

Sky Rose (Oct 05 2024 at 02:58):

If you do namespacing, you still run into the problem of having to adjudicate name conflicts in the namespaces. That's less of an issue because there are fewer namespaces than packages, but as a community member there still needs to be a way to prevent me from getting the "google/" namespace.

Elm solves this problem by using GitHub's accounts as namespaces. That offloads the adjudication to GitHub, who already handles this, but then you're stuck using GitHub. Sometimes I wonder if there could be another layer, so a package would be "github/rtfeldman/elm-ui", and there'd also be "gitlab/" or "npm/" or other top-level organizations with namespaces that we trust, though I think I've just re-invented DNS.

Speaking of DNS, using URLs is also deferring the namespace issue to someone else. But it's offloading it to the DNS system, which seems like a totally fine foundation.

Jasper Woudenberg (Oct 05 2024 at 08:09):

I like the namespaced approach as well. I'd propose the full name of the package should be a prefix of the canonical URL where it can be downloaded, i.e. the URL you will see in main.roc. This is to avoid the possibility of packages having multiple names, and people needing to remember that X they see in the code goes by Y in the package index. But that would create new constraints:

Authors might change where they host their binaries, and this would result in their package name changing too.
Folks might use mirrors to get a package source from somewhere else than the canonical location, in which case the package name in source code will be different from the name in the package repository anyway.
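
To make the "name is a prefix of the canonical URL" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function names, the choice of two path segments, and the example URLs are hypothetical, not part of any actual Roc design.

```python
from urllib.parse import urlsplit


def package_name_from_url(url: str, depth: int = 2) -> str:
    """Derive an index name from the host plus the first `depth` path segments.

    `depth=2` is an assumption that matches "host/author/package" layouts.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    segments = [s for s in parts.path.split("/") if s]
    return "/".join([host, *segments[:depth]])


def name_matches_url(name: str, url: str) -> bool:
    """True when `name` is a whole-segment prefix of the URL without its scheme."""
    parts = urlsplit(url)
    stripped = (parts.hostname or "") + parts.path
    return stripped == name or stripped.startswith(name + "/")


# Hypothetical download URLs, used only for illustration.
canonical = "https://github.com/example-author/example-pkg/releases/download/0.1.0/bundle.tar.br"
mirror = "https://mirror.example.org/example-author/example-pkg/0.1.0/bundle.tar.br"

name = package_name_from_url(canonical)
print(name)                               # github.com/example-author/example-pkg
print(name_matches_url(name, canonical))  # True
print(name_matches_url(name, mirror))     # False
```

The last line shows the mirror constraint: as soon as the package is fetched from somewhere other than the canonical location, the name in the index no longer matches the URL in source code.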

Jasper Woudenberg (Oct 05 2024 at 08:20):

The way I read this doc, an important goal of creating a centralized package index is helping folks find the best packages. I really like this framing!

One quality criterion that I often try to figure out when adopting a new dependency is whether the package I'm looking at is well maintained. Luke mentioned this one as well. It can be tricky to have objective measures for this. Activity in the package's repo is a poor proxy for maintainedness - some packages are "done" and see very little further work, and those are often great.

I think the Haskell ecosystem offers some inspiration here. One repository there, stackage, contains only packages that have been proven to compile against the latest versions of their dependencies. If a package starts failing this test its author gets some time to address this, and if they don't the package is dropped from the repository. I always take 'is included in stackage' as an indication of maintainedness.

Richard Feldman (Oct 05 2024 at 12:05):

Jasper Woudenberg said:

I think the Haskell ecosystem offers some inspiration here. One repository there, stackage, contains only packages that have been proven to compile against the latest versions of their dependencies. If a package starts failing this test its author gets some time to address this, and if they don't the package is dropped from the repository. I always take 'is included in stackage' as an indication of maintainedness.

hm, so that means if I write a package that depends on another package that releases a new major version with breaking changes, I have to update my package to work with that breaking change? :thinking:

Anton (Oct 05 2024 at 12:21):

For ranking search results, we could use a hidden multifaceted metric. If it's sufficiently complex, malicious actors will not be able to figure it out.

Jasper Woudenberg (Oct 05 2024 at 15:42):

Richard Feldman said:

hm, so that means if I write a package that depends on another package that releases a new major version with breaking changes, I have to update my package to work with that breaking change? :thinking:

Yes, or alternatively you could drop the dependency that made the breaking change.

The benefit of this is that as an application writer you get guarantees that the packages in stackage work together well, i.e. you won't get diamond dependency conflicts.

Kasper Møller Andersen (Oct 11 2024 at 06:02):

Jasper Woudenberg said:

I think the Haskell ecosystem offers some inspiration here. One repository there, stackage, contains only packages that have been proven to compile against the latest versions of their dependencies. If a package starts failing this test its author gets some time to address this, and if they don't the package is dropped from the repository. I always take 'is included in stackage' as an indication of maintainedness.

This sounds enticing to me! I think this sounds like it encourages some healthy traits:

Having good SemVer coverage becomes important, and encourages people to do proper deprecation cycles. Having good deprecation and SemVer tooling built in would make this a powerful combination.
It encourages package stability in general

It’s basically a way to discourage churn, which also has some downsides of course if you actually need to make changes. So although I really like this approach, I think I would only introduce it later, once the ecosystem has gotten “shaken out”, and the first major rounds of iteration have completed.

Jasper Woudenberg (Oct 11 2024 at 06:17):

To clarify, I'm not saying we should take the stackage approach wholesale, rather that we might use 'up-to-date-ness with dependencies' as an indicator of stability and make it part of the library rating. Though there's a downside: figuring out whether a library works with its most recent dependencies requires building it and running its tests after those dependencies update, and that requires CI resources.
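
As a rough sketch of how 'up-to-date-ness with dependencies' could feed into a rating, the Python below turns CI build results into a single fraction. The `BuildResult` record, the `freshness` function, and the package names and versions are all hypothetical. It assumes the expensive part (building and testing against each new dependency release) has already happened somewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    """One CI run of a package against a specific version of one dependency."""
    dependency: str
    version: str
    passed: bool


def freshness(latest: dict[str, str], results: list[BuildResult]) -> float:
    """Fraction of dependencies whose latest version has a passing build.

    A package with no dependencies is treated as fully up to date (an assumption).
    """
    if not latest:
        return 1.0
    passing = {(r.dependency, r.version) for r in results if r.passed}
    ok = sum(1 for dep, ver in latest.items() if (dep, ver) in passing)
    return ok / len(latest)


# Hypothetical data: "json" released 2.0.0 and the package no longer builds against it.
latest_versions = {"json": "2.0.0", "http": "1.4.0"}
ci_results = [
    BuildResult("json", "1.9.0", True),
    BuildResult("json", "2.0.0", False),
    BuildResult("http", "1.4.0", True),
]
print(freshness(latest_versions, ci_results))  # 0.5
```

Used as one input to a rating rather than a hard gate, a score like this would lower a package's standing without dropping it from the index the way stackage does.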

Richard Feldman (Nov 28 2024 at 12:54):

from the files of "what does a package index do when it receives a naming dispute based on trademark law?" https://devclass.com/2024/11/27/redis-inc-seeks-control-over-future-of-rust-redis-rs-client-library-amid-talk-of-trademark-threat/

Richard Feldman (Nov 28 2024 at 12:55):

Ronacher said he understood from the call that “the name of the library [redis-rs] constitutes a trademark violation in their mind” and that the options were either to transfer the code to Redis, or to rename the crate.

Richard Feldman (Nov 28 2024 at 12:58):

a nice thing about DNS for naming (e.g. the package namespace is based on a domain) is that the blanket response can be "if you think your trademark entitles you to that domain, you're welcome to take it up with the domain registrar"

Anton (Nov 28 2024 at 13:06):

“the name of the library [redis-rs] constitutes a trademark violation in their mind”

That seems like a great way to make it harder for people to use your product :p

Anthony Bullard (Nov 28 2024 at 14:34):

I think a registry that's more like a search engine (like go get) is the best compromise between developer ergonomics and language maintainer sanity.

Eli Dowling (Nov 30 2024 at 05:11):

Commenting on the package ranking metric.
Maybe a good approach is to survey the current roc users to find out how we all assess package quality.
Normally I use a ratio between:
Fitness: How well does the package solve my problem
Popularity: Does this have a lot of GitHub stars/downloads
Activity: A quick glance at the last few commits to see if it's active.

If the ranking showed those factors front and centre, I would be happy.
I appreciate they aren't perfect and they all have ways of being gamed, but really, what doesn't have ways of being gamed?

I appreciate the conference talk thing is a good answer, but I don't like that.

@Luke Boswell I actually like this idea. But I have issues:

There needs to be a provision for packages that are just done. The maintainer isn't interested in adding much new functionality or keeping up with these assessments, but also the package is useful as is.
This is a lot of work for a central governing body and I don't see it scaling very well.
Judgements like this are very open to opinion. I might think that testing isn't very important for my package, you might disagree and say it needs more tests.

Mostly I spent a bunch of time considering alternatives and throwing them out because they are hard, or have flaws, or can be gamed or because they are just GitHub stars again.

I have one general idea that keeps coming up: I like how lobsters solves this problem by only allowing invited people to invite other people. The idea is that users can vet each other.

What about a ranking algorithm using stars with a little more info?
E.g.:

If you have no contributions on GitHub your star is mostly irrelevant.
If you have a contribution to a roc package the value of your star increases based on the number of contributions and the ranking of that package.
The star value decays with age.

The idea is that users who make high quality contributions to high quality roc packages are qualified to judge the quality of another package.
In theory you could game this by making a package that lots of high profile members of the roc community like.... But I think that's just called contributing to the ecosystem :sweat_smile:
Like Richard suggested in his write up, in the end it often comes down to "Do I trust that the guy who made this makes good stuff".
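
A rough Python sketch of the weighted-star idea above, just to show the shape of it. The constants, the log scaling of contribution counts, and the one-year half-life are all made-up assumptions, and `package_rank` is taken as an input (for example, from a previous ranking round) to sidestep the circularity.

```python
import math
from dataclasses import dataclass

BASE_WEIGHT = 0.05      # assumption: a star from someone with no contributions barely counts
HALF_LIFE_DAYS = 365.0  # assumption: a star loses half its value every year


@dataclass(frozen=True)
class Contribution:
    package_rank: float  # hypothetical score in [0, 1] of the package contributed to
    count: int           # number of contributions to that package


@dataclass(frozen=True)
class Star:
    age_days: float
    contributions: tuple[Contribution, ...] = ()


def star_value(star: Star) -> float:
    """Value of one star: contributor reputation, decayed by the star's age."""
    # log1p keeps one prolific contributor from dominating (a design assumption).
    reputation = sum(c.package_rank * math.log1p(c.count) for c in star.contributions)
    decay = 0.5 ** (star.age_days / HALF_LIFE_DAYS)
    return (BASE_WEIGHT + reputation) * decay


def package_score(stars: list[Star]) -> float:
    """Sum of the weighted values of all stars on a package."""
    return sum(star_value(s) for s in stars)


drive_by = Star(age_days=0)
contributor = Star(age_days=0, contributions=(Contribution(package_rank=0.9, count=10),))
old_contributor = Star(age_days=730, contributions=(Contribution(package_rank=0.9, count=10),))

print(star_value(drive_by) < star_value(old_contributor) < star_value(contributor))  # True
```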

Richard Feldman (Nov 30 2024 at 06:05):

Eli Dowling said:

Commenting on the package ranking metric.
Maybe a good approach is to survey the current roc users to find out how we all assess package quality.
Normally I use a ratio between:
Fitness: How well does the package solve my problem
Popularity: Does this have a lot of GitHub stars/downloads
Activity: A quick glance at the last few commits to see if it's active.

If the ranking showed those factors front and centre, I would be happy.
I appreciate they aren't perfect and they all have ways of being gamed, but really, what doesn't have ways of being gamed?

hm, but how do we tell how many downloads it has, when the downloads go through a URL we don't control? Do we just silently fire off an additional request to our own servers every time someone installs a package from a particular URL, to track that they've done that?

If so, that raises the classic tradeoff of making it opt-in vs opt-out being in tension with privacy and quality of data collected.

Richard Feldman (Nov 30 2024 at 06:05):

for GitHub stars I guess we'd have to offer it only for packages that are hosted on GitHub, right?

Eli Dowling (Nov 30 2024 at 08:12):

I totally agree that GitHub stars being limited to GitHub is annoying, but I don't really have a better solution. We could have our own system, but then it requires people to make accounts and then we have to manage users and detect fake accounts.

I'm not necessarily saying to sort by GitHub stars by default, but showing that info in a package manager and allowing us to sort would be appreciated.

As for downloads, yeah I know it's data collection, but I think you'll find most package managers do it and it's not an issue. I'd definitely make it opt out.
Whilst an IP address is somewhat identifying, the data we're collecting can tell whether you use CLI apps or webservers... Not exactly super private stuff. I believe GitHub Actions publishes its IP ranges for runners. We could choose to ignore download logging from a large portion of CI that way.
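
For what it's worth, filtering CI traffic out of download counts could look roughly like this in Python. The ranges below are placeholders from the reserved documentation blocks, not real runner ranges. A real version would load whatever list the CI provider publishes, and the function names are hypothetical.

```python
import ipaddress

# Placeholder CIDR blocks (reserved for documentation), standing in for published CI ranges.
CI_RANGES = [ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "198.51.100.0/24")]


def is_ci_address(addr: str) -> bool:
    """True when the address falls inside any known CI range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in CI_RANGES)


def count_downloads(addresses: list[str]) -> int:
    """Count download requests, skipping those that come from CI ranges."""
    return sum(1 for addr in addresses if not is_ci_address(addr))


# Two of these three hypothetical requests come from the placeholder CI ranges.
print(count_downloads(["192.0.2.10", "203.0.113.5", "198.51.100.7"]))  # 1
```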

Anthony Bullard (Nov 30 2024 at 12:48):

I would say using recent commit activity as a sign of health is not as good as it sounds. Health is more a function of the number of open issues and the average time to issue closure.
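
As a small illustration, those two signals are cheap to compute once the issue timestamps are available. This Python sketch uses a hypothetical `Issue` record and made-up dates. How the two numbers would be combined into one health score is left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Issue:
    opened: datetime
    closed: Optional[datetime] = None  # None means the issue is still open


def open_issue_count(issues: list[Issue]) -> int:
    """Number of issues that have not been closed."""
    return sum(1 for issue in issues if issue.closed is None)


def mean_time_to_close(issues: list[Issue]) -> Optional[timedelta]:
    """Average time from open to close, or None when nothing has been closed yet."""
    durations = [issue.closed - issue.opened for issue in issues if issue.closed is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Made-up issue history: closed after 2 days, closed after 4 days, and one still open.
issues = [
    Issue(datetime(2024, 11, 1), datetime(2024, 11, 3)),
    Issue(datetime(2024, 11, 10), datetime(2024, 11, 14)),
    Issue(datetime(2024, 11, 20)),
]
print(open_issue_count(issues))    # 1
print(mean_time_to_close(issues))  # 3 days, 0:00:00
```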

Anthony Bullard (Nov 30 2024 at 12:51):

I think what Go does is it looks for all open-source Go projects in Git providers and indexes the number of usages for each package - since it has a "registry", but it doesn't store or serve the packages themselves, only the docs/metadata.

Last updated: Jul 23 2026 at 13:15 UTC