arbitrary-url packages · ideas · Zulip Chat Archive

Stream: ideas

Topic: arbitrary-url packages

Richard Feldman (Nov 07 2022 at 17:07):

wrote up an idea for packages (not a centralized package index; that will be a separate proposal in the future, which coexists with this) https://docs.google.com/document/d/1SRzBuW_hn17LzCpxk-DWCHqpegI2WzuLdQpE3-qc9Lc/edit?usp=sharing

all feedback welcome!

Ayaz Hafiz (Nov 07 2022 at 17:14):

Can you expand the read permissions to the public?

Richard Feldman (Nov 07 2022 at 17:16):

oops, done!

Luke Boswell (Nov 07 2022 at 19:16):

Cool, looks very useful. Question: does this package the host as source files or as compiled libraries? Will the end user need the full host toolchain to use it? I assume yes for now, but in the future, with surgical linking, it could cross-compile automatically.

Richard Feldman (Nov 07 2022 at 19:34):

for now src files, but in the future the idea is to only support precompiled binaries

Richard Feldman (Nov 07 2022 at 19:35):

that has a couple of prerequisites though, the biggest of which is switching over to surgical linking on all targets. Since the macOS surgical linker doesn't work yet, that part is the most likely holdup

Richard Feldman (Nov 07 2022 at 19:36):

once we get to that goal state, nobody should need to have anything but the roc binary to build applications

Brian Carroll (Nov 08 2022 at 05:57):

I like the idea of using URL fragments to specify the entrypoint filename. But a full-stack web framework might have two entrypoints, like frontend.roc and backend.roc.
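To make the fragment idea concrete, here is a minimal Python sketch that splits a package URL into the part that gets downloaded and the entrypoint named by the fragment. The example URL, the `.tar.br` extension, and the `main.roc` default are all hypothetical, not taken from the proposal.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical default used when a URL carries no fragment (not from the proposal).
DEFAULT_ENTRYPOINT = "main.roc"


def split_package_url(url: str) -> tuple[str, str]:
    """Split a package URL into (download_url, entrypoint_filename)."""
    parts = urlsplit(url)
    # The fragment is never sent to the server, so strip it for the download.
    download_url = urlunsplit(parts._replace(fragment=""))
    entrypoint = parts.fragment or DEFAULT_ENTRYPOINT
    return download_url, entrypoint


# Two URLs that differ only in their fragment share a single download.
front = split_package_url("https://example.com/pkg/abc123.tar.br#frontend.roc")
back = split_package_url("https://example.com/pkg/abc123.tar.br#backend.roc")
```

Because a URL has only one fragment, a framework with two entrypoints would need two URLs pointing at the same archive, which is the tension Brian raises.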

Kevin Gillette (Nov 13 2022 at 02:21):

Some clarifications regarding Go's centralized package/checksum database implementation:

- It's entirely optional, so those who don't want to rely on anything centralized don't have to. Since Go's non-stdlib import paths have always been URLs, Go's package system has always been decentralized-first.
- Whereas a local go.sum [lock] file detects and rejects later alterations to a module version your project has already observed, the checksum database extends this "time of first observation" globally. For anyone who thoroughly reviews their dependencies' code before use (and diffs incrementally when upgrading a dependency), this adds little, but it would prevent some of the social-engineering exploits you describe in the Google doc.
- It also provides better fetch/describe performance than a classic VCS, since only a subset of a repo's files may be relevant to a build; what gets fetched is essentially a zip/tarball of the requisite files for a given version. Of course, this is only a performance optimization, so it's optional too.
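The "time of first observation" idea can be sketched as a trust-on-first-use lock: record a hash the first time a module version is seen and reject any later content that hashes differently. This is a simplified Python illustration of the role go.sum plays, not Go's actual file format or hashing scheme; the `ChecksumLock` class and module names are hypothetical.

```python
import hashlib


class ChecksumLock:
    """Hypothetical lock file: remembers the first hash seen for each module version."""

    def __init__(self) -> None:
        self._sums: dict[tuple[str, str], str] = {}

    def check(self, module: str, version: str, content: bytes) -> None:
        digest = hashlib.sha256(content).hexdigest()
        # setdefault stores the digest on first observation and returns the stored one.
        recorded = self._sums.setdefault((module, version), digest)
        if recorded != digest:
            raise ValueError(f"checksum mismatch for {module}@{version}")


lock = ChecksumLock()
lock.check("example.com/a", "v1.0.0", b"original contents")  # first use: recorded
lock.check("example.com/a", "v1.0.0", b"original contents")  # same bytes: accepted
```

A shared checksum database moves the "first observation" from one project's lock to a global log, so a version altered after anyone has seen it would be rejected everywhere.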

Kevin Gillette (Nov 13 2022 at 02:25):

> Has a first-class concept of version numbers, and which enforces semantic versioning (like Elm's package system does), which allows for automated solving of shared dependencies and avoiding code bloat from duplication. I don't think it's possible to have Elm-level guarantees around these things for arbitrary URLs.

Go's module system also came to this conclusion: in Go, breaking changes (i.e. major version bumps) mandate a different URL (i.e. a "v2" somewhere in the URL), which reduces what most dependency solvers face as an NP-complete problem to one that can be solved in polynomial time. With hash-embedded URLs, Roc would get the same property (although presumably the URLs would change even with patch releases).
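As a rough illustration of why hash-embedded URLs pin content, here is a Python sketch that checks a downloaded archive against a hash carried in its URL. The layout (last path segment is `<hex sha256>.<extension>`) and the example URL are assumptions for illustration only; the proposal's actual URL format may differ.

```python
import hashlib
from urllib.parse import urlsplit


def expected_hash_from_url(url: str) -> str:
    # Assumed layout: the last path segment is "<hex sha256>.<extension>".
    filename = urlsplit(url).path.rsplit("/", 1)[-1]
    return filename.split(".", 1)[0]


def verify_download(url: str, archive: bytes) -> bool:
    """True only if the archive's sha256 matches the hash embedded in the URL."""
    return hashlib.sha256(archive).hexdigest() == expected_hash_from_url(url)


data = b"package contents"
url = f"https://example.com/pkgs/{hashlib.sha256(data).hexdigest()}.tar.gz"
```

Any change to the contents, even a one-byte patch, produces a different hash and therefore a different URL, which is the "URLs would change even with patch releases" property.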

Richard Feldman (Nov 13 2022 at 02:29):

but in this case "mandate" doesn't mean machine enforcement, right? It's just the rule everyone is asked to follow.

Kevin Gillette (Nov 13 2022 at 03:24):

Requiring v2.* (in the semver sense) to have a different import path than v1.* is indeed machine enforced: not in a centralized way, but by the local toolchain.

With this tradeoff, the diamond dependency problem becomes a non-problem: because each major version has its own URL, major versions are effectively separate, unrelated modules. The other part of the solution to the diamond problem is that all minor/patch revisions within a major version are assumed to be backwards compatible with the ones before them. So if your app, directly or transitively, depends on module A via versions v1.3.2, v1.6.4, and v2.8.7, the toolchain will use v1.6.4 and v2.8.7 together.
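The v1.3.2 / v1.6.4 / v2.8.7 example can be worked through with a small Python sketch: key each requirement by module path plus major version (mirroring Go's rule that v2+ paths get a `/vN` suffix), then keep the highest requirement within each key. This is a simplification that only handles plain `vMAJOR.MINOR.PATCH` strings; the module path is hypothetical.

```python
def parse_version(v: str) -> tuple[int, int, int]:
    major, minor, patch = (int(x) for x in v.lstrip("v").split("."))
    return major, minor, patch


def select_per_major(module: str, required: list[str]) -> dict[str, str]:
    """Treat each major version as its own module and keep its highest requirement."""
    chosen: dict[str, tuple[int, int, int]] = {}
    for v in required:
        parsed = parse_version(v)
        # Go-style identity: v0/v1 keep the bare path, v2+ get a "/vN" suffix.
        key = module if parsed[0] < 2 else f"{module}/v{parsed[0]}"
        if key not in chosen or parsed > chosen[key]:
            chosen[key] = parsed
    return {k: "v{}.{}.{}".format(*p) for k, p in chosen.items()}


result = select_per_major("example.com/a", ["v1.3.2", "v1.6.4", "v2.8.7"])
```

Since the two majors never share a key, nothing has to be reconciled between them, and within each key the choice is a simple maximum.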

The main downside of this arrangement is that asking users to change URLs to upgrade to a new major version is considered a painful usability issue. Go mitigates this somewhat by culturally trying very hard not to break backwards compatibility: where a non-Go project might be at v6, a typical equivalent Go project may still be at v1, thanks to a more deliberate design pace and to introducing parallel revisions of functionality within the same major version.

It sounds like you're already expecting that any upgrade at all may require changing the matching import paths in each Roc file. That could be very painful or a trivial non-issue, probably depending on whether there's a streamlined roc subcommand to update imports in place.
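The core of such an in-place update can be small. Below is a hypothetical Python sketch of what a URL-rewriting step might do: given a mapping from old package URLs to new ones, replace every occurrence in a source file's text. It treats the source as plain text and does not parse Roc; the function name and URLs are illustrative only.

```python
import re


def rewrite_package_urls(source: str, upgrades: dict[str, str]) -> tuple[str, int]:
    """Replace each old package URL with its new URL; return (new_source, count)."""
    if not upgrades:
        return source, 0
    # Longest URLs first, so a URL that is a prefix of another cannot shadow it.
    alternatives = sorted(upgrades, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(u) for u in alternatives))
    return pattern.subn(lambda m: upgrades[m.group(0)], source)


src = 'pkg "https://example.com/a-1.tar" other "https://example.com/a-1.tar"'
new_src, count = rewrite_package_urls(
    src, {"https://example.com/a-1.tar": "https://example.com/a-2.tar"}
)
```

A real tool would also need to discover the new URL for each package and verify it, which is where most of the design work would be.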

I believe Go's tradeoffs are, overall, sound. It could use better tooling around major version upgrades, but it does entirely eliminate the "resolving dependencies" spinners that hurt the productivity of many other (classic) dependency management systems. In summary, the tradeoffs Go made in versioning are:

1. Treat major versions as entirely different modules.
2. For security and stability reasons, never implicitly upgrade to the latest minor/patch revision unless the user specifically requests it. Later versions may have more bug fixes in the non-malicious case, but they are also the most likely to be compromised. Put another way, use the minimum version of a given module that satisfies the build transitively (i.e. the latest version explicitly mentioned anywhere in the transitive requirements, regardless of what's _available_). This also favors not pinging or fetching anything unless the user asks for it.
3. Arbitrary version constraints like <=, >=, ~>, etc. are not allowed, since in the general case (especially across major versions) they require NP-complete computation to solve optimally, along with especially complex dependency-solving systems. The only constraint Go permits is the minimum version.

AFAICT, the overall solution (the dependency resolution algorithm) depends equally and intrinsically on all three of the choices above. With only two of them or fewer, it would either be back to computationally intensive dependency solving, or other equivalent tradeoffs would need to be made.
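Here is a simplified Python sketch of minimal version selection as the linked post describes it: visit every module version reachable from the root's requirements, then keep the highest version mentioned for each module. It assumes the whole requirement graph is already in memory and that module names already encode the major version; the graph below is hypothetical and mirrors the A/B question asked later in this thread.

```python
from collections import deque

Version = tuple[int, int, int]
Node = tuple[str, Version]


def build_list(root: Node, graph: dict[Node, list[Node]]) -> dict[str, Version]:
    """Walk all requirements reachable from root; keep the max version per module."""
    seen = {root}
    queue = deque([root])
    selected: dict[str, Version] = {}
    while queue:
        module, version = queue.popleft()
        if version > selected.get(module, (-1, -1, -1)):
            selected[module] = version
        for req in graph.get((module, version), []):
            if req not in seen:
                seen.add(req)
                queue.append(req)
    return selected


graph = {
    ("app", (0, 0, 0)): [("A", (1, 0, 0)), ("B", (1, 2, 3))],
    ("A", (1, 0, 0)): [("B", (1, 4, 5))],
    ("B", (1, 2, 3)): [],
    ("B", (1, 4, 5)): [],
    ("B", (1, 9, 0)): [],  # published, but nobody requires it
}
result = build_list(("app", (0, 0, 0)), graph)
```

There is no backtracking or constraint solving: one graph traversal picks B 1.4.5, and the unreferenced 1.9.0 is ignored because only versions explicitly mentioned in the requirements are candidates.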

https://research.swtch.com/vgo-mvs describes these choices in more depth, as does part 3 of the same series (parts 3 and 4 are more about the choices and their implications than about anything especially specific to Go). That outcome was the result of a couple of years of planning and weighing tradeoffs, in the face of some of the same challenges and goals you seem to be thinking about.

Kevin Gillette (Nov 13 2022 at 03:32):

I don't think I've seen any thoughts on version management in that doc yet. If I depend on A and on B v1.2.3, and A in turn depends on B v1.4.5, what might Roc do to choose (or not choose) between those versions of B? For pure Roc modules, presumably it doesn't _really_ hurt to have both, since the modules can't have side effects. But if Roc enforces semver relationships (i.e. a patch can't introduce anything new, and a minor bump can't change or remove existing declarations), then it also stands to reason that the app shouldn't _need_ both versions compiled in.

What about multiple major versions of a dependency within the same project?

What about for other dependency types? I'm guessing, at least for the foreseeable future, the platform can only be chosen by the app (i.e. entrypoint), and thus there can't be multiple conflicting versions there.

Richard Feldman (Nov 13 2022 at 16:43):

sorry, I meant that it's not machine enforced that upgrading from v1.0 to v1.1 won't introduce any type mismatches. I don't think that can even theoretically be guaranteed without either centralization or the local client downloading multiple (potentially very many) versions for each release :sweat_smile:

Kevin Gillette (Nov 13 2022 at 16:45):

Correct, that is not machine enforced in Go

Richard Feldman (Nov 13 2022 at 16:47):

as far as versioning goes, I think that's a question for a centralized index; I consider it out of scope for https packages because I don't want to go the route Go did and try to impose a versioning scheme on arbitrary URLs

Richard Feldman (Nov 13 2022 at 16:47):

rather, with those I'm only concerned about security and convenience

Joshua Warner (Nov 14 2022 at 16:06):

For the compressed package format and hash format, it could be interesting to design the system such that the compiler can stream the package, receive some initial incremental hashes (in merkel-hashing fashion), followed by headers, followed by source, followed by compiled binaries. This way it can receive just the merkel hash tree, verify the hash of the header, and immediately start acting on it / downloading transitive dependencies, etc - before downloading the full archive and verifying it.

Richard Feldman (Nov 14 2022 at 16:18):

huh! I'm not familiar with merkel hashing, but sounds interesting!

Joshua Warner (Nov 14 2022 at 16:21):

The gist is that instead of the hash in the URL being a literal sha256 hash of the whole archive, it's actually a hash of (header_hash, source_hash, rest_hash) (each of which is a sha256 hash of that part of the archive / those files). The archive itself starts with those three hashes, followed by the data for the header, data for source, and data for the rest. As soon as the three hashes and the headers are received, the compiler can compute sha256(header_hash, source_hash, rest_hash) to verify it matches the url, and sha256(header) to verify it matches header_hash, and then immediately start acting on the content in header - without having downloaded the rest of the archive.
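A Python sketch of that layout may help. It builds an archive that starts with the three part hashes, uses sha256 of their concatenation as the URL hash, and verifies the header from only a prefix of the archive. The framing (two 8-byte big-endian lengths after the hashes, so the reader knows where the header and source end) and the function names are assumptions added for illustration; the message above doesn't specify a byte format.

```python
import hashlib
import struct

HASH_LEN = 32  # sha256 digest size in bytes
HEADER_START = 3 * HASH_LEN + 16  # three hashes, then two assumed 8-byte lengths


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def pack(header: bytes, source: bytes, rest: bytes) -> tuple[str, bytes]:
    """Return (url_hash_hex, archive_bytes)."""
    hashes = sha256(header) + sha256(source) + sha256(rest)
    # Assumed framing: header and source lengths, so a reader can find each part.
    lengths = struct.pack(">QQ", len(header), len(source))
    return sha256(hashes).hex(), hashes + lengths + header + source + rest


def verify_header(url_hash_hex: str, prefix: bytes) -> bytes:
    """Check the URL hash and header_hash using only the first bytes of the archive."""
    if len(prefix) < HEADER_START:
        raise ValueError("need more bytes before the header can be checked")
    hashes = prefix[: 3 * HASH_LEN]
    if sha256(hashes).hex() != url_hash_hex:
        raise ValueError("part hashes do not match the URL hash")
    header_len, _source_len = struct.unpack(">QQ", prefix[3 * HASH_LEN : HEADER_START])
    header = prefix[HEADER_START : HEADER_START + header_len]
    if len(header) != header_len or sha256(header) != hashes[:HASH_LEN]:
        raise ValueError("header incomplete or does not match header_hash")
    return header


url_hash, archive = pack(b"header", b"source files", b"compiled binaries")
prefix = archive[: HEADER_START + len(b"header")]  # everything up to the header's end
```

Once `verify_header` succeeds, a compiler could start resolving the dependencies listed in the header while the source and binaries are still downloading, then check them against the remaining two hashes.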

Brian Hicks (Nov 14 2022 at 16:23):

to put it even more simply: Merkle trees are basically just trees with hashes attached. You can send as many or as few levels of the tree as you want and do a relatively efficient diff to fetch only the parts of the tree you need.
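Here is a minimal Python sketch of a binary Merkle tree with an inclusion proof: to confirm that one chunk belongs under a known root, you only need the sibling hash at each level, about log2(n) hashes, rather than the whole tree. This is one simple variant chosen for illustration; it duplicates the last node on odd-sized levels and, borrowing the idea from RFC 6962, prefixes leaf and interior hashes with different bytes so the two can't be confused.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Every level of the tree, from leaf hashes up to the single root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [leaf_hash(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        level = [node_hash(padded[i], padded[i + 1]) for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels


def proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the bool is True when the sibling is on the left."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        sib = level[sibling] if sibling < len(level) else level[index]  # padded copy
        path.append((sib, index % 2 == 1))
        index //= 2
    return path


def verify(leaf: bytes, path: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = leaf_hash(leaf)
    for sib, sibling_on_left in path:
        node = node_hash(sib, node) if sibling_on_left else node_hash(node, sib)
    return node == root


leaves = [b"a", b"b", b"c", b"d", b"e"]
levels = merkle_levels(leaves)
root = levels[-1][0]
```

Comparing two trees top-down works the same way: matching hashes mean matching subtrees, so you only descend into the branches whose hashes differ.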

Brian Hicks (Nov 14 2022 at 16:24):

bonus: there is a lot of prior art as to how to implement them because they're the "chain" in blockchain. Why does crypto get all the cool data structures grumble grumble.

Joshua Warner (Nov 14 2022 at 16:24):

Also, git is a big merkel-hash tree :)

Brian Hicks (Nov 14 2022 at 16:25):

oh right, I forgot that! Whee!

Folkert de Vries (Nov 14 2022 at 16:31):

just to clear this up, you mean "merkle tree", right? (Merkel is the former leader of Germany)

Brian Hicks (Nov 14 2022 at 16:44):

no, we mean a Merkel tree. Here's a picture of a bunch of birds on one.

Brian Hicks (Nov 14 2022 at 16:44):

(yes Merkle)

Last updated: Jul 23 2026 at 13:15 UTC