I'd like to remove the file extensions from package URLs, and instead use only HTTP headers and the contents of the file to determine how they're compressed.
One motivation for this is that with the current design, it's impossible to upgrade to a better compression format in the future without breaking package URLs. For example, someone might want to first upload with gzip to get a bugfix or security release out as fast as possible, because brotli compression can take a while, and then re-upload later with brotli at maximum compression, without breaking anyone.
Also, some hosting providers or CDNs might offer built-in brotli compression, so having to pre-compress with brotli when publishing takes unnecessarily long and is redundant with the compression the hosting provider will do automatically.
I think both the HTTP headers and the contents of the file should be used because some hosting providers don't let you customize the HTTP headers - e.g. GitHub Releases doesn't natively support brotli and doesn't give you a way to customize headers, and that's how almost all Roc packages are likely to be hosted. So if we didn't support doing the compression yourself (via roc bundle), we just wouldn't get that compression at all in the most common case.
a challenge is that although it's trivial to detect if the file is compressed with gzip (all gzip streams begin with the bytes 0x1F 0x8B specifically to identify them as being gzipped), the same is not true of brotli, which doesn't have signature bytes at the beginning as part of its specification
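A minimal Python sketch of that lookup order, assuming the header is consulted first and the contents are only sniffed as a fallback. The function name `detect_compression` and its return values are made up for illustration; this is not how roc does it today.

```python
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def detect_compression(content_encoding: Optional[str], body: bytes) -> str:
    """Return "gzip", "br", or "unknown" for a downloaded package body."""
    # 1. Trust an explicit Content-Encoding header when the host sends one.
    if content_encoding is not None:
        encoding = content_encoding.strip().lower()
        if encoding in ("gzip", "x-gzip"):
            return "gzip"
        if encoding == "br":
            return "br"
    # 2. Otherwise sniff the contents. Only gzip can be recognized this way.
    if body.startswith(GZIP_MAGIC):
        return "gzip"
    # 3. Raw brotli has no signature, so without a header we cannot tell.
    return "unknown"
```

The "unknown" branch is exactly the gap: a header-less host serving raw brotli ends up there.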
however, there's a proposal to introduce this which includes 4 initial signature bytes (0x91, 0x19, 0x62, 0x66) that are not valid brotli, and therefore can't be mistaken for header-less brotli
that looks like a well-reasoned proposal, but the most popular Rust crate for brotli uses a different one (0xE1, 0x97)
also Mark Adler, coauthor of gzip, proposed (apparently at Google's request) a standard for this back in 2016, which also used a different magic number (0xCE, 0xB2, 0xCF, 0x81)
what's even more confusing to me is that it seems like all of these are using incorrect magic numbers? The brotli RFC says that the stream header (which seems to be the very beginning of the brotli stream) begins with 7 bits, and:
Note that bit pattern 0010001 is invalid and must not be used.
so that means the two bit patterns that are invalid for the first 8 bits (assuming those 7 bits are the high bits of the first byte) are those 7 invalid bits plus either 0 or 1 as the eighth bit, which works out to be 0x22 and 0x23
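The result depends on which end of the byte those 7 bits land in, so here is a quick Python sketch that computes both readings. The helper names are made up. If the RFC's table is written with the first-read bit on the right (it appears to say the bits are parsed from right to left, which is worth double-checking), the second reading is the one that applies.

```python
INVALID_WBITS = "0010001"  # the 7-bit pattern the RFC marks as invalid


def msb_first(pattern: str, eighth_bit: int) -> int:
    # The pattern fills the high 7 bits; the extra bit is the lowest bit.
    return int(pattern + str(eighth_bit), 2)


def lsb_first(pattern: str, eighth_bit: int) -> int:
    # The pattern fills the low 7 bits; the extra bit is the highest bit.
    return int(str(eighth_bit) + pattern, 2)


msb = [hex(msb_first(INVALID_WBITS, bit)) for bit in (0, 1)]
lsb = [hex(lsb_first(INVALID_WBITS, bit)) for bit in (0, 1)]
print(msb)  # ['0x22', '0x23']
print(lsb)  # ['0x11', '0x91']
```

Under the second reading, 0x91 is one of the two invalid first bytes, and it is also the first byte of the 4-byte signature proposal mentioned earlier.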
If the hash of the archive is then only checked after decompression, that does mean a file can be replaced with a zip bomb.
true, but I think we need to defend against zip bombs in general because someone might just publish a new release of a package that's a zip bomb
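One general defense is to cap how much output the decompressor may produce. A minimal sketch using Python's standard `zlib` module on a gzip stream; `bounded_gunzip`, `DecompressedTooLarge`, and the limits are hypothetical, and the same idea would need to be repeated for whatever brotli library is in use.

```python
import gzip
import zlib


class DecompressedTooLarge(Exception):
    """Raised when the output would exceed the caller's limit."""


def bounded_gunzip(data: bytes, limit: int) -> bytes:
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    # Ask for at most limit + 1 bytes so an oversized stream is noticed
    # without ever inflating the whole thing into memory.
    out = decompressor.decompress(data, limit + 1)
    if len(out) > limit:
        raise DecompressedTooLarge(f"more than {limit} bytes of output")
    if not decompressor.eof:
        raise ValueError("truncated gzip stream")
    return out


# Usage: a small archive passes, an oversized one is rejected early.
small = gzip.compress(b"hello" * 10)
print(len(bounded_gunzip(small, 1_000)))  # 50
```

This bounds memory use whether the oversized archive came from an attacker swapping the file or from the publisher.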
given all that, I don't like the idea of trying to use signature bytes to identify a brotli-encoded file. It seems like they don't have that sorted out yet, and anything we pick is likely to be incompatible with a future official signature.
a few possible designs come to mind here:
https://blog.cloudflare.com/results-experimenting-brotli/ says that based on Cloudflare's benchmarks:
On average, Brotli at the maximal quality setting produces 1.19X smaller results than [gzip] at the maximal quality.
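To put that number in terms of file size, here is a small Python calculation. It assumes "1.19X smaller" means gzip size divided by brotli size equals 1.19; the helper name is made up.

```python
def size_fraction(ratio: float) -> float:
    # If gzip_size / brotli_size == ratio, brotli output is 1/ratio of gzip's.
    return 1 / ratio


fraction = size_fraction(1.19)
print(f"brotli output is about {fraction:.1%} of the gzip size")  # 84.0%
print(f"that is a saving of about {1 - fraction:.1%}")  # 16.0%
```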
also maybe file extensions are just the way to go for now :stuck_out_tongue:
Another potential security issue comes to mind (but might apply regardless): compression implementations often have security problems of their own, such as remote code execution. Does the decision of a pre-compression vs post-compression hash change our susceptibility to such attacks?
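One way to look at the difference: with a hash of the compressed bytes, the check can run before the decompressor sees any input. A minimal Python sketch, using SHA-256 and gzip purely as stand-ins (not a claim about which hash or format Roc uses); `verify_then_decompress` is a hypothetical name.

```python
import gzip
import hashlib


def verify_then_decompress(compressed: bytes, expected_hex: str) -> bytes:
    # Hash the bytes exactly as downloaded, before any decompression.
    actual_hex = hashlib.sha256(compressed).hexdigest()
    if actual_hex != expected_hex:
        # Tampered or corrupted input never reaches the decompressor.
        raise ValueError("hash mismatch, refusing to decompress")
    return gzip.decompress(compressed)


# Usage: the publisher computes the hash over the compressed archive.
blob = gzip.compress(b"package contents")
published_hash = hashlib.sha256(blob).hexdigest()
print(verify_then_decompress(blob, published_hash))  # b'package contents'
```

With a hash of the uncompressed contents, the order is necessarily reversed, so unverified bytes go through the decompressor first. Neither order helps if the publisher's own archive is malicious.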
Also, iiuc, zstd was designed with similar goals to brotli, but often has slightly better compression ratios, and perhaps wider adoption. It definitely has a magic byte sequence at the beginning.
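For reference, a zstd frame starts with the magic number 0xFD2FB528 stored little-endian, so the first four bytes are 0x28 0xB5 0x2F 0xFD. A small Python sketch of content sniffing that covers both gzip and zstd; the `MAGICS` table and `sniff` function are illustrative names only.

```python
from typing import Optional

# Leading signature bytes for formats that define one.
MAGICS = {
    "gzip": b"\x1f\x8b",
    "zstd": b"\x28\xb5\x2f\xfd",  # 0xFD2FB528 in little-endian byte order
}


def sniff(body: bytes) -> Optional[str]:
    """Return the format whose signature matches, or None."""
    for name, magic in MAGICS.items():
        if body.startswith(magic):
            return name
    return None


print(sniff(b"\x28\xb5\x2f\xfd\x00"))  # zstd
print(sniff(b"\x1f\x8b\x08"))  # gzip
```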
It is kinda interesting that brotli is targeted for web, but zstd is not really. I wonder if there is any meaningful difference because of that.
Last updated: Jun 16 2026 at 16:19 UTC