after reading the best article I've ever read on contemporary practical AI use, I'm sold on 100% test coverage being a good idea now (when I've never cared about it before) only because it's a dependable feedback loop for ai agents
the specific point the article made that I found compelling is that any percentage lower than 100 introduces a judgment call, which lets the agent be like "I've decided testing this case wasn't important so I skipped it" whereas with 100%, missing tests fail the build, and I can see in review if the tests are good or not
this makes me want to make roc check just automatically collect and show branch coverage statistics, with an optional --required-coverage flag that makes it emit a nonzero exit code if the coverage is too low
having it be a configurable threshold means that you can use it as a ratchet if you're working towards 100%, and also you can set it to 0 if you don't care (which I suppose we could use to make it opt-in rather than opt-out)
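A rough sketch of how the threshold idea above could behave, in Python for illustration only. Nothing here is roc's actual implementation: `coverage_exit_code` and its parameters are hypothetical names, and the only thing taken from the messages above is the proposal of a required-coverage threshold that turns into a nonzero exit code.

```python
def coverage_exit_code(covered_branches: int, total_branches: int,
                       required_percent: float) -> int:
    """Return 0 if branch coverage meets the threshold, 1 otherwise.

    Hypothetical helper: the name and signature are assumptions made
    for this sketch, not part of any real tool.
    """
    # Compare without dividing so that "no branches at all" counts as
    # meeting any threshold instead of raising ZeroDivisionError.
    meets = covered_branches * 100 >= required_percent * total_branches

    # The percentage is only computed for display.
    percent = 100.0 if total_branches == 0 else 100.0 * covered_branches / total_branches
    print(f"branch coverage: {percent:.1f}% (required: {required_percent:.1f}%)")

    return 0 if meets else 1
```

With a threshold of 100, a single uncovered branch fails the build. Raising the threshold a little at a time gives the ratchet, and a threshold of 0 never fails, which is the "don't care" setting.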
I'm not sure if 100% is what should always be strived for, but yeah, could make sense. Tooling could also help here (see e.g. the Wallaby.js references made here in the past): it shows the state of each test, and partially the coverage (where all branches run), in the gutter of the editor. I think they now also try to feed the test results via MCP (https://wallabyjs.com/docs/features/mcp/).
the thing that I found compelling about the article was "100% or nothing so the agent is compelled to test everything or else the build fails"
I've personally never found code coverage helpful in the past, but I buy that argument for "100% or don't bother" when it comes to agents specifically :smile:
The thing with 100% is that it likely depends on the use case. Speaking about a server with a defined API, 100% can be nice. But what does that mean in terms of visuals? What's 100% testing of a User Interface or graphics? At least there it might not be as clear cut, I assume. Is it finding some element in a DOM? Is it some sort of snapshot testing against different sizes of the interface? 🤷🏼♂️
it's an excellent question!
Also, 100% could mean a fairly long-running CI, or waiting after each line change, I assume, unless there's some caching involved that knows which parts of the code changed and which tests need to rerun.
I don't know the answer. It's possible that 100% coverage makes sense for some parts of the code base and not others.
Like I could see "these directories are 100% or else fail, and for these others we don't care"
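A minimal sketch of that per-directory idea, again in Python and purely illustrative. The directory names, file names, and the `REQUIRED`, `required_for`, and `failing_files` helpers are all made up for the example, and the per-file percentages are assumed to come from some coverage tool.

```python
from pathlib import PurePosixPath

# Hypothetical policy: directory -> required branch coverage percent.
REQUIRED = {"src/parser": 100.0, "src/cli": 0.0}


def required_for(path, rules, default=0.0):
    """Return the threshold of the deepest listed directory containing path."""
    file_path = PurePosixPath(path)
    best_depth, best_threshold = -1, default
    for directory, threshold in rules.items():
        dir_path = PurePosixPath(directory)
        # A rule applies only if the directory is an ancestor of the file.
        if dir_path in file_path.parents and len(dir_path.parts) > best_depth:
            best_depth, best_threshold = len(dir_path.parts), threshold
    return best_threshold


def failing_files(coverage, rules, default=0.0):
    """List files whose coverage percent is below their directory's threshold."""
    return sorted(
        path for path, percent in coverage.items()
        if percent < required_for(path, rules, default)
    )
```

One design choice worth noting: with `default=0.0`, code moved into an unlisted directory escapes the check entirely. Passing a strict default (say `100.0`) and listing only the exempt directories would close that gap.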
Yep, that could make sense. As long as the LLM then doesn't work around it, by putting it somewhere else :D
Richard Feldman said:
after reading the best article I've ever read on contemporary practical AI use
this part stood out the most to me besides the testing
Additionally, prefer many small well-scoped files.
It improves how context gets loaded. Agents often summarize or truncate large files when they pull them into their working set. Small files reduce that risk. If a file is short enough to be loaded in full, the model can keep the entire thing active in context.
I'm constantly running out of context heh.
relevant video on context management: https://www.youtube.com/watch?v=rmvDxxNubIg
oh, many agents can handle this fine - they just see the file is too big and read a subset of it instead
I find that often with big files they definitely hit blind spots currently. The file is too large, they only read part of it, they misunderstand patterns and build something less good. I think when they read full files they seem to understand context and code style better.
I speak mostly from Claude experience.
They are getting better at large files, but I definitely have seen many cases of it being worse for them.
I am currently working on a 100% agent coded side project where I am trying to use a lot of the tips from folks at work to see how it all feels. It started by getting nerd sniped by a friend the other day and chatting with perplexity about design for like a day. Then trying to aggregate that and design principles into a doc and starting to code with Claude from there. Lots of detailed prompts and guidance.
I think I may test 100% code coverage with that project to see how it goes. I also need to start using worktrees in that project. I've had cases of competing Claude instances on the same project, though generally they just plan in parallel and I only let one edit at a time.
I also feel that I need a better way to review the code. Just looking at local diffs and reviewing them is not the best. I probably should use some sort of PR-like flow so that I can give Claude better reviews and scrutinize the code more.
This is a rust project mostly due to dependencies (would have preferred zig though).
Brendan Hansknecht said:
This is a rust project mostly due to dependencies (would have preferred zig though).
I've been vibing dependencies away lately. I dunno if that makes my code more stable but I tell myself it does lol
this makes me want to make roc check just automatically collect and show branch coverage statistics
Does this mean roc check will be running the tests now? Maybe that was already decided and I missed it
A big limiter for the value of code coverage is that it doesn't say anything about whether your assertions are good. I constantly see tests that cover the code but have very limited assertions that don't actually check the behavior well. You could get 100% code coverage without a single assertion in your test suite, which is of course useless.
I know the intention is that the human will review the tests to see if they are actually useful, but given how mischievous agents are at times, I suspect they will still often play tricks like writing minimal assertions to get around subtle issues.
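To make the point about assertions concrete, here is a small contrived Python example (all names are invented for illustration). The first check executes every branch of a buggy function without asserting anything, so a coverage tool would report it as fully covered while the bug goes unnoticed.

```python
def clamp(value, low, high):
    """Limit value to the range [low, high]. Contains a deliberate bug."""
    if value < low:
        return low
    if value > high:
        return low  # bug: should return high
    return value


def check_without_assertions():
    # Runs every branch of clamp, so coverage is 100%, but nothing is verified.
    clamp(-5, 0, 10)
    clamp(50, 0, 10)
    clamp(5, 0, 10)


def check_with_assertions():
    # Same inputs and same coverage, but the results are actually checked.
    assert clamp(-5, 0, 10) == 0
    assert clamp(50, 0, 10) == 10  # fails because of the bug above
    assert clamp(5, 0, 10) == 5
```

Both checks produce identical coverage numbers. Only the second one fails, which is the part a coverage threshold can't see and a reviewer has to.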
Every untested change is now visible. Unknown gaps -> known gaps. Full accountability
I really disagree with this statement from the article. Even if your assertions are comprehensive, there are always going to be tons of untested cases even with 100% code coverage. For example, coverage systems usually only check if a line has been executed, but that single line could be a ternary expression where only one of the branches is ever executed by your tests. Some systems allow for branch-level coverage to track cases like ternary operators, but that still doesn't help much because you only test a limited set of inputs (e.g., your function has 100% code coverage, but on some untested inputs you get a division by zero error). This also doesn't say anything about whether you are using external libraries properly.
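Both gaps can be shown in a few lines of Python. The function names are invented for illustration, and how any particular coverage tool reports these cases is not something this sketch tries to reproduce.

```python
def parity(n):
    # One line, two branches: a test that only passes even numbers
    # executes this line without ever taking the "odd" branch.
    return "even" if n % 2 == 0 else "odd"


def ratio(a, b):
    # No branches at all, so any single call gives full line and
    # branch coverage, yet b == 0 still raises ZeroDivisionError.
    return a / b


def checks_with_full_line_coverage():
    assert parity(2) == "even"   # the "odd" branch is never exercised
    assert ratio(1, 2) == 0.5    # the b == 0 input is never tried
```

Line-level coverage can't tell that `parity` was only half exercised. Branch-level coverage would catch that, but neither kind says anything about `ratio(1, 0)`.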
I also worry that requiring 100% coverage in a project will make it much more difficult to contribute to without using AI, which isn't a sacrifice I want to make at this point
The question is:
Is AI with 100% code coverage enforcement better or worse than AI without it?
I would guess better, but that is just a guess.
I've been vibing dependencies away lately
My dependencies are deep learning related and very heavy. So not something to vibe away.
Isaac Van Doren said:
this makes me want to make roc check just automatically collect and show branch coverage statistics
Does this mean roc check will be running the tests now? Maybe that was already decided and I missed it
oops sorry, I meant to say roc test there!
Last updated: Jun 16 2026 at 16:19 UTC