automatic code coverage · ideas · Zulip Chat Archive

Stream: ideas

Topic: automatic code coverage

Richard Feldman (Dec 31 2025 at 16:51):

after reading the best article I've ever read on contemporary practical AI use, I'm sold on 100% test coverage being a good idea now (when I've never cared about it before) only because it's a dependable feedback loop for AI agents

Richard Feldman (Dec 31 2025 at 16:52):

the specific point the article made that I found compelling is that any percentage lower than 100 introduces a judgment call, which lets the agent be like "I've decided testing this case wasn't important so I skipped it" whereas with 100%, missing tests fail the build, and I can see in review if the tests are good or not

Richard Feldman (Dec 31 2025 at 16:53):

this makes me want to make roc check just automatically collect and show branch coverage statistics, with an optional --required-coverage flag that makes it emit a nonzero exit code if the coverage is too low

Richard Feldman (Dec 31 2025 at 16:55):

having it be a configurable threshold means that you can use it as a ratchet if you're working towards 100%, and also you can set it to 0 if you don't care (which I suppose we could use to make it opt-in rather than opt-out)
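A minimal sketch of the threshold idea above, assuming the test runner already knows how many branches exist and how many were hit. The function names and the `required` parameter are hypothetical and only mirror the proposed `--required-coverage` flag; nothing here is an existing Roc API.

```python
def coverage_percent(covered: int, total: int) -> float:
    """Percentage of branches exercised; a project with no branches counts as 100%."""
    if total == 0:
        return 100.0
    return 100.0 * covered / total


def exit_code(covered: int, total: int, required: float = 0.0) -> int:
    """Return 0 when coverage meets the required threshold, 1 otherwise.

    required=0.0 never fails (the "don't care" setting); raising it step by
    step over time is the ratchet.
    """
    return 0 if coverage_percent(covered, total) >= required else 1
```

With 47 of 50 branches hit, `exit_code(47, 50, required=90.0)` passes while `exit_code(47, 50, required=100.0)` fails the build.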

Tobias Steckenborn (Dec 31 2025 at 16:58):

I'm not sure if 100% is what should always be strived for, but yeah, could make sense. Tooling could help here too (see e.g. the Wallaby.js references made here in the past): it shows the state of the tests in the gutter of the editor, and partially the coverage (where all branches run). I think they now also try to feed the test results via an MCP (https://wallabyjs.com/docs/features/mcp/).

Richard Feldman (Dec 31 2025 at 17:01):

the thing that I found compelling about the article was "100% or nothing so the agent is compelled to test everything or else the build fails"

Richard Feldman (Dec 31 2025 at 17:02):

I've personally never found code coverage helpful in the past, but I buy that argument for "100% or don't bother" when it comes to agents specifically :smile:

Tobias Steckenborn (Dec 31 2025 at 17:03):

The thing with 100% is that it likely depends on the use case. Speaking about a server with a defined API, 100% can be nice. But what does that mean in terms of visuals? What's 100% testing of a user interface or graphics? At least there it might not be as clear-cut, I assume. Is it finding some element in a DOM? Is it some sort of snapshot testing against different sizes of the interface? 🤷🏼‍♂️

Richard Feldman (Dec 31 2025 at 17:05):

it's an excellent question!

Tobias Steckenborn (Dec 31 2025 at 17:05):

Also, 100% could mean quite long-running CI, or waiting after each line change, I assume, if there's no caching involved that somehow knows which parts of the code changed and which tests need to rerun.

Richard Feldman (Dec 31 2025 at 17:06):

I don't know the answer. It's possible that 100% coverage makes sense for some parts of the code base and not others.

Like I could see "these directories are 100% or else fail, and for these others we don't care"
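One way the per-directory idea could look, as a rough sketch only. The `REQUIRED` policy table, the directory names, and both function names are made up for illustration, and the per-file coverage numbers are assumed to come from whatever tool collects them.

```python
from pathlib import PurePosixPath

# Hypothetical policy: directory prefix -> required branch coverage in percent.
REQUIRED = {"src/parser": 100.0, "src/ui": 0.0}


def required_for(path: str, policy: dict[str, float], default: float = 0.0) -> float:
    """Pick the threshold of the longest directory prefix that contains the file."""
    parts = PurePosixPath(path).parts
    best_len, best = -1, default
    for prefix, threshold in policy.items():
        prefix_parts = PurePosixPath(prefix).parts
        if parts[: len(prefix_parts)] == prefix_parts and len(prefix_parts) > best_len:
            best_len, best = len(prefix_parts), threshold
    return best


def failing_files(coverage: dict[str, float], policy: dict[str, float]) -> list[str]:
    """Files whose measured coverage is below the threshold for their directory."""
    return sorted(f for f, pct in coverage.items() if pct < required_for(f, policy))
```

In this sketch, files outside every listed directory fall back to `default`, so a default of 0 means "we don't care" and a default of 100 makes unlisted directories strict.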

Tobias Steckenborn (Dec 31 2025 at 17:06):

Yep, that could make sense. As long as the LLM then doesn't work around it, by putting it somewhere else :D

nandi (Dec 31 2025 at 19:42):

Richard Feldman said:

after reading the best article I've ever read on contemporary practical AI use

this part stood out the most to me besides the testing

Additionally, prefer many small well-scoped files.

It improves how context gets loaded. Agents often summarize or truncate large files when they pull them into their working set. Small files reduce that risk. If a file is short enough to be loaded in full, the model can keep the entire thing active in context.

I'm constantly running out of context heh.

nandi (Dec 31 2025 at 19:44):

relevant video on context management: https://www.youtube.com/watch?v=rmvDxxNubIg

Richard Feldman (Dec 31 2025 at 19:47):

oh, many agents can handle this fine - they just see the file is too big and read a subset of it instead

Brendan Hansknecht (Dec 31 2025 at 22:07):

I find that often with big files they definitely hit blind spots currently. The file is too large, they only read part of it, they misunderstand patterns and build something less good. I think when they read full files they seem to understand context and code style better.

Brendan Hansknecht (Dec 31 2025 at 22:07):

I speak mostly from Claude experience.

Brendan Hansknecht (Dec 31 2025 at 22:08):

They are getting better at large files, but I definitely have seen many cases of it being worse for them.

Brendan Hansknecht (Jan 01 2026 at 17:40):

I am currently working on a 100% agent-coded side project where I am trying to use a lot of the tips from folks at work to see how it all feels. It started by getting nerd sniped by a friend the other day and chatting with Perplexity about design for like a day. Then trying to aggregate that and design principles into a doc and starting to code with Claude from there. Lots of detailed prompts and guidance.

I think I may test 100% code coverage with that project to see how it goes. I also need to start using worktrees in that project. Have had cases of competing Claude instances on the same project. Though generally they just plan in parallel and I only let one edit at a time.

I also feel that I need a better way to review the code. Just looking at local diffs and reviewing them is not the best. I probably should use some sort of PR-like flow so that I can give Claude better reviews and scrutinize the code more.

This is a rust project mostly due to dependencies (would have preferred zig though).

nandi (Jan 01 2026 at 18:10):

Brendan Hansknecht said:

This is a rust project mostly due to dependencies (would have preferred zig though).

I've been vibing dependencies away lately. I dunno if that makes my code more stable but I tell myself it does lol

Isaac Van Doren (Jan 01 2026 at 19:56):

this makes me want to make roc check just automatically collect and show branch coverage statistics

Does this mean roc check will be running the tests now? Maybe that was already decided and I missed it

Isaac Van Doren (Jan 01 2026 at 20:03):

A big limiter for the value of code coverage is that it doesn't say anything about whether your assertions are good. I constantly see tests that cover the code but have very limited assertions that don't actually check the behavior well. You could get 100% code coverage without a single assertion in your test suite, which is of course useless.

I know the intention is that the human will review the tests to see if they are actually useful, but given how mischievous agents are at times, I suspect they will still often play tricks like writing minimal assertions to get around subtle issues.

Isaac Van Doren (Jan 01 2026 at 20:14):

Every untested change is now visible. Unknown gaps -> known gaps. Full accountability

I really disagree with this statement from the article. Even if your assertions are comprehensive, there are always going to be tons of untested cases even with 100% code coverage. For example, coverage systems usually only check if a line has been executed, but that single line could be a ternary expression where only one of the branches is ever executed by your tests. Some systems allow for branch-level coverage to track cases like ternary operators, but that still doesn't help much because you only test a limited set of inputs (e.g., your function has 100% code coverage, but on some untested inputs you get a division by zero error). This also doesn't say anything about whether you are using external libraries properly.
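A tiny illustration of both gaps described above, with made-up functions: a one-line conditional that line coverage counts as covered after only one branch runs, and a branch-free function that is fully covered yet still crashes on an input the tests never try.

```python
def label(n: int) -> str:
    # One line, two branches: line coverage marks it covered after either one runs.
    return "even" if n % 2 == 0 else "odd"


def average(values: list[float]) -> float:
    # No branches at all, so a single call gives 100% line and branch coverage.
    return sum(values) / len(values)


def test_label() -> None:
    assert label(2) == "even"  # the "odd" branch never runs


def test_average() -> None:
    assert average([1.0, 2.0, 3.0]) == 2.0  # the empty list is never tried
```

Both tests pass and every line is executed, but `average([])` still raises `ZeroDivisionError`.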

Isaac Van Doren (Jan 01 2026 at 20:19):

I also worry that requiring 100% coverage in a project will make it much more difficult to contribute to without using AI, which isn't a sacrifice I want to make at this point

Brendan Hansknecht (Jan 01 2026 at 20:52):

The question is:

Is AI with 100% code coverage enforcement better or worse than AI without it?

Brendan Hansknecht (Jan 01 2026 at 20:52):

I would guess better, but that is just a guess.

Brendan Hansknecht (Jan 01 2026 at 20:53):

I've been vibing dependencies away lately

My dependencies are deep learning related and very heavy. So not something to vibe away.

Richard Feldman (Jan 01 2026 at 22:34):

Isaac Van Doren said:

this makes me want to make roc check just automatically collect and show branch coverage statistics

Does this mean roc check will be running the tests now? Maybe that was already decided and I missed it

oops sorry, I meant to say roc test there!

Last updated: Jul 23 2026 at 13:15 UTC