AI and Quality

I asked for, and got, some suggestions on what would be good AI-related blog posts. My head's still fuzzy on what to post about, so I hoped this would get something going.
It hasn't, really. Nothing I think will be significant. There was one, though: quality.

How can AI improve software quality? Especially within disciplined practices like TDD, XP, and CD.

Which... I've got thoughts, pre-dating the AI bullshit. As you'll read below, AI doesn't change the approach.

AI doesn't improve software quality. I don't think it can. I don't even think it's neutral; I think AI is an active threat to quality code.
AI cannot produce quality code. Many developers cannot produce quality code. I get lazy and don't produce the quality I want to see.

How do we protect code quality? I don't care about the source.
How can we protect code quality when the amount of code generated increases? AI or a horde of low-skilled developers - same problem. Crap code is being pumped out; how do we protect the code quality from it?

My answer is currently tooling and reviews. I'll get to reviews in a bit.

I've implemented built-in tools: warnings as errors; tests fail if coverage isn't at 80%. I'm putting conditions in place that DRIVE the code towards the implementation quality that I want. You can see most of what I do [here](https://github.com/Fyzxs/MtgDiscoveryClaude/tree/main/csharp/src) - particularly the .editorConfigs and Directory.Build.props. You may notice I said 'implementation quality', not code quality. I have high doubts that we'll see AI able to hit the areas where quality matters most - Architecture and Design.
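To make that concrete, here's a minimal sketch of the kind of Directory.Build.props gate I mean. The exact settings in the linked repo may differ; `TreatWarningsAsErrors` and a coverlet coverage threshold are the idea:

```xml
<!-- Directory.Build.props: MSBuild applies this to every project under the folder -->
<Project>
  <PropertyGroup>
    <!-- Any compiler warning fails the build -->
    <TreatWarningsAsErrors>true</TreatWarningsAsErrors>
    <!-- Run the .NET analyzers at a strict level, and apply .editorconfig style rules at build time -->
    <AnalysisLevel>latest-recommended</AnalysisLevel>
    <EnforceCodeStyleInBuild>true</EnforceCodeStyleInBuild>
  </PropertyGroup>
  <PropertyGroup>
    <!-- coverlet.msbuild: `dotnet test /p:CollectCoverage=true` fails below 80% line coverage -->
    <Threshold>80</Threshold>
    <ThresholdType>line</ThresholdType>
  </PropertyGroup>
</Project>
```

One file at the root, and every project inherits the bar; nobody gets to opt a project out quietly.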
I haven't figured out how to get AI to generate the architecture I have in mind. Maybe I suck at prompting... buuuuutttttt... I give it examples. Literal fully implemented examples of architecture and implementation - and it fucks it up. Most software is not quality code. The LLM's predictions are not based on high quality code.

All that said - I clearly don't care about AI in regards to quality. I think it's fundamentally the wrong question. I want to know how to put enforcement into place that requires the code to be written in a way that's not, fundamentally, going to annoy the fuck out of me. My own quick code included.

I don't have 3rd party stuff integrated into the projects yet; just .editorConfig and MSBuild. It's 90% of where I want it. There's still a shit ton of BAD code generated - but when it fails to build, the AI fixes it. And it keeps fixing it until it passes as high a bar as I can currently produce.
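For flavor, a fragment of the kind of .editorconfig escalation I mean - the specific rule IDs here are illustrative examples of mine, not a copy of the repo's config:

```ini
# .editorconfig - with EnforceCodeStyleInBuild, these severities break the build
root = true

[*.cs]
# 'var' usage must match the stated preference, or the build fails
csharp_style_var_elsewhere = false:error
# unused parameters are an error, not a suggestion
dotnet_code_quality_unused_parameters = all:error
# naming-convention violations (IDE1006) break the build
dotnet_diagnostic.IDE1006.severity = error
```

The point isn't these particular rules; it's that every style opinion gets a severity of `error` so the AI's fix-until-it-builds loop does the nagging for me.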

The final code is a split verdict. Implementation: written almost exactly like I want to see code written. Architecture and design: absolute shit. I provide architectural and design IMPLEMENTATIONS and it gets them wrong.

I treat this like I would any FOUNDATION of a project - we're going to spend as much time on it as we need to get it right. Every design shortcut accepted, every architecture decision deferred, compounds the pain. It makes the project take longer because small things take longer.

Once the foundations are correct; I don't have to worry about them. They are good. I just have to worry about the next layer up. Ports and Adapters!
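To show what that layer boundary buys - a hypothetical minimal sketch (the names are mine, not from the repo): the domain owns the port; the adapter lives at the edge and wraps the messy external detail.

```csharp
// Port: owned by the domain layer. The domain only ever sees this interface.
public interface ICardSource
{
    Task<string> CardNameAsync(string cardId);
}

// Adapter: lives at the edge, wraps the external detail (HTTP here).
// Swapping the data source means a new adapter; the domain never changes.
public sealed class HttpCardSource : ICardSource
{
    private readonly HttpClient _client;

    public HttpCardSource(HttpClient client) => _client = client;

    public async Task<string> CardNameAsync(string cardId) =>
        await _client.GetStringAsync($"/cards/{cardId}/name");
}
```

Once that shape is in the foundation, the "next layer up" only ever talks to ports - which is exactly why I can stop worrying about the layers below.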

The reviews... take time. Take iterations.
The 'implementation quality' I mentioned before has the code in a state that doesn't piss me off so I can effectively review the structure. I've rejected PRs because the code wasn't "ReSharper Green". If I can't see the forest because I have to check every tree for rot - I'm getting a bit pissy.
The tooling gets the code into a state that I can see the architecture from the classes.

I did my Magic card site via Claude Code. Barely wrote a line of it, but I damn sure paid attention to the implementation, code, and architectural quality. I gave A LOT of corrections when reviewing the code. The foundations need to be there. And despite examples, prompts, guides - it still fucks it up. Normally the issue is fairly contained; my MicroObjects style creates a very small blast radius for incorrect implementations. This allows for quick fixes later.

But I spend almost as much time during the creation and review process as it would take me to write it myself. As I become more proficient at constraining the LLM, I think that's shifting a bit in the LLM's favor. I've got a new project I'm doing the same on... we'll see how I feel after the foundational work...

I'm not the first to say it; but the engineers that produce high quality code can get high quality code out of LLMs; by virtue of understanding what high quality code IS. Engineers that can't - will never be able to.

Which, honestly, in the industry, quality doesn't matter to the ones making the decisions, functionality does. AI gets you functionality fast... and then you plateau... and have issues... and unmanageable code.
If the code produced doesn't look like a human wrote it, a human will struggle to work in it. ... and I've worked in code I know a human wrote that didn't look like a human wrote it.

We need tooling to enforce practices. I don't know how I'd like to enforce architecture. I LOATHE the idea of building the 'architectural' tests that enforce the architecture patterns... even when the architecture is very aligned with how those function. I see them as friction when needing to change the architecture. We don't have "The Right (TM)" architecture. We strive for the best one we know works at this time and place. If you put tests around everything that dictates they "BE THIS WAY (TM)" - then change is hard. The system doesn't grow; it starts to decay.
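For the record, this is the kind of architectural test I'm loathing - a sketch using the NetArchTest library, with hypothetical layer names of my own; note how it hard-codes "BE THIS WAY (TM)" into a unit test:

```csharp
// NetArchTest.Rules: freezes a dependency direction inside a unit test.
using NetArchTest.Rules;
using Xunit;

public sealed class ArchitectureTests
{
    [Fact]
    public void Domain_Does_Not_Depend_On_Adapters()
    {
        var result = Types.InAssembly(typeof(SomeDomainType).Assembly)
            .That().ResideInNamespace("MyApp.Domain")          // hypothetical namespace
            .ShouldNot().HaveDependencyOn("MyApp.Adapters")    // hypothetical namespace
            .GetResult();

        Assert.True(result.IsSuccessful);
    }
}
```

It works - and that's the problem. The day the architecture needs to move, every one of these tests is a wall you built yourself.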

Even with architecture enforcement - we need the human to drive the architecture decisions. I've had really good use of Claude Code to engage in architectural discovery - its ideas are almost always wrong, but it will present things that give me an idea that gets me towards the better architecture.
If we want to enforce architecture - who's saying what that architecture is? The LLM is gonna be SO wrong. Even if it predicts the right words - it doesn't understand them; it can't use the predicted architectural description to produce code that aligns to it.

I think this is a good point to say that I'm pretty damn vocal against the hype of AI to write code. I think "AI can do it" is a bullshit position; it's a lie. If you believe it can, you don't understand software engineering.
That said, the same thing applies to a Jr Developer; they can't either. The same oversight and correction needs to occur. The difference is that the LLM has no capability to learn.

The short answer is that AI tools will amplify the output at the quality level the developer cares about. We tended to mitigate this with team structure and size. Cheap devs started to tip the scales into a larger volume of shit code; AI tips it even further by being able to produce more.

I think we can get enough constraints on the code through tooling and AI (like AI reviews) that the implementation quality is high. Which leaves the skilled engineers to focus on architectural aspects; and... by the gods, we need to train the jrs HARD in what makes good architecture. The form is not enough; understanding the why is the only way to avoid becoming techno-priests.

Practical Examples - the prompt asked for practical examples or lessons from real world usage.

Real World Lesson - AI sucks. It's dumb as the rocks it's made out of. You have to work it so much to get it into the shape you want; you have to KNOW what shape you want to recognize when it's not there.

The biggest lesson I learned - Do not fit your workflow to AI; fit AI to your workflow.
I tried a few times to do the 'soup-to-nuts' approach with AI... fucking sucked every single time. It was early in my AI efforts. I saw that it was CAPABLE of something useful... but it's not ready to go.
So I stopped that. I switched to the flow I used, but instead of me typing the code I was thinking of; I had the AI generate it... then a lot of "no, this way" cycles to get it right.
I use Claude Code because it requires the least cycles of correction.
As my comfort and understanding of how the AI wanted to behave grew, so did the space I gave it. Instead of a method, a class. Then a module, then a project, then multiple proje.... oh dear gawd what the hell is THAT?! ... Claude still doesn't get to go beyond project level independently. This is as far as I can trust it. It's as far as it's currently CAPABLE of doing correctly.

As I mentioned, or hinted at: I have a very strictly defined architecture across 7 layers. Each layer has its distinct flavor of a common architecture. I have a "reference implementation" at each layer. With all of that - it's still, consistently, done wrong. I can't trust it beyond a single layer (or a single + a simple). The context is unable to produce the correct results for multiple different layers. Which... yeah... fundamentally, I'm asking it to use the same process to predict the missing number in both patterns: 1, 2, 3, ?, 5, 6, 8 and 13, 21, 34, ?, 89, 144. I consider it impossible. Once contexts are bigger, or I do some RAG work locally - MAYBE. I've already had to curate a lot of context and memory files to get it capable of a single layer; and it will still consistently miss expected things.

Lesson: Use what you know to work faster in spaces you don't. I don't do front end. Can I? Yes. It's just not where I like to be. I've built UIs for apps and sites. But I can guide an LLM on building a front end into a good architecture w/o being GOOD at the tech stack.
Would someone good at react be able to make it better than I could with the LLM? Without a doubt. I think what I have is pretty good; in every way... except the react tech. That's probably gonna make some dev's head explode from whatever the LLM generated.
I had to guide the LLM at times to revert, or take a different approach, because I understand enough of the space to do that. I can do more faster with the LLM in spaces I don't know well than I could alone. Absolutely. My next project is, essentially, an experiment in that.

The earlier linked MtgDiscoveryClaude repo is my not-quite-a-product; Claude Code implementation.

A couple other points were in the prompt; TDD and XP.
XP is mostly social practices. There are technical practices as well... which are almost all included in my MicroObjects practices. Which are mostly followed by the LLM output. It also makes it easy for me to see what's not right, because there are very clear expectations defined. It TRIES... it just doesn't always.

TDD - I don't TDD with the AI. By the time I trusted Claude Code to write a test, I trusted it to do a module.

Why do we do TDD? Primarily to force good practices into the code. Fast feedback is there; but that helps us to see when we've lost the good practices. It's there to give us a safety net when we refactor. TDD is a guide and a safety net for getting and maintaining the code according to the practices we want to use.
Here's the part I don't share TOO openly... my style makes it REALLY hard to not have those things in the code. TDD isn't needed as the guide. Tests existing are definitely still valuable for the safety net - but when branching is minimal and methods are short, the test-after is still EASILY exhaustive of the functionality. Also... with MicroObjects, when doing it manually, you don't refactor. You want new behavior, you write a new object. So... refactoring... changes meaning.
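A toy illustration of "new behavior, new object" - my own hypothetical example, not repo code. Instead of editing a price class to add tax, you wrap it:

```csharp
public interface IPrice
{
    decimal Amount();
}

public sealed class BasePrice : IPrice
{
    private readonly decimal _amount;
    public BasePrice(decimal amount) => _amount = amount;
    public decimal Amount() => _amount;
}

// New behavior: a new object wrapping the old one. BasePrice never changes,
// so its existing tests never change either.
public sealed class TaxedPrice : IPrice
{
    private readonly IPrice _origin;
    public TaxedPrice(IPrice origin) => _origin = origin;
    public decimal Amount() => _origin.Amount() * 1.10m; // assumed 10% tax, for the example
}
```

Nothing existing got modified, so there's nothing to "refactor" - which is the sense in which refactoring changes meaning in this style.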
At work: yes, TDD - but I don't have reliable access to an AI agent that can write a test.

I skip TDD on personal projects; because I'm focused on other things. One of my earliest posts on this site, possibly the first, is about doing a simple android logger w/o TDD and with TDD. TDD made it better. Fundamentally. Much like pairing and mobbing have ALWAYS improved the output; TDD also always makes it better.
I use the idea of limiting learning efforts to a primary and secondary focus. Personal projects are for learning; and I'm not usually using them to learn TDD.

... In summary: AI will be a detriment to software quality unless we constrain, and review, the hell out of it, while enforcing good architectural decisions.
