Harness your AI assistant

Pepijn Krijnsen
5 min read

AI assistants are great at spitting out code at lightning speed. One reason this has not yet led to a massive increase in developer productivity is that reading someone else's code is not cheap: it takes only a little less attention and brainpower than writing the solution yourself.

AI assistants are better reference copilots than conversational pair partners today. Their interface is too slow -- typing is slower than speaking, and verbal communication with an LLM is sketchy at best -- and their execution is too fast. Instead I like to think of them as a powerful user interface to a massive reference manual.

  • They produce information based on natural language queries
  • They can combine information from multiple entries on the fly (perfect for instruction)
  • Their output typically requires interpretation and cannot be used as-is

Getting the most value out of AI assisted development comes down to one thing: discipline.

Why discipline beats speed

Discipline is a combination of two things: you need a set of guidelines, and you need the capability to adhere to those guidelines. Capability encapsulates your technical ability, your intrinsic motivation, and your confidence that your guidelines are the right ones.

The discipline that lets me work effectively with AI assistance is canon TDD.

The initial step in TDD, given a system & a desired change in behavior, is to list all the expected variants in the new behavior. "There's the basic case & then what if this service times out & what if the key isn’t in the database yet &..." This is analysis, but behavioral analysis. (Kent Beck, Canon TDD)

This list gives your AI assistant essential context. It can suggest things you missed based on this list, and you can quickly verify its suggestions. As long as the model has the test list plus the relevant test & prod code for the feature you're changing, it has all the context it needs.

A short kata with AI

I worked through the bowling game kata following the canon TDD steps, using Claude Sonnet 3.7 as my assistant. In this kata you're creating a Game class with two methods roll(pins_hit: int) -> None and score() -> int based on the rules of ten-pin bowling.

Getting started: the test list

Some basic behaviours that I need were easy to write:

- [ ] A player's score is the sum of all normal rolls (no spares/strikes)
- [ ] A player cannot roll after a game finishes

The rules about bonus points after a spare or strike are tricky, so I gave the assistant a very specific prompt asking for test list entries on exactly those topics.

- [ ] After a spare, the player's next roll should count double
- [ ] After a strike, the frame should end and the player's next two rolls should count double

The assistant came up with the "should score double" logic all by itself. According to the rules of 10-pin bowling, the bonus points after a spare or strike should be added to the frame in which the spare or strike was made; but since we're only keeping track of the game score, not the score per frame, simply counting the bonus rolls double fulfils the requirement.
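The "count double" idea can be sketched roughly like this. This is a hypothetical illustration of the scoring trick only, not my actual implementation from the kata: it handles spares but ignores strikes and the ten-frame limit, and all the names are made up.

```python
# Hypothetical sketch: after a spare, the next roll counts double.
# Spares only -- strikes and the end of the game are not handled here.
class Game:
    def __init__(self):
        self._total = 0
        self._frame_rolls = []   # rolls in the current frame
        self._pending_bonus = 0  # how many upcoming rolls count double

    def roll(self, pins_hit: int) -> None:
        multiplier = 2 if self._pending_bonus > 0 else 1
        self._pending_bonus = max(0, self._pending_bonus - 1)
        self._total += pins_hit * multiplier

        self._frame_rolls.append(pins_hit)
        if len(self._frame_rolls) == 2:
            if sum(self._frame_rolls) == 10:  # spare: next roll counts double
                self._pending_bonus = 1
            self._frame_rolls = []

    def score(self) -> int:
        return self._total
```

Rolling 6 and 4 (a spare) followed by 5 and 0 gives 20: the 5 is added twice, once as a normal roll and once as the spare bonus, which is exactly what the official rules require for the spare's frame.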

commit eb696a6

Getting into it: the first test

The first behaviour on my test list is fundamental and easy to implement, so I start there. The order in which you think of behaviours is not necessarily the best order to write tests in, but in my experience the first behaviour you think of is almost always a suitable first test.

# test_bowling.py -- imports omitted
def test_after_two_normal_rolls_score_is_sum_of_rolls():
    game = bowling.Game()
    game.roll(3)
    game.roll(5)
    assert game.score() == 8

# bowling.py
class Game:
    def __init__(self):
        self._score = 0

    def roll(self, pins_hit: int) -> None:
        self._score += pins_hit

    def score(self) -> int:
        return self._score

Both the test and the prod code are so straightforward that I quickly bash this out without considering asking the assistant. Saving a few seconds here by asking the AI to write test & code for the first case has no real benefit. There's also nothing to refactor here -- newing up a Game instance in the body of the test will probably become obsolete very soon, but right now it's fine.

I update the test list and move on.

commit 841fba0

Keeping it rolling

Going back to my test list, I try to implement the bonus logic: what is the score after a spare plus a normal roll? Turns out this is way harder than I thought. Instead I implement "After ten frames the game should end", changing the naive implementations of both roll and score in a way that passes all tests.

commit d441a2b
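The naive "ten frames and the game ends" version could look something like the sketch below. This is an assumption on my part about the shape of that commit, not its actual contents: it pretends every frame takes exactly two rolls, because spares and strikes are still unhandled at this point.

```python
# Hypothetical sketch: the game ends after ten frames, assuming every
# frame takes exactly two rolls (no spare or strike handling yet).
class Game:
    ROLLS_PER_FRAME = 2
    FRAMES_PER_GAME = 10

    def __init__(self):
        self._rolls = []

    def roll(self, pins_hit: int) -> None:
        if self.is_finished():
            raise ValueError("cannot roll after the game has finished")
        self._rolls.append(pins_hit)

    def score(self) -> int:
        return sum(self._rolls)

    def is_finished(self) -> bool:
        return len(self._rolls) >= self.ROLLS_PER_FRAME * self.FRAMES_PER_GAME
```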

For the next change I accidentally do a little too much. I realise that in order to correctly check whether a game is finished, I need the concept of a frame. I also need this to handle the case of throwing a strike. I should have realised I was trying to do too much and thought of a smaller behaviour to test first. Instead I ended up struggling for thirty minutes before getting things working and looking good.

commit 4e05a34
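The core of the frame concept is that a frame is over after two rolls, or after a single roll that knocks down all ten pins. A minimal sketch of such a helper, under my own naming assumptions rather than the commit's actual code:

```python
# Hypothetical Frame helper: complete after two rolls, or after a
# single roll that knocks down all ten pins (a strike).
class Frame:
    def __init__(self):
        self.rolls = []

    def add(self, pins_hit: int) -> None:
        self.rolls.append(pins_hit)

    def is_complete(self) -> bool:
        return len(self.rolls) == 2 or sum(self.rolls) == 10
```

Once a concept like this exists, both "is the game finished?" and "did the player just throw a strike?" become questions you ask a frame rather than arithmetic scattered through roll and score.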

Use AI for leverage

Using AI assistants effectively is about harnessing their potential. Yes, they can do much more much faster than any human. That doesn't mean that they should. AI accelerates work within constraints. Without constraints it amplifies waste.

When used effectively, AI can give incredible benefits.

  • Reduced cognitive load: longer, steadier focus
  • Greater speed of execution: more done in the same time
  • Increased quality of implementation: making changes is easier

As long as humans are an essential part of the software development lifecycle, the goal should never be to let AI take over generation of all code. Doing that will only make developers less effective at doing the things that AI cannot do autonomously.
