The Queen is Dead. Long Live My Chess Bot.

GallantGargoyle
4 min read

I’ve had to explain this many times.

When I say, “I enjoy playing chess,” it does not mean I’m good at it.

To tell the truth, sometimes chess frustrates me more than anything else. It’s especially maddening when I’m in a completely winning position… and blunder mate-in-two. (Grr.)

It was during one such catastrophe — I’d just lost my queen to a particularly nasty skewer — that it struck me.

I had to get back at the people who’d beaten me.

And I don’t mean challenging them to a rematch and winning. I needed to steamroll them. Wipe the floor with them. You get the picture.

Wait… so you think the way to get back is to make a chess bot?

Yes, dear reader, I can hear your thoughts. Also, to answer your question — yes.

Why spend time learning how to play chess, when I can ask someone (something) else to play for me?

cue maniacal laughter

Ahem. Back to the topic at hand.

The Legend of AlphaZero

The idea that I should build a chess bot of my own came years ago, when AlphaZero shocked the chess world in 2017 by defeating Stockfish — then the strongest engine in existence — in a 100-game match. (Read more here.)

For those unfamiliar:

  • Stockfish was a brute-force search machine with a handcrafted evaluation function, backed by massive databases of openings and endgames

  • AlphaZero was taught only the rules of chess

  • Using deep reinforcement learning and self-play (millions of games), AlphaZero taught itself how to play

  • The result? After four hours of training, it played 100 games against Stockfish… and didn’t lose a single one (28 wins, 72 draws, 0 losses).

I suppose an analogy is in order. Quoting FM MikeKlein (the author of the aforementioned article):

This would be akin to a robot being given access to thousands of metal bits and parts, but no knowledge of a combustion engine, then it experiments numerous times with every combination possible until it builds a Ferrari.

Now, after a year of coding “Hello, World!” in ten different programming languages, I felt confident enough to tackle a project of this size.

Stage 1: Random Nonsense

I decided to start from the basics. Create a chess bot that can play random (legal) moves.

Simple enough — I used the excellent python-chess library for board generation and move validation.
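For the curious, the whole Stage 1 bot is only a few lines. Here's a minimal sketch of the idea using python-chess (the ten-move demo loop is just for illustration):

```python
import random

import chess  # pip install python-chess

def random_move(board: chess.Board) -> chess.Move:
    # python-chess generates the legal moves for us; we just pick one
    return random.choice(list(board.legal_moves))

board = chess.Board()
for _ in range(10):  # play ten random half-moves as a demo
    if board.is_game_over():
        break
    board.push(random_move(board))
print(board.fen())
```

All the hard parts (move generation, legality, check detection) live inside the library; the "bot" is literally one call to `random.choice`.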

It played terribly.

Naturally, I was proud.

Stage 2: Minimax

Next, I leveled up to the minimax algorithm — a classic decision-making algorithm used not only in chess, but in many two-player, turn-based games.

Here’s the basic idea:

  • You try to maximize your position (make the best move for yourself)

  • Your opponent tries to minimize it (make the best move for themself, which is the worst move for you)

So the bot evaluates all legal moves, assumes the opponent will respond optimally, and picks the move that leads to the best outcome — given that worst-case response from the opponent.
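In code, that looks something like this. Below is a minimal minimax sketch over python-chess, with a bare-bones material count standing in for a real evaluation function (more on that in a moment):

```python
import chess  # pip install python-chess

def evaluate(board: chess.Board) -> float:
    # Stand-in heuristic: material balance from White's perspective
    values = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
              chess.ROOK: 5, chess.QUEEN: 9}
    return sum(v * (len(board.pieces(p, chess.WHITE))
                    - len(board.pieces(p, chess.BLACK)))
               for p, v in values.items())

def minimax(board: chess.Board, depth: int) -> float:
    if depth == 0 or board.is_game_over():
        return evaluate(board)
    # White maximizes the score, Black minimizes it
    best = -float("inf") if board.turn == chess.WHITE else float("inf")
    pick = max if board.turn == chess.WHITE else min
    for move in board.legal_moves:
        board.push(move)                    # try the move...
        best = pick(best, minimax(board, depth - 1))
        board.pop()                         # ...then take it back
    return best

def best_move(board: chess.Board, depth: int = 2) -> chess.Move:
    # White wants the highest minimax value, Black the lowest
    sign = 1 if board.turn == chess.WHITE else -1

    def score(move: chess.Move) -> float:
        board.push(move)
        value = minimax(board, depth - 1)
        board.pop()
        return sign * value

    return max(board.legal_moves, key=score)
```

Note the push/pop pattern: rather than copying the board for every branch, we make a move, recurse, and undo it on the way back up.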

How does the bot know what moves are good and what moves are bad?

Excellent question. That’s what we have heuristic evaluation functions for.

I started with a simple one that took into account:

  • Material value on the board

  • Avoiding getting checkmated

  • Controlling the center

Very crude, but it served my purposes.
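Roughly, such a heuristic looks like this (a sketch; the exact weights here are illustrative, not carefully tuned):

```python
import chess  # pip install python-chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9}
CENTER = [chess.D4, chess.D5, chess.E4, chess.E5]

def evaluate(board: chess.Board) -> float:
    """Score a position from White's perspective (positive = good for White)."""
    # Getting checkmated dominates everything else
    if board.is_checkmate():
        return -1000.0 if board.turn == chess.WHITE else 1000.0

    score = 0.0
    # 1. Material balance
    for piece_type, value in PIECE_VALUES.items():
        score += value * len(board.pieces(piece_type, chess.WHITE))
        score -= value * len(board.pieces(piece_type, chess.BLACK))
    # 2. Small bonus for occupying the four central squares
    for square in CENTER:
        piece = board.piece_at(square)
        if piece is not None:
            score += 0.1 if piece.color == chess.WHITE else -0.1
    return score
```

Plug a function like this into minimax and the bot suddenly has opinions: it grabs material, avoids mate, and drifts toward the center.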

Stage 2.5: Alpha-Beta Pruning

Minimax is very powerful — in fact, it’s possible to make an unbeatable tic-tac-toe bot with it, by playing through all possible positions (often called states).

But chess?

Chess has approximately 10^120 possible games (the famous Shannon number), and checking that many lines in advance would be a… knightmare. (I’m sorry, I couldn’t help it.)

So, I added alpha-beta pruning, which skips entire branches of the search tree.

In short, if one option is already worse than another, we don’t waste time exploring that option any further.

This lets me search deeper into the game tree — meaning the bot can think four or five moves ahead — without melting my computer.
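The change to plain minimax is small. Here's a sketch with alpha-beta pruning bolted on (again with a simple material count standing in for the real evaluation function, so the block is self-contained):

```python
import chess  # pip install python-chess

def evaluate(board: chess.Board) -> float:
    # Simple material count from White's perspective
    values = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
              chess.ROOK: 5, chess.QUEEN: 9}
    return sum(v * (len(board.pieces(p, chess.WHITE))
                    - len(board.pieces(p, chess.BLACK)))
               for p, v in values.items())

def alphabeta(board: chess.Board, depth: int,
              alpha: float = -float("inf"),
              beta: float = float("inf")) -> float:
    if depth == 0 or board.is_game_over():
        return evaluate(board)
    if board.turn == chess.WHITE:          # maximizing player
        value = -float("inf")
        for move in board.legal_moves:
            board.push(move)
            value = max(value, alphabeta(board, depth - 1, alpha, beta))
            board.pop()
            alpha = max(alpha, value)
            if alpha >= beta:              # opponent already has a better
                break                      # option elsewhere: prune this branch
    else:                                  # minimizing player
        value = float("inf")
        for move in board.legal_moves:
            board.push(move)
            value = min(value, alphabeta(board, depth - 1, alpha, beta))
            board.pop()
            beta = min(beta, value)
            if beta <= alpha:              # symmetric cutoff for Black
                break
    return value
```

It returns exactly the same value minimax would; it just refuses to waste time on branches that can no longer affect the answer.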

Next: Reinforcement Learning

Minimax is cool and all, but the bot will be limited by the heuristic I give it.

Going further means letting the bot learn from self-play, like AlphaZero did.

To make training faster and debugging easier, I’m starting on a downsized 5×5 board:

  • Fewer pieces

  • Smaller state space

  • Faster iterations

Once I get that working, I’ll scale it up to the full 8×8 board and start exploring further.

(Stay tuned for those updates.)

Final Thoughts

That’s it for now. I’ll be back in two weeks, with updates on the (hopefully) upgraded bot.

Thanks for reading.

If you’re interested in:

  • The bot’s progress

  • Reinforcement learning

  • Or just want to challenge me to a friendly game of chess…

Feel free to reach out here.

Just know that, if I lose, I’m unleashing KnightmareProtocol on you.

😈
