The Knightmare Begins...

GallantGargoyle

…and I mean it.

Working till midnight, only to realize I’ve created two functions that call each other in a disastrous loop — then having to start from scratch.

And yet, there’s nothing like the satisfaction of seeing things finally fall into place:

  • Built a basic 5×5 chess environment + random agent

  • Currently working on NFQ implementation (read on for explanation)

Progress Report: Week One of Building Knightmare Protocol

For starters, I’ve made a rudimentary 5×5 chess environment and created a random agent that plays itself.

The chess environment is built as a custom OpenAI Gym environment, which means I’ve defined reset(), step(), and render() methods to integrate easily with my RL agents.

What are those methods, you ask? Excellent question.

Before we go into that, take a look at this (or skip ahead if you’ve got things to do in those 20 seconds):

Let’s Talk About Agents (No, Not 007)

The agent is the part of the code that makes the decisions.

| Start | … | End |

Okay, yes, that’s a table. But pretend it’s a 1D world, full of 1D movie theaters, bowling alleys, and bookstores.

The agent starts on the leftmost cell and has two possible actions:

  • 0 = go left

  • 1 = go right

If the agent is already at the start and chooses 0, it stays in place. (Don’t judge, it’s just a baby right now.)

If it reaches the rightmost cell, it gets a reward of 1 point. Every other cell gives 0.

And if it reaches the end, the episode ends.

That’s it — you’ve defined an environment. The agent’s job is to learn how to reach the goal. Because, you know, it needs to flex all those points on its friends.

reset(), step(), and render() — The Holy Trinity of an RL Environment

Initially, when you train the agent, it will fail. A lot. You’ll need a quick way to start over. Enter the reset() method.

What if I want to see the bad decisions my agent is making? You’re in luck, that’s what the render() method is for.

Now, what’s the step() method? That’s the guy in the chair (takes hat off to Ned from Spider-Man) who applies the agent’s chosen action, updates the environment, and returns the next state, reward, and termination flag.
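
Since the corridor is small enough to fit on one screen, here’s a minimal sketch of it as a custom Gym environment. To be clear, this is a sketch rather than my actual Knightmare code: the name CorridorEnv and the five-cell default are my choices, and I’m using the classic gym API where step() returns four values.

```python
# A minimal sketch of the 1D corridor as a custom Gym environment.
# CorridorEnv and the five-cell default are my choices, not the Knightmare code.
import gym
from gym import spaces


class CorridorEnv(gym.Env):
    """A row of cells: start on the far left, reward of 1 on the far right."""

    def __init__(self, n_cells=5):
        self.n_cells = n_cells
        self.action_space = spaces.Discrete(2)        # 0 = go left, 1 = go right
        self.observation_space = spaces.Discrete(n_cells)
        self.pos = 0

    def reset(self):
        # Start over: the agent goes back to the leftmost cell.
        self.pos = 0
        return self.pos

    def step(self, action):
        # Apply the chosen action, staying in place if the agent hits a wall.
        if action == 1:
            self.pos = min(self.pos + 1, self.n_cells - 1)
        else:
            self.pos = max(self.pos - 1, 0)

        done = self.pos == self.n_cells - 1           # reached the rightmost cell
        reward = 1.0 if done else 0.0
        return self.pos, reward, done, {}

    def render(self, mode="human"):
        # Print the corridor with the agent's current (possibly questionable) position.
        cells = ["A" if i == self.pos else "." for i in range(self.n_cells)]
        print("|" + "|".join(cells) + "|")
```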

Congratulations. You now understand the basics of RL agents and the environments they operate in.

Armed with this knowledge, you are now ready to become the life of the party.
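
And if the party wants receipts, here’s what a random agent looks like against that corridor environment (again, just a sketch; the chess version is the same loop with a much bigger action space and two players):

```python
# One random-agent episode against the CorridorEnv sketched above.
env = CorridorEnv()
state = env.reset()
done = False
steps = 0

while not done:
    action = env.action_space.sample()        # pick 0 or 1 completely at random
    state, reward, done, info = env.step(action)
    env.render()
    steps += 1

print(f"Episode finished in {steps} steps with reward {reward}")
```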

Watch The Random Agent In Action!

Here’s a clip of the random agent playing both sides on a 5×5 board:

I think you may have noticed a problem.

After the Black knight moves, there’s a noticeable pause. That’s because my engine evaluates all moves — even illegal ones — and sometimes plays them(!).

They don’t show up visually, but internally the engine logs them, which explains why it reports the game as taking 31 steps.

I’m working on a version two of the chess environment that should hopefully remove this problem.

Knightmare Protocol v1.1 — Now With Cleaner Code

I’m starting to notice other design problems too, so these are the changes I’m implementing in version two of my chess environment:

Creating a Board class.

Now before you guys laugh haughtily and go, “Well, you should’ve done that to begin with!”, I was — understandably — impatient to get something up. As the saying goes:

“Make it exist first. Then make it pretty.”

Cleaning up move generation logic.

Never have I appreciated clean, modular, and commented code more.
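
For the Board class specifically, here’s roughly the shape I’m aiming for. This is a hypothetical sketch of the direction, not the actual v1.1 code, and the method names are mine:

```python
# A hypothetical sketch of the Board class direction, not the actual v1.1 code.
class Board:
    """Owns the 5x5 grid, piece placement, and move legality."""

    SIZE = 5

    def __init__(self):
        # None means an empty square; pieces are (colour, piece_type) tuples.
        self.grid = [[None] * self.SIZE for _ in range(self.SIZE)]

    def in_bounds(self, row, col):
        return 0 <= row < self.SIZE and 0 <= col < self.SIZE

    def piece_at(self, row, col):
        return self.grid[row][col]

    def legal_moves(self, colour):
        # All move generation lives here, instead of being smeared across
        # the environment's step() logic.
        moves = []
        for row in range(self.SIZE):
            for col in range(self.SIZE):
                piece = self.grid[row][col]
                if piece is not None and piece[0] == colour:
                    moves.extend(self.moves_for(piece, row, col))
        return moves

    def moves_for(self, piece, row, col):
        # Per-piece rules (knight jumps, king steps, ...) slot in here.
        raise NotImplementedError
```

If the environment only ever asks the Board for legal_moves(), the illegal-move counting issue from earlier has exactly one place to be fixed.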

An Actual Agent That Learns!

Right now, I’m implementing NFQ — Neural Fitted Q-Iteration.

It’s kind of primitive, and I fully expect convergence issues. But that’s kind of the point of this whole exercise: to understand why these algorithms were developed and how they work.
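
Without stealing the next post’s thunder, here’s the core NFQ loop as I currently understand it. This is my own rough sketch (PyTorch, with Adam standing in for the Rprop optimiser used in the original paper), not the final Knightmare implementation:

```python
# A rough sketch of the NFQ idea: collect a fixed batch of transitions,
# then repeatedly recompute targets and refit the Q-network on the whole batch.
import torch
import torch.nn as nn

GAMMA = 0.99


def nfq_iteration(q_net, transitions, n_iterations=20, fit_epochs=200):
    """transitions: list of (state, action, reward, next_state, done) tuples,
    gathered up front -- the fixed batch is what makes this NFQ rather than DQN."""
    optimiser = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    states = torch.stack([t[0] for t in transitions])
    actions = torch.tensor([t[1] for t in transitions])
    rewards = torch.tensor([t[2] for t in transitions])
    next_states = torch.stack([t[3] for t in transitions])
    dones = torch.tensor([t[4] for t in transitions], dtype=torch.float32)

    for _ in range(n_iterations):
        # 1. Freeze the current network's opinion of the next states...
        with torch.no_grad():
            next_q = q_net(next_states).max(dim=1).values
            targets = rewards + GAMMA * (1.0 - dones) * next_q

        # 2. ...then fit Q(s, a) to those targets as a plain supervised problem.
        for _ in range(fit_epochs):
            q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(q_values, targets)
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()

    return q_net
```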

I’ll detail how NFQ works in the next blog post, once I have it up and running.

Bonus RL Theory: More Coffee-Table Talk For You

I finished reading up on the fundamentals of RL!

I initially expected it’d take me much longer than two weeks, but massive props to Prof. ChatGPT for top-tier analogies.

Check this out — here’s how it explained DDPG (Deep Deterministic Policy Gradient):

Blew my mind. Not very technical, sure — but it helped me build a mental model I could reinforce with equations and code.

Is That It?

Yep!

(I heard that sigh of relief. Rude.)

Next up: finish the NFQ implementation, then move on to DQN, DDQN, and other modern RL algorithms.

As always, if you guys have any queries or suggestions, feel free to reach out to me here. Would love to hear from you.

And hey, subscribe to my newsletter — it’s free — so I can keep boring you every other week. :)

The Knightmare has only just begun.

😈

(P.S. Yes, this post was meant to go out yesterday, but I was busy fending off an insect horde with my Blades of Cleanliness.)
