High-Frequency Trading & Reinforcement Learning. Building a Simple Order Book and Market Making Bots in Python


Ever wondered how high-frequency trading firms profit from small price movements in assets? Let's build a couple of simple market-making bots in Python, one rule-based and one using reinforcement learning, to find out.
Market Making
As Wikipedia would put it: “A market maker or liquidity provider is a company or an individual that quotes both a buy and a sell price in a tradable asset held in inventory, hoping to make a profit on the difference, which is called the bid-ask spread or turn.”
(“Turn” refers to the profit from completing a buy-sell cycle.)
OK, fine. I understood most of those words, but let’s break it down a bit more simply.
Exchanges exist as central platforms on which assets can be bought and sold. Picture an exchange like a bustling marketplace, where people are constantly buying and selling stocks, shares, currencies and so on. Buyers want the lowest price, sellers want the highest… you get the picture.
In many cases, thousands of these transactions take place every second, with buy orders being matched to sell orders and vice versa. It is in the best interest of buyers to buy as cheaply as possible, and of sellers to sell at the highest price possible. At any one time, there is a backlog of buy and sell orders waiting in an order book to be matched. When a buy order’s price meets or exceeds a sell order’s price, a trade happens, matched based on price and time priority. The result is a range of prices at which buy orders sit and a separate range at which sell orders sit; the gap separating these two “sections” is what is referred to as the bid-ask spread.
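Price-time priority can be sketched in a few lines of Python (the prices and arrival times here are made up for illustration):

```python
# Toy bids as (price, arrival_time) tuples, hypothetical values.
bids = [(100.0, 2), (101.0, 3), (101.0, 1)]

# Price-time priority: highest price first; ties go to the earlier order.
bids.sort(key=lambda b: (-b[0], b[1]))

print(bids)  # [(101.0, 1), (101.0, 3), (100.0, 2)]
```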
Due to constant price fluctuations, this bid-ask spread is always changing, and more market activity usually means more volatility. Orders placed within the bid-ask spread are likely to be traded. If you’re cunning, you can place both a buy and a sell order, better known as bid and ask quotes, with both prices inside the spread and the ask above the bid. If market volatility is high enough, it is likely that both orders will be traded, and a small profit will be made.
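With some made-up numbers, the round trip looks like this:

```python
# Hypothetical quotes: the market's best bid is 99.50, best ask 100.50.
best_bid, best_ask = 99.50, 100.50
spread = best_ask - best_bid          # 1.00 of room to quote inside

# Quote both sides inside the spread, ask above bid.
my_bid, my_ask = 99.60, 100.40

# If both quotes fill for 100 shares, the round trip earns:
profit = round((my_ask - my_bid) * 100, 2)
print(profit)  # 80.0 (before fees)
```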
So, how can one take advantage of this naturally occurring phenomenon? Who takes advantage of it, and how?
High-frequency trading (HFT) firms profit from these tiny price swings by placing thousands of buy and sell orders a second. How? They’re market makers—quoting bids and asks to capture the spread. Although it may seem simple enough, the reality is that they utilise advanced algorithms to calibrate their activity very precisely. Let’s build two Python bots to see it in action: a rule-based one with fixed quotes and an RL one that learns on the fly.
"There are three ways to make a living in this business: be first, be smarter, or cheat. Now, I don’t cheat. And although I like to think we have some pretty smart people in this building, it sure is a hell of a lot easier to just be first." — John Tuld, Margin Call (2011)
The Order Book
To do this, however, we’ll first need to build a simple order book in Python that can accept, store, log, and match buy and sell orders.
import time
import numpy as np
import random

random.seed(42)
np.random.seed(42)

# Global variable for last trade price (used across components)
last_trade_price = 100.0

class Order:
    def __init__(self, order_id, price, quantity, order_type):
        self.order_id = order_id
        self.price = price
        self.quantity = quantity
        self.order_type = order_type
        self.timestamp = time.time()

    def __repr__(self):
        return f"Order(id={self.order_id}, price={self.price}, quantity={self.quantity}, type={self.order_type})"

class OrderBook:
    def __init__(self):
        self.bids = []           # buy orders, sorted highest price first
        self.asks = []           # sell orders, sorted lowest price first
        self.order_history = []  # completed trades

    def place_order(self, price, size, side, bot_name):
        if side == 'bid':
            self.bids.append((price, size, bot_name))
            self.bids.sort(reverse=True)
        else:
            self.asks.append((price, size, bot_name))
            self.asks.sort()
        self.match_orders()

    def match_orders(self):
        # Trade while the best bid meets or exceeds the best ask
        while self.bids and self.asks and self.bids[0][0] >= self.asks[0][0]:
            bid_price, bid_size, bid_bot = self.bids.pop(0)
            ask_price, ask_size, ask_bot = self.asks.pop(0)
            if bid_bot == ask_bot:
                # Prevent self-trading: put both orders back and stop matching
                self.bids.insert(0, (bid_price, bid_size, bid_bot))
                self.asks.insert(0, (ask_price, ask_size, ask_bot))
                break
            trade_size = min(bid_size, ask_size)
            self.order_history.append((bid_price, ask_price, trade_size, bid_bot, ask_bot))
            # Reinsert any unfilled remainder
            if bid_size > trade_size:
                self.bids.insert(0, (bid_price, bid_size - trade_size, bid_bot))
            if ask_size > trade_size:
                self.asks.insert(0, (ask_price, ask_size - trade_size, ask_bot))

    def best_bid(self):
        return self.bids[0][0] if self.bids else None

    def best_ask(self):
        return self.asks[0][0] if self.asks else None

    def mid_price(self):
        if self.bids and self.asks:
            return (self.best_bid() + self.best_ask()) / 2
        return None
We can check that the order book works with a few test cases:
Matching trades
Partially matching trades
No matching trades
import unittest
from order_book import OrderBook

class TestOrderBook(unittest.TestCase):
    def setUp(self):
        self.book = OrderBook()

    def test_trade_execution_same_price(self):
        print("Test case 1: Trade execution at same price")
        self.book.place_order(100, 5, 'bid', 'TestBot1')
        self.book.place_order(100, 5, 'ask', 'TestBot2')
        self.assertEqual(len(self.book.order_history), 1)
        print(f"Trade: {self.book.order_history[0]}")
        self.assertEqual(self.book.order_history[0], (100, 100, 5, 'TestBot1', 'TestBot2'))

    def test_partial_trade(self):
        print("Test case 2: Partial trade")
        self.book.place_order(100, 10, 'bid', 'TestBot1')
        self.book.place_order(100, 5, 'ask', 'TestBot2')
        self.assertEqual(len(self.book.order_history), 1)
        print(f"Trade: {self.book.order_history[0]}")
        self.assertEqual(self.book.bids[0][1], 5)  # 5 units left unfilled

    def test_no_trade(self):
        print("Test case 3: No trade")
        self.book.place_order(100, 5, 'bid', 'TestBot1')
        self.book.place_order(101, 5, 'ask', 'TestBot2')
        self.assertEqual(len(self.book.order_history), 0)
        print("No trade occurred")

if __name__ == '__main__':
    unittest.main()
Now that we’ve got the background set up and well tested, let’s introduce two bots to execute bid-ask spread trades on this order book: one following a rule-based system, the other using reinforcement learning.
Bot 1: Rule-Based
Our first bot follows a rule-based system, with these core rules:
It calculates a “mid-price”, the average of the highest buy and lowest sell price in the order book.
If the order book is empty, a fallback price of 100 (the initial last-trade price) is used. This gives the bot a reference point.
A fixed spread is set (in this case, 0.5 units) to keep it consistent and simple.
A fixed order size of 5 units is used; real market makers would adjust their size to market liquidity.
Bid and ask prices are calculated around the mid-price, and a buy and a sell order are placed at them, quoting both sides so profit is captured when market participants hit the orders.
Finally, the trade history is scanned to update the bot’s cash, inventory, and trade count.
import numpy as np
from order_book import OrderBook

class RuleBasedBot:
    def __init__(self, order_book, spread=0.5):
        self.order_book = order_book
        self.spread = spread
        self.cash = 100000
        self.inventory = 0
        self.trades = 0

    def get_mid_price(self):
        # Fall back to 100 (the initial price) if the book is empty
        return self.order_book.mid_price() or 100

    def step(self):
        mid_price = self.get_mid_price()
        bid_price = np.round(mid_price - self.spread / 2, 2)
        ask_price = np.round(mid_price + self.spread / 2, 2)
        order_size = 5
        self.order_book.place_order(bid_price, order_size, 'bid', 'Rule Bot')
        self.order_book.place_order(ask_price, order_size, 'ask', 'Rule Bot')
        # Settle any of this bot's fills from the trade history
        for trade in self.order_book.order_history:
            bid_price, ask_price, size, bid_bot, ask_bot = trade
            if bid_bot == 'Rule Bot':
                self.inventory += size
                self.cash -= bid_price * size
                self.trades += 1
            if ask_bot == 'Rule Bot':
                self.inventory -= size
                self.cash += ask_price * size
                self.trades += 1
Bot 2: Reinforcement Learning
Our second bot adjusts its spread each step by picking an action from a small action space: tighten, hold, or widen. For now, the choice is made at random, as a placeholder for a learned policy.
import random
from order_book import OrderBook

class MarketMakerRL:
    def __init__(self, order_book):
        self.order_book = order_book
        self.inventory = 0
        self.cash = 100000
        self.action_space = [-1, 0, 1]  # tighten, hold, or widen the spread
        self.spread = 0.5
        self.trades = 0

    def step(self):
        # Pick an action at random (a placeholder for a learned policy)
        action = random.choice(self.action_space)
        self.spread += action * 0.1
        self.spread = max(0.3, min(0.7, self.spread))  # clamp the spread
        mid_price = self.order_book.mid_price() or 100
        bid_price = mid_price - self.spread / 2
        ask_price = mid_price + self.spread / 2
        self.order_book.place_order(bid_price, 5, 'bid', 'RL Bot')
        self.order_book.place_order(ask_price, 5, 'ask', 'RL Bot')
        # Settle any of this bot's fills from the trade history
        for trade in self.order_book.order_history:
            bid_price, ask_price, size, bid_bot, ask_bot = trade
            if bid_bot == 'RL Bot':
                self.inventory += size
                self.cash -= bid_price * size
                self.trades += 1
            if ask_bot == 'RL Bot':
                self.inventory -= size
                self.cash += ask_price * size
                self.trades += 1
Before we test them both, let’s dig a bit more into what reinforcement learning is, and how it works in this setting.
Imagine you’re training a dog to sit on command: you repeatedly try until it does something resembling sitting, at which point you toss it a treat. That’s reinforcement learning in a nutshell: learning by doing, based on rewards and trial and error.
With regard to our code above, reinforcement learning means training our bot to make smart moves in our order book environment. It doesn’t start with a rule book, but messes around, sees what works, what doesn’t, and, most importantly, what is most profitable. From this it builds its own “rule book”, which is constantly adjusted on the fly to remain as profitable as possible.
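For reference, here is a minimal sketch of what a “real” learning step could look like: tabular Q-learning over the same clamped spread choices, against an invented toy reward (the fill probabilities and payoffs below are made up for illustration, not part of the bot above):

```python
import random
random.seed(0)

spreads = [0.3, 0.4, 0.5, 0.6, 0.7]    # discretised spread states
actions = [-1, 0, 1]                    # tighten / hold / widen
Q = {(s, a): 0.0 for s in spreads for a in actions}

alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount, exploration

def simulated_reward(spread):
    # Toy environment (invented): tighter quotes fill more often,
    # wider quotes earn more per fill.
    fill_prob = 1.0 - spread
    filled = random.random() < fill_prob
    return spread * 5 if filled else 0.0  # 5 units per filled round trip

state = 0.5
for _ in range(5000):
    # Epsilon-greedy: mostly exploit the best known action, sometimes explore
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[(state, a)])
    next_state = min(0.7, max(0.3, round(state + action * 0.1, 1)))
    reward = simulated_reward(next_state)
    # Standard Q-learning update rule
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

print(state)  # the spread the learned policy settled on
```

Unlike the random bot above, the Q-table remembers which spread adjustments paid off, so the policy converges on a spread rather than drifting.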
Comparison
So, we have our order book and our two bots. If we pit them against each other with 50 random traders (represented by random market activity) making a mess of the order book, who comes out on top? We can compare PnL and trade counts every 100 steps over a 1,000-step run:
import numpy as np
import random
from order_book import OrderBook
from rule_based_bot import RuleBasedBot
from market_maker_rl import MarketMakerRL

class RandomTrader:
    def __init__(self, order_book, name):
        self.order_book = order_book
        self.name = name

    def step(self):
        # Place a random order near the current mid-price
        side = random.choice(['bid', 'ask'])
        price = (self.order_book.mid_price() or 100) + random.uniform(-1, 1)
        size = random.randint(1, 5)
        self.order_book.place_order(price, size, side, self.name)

# Simulation
order_book = OrderBook()
market_maker_rl = MarketMakerRL(order_book)
rule_based_bot = RuleBasedBot(order_book)
random_traders = [RandomTrader(order_book, f"Trader {i}") for i in range(50)]

for step in range(1001):
    for trader in random_traders:
        trader.step()
    # Alternate which bot quotes first each step
    if step % 2 == 0:
        rule_based_bot.step()
        market_maker_rl.step()
    else:
        market_maker_rl.step()
        rule_based_bot.step()
    order_book.order_history.clear()  # reset the trade log so fills aren't double-counted
    if step % 100 == 0:
        print(f"\n--- Summary after {step} steps ---")
        print(f"RL Bot PnL={market_maker_rl.cash + market_maker_rl.inventory * (order_book.mid_price() or 100) - 100000:.2f}, Trades={market_maker_rl.trades}")
        print(f"Rule Bot PnL={rule_based_bot.cash + rule_based_bot.inventory * (order_book.mid_price() or 100) - 100000:.2f}, Trades={rule_based_bot.trades}")
We start each bot with $100,000, run 1,000 steps, and track PnL (cash + inventory value − initial cash), alternating which bot gets to quote first each step.
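The PnL formula marks inventory to the current mid-price. With hypothetical numbers:

```python
# Hypothetical snapshot of a bot's account.
initial_cash = 100_000
cash = 99_500.0    # after buying 5 shares at 100
inventory = 5      # shares held
mid_price = 101.0  # current mid-price

# Mark-to-market PnL: cash plus inventory value, minus starting cash.
pnl = cash + inventory * mid_price - initial_cash
print(pnl)  # 5.0
```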
We can run it and see how the different bots compare:
--- Summary after 0 steps ---
RL Bot PnL=0.00, Trades=0
Rule Bot PnL=0.00, Trades=0
--- Summary after 100 steps ---
RL Bot PnL=135.48, Trades=314
Rule Bot PnL=155.67, Trades=405
--- Summary after 200 steps ---
RL Bot PnL=276.74, Trades=605
Rule Bot PnL=324.68, Trades=793
--- Summary after 300 steps ---
RL Bot PnL=458.67, Trades=972
Rule Bot PnL=487.12, Trades=1216
--- Summary after 400 steps ---
RL Bot PnL=588.95, Trades=1221
Rule Bot PnL=620.02, Trades=1599
--- Summary after 500 steps ---
RL Bot PnL=709.71, Trades=1380
Rule Bot PnL=790.09, Trades=2016
--- Summary after 600 steps ---
RL Bot PnL=750.41, Trades=1407
Rule Bot PnL=959.83, Trades=2433
--- Summary after 700 steps ---
RL Bot PnL=854.02, Trades=1612
Rule Bot PnL=1099.40, Trades=2804
--- Summary after 800 steps ---
RL Bot PnL=973.14, Trades=1962
Rule Bot PnL=1246.51, Trades=3197
--- Summary after 900 steps ---
RL Bot PnL=1156.41, Trades=2358
Rule Bot PnL=1406.60, Trades=3561
--- Summary after 1000 steps ---
RL Bot PnL=1220.25, Trades=2688
Rule Bot PnL=1555.76, Trades=3954
Takeaways
Timing Matters: switching which bot goes first yields big advantages, hinting at the difference that mere microseconds make in the HFT space.
Spread is King: when testing the bots’ parameters, a small spread produced many trades that were not profitable.
Reinforcement Learning Potential: the RL bot’s random adjustments hint at its potential power; however, without any “real” learning (such as Q-learning), it is not “true” reinforcement learning, just random adjustment for now.
All of the code listed above is available on GitHub:
https://github.com/BTurner1234/Order-Book/
Written by Bailey