Create a Python-Based Forex Bot: Harnessing Statistical Arbitrage for Cointegration!


Introduction
Statistical arbitrage is a quantitative trading strategy that leverages statistical and econometric techniques to identify and exploit mispricings between cointegrated assets. In this article, we'll build a Statistical Arbitrage Bot in Python that trades cointegrated forex pairs while incorporating risk management features.
End Goal Visualization
Our bot will:
Identify cointegrated forex pairs (e.g., EUR/USD and GBP/USD).
Compute a mean-reverting spread using linear regression.
Generate trading signals when the spread deviates from its mean.
Execute trades with proper position sizing and stop-loss mechanisms.
Monitor performance with key metrics like the Sharpe ratio and maximum drawdown.
Here’s a conceptual diagram of the workflow:
[Forex Data Feed] → [Cointegration Test] → [Spread Calculation]
↓
[Signal Generation] → [Risk Management] → [Trade Execution]
↓
[Portfolio Monitoring & Performance Metrics]
N.B.: This workflow also applies to stocks, crypto, and futures; the only thing that changes is the data feed.
Now, let’s dive into the step-by-step implementation.
Step 1: Setting Up the Environment
Tech Stack
Python (Primary language)
Statsmodels (For cointegration tests and regression)
Pandas & NumPy (Data manipulation)
Matplotlib/Seaborn (Visualization)
YFinance (Fetching forex data)
Backtrader or PyAlgoTrade (Optional for backtesting)
Environment Setup and Installation of Required Libraries
python -m venv venv
source venv/bin/activate
pip install numpy pandas statsmodels matplotlib yfinance
Step 2: Fetching and Preparing Forex Data (Updated for yfinance)
Obtain Historical Forex Data Using yfinance
While yfinance primarily focuses on stocks, we can use it to fetch forex data through currency ETFs or direct currency pairs (where available). For this example, we'll use EURUSD=X and GBPUSD=X which represent the EUR/USD and GBP/USD forex pairs in Yahoo Finance.
import yfinance as yf
import pandas as pd
# Define the currency pairs
symbol1 = "EURUSD=X" # EUR/USD
symbol2 = "GBPUSD=X" # GBP/USD
# Fetch historical data
df1 = yf.download(symbol1, period="1y", interval="1d")
df2 = yf.download(symbol2, period="1y", interval="1d")
# Keep only closing prices as named Series; squeeze() covers yfinance versions
# that return a one-column DataFrame for df["Close"]
close_eur = df1["Close"].squeeze().rename("close_eur")
close_gbp = df2["Close"].squeeze().rename("close_gbp")
# Merge the two datasets, keeping only dates present in both
closing_prices = pd.concat([close_eur, close_gbp], axis=1, join="inner")
# Display the first few rows
print(closing_prices.head())
Handle Missing Data
Forex markets operate 24/5, but there might be occasional gaps we should handle:
# Forward fill any missing values
closing_prices = closing_prices.ffill()
# Verify no missing values remain
print(closing_prices.isnull().sum())
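One detail worth checking: a forward fill has nothing to copy from when the gap sits on the very first row. The sketch below uses a small made-up DataFrame (the values and the names `toy_prices`, `filled`, and `cleaned` are illustrative, not market data) to show one way of handling that case by dropping any rows that are still empty after the fill.

```python
import numpy as np
import pandas as pd

# Toy prices with gaps (illustrative values, not market data)
dates = pd.date_range("2024-01-01", periods=5, freq="D")
toy_prices = pd.DataFrame(
    {
        "close_eur": [np.nan, 1.10, np.nan, 1.12, 1.11],
        "close_gbp": [1.27, np.nan, 1.28, 1.29, np.nan],
    },
    index=dates,
)

# Forward fill carries the last known price into each gap...
filled = toy_prices.ffill()
# ...but a gap on the first row has no earlier value, so drop what is left
cleaned = filled.dropna()

print(cleaned)
print(cleaned.isnull().sum())
```

Dropping the leading rows loses a little history, which is usually harmless for a one-year daily sample.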
Visualize the Price Series
Before testing for cointegration, let's visualize the two price series:
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
plt.plot(closing_prices['close_eur'], label='EUR/USD')
plt.plot(closing_prices['close_gbp'], label='GBP/USD')
plt.title('EUR/USD vs GBP/USD Closing Prices')
plt.ylabel('Price')
plt.xlabel('Date')
plt.legend()
plt.grid(True)
plt.show()
Normalize Prices for Better Visualization
To better compare the two series, we can normalize them to start at the same point:
normalized = closing_prices / closing_prices.iloc[0]
plt.figure(figsize=(12, 6))
plt.plot(normalized['close_eur'], label='EUR/USD (Normalized)')
plt.plot(normalized['close_gbp'], label='GBP/USD (Normalized)')
plt.title('Normalized Price Comparison')
plt.ylabel('Normalized Price')
plt.xlabel('Date')
plt.legend()
plt.grid(True)
plt.show()
Step 3: Testing for Cointegration
Cointegration means the two forex pairs share a long-term equilibrium relationship, so the spread between them tends to revert to its mean. We’ll use the Engle-Granger test from statsmodels.
from statsmodels.tsa.stattools import coint
# Perform cointegration test
score, p_value, _ = coint(closing_prices["close_eur"], closing_prices["close_gbp"])
print(f"Cointegration p-value: {p_value:.4f}")
if p_value < 0.05:
    print("Pairs are cointegrated!")
else:
    print("Pairs are NOT cointegrated.")
Step 4: Calculating the Hedge Ratio and Spread
Compute Hedge Ratio (β) via OLS Regression
import statsmodels.api as sm
X = closing_prices["close_eur"]
y = closing_prices["close_gbp"]
X = sm.add_constant(X) # Adds intercept term
model = sm.OLS(y, X).fit()
hedge_ratio = model.params["close_eur"] # Slope coefficient, selected by column name
spread = y - hedge_ratio * X["close_eur"]
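As a sanity check on the regression, the OLS slope with an intercept has a closed form: β = cov(x, y) / var(x). The sketch below applies it to synthetic data (the seed, the true slope of 0.8, and the noise levels are assumptions chosen for illustration) and compares the result with `np.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for close_eur (x) and close_gbp (y)
x = 1.10 + np.cumsum(rng.normal(0, 0.005, 500))
y = 0.30 + 0.8 * x + rng.normal(0, 0.002, 500)

# OLS slope with an intercept: beta = cov(x, y) / var(x)
beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
# Matching intercept: alpha = mean(y) - beta * mean(x)
alpha = y.mean() - beta * x.mean()

# Spread without the intercept, as in the article; its mean equals alpha
spread_demo = y - beta * x

# Cross-check against a degree-1 least-squares fit
beta_polyfit = np.polyfit(x, y, 1)[0]

print(f"beta (closed form): {beta:.4f}")
print(f"beta (polyfit):     {beta_polyfit:.4f}")
print(f"alpha:              {alpha:.4f}")
```

Leaving the intercept out of the spread is harmless here because the z-score subtracts the spread's mean anyway.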
Visualize the Spread
import matplotlib.pyplot as plt
spread_mean = spread.mean()
spread_std = spread.std()
plt.figure(figsize=(12, 6))
plt.plot(spread, label="Spread")
plt.axhline(spread_mean, color="r", linestyle="--", label="Mean")
plt.axhline(spread_mean + 1.5 * spread_std, color="g", linestyle=":", label="Upper Bound")
plt.axhline(spread_mean - 1.5 * spread_std, color="g", linestyle=":", label="Lower Bound")
plt.legend()
plt.title("Spread Between EUR/USD and GBP/USD with ±1.5σ Bands")
plt.show()
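The bands plotted above use the mean and standard deviation of the whole sample, which a live bot would not know in advance. One common alternative, sketched here as an assumption rather than as part of the original strategy, is a rolling z-score that only uses the most recent `window` observations. The helper name `rolling_zscore` and the default window of 60 are hypothetical.

```python
import pandas as pd

def rolling_zscore(spread: pd.Series, window: int = 60) -> pd.Series:
    """Z-score of each point relative to the trailing window ending at that point."""
    mean = spread.rolling(window).mean()
    std = spread.rolling(window).std()  # sample standard deviation (ddof=1)
    return (spread - mean) / std

# Tiny illustrative series; the first window - 1 values are NaN by design
toy_spread = pd.Series([1.0, 2.0, 3.0, 4.0, 6.0])
z = rolling_zscore(toy_spread, window=3)
print(z)
```

The first `window - 1` values are NaN because there is not enough history yet, so any signal logic needs to skip or ignore them.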
Step 5: Signal Generation & Trading Logic
Define Entry/Exit Rules Using Z-Score
spread_zscore = (spread - spread_mean) / spread_std
# Trading signals
entry_threshold = 1.5
exit_threshold = 0.5
# signals records entry events only: -1 or 1 on the bar a trade opens,
# 0 on every other bar (including bars where a position is held or closed)
signals = []
position = 0 # 0: flat, 1: long spread, -1: short spread
for z in spread_zscore:
    if z > entry_threshold and position != -1:
        signals.append(-1) # Short spread (sell GBP/USD, buy EUR/USD)
        position = -1
    elif z < -entry_threshold and position != 1:
        signals.append(1) # Long spread (buy GBP/USD, sell EUR/USD)
        position = 1
    elif abs(z) < exit_threshold and position != 0:
        signals.append(0) # Close position
        position = 0
    else:
        signals.append(0) # No new trade on this bar
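The loop above records entry events. For evaluating the strategy it can also be useful to know which position is held on each bar. The sketch below applies the same entry and exit thresholds but returns the held position per bar; the function name `positions_from_zscores` is hypothetical.

```python
def positions_from_zscores(zscores, entry_threshold=1.5, exit_threshold=0.5):
    """Return the position held after each bar: -1 short spread, 1 long spread, 0 flat."""
    positions = []
    position = 0
    for z in zscores:
        if z > entry_threshold:
            position = -1      # spread is rich: short it
        elif z < -entry_threshold:
            position = 1       # spread is cheap: go long
        elif abs(z) < exit_threshold:
            position = 0       # spread is back near its mean: close
        # otherwise keep whatever position is already open
        positions.append(position)
    return positions

toy_z = [0.0, 1.6, 1.0, 0.4, -1.7, -1.0, 0.2]
print(positions_from_zscores(toy_z))
```

On the toy input, the short opened at z = 1.6 stays open through z = 1.0 and only closes once z falls inside the exit band.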
Step 6: Risk Management
Position Sizing & Stop-Loss
Fixed Fractional Sizing: Risk only 1-2% of capital per trade.
Dynamic Stop-Loss: Exit if spread moves against us beyond a threshold.
capital = 10000 # Initial capital
risk_per_trade = 0.01 # 1% risk per trade
for i, signal in enumerate(signals):
    if signal != 0:
        spread_value = spread.iloc[i]
        stop_loss = spread_mean + (3 * spread_std if signal == -1 else -3 * spread_std)
        position_size = (capital * risk_per_trade) / abs(spread_value - stop_loss)
        print(f"Executing trade: {'Short' if signal == -1 else 'Long'} | Size: {position_size:.2f}")
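To make the sizing arithmetic concrete, here is a small worked sketch. The helper names (`stop_level`, `position_size`, `stop_hit`) and the example numbers are assumptions for illustration. With 10,000 in capital and 1% risk, the amount at risk is 100; if the entry is 0.01 spread units away from the stop, the size comes out at roughly 100 / 0.01 = 10,000 units.

```python
def stop_level(spread_mean, spread_std, signal, k=3.0):
    """Stop k standard deviations from the mean, on the losing side of the trade."""
    if signal == -1:
        return spread_mean + k * spread_std   # short spread loses if it keeps rising
    return spread_mean - k * spread_std       # long spread loses if it keeps falling

def position_size(capital, risk_per_trade, entry_spread, stop):
    """Units of spread such that hitting the stop loses capital * risk_per_trade."""
    distance = abs(entry_spread - stop)
    if distance == 0:
        return 0.0                            # no room to the stop: skip the trade
    return capital * risk_per_trade / distance

def stop_hit(signal, current_spread, stop):
    """True once the spread has moved to or beyond the stop level."""
    return current_spread >= stop if signal == -1 else current_spread <= stop

# Short the spread at z = 2 with mean 0.0 and std 0.01 (illustrative numbers)
stop = stop_level(0.0, 0.01, signal=-1)
size = position_size(10_000, 0.01, entry_spread=0.02, stop=stop)
print(f"Stop: {stop:.4f} | Size: {size:.2f}")
print(stop_hit(-1, 0.031, stop), stop_hit(-1, 0.025, stop))
```

Note that entries close to the stop produce very large sizes, so a cap on maximum size may be worth adding in practice.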
Step 7: Backtesting & Performance Metrics
Compute Returns & Sharpe Ratio
returns = ... # Calculate PnL based on signals
sharpe_ratio = (returns.mean() / returns.std()) * (252 ** 0.5) # Annualized
print(f"Sharpe Ratio: {sharpe_ratio:.2f}")
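The `returns = ...` placeholder is left open in the snippet above. One possible way to fill it, sketched under several assumptions: the input is a per-bar position series (-1, 0, or 1) rather than entry events, a fixed number of spread `units` is traded, and transaction costs are ignored. The helper names `strategy_returns` and `annualized_sharpe` are hypothetical.

```python
import numpy as np
import pandas as pd

def strategy_returns(spread, positions, units=1_000.0, capital=10_000.0):
    """Per-bar return on capital from holding `positions` in the spread."""
    # The position chosen on bar t earns the spread change from t to t+1
    held = positions.shift(1).fillna(0.0)
    pnl = held * spread.diff().fillna(0.0) * units
    return pnl / capital

def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate; NaN if undefined."""
    std = returns.std()
    if std == 0 or np.isnan(std):
        return float("nan")
    return (returns.mean() / std) * np.sqrt(periods_per_year)

# Toy example: short the spread on bar 1, close on bar 3
toy_spread = pd.Series([0.10, 0.12, 0.11, 0.09])
toy_positions = pd.Series([0, -1, -1, 0])
toy_returns = strategy_returns(toy_spread, toy_positions)
print(toy_returns.tolist())
print(f"Sharpe Ratio: {annualized_sharpe(toy_returns):.2f}")
```

Shifting the positions by one bar avoids crediting a trade with the price move that triggered it.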
Visualize Equity Curve
cumulative_returns = (1 + returns).cumprod()
plt.plot(cumulative_returns)
plt.title("Strategy Cumulative Returns")
plt.show()
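Maximum drawdown, the other metric mentioned at the start, can be read off the same equity curve. A minimal sketch follows, assuming the equity curve is a pandas Series of cumulative returns; the function name `max_drawdown` and the toy returns are illustrative.

```python
import pandas as pd

def max_drawdown(equity: pd.Series) -> float:
    """Largest peak-to-trough decline, as a negative fraction of the peak."""
    running_peak = equity.cummax()
    drawdown = equity / running_peak - 1.0
    return float(drawdown.min())

# Toy per-bar returns (illustrative values)
toy_returns = pd.Series([0.10, -0.10, 0.05, -0.15])
toy_equity = (1 + toy_returns).cumprod()
print(f"Max Drawdown: {max_drawdown(toy_equity):.2%}")
```

A curve that only rises has a maximum drawdown of 0.0 under this definition.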
Conclusion
We’ve built a Statistical Arbitrage Bot in Python that:
✅ Identifies cointegrated forex pairs
✅ Computes a mean-reverting spread
✅ Generates trading signals with risk management
✅ Evaluates performance via Sharpe ratio
Written by Abdulsalam Lukmon