Create a Python-Based Forex Bot: Harnessing Statistical Arbitrage with Cointegration!

Introduction

Statistical arbitrage is a quantitative trading strategy that leverages statistical and econometric techniques to identify and exploit mispricings between cointegrated assets. In this article, we'll build a Statistical Arbitrage Bot in Python that trades cointegrated forex pairs while incorporating risk management features.

End Goal Visualization

Our bot will:

  1. Identify cointegrated forex pairs (e.g., EUR/USD and GBP/USD).

  2. Compute a mean-reverting spread using linear regression.

  3. Generate trading signals when the spread deviates from its mean.

  4. Execute trades with proper position sizing and stop-loss mechanisms.

  5. Monitor performance with key metrics like the Sharpe ratio and maximum drawdown.

Here’s a conceptual diagram of the workflow:

[Forex Data Feed] → [Cointegration Test] → [Spread Calculation]  
       ↓  
[Signal Generation] → [Risk Management] → [Trade Execution]  
       ↓  
[Portfolio Monitoring & Performance Metrics]

N.B.: This workflow also applies to stocks, crypto, and futures; the main thing that changes is the data feed.

Now, let’s dive into the step-by-step implementation.


Step 1: Setting Up the Environment

Tech Stack

  • Python (Primary language)

  • Statsmodels (For cointegration tests and regression)

  • Pandas & NumPy (Data manipulation)

  • Matplotlib/Seaborn (Visualization)

  • YFinance (Fetching forex data)

  • Backtrader or PyAlgoTrade (Optional for backtesting)

Environment Setup and Installation of Required Libraries

python -m venv venv
source venv/bin/activate

pip install numpy pandas statsmodels matplotlib yfinance

Step 2: Fetching and Preparing Forex Data

Obtain Historical Forex Data Using yfinance

Although yfinance is mostly used for stocks, it can also fetch forex data, either through currency ETFs or directly through currency-pair tickers where Yahoo Finance provides them. For this example, we'll use the Yahoo Finance tickers EURUSD=X and GBPUSD=X, which represent the EUR/USD and GBP/USD pairs.

import yfinance as yf
import pandas as pd

# Define the currency pairs (Yahoo Finance tickers)
symbol1 = "EURUSD=X"  # EUR/USD
symbol2 = "GBPUSD=X"  # GBP/USD

# Fetch one year of daily data for both pairs in a single call.
# Selecting "Close" gives one column per ticker; this avoids the MultiIndex
# columns that recent yfinance versions return even for a single ticker.
raw = yf.download([symbol1, symbol2], period="1y", interval="1d")
closing_prices = raw["Close"].rename(columns={symbol1: "close_eur", symbol2: "close_gbp"})
closing_prices = closing_prices[["close_eur", "close_gbp"]]

# Display the first few rows
print(closing_prices.head())

Handle Missing Data

Forex markets trade 24 hours a day, five days a week, but daily data can still contain occasional gaps (for example, a date on which one pair has no quote), so we handle them:

# Forward fill gaps, then drop any leading rows that had no earlier value to carry forward
closing_prices = closing_prices.ffill().dropna()

# Verify no missing values remain
print(closing_prices.isnull().sum())
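Forward-filling cannot repair a gap at the very start of a series, because there is no earlier value to carry forward. The small example below uses made-up prices (not real quotes) to show why a dropna() after ffill() can still be needed:

```python
import numpy as np
import pandas as pd

# Illustrative prices only; NaN marks a missing quote
dates = pd.date_range("2024-01-01", periods=5, freq="D")
prices = pd.DataFrame(
    {
        "close_eur": [np.nan, 1.10, np.nan, 1.11, 1.12],
        "close_gbp": [1.27, np.nan, 1.28, 1.29, np.nan],
    },
    index=dates,
)

filled = prices.ffill()       # carry the last known quote forward
print(filled.isnull().sum())  # the leading EUR gap survives: close_eur 1, close_gbp 0

cleaned = filled.dropna()     # drop rows that still have no quote
print(cleaned)
```

Only the first row is lost, so dropping it costs one observation while guaranteeing that every remaining row has both prices.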

Visualize the Price Series

Before testing for cointegration, let's visualize the two price series:

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(closing_prices['close_eur'], label='EUR/USD')
plt.plot(closing_prices['close_gbp'], label='GBP/USD')
plt.title('EUR/USD vs GBP/USD Closing Prices')
plt.ylabel('Price')
plt.xlabel('Date')
plt.legend()
plt.grid(True)
plt.show()

Normalize Prices for Better Visualization

To better compare the two series, we can normalize them to start at the same point:

normalized = closing_prices / closing_prices.iloc[0]

plt.figure(figsize=(12, 6))
plt.plot(normalized['close_eur'], label='EUR/USD (Normalized)')
plt.plot(normalized['close_gbp'], label='GBP/USD (Normalized)')
plt.title('Normalized Price Comparison')
plt.ylabel('Normalized Price')
plt.xlabel('Date')
plt.legend()
plt.grid(True)
plt.show()

Step 3: Testing for Cointegration

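Cointegration means that some linear combination of the two price series is stationary, so the pairs share a long-term equilibrium relationship even though each one wanders on its own. We'll test for it with the Engle-Granger test (coint) from statsmodels.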

from statsmodels.tsa.stattools import coint

# Engle-Granger test; GBP/USD is the dependent series, matching the regression in Step 4
score, p_value, _ = coint(closing_prices["close_gbp"], closing_prices["close_eur"])
print(f"Cointegration p-value: {p_value:.4f}")

if p_value < 0.05:
    print("Pairs are cointegrated at the 5% level.")
else:
    print("No evidence of cointegration at the 5% level.")

Step 4: Calculating the Hedge Ratio and Spread

Compute Hedge Ratio (β) via OLS Regression

import statsmodels.api as sm

X = closing_prices["close_eur"]
y = closing_prices["close_gbp"]
X = sm.add_constant(X)  # Adds intercept term

model = sm.OLS(y, X).fit()
hedge_ratio = model.params["close_eur"]  # Slope coefficient (select by name; positional [1] is deprecated in pandas)
spread = y - hedge_ratio * X["close_eur"]  # GBP/USD minus hedge_ratio units of EUR/USD

Visualize the Spread

import matplotlib.pyplot as plt

spread_mean = spread.mean()
spread_std = spread.std()

plt.figure(figsize=(12, 6))
plt.plot(spread, label="Spread")
plt.axhline(spread_mean, color="r", linestyle="--", label="Mean")
plt.axhline(spread_mean + 1.5 * spread_std, color="g", linestyle=":", label="Upper Bound")
plt.axhline(spread_mean - 1.5 * spread_std, color="g", linestyle=":", label="Lower Bound")
plt.legend()
plt.title("Spread Between GBP/USD and EUR/USD with ±1.5σ Bands")
plt.show()
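The mean and standard deviation above are computed over the whole sample, so every z-score quietly uses future prices; in a backtest this look-ahead bias can make results look better than a live bot could achieve. One common alternative, shown here as a sketch and not as the article's method, is a rolling z-score that only uses the previous window observations. The 20-day window and the synthetic AR(1) spread (coefficient 0.9) are arbitrary assumptions:

```python
import numpy as np
import pandas as pd


def rolling_zscore(spread: pd.Series, window: int = 20) -> pd.Series:
    """Z-score of each point relative to the previous `window` observations."""
    # shift(1) excludes the current value, so the statistics use past data only
    past = spread.shift(1).rolling(window)
    return (spread - past.mean()) / past.std()


# Illustrative usage on a synthetic mean-reverting spread: AR(1) with coefficient 0.9
rng = np.random.default_rng(0)
values = [0.0]
for _ in range(199):
    values.append(0.9 * values[-1] + rng.normal(0, 1))
demo_spread = pd.Series(values)

z = rolling_zscore(demo_spread, window=20)
print(z.head(25))  # the first 20 values are NaN while the window fills up
```

The trade-off is that the first window bars produce no signal, and a short window reacts quickly but is noisier.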

Step 5: Signal Generation & Trading Logic

Define Entry/Exit Rules Using Z-Score

spread_zscore = (spread - spread_mean) / spread_std

# Trading thresholds (in standard deviations of the spread)
entry_threshold = 1.5
exit_threshold = 0.5

signals = []  # Entry events: -1 = enter short, 1 = enter long, 0 = no new entry (hold or close)
position = 0  # Current position: 0 flat, 1 long spread, -1 short spread

for z in spread_zscore:
    if z > entry_threshold and position != -1:
        signals.append(-1)  # Short spread (sell GBP/USD, buy EUR/USD)
        position = -1
    elif z < -entry_threshold and position != 1:
        signals.append(1)  # Long spread (buy GBP/USD, sell EUR/USD)
        position = 1
    elif abs(z) < exit_threshold and position != 0:
        signals.append(0)  # Close position: the spread has reverted toward its mean
        position = 0
    else:
        signals.append(0)  # No change: keep the current position
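The loop above stores only entry events, using 0 both for "close" and "no change", so the signals list alone does not say which position is held on a given day. Wrapping the same rules in a small function that returns the position held after each bar makes the state explicit and easy to test on a hand-made sequence of z-scores. The function name zscore_positions is hypothetical, not a library API:

```python
from typing import Iterable, List


def zscore_positions(
    zscores: Iterable[float], entry_threshold: float = 1.5, exit_threshold: float = 0.5
) -> List[int]:
    """Position held after each bar: -1 short spread, 1 long spread, 0 flat."""
    position = 0
    positions = []
    for z in zscores:
        if z > entry_threshold:
            position = -1  # spread is rich: short it
        elif z < -entry_threshold:
            position = 1  # spread is cheap: buy it
        elif abs(z) < exit_threshold:
            position = 0  # spread is back near its mean: go flat
        # otherwise keep the current position
        positions.append(position)
    return positions


zs = [0.2, 1.6, 1.0, 0.4, -1.7, -0.8, 2.1, 0.1]
print(zscore_positions(zs))  # [0, -1, -1, 0, 1, 1, -1, 0]
```

Because the entry and exit bands do not overlap, setting the position whenever a band is touched gives the same result as the original checks against the current position, including a direct flip from short to long.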

Step 6: Risk Management

Position Sizing & Stop-Loss

  • Fixed Fractional Sizing: Risk only 1-2% of capital per trade.

  • Stop-Loss: Exit if the spread moves against us beyond a threshold (here, 3 standard deviations from its mean).

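capital = 10000  # Initial capital (USD)
risk_per_trade = 0.01  # Risk 1% of capital per trade

for i, signal in enumerate(signals):
    if signal != 0:  # A new entry (see Step 5)
        spread_value = spread.iloc[i]
        # Stop-loss 3 standard deviations from the mean, on the losing side of the trade
        stop_loss = spread_mean + (3 * spread_std if signal == -1 else -3 * spread_std)
        # Size so that hitting the stop loses roughly capital * risk_per_trade
        position_size = (capital * risk_per_trade) / abs(spread_value - stop_loss)
        print(f"{spread.index[i].date()} | {'Short' if signal == -1 else 'Long'} spread | Size: {position_size:.2f} units | Stop: {stop_loss:.5f}")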
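The sizing rule divides the cash you are willing to lose by the distance to the stop: size = (capital × risk_per_trade) / |spread − stop|. With $10,000 and 1% risk, a stop 0.02 away in spread units gives 100 / 0.02 = 5,000 units of the spread. The helper below (a hypothetical name, not part of any library) wraps that arithmetic and also rejects a stop placed on the wrong side of the entry; the spread values are made up for illustration:

```python
def position_size(capital: float, risk_per_trade: float, entry: float, stop: float, side: int) -> float:
    """Units of the spread to trade so that hitting `stop` loses capital * risk_per_trade.

    side: 1 for a long spread (stop below entry), -1 for a short spread (stop above entry).
    """
    if side == 1 and stop >= entry:
        raise ValueError("a long spread needs a stop below the entry")
    if side == -1 and stop <= entry:
        raise ValueError("a short spread needs a stop above the entry")
    return (capital * risk_per_trade) / abs(entry - stop)


# Short the spread at 0.30 with a stop at 0.32 (illustrative values)
size = position_size(10_000, 0.01, entry=0.30, stop=0.32, side=-1)
print(round(size, 2))  # 5000.0
```

The side check matters when the spread is already beyond 3 standard deviations at entry: the stop would then sit on the profitable side, and the size formula would silently return a meaningless number.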

Step 7: Backtesting & Performance Metrics

Compute Returns & Sharpe Ratio

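# Position held each day (same rules as Step 5): enter beyond ±entry_threshold, exit inside ±exit_threshold, else hold
positions = spread_zscore.apply(lambda z: -1 if z > entry_threshold else 1 if z < -entry_threshold else 0 if abs(z) < exit_threshold else float("nan")).ffill().fillna(0)
gross = closing_prices["close_gbp"] + abs(hedge_ratio) * closing_prices["close_eur"]  # Gross notional of both legs
returns = (positions.shift(1) * spread.diff() / gross.shift(1)).fillna(0)  # Yesterday's position x today's spread change
sharpe_ratio = (returns.mean() / returns.std()) * (252 ** 0.5)  # Annualized; ignores costs and the risk-free rate
print(f"Sharpe Ratio: {sharpe_ratio:.2f}")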
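The annualized Sharpe ratio is the mean periodic return divided by its standard deviation, scaled by the square root of the number of periods per year. 252 is the usual trading-day count for daily data; forex trades on roughly 260 weekdays a year, so some practitioners use that instead. A standalone version, checked on a tiny made-up return series (annualized_sharpe is a hypothetical helper name):

```python
import math

import pandas as pd


def annualized_sharpe(returns: pd.Series, periods_per_year: int = 252, risk_free: float = 0.0) -> float:
    """Annualized Sharpe ratio of periodic returns; risk_free is a per-period rate."""
    excess = returns - risk_free
    return excess.mean() / excess.std() * math.sqrt(periods_per_year)


# Made-up daily returns: mean 0.001, deviations of ±0.01 around it
daily = pd.Series([0.011, -0.009, 0.011, -0.009])
print(f"{annualized_sharpe(daily):.4f}")
```

Here the sample standard deviation (pandas uses ddof=1) is sqrt(0.0004 / 3), so the per-day ratio is sqrt(3) / 20 and the annualized value is sqrt(756) / 20, about 1.37.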

Visualize Equity Curve

cumulative_returns = (1 + returns).cumprod()
plt.plot(cumulative_returns)
plt.title("Strategy Cumulative Returns")
plt.show()
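The goals at the start also mention maximum drawdown, the largest peak-to-trough fall of the equity curve. Given an equity series such as cumulative_returns, one way to compute it is to compare each point with the running maximum. The sketch below uses a short, made-up equity curve, and max_drawdown is a hypothetical helper name:

```python
import pandas as pd


def max_drawdown(equity: pd.Series) -> float:
    """Largest peak-to-trough decline as a negative fraction (e.g. -0.25 means -25%)."""
    running_peak = equity.cummax()
    drawdown = equity / running_peak - 1.0
    return drawdown.min()


# Illustrative equity curve: peaks at 1.2, falls to 0.9, then recovers
equity = pd.Series([1.0, 1.1, 1.2, 1.0, 0.9, 1.05, 1.3])
print(f"Max drawdown: {max_drawdown(equity):.2%}")  # Max drawdown: -25.00%
```

Applied to the strategy, max_drawdown(cumulative_returns) reports the worst loss an investor would have sat through, which complements the Sharpe ratio's view of average risk-adjusted return.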

Conclusion

We’ve built a Statistical Arbitrage Bot in Python that:
✅ Identifies cointegrated forex pairs
✅ Computes a mean-reverting spread
✅ Generates trading signals with risk management
✅ Evaluates performance via Sharpe ratio

Written by

Abdulsalam Lukmon