Inspired by Rob Carver's Systematic Trading (2015) and Advanced Futures Trading Strategies (2023), this series documents how I implement Carver's trading and back-testing framework.

Plan

I will use Python and Jupyter Notebook for this project. Performance is back-tested on 20 years of daily data from Yahoo! Finance.

# Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf
import itertools

Steps:

Choosing instruments
Trading rules and variations
Volatility targeting and position sizing

Fig.1 Systematic Trading (Rob Carver, 2015). Chapter 3, Framework

Step 1. Choosing instruments

Performance evaluation

To begin with, let's define an evaluation framework. A well-defined framework lets us compare the performance and examine the characteristics of different instruments and trading systems.

Performance metrics:

Annualized Daily Return
Annualized Daily Volatility
Sharpe Ratio
Skewness
Upper-tail Ratio
Lower-tail Ratio

Benchmark

For the choice of instruments, I am going with Ray Dalio's All Weather Portfolio. It aims for risk parity, meaning that each asset should contribute a similar amount of risk. For a passive investor, it is designed to provide steady returns even in hard times.

According to Lazy Portfolio ETF,

In the last 30 years, the Ray Dalio All Weather Portfolio obtained a 7.20% compound annual return, with a 7.21% standard deviation.

Assuming a zero risk-free rate (used throughout this project), that is a Sharpe Ratio of about 1.0. I will use a similar portfolio both as the benchmark and as the instrument set for the trading system.
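As a quick check of that figure, using the two numbers quoted from Lazy Portfolio ETF and the zero risk-free rate assumed here:

$$\text{Sharpe Ratio}=\frac{7.20\%-0\%}{7.21\%}\approx 1.00$$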

Ray Dalio All Weather Portfolio: ETF allocation and returns (Lazy Portfolio ETF, 2015)

Weight (%)	Ticker	ETF Name	Investment Themes
30.00	VTI	Vanguard Total Stock Market	Equity, U.S., Large Cap
40.00	TLT	iShares 20+ Year Treasury Bond	Bond, U.S., Long-Term
15.00	IEI	iShares 3-7 Year Treasury Bond	Bond, U.S., Intermediate-Term
7.50	GLD	SPDR Gold Trust	Commodity, Gold
7.50	GSG	iShares S&P GSCI Commodity Indexed Trust	Commodity, Broad Diversified

For simplicity, only VTI (equity), TLT (bond), and GLD (gold) will be used.

Original weight (%)	Rescaled weight (%)	Ticker
30.00	38.71	VTI
40.00	51.61	TLT
7.50	9.68	GLD
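The rescaled weights come from dividing each original weight by the total weight of the three ETFs that are kept (30 + 40 + 7.5 = 77.5). A minimal sketch of that arithmetic, using a plain dictionary that is only for illustration:

```python
# Original All Weather weights (%) for the three ETFs kept in this article
original = {"VTI": 30.0, "TLT": 40.0, "GLD": 7.5}

# Divide each weight by the retained total (77.5) so the weights sum to 1
total = sum(original.values())
rescaled = {ticker: weight / total for ticker, weight in original.items()}

for ticker, weight in rescaled.items():
    print(f"{ticker}: {weight:.4f}")
```

Rounded to four decimals, these are the values used in the weights list of the benchmark portfolio.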

# Benchmark portfolio
symbols = ['VTI','TLT','GLD']
weights = [0.3871,0.5161,0.0968]

# Download 20-year daily data from YFinance
raw_data = yf.download(symbols,period='20y',interval='1d',prepost=False,repair=True)

# Indexing training/test data
train_index = raw_data.index[:len(raw_data)//2]
test_index = raw_data.index[len(raw_data)//2:]

# Use the adjusted close data only
prices = raw_data['Adj Close']

Compute the daily log returns for every instrument.

$$r_t = \ln(p_t/p_{t-1}) = \ln(p_t)-\ln(p_{t-1})$$

def logReturns(prices,days=1):
  log_prices = np.log(prices)

  output = log_prices[::days].diff()
  return output

returns = logReturns(prices)
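One reason log returns are convenient is that they add up over time: the sum of the daily log returns equals the log return over the whole period. The sketch below illustrates this on a few made-up prices (the numbers and the toy_ names are hypothetical, not data from this project).

```python
import numpy as np

# Hypothetical closing prices, for illustration only
toy_prices = np.array([100.0, 102.0, 99.0, 105.0])

# Daily log returns: r_t = ln(p_t) - ln(p_{t-1})
toy_log_returns = np.diff(np.log(toy_prices))

# Log returns add up over time: their sum equals ln(p_T / p_0)
total_log_return = toy_log_returns.sum()
direct = np.log(toy_prices[-1] / toy_prices[0])

print(total_log_return, direct)
```

This additivity is what allows daily log returns to be summed into longer-period returns later on.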

The weighted average return will be the return of the portfolio, assuming the portfolio is rebalanced to the target weights every day.

$$r_{BCHM,t} = \sum_i w_i \times r_{i,t}$$

# Daily return of benchmark
returns['BCHM'] = np.average(returns[symbols], weights=weights, axis=1)
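One caveat about this formula: a weighted average of log returns only approximates the portfolio's log return, because log returns add up over time but not across assets. The sketch below compares the approximation with the exact one-day figure for a set of made-up returns (the return values and the toy_ names are hypothetical). At daily magnitudes the gap should be tiny.

```python
import numpy as np

toy_weights = np.array([0.3871, 0.5161, 0.0968])
# Hypothetical one-day log returns for VTI, TLT, GLD (illustration only)
toy_log_ret = np.array([0.010, -0.004, 0.006])

# Approximation used in the article: weighted average of log returns
approx = np.average(toy_log_ret, weights=toy_weights)

# Exact one-day portfolio log return: weight the simple returns, then take the log
toy_simple_ret = np.expm1(toy_log_ret)  # exp(r) - 1
exact = np.log1p(np.average(toy_simple_ret, weights=toy_weights))

print(approx, exact, exact - approx)
```

The exact value is never below the approximation (a consequence of Jensen's inequality), so the approximation slightly understates the portfolio return.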

Annualized return

Compute the annualized expected daily return.

$$\text{Annualized Return}=E(r) \times 252$$

def annualizedReturn(returns,days=252):
  exp_ret = returns.mean()

  output = exp_ret*days
  return output

Annualized volatility

Compute the annualized daily volatility.

$$\text{Annualized Volatility}=\sigma \times \sqrt{252}$$

def annualizedVolatility(returns,days=252):
  vol = returns.std()

  output = vol*(days**0.5)
  return output

Sharpe Ratio

Compute the Sharpe Ratio (risk-free rate = 0).

$$\text{Sharpe Ratio}=\frac{\text{Annualized Return}-r_f}{\text{Annualized Volatility}}$$

def sharpeRatio(returns,days=252,riskfree=0):
  exp_ret = annualizedReturn(returns,days)
  vol = annualizedVolatility(returns,days)

  output = (exp_ret-riskfree)/vol
  return output
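The three metrics above can be checked by hand on a tiny sample. The sketch below uses only the standard library and made-up daily returns (hypothetical values, not project data). It also shows that with a zero risk-free rate the annualized Sharpe Ratio is simply the daily mean-to-volatility ratio scaled by the square root of 252.

```python
import math
import statistics

# Hypothetical daily log returns, for illustration only
daily = [0.004, -0.002, 0.001, 0.003, -0.001, 0.002]
days = 252

ann_return = statistics.mean(daily) * days
# statistics.stdev is the sample standard deviation (ddof=1), like pandas' .std()
ann_vol = statistics.stdev(daily) * math.sqrt(days)
sharpe = (ann_return - 0.0) / ann_vol  # risk-free rate assumed to be zero

# Equivalent form: daily mean/volatility ratio scaled by sqrt(252)
sharpe_alt = statistics.mean(daily) / statistics.stdev(daily) * math.sqrt(days)

print(f"return={ann_return:.4f} vol={ann_vol:.4f} sharpe={sharpe:.2f}")
```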

Skewness

Compute the skewness. According to Carver, monthly skew is a better estimate than skew measured over other time frames:

Daily and weekly skew can be seriously affected by a couple of extreme daily returns, and annual skew does not give us enough data points for a reliable estimate.

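def periodSkew(returns,days=252,periods=12):
  # Trading days per period (252/12 = 21 for monthly)
  perioddays = round(days/periods)
  # Rolling sum of log returns, sampled every perioddays rows,
  # gives non-overlapping period returns
  period_ret = returns.rolling(perioddays).sum()[::perioddays]

  output = period_ret.skew()
  return output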
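It may not be obvious that a rolling sum sampled every perioddays rows yields non-overlapping periods. The sketch below runs the same expression on a toy series whose values are 1 to 9 (hypothetical data), so each sum reveals which rows it covers.

```python
import pandas as pd

# Toy "returns" 1..9 so each sum reveals which rows it covers
toy = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
perioddays = 3

rolled = toy.rolling(perioddays).sum()
sampled = rolled[::perioddays]

print(sampled.tolist())  # [nan, 9.0, 18.0]
```

The first sampled value is NaN because its window is incomplete. The others are 2+3+4 and 5+6+7, so the blocks do not overlap but start from the second row. In this article the first row of returns is already NaN (it comes from diff), so nothing is lost at the start. Trailing rows that do not complete a block are dropped.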

Upper-tail / Lower-tail Ratio

Compute the upper-tail / lower-tail ratio from demeaned returns. This ratio reflects how fat the tail is compared to a Gaussian distribution.

Carver uses the 30th and 70th percentiles as the denominators, on the grounds that they proxy ±1 standard deviation. I don't know whether that's a mistake or whether there is some hidden calculation. Please comment and let me know. In any case, I will use the 15th and 85th percentiles instead.

The constant 2.245 is the value this ratio takes under a Gaussian distribution, so a normalized ratio above 1 indicates a tail fatter than Gaussian.

$$\text{Lower-tail Ratio}=\frac{1^{st}\ \text{Percentile}}{15^{th}\ \text{Percentile}}\div 2.245$$

$$\text{Upper-tail Ratio}=\frac{99^{th}\ \text{Percentile}}{85^{th}\ \text{Percentile}}\div 2.245$$
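The Gaussian constant can be reproduced from the standard normal quantile function. The sketch below uses Python's statistics.NormalDist to compute the ratio for the 15th percentile used here and, for comparison, for the 30th percentile. It also reports how much of a Gaussian lies below -1 standard deviation. It is only a numerical check and does not settle what Carver intended.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Gaussian value of the 1st/15th percentile ratio used in this article
ratio_15 = std_normal.inv_cdf(0.01) / std_normal.inv_cdf(0.15)

# Same ratio with the 30th percentile as the denominator, for comparison
ratio_30 = std_normal.inv_cdf(0.01) / std_normal.inv_cdf(0.30)

# Share of a Gaussian lying below -1 standard deviation
below_minus_one_sd = std_normal.cdf(-1.0)

print(f"{ratio_15:.3f} {ratio_30:.3f} {below_minus_one_sd:.3f}")
```

The 15th-percentile ratio comes out at about 2.245 and the 30th-percentile ratio at about 4.44. Roughly 16% of a Gaussian lies below -1 standard deviation, so the 15th and 85th percentiles sit close to ±1 standard deviation. By symmetry, the upper-tail constants are the same.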

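def upperTailRatio(returns):
  ret = returns.copy()
  # Treat exactly-zero returns as missing (NaN); mean() and quantile() skip NaN
  ret[ret==0] = np.nan
  demean_ret = ret - ret.mean()
  # 99th percentile relative to the 85th percentile
  ratio = demean_ret.quantile(0.99)/demean_ret.quantile(0.85)
  # Value of the same ratio under a Gaussian distribution
  normal = 2.245

  output = ratio/normal
  return output

def lowerTailRatio(returns):
  ret = returns.copy()
  # Treat exactly-zero returns as missing (NaN); mean() and quantile() skip NaN
  ret[ret==0] = np.nan
  demean_ret = ret - ret.mean()
  # 1st percentile relative to the 15th percentile
  ratio = demean_ret.quantile(0.01)/demean_ret.quantile(0.15)
  # Value of the same ratio under a Gaussian distribution
  normal = 2.245

  output = ratio/normal
  return output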
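To see how these ratios behave, it helps to try them on simulated data with known tails. The sketch below is a simplified NumPy re-implementation (the tail_ratios helper is hypothetical and omits the zero-return filter). It compares Gaussian draws with Student-t draws with 3 degrees of freedom, which have heavier tails. The Gaussian sample should give ratios close to 1 and the Student-t sample ratios clearly above 1.

```python
import numpy as np

def tail_ratios(x, normal=2.245):
    """Lower- and upper-tail ratios of demeaned data, relative to a Gaussian."""
    d = x - x.mean()
    lower = np.quantile(d, 0.01) / np.quantile(d, 0.15)
    upper = np.quantile(d, 0.99) / np.quantile(d, 0.85)
    return lower / normal, upper / normal

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
gaussian = rng.standard_normal(200_000)
fat_tailed = rng.standard_t(df=3, size=200_000)  # Student-t, heavier tails

gauss_lower, gauss_upper = tail_ratios(gaussian)
t_lower, t_upper = tail_ratios(fat_tailed)

print("Gaussian: ", gauss_lower, gauss_upper)
print("Student-t:", t_lower, t_upper)
```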

Performance of Benchmark

# Test on training set
train_returns = returns.loc[train_index,:]
performance = pd.DataFrame([train_returns.apply(annualizedReturn,args=[252]),
                            train_returns.apply(annualizedVolatility,args=[252]),
                            train_returns.apply(sharpeRatio,args=[252,0]),
                            train_returns.apply(periodSkew,args=[252,12]),
                            train_returns.apply(lowerTailRatio),
                            train_returns.apply(upperTailRatio)])
performance.index = ['Return',
                     'Volatility',
                     'Sharpe',
                     'Skew',
                     'Lower-tail',
                     'Upper-tail']
print(performance)

	BCHM	VTI	TLT	GLD
Return	0.070788	0.082390	0.065255	0.119805
Volatility	0.088981	0.204455	0.142410	0.209580
Sharpe	0.795543	0.402973	0.458221	0.571642
Skew	0.490637	-2.198526	1.795937	-0.109771
Lower-tail	1.465557	1.916192	1.264555	1.619127
Upper-tail	1.381245	1.624076	1.347124	1.225781

Return and volatility are 7.08% and 8.90%, giving a Sharpe Ratio of about 0.8. That is 0.2 below the expected 1.0, but it is already roughly double the Sharpe Ratio of holding VTI (equity) alone, a reminder of the importance of diversification. The positive skew of 0.49 implies relatively frequent small losses and occasional large gains. The lower-tail and upper-tail ratios are 1.47 and 1.38, so both tails are around 1.4 times fatter than those of a Gaussian distribution. The higher the lower-tail (upper-tail) ratio, the more probable extreme losses (gains) are.

The benchmark's returns are clearly more stable than those of any of its constituents. TLT (bond) has a smoother curve than the other two instruments, and its negative relationship with VTI (equity) is obvious too. Averaging the instruments therefore gives the benchmark a better risk-adjusted return. However, the Sharpe Ratio is still below the expected 1.0. There are several possible reasons:

Sampling error. Since GLD was launched at the end of 2004, the portfolio is tested on the 8-year data set of 2005-2013. That is a much shorter sample than the 30 years behind the Lazy Portfolio ETF statistics, so the estimates carry more sampling error.
Partial portfolio. The constructed portfolio consists only of equity, long-term bonds, and gold, while Ray Dalio's All Weather Portfolio also includes intermediate-term bonds and broad commodities. It is only a "some weather" portfolio, and some risks are not hedged away (e.g. yield curve inversion).
Use of daily returns. The tested volatility is 8.90%, noticeably higher than Lazy Portfolio ETF's 7.21%. This may come from using different time frames for the calculation, and assuming daily rebalancing may also increase the volatility. As volatility is mean-reverting, volatility estimated from daily returns is usually higher than that estimated over longer time frames (e.g. monthly, yearly).
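The time-frame point can be illustrated with a simulation. If daily returns were independent, annualizing daily and monthly volatility would give about the same number. The sketch below assumes instead that daily returns are negatively autocorrelated (an AR(1) process with coefficient -0.2, a hypothetical choice that was not measured from the data in this article), which is one mechanism that makes the daily-based figure higher.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable
n_days, month = 252 * 40, 21    # 40 simulated years, 21 trading days per month
phi = -0.2                      # assumed negative day-to-day autocorrelation

# Simulate AR(1) daily returns: r_t = phi * r_{t-1} + noise_t
noise = rng.normal(0.0, 0.01, n_days)
r = np.empty(n_days)
r[0] = noise[0]
for t in range(1, n_days):
    r[t] = phi * r[t - 1] + noise[t]

# Annualize volatility measured at daily and at monthly frequency
daily_vol = r.std(ddof=1) * np.sqrt(252)
monthly = r.reshape(-1, month).sum(axis=1)  # non-overlapping 21-day sums
monthly_vol = monthly.std(ddof=1) * np.sqrt(12)

print(f"daily-based: {daily_vol:.4f}  monthly-based: {monthly_vol:.4f}")
```

Under this assumption the monthly-based estimate comes out lower than the daily-based one. Whether the real ETF returns behave this way would need to be checked on the data.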

Anyway, as long as the calculation is consistent, the benchmark and the trading systems are still comparable.

The next article is about choosing trading rules and variations.
