# Statistical Arbitrage – Testing for Cointegration – Augmented Dicky Fuller

In my last post “Statistical Arbitrage Correlation vs Cointegration“, we discussed what statistical arbitrage is and what property between two pairs we aim to exploit. The mathematics essentially showed that if you go long stock A and short stock B with some appropriate hedging factor to cancel out the drift/growth terms in the Brownian motion equation then you are left with a stationary signal which is the spread between the two stocks. The maths also showed that in expectation the daily change in the spread is zero ($E_{t}[\Delta Spread]=0$) and hence any deviation from this presents the opportunity for a trade.

It was assumed that the stock growth terms $\mu_{a}$ and $\mu_{b}$ are constant (or drifting slowly over time both at the same rate) or in other words the hedge ratio is constant. This post will detail how to find the hedge ratio and present the Augmented Dicky Fuller test which can be used to identify with a certain level of confidence if the spread is stationary and hence cointegrated.

Determining the hedge factor

$Spread=A-nB$ set the spread to $0$

It is required to find the hedge factor, n, to satisfy the above equation. The hedge factor is easily found by regressing the $A vs B$. Notice that it is the prices being regressed and not the daily returns.

Examining the residuals

Once the hedge factor has been found the residuals of the regression must be analysed. The residual is how much the spread has changed for a given day, the whole idea of the regression was to identify the hedge factor that keeps the daily change of the spread as close to zero as possible (we want the residuals to be zero).

• If the residuals contain a trend then this implies that the daily change of the spread has a net direction and will cause the spread to consistently widen or contract.
• If the residuals contain no trend then this implies that the daily change of the spread will oscillate around zero (is stationary).

What residuals to analyse is open to debate, you may want to take another data set from a different point in time and apply the hedge factor you calculated in the above regression to this new set and analyse the residuals, this helps to identify an over fitting during the regression.

Dicky Fuller Test

For an AR(1) process $y_{t}=ay_{t-1}+\epsilon_{t}$ where $y_{0}=0$ and $\epsilon \sim N(0,\sigma^{2})$ what value of $a$ results in a stationary signal?

$y_{1}=ay_{0}+\epsilon_{0}$

$y_{2}=ay_{1}+\epsilon_{1}=a(ay_{0}+\epsilon_{0})+\epsilon{1}$

$y_{t}=a^{t}y_{0}+\Sigma_{j=0}^{t-1}a^{j}\epsilon_{j}$

Take expectations, already know $E[y_{0}]=0$ and $E[\epsilon_{t}]=0$

$E[y_{t}]=a^{t}E[y_{0}]+\Sigma_{j=0}^{t-1}a^{j}E[\epsilon_{j}]$

$E[y_{t}]=a^{t}*0+\Sigma_{j=0}^{t-1}a^{j}*0=0$ the mean is zero and constant
Examining the variance
$V[y_{t}]=0+\Sigma_{j=0}^{t-1}a^{j}V[\epsilon_{j}]=\sigma^{2}\Sigma_{j=0}^{t-1}a^{j}$
• If $a$ < $1$ then as $t \to \inf$ then $V[y_{t}]\to constant$
• If $a$ >= $1$ then $V[y_{t}]$ grows with time, hence is not stationary
This forms the premise of the Dicky Fuller test, perform a linear regression on the residuals and see what the value of $a$ is, if it’s greater than or equal to one this indicates the signal isn’t stationary and therefore isn’t suitable to stat arb. The dicky fuller test was invented by D.A Dickey and W.A Fuller, they produced the dickey-fuller distribution relating number of test samples and the value of $a$ to a probability of a unit root. Further reading can be found at Dickey Fuller Test – Wikipedia. The dickey fuller test is a hypothesis test that a signal contains a unit root,we want to reject this hypothesis (a unit root means our signal is non-stationary). The test gives a pValue, the lower this number the more confident we can be that we have found a stationary signal. pValues less than 0.1 are considered to be good candidates.

Royal Dutch Shell Case Study
As mentioned in the previous post, two different shares for the same company are usually good stat arb candidates. Here the hypothesis that Royal Dutch Shell A & B shares are cointegrated will be tested.

The beta or Hedge Ratio is: 0.965378555498888

Augmented Dickey-Fuller Test

Dickey-Fuller = -3.6192, Lag order = 0, p-value = 0.03084
alternative hypothesis: stationary

The low pValue indicates that this pair is cointegrated.

Royal Bank of Scotland vs Barclays Case Study

The beta or Hedge Ratio is: 0.645507426925485

Augmented Dickey-Fuller Test

Dickey-Fuller = -1.6801, Lag order = 0, p-value = 0.7137
alternative hypothesis: stationary

The pValue indicates that the spread isn’t stationary, and the bottom right graph of the spread is definitely in a strong trend and isn’t range bound. This would have been a horrific stat arb!
On to the code:
?View Code RSPLUS
 library("quantmod") library("PerformanceAnalytics") library("fUnitRoots") library("tseries")   #Specify dates for downloading data, training models and running simulation hedgeTrainingStartDate = as.Date("2008-01-01") #Start date for training the hedge ratio hedgeTrainingEndDate = as.Date("2010-01-01") #End date for training the hedge ratio   symbolLst<-c("RDS-A","RDS-B") title<-c("Royal Dutch Shell A vs B Shares")   symbolLst<-c("RBS.L","BARC.L") title<-c("Royal Bank of Scotland vs Barclays")   ### SECTION 1 - Download Data & Calculate Returns ### #Download the data symbolData <- new.env() #Make a new environment for quantmod to store data in getSymbols(symbolLst, env = symbolData, src = "yahoo", from = hedgeTrainingStartDate, to=hedgeTrainingEndDate)   stockPair <- list( a = coredata(Cl(eval(parse(text=paste("symbolData$\"",symbolLst[1],"\"",sep=""))))) #Stock A ,b = coredata(Cl(eval(parse(text=paste("symbolData$\"",symbolLst[2],"\"",sep=""))))) #Stock B ,name=title)         testForCointegration <- function(stockPairs){ #Pass in a pair of stocks and do the necessary checks to see if it is cointegrated   #Plot the pairs   dev.new() par(mfrow=c(2,2)) print(c((0.99*min(rbind(stockPairs$a,stockPairs$b))),(1.01*max(rbind(stockPairs$a,stockPairs$b))))) plot(stockPairs$a, main=stockPairs$name, ylab="Price", type="l", col="red",ylim=c((0.99*min(rbind(stockPairs$a,stockPairs$b))),(1.01*max(rbind(stockPairs$a,stockPairs$b))))) lines(stockPairs$b, col="blue") print(length(stockPairs$a)) print(length(stockPairs$b)) #Step 1: Calculate the daily returns dailyRet.a <- na.omit((Delt(stockPairs$a,type="arithmetic"))) dailyRet.b <- na.omit((Delt(stockPairs$b,type="arithmetic"))) dailyRet.a <- dailyRet.a[is.finite(dailyRet.a)] #Strip out any Infs (first ret is Inf) dailyRet.b <- dailyRet.b[is.finite(dailyRet.b)] print(length(dailyRet.a)) print(length(dailyRet.b)) #Step 2: Regress the daily returns onto each other #Regression finds BETA and C in the linear regression retA = BETA * retB + C regression <- lm(dailyRet.a ~ dailyRet.b + 0) beta <- coef(regression)[1] print(paste("The beta or Hedge Ratio is: ",beta,sep="")) plot(x=dailyRet.b,y=dailyRet.a,type="p",main="Regression of RETURNS for Stock A & B") #Plot the daily returns lines(x=dailyRet.b,y=(dailyRet.b*beta),col="blue")#Plot in linear line we used in the regression #Step 3: Use the regression co-efficients to generate the spread spread <- stockPairs$a - beta*stockPairs$b #Could actually just use the residual form the regression its the same thing spreadRet <- Delt(spread,type="arithmetic") spreadRet <- na.omit(spreadRet) #spreadRet[!is.na(spreadRet)] plot((spreadRet), type="l",main="Spread Returns") #Plot the cumulative sum of the spread plot(spread, type="l",main="Spread Actual") #Plot the cumulative sum of the spread #For a cointegrated spread the cumsum should not deviate very far from 0 #For a none-cointegrated spread the cumsum will likely show some trending characteristics #Step 4: Use the ADF to test if the spread is stationary #can use tSeries library adfResults <- adf.test((spread),k=0,alternative="stationary") print(adfResults) if(adfResults$p.value <= 0.05){ print(paste("The spread is likely Cointegrated with a pvalue of ",adfResults$p.value,sep="")) } else { print(paste("The spread is likely NOT Cointegrated with a pvalue of ",adfResults$p.value,sep="")) }   }     testForCointegration(stockPair)

# Statistical Arbitrage – Correlation vs Cointegration

What is statistical arbitrage (stat arb)?

The premise of statistical arbitrage, stat arb for short, is that there is a statistical mispricing between a set of securities which we look to exploit. Typically a strategy requires going long a set of stocks and short another. StatArb evolved from pairs trading where one would go long a stock and short it’s competitor as a hedge, in pairs trading the aim is to select a stock that is going to outperform it’s peers. StatArb is all about mean reversion, in essence you are saying that the spread between any two stocks should be constant (or slowly evolving throughout time), any deviations from the spread present a trading opportunity since in StatArb we believe the spread is mean reverting. Contrary to the name statistical arbitrage isn’t about making risk free money (deterministic arbitrage is risk free).

What type of stocks make good pairs?

The best stocks to use in StatArb are those where there is a fundamental reason for believing that the spread is mean reverting / stationary. Typically this means that the stocks are in the same market sector or even better the same company (some companies have A and B shares with different voting rights or trade on different exchanges)! Some examples of fundamentally similar pairs would be Royal Dutch Shell A vs Royal Dutch Shell B shares, Goldman Sachs vs JP Morgan, Apple vs ARM (their chip supplier), ARM vs ARM ADR, some cross sector groups may also work such as Gold Mining vs Gold Price.

A poor example would be Royal Bank of Scotland vs Tesco since their businesses are completely different / don’t impact each other.

What is the mathematical definition of a good pair?

Upon coming up with a good fundamental stock pairing you next need to have a mathematical test for determining if it’s a good pair. The most common test is to look for cointegration (http://en.wikipedia.org/wiki/Cointegration) as this would imply that the pair is a stationary pair (the spread is fixed) and hence statistically it is mean reverting. When testing for cointegration a Pvalue(http://en.wikipedia.org/wiki/P-value) hypothesis test is performed, so we can express a level of confidence in the pair being mean reverting.

What is the difference between correlation and cointegration?

When talking about statisitical arbitrage many people often get confused between correlation and cointegration.

• Correlation – If two stocks are correlated then if stock A has an upday then stock B will have an upday
• Cointegration – If two stocks are cointegrated then it is possible to form a stationary pair from some linear combination of stock A and B

One of the best explanations of cointegration is as follows: “A man leaves a pub to go home with his dog, the man is drunk and goes on a random walk, the dog also goes on a random walk. They approach a busy road and the man puts his dog on a lead, the man and the dog are now cointegrated. They can both go on random walks but the maximum distance they can move away from each other is fixed ie length of the lead”. So in essence the distance/spread between the man and his dog is fixed, also note from the story that the man and dog are still on a random walk, there is nothing to say if their movements are correlated or uncorrelated.With correlated stocks they will move in the same direction most of the time however the magnitude of the moves is unknown, this means that if you’re trading the spread between two stocks then the spread can keep growing and growing showing no signs of mean reversion. This is in contract to cointegration where we say the spread is “fixed” and that if the spread deviates from the “fixing” then it will mean revert.

Lets explore cointegration some more:

Equation for Geometric brownian motion

$A_{t+T} = A_{t}+N(\mu_{a}T,\sigma_{a}\sqrt{T})$

where A stands for the price of stock A
$\Delta A = N(\mu_{a}\Delta t,\sigma_{a}\sqrt{\Delta t})$
$\Delta A = \mu_{a}\Delta t+N(0,\sigma_{a}\sqrt{\Delta t})$
$E_{t}[\Delta A] = \mu_{a}$
From the geometric Brownian motion equation we can see that the expected change in A over time $(E_{t}[\Delta A])$ is $\mu_{a}$, in other words A is not stationary  assuming $\mu_{a}$ isnt zero).

We want to find a cointegrated / stationary pair.

Equation for going long stock A and short n lots of Stock B
$Spread=A-nB$

$\Delta Spread=\Delta A-n\Delta B$
$\Delta Spread=\mu_{a}\Delta t+N(0,\sigma_{a}\sqrt{\Delta t})-n\mu_{b}\Delta t-nN(0,\sigma_{b}\sqrt{\Delta t})$
$E_{t}[\Delta Spread]=E_{t}[\Delta A-n\Delta B] = \mu_{a}-n\mu_{b}$
Setting $n=\frac{\mu_{a}}{\mu_{b}}$
$E_{t}[\Delta Spread]=E_{t}[\Delta A-n\Delta B] = \mu_{a}-\frac{\mu_{a}\mu_{b}}{\mu_{b}}=\mu_{a}-\mu_{a}=0$
Hence is stationary assuming that the hedge ratio, $n$, remains constant!!!

Example of correlated stocks : Notice the spread blowing up

Example of cointegrated stocks: Notice the spread looks oscillatory

Onto the code to generate those plots (mainly taken from http://quant.stackexchange.com/questions/1027/how-are-correlation-and-cointegration-related)

?View Code RSPLUS
  library(MASS)   #Code largely copied from http://quant.stackexchange.com/questions/1027/how-are-correlation-and-cointegration-related   #The input data nsim <- 250 #Number of data points mu_a <- 0.0002 #Mu_a growth rate for stock a sigma_a <- 0.010 #Sigma_a volatility for stock a mu_b <- 0.0005 #Mu_a growth rate for stock a sigma_b <- 0.005 #Sigma_a volatility for stock a corxy <- 0.8 #Correlation coeficient for xy   #Calculate a correlated return series #Build the covariance matrix and generate the correlated random results (covmat <- matrix(c(sigma_a^2, corxy*sigma_a*sigma_b, corxy*sigma_a*sigma_b, sigma_b^2), nrow=2)) res <- mvrnorm(nsim, c(mu_a, mu_b), covmat) #Calculate multivariate normal distribution plot(res[,1], res[,2])   #Calculate the stats of res[] so they can be checked with the input data mean(res[,1]) sd(res[,1]) mean(res[,2]) sd(res[,2]) cor(res[,1], res[,2])   path_a <- exp(cumsum(res[,1])) path_b <- exp(cumsum(res[,2])) spread <- path_a - path_b #Set the plotting area to a 2 by 1 grid layout(rbind(1,2)) #Plot the two price series that have correlated returns plot(path_a, main="Two Price Series with Correlated Returns", ylab="Price", type="l", col="red") lines(path_b, col="blue") plot(spread, type="l")     ##Cointegrated pair #The input data nsim <- 250 #Number of data points mu_a <- 0.0002 #Mu_a growth rate for stock a sigma_a <- 0.010 #Sigma_a volatility for stock a mu_b <- 0.0002 #Mu_a growth rate for stock a sigma_b <- 0.005 #Sigma_a volatility for stock a coea <- 0.0200 #Co-integration coefficient for x coeb <- 0.0200 #Co-integration coefficient for y   #Generate the noise terms for x and y rana <- rnorm(nsim, mean=mu_a, sd=sigma_a) #White noise for a ranb <- rnorm(nsim, mean=mu_b, sd=sigma_b) #White noise for b   #Generate the co-integrated series x and y a <- numeric(nsim) b <- numeric(nsim) a[1] <- 0 b[1] <- 0 for (i in 2:nsim) { #Logic here is that is b>a then we add on the difference so that #a starts to catch up with b, hence causing the spread to close a[i] <- a[i-1] + (coea * (b[i-1] - a[i-1])) + rana[i-1] b[i] <- b[i-1] + (coeb * (a[i-1] - b[i-1])) + ranb[i-1] }   #Plot a and b as prices ylim <- range(exp(a), exp(b)) path_a <- exp(a) path_b <- exp(b) spread <- path_a - path_b   dev.new() layout(rbind(1,2)) plot(path_a, ylim=ylim, type="l", main=paste("Co-integrated Pair (coea=",coea,", coeb=",coeb,")", sep=""), ylab="Price", col="red") lines(path_b, col="blue") legend("bottomleft", c("exp(a)", "exp(b)"), lty=c(1, 1), col=c("red", "blue"), bg="white")   plot(spread,type="l")

# Parameter Optimisation & Backtesting – Part 2

This is a follow on from: http://gekkoquant.com/2012/08/29/parameter-optimisation-backtesting-part1/

The code presented here will aim to optimise a strategy based upon the simple moving average indicator. The strategy will go Long when moving average A > moving average B. The optimisation is to determine what period to make each of the moving averages A & B.

Please note that this isn’t intended to be a good strategy, it is merely here to give an example of how to optimise a parameter.

Onto the code:

Functions

• TradingStrategy this function implements the trading logic and calculates the returns
• RunIterativeStrategy this function iterates through possible parameter combinations and calls TradingStrategy for each new parameter set
• CalculatePerformanceMetric takes in a table of returns (from RunIterativeStrategy) and runs a function/metric over each set of returns.
• PerformanceTable calls CalculatePerformanceMetric for lots of different metric and compiles the results into a table
• OrderPerformanceTable lets us order the performance table by a given metric, ie order by highest sharpe ratio
• SelectTopNStrategies selects the best N strategies for a specified performance metric (charts.PerformanceSummary can only plot ~20 strategies, hence this function to select a sample)
• FindOptimumStrategy does what it says on the tin
Note that when performing the out of sample test, you will need to manual specify the parameter set that you wish to use.
?View Code RSPLUS
   library("quantmod") library("PerformanceAnalytics")     nameOfStrategy <- "GSPC Moving Average Strategy"   #Specify dates for downloading data, training models and running simulation trainingStartDate = as.Date("2000-01-01") trainingEndDate = as.Date("2010-01-01") outofSampleStartDate = as.Date("2010-01-02")     #Download the data symbolData <- new.env() #Make a new environment for quantmod to store data in getSymbols("^GSPC", env = symbolData, src = "yahoo", from = trainingStartDate) trainingData <- window(symbolData$GSPC, start = trainingStartDate, end = trainingEndDate) testData <- window(symbolData$GSPC, start = outofSampleStartDate) indexReturns <- Delt(Cl(window(symbolData$GSPC, start = outofSampleStartDate))) colnames(indexReturns) <- "GSPC Buy&Hold" TradingStrategy <- function(mktdata,mavga_period,mavgb_period){ #This is where we define the trading strategy #Check moving averages at start of the day and use as the direciton signal #Enter trade at the start of the day and exit at the close #Lets print the name of whats running runName <- paste("MAVGa",mavga_period,".b",mavgb_period,sep="") print(paste("Running Strategy: ",runName)) #Calculate the Open Close return returns <- (Cl(mktdata)/Op(mktdata))-1 #Calculate the moving averages mavga <- SMA(Op(mktdata),n=mavga_period) mavgb <- SMA(Op(mktdata),n=mavgb_period) signal <- mavga / mavgb #If mavga > mavgb go long signal <- apply(signal,1,function (x) { if(is.na(x)){ return (0) } else { if(x>1){return (1)} else {return (-1)}}}) tradingreturns <- signal * returns colnames(tradingreturns) <- runName return (tradingreturns) } RunIterativeStrategy <- function(mktdata){ #This function will run the TradingStrategy #It will iterate over a given set of input variables #In this case we try lots of different periods for the moving average firstRun <- TRUE for(a in 1:10) { for(b in 1:10) { runResult <- TradingStrategy(mktdata,a,b) if(firstRun){ firstRun <- FALSE results <- runResult } else { results <- cbind(results,runResult) } } } return(results) } CalculatePerformanceMetric <- function(returns,metric){ #Get given some returns in columns #Apply the function metric to the data print (paste("Calculating Performance Metric:",metric)) metricFunction <- match.fun(metric) metricData <- as.matrix(metricFunction(returns)) #Some functions return the data the wrong way round #Hence cant label columns to need to check and transpose it if(nrow(metricData) == 1){ metricData <- t(metricData) } colnames(metricData) <- metric return (metricData) } PerformanceTable <- function(returns){ pMetric <- CalculatePerformanceMetric(returns,"colSums") pMetric <- cbind(pMetric,CalculatePerformanceMetric(returns,"SharpeRatio.annualized")) pMetric <- cbind(pMetric,CalculatePerformanceMetric(returns,"maxDrawdown")) colnames(pMetric) <- c("Profit","SharpeRatio","MaxDrawDown") print("Performance Table") print(pMetric) return (pMetric) } OrderPerformanceTable <- function(performanceTable,metric){ return (performanceTable[order(performanceTable[,metric],decreasing=TRUE),]) } SelectTopNStrategies <- function(returns,performanceTable,metric,n){ #Metric is the name of the function to apply to the column to select the Top N #n is the number of strategies to select pTab <- OrderPerformanceTable(performanceTable,metric) if(n > ncol(returns)){ n <- ncol(returns) } strategyNames <- rownames(pTab)[1:n] topNMetrics <- returns[,strategyNames] return (topNMetrics) } FindOptimumStrategy <- function(trainingData){ #Optimise the strategy trainingReturns <- RunIterativeStrategy(trainingData) pTab <- PerformanceTable(trainingReturns) toptrainingReturns <- SelectTopNStrategies(trainingReturns,pTab,"SharpeRatio",5) charts.PerformanceSummary(toptrainingReturns,main=paste(nameOfStrategy,"- Training"),geometric=FALSE) return (pTab) } pTab <- FindOptimumStrategy(trainingData) #pTab is the performance table of the various parameters tested #Test out of sample dev.new() #Manually specify the parameter that we want to trade here, just because a strategy is at the top of #pTab it might not be good (maybe due to overfit) outOfSampleReturns <- TradingStrategy(testData,mavga_period=9,mavgb_period=6) finalReturns <- cbind(outOfSampleReturns,indexReturns) charts.PerformanceSummary(finalReturns,main=paste(nameOfStrategy,"- Out of Sample"),geometric=FALSE) # Parameter Optimisation & Backtesting – Part 1 When developing a strategy that uses a technical indicator, such as a moving average, it is often difficult to decide on what period to use with the indicator. Should it look at the last 10, 20, or n days? The simple solution is just to iterate over many different parameters and look for the parameter combination that optimises a performance metric perhaps the Sharpe Ratio or minimises the Max Drawdown or some combination of both of them. This method must be approached in a sensible manner, essentially we are fitting the strategy to the data and must be careful not to over fit and capture spurious profits. It is likely that any parameter that is trained will vary over time, the strategy must be able to deal with changes to this parameter. For example say we are tuning the variable A, and we find that the optimum solution is A=n in the training data (it is very profitable with high sharpe ratio). How well does the strategy perform when we set A=n+1, or A=n-1 do we still get good results? If you don’t it’s an indicator that maybe you have over fitted. For more discussion on this see http://epchan.blogspot.co.uk/2010/01/method-for-optimizing-parameters.html Part two will contain a script to perform an optimisation on a simple moving average strategy. # Trading Strategy – VWAP Mean Reversion This strategy is going to use the volume weighted average price (VWAP) as an indicator to trade mean version back to VWAP. Annualized Sharpe Ratio (Rf=0%) is 0.9016936. This post is a response to http://gekkoquant.com/2012/07/29/trading-strategy-sp-vwap-trend-follow/ where there was a bug in the code indicating that VWAP wasn’t reverting (this didn’t sit well with me, or some of the people who commented). As always don’t take my word for anything, backtest the strategy yourself. One of the dangers of using R or Matlab is that it’s easy for forward bias to slip into your code. There are libraries such as Quantstrat for R which protect against this, but I’ve found them terribly slow to run. Trade logic: • All conditions are checked at the close, and the trade held for one day from the close • If price/vwap > uLim go short • If price/vwap < lLim go long Onto the code: ?View Code RSPLUS  library("quantmod") library("PerformanceAnalytics") #Trade logic - Look for mean reversion #If price/vwap > uLim go SHORT #If price/vwap < lLim go LONG #Script parameters symbol <- "^GSPC" #Symbol nlookback <- 3 #Number of days to lookback and calculate vwap uLim <- 1.001 #If price/vwap > uLim enter a short trade lLim <- 0.999 #If price/vwap < lLim enter a long trade #Specify dates for downloading data startDate = as.Date("2006-01-01") #Specify what date to get the prices from symbolData <- new.env() #Make a new environment for quantmod to store data in getSymbols(symbol, env = symbolData, src = "yahoo", from = startDate) mktdata <- eval(parse(text=paste("symbolData$",sub("^","",symbol,fixed=TRUE)))) mktdata <- head(mktdata,-1) #Hack to fix some stupid duplicate date problem with yahoo   #Calculate volume weighted average price vwap <- VWAP(Cl(mktdata), Vo(mktdata), n=nlookback) #Can calculate vwap like this, but it is slower #vwap <- runSum(Cl(mktdata)*Vo(mktdata),nlookback)/runSum(Vo(mktdata),nlookback)   #Calulate the daily returns dailyRet <- Delt(Cl(mktdata),k=1,type="arithmetic") #Daily Returns   #signal = price/vwap signal <- Cl(mktdata) / vwap signal[is.na(signal)] <- 1 #Setting to one means that no trade will occur for NA's #Stripping NA's caused all manner of problems in a previous post trade <- apply(signal,1, function(x) {if(xuLim) { return(-1) } else { return (0) }}})   #Calculate the P&L #The daily ret is DailyRet(T)=(Close(T)-Close(T-1))/Close(T-1) #We enter the trade on day T so need the DailyRet(T+1) as our potential profit #Hence the lag in the line below strategyReturns <- trade * lag(dailyRet,-1) strategyReturns <- na.omit(strategyReturns)   #### Performance Analysis ### #Calculate returns for the index indexRet <- dailyRet #Daily returns colnames(indexRet) <- "IndexRet" zooTradeVec <- cbind(as.zoo(strategyReturns),as.zoo(indexRet)) #Convert to zoo object colnames(zooTradeVec) <- c(paste(symbol," VWAP Trade"),symbol) zooTradeVec <- na.omit(zooTradeVec)   #Lets see how all the strategies faired against the index dev.new() charts.PerformanceSummary(zooTradeVec,main=paste("Performance of ", symbol, " VWAP Strategy"),geometric=FALSE)     #Lets calculate a table of montly returns by year and strategy cat("Calander Returns - Note 13.5 means a return of 13.5%\n") print(table.CalendarReturns(zooTradeVec)) #Calculate the sharpe ratio cat("Sharpe Ratio") print(SharpeRatio.annualized(zooTradeVec))

UPDATE: The exceptional returns seen in this strategy were due to a 2 day look forward bias in the signal (and then subsequent trade direction), ie when returns were calculated for day T the trade signal used was actually from day T+2.

This bias occurred in the lines:

?View Code RSCODE
 signal <- na.omit(signal)

Both the signal and trade dataframe had the correct dates for each signal/trades however when indexRet*trade happened then trade was treated as undated vectors (which is 2 elements shorter than index ret) hence the 2 day shift. The moral of this story is to merge dataframes before multiplying!

Thank you for everyone that commented on this, a corrected post is to follow!

Original Post

This strategy is going to use the volume weighted average price (VWAP) as an indicator to determine the direction of the current trend and trade the same direction as the trend. Annualized Sharpe Ratio (Rf=0%) is 8.510472.

• All conditions are checked at the close, and the trade held for one day from the close
• If price/vwap > uLim go long
• If price/vwap < lLim go short

Initially I thought that the price would be mean reverting to VWAP (this can be see in high freq data) however this didn’t appear to be the case with EOD data. For such a simple strategy I’m amazed that the Sharpe ratio is so high (suspiciously high). The code has been double&tripple checked to see if any forward bias has slipped in, however I haven’t spotted anything.

Onto the code:

?View Code RSPLUS
 library("quantmod") library("PerformanceAnalytics")   #Trade logic - Follow the trade demand, ie if price > vwap then go long #If price/vwap > uLim go LONG #If price/vwap < lLim go SHORT   #Script parameters symbol <- "^GSPC" #Symbol nlookback <- 3 #Number of days to lookback and calculate vwap uLim <- 1.001 #If price/vwap > uLim enter a long trade lLim <- 0.999 #If price/vwap < lLim enter a short trade     #Specify dates for downloading data startDate = as.Date("2006-01-01") #Specify what date to get the prices from symbolData <- new.env() #Make a new environment for quantmod to store data in getSymbols(symbol, env = symbolData, src = "yahoo", from = startDate) mktdata <- eval(parse(text=paste("symbolData$",sub("^","",symbol,fixed=TRUE)))) mktdata <- head(mktdata,-1) #Hack to fix some stupid duplicate date problem with yahoo #Calculate volume weighted average price vwap <- VWAP(Cl(mktdata), Vo(mktdata), n=nlookback) #Can calculate vwap like this, but it is slower #vwap <- runSum(Cl(mktdata)*Vo(mktdata),nlookback)/runSum(Vo(mktdata),nlookback) #Calulate the daily returns dailyRet <- Delt(Cl(mktdata),k=1,type="arithmetic") #Daily Returns #signal = price/vwap signal <- Cl(mktdata) / vwap signal <- na.omit(signal) trade <- apply(signal,1, function(x) {if(xuLim) { return(1) } else { return (0) }}}) #Calculate the P&L #The daily ret is DailyRet(T)=(Close(T)-Close(T-1))/Close(T-1) #We enter the trade on day T so need the DailyRet(T+1) as our potential profit #Hence the lag in the line below strategyReturns <- trade * lag(dailyRet,-1) strategyReturns <- na.omit(strategyReturns) #### Performance Analysis ### #Calculate returns for the index indexRet <- dailyRet #Daily returns colnames(indexRet) <- "IndexRet" zooTradeVec <- cbind(as.zoo(strategyReturns),as.zoo(indexRet)) #Convert to zoo object colnames(zooTradeVec) <- c(paste(symbol," VWAP Trade"),symbol) zooTradeVec <- na.omit(zooTradeVec) #Lets see how all the strategies faired against the index dev.new() charts.PerformanceSummary(zooTradeVec,main=paste("Performance of ", symbol, " VWAP Strategy"),geometric=FALSE) #Lets calculate a table of montly returns by year and strategy cat("Calander Returns - Note 13.5 means a return of 13.5%\n") print(table.CalendarReturns(zooTradeVec)) #Calculate the sharpe ratio cat("Sharpe Ratio") print(SharpeRatio.annualized(zooTradeVec)) # Trading Strategy – Volatility Carry Trade This strategy is going to look at a vega neutral volatility carry trading strategy. Two different futures contract will be traded, the VXX and VXZ. These contracts are rolling futures on the S&P 500 Vix index, the VXX is a short term future and the VXZ is a medium term future. Annualized Sharpe Ratio (Rf=0%) is 1.759449. The strategy is very simple, the rules are: • If VXX / VXZ > 1 then in backwardation so do a reverse carry trade (buy VXX, sell VXZ) • If VXX / VXZ < 1 then do a carry trade (sell VXX, buy VXZ) If the volatility spot price doesn’t change, then we’re extracting the cost of carry. Due to buying and selling (or vice versa) the short to mid term futures the vega exposure is hedged. In the script the above two rules have been slightly changed, a slight offset is added/subtracted from the ratio. Essentially we want to be deep into contango zone or deep into backwardation zone before we trade, if we’re close to the flip point then don’t trade. Section 1: Downloaded the data, and calculate the Open to Close return. This strategy will look for entry at the open and exit at the close. Section 2: Regress the daily returns of VXX with VXZ to calculate the hedge ratio Section 3: Generate the backwardation / contango signal Section 4: Simulate the trading Section 5: Analyse the performance Onto the code: ?View Code RSPLUS  library("quantmod") library("PerformanceAnalytics") #Control variables for entering a trade #Used to check for the level of contango / backwardation #IF signal < signalLowLim then in contango and do a carry trade #IF signal > signalUpperLim then in backwardation so do a reverse carry #ELSE do nothing signalLowLim <- 0.9 signalUpperLim <- 1.1 #Use volatility futures, shortdate vs medium dated #VXX iPath S&P 500 VIX Short-Term Futures ETN (VXX) #VXZ iPath S&P 500 VIX Mid-Term Futures ETN (VXZ) symbolLst <- c("VXX","VXZ") #Specify dates for downloading data, training models and running simulation startDate = as.Date("2009-01-01") #Specify what date to get the prices from hedgeTrainingStartDate = as.Date("2009-01-01") #Start date for training the hedge ratio hedgeTrainingEndDate = as.Date("2009-05-01") #End date for training the hedge ratio tradingStartDate = as.Date("2009-05-02") #Date to run the strategy from ### SECTION 1 - Download Data & Calculate Returns ### #Download the data symbolData <- new.env() #Make a new environment for quantmod to store data in getSymbols(symbolLst, env = symbolData, src = "yahoo", from = startDate) #Plan is to check for trade entry at the open, and exit the trade at the close #So need to calculate the open to close return as they represens our profit or loss #Calculate returns for VXX and VXZ vxxRet <- (Cl(symbolData$VXX)/Op(symbolData$VXX))-1 colnames(vxxRet) <- "Ret" symbolData$VXX <- cbind(symbolData$VXX,vxxRet) vxzRet <- (Cl(symbolData$VXZ)/Op(symbolData$VXZ))-1 colnames(vxzRet) <- "Ret" symbolData$VXZ <- cbind(symbolData$VXZ,vxzRet) ### SECTION 2 - Calculating the hedge ratio ### #Want to work out a hedge ratio, so that we can remain Vega neutral (the futures contact are trading VEGA) #Select a small amount of data for training the hedge model on subVxx <- window(symbolData$VXX$Ret,start=hedgeTrainingStartDate ,end=hedgeTrainingEndDate) subVxz <- window(symbolData$VXZ$Ret,start=hedgeTrainingStartDate ,end=hedgeTrainingEndDate) datablock = na.omit(cbind(subVxx,subVxz)) colnames(datablock) <- c("VXX","VXZ") #Simply linearly regress the returns of Vxx with Vxz regression <- lm(datablock[,"VXZ"] ~ datablock[,"VXX"]) #Linear Regression #Plot the regression plot(x=as.vector(datablock[,"VXX"]), y=as.vector(datablock[,"VXZ"]), main=paste("Hedge Regression: XXZret =",regression$coefficient[2]," * RXXret + intercept"), xlab="Vxx Ret", ylab="Vxz ", pch=19) abline(regression, col = 2 ) hedgeratio = regression$coefficient[2] ### SECTION 3 - Generate trading signals ### #Generate Trading signal #Check ratio to see if contango or backwarded volatility future #If shortTermVega < midTermVega in contango so do carry trade #If shortTermVega > midTermVega in backwardation so do reverse carry #If VXX less than VXZ then want to short VXX and long VXZ #Calculate the contango / backwardation signal tSig <- Op(symbolData$VXX)/Op(symbolData$VXZ) colnames(tSig) <- "Signal" ### SECTION 4 - Do the trading ### #Generate the individual buy/sell signals for each of the futures contract vxxSignal <- apply(tSig,1, function(x) {if(xsignalUpperLim) { return(1) } else { return (0) }}}) vxzSignal <- -1 * vxxSignal #Strategy returns are simply the direction * the Open-to-Close return for the day #Include the hedge ratio here so that we remain vega neutral strategyReturns <- ((vxxSignal * symbolData$VXX$Ret) + hedgeratio * (vxzSignal * symbolData$VXZ$Ret) ) strategyReturns <- window(strategyReturns,start=tradingStartDate,end=Sys.Date(), extend = FALSE) #Normalise the amount of money being invested on each trade so that we can compare to the S&P index later strategyReturns <- strategyReturns * 1 / (1+abs(hedgeratio)) colnames(strategyReturns) <- "StrategyReturns" #plot(cumsum(strategyReturns)) #SECTION 5 #### Performance Analysis ### #Get the S&P 500 index data indexData <- new.env() startDate = as.Date("2009-01-01") #Specify what date to get the prices from getSymbols("^GSPC", env = indexData, src = "yahoo", from = startDate) #Calculate returns for the index indexRet <- (Cl(indexData$GSPC)-lag(Cl(indexData$GSPC),1))/lag(Cl(indexData$GSPC),1) colnames(indexRet) <- "IndexRet" zooTradeVec <- cbind(as.zoo(strategyReturns),as.zoo(indexRet)) #Convert to zoo object colnames(zooTradeVec) <- c("Vol Carry Trade","S&P500") zooTradeVec <- na.omit(zooTradeVec) #Lets see how all the strategies faired against the index dev.new() charts.PerformanceSummary(zooTradeVec,main="Performance of Volatility Carry Trade",geometric=FALSE)   #Lets calculate a table of montly returns by year and strategy cat("Calander Returns - Note 13.5 means a return of 13.5%\n") print(table.CalendarReturns(zooTradeVec)) #Calculate the sharpe ratio cat("Sharpe Ratio") print(SharpeRatio.annualized(zooTradeVec)) #Calculate other statistics cat("Other Statistics") print(table.CAPM(zooTradeVec[,"Vol Carry Trade"],zooTradeVec[,"S&P500"]))   dev.new() #Lets make a boxplot of the returns chart.Boxplot(zooTradeVec)   dev.new() #Set the plotting area to a 2 by 2 grid layout(rbind(c(1,2),c(3,4)))   #Plot various histograms with different overlays added chart.Histogram(zooTradeVec, main = "Plain", methods = NULL) chart.Histogram(zooTradeVec, main = "Density", breaks=40, methods = c("add.density", "add.normal")) chart.Histogram(zooTradeVec, main = "Skew and Kurt", methods = c("add.centered", "add.rug")) chart.Histogram(zooTradeVec, main = "Risk Measures", methods = c("add.risk"))

This post is going to investigate a strategy called Buy on Gap that was discussed by E.P Chan in his blog post “the life and death of a strategy”. The strategy is a mean reverting strategy that looks to buy the weakest stocks in the S&P 500 at the open and liquidate the positions at the close. The performance of the strategy is seen in the image below, Annualized Sharpe Ratio (Rf=0%) 2.129124.

All numbers in this table are %(ie 12.6 is 12.6%)
Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec BuyOnGap S&P500
2005  0.0  0.0  0.0  0.0  0.0 -0.4 -0.6  1.1  0.7  1.0 -0.2  0.3      1.8   -0.1
2006  0.2 -0.6 -0.3  0.0  1.1  0.1  0.0  0.4  0.1  0.1  0.2 -0.2      1.1   -2.0
2007  0.8  0.9  0.1 -1.1  0.3 -0.2 -1.5  0.2 -0.2  0.9 -0.4  0.3      0.0    1.0
2008  4.3 -1.9  0.8 -0.5  0.0  0.7  0.2 -0.7  2.0  3.3  2.0  2.0     12.6    6.0
2009 -2.9  0.1  1.4 -1.1  1.3 -0.8  0.4  0.0 -0.3 -1.9  0.8 -1.0     -4.0   -7.3
2010 -1.0 -0.1  0.0 -1.1 -0.7 -0.6  1.1  0.7 -0.5  0.6  0.5  0.1     -1.1   -5.9
2011  0.6  0.3  0.2 -0.1  0.2  0.4  0.7  0.1 -1.4 -1.2  1.4 -0.2      1.0    2.1
2012 -0.3 -0.5 -0.1  0.0 -1.0   NA   NA   NA   NA   NA   NA   NA     -1.8   -0.8

From the post two trading criterion were mentioned:

1. Buy the 100 stocks out of the S&P 500 constituents that have the lowest previous days lows to the current days opening price
2. Provided that the above return is less than the 1 times the 90day standard deviation of Close to Close returns
The criterion are fairly specific however it is important to write flexible code where it is easy to change the main model parameters, below is a list of variable names that specify the parameters in the R script:
• stdLookback – How many days to look back for the standard deviation calculation
• stdMultiple – Number to multiply the standard deviation by (was 1 in criterion 2.), the larger this variable the more stocks that will satisfy criterion 2.

The code is split into 5 distinct sections.

Section 1: Loop through all the stocks loaded from the data file, for each stock calculate the previous day close to current days open (lowOpenRet). Calculate the Close Close return and calculate the standard deviation (stdClClRet). Also calculate the Open to Close return for every day (dayClOpRet), if we decide to trade this day this would be the return of the strategy for the day.

Section 2: This section combines columns from each of the individual stock data frames into large matrices that cover all the stocks. retMat contains the lowOpenRet for each stock. stdMat contains the stdClClRet for all stocks, dayretMat contains the dayClOpRet for all stocks.

Essentially instead of having lots of variables, we combine them into a  big matrix.

Section 3: This will check if matrices in section 2 match the trade entry criterion. This section produces two matrices (conditionOne and conditionTwo). The matrices contain a 1 for a passed entry criterion and a 0 for a failed entry criterion.

Section 4: This multiples the conditionOne with conditionTwo to give conditionsMet, since those matricies are binary multiplying them together identifies the regions where both conditions passed (1*1=1 ie a pass). This means enter a trade.

conditionsMet is then used as a mask, it has 1′s when a trade should occur and 0′s when no trade should happen. So multiplying this with dayClOpRet gives us the Open to Close daily returns for all days and stocks that a trade occurred on.

The script assumes capital is split equally between all the stocks that are bought at the open, if less than 100 stocks meet the entry criteria then it is acceptable to buy less.

Section 5: This section does simple performance analytics and plots the equity curve against the S&P 500 index.

Onto the code (note the datafile is generated in Stock Data Download & Saving R):

?View Code RSPLUS
 #install.packages("quantmod") library("quantmod") #install.packages("caTools") #for rolling standard deviation library("caTools") #install.packages("PerformanceAnalytics") library("PerformanceAnalytics") #Load the PerformanceAnalytics library   datafilename = "stockdata.RData"   stdLookback <- 90 #How many periods to lookback for the standard deviation calculation stdMultiple <- 1 #A Number to multiply the standard deviation by nStocksBuy <- 100 #How many stocks to buy   load(datafilename)   #CONDITION 1 #Buy 100 stocks with lowest returns from their previous days lows #To the current days open   #CONDITION 2 #Provided returns are lower than one standard deviation of the #90 day moving standard deviation of close close returns #Exit long positions at the end of the day     #SECTION 1 symbolsLst <- ls(stockData) #Loop through all stocks in stockData and calculate required returns / stdev's for (i in 1:length(symbolsLst)) { cat("Calculating the returns and standard deviations for stock: ",symbolsLst[i],"\n") sData <- eval(parse(text=paste("stockData",symbolsLst[i],sep=""))) #Rename the colums, there is a bug in quantmod if a stock is called Low then Lo() breaks! #Ie if a column is LOW.x then Lo() breaks oldColNames <- names(sData) colnames(sData) <- c("S.Open","S.High","S.Low","S.Close","S.Volume","S.Adjusted") #Calculate the return from low of yesterday to the open of today lowOpenRet <- (Op(sData)-lag(Lo(sData),1))/lag(Lo(sData),1) colnames(lowOpenRet) <- paste(symbolsLst[i],".LowOpenRet",sep="") #Calculate the n day standard deviation from the close of yesterday to close 2 days ago stdClClRet <- runsd((lag(Cl(sData),1)-lag(Cl(sData),2))/lag(Cl(sData),2),k=stdLookback,endrule="NA",align="right") stdClClRet <- stdMultiple*stdClClRet + runmean(lag(Cl(sData),1)/lag(Cl(sData),2),k=stdLookback,endrule="NA",align="right") colnames(stdClClRet) <- paste(symbolsLst[i],".StdClClRet",sep="") #Not part of the strategy but want to calculate the Close/Open ret for current day #Will use this later to evaluate performance if a trade was taken dayClOpRet <- (Cl(sData)-Op(sData))/Op(sData) colnames(dayClOpRet) <- paste(symbolsLst[i],".DayClOpRet",sep="") colnames(sData) <- oldColNames eval(parse(text=paste("stockData",symbolsLst[i]," <- cbind(sData,lowOpenRet,stdClClRet,dayClOpRet)",sep=""))) }   #SECTION 2 #Have calculated the relevent returns and standard deviations #Now need to to work out what 100 (nStocksBuy) stocks have the lowest returns #Make a returns matrix for (i in 1:length(symbolsLst)) { cat("Assing stock: ",symbolsLst[i]," to the returns table\n") sDataRET <- eval(parse(text=paste("stockData$",symbolsLst[i],"[,\"",symbolsLst[i],".LowOpenRet\"]",sep=""))) sDataSTD <- eval(parse(text=paste("stockData$",symbolsLst[i],"[,\"",symbolsLst[i],".StdClClRet\"]",sep=""))) sDataDAYRET <- eval(parse(text=paste("stockData$",symbolsLst[i],"[,\"",symbolsLst[i],".DayClOpRet\"]",sep=""))) if(i == 1){ retMat <- sDataRET stdMat <- sDataSTD dayretMat <- sDataDAYRET } else { retMat <- cbind(retMat,sDataRET) stdMat <- cbind(stdMat,sDataSTD) dayretMat <- cbind(dayretMat,sDataDAYRET) } } #SECTION 3 #CONDITON 1 test output (0 = failed test, 1 = passed test) #Now will loop over the returns matrix finding the nStocksBuy smallest returns conditionOne <- retMat #copying the structure and data, only really want the structure conditionOne[,] <- 0 #set all the values to 0 for (i in 1:length(retMat[,1])){ orderindex <- order((retMat[i,]),decreasing=FALSE) #order row entries smallest to largest orderindex <- orderindex[1:nStocksBuy] #want the smallest n (nStocksBuy) stocks conditionOne[i,orderindex] <- 1 #1 Flag indicates entry is one of the nth smallest } #CONDITON 2 #Check Close to Open return is less than 90day standard deviation conditionTwo <- retMat #copying the structure and data, only really want the structure conditionTwo[,] <- 0 #set all the values to 0 conditionTwo <- retMat/stdMat #If ClOp ret is < StdRet tmp will be < 1 conditionTwo[is.na(conditionTwo)] <- 2 #GIVE IT FAIL CONDITION JUST STRIPPING NAs here conditionTwo <- apply(conditionTwo,1:2, function(x) {if(x<1) { return (1) } else { return (0) }}) #SECTION 4 #CHECK FOR TRADE output (1 = passed conditions for trade, 0 = failed test) #Can just multiply the two conditions together since they're boolean conditionsMet <- conditionOne * conditionTwo colnames(conditionsMet) <- gsub(".LowOpenRet","",names(conditionsMet)) #Lets calculate the results tradeMat <- dayretMat colnames(tradeMat) <- gsub(".DayClOpRet","",names(tradeMat)) tradeMat <- tradeMat * conditionsMet tradeMat[is.na(tradeMat)] <- 0 tradeVec <- as.data.frame(apply(tradeMat, 1,sum) / apply(conditionsMet, 1,sum)) #Calculate the mean for each row colnames(tradeVec) <- "DailyReturns" tradeVec[is.nan(tradeVec[,1]),1] <- 0 #Didnt make or loose anything on this day plot(cumsum(tradeVec[,1]),xlab="Date", ylab="EPCHAN Buy on Gap",xaxt = "n") #SECTION 5 #### Performance Analysis ### #Get the S&P 500 index data indexData <- new.env() startDate = as.Date("2005-01-13") #Specify what date to get the prices from getSymbols("^GSPC", env = indexData, src = "yahoo", from = startDate) #Calculate returns for the index indexRet <- (Cl(indexData$GSPC)-lag(Cl(indexData$GSPC),1))/lag(Cl(indexData$GSPC),1) colnames(indexRet) <- "IndexRet" zooTradeVec <- cbind(as.zoo(tradeVec),as.zoo(indexRet)) #Convert to zoo object colnames(zooTradeVec) <- c("BuyOnGap","S&P500")   #Lets see how all the strategies faired against the index dev.new() charts.PerformanceSummary(zooTradeVec,main="Performance of EPCHAN Buy on Gap",geometric=FALSE)   #Lets calculate a table of montly returns by year and strategy cat("Calander Returns - Note 13.5 means a return of 13.5%\n") table.CalendarReturns(zooTradeVec)   dev.new() #Lets make a boxplot of the returns chart.Boxplot(zooTradeVec)   dev.new() #Set the plotting area to a 2 by 2 grid layout(rbind(c(1,2),c(3,4)))   #Plot various histograms with different overlays added chart.Histogram(zooTradeVec, main = "Plain", methods = NULL) chart.Histogram(zooTradeVec, main = "Density", breaks=40, methods = c("add.density", "add.normal")) chart.Histogram(zooTradeVec, main = "Skew and Kurt", methods = c("add.centered", "add.rug")) chart.Histogram(zooTradeVec, main = "Risk Measures", methods = c("add.risk"))

Possible Future Modifications

• Add shorting the strongest stocks so that the strategy is market neutral
• Vary how many stocks to hold
• Vary the input variables (discussed above)
• Try a different asset class, does this work for forex?