Analysis of returns after n consecutive up/down days – Predicting the Sign of Open to Close Returns

This weekend I was spammed with an advert for a “binary option trading system with 90% accuracy”. The advert piqued my curiosity. In essence it detailed a variation of the well known roulette playing strategy that mathematically guarantees a profit (assuming infinite money and no table limit).

Roulette Strategy

If you double your bet size after each loss and repeat the same bet you are guaranteed a profit: your next win will cover all the preceding losses.

e.g. bet $1 on red and lose, then bet $2 on red and win. You get $4 back: $2 is your stake, $1 covers the loss from your first bet, and the remaining $1 is profit.

Exponential growth of lot size, no thanks :)
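To make the arithmetic concrete, here is a minimal sketch of the doubling sequence. The helper name `martingaleStakes` is made up for illustration, and it assumes an even-money payout (a win returns the stake plus an equal amount).

```r
# Hypothetical helper: stakes and running loss for a doubling (martingale) sequence
# Assumes an even-money bet: a win pays back the stake plus an equal amount
martingaleStakes <- function(nLosses, initialStake = 1) {
  stakes <- initialStake * 2^(0:nLosses)       # stake on each bet, the last one is the winner
  lossBeforeWin <- sum(head(stakes, -1))       # total lost on the losing bets
  profit <- tail(stakes, 1) - lossBeforeWin    # winnings on the final bet minus prior losses
  list(stakes = stakes, lossBeforeWin = lossBeforeWin, profit = profit)
}

res <- martingaleStakes(nLosses = 9)
print(res$stakes)         # the stake doubles on every bet
print(res$lossBeforeWin)  # capital lost before the winning bet
print(res$profit)         # equals the initial stake
```

After nine losses in a row the tenth bet has to be 512 times the initial stake, all to win back a single unit.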

Binary options are analogous to betting on red: they offer virtually fixed odds for up or down directional bets. Naturally I want to know what the maximum number of consecutive up or down days in the market is, i.e. how much pain I would have to suffer with this strategy.

Occurrences of n Consecutive Up or Down Days

Analysing the last 12 years of returns data for the S&P 500, the maximum number of consecutive up days is 9 (occurred in 2004-2005) and the maximum number of consecutive down days is 8 which, you guessed it, occurred in 2008-2009.

So that is up to 9 days of pain if we always bet against the direction of the current run.
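As a quick illustration of how the longest runs can be found, here is a small sketch using base R's `rle`. The up/down sequence is a toy example, not S&P 500 data.

```r
# Toy up/down sequence (TRUE = up day); real data would come from opClRet > 0
isUpDay <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)

runs <- rle(isUpDay)                            # run-length encoding of the streaks
maxUpRun   <- max(runs$lengths[runs$values])    # longest run of up days
maxDownRun <- max(runs$lengths[!runs$values])   # longest run of down days
print(c(maxUpRun = maxUpRun, maxDownRun = maxDownRun))
```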

Instead of enduring the 9 days of draw down, it is interesting to see what the consecutive number of up/down days says about the probability the next day is an up day. The maximum likelihood probability of an up day is count(up days)/count(up and down days). Naturally we will condition this data on the consecutive number of up/down days.
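Here is a minimal sketch of that conditional estimate in base R. The returns are made up, and the variable names (`streakBefore`, `probUp`, `nObs`) are only for illustration. The streak is the signed run length seen before each day, so it never uses the return it is trying to predict.

```r
# Toy open-to-close returns; the values are made up for illustration
ret <- c(0.01, -0.02, -0.01, 0.03, 0.01, -0.01, -0.02, -0.01, 0.02, 0.01)
isUp <- ret > 0

# Signed streak observed BEFORE each day: +n after n up days, -n after n down days
streakBefore <- integer(length(ret))
for (t in 2:length(ret)) {
  prev <- streakBefore[t - 1]
  if (isUp[t - 1]) {
    streakBefore[t] <- if (prev > 0) prev + 1L else 1L
  } else {
    streakBefore[t] <- if (prev < 0) prev - 1L else -1L
  }
}

# Maximum likelihood estimate: fraction of up days within each streak bucket
byStreak <- split(isUp, streakBefore)
probUp <- sapply(byStreak, mean)
nObs   <- sapply(byStreak, length)   # sample size per bucket, small counts mean noisy estimates
print(data.frame(streak = names(probUp), probUp = probUp, nObs = nObs))
```

Printing the sample size next to each probability makes it obvious which buckets are too thin to trust.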

Consecutive Up or Down Days vs Maximum Likelihood Probability the next day is up

This data is fairly nice looking; for example in 2012-2013 there is a clear relationship: the more down days in a row, the higher the likelihood of an up day. 6 down days implied the probability of an up day is 80%! I must raise a note of caution here: 6 down days in a row was seen only a handful of times in the year (5 at most). Hence the probability estimate is based on very few points and is not statistically significant; perhaps looking at 5 years of returns might be better.

I appreciate that most people don’t trade binary options and would rather trade the index/stock outright, so it is interesting to see what the consecutive number of up/down days says about future Open to Close Returns. The image below regresses Open to Close Returns (time t) against Consecutive Up/Down days (time t-1).

Consecutive Up or Down Days vs Next Day Open to Close Returns

Very disappointing chart, it doesn’t really show much relationship between returns and consecutive up/down days. For some of the data points an up move is more probable than a down move, but the magnitude of the up moves is significantly smaller than that of the down moves. These charts vary greatly by asset class and by security; single stocks have much more favorable plots.

Prediction Accuracy

The plot below shows the accuracy of this maximum likelihood approach. The model takes the last 250 days of returns and calculates the probability of an up move given that the current run is n consecutive up or down days. If the probability of an up move is over a certain threshold go long; if it’s below a certain threshold go short.
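The decision rule and the accuracy measure can be sketched in a few lines. This is only an illustration: `tradeDirection` is a hypothetical helper and the probabilities and returns are made up.

```r
# Hypothetical helper mapping a probability of an up day to a trade direction
# +1 = long, -1 = short, 0 = no trade
tradeDirection <- function(probUp, longThreshold = 0.70, shortThreshold = 1 - longThreshold) {
  ifelse(probUp > longThreshold, 1, ifelse(probUp < shortThreshold, -1, 0))
}

probUp <- c(0.80, 0.50, 0.25, 0.70)    # made-up probabilities of an up day
ret    <- c(0.01, -0.02, 0.01, 0.03)   # made-up open-to-close returns for the same days
direction <- tradeDirection(probUp)

# +1 = correct call, -1 = wrong call, 0 = no trade taken
hit <- sign(direction * ret)
# Accuracy is measured only over the days on which a trade was suggested
accuracy <- sum(hit == 1) / sum(hit != 0)
print(accuracy)
```

Note that accuracy is undefined (0/0) when the thresholds are so strict that no trade is ever suggested.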

Heatmap of Accuracy vs Model Parameters

 

The histogram on the heatmap shows that approximately half of the parameter combinations can predict the direction with accuracy greater than 50%.

Final Comments

The beauty of this approach is that it’s simple, it can be applied to any asset class but most importantly it can be applied across different time frames.

Onto the code:

library("quantmod")
library("reshape")
library("gplots")
 
#Control Parameters
dataStartDate = as.Date("2000-01-01")
symbol<- "^GSPC"
longThreshold <- 0.70  #If the probability of an upday is greater than this limit go long
shortThreshold <- 1-longThreshold  #If the probability of an upday is lower than this limit go short
nLookback<-250 #Days to look back when generating the rolling probability distribution (approx 250 trading days in a year)
 
#Function to turn a boolean vector into a vector containing the consecutive num of trues or falses seen
#Will be used to calculate the consecutive number of up and down days
consecutiveTruesExtractor <- function(data){
        genNumOfConsecutiveTrues <- function(x, y) { (x+y)*y  } #Y is either 0 or 1
        upDaysCount <- Reduce(genNumOfConsecutiveTrues,data,accumulate=TRUE)
        upDaysCount <- as.vector(Lag(upDaysCount))
        upDaysCount[is.na(upDaysCount)] <- 0
 
 
        downDaysCount <- Reduce(genNumOfConsecutiveTrues,!data,accumulate=TRUE)
        downDaysCount <- as.vector(Lag(downDaysCount))
        downDaysCount[is.na(downDaysCount)] <- 0
        consecutiveTruesExtractor <- upDaysCount-downDaysCount
}
 
#Function to plot data and add regression line
doPlot <- function(x,y,title,xlabel,ylabel,ylimit){
  x<-as.vector(x)
  y<-as.vector(y)
  boxplot(y~x,main=title, xlab=xlabel,ylab=ylabel,ylim=ylimit)
  abline(lm(y~x),col="red",lwd=1.5)
}
 
#Function to calculate the percentage of updays from returns set
calcPercentageOfUpdaysFromReturnsSet <- function(data){
  calcPercentageOfUpdaysFromReturnsSet <- sum(data>0)/length(data)
}
 
#Function takes a set of returns and consecutive up down day data and aggregates it into a probability distribution
#Generated a matrix of consecutive Direction vs prob of up move
generateProbOfUpDayDistribution <- function(dataBlock){
 
 y <- as.matrix(by(dataBlock[,"OpClRet"],list(ConsecutiveDir = dataBlock[,"ConsecutiveDir"]),calcPercentageOfUpdaysFromReturnsSet)) #Prob of upmove
 x <- as.matrix(as.numeric(as.matrix(rownames(y))))
 
 res <- cbind(x,y)
 colnames(res) <- c("ConsecutiveDir","Prob")
 generateProbOfUpDayDistribution <- res
}
 
#Given current consecutive up down day data, what is the probability the current day is an up day
#For use with the rollapply function (since needs to use the past n days worth of data for generating probability distribution)
probOfUpDayForUseWithRollApply <- function(dataBlock){
  dist <- generateProbOfUpDayDistribution(head(dataBlock,-1)) #Use head to drop the last row, prevents a lookforward issue
 
  currentConsecutiveRun <- last(dataBlock[,"ConsecutiveDir"])
  probOfUpDay <- dist[dist[,"ConsecutiveDir"] == rep(coredata(currentConsecutiveRun), length(dist[,"ConsecutiveDir"])),"Prob"]
  if(length(probOfUpDay)!=1 || is.na(probOfUpDay)) {probOfUpDay <- 0.5 } #Never seen a run of this length in the lookback window, dont know what will happen so make up and down events equally likely
  #print(paste("Current Run:",coredata(currentConsecutiveRun),"Prob of up day:",probOfUpDay ))
  probOfUpDayForUseWithRollApply <- probOfUpDay
}
 
#Just a quick test to check that the consecutiveTruesExtractor is working as expected
#Define the input data, and define the expected output
#Check that the output of the function equals the expected output
data <-           c(0, 0, 0, 1,0, 1,1,0,0)  #0 is down day, 1 is up day
expectedOutput <- c(0,-1,-2,-3,1,-1,1,2,-1)
res <- consecutiveTruesExtractor(data)
if( identical(res,expectedOutput)){
  print("Match consecutiveTruesExtractor is correct")
} else {
  print("Error consecutiveTruesExtractor contains bugs")
}
 
 
 
 
#Download the data
symbolData <- new.env() #Make a new environment for quantmod to store data in
getSymbols(symbol, env = symbolData, src = "yahoo", from = dataStartDate)
mktdata <- eval(parse(text=paste("symbolData$",sub("^","",symbol,fixed=TRUE))))
opClRet <- (Cl(mktdata)/Op(mktdata))-1
consecutiveDir <- consecutiveTruesExtractor(as.matrix(opClRet>0))
completeData<- cbind(opClRet,consecutiveDir)
colnames(completeData) <- c("OpClRet","ConsecutiveDir")
 
 
#Plot of consecutive up down days vs next day Open Close Returns
dev.new()
par(oma=c(0,0,2,0))
par(mfrow=c(3,3))
 
for(i in seq(2012,2004,-1)){
  windowedData <- window(completeData,start=as.Date(paste(i,"-01-01",sep="")),end=as.Date(paste(i+1,"-01-01",sep="")))
  doPlot(windowedData$ConsecutiveDir,windowedData$OpClRet,paste("Consecutive Up / Down days vs Open close Return (",i,",",i+1,")"),"Consecutive Up or Down days","Open Close Returns",c(-0.07,0.07))
}
title(main=paste("Symbol",symbol),outer=T)
 
#Plot of consecutive up down days vs the maximum likelihood probability that the next day is an up day
dev.new()
par(oma=c(0,0,2,0))
par(mfrow=c(3,3))
 
for(i in seq(2012,2004,-1)){
  windowedData <- window(completeData,start=as.Date(paste(i,"-01-01",sep="")),end=as.Date(paste(i+1,"-01-01",sep="")))
  y <-as.matrix(by(as.vector(windowedData$OpClRet),list(ConsecutiveDir = windowedData$ConsecutiveDir),calcPercentageOfUpdaysFromReturnsSet)) #Prob of upmove
  x <- as.matrix(as.numeric(as.matrix(rownames(y)))) #Consecutive up or down days
  plot(cbind(x,y),main=paste("Consecutive Up / Down days vs Next day Dir (",i,",",i+1,")"), xlab="Consecutive Up or Down days",ylab="Conditional Probability of an Up day")
  abline(lm(y~x),col="red",lwd=1.5)
}
title(main=paste("Symbol",symbol),outer=T)
 
#Plot of consecutive up down days vs the number of occurences seen
dev.new()
par(oma=c(0,0,2,0))
par(mfrow=c(3,3))
 
for(i in seq(2012,2004,-1)){
  windowedData <- window(completeData,start=as.Date(paste(i,"-01-01",sep="")),end=as.Date(paste(i+1,"-01-01",sep="")))
  y <-as.matrix(by(as.vector(windowedData$ConsecutiveDir),list(ConsecutiveDir = windowedData$ConsecutiveDir),length)) #Count the number of occurrences of each consecutive run
  x <- as.matrix(as.numeric(as.matrix(rownames(y))))
  plot(y,xaxt="n",main=paste("Occurrences of consecutive Up / Down days (",i,",",i+1,")"), xlab="Consecutive Up or Down days",ylab="Occurrences of consecutive run",type="l")
  axis(1, at=1:length(x),labels=x)
}
title(main=paste("Symbol",symbol),outer=T)
 
predictionPerformance <- function(completeData,longThreshold,shortThreshold,nLookback,displayPlot){
    #Calcuate the probabiliy of an up day using a nLookback day window
    rollingProbOfAnUpday <- rollapply(completeData,FUN=probOfUpDayForUseWithRollApply,align="right",fill=NA,width=nLookback,by.column=FALSE)
    rollingProbOfAnUpday <- as.matrix(rollingProbOfAnUpday)
    print(head(rollingProbOfAnUpday))
    colnames(rollingProbOfAnUpday) <- "ProbTodayIsAnUpDay"
    completeData <- cbind(completeData,rollingProbOfAnUpday)
 
    suggestedTradeDir <- rollingProbOfAnUpday #Just to copy the structure
    colnames(suggestedTradeDir) <- "SuggestedTradeDir"
    suggestedTradeDir[rollingProbOfAnUpday>longThreshold] <- 1  #Long Trade
    suggestedTradeDir[rollingProbOfAnUpday<shortThreshold] <- -1 #Short Trade
    suggestedTradeDir[rollingProbOfAnUpday<=longThreshold & rollingProbOfAnUpday>=shortThreshold] <- 0 #Do nothing (includes probabilities exactly on a threshold)
    completeData <- cbind(completeData,suggestedTradeDir)
 
    isPredictionCorrect <- suggestedTradeDir #Just to copy structure
    isPredictionCorrect <- sign(completeData$SuggestedTradeDir * completeData$OpClRet) #sign(0) is 0 so will capture no trades as well
    isPredictionCorrect[is.na(isPredictionCorrect)] <- 0
    isPredictionCorrect[is.nan(isPredictionCorrect)] <- 0
    if(displayPlot){
      dev.new()
      plot(cumsum(isPredictionCorrect), main=paste("Market Direction Prediction Performance for",symbol,"(Probability Threshold, Long=",longThreshold,"Short=",shortThreshold,"Lookback=",nLookback,")"),xlab="Date",ylab="Cumulative Sum of Correct (+1) and Wrong(-1) Predictions")
      msgIncorrectPred <- (paste("Incorrect Predictions (out of the days when a prediction was made)",100*abs(sum(isPredictionCorrect[isPredictionCorrect==-1]))/length(isPredictionCorrect[isPredictionCorrect!=0]),"%"))
      msgCorrectPred <- (paste("Correct Predictions (out of the days when a prediction was made)",100*sum(isPredictionCorrect[isPredictionCorrect==1])/length(isPredictionCorrect[isPredictionCorrect!=0]),"%"))
      msgPercOfDaysWithPred <- (paste("Percent of days when a prediction was made",100*sum(abs(isPredictionCorrect[isPredictionCorrect!=0]))/length(isPredictionCorrect),"%"))
      legend("topleft",c(msgIncorrectPred,msgCorrectPred,msgPercOfDaysWithPred),bg="lightblue")
    }
    predictionPerformance <- sum(isPredictionCorrect[isPredictionCorrect==1])/length(isPredictionCorrect[isPredictionCorrect!=0])
}
 
 
predictionPerformance(completeData,longThreshold,shortThreshold,nLookback,TRUE)
 
#Parameter search
resultsMat <- matrix(numeric(0), 0,3)
colnames(resultsMat) <- c("Lookback","LongProbThreshold","Accuracy")
for(nLookback in seq(30,240,30)){
  for(longThreshold in seq(0.55,1,0.05)){
    shortThreshold <- 1-longThreshold
    accuracy <- predictionPerformance(completeData,longThreshold,shortThreshold,nLookback,FALSE) #Use the same lookback that is recorded in resultsMat
    resultsMat <- rbind(resultsMat,as.matrix(cbind(nLookback,longThreshold,accuracy)))
    print(resultsMat)
  }
}
print(resultsMat)
tt <-(as.matrix(as.data.frame(cast(as.data.frame(resultsMat),Lookback~LongProbThreshold))))
rownames(tt)<-tt[,"Lookback"]
heatmap.2(tt[,-1],key=TRUE,Rowv=FALSE,Colv=FALSE,xlab="Prob of Up Day Threshold",ylab="Lookback",trace="none",main=paste("Prediction Accuracy (correct predictions as % of all predictions) for ",symbol))

 


Skewness Revisited

In my last post http://gekkoquant.com/2013/03/03/trend-following-skewness-signal/ I suggested that measuring the skewness of asset returns can possibly be used to identify trends. Or mathematically more accurately put, it can identify when the return distribution is symmetric and hence NOT in a trending environment (assuming returns are Gaussian with 0 mean).

This post presents a simple regression of Skewness vs Asset returns. The skewness is calculated at time t using OpenCloseReturns[t-1,t-2,.....,t-lookback]. It is then regressed against OpenCloseReturn[t], where t denotes today and t-1 denotes yesterday.
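The alignment is the important part, so here is a small base R sketch of it. The helpers `sampleSkewness` and `laggedRollingSkew` are made up for illustration, the returns are simulated rather than market data, and the simple moment-based skewness here may differ slightly from the small-sample correction used by e1071's `skewness`.

```r
# Hypothetical helper: moment-based sample skewness
sampleSkewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Skewness at time t uses returns t-lookback ... t-1 only, never the return at t
laggedRollingSkew <- function(ret, lookback) {
  n <- length(ret)
  skew <- rep(NA_real_, n)
  for (t in (lookback + 1):n) {
    skew[t] <- sampleSkewness(ret[(t - lookback):(t - 1)])
  }
  skew
}

set.seed(1)
ret  <- rnorm(300, mean = 0, sd = 0.01)   # simulated returns, not market data
skew <- laggedRollingSkew(ret, lookback = 30)
fit  <- lm(ret ~ skew)                    # rows where skew is NA are dropped by lm
print(coef(fit))
```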

The data set is 2000 to present.

The red line in the above plot is a linear regression. It’s clear to see that the skewness doesn’t explain the Open Close Returns. I must eat humble pie, the returns seen in the previous post are down to a lucky parameter selection. Thanks must go to Pietro for testing over a longer period.

Onto the code:

library("quantmod")
library("PerformanceAnalytics")
library(e1071)     #For the skewness command
 
 
#Script parameters
symbol <- "^GSPC"     #Symbol
 
#Specify dates for downloading data
startDate = as.Date("2000-01-01") #Specify what date to get the prices from
symbolData <- new.env() #Make a new environment for quantmod to store data in
getSymbols(symbol, env = symbolData, src = "yahoo", from = startDate)
mktdata <- eval(parse(text=paste("symbolData$",sub("^","",symbol,fixed=TRUE))))
mktdata <- head(mktdata,-1) #Hack to fix some stupid duplicate date problem with yahoo
OpClRet  <- (Cl(mktdata)/Op(mktdata))    - 1
colnames(OpClRet) <- "OpClRet"
mktdata <- cbind(mktdata,OpClRet)
 
 
getSkewnessFromReturns <- function(mktdata,skewLookback){
  #Calculate the rolling skew and lag it by one day so the skew at time t only uses returns up to t-1
  rollingSkew <- Lag(rollapply(mktdata$OpClRet ,FUN="skewness",width=skewLookback, align="right",fill=NA),1)
  colnames(rollingSkew) <- "Skew"
  getSkewnessFromReturns <- na.omit(cbind(mktdata$OpClRet,rollingSkew))
}
 
 
 
dev.new()
par(mfrow=c(3,3))
startSkew <-30
stepSize <- 30
numOfSteps <- 9
 
doPlot <- function(x,y,title,xlabel,ylabel){
  #Function to plot data and add regression line
  plotData <-cbind(x,y)
  print(head(plotData))
  colnames(plotData) <-c("x","y")
  plot(coredata(plotData),type="p",main=title, xlab=xlabel,ylab=ylabel)
  abline(lm(y~x),col="red",lwd=1.5)
}
 
 
for(i in seq(startSkew,numOfSteps*stepSize,stepSize)){
dat <- getSkewnessFromReturns(mktdata,i)
colnames(dat) <- c("OpClRet","Skew")
doPlot(dat$Skew,dat$OpClRet,paste("Skewness vs Open Close Returns (lookback=",i,")"),"Skewness","Open Close Returns")
}

 


$20mm seed capital tournament, “The Sharpe Ratio Shootout”

Hi, I’ve been asked to advertise this seed capital tournament. I get asked to advertise all sorts of things but this looks very promising for beginner quant traders to launch their careers.

Looking forward to competing against some of you :)

Announcing: $20mm seed capital tournament, “The Sharpe Ratio Shootout” Registration ends March 15th, 2013.

Battle-Fin is launching its 4th systematic trading tournament.  Tournament Sponsors have pledged up to $20mm in allocations.

Battle-Fin uses cutting edge real-time tournaments to democratically identify tomorrow’s best trading strategies across asset classes.

The idea is to harness the power of the internet to identify trading strategies.

If you are a quantitative trader with a successful investment strategy, register for our next tournament by March 15th, 2013. There is no cost to enter and we do not ask to own trader IP.

REGISTER AT: www.battlefin.com

You can also contact us at (212) 201-5376 or info@battlefin.com

BATTLE-FIN IN THE NEWS

Business Week recently wrote a feature article on Battle-Fin highlighting the 4 winners that are currently trading their allocations.
Click: http://www.businessweek.com/articles/2012-12-20/the-hedge-fund-hunger-games

Here is a recent Bloomberg TV interview on Market Makers with Battle Fin.

http://bloom.bg/XFIluz
Like us on Facebook at: http://www.facebook.com/BattleFin


Trend Following – Skewness Signal

Oxford Dictionary Definition of a trend

  • noun: a general direction in which something is developing or changing
Often people try to capture a trend using technical indicators such as moving average crossovers. If there is a trend this technique is generally good at capturing it. However, moving averages struggle to identify sideways markets, where there is no trend to follow (see http://gekkoquant.com/2012/08/29/parameter-optimisation-backtesting-part-2/ for moving averages performing very well during 2009 when there was a real trend).

Is there a better way to identify a trend? Potentially. A common modelling assumption in finance is to say that stock returns follow Brownian motion and that over short periods the mean return is 0, or in other words daily returns are Gaussian distributed with 0 mean.

This post will use the same assumption: asset returns are Gaussian and hence symmetrically distributed around their mean (assume 0 mean). Symmetric distributions have a skewness of 0; any deviation from 0 indicates that one of the tails is beginning to get bigger and possibly that the stock is trending.

Skewness is a measure of asymmetry for a probability distribution; it tells us the amount and direction of the skew. This strategy will take a rolling distribution of returns and calculate the skew for those returns. The skew will then be used to take trades.

Strategy:
  • If skewness > UptrendSkewLimit then go Long
  • If skewness < DowntrendSkewLimit then go Short
  • If skewness between UptrendSkewLimit and DowntrendSkewLimit then returns are symmetrical, it’s a sideways market do nothing
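The three rules above can be sketched as a single function. `skewSignal` is a hypothetical helper, and its default limits are simply the parameter values used in the script further down.

```r
# Hypothetical helper turning a skewness value into a position
# +1 = long, -1 = short, 0 = sideways market (no trade)
skewSignal <- function(skew, uptrendSkewLimit = 0.3, downtrendSkewLimit = -0.5) {
  ifelse(skew > uptrendSkewLimit, 1, ifelse(skew < downtrendSkewLimit, -1, 0))
}

signals <- skewSignal(c(0.45, 0.10, -0.20, -0.80))
print(signals)
```

The limits do not have to be symmetric around zero, which is why they are kept as two separate parameters.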
 

 

Trend Following - Annualized Sharpe Ratio (Rf=0%) 0.7162438

S&P500 Long Open Close Returns - Annualized Sharpe Ratio (Rf=0%) 0.2129997

library("quantmod")
library("PerformanceAnalytics")
library(e1071)     #For the skewness command
 
#Model parameters
nLookback <- 30 #When calculating the rolling skew use the nLookback number of days
UptrendSkewLimit <- 0.3 #If the rolling skew is greater than this value go long
DowntrendSkewLimit <- -0.5 #If the rolling skew is lower than this value go short
#If the rolling skew is between the two limits do nothing, the skew is too weak to indicate a trend
 
#Script parameters
symbol <- "^GSPC"     #Symbol
 
#Specify dates for downloading data
startDate = as.Date("2005-01-01") #Specify what date to get the prices from
symbolData <- new.env() #Make a new environment for quantmod to store data in
getSymbols(symbol, env = symbolData, src = "yahoo", from = startDate)
mktdata <- eval(parse(text=paste("symbolData$",sub("^","",symbol,fixed=TRUE))))
mktdata <- head(mktdata,-1) #Hack to fix some stupid duplicate date problem with yahoo
dayOpClRet <- Cl(mktdata)/Op(mktdata)    - 1
cat("About to calculate the rolling skew")
 
#Lets calculate the rolling skew
#Lag the rolling skew by one day so the skew measured at the close of day T is shifted to day T+1
#The skew will be used to determine the trade at the open
rollingSkew <- Lag(rollapply(dayOpClRet,FUN="skewness",width=nLookback, align="right"),1)
 
#Possible improvement - Do a exponential moving average on the skew signal to smooth it
#rollingSkew <- EMA(rollingSkew,n=nLookback)
 
longSignals <- (rollingSkew>UptrendSkewLimit)
longReturns <- longSignals*dayOpClRet
shortSignals <- (rollingSkew<DowntrendSkewLimit)
shortReturns <- -1*shortSignals*dayOpClRet
totalReturns <- longReturns + shortReturns
#Uncomment the line below to increase the position size for larger skews
#totalReturns <- totalReturns * (abs(rollingSkew)+1)
totalReturns[is.na(totalReturns)] <- 0
 
GEKKORed <- rgb(255/255, 0/255, 0/255, 0.1)
GEKKOGreen <- rgb(0/255, 255/255, 0/255, 0.1)
 
dev.new()
par(mfrow=c(3,1))
plot(Cl(mktdata), main="Close of S&P 500")
lines(longSignals*max(Cl(mktdata)), col=(GEKKOGreen),type="h")
lines(shortSignals*max(Cl(mktdata)), col=(GEKKORed),type="h")
plot(rollingSkew)
abline(UptrendSkewLimit,0,col="green")
abline(DowntrendSkewLimit,0,col="red")
plot(cumsum(totalReturns), main="Cumulative Returns - Trend Following")
 
 
#### Performance Analysis ###
colnames(dayOpClRet) <- "Long IndexOpCloseRet"
zooTradeVec <- cbind(as.zoo(totalReturns),as.zoo(dayOpClRet)) #Convert to zoo object
colnames(zooTradeVec) <- c("Trend Following","S&P500 Long Open Close Returns")
zooTradeVec <- na.omit(zooTradeVec)
#Lets see how all the strategies fared against the index
dev.new()
charts.PerformanceSummary(zooTradeVec,main="Performance of Trend Following",geometric=FALSE)
 
#Calculate the sharpe ratio
cat("Sharpe Ratio")
print(SharpeRatio.annualized(zooTradeVec))

Statistical Arbitrage – Trading a cointegrated pair

In my last post http://gekkoquant.com/2012/12/17/statistical-arbitrage-testing-for-cointegration-augmented-dicky-fuller/ I demonstrated cointegration, a mathematical test to identify stationary pairs where the spread by definition must be mean reverting.

In this post I intend to show how to trade a cointegrated pair and will continue analysing Royal Dutch Shell A vs B shares (we know they’re cointegrated from my last post). Trading a cointegrated pair is straightforward: we know the mean and variance of the spread, and we know that those values are constant. The entry point for a stat arb is to simply look for a large deviation away from the mean.

A basic strategy is:

  • If spread(t) >= Mean Spread + 2*Standard Deviation then go Short
  • If spread(t) <= Mean Spread – 2*Standard Deviation then go Long

There are many variations of this strategy.

Moving average / moving standard deviation (this will be explored later):
  • If spread(t) >= nDay Moving Average + 2*nDay Rolling Standard Deviation then go Short
  • If spread(t) <= nDay Moving Average – 2*nDay Rolling Standard Deviation then go Long

Wait for mean reversion:
  • If spread(t) <= Mean Spread + 2*Std AND spread(t-1) > Mean Spread + 2*Std then go Short
  • If spread(t) >= Mean Spread – 2*Std AND spread(t-1) < Mean Spread – 2*Std then go Long
  • The advantage is that we only trade once we see the mean reversion start, whereas the other models are hoping for mean reversion after a large deviation from the mean (is the spread blowing up?)

All the above strategies look to exit their position when the spread has reverted to the mean. Personally I wouldn’t trade any of the above as they don’t specify an exit strategy for adverse trades. i.e. if there is a 6 standard deviation move in the spread, is this an amazing trade opportunity, or more likely did the spread just blow up?
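Here is a rough base R sketch of the moving average / rolling standard deviation entry rule. The helper `rollingBands` and the spread values are made up for illustration. The bands are trailing, so each day only uses data up to and including that day.

```r
# Hypothetical helper: trailing mean and standard deviation bands for a spread
rollingBands <- function(spread, lookback, nStd = 2) {
  n <- length(spread)
  avg <- rep(NA_real_, n)
  std <- rep(NA_real_, n)
  for (t in lookback:n) {
    win <- spread[(t - lookback + 1):t]   # only data up to and including day t
    avg[t] <- mean(win)
    std[t] <- sd(win)
  }
  data.frame(spread = spread, avg = avg, upper = avg + nStd * std, lower = avg - nStd * std)
}

spread <- c(0.1, -0.1, 0.0, 0.1, -0.1, 0.0, 1.0, 0.1, -1.5, 0.0)  # made-up spread values
bands  <- rollingBands(spread, lookback = 5, nStd = 1.5)

# -1 = short the spread above the upper band, +1 = long the spread below the lower band
entry <- ifelse(bands$spread >= bands$upper, -1, ifelse(bands$spread <= bands$lower, 1, 0))
entry[is.na(entry)] <- 0   # no signal until the lookback window is full
print(entry)
```

This only generates entry signals; holding the position until the spread reverts to the moving average needs some state, which is what the capped cumulative sum in the script below takes care of.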

This post will look at the moving average and rolling standard deviation model for Royal Dutch Shell A vs B shares, it will use the hedge ratio found in the last post.

Annualized Sharpe Ratio (Rf=0%):

Shell A & B Stat Arb 0.8224211

Shell A 0.166307

The stat arb has a superior Sharpe ratio compared with simply investing in Shell A. At first glance the Sharpe ratio of 0.8 looks disappointing, however since the strategy spends most of its time out of the market it will have a low annualized Sharpe ratio. To increase the Sharpe ratio one can trade at higher frequencies or run a portfolio of pairs so that more time is spent in the market.
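To see why time out of the market drags the annualized Sharpe ratio down, here is a toy sketch. It uses a simple arithmetic annualisation and assumes 252 trading days a year. `annualisedSharpe` is a hypothetical helper, and PerformanceAnalytics' `SharpeRatio.annualized` may annualise differently, so the numbers are only indicative.

```r
# Hypothetical helper: arithmetic annualised Sharpe ratio with a zero risk-free rate
# Assumes 252 trading days per year
annualisedSharpe <- function(dailyRet, daysPerYear = 252) {
  mean(dailyRet) / sd(dailyRet) * sqrt(daysPerYear)
}

inMarket   <- rep(c(0.02, -0.01), 128)                   # made-up returns, a position is held every day
mostlyFlat <- c(rep(c(0.02, -0.01), 32), rep(0, 192))    # same kind of trades, but flat on 75% of days

sharpeAlways <- annualisedSharpe(inMarket)
sharpeSparse <- annualisedSharpe(mostlyFlat)
print(c(always = sharpeAlways, sparse = sharpeSparse))
```

Being in the market a quarter of the time roughly halves the ratio in this example, which is why adding more pairs helps.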

Onto the code:

library("quantmod")
library("PerformanceAnalytics")
 
 
 
backtestStartDate = as.Date("2010-01-02") #Starting date for the backtest
 
symbolLst<-c("RDS-A","RDS-B")
title<-c("Royal Dutch Shell A vs B Shares")
 
### SECTION 1 - Download Data & Calculate Returns ###
#Download the data
symbolData <- new.env() #Make a new environment for quantmod to store data in
getSymbols(symbolLst, env = symbolData, src = "yahoo", from = backtestStartDate)
 
#We know this pair is cointegrated from the tutorial
#http://gekkoquant.com/2012/12/17/statistical-arbitrage-testing-for-cointegration-augmented-dicky-fuller/
#The tutorial found the hedge ratio to be 0.9653
stockPair <- list(
 a = coredata(Cl(eval(parse(text=paste("symbolData$\"",symbolLst[1],"\"",sep="")))))   #Stock A
,b = coredata(Cl(eval(parse(text=paste("symbolData$\"",symbolLst[2],"\"",sep=""))))) #Stock B
,hedgeRatio = 0.9653
,name=title)
 
simulateTrading <- function(stockPair){
#Generate the spread
spread <- stockPair$a - stockPair$hedgeRatio*stockPair$b
 
#Strategy is if the spread is greater than +/- nStd standard deviations of it's rolling 'lookback' day standard deviation
#Then go long or short accordingly
lookback <- 90 #look back 90 days
nStd <- 1.5 #Number of standard deviations from the mean to trigger a trade
 
movingAvg = rollmean(spread,lookback,align="right",fill=NA) #Moving average, right aligned so that only past data is used
movingStd = rollapply(spread,lookback,sd,align="right",fill=NA) #Moving standard deviation / bollinger bands
 
upperThreshold = movingAvg + nStd*movingStd
lowerThreshold = movingAvg - nStd*movingStd
 
aboveUpperBand <- spread>upperThreshold
belowLowerBand <- spread<lowerThreshold
 
aboveMAvg <- spread>movingAvg
belowMAvg <- spread<movingAvg
 
aboveUpperBand[is.na(aboveUpperBand)]<-0
belowLowerBand[is.na(belowLowerBand)]<-0
aboveMAvg[is.na(aboveMAvg)]<-0
belowMAvg[is.na(belowMAvg)]<-0
 
#The cappedCumSum function is where the magic happens
#It's a nice trick to avoid writing a while loop
#The function basically does a cumulative sum, but caps the sum to a min and max value
#It's used so that if we get many 'short sell triggers' it will only execute a maximum of 1 position
#Short position - Go short if the spread is above the upper threshold and close the short once it is below the moving avg
#Long position - Go long if the spread is below the lower threshold and close the long once it is above the moving avg
#Note: shortPositionFunc only lets us GO short or close the position
cappedCumSum <- function(x, y,max_value,min_value) max(min(x + y, max_value), min_value)
shortPositionFunc <- function(x,y) { cappedCumSum(x,y,0,-1) }
longPositionFunc <- function(x,y) { cappedCumSum(x,y,1,0) }
shortPositions <- Reduce(shortPositionFunc,-1*aboveUpperBand+belowMAvg,accumulate=TRUE)
longPositions <- Reduce(longPositionFunc,-1*aboveMAvg+belowLowerBand,accumulate=TRUE)
positions = longPositions + shortPositions
 
dev.new()
par(mfrow=c(2,1))
plot(movingAvg,col="red",ylab="Spread",type='l',lty=2)
title("Shell A vs B spread with bollinger bands")
lines(upperThreshold, col="red")
lines(lowerThreshold, col="red")
lines(spread, col="blue")
legend("topright", legend=c("Spread","Moving Average","Upper Band","Lower Band"), inset = .02,
lty=c(1,2,1,1),col=c("blue","red","red","red")) # gives the legend lines the correct color and width
 
plot((positions),type='l')
 
#Calculate the daily return of the spread (long A, short hedgeRatio * B)
aRet <- Delt(stockPair$a,k=1,type="arithmetic")
bRet <- Delt(stockPair$b,k=1,type="arithmetic")
dailyRet <- aRet - stockPair$hedgeRatio*bRet
dailyRet[is.na(dailyRet)] <- 0
 
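#A position decided at the close of day t earns the return of day t+1, so lag the positions by one day
tradingRet <- dailyRet * c(0,head(positions,-1))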
    simulateTrading <- tradingRet
}
 
tradingRet <- simulateTrading(stockPair)
 
 
#### Performance Analysis ###
#Calculate returns for the index
indexRet <- Delt(Cl(eval(parse(text=paste("symbolData$\"",symbolLst[1],"\"",sep="")))),k=1,type="arithmetic") #Daily returns
indexRet <- as.zoo(indexRet)
tradingRetZoo <- indexRet
tradingRetZoo[,1] <- tradingRet
zooTradeVec <- as.zoo(cbind(tradingRetZoo,indexRet)) #Convert to zoo object
colnames(zooTradeVec) <- c("Shell A & B Stat Arb","Shell A")
zooTradeVec <- na.omit(zooTradeVec)
 
#Lets see how all the strategies fared against the index
dev.new()
charts.PerformanceSummary(zooTradeVec,main="Performance of Shell Statarb Strategy",geometric=FALSE)
 
cat("Sharpe Ratio")
print(SharpeRatio.annualized(zooTradeVec))

Statistical Arbitrage – Testing for Cointegration – Augmented Dickey-Fuller

In my last post “Statistical Arbitrage Correlation vs Cointegration“, we discussed what statistical arbitrage is and what property of a pair of stocks we aim to exploit. The mathematics essentially showed that if you go long stock A and short stock B with some appropriate hedging factor to cancel out the drift/growth terms in the Brownian motion equation then you are left with a stationary signal which is the spread between the two stocks. The maths also showed that in expectation the daily change in the spread is zero (E_{t}[\Delta Spread]=0) and hence any deviation from this presents the opportunity for a trade.

It was assumed that the stock growth terms \mu_{a} and \mu_{b} are constant (or drifting slowly over time both at the same rate) or in other words the hedge ratio is constant. This post will detail how to find the hedge ratio and present the Augmented Dickey-Fuller test, which can be used to identify with a certain level of confidence whether the spread is stationary and hence whether the pair is cointegrated.

Determining the hedge factor

Spread=A-nB set the spread to 0

It is required to find the hedge factor, n, to satisfy the above equation. The hedge factor is easily found by regressing the price of A against the price of B. Notice that it is the prices being regressed and not the daily returns.
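A minimal sketch of that regression on simulated prices is shown below. The data is made up (B is a random walk and A tracks 0.97 times B plus noise), and fitting through the origin is an assumption chosen here to match Spread=A-nB; a regression with an intercept is an equally common choice.

```r
set.seed(42)
# Simulated prices, not market data: B is a random walk, A tracks 0.97 * B plus stationary noise
b <- 30 + cumsum(rnorm(500, sd = 0.2))
a <- 0.97 * b + rnorm(500, sd = 0.1)

fit <- lm(a ~ b + 0)                  # regress prices (not returns), no intercept so Spread = A - nB
hedgeRatio <- unname(coef(fit)[1])    # the estimated hedge factor n
spread <- a - hedgeRatio * b          # spread implied by the fitted hedge factor
print(hedgeRatio)
```

The `spread` series is what would then be handed to a stationarity test.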

Examining the residuals

Once the hedge factor has been found the residuals of the regression must be analysed. The residual is how much the spread has changed for a given day, the whole idea of the regression was to identify the hedge factor that keeps the daily change of the spread as close to zero as possible (we want the residuals to be zero).

  • If the residuals contain a trend then this implies that the daily change of the spread has a net direction and will cause the spread to consistently widen or contract.
  • If the residuals contain no trend then this implies that the daily change of the spread will oscillate around zero (is stationary).

What residuals to analyse is open to debate, you may want to take another data set from a different point in time and apply the hedge factor you calculated in the above regression to this new set and analyse the residuals, this helps to identify an over fitting during the regression.

Dicky Fuller Test

For an AR(1) process y_{t}=ay_{t-1}+\epsilon_{t} where y_{0}=0 and \epsilon \sim N(0,\sigma^{2}) what value of a results in a stationary signal?

y_{1}=ay_{0}+\epsilon_{1}

y_{2}=ay_{1}+\epsilon_{2}=a(ay_{0}+\epsilon_{1})+\epsilon_{2}=a^{2}y_{0}+a\epsilon_{1}+\epsilon_{2}

y_{t}=a^{t}y_{0}+\Sigma_{j=0}^{t-1}a^{j}\epsilon_{t-j}

Take expectations, we already know y_{0}=0 and E[\epsilon_{t}]=0

E[y_{t}]=a^{t}E[y_{0}]+\Sigma_{j=0}^{t-1}a^{j}E[\epsilon_{t-j}]

E[y_{t}]=a^{t}*0+\Sigma_{j=0}^{t-1}a^{j}*0=0 the mean is zero and constant

Examining the variance (the \epsilon terms are independent, so their variances add and each coefficient is squared)

V[y_{t}]=0+\Sigma_{j=0}^{t-1}a^{2j}V[\epsilon_{t-j}]=\sigma^{2}\Sigma_{j=0}^{t-1}a^{2j}

  • If |a| < 1 then as t \to \infty, V[y_{t}] \to \frac{\sigma^{2}}{1-a^{2}}, a constant
  • If |a| >= 1 then V[y_{t}] grows with time, hence the signal is not stationary

This forms the premise of the Dickey-Fuller test: perform a linear regression on the residuals and see what the value of a is. If it’s greater than or equal to one, this indicates the signal isn’t stationary and therefore isn’t suitable for stat arb. The Dickey-Fuller test was invented by D.A. Dickey and W.A. Fuller, who produced the Dickey-Fuller distribution relating the number of test samples and the value of a to the probability of a unit root. Further reading can be found at Dickey Fuller Test – Wikipedia. The Dickey-Fuller test is a hypothesis test whose null hypothesis is that the signal contains a unit root; we want to reject this hypothesis (a unit root means our signal is non-stationary). The test gives a pValue: the lower this number, the more confident we can be that we have found a stationary signal. pValues less than 0.1 are considered to be good candidates.
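To make the idea concrete, here is a small base R sketch that simulates the AR(1) process for a stationary case (a = 0.5) and a unit root case (a = 1), then computes the simplest Dickey-Fuller statistic by hand. The helper names `simulateAR1` and `dfStatistic` are hypothetical, invented for this example. The regression used here has no constant, no trend and no lags, whereas adf.test in tseries includes a constant and a linear trend, so the numbers will not match its output. Turning the statistic into a pValue needs the Dickey-Fuller tables, which this sketch does not attempt.

```r
set.seed(123)
nObs <- 1000
eps <- rnorm(nObs)

# Simulate y[t] = a * y[t-1] + eps[t] starting from y[0] = 0
simulateAR1 <- function(a, eps) {
  y <- numeric(length(eps))
  y[1] <- eps[1]
  for (t in 2:length(eps)) {
    y[t] <- a * y[t - 1] + eps[t]
  }
  y
}

# Dickey-Fuller regression without lags: diff(y)[t] = (a - 1) * y[t-1] + error
# The test statistic is the t-value of the coefficient on y[t-1]
dfStatistic <- function(y) {
  dy   <- diff(y)
  yLag <- y[-length(y)]
  fit  <- lm(dy ~ yLag + 0)
  summary(fit)$coefficients["yLag", "t value"]
}

stationary <- simulateAR1(0.5, eps)  # |a| < 1
randomWalk <- simulateAR1(1.0, eps)  # a = 1, unit root

# Sample variance of the stationary series vs the theoretical limit sigma^2 / (1 - a^2)
print(c(sampleVar = var(stationary), theoryVar = 1 / (1 - 0.5^2)))

# A strongly negative statistic is evidence against a unit root
print(c(stationary = dfStatistic(stationary), randomWalk = dfStatistic(randomWalk)))
```

The stationary series should give a far more negative statistic than the random walk, which is exactly the separation the test relies on.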

Royal Dutch Shell Case Study
As mentioned in the previous post, two different shares for the same company are usually good stat arb candidates. Here the hypothesis that Royal Dutch Shell A & B shares are cointegrated will be tested.

The beta or Hedge Ratio is: 0.965378555498888

Augmented Dickey-Fuller Test

Dickey-Fuller = -3.6192, Lag order = 0, p-value = 0.03084
alternative hypothesis: stationary

The low pValue indicates that this pair is cointegrated.

Royal Bank of Scotland vs Barclays Case Study

The beta or Hedge Ratio is: 0.645507426925485

Augmented Dickey-Fuller Test

Dickey-Fuller = -1.6801, Lag order = 0, p-value = 0.7137
alternative hypothesis: stationary

The pValue indicates that the spread isn’t stationary, and the bottom right graph of the spread is definitely in a strong trend and isn’t range bound. This would have been a horrific stat arb!
On to the code:
library("quantmod")
library("PerformanceAnalytics")
library("fUnitRoots")
library("tseries")
 
#Specify dates for downloading data, training models and running simulation
hedgeTrainingStartDate = as.Date("2008-01-01") #Start date for training the hedge ratio
hedgeTrainingEndDate = as.Date("2010-01-01") #End date for training the hedge ratio
 
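#Choose ONE pair to test and comment out the other (a later assignment overwrites an earlier one)
#symbolLst<-c("RDS-A","RDS-B")
#title<-c("Royal Dutch Shell A vs B Shares")
 
symbolLst<-c("RBS.L","BARC.L")
title<-c("Royal Bank of Scotland vs Barclays")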
 
### SECTION 1 - Download Data & Calculate Returns ###
#Download the data
symbolData <- new.env() #Make a new environment for quantmod to store data in
getSymbols(symbolLst, env = symbolData, src = "yahoo", from = hedgeTrainingStartDate, to=hedgeTrainingEndDate)
 
stockPair <- list(
 a = coredata(Cl(symbolData[[symbolLst[1]]]))   #Stock A, looked up by name in the environment
,b = coredata(Cl(symbolData[[symbolLst[2]]])) #Stock B
,name=title)
 
 
 
 
testForCointegration <- function(stockPairs){
#Pass in a pair of stocks and do the necessary checks to see if it is cointegrated
 
#Plot the pairs
 
dev.new()
par(mfrow=c(2,2))
print(c((0.99*min(rbind(stockPairs$a,stockPairs$b))),(1.01*max(rbind(stockPairs$a,stockPairs$b)))))
plot(stockPairs$a, main=stockPairs$name, ylab="Price", type="l", col="red",ylim=c((0.99*min(rbind(stockPairs$a,stockPairs$b))),(1.01*max(rbind(stockPairs$a,stockPairs$b)))))
lines(stockPairs$b, col="blue")
  print(length(stockPairs$a))
print(length(stockPairs$b))
#Step 1: Calculate the daily returns
dailyRet.a <- na.omit((Delt(stockPairs$a,type="arithmetic")))
dailyRet.b <- na.omit((Delt(stockPairs$b,type="arithmetic")))
dailyRet.a <- dailyRet.a[is.finite(dailyRet.a)] #Strip out any Infs (first ret is Inf)
dailyRet.b <- dailyRet.b[is.finite(dailyRet.b)]
 
print(length(dailyRet.a))
print(length(dailyRet.b))
#Step 2: Regress the daily returns onto each other
#Regression finds BETA in retA = BETA * retB (the "+ 0" forces the intercept C to zero)
regression <- lm(dailyRet.a ~ dailyRet.b + 0)
beta <- coef(regression)[1]
print(paste("The beta or Hedge Ratio is: ",beta,sep=""))
plot(x=dailyRet.b,y=dailyRet.a,type="p",main="Regression of RETURNS for Stock A & B") #Plot the daily returns
lines(x=dailyRet.b,y=(dailyRet.b*beta),col="blue")#Plot in linear line we used in the regression
 
 
#Step 3: Use the regression co-efficients to generate the spread
spread <- stockPairs$a - beta*stockPairs$b #Price spread using the hedge ratio (beta was fitted on returns, so this is not the regression residual)
spreadRet <- Delt(spread,type="arithmetic")
spreadRet <- na.omit(spreadRet)
#spreadRet[!is.na(spreadRet)]
plot((spreadRet), type="l",main="Spread Returns") #Plot the daily returns of the spread
plot(spread, type="l",main="Spread Actual") #Plot the spread itself
#For a cointegrated spread the spread should not deviate very far from its mean
#For a non-cointegrated spread the spread will likely show some trending characteristics
 
#Step 4: Use the ADF to test if the spread is stationary
#adf.test comes from the tseries library
adfResults <- adf.test((spread),k=0,alternative="stationary")
 
print(adfResults)
if(adfResults$p.value <= 0.05){
print(paste("The spread is likely Cointegrated with a pvalue of ",adfResults$p.value,sep=""))
} else {
print(paste("The spread is likely NOT Cointegrated with a pvalue of ",adfResults$p.value,sep=""))
}
 
}
 
 
testForCointegration(stockPair)

Statistical Arbitrage – Correlation vs Cointegration

What is statistical arbitrage (stat arb)?

The premise of statistical arbitrage, stat arb for short, is that there is a statistical mispricing between a set of securities which we look to exploit. Typically a strategy requires going long a set of stocks and short another. StatArb evolved from pairs trading, where one would go long a stock and short its competitor as a hedge; in pairs trading the aim is to select a stock that is going to outperform its peers. StatArb is all about mean reversion: in essence you are saying that the spread between any two stocks should be constant (or slowly evolving throughout time), and any deviation from the spread presents a trading opportunity since in StatArb we believe the spread is mean reverting. Contrary to the name, statistical arbitrage isn’t about making risk free money (deterministic arbitrage is risk free).

What type of stocks make good pairs?

The best stocks to use in StatArb are those where there is a fundamental reason for believing that the spread is mean reverting / stationary. Typically this means that the stocks are in the same market sector or, even better, the same company (some companies have A and B shares with different voting rights or trade on different exchanges)! Some examples of fundamentally similar pairs would be Royal Dutch Shell A vs Royal Dutch Shell B shares, Goldman Sachs vs JP Morgan, Apple vs ARM (their chip supplier) and ARM vs ARM ADR. Some cross sector groups may also work, such as Gold Mining vs Gold Price.

A poor example would be Royal Bank of Scotland vs Tesco since their businesses are completely different / don’t impact each other.

What is the mathematical definition of a good pair?

Upon coming up with a good fundamental stock pairing, you next need a mathematical test for determining if it’s a good pair. The most common test is to look for cointegration (http://en.wikipedia.org/wiki/Cointegration), as this would imply that the pair is a stationary pair (the spread is fixed) and hence statistically it is mean reverting. When testing for cointegration a hypothesis test is performed and its result is reported as a P-value (http://en.wikipedia.org/wiki/P-value), so we can express a level of confidence in the pair being mean reverting.

What is the difference between correlation and cointegration?

When talking about statistical arbitrage, many people get confused between correlation and cointegration.

  • Correlation – If two stocks are correlated then when stock A has an up day stock B will tend to have an up day too
  • Cointegration – If two stocks are cointegrated then it is possible to form a stationary pair from some linear combination of stock A and B

One of the best explanations of cointegration is as follows: “A man leaves a pub to go home with his dog, the man is drunk and goes on a random walk, the dog also goes on a random walk. They approach a busy road and the man puts his dog on a lead, the man and the dog are now cointegrated. They can both go on random walks but the maximum distance they can move away from each other is fixed, i.e. the length of the lead”. So in essence the distance/spread between the man and his dog is fixed. Also note from the story that the man and dog are still on a random walk; there is nothing to say whether their movements are correlated or uncorrelated.

Correlated stocks will move in the same direction most of the time, however the magnitude of the moves is unknown. This means that if you’re trading the spread between two stocks then the spread can keep growing and growing, showing no signs of mean reversion. This is in contrast to cointegration, where we say the spread is “fixed” and that if the spread deviates from the “fixing” then it will mean revert.

Let’s explore cointegration some more:

Equation for Brownian motion with drift

A_{t+T} = A_{t}+N(\mu_{a}T,\sigma_{a}\sqrt{T})

where A stands for the price of stock A
\Delta A = N(\mu_{a}\Delta t,\sigma_{a}\sqrt{\Delta t})
\Delta A = \mu_{a}\Delta t+N(0,\sigma_{a}\sqrt{\Delta t})
E_{t}[\Delta A] = \mu_{a}\Delta t
From the Brownian motion equation we can see that the expected change in A per unit time (E_{t}[\Delta A] with \Delta t=1) is \mu_{a}, in other words A is not stationary (assuming \mu_{a} isn’t zero).

We want to find a cointegrated / stationary pair.

Equation for going long stock A and short n lots of stock B
Spread=A-nB

\Delta Spread=\Delta A-n\Delta B
\Delta Spread=\mu_{a}\Delta t+N(0,\sigma_{a}\sqrt{\Delta t})-n\mu_{b}\Delta t-nN(0,\sigma_{b}\sqrt{\Delta t})
E_{t}[\Delta Spread]=E_{t}[\Delta A-n\Delta B] = (\mu_{a}-n\mu_{b})\Delta t
Setting n=\frac{\mu_{a}}{\mu_{b}}
E_{t}[\Delta Spread]=E_{t}[\Delta A-n\Delta B] = (\mu_{a}-\frac{\mu_{a}\mu_{b}}{\mu_{b}})\Delta t=(\mu_{a}-\mu_{a})\Delta t=0
Hence the spread has no drift and is stationary, assuming that the hedge ratio, n, remains constant!
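A quick numerical sanity check of that derivation, as a base R sketch. The drift and volatility values are illustrative assumptions, \Delta t is taken as one day, and the two noise terms are drawn independently for simplicity.

```r
set.seed(7)

# Illustrative inputs (assumed values, delta t = 1 day)
mu_a <- 0.0002;  sigma_a <- 0.010   # Drift and volatility of stock a
mu_b <- 0.0005;  sigma_b <- 0.005   # Drift and volatility of stock b
nsim <- 100000                      # Number of simulated daily changes

# Daily changes of each stock: drift plus zero mean normal noise
dA <- rnorm(nsim, mean = mu_a, sd = sigma_a)
dB <- rnorm(nsim, mean = mu_b, sd = sigma_b)

n <- mu_a / mu_b            # Hedge ratio from the derivation
dSpread <- dA - n * dB      # Daily change of the spread A - nB

# Theoretical expected change of the spread, zero up to floating point error
print(mu_a - n * mu_b)

# Simulated averages: dA keeps its drift, the spread change averages out near zero
print(c(meanChangeA = mean(dA), meanChangeSpread = mean(dSpread)))
```

The check only confirms that the drift cancels. It says nothing about whether a real pair keeps a constant hedge ratio, which is the assumption doing all the work.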

 

Example of correlated stocks : Notice the spread blowing up

Example of cointegrated stocks: Notice the spread looks oscillatory

Onto the code to generate those plots (mainly taken from http://quant.stackexchange.com/questions/1027/how-are-correlation-and-cointegration-related)

 library(MASS)
 
 #Code largely copied from http://quant.stackexchange.com/questions/1027/how-are-correlation-and-cointegration-related
 
 #The input data
 nsim <- 250  #Number of data points
 mu_a <- 0.0002  #Mu_a growth rate for stock a
 sigma_a <- 0.010   #Sigma_a volatility for stock a
 mu_b <- 0.0005  #Mu_b growth rate for stock b
 sigma_b <- 0.005   #Sigma_b volatility for stock b
 corxy <- 0.8    #Correlation coefficient between the returns of a and b
 
 #Calculate a correlated return series
 #Build the covariance matrix and generate the correlated random results
 (covmat <- matrix(c(sigma_a^2, corxy*sigma_a*sigma_b, corxy*sigma_a*sigma_b, sigma_b^2), nrow=2))
 res <- mvrnorm(nsim, c(mu_a, mu_b), covmat)    #Calculate multivariate normal distribution
 plot(res[,1], res[,2])
 
 #Calculate the stats of res[] so they can be checked with the input data
 mean(res[,1])
 sd(res[,1])
 mean(res[,2])
 sd(res[,2])
 cor(res[,1], res[,2])
 
 path_a <- exp(cumsum(res[,1]))
 path_b <- exp(cumsum(res[,2]))
 spread <- path_a - path_b
 #Set the plotting area to a 2 by 1 grid
 layout(rbind(1,2))
 #Plot the two price series that have correlated returns
 plot(path_a, main="Two Price Series with Correlated Returns", ylab="Price", type="l", col="red")
 lines(path_b, col="blue")
  plot(spread, type="l")
 
 
 ##Cointegrated pair
 #The input data
 nsim <- 250  #Number of data points
 mu_a <- 0.0002  #Mu_a growth rate for stock a
 sigma_a <- 0.010   #Sigma_a volatility for stock a
 mu_b <- 0.0002  #Mu_b growth rate for stock b
 sigma_b <- 0.005   #Sigma_b volatility for stock b
coea <- 0.0200    #Co-integration coefficient for a
coeb <- 0.0200    #Co-integration coefficient for b
 
#Generate the noise terms for x and y
rana <- rnorm(nsim, mean=mu_a, sd=sigma_a) #White noise for a
ranb <- rnorm(nsim, mean=mu_b, sd=sigma_b) #White noise for b
 
#Generate the co-integrated series x and y
a <- numeric(nsim)
b <- numeric(nsim)
a[1] <- 0
b[1] <- 0
for (i in 2:nsim) {
#Logic here is that if b>a then we add on a fraction of the difference so that
#a starts to catch up with b, hence causing the spread to close
  a[i] <- a[i-1] + (coea * (b[i-1] - a[i-1])) + rana[i-1]
  b[i] <- b[i-1] + (coeb * (a[i-1] - b[i-1])) + ranb[i-1]
}
 
#Plot a and b as prices
ylim <- range(exp(a), exp(b))
path_a <- exp(a)
path_b <- exp(b)
spread <- path_a - path_b
 
dev.new()
layout(rbind(1,2))
plot(path_a, ylim=ylim, type="l", main=paste("Co-integrated Pair (coea=",coea,",  coeb=",coeb,")", sep=""), ylab="Price", col="red")
lines(path_b, col="blue")
legend("bottomleft", c("exp(a)", "exp(b)"), lty=c(1, 1), col=c("red", "blue"), bg="white")
 
plot(spread,type="l")

Parameter Optimisation & Backtesting – Part 2

This is a follow on from: http://gekkoquant.com/2012/08/29/parameter-optimisation-backtesting-part1/

The code presented here will aim to optimise a strategy based upon the simple moving average indicator. The strategy will go long when moving average A > moving average B (the code below goes short otherwise). The optimisation is to determine what period to make each of the moving averages A & B.

Please note that this isn’t intended to be a good strategy, it is merely here to give an example of how to optimise a parameter.

Onto the code:

Functions

  • TradingStrategy this function implements the trading logic and calculates the returns
  • RunIterativeStrategy this function iterates through possible parameter combinations and calls TradingStrategy for each new parameter set
  • CalculatePerformanceMetric takes in a table of returns (from RunIterativeStrategy) and runs a function/metric over each set of returns.
  • PerformanceTable calls CalculatePerformanceMetric for lots of different metrics and compiles the results into a table
  • OrderPerformanceTable lets us order the performance table by a given metric, i.e. order by highest Sharpe ratio
  • SelectTopNStrategies selects the best N strategies for a specified performance metric (charts.PerformanceSummary can only plot ~20 strategies, hence this function to select a sample)
  • FindOptimumStrategy does what it says on the tin
Note that when performing the out of sample test, you will need to manually specify the parameter set that you wish to use.
 
library("quantmod")
library("PerformanceAnalytics")
 
 
nameOfStrategy <- "GSPC Moving Average Strategy"
 
#Specify dates for downloading data, training models and running simulation
trainingStartDate = as.Date("2000-01-01")
trainingEndDate = as.Date("2010-01-01")
outofSampleStartDate = as.Date("2010-01-02")
 
 
#Download the data
symbolData <- new.env() #Make a new environment for quantmod to store data in
getSymbols("^GSPC", env = symbolData, src = "yahoo", from = trainingStartDate)
trainingData <- window(symbolData$GSPC, start = trainingStartDate, end = trainingEndDate)
testData <- window(symbolData$GSPC, start = outofSampleStartDate)
indexReturns <- Delt(Cl(window(symbolData$GSPC, start = outofSampleStartDate)))
colnames(indexReturns) <- "GSPC Buy&Hold"
 
TradingStrategy <- function(mktdata,mavga_period,mavgb_period){
  #This is where we define the trading strategy
  #Check moving averages at start of the day and use as the direction signal
  #Enter trade at the start of the day and exit at the close
 
  #Lets print the name of whats running
  runName <- paste("MAVGa",mavga_period,".b",mavgb_period,sep="")
  print(paste("Running Strategy: ",runName))
 
  #Calculate the Open Close return
  returns <- (Cl(mktdata)/Op(mktdata))-1
 
  #Calculate the moving averages
  mavga <- SMA(Op(mktdata),n=mavga_period)
  mavgb <- SMA(Op(mktdata),n=mavgb_period)
 
  signal <- mavga / mavgb
  #If mavga > mavgb go long
  signal <- apply(signal,1,function (x) { if(is.na(x)){ return (0) } else { if(x>1){return (1)} else {return (-1)}}})
 
  tradingreturns <- signal * returns
  colnames(tradingreturns) <- runName
 
  return (tradingreturns)
}
 
RunIterativeStrategy <- function(mktdata){
  #This function will run the TradingStrategy
  #It will iterate over a given set of input variables
  #In this case we try lots of different periods for the moving average
  firstRun <- TRUE
    for(a in 1:10) {
        for(b in 1:10) {
 
          runResult <- TradingStrategy(mktdata,a,b)
 
          if(firstRun){
              firstRun <- FALSE
              results <- runResult
          } else {
              results <- cbind(results,runResult)
          }
        }
    }
 
   return(results)
}
 
CalculatePerformanceMetric <- function(returns,metric){
  #Get given some returns in columns
  #Apply the function metric to the data
 
  print (paste("Calculating Performance Metric:",metric))
 
  metricFunction <- match.fun(metric)
  metricData <- as.matrix(metricFunction(returns))
  #Some functions return the data the wrong way round (one row instead of one column)
  #Hence we can't label the columns directly, so check and transpose if needed
  if(nrow(metricData) == 1){
    metricData <- t(metricData)
  }
  colnames(metricData) <- metric
 
  return (metricData)
}
 
 
 
PerformanceTable <- function(returns){
  pMetric <- CalculatePerformanceMetric(returns,"colSums")
  pMetric <- cbind(pMetric,CalculatePerformanceMetric(returns,"SharpeRatio.annualized"))
  pMetric <- cbind(pMetric,CalculatePerformanceMetric(returns,"maxDrawdown"))
  colnames(pMetric) <- c("Profit","SharpeRatio","MaxDrawDown")
 
  print("Performance Table")
  print(pMetric)
  return (pMetric)
}
 
OrderPerformanceTable <- function(performanceTable,metric){
return (performanceTable[order(performanceTable[,metric],decreasing=TRUE),])
}
 
SelectTopNStrategies <- function(returns,performanceTable,metric,n){
#Metric is the name of the function to apply to the column to select the Top N
#n is the number of strategies to select
  pTab <- OrderPerformanceTable(performanceTable,metric)
 
  if(n > ncol(returns)){
     n <- ncol(returns)
  }
  strategyNames <- rownames(pTab)[1:n]
  topNMetrics <- returns[,strategyNames]
  return (topNMetrics)
}
 
FindOptimumStrategy <- function(trainingData){
  #Optimise the strategy
  trainingReturns <- RunIterativeStrategy(trainingData)
  pTab <- PerformanceTable(trainingReturns)
  toptrainingReturns <- SelectTopNStrategies(trainingReturns,pTab,"SharpeRatio",5)
  charts.PerformanceSummary(toptrainingReturns,main=paste(nameOfStrategy,"- Training"),geometric=FALSE)
  return (pTab)
}
 
pTab <- FindOptimumStrategy(trainingData) #pTab is the performance table of the various parameters tested
 
#Test out of sample
dev.new()
#Manually specify the parameter that we want to trade here, just because a strategy is at the top of
#pTab it might not be good (maybe due to overfit)
outOfSampleReturns <- TradingStrategy(testData,mavga_period=9,mavgb_period=6)
finalReturns <- cbind(outOfSampleReturns,indexReturns)
charts.PerformanceSummary(finalReturns,main=paste(nameOfStrategy,"- Out of Sample"),geometric=FALSE)

Parameter Optimisation & Backtesting – Part 1

When developing a strategy that uses a technical indicator, such as a moving average, it is often difficult to decide on what period to use with the indicator. Should it look at the last 10, 20, or n days? The simple solution is just to iterate over many different parameters and look for the parameter combination that optimises a performance metric, perhaps maximising the Sharpe Ratio, minimising the Max Drawdown, or some combination of both.

This method must be approached in a sensible manner: essentially we are fitting the strategy to the data and must be careful not to overfit and capture spurious profits. It is likely that any parameter that is trained will vary over time, so the strategy must be able to deal with changes to this parameter.

For example, say we are tuning the variable A and we find that the optimum solution is A=n in the training data (it is very profitable with a high Sharpe ratio). How well does the strategy perform when we set A=n+1 or A=n-1? Do we still get good results? If you don’t, it’s an indicator that you may have overfitted.
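One way to sketch that neighbour check in base R is to smooth the performance grid: replace each cell with the average of itself and its adjacent parameter values, then pick the best smoothed cell. The Sharpe ratios below are made-up numbers and `neighbourhoodMean` is a hypothetical helper, so treat this as an illustration of the idea rather than a recommended procedure.

```r
# Hypothetical Sharpe ratios for a 3 x 3 grid of parameter values (made-up numbers)
perf <- rbind(c(2.0, 0.1, 0.2),
              c(0.1, 1.0, 1.1),
              c(0.2, 1.1, 1.2))

# Average the metric over a cell and its immediate neighbours (clipped at the grid edges)
neighbourhoodMean <- function(perf, i, j) {
  rows <- max(1, i - 1):min(nrow(perf), i + 1)
  cols <- max(1, j - 1):min(ncol(perf), j + 1)
  mean(perf[rows, cols])
}

smoothed <- perf
for (i in seq_len(nrow(perf))) {
  for (j in seq_len(ncol(perf))) {
    smoothed[i, j] <- neighbourhoodMean(perf, i, j)
  }
}

bestRaw      <- which(perf == max(perf), arr.ind = TRUE)
bestSmoothed <- which(smoothed == max(smoothed), arr.ind = TRUE)

print(bestRaw)       # The isolated spike at row 1, column 1
print(bestSmoothed)  # The plateau of consistently good neighbours at row 3, column 3
```

The raw optimum is a lone spike surrounded by poor results, while the smoothed optimum sits in a region where nearby parameters also perform well, which is the behaviour you would rather trade.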

For more discussion on this see http://epchan.blogspot.co.uk/2010/01/method-for-optimizing-parameters.html

Part two will contain a script to perform an optimisation on a simple moving average strategy.

 

 

 


Trading Strategy – VWAP Mean Reversion

This strategy is going to use the volume weighted average price (VWAP) as an indicator to trade mean reversion back to VWAP. The annualized Sharpe Ratio (Rf=0%) is 0.9016936.

This post is a response to http://gekkoquant.com/2012/07/29/trading-strategy-sp-vwap-trend-follow/ where there was a bug in the code indicating that VWAP wasn’t reverting (this didn’t sit well with me, or some of the people who commented). As always, don’t take my word for anything; backtest the strategy yourself. One of the dangers of using R or Matlab is that it’s easy for forward (look-ahead) bias to slip into your code. There are libraries such as Quantstrat for R which protect against this, but I’ve found them terribly slow to run.

Trade logic:

  • All conditions are checked at the close, and the trade held for one day from the close
  • If price/vwap > uLim go short
  • If price/vwap < lLim go long
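The trade logic above can be walked through by hand on a few made-up prices and volumes. This base R sketch uses a hypothetical `rollSum` helper instead of the TTR functions, and the five data points are invented for the example. The key detail is the alignment: the signal is decided at the close of day t and earns the return from that close to the close of day t+1, so no future information is used.

```r
# Toy data (made-up numbers)
price  <- c(100, 101, 103, 99, 100)
volume <- c(10, 20, 10, 40, 20)
nlookback <- 3
uLim <- 1.001
lLim <- 0.999

# Rolling sum over the last n observations (NA until n observations exist)
rollSum <- function(x, n) as.numeric(stats::filter(x, rep(1, n), sides = 1))

vwap  <- rollSum(price * volume, nlookback) / rollSum(volume, nlookback)
ratio <- price / vwap
ratio[is.na(ratio)] <- 1   # Neutral value: no trade while vwap is undefined

# +1 = long, -1 = short, 0 = flat, decided at the close of day t
trade <- ifelse(ratio < lLim, 1, ifelse(ratio > uLim, -1, 0))

# Return earned from the close of day t to the close of day t+1 (unknown on the last day)
nextDayRet <- c(diff(price) / head(price, -1), NA)
strategyRet <- trade * nextDayRet

print(data.frame(price, vwap, ratio, trade, strategyRet))
```

On day 3 the price (103) is above the 3 day VWAP (101.25), so the rule goes short and profits from the fall to 99 the next day. The last row has no return because the next close is not yet known.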

Onto the code:

library("quantmod")
library("PerformanceAnalytics")
 
#Trade logic - Look for mean reversion
#If price/vwap > uLim go SHORT
#If price/vwap < lLim go LONG
 
#Script parameters
symbol <- "^GSPC"     #Symbol
nlookback <- 3 #Number of days to lookback and calculate vwap
uLim <- 1.001  #If price/vwap > uLim enter a short trade
lLim <- 0.999  #If price/vwap < lLim enter a long trade
 
 
#Specify dates for downloading data
startDate = as.Date("2006-01-01") #Specify what date to get the prices from
symbolData <- new.env() #Make a new environment for quantmod to store data in
getSymbols(symbol, env = symbolData, src = "yahoo", from = startDate)
mktdata <- symbolData[[sub("^","",symbol,fixed=TRUE)]] #Look up the downloaded series by name (quantmod drops the ^)
mktdata <- head(mktdata,-1) #Hack to fix some stupid duplicate date problem with yahoo
 
#Calculate volume weighted average price
vwap <- VWAP(Cl(mktdata), Vo(mktdata), n=nlookback)
#Can calculate vwap like this, but it is slower
#vwap <- runSum(Cl(mktdata)*Vo(mktdata),nlookback)/runSum(Vo(mktdata),nlookback)
 
#Calculate the daily returns
dailyRet <- Delt(Cl(mktdata),k=1,type="arithmetic") #Daily Returns
 
#signal = price/vwap
signal <- Cl(mktdata) / vwap
signal[is.na(signal)] <- 1 #Setting to one means that no trade will occur for NA's
#Stripping NA's caused all manner of problems in a previous post
trade <- apply(signal,1, function(x) {if(x<lLim) { return (1) } else { if(x>uLim) { return(-1) } else { return (0) }}})
 
#Calculate the P&L
#The daily ret is DailyRet(T)=(Close(T)-Close(T-1))/Close(T-1)
#We enter the trade on day T so need the DailyRet(T+1) as our potential profit
#Hence the lag in the line below
strategyReturns <- trade * lag(dailyRet,-1)
strategyReturns <- na.omit(strategyReturns)
 
#### Performance Analysis ###
#Calculate returns for the index
indexRet <- dailyRet #Daily returns
colnames(indexRet) <- "IndexRet"
zooTradeVec <- cbind(as.zoo(strategyReturns),as.zoo(indexRet)) #Convert to zoo object
colnames(zooTradeVec) <- c(paste(symbol," VWAP Trade"),symbol)
zooTradeVec <- na.omit(zooTradeVec)
 
#Lets see how all the strategies fared against the index
dev.new()
charts.PerformanceSummary(zooTradeVec,main=paste("Performance of ", symbol, " VWAP Strategy"),geometric=FALSE)
 
 
#Lets calculate a table of monthly returns by year and strategy
cat("Calendar Returns - Note 13.5 means a return of 13.5%\n")
print(table.CalendarReturns(zooTradeVec))
#Calculate the sharpe ratio
cat("Sharpe Ratio\n")
print(SharpeRatio.annualized(zooTradeVec))