In my last post “Statistical Arbitrage Correlation vs Cointegration”, we discussed what statistical arbitrage is and which property of a pair of stocks we aim to exploit. The mathematics essentially showed that if you go long stock A and short stock B, with an appropriate hedging factor to cancel out the drift/growth terms in the Brownian motion equation, then you are left with a stationary signal, which is the spread between the two stocks. The maths also showed that in expectation the daily change in the spread is zero, and hence any deviation from this presents the opportunity for a trade.
It was assumed that the growth terms of the two stocks are constant (or drifting slowly over time, both at the same rate), or in other words that the hedge ratio is constant. This post will detail how to find the hedge ratio and present the Augmented Dickey-Fuller test, which can be used to identify with a certain level of confidence whether the spread is stationary and hence whether the pair is cointegrated.
Determining the hedge factor
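Set the spread to $\text{spread}_t = A_t - n B_t$, where $A_t$ and $B_t$ are the prices of stock A and stock B.
It is required to find the hedge factor, $n$, to satisfy the above equation. The hedge factor is easily found by regressing the prices of stock A on the prices of stock B. Notice that it is the prices being regressed and not the daily returns.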
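As a minimal sketch of that regression, the following uses base R on simulated prices rather than real market data. The series `priceA` and `priceB`, the seed and the true hedge factor of 0.5 are all assumptions made purely for illustration.

```r
# Simulated prices (illustrative only): B is a random walk and A tracks 0.5 * B
set.seed(42)
priceB <- 100 + cumsum(rnorm(500))
priceA <- 0.5 * priceB + rnorm(500, sd = 0.5)

# Regress the PRICES of A on the PRICES of B; the slope is the hedge factor n.
# The intercept absorbs any constant offset between the two price series.
fit <- lm(priceA ~ priceB)
n <- unname(coef(fit)["priceB"])

# The spread is what is left of A after hedging it with n units of B
spread <- priceA - n * priceB
print(n)
```

With real data the two price vectors would need to cover the same dates and have the same length before being passed to `lm`.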
Examining the residuals
Once the hedge factor has been found, the residuals of the regression must be analysed. The residual is how much the spread has changed for a given day; the whole idea of the regression was to identify the hedge factor that keeps the daily change of the spread as close to zero as possible (we want the residuals to be zero).
- If the residuals contain a trend then this implies that the daily change of the spread has a net direction and will cause the spread to consistently widen or contract.
- If the residuals contain no trend then this implies that the daily change of the spread will oscillate around zero (is stationary).
Which residuals to analyse is open to debate. You may want to take another data set from a different point in time, apply the hedge factor calculated in the above regression to this new set and analyse the resulting residuals; this helps to identify any overfitting during the regression.
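One way such an out-of-sample check might look is sketched below in base R. The data are simulated, the 50/50 split into `train` and `test` is an arbitrary assumption, and fitting a straight line through the out-of-sample spread is only a crude trend diagnostic, not a formal test.

```r
# Simulated prices (illustrative only): A tracks 0.5 * B plus noise
set.seed(1)
priceB <- 100 + cumsum(rnorm(1000))
priceA <- 0.5 * priceB + rnorm(1000, sd = 0.5)

train <- 1:500      # period used to fit the hedge factor
test  <- 501:1000   # later period used only for checking

# Fit the hedge factor on the training period only
fit <- lm(priceA[train] ~ priceB[train])
n <- unname(coef(fit)[2])

# Apply the fitted hedge factor to the unseen period
spreadTest <- priceA[test] - n * priceB[test]

# Crude trend check: slope of the out-of-sample spread against time
trendSlope <- unname(coef(lm(spreadTest ~ seq_along(spreadTest)))[2])
print(trendSlope)
```

A slope close to zero is consistent with a spread that is not trending; a clearly non-zero slope suggests the hedge factor was fitted too closely to the training period.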
Dickey-Fuller Test
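For an AR(1) process $y_t = a\,y_{t-1} + \epsilon_t$, where $\epsilon_t \sim N(0, \sigma^2)$ and $y_0$ is a known starting value, what value of $a$ results in a stationary signal?
Take expectations; we already know $E[\epsilon_t] = 0$ and $\operatorname{Var}(\epsilon_t) = \sigma^2$, so $E[y_t] = a^t y_0$ and $\operatorname{Var}(y_t) = \sigma^2 \sum_{i=0}^{t-1} a^{2i}$.
- If $|a| < 1$ then as $t \to \infty$ then $E[y_t] \to 0$ and $\operatorname{Var}(y_t) \to \sigma^2 / (1 - a^2)$
- If $|a| \geq 1$ then $\operatorname{Var}(y_t)$ grows with time, hence $y_t$ is not stationary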
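The stationarity condition can be explored numerically. The sketch below, in base R, simulates the recursion `y[t] = a * y[t-1] + eps[t]` for a stationary coefficient and for a random walk, then computes the simplest Dickey-Fuller statistic (no constant, no lags) by hand. The coefficient name `a`, the helper names `simulateAR1` and `dfStatistic`, the seed and the sample size are all assumptions made for illustration; this is not the `adf.test` routine from `tseries`.

```r
set.seed(7)
nObs <- 1000
eps <- rnorm(nObs)

# Simulate y[t] = a * y[t-1] + eps[t], starting from y[0] = 0
simulateAR1 <- function(a, eps) {
  y <- numeric(length(eps))
  prev <- 0
  for (t in seq_along(eps)) {
    y[t] <- a * prev + eps[t]
    prev <- y[t]
  }
  y
}

# Dickey-Fuller regression without constant or lags:
#   diff(y)[t] = (a - 1) * y[t-1] + eps[t]
# Returns the t-statistic of the slope. It must be compared with
# Dickey-Fuller critical values, not with the standard t tables.
dfStatistic <- function(y) {
  dy <- diff(y)
  yLag <- y[-length(y)]
  fit <- lm(dy ~ yLag + 0)
  summary(fit)$coefficients["yLag", "t value"]
}

stationary <- simulateAR1(0.5, eps)  # |a| < 1
randomWalk <- simulateAR1(1.0, eps)  # a = 1, variance grows with time

dfStationary <- dfStatistic(stationary)
dfRandomWalk <- dfStatistic(randomWalk)
print(c(stationary = dfStationary, randomWalk = dfRandomWalk))
```

A strongly negative statistic is evidence against a unit root (the series looks stationary), whereas a statistic near zero gives no grounds to reject the random walk.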
The beta or Hedge Ratio is: 0.965378555498888
Augmented Dickey-Fuller Test
Dickey-Fuller = -3.6192, Lag order = 0, p-value = 0.03084
alternative hypothesis: stationary
The low p-value (below 0.05) indicates that the spread is stationary and hence that this pair is likely cointegrated.
The beta or Hedge Ratio is: 0.645507426925485
Augmented Dickey-Fuller Test
Dickey-Fuller = -1.6801, Lag order = 0, p-value = 0.7137
alternative hypothesis: stationary
    library("quantmod")
    library("PerformanceAnalytics")
    library("fUnitRoots")
    library("tseries")

    #Specify dates for downloading data, training models and running simulation
    hedgeTrainingStartDate = as.Date("2008-01-01") #Start date for training the hedge ratio
    hedgeTrainingEndDate = as.Date("2010-01-01") #End date for training the hedge ratio

    symbolLst<-c("RDS-A","RDS-B")
    title<-c("Royal Dutch Shell A vs B Shares")

    symbolLst<-c("RBS.L","BARC.L")
    title<-c("Royal Bank of Scotland vs Barclays")

    ### SECTION 1 - Download Data & Calculate Returns ###
    #Download the data
    symbolData <- new.env() #Make a new environment for quantmod to store data in
    getSymbols(symbolLst, env = symbolData, src = "yahoo", from = hedgeTrainingStartDate, to=hedgeTrainingEndDate)

    stockPair <- list(
       a = coredata(Cl(eval(parse(text=paste("symbolData$\"",symbolLst[1],"\"",sep=""))))) #Stock A
      ,b = coredata(Cl(eval(parse(text=paste("symbolData$\"",symbolLst[2],"\"",sep=""))))) #Stock B
      ,name=title)

    testForCointegration <- function(stockPairs){
      #Pass in a pair of stocks and do the necessary checks to see if it is cointegrated

      #Plot the pairs
      dev.new()
      par(mfrow=c(2,2))
      print(c((0.99*min(rbind(stockPairs$a,stockPairs$b))),(1.01*max(rbind(stockPairs$a,stockPairs$b)))))
      plot(stockPairs$a, main=stockPairs$name, ylab="Price", type="l", col="red",ylim=c((0.99*min(rbind(stockPairs$a,stockPairs$b))),(1.01*max(rbind(stockPairs$a,stockPairs$b)))))
      lines(stockPairs$b, col="blue")
      print(length(stockPairs$a))
      print(length(stockPairs$b))

      #Step 1: Calculate the daily returns
      dailyRet.a <- na.omit((Delt(stockPairs$a,type="arithmetic")))
      dailyRet.b <- na.omit((Delt(stockPairs$b,type="arithmetic")))
      dailyRet.a <- dailyRet.a[is.finite(dailyRet.a)] #Strip out any Infs (first ret is Inf)
      dailyRet.b <- dailyRet.b[is.finite(dailyRet.b)]

      print(length(dailyRet.a))
      print(length(dailyRet.b))

      #Step 2: Regress the daily returns onto each other
      #Regression finds BETA and C in the linear regression retA = BETA * retB + C
      regression <- lm(dailyRet.a ~ dailyRet.b + 0)
      beta <- coef(regression)[1]
      print(paste("The beta or Hedge Ratio is: ",beta,sep=""))
      plot(x=dailyRet.b,y=dailyRet.a,type="p",main="Regression of RETURNS for Stock A & B") #Plot the daily returns
      lines(x=dailyRet.b,y=(dailyRet.b*beta),col="blue")#Plot in linear line we used in the regression

      #Step 3: Use the regression co-efficients to generate the spread
      spread <- stockPairs$a - beta*stockPairs$b #Could actually just use the residual form the regression its the same thing
      spreadRet <- Delt(spread,type="arithmetic")
      spreadRet <- na.omit(spreadRet)
      #spreadRet[!is.na(spreadRet)]
      plot((spreadRet), type="l",main="Spread Returns") #Plot the cumulative sum of the spread
      plot(spread, type="l",main="Spread Actual") #Plot the cumulative sum of the spread
      #For a cointegrated spread the cumsum should not deviate very far from 0
      #For a none-cointegrated spread the cumsum will likely show some trending characteristics

      #Step 4: Use the ADF to test if the spread is stationary
      #can use tSeries library
      adfResults <- adf.test((spread),k=0,alternative="stationary")

      print(adfResults)
      if(adfResults$p.value <= 0.05){
        print(paste("The spread is likely Cointegrated with a pvalue of ",adfResults$p.value,sep=""))
      } else {
        print(paste("The spread is likely NOT Cointegrated with a pvalue of ",adfResults$p.value,sep=""))
      }
    }

    testForCointegration(stockPair)


Great work, I had created something to do the exact same thing but yours is much more elegant and I am quite the R noob. I was wondering if there was a way to adjust the input to take any CSV or txt file of closing prices. I tried to modify it using
read.csv(choose.files(), stringsAsFactors=F)
but ran into a lot of problems I didn’t understand. I have historical databases in csv and would like to use your test against it. Anyway great work.
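One possible way to feed closing prices from a CSV file into the test is sketched below. The file layout is an assumption: a header row with columns named `Date`, `A` and `B`, one row per day. The sketch writes a tiny temporary file with made-up prices so that it is self-contained, then builds the same list structure that `testForCointegration()` in the listing above takes as input.

```r
# Hypothetical CSV layout: one row per day, closing prices in columns A and B.
# A small temporary file with made-up prices is written purely for illustration.
csvPath <- tempfile(fileext = ".csv")
write.csv(data.frame(Date = as.Date("2009-01-01") + 0:4,
                     A = c(10.0, 10.2, 10.1, 10.4, 10.3),
                     B = c(20.0, 20.5, 20.1, 20.9, 20.6)),
          csvPath, row.names = FALSE)

# Read the prices and drop any rows with missing values
prices <- read.csv(csvPath, stringsAsFactors = FALSE)
prices <- na.omit(prices)

# Build a list with the same fields as stockPair in the listing above:
# a and b as single-column matrices of closing prices, plus a plot title
stockPair <- list(
  a = as.matrix(prices$A),
  b = as.matrix(prices$B),
  name = "Pair loaded from CSV")

print(dim(stockPair$a))
```

For interactive use, `csvPath` could be replaced with `file.choose()`, which is available on all platforms, whereas `choose.files()` exists only on Windows.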
Hi Derek,
I’m a bit confused by the statement that prices are regressed in the hedge ratio calculation. The paragraph above says it is prices, but the code says it is returns?
I’m currently travelling so can’t give a full answer but it appears that the regression co-efficient used to make the spread is wrong.
Let’s use intuition to work out what should happen. I have two stocks: stock A priced at 100 and stock B priced at 500. These are special stocks: if A changes by some % then B will change by the same %.
The problem is that in $ terms B changes by approx 5 times as much as A, due to the two being at different prices.
Hence we regress the prices of stock A against those of B to figure out, for each long of stock A, how many of stock B I need to short so that the $ loss matches the gain of A.
So whilst I’ve plotted returns and drawn the line of best fit, I should also regress prices.
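The 100 versus 500 intuition can be checked with a small base R sketch. The two price series are hypothetical and built so that both stocks always move by exactly the same percentage; the seed, the 250-day length and the 1% daily volatility are assumptions for illustration only.

```r
# Two hypothetical stocks that always move by the same percentage each day
set.seed(3)
pctMove <- rnorm(250, mean = 0, sd = 0.01)
priceA <- 100 * cumprod(1 + pctMove)
priceB <- 500 * cumprod(1 + pctMove)

# Daily arithmetic returns
retA <- diff(priceA) / head(priceA, -1)
retB <- diff(priceB) / head(priceB, -1)

# Regressing RETURNS gives a beta of about 1 (same % moves)
betaReturns <- unname(coef(lm(retA ~ retB + 0))[1])

# Regressing PRICES gives about 0.2 shares of B per share of A
betaPrices <- unname(coef(lm(priceA ~ priceB + 0))[1])

# Spread built with each coefficient
spreadReturnsBeta <- priceA - betaReturns * priceB  # still moves with the stocks
spreadPricesBeta  <- priceA - betaPrices * priceB   # flat: $ moves cancel

print(c(betaReturns = betaReturns, betaPrices = betaPrices))
print(c(sdReturnsBeta = sd(spreadReturnsBeta), sdPricesBeta = sd(spreadPricesBeta)))
```

In this constructed case only the price-based coefficient produces a spread in which the $ gain on A offsets the $ loss on B, which is the point made in the reply above.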