Statistical Arbitrage – Trading a cointegrated pair

In my last post I demonstrated how to test for cointegration, a statistical test that identifies pairs whose spread is stationary and must therefore be mean reverting.

In this post I intend to show how to trade a cointegrated pair and will continue analysing Royal Dutch Shell A vs B shares (we know they’re cointegrated from my last post). Trading a cointegrated pair is straightforward: we know the mean and variance of the spread, and we know that those values are constant. The entry point for a stat arb is simply to look for a large deviation away from the mean.

A basic strategy is:

  • If spread(t) >= Mean Spread + 2*Standard Deviation then go Short
  • If spread(t) <= Mean Spread - 2*Standard Deviation then go Long

There are many variations of this strategy.

Moving average / moving standard deviation (this will be explored later):
  • If spread(t) >= nDay Moving Average + 2*nDay Rolling Standard Deviation then go Short
  • If spread(t) <= nDay Moving Average - 2*nDay Rolling Standard Deviation then go Long

Wait for mean reversion:
  • If spread(t) <= Mean Spread + 2*Std AND spread(t-1) > Mean Spread + 2*Std then go Short
  • If spread(t) >= Mean Spread - 2*Std AND spread(t-1) < Mean Spread - 2*Std then go Long
  • The advantage is that we only trade once we see the mean reversion, whereas the other models are hoping for mean reversion after a large deviation from the mean (is the spread blowing up?)

All the above strategies look to exit their position when the spread has reverted to the mean. Personally I wouldn’t trade any of the above as they don’t specify an exit strategy for adverse trades. I.e. if there is a 6 standard deviation move in the spread, is this an amazing trade opportunity? Or, more likely, did the spread just blow up?
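
The entry rules listed above can be sketched in a few lines of R. This is only an illustration on a made-up spread: the vector `spread`, the fixed `meanSpread` and `sdSpread`, and the names `basicSignal` and `reversionSignal` are assumptions for the example and are not part of the backtest further down.

```r
# Toy spread; the mean of 0 and standard deviation of 1 are assumed, not estimated
spread     <- c(0.1, 1.2, 2.5, 2.8, 1.7, 0.3, -1.0, -2.4, -2.1, -1.5, 0.0)
meanSpread <- 0
sdSpread   <- 1
upper <- meanSpread + 2 * sdSpread
lower <- meanSpread - 2 * sdSpread

# Basic rule: -1 = go short, +1 = go long, 0 = no entry signal
basicSignal <- ifelse(spread >= upper, -1, ifelse(spread <= lower, 1, 0))

# "Wait for mean reversion" rule: only signal on the day the spread
# crosses back inside the band (yesterday outside, today inside)
prevSpread <- c(NA, head(spread, -1))
hasPrev    <- !is.na(prevSpread)
reversionSignal <- ifelse(hasPrev & prevSpread > upper & spread <= upper, -1,
                   ifelse(hasPrev & prevSpread < lower & spread >= lower, 1, 0))

print(rbind(spread, basicSignal, reversionSignal))
```

On this toy series the basic rule fires on every day spent outside the band, while the reversion rule fires once per excursion, on the first day back inside.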

This post will look at the moving average and rolling standard deviation model for Royal Dutch Shell A vs B shares, using the hedge ratio found in the last post.

Sharpe Ratio
                                Shell A & B Stat Arb  Shell A
Annualized Sharpe Ratio (Rf=0%)            0.8224211 0.166307

The stat arb has a superior Sharpe ratio to simply investing in Shell A. At first glance the Sharpe ratio of 0.8 looks disappointing; however, since the strategy spends most of its time out of the market, it will have a low annualized Sharpe ratio. To increase the Sharpe ratio one can look at trading at higher frequencies or holding a portfolio of pairs so that more time is spent in the market.
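
To see why time out of the market drags the annualised figure down, here is a rough sketch on made-up returns. It uses the simple approximation mean/sd × sqrt(252) rather than the exact calculation in PerformanceAnalytics; the helper name `annualisedSharpe` and all the return values are assumptions for illustration only.

```r
# Simple annualised Sharpe ratio with Rf = 0, assuming 252 trading days a year
annualisedSharpe <- function(dailyRet) mean(dailyRet) / sd(dailyRet) * sqrt(252)

# Made-up daily returns earned while a position is open (50 trading days)
inMarket <- rep(c(0.012, -0.008), times = 25)

# The same returns, but the strategy sits flat (return 0) for another 200 days
wholeYear <- c(inMarket, rep(0, 200))

sharpeInMarket  <- annualisedSharpe(inMarket)
sharpeWholeYear <- annualisedSharpe(wholeYear)
print(c(inMarket = sharpeInMarket, wholeYear = sharpeWholeYear))
```

With these numbers the whole-year ratio comes out at a bit under half the in-market ratio, close to sqrt(50/250), which is why filling the idle days with other pairs is attractive.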

On to the code:

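library("quantmod") #getSymbols, Cl and Delt (also loads zoo for rollmean/rollapply)
library("PerformanceAnalytics") #charts.PerformanceSummary and SharpeRatio.annualized
backtestStartDate = as.Date("2010-01-02") #Starting date for the backtest
symbolLst <- c("RDS-A","RDS-B") #Yahoo tickers for the Shell A and B shares at the time of writing
title<-c("Royal Dutch Shell A vs B Shares")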
### SECTION 1 - Download Data & Calculate Returns ###
#Download the data
symbolData <- new.env() #Make a new environment for quantmod to store data in
getSymbols(symbolLst, env = symbolData, src = "yahoo", from = backtestStartDate)
#We know this pair is cointegrated from the tutorial
#The tutorial found the hedge ratio to be 0.9653
stockPair <- list(
 a = coredata(Cl(symbolData[[symbolLst[1]]])) #Stock A
,b = coredata(Cl(symbolData[[symbolLst[2]]])) #Stock B
,hedgeRatio = 0.9653)
simulateTrading <- function(stockPair){
#Generate the spread
spread <- stockPair$a - stockPair$hedgeRatio*stockPair$b
#Strategy: trade when the spread is more than nStd rolling standard deviations away from its 'lookback' day moving average
#Then go long or short accordingly
lookback <- 90 #look back 90 days
nStd <- 1.5 #Number of standard deviations from the mean to trigger a trade
movingAvg = rollmean(spread,lookback, na.pad=TRUE) #Moving average (rollmean defaults to align="center")
movingStd = rollapply(spread,lookback,sd,align="right", na.pad=TRUE) #Moving standard deviation / bollinger bands
upperThreshold = movingAvg + nStd*movingStd
lowerThreshold = movingAvg - nStd*movingStd
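aboveUpperBand <- spread>upperThreshold
belowLowerBand <- spread<lowerThreshold
aboveMAvg <- spread>movingAvg
belowMAvg <- spread<movingAvg
#The rolling windows leave NAs at the edges; treat them as "no signal" so they don't propagate through Reduce
aboveUpperBand[is.na(aboveUpperBand)] <- 0
belowLowerBand[is.na(belowLowerBand)] <- 0
aboveMAvg[is.na(aboveMAvg)] <- 0
belowMAvg[is.na(belowMAvg)] <- 0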
#The cappedCumSum function is where the magic happens
#It's a nice trick to avoid writing a while loop
#The function basically does a cumulative sum, but caps the running total between a min and a max value
#Reduce passes the running total in as x and the next signal as y
#It's used so that if we get many 'short sell triggers' in a row we still hold a maximum of 1 position
#Short position - go short if the spread is above the upper threshold, close the short if it is below the moving avg
#Note: shortPositionFunc only lets us GO short or close the position
cappedCumSum <- function(x, y,max_value,min_value) max(min(x + y, max_value), min_value)
shortPositionFunc <- function(x,y) { cappedCumSum(x,y,0,-1) }
longPositionFunc <- function(x,y) { cappedCumSum(x,y,1,0) }
shortPositions <- Reduce(shortPositionFunc,-1*aboveUpperBand+belowMAvg,accumulate=TRUE)
longPositions <- Reduce(longPositionFunc,-1*aboveMAvg+belowLowerBand,accumulate=TRUE)
positions = longPositions + shortPositions
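#Plot the moving average first so that title() and lines() have a plot to draw on
plot(movingAvg,col="red",ylab="Spread",type='l',lty=2)
title("Shell A vs B spread with bollinger bands")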
lines(upperThreshold, col="red")
lines(lowerThreshold, col="red")
lines(spread, col="blue")
legend("topright", legend=c("Spread","Moving Average","Upper Band","Lower Band"), inset = .02,
lty=c(1,2,1,1),col=c("blue","red","red","red")) # gives the legend lines the correct color and width
#Calculate spread daily ret
aRet <- Delt(stockPair$a,k=1,type="arithmetic")
bRet <- Delt(stockPair$b,k=1,type="arithmetic")
dailyRet <- aRet - stockPair$hedgeRatio*bRet
dailyRet[is.na(dailyRet)] <- 0 #The first day has no previous close, so its return is NA
tradingRet <- dailyRet * positions
return(tradingRet)
}
tradingRet <- simulateTrading(stockPair)
#### Performance Analysis ###
#Calculate returns for the index
indexRet <- Delt(Cl(symbolData[[symbolLst[1]]]),k=1,type="arithmetic") #Daily returns
indexRet <- as.zoo(indexRet)
tradingRetZoo <- indexRet
tradingRetZoo[,1] <- tradingRet
zooTradeVec <- as.zoo(cbind(tradingRetZoo,indexRet)) #Convert to zoo object
colnames(zooTradeVec) <- c("Shell A & B Stat Arb","Shell A")
zooTradeVec <- na.omit(zooTradeVec)
#Let's see how the strategy fared against the index
charts.PerformanceSummary(zooTradeVec,main="Performance of Shell Statarb Strategy",geometric=FALSE)
cat("Sharpe Ratio")
print(SharpeRatio.annualized(zooTradeVec))
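
Since the cappedCumSum trick is the least obvious part of the listing, here is a small worked example of it on its own. The `signals` vector is made up for illustration; the two helper functions are copied from the listing above.

```r
# Same helpers as in the listing: a running sum clamped to [min_value, max_value]
cappedCumSum <- function(x, y, max_value, min_value) max(min(x + y, max_value), min_value)
shortPositionFunc <- function(x, y) cappedCumSum(x, y, 0, -1)

# Made-up signals: -1 = spread above the upper band, +1 = spread below the moving average
signals <- c(0, -1, -1, 0, 1, 1, -1, 0)

# Reduce calls shortPositionFunc(x, y) with x = position so far and y = next signal;
# accumulate = TRUE keeps every intermediate position rather than only the last one
shortPositions <- Reduce(shortPositionFunc, signals, accumulate = TRUE)
print(shortPositions)
# positions: 0 -1 -1 -1 0 0 -1 -1
```

The second short trigger in a row leaves the position at -1 instead of -2, and the second close signal leaves it at 0 instead of flipping to +1. Note that Reduce uses the first element of `signals` as the starting position without clamping it, which is harmless here because it is 0.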

22 thoughts on “Statistical Arbitrage – Trading a cointegrated pair”

  1. Pingback: Statistical Arbitrage – Trading a cointegrated pair « European Edges

    • Hi Gekko,
      does it also mean that once the maximum divergence has been identified I can take a position in derivatives such as options?
      - sell an ATM call option on the first stock
      - buy a call option on the second one

      or use a BackSpreadCall on the first and a BackSpreadPut on the second, so I can set the protections and roll them if they go out of control…
      The short positions should be ATM or slightly OTM in my opinion.
      What do you think?
      Many thanks


  2. Hi,
    Did you try using Johansen’s testing approach in order to perform a more rigorous test of cointegration? What do you think about combining Engle-Granger with Johansen?

  3. The spread above does not oscillate around its mean. Ideally, a cointegrated pair should trade sideways, not in a trending manner as shown above. Your write-up on proper cointegration was perfect, but this spread is not a perfect spread.

    • I 100% agree with you.

      However for practical purposes as long as the mean reversion happens faster than the mean changes then you’ll do well.

      I guess that’s something I’ve missed, how to quantify the half life/reversion speed.

      Please note that in the above demo the look back period is 90 days. This is fairly short. Choosing 200 days will result in a mean that is less responsive and changes direction less often. It will most likely increase the size of the standard deviation bands and result in fewer trades per year. This usually results in a lower Sharpe ratio.

  4. Hello Gekko,

    I made some changes to your programme to calculate the bollinger bands and I want to know why you align the standard deviation to the right? (movingStd = rollapply(spread,lookback,sd,align="right", na.pad=TRUE))

    • Quick example

      dat <- c(1,2,3,4,5,6,7,8,9,10)
      rollapply(dat,3,sum,align='right', na.pad=TRUE)
      # [1] NA NA 6 9 12 15 18 21 24 27
      rollapply(dat,3,sum,align='left', na.pad=TRUE)
      # [1] 6 9 12 15 18 21 24 27 NA NA
      rollapply(dat,3,sum,align='center', na.pad=TRUE)
      # [1] NA 6 9 12 15 18 21 24 27 NA

      In rollapply we set a "lookback" window size (in our case 3); the align attribute tells the function which side of the window sits on the current element in the vector.

      Say the current element is 5:
      right align sets the right of the window on 5: 1,2,[3,4,5],6,7,8,9
      left align sets the left of the window on 5: 1,2,3,4,[5,6,7],8,9
      center align sets the center of the window on 5: 1,2,3,[4,5,6],7,8,9

      We set the align value to right to prevent look forward in the code.

      • OK thank you for answering!

        Your blog gives me the chance to implement and build my stat arb strategy more quickly.

        I am going to test different models for statistical arbitrage. I will keep all the visitors in the loop!

        Thanks again!

        • Hello again,

          In your program, the martingale effect is not included. How can I add this effect?

          I am running my own backtests with different programs (Excel, R and ProRealTime (a French platform)) and in order to do some comparisons, I need to add the martingale effect.

      • Thanks for the clarification. By the same argument, rollmean has to have the same alignment: rollmean(spread,lookback, na.pad=TRUE, align='right')
        With this modification the Sharpe ratio drops dramatically.

  5. Hi,

    Great stuff!! I think there are two bugs in your code, though. First one is in calculation of moving average. You forgot to set align parameter to “right” (like you do for standard deviation). Function uses default “center” and your data – spread and moving average are not aligned. You can see this from the plot as well. Moving average ends 45 days before the spread. Second bug is in calculation of trading returns. I think you should take return from the next day as we enter the position at the closing price.
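
    Both points can be illustrated with a short sketch. It uses base R's stats::filter as a stand-in for zoo's rollmean so that it runs without extra packages; the toy `spread` and `positions` vectors and the window of 3 are assumptions for the example.

```r
# Toy data: day i has spread i, so it is easy to see which days enter each window
spread   <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
lookback <- 3
weights  <- rep(1 / lookback, lookback)

# sides = 2 centres the window on today, sides = 1 ends the window on today
centred  <- as.numeric(stats::filter(spread, weights, sides = 2))
trailing <- as.numeric(stats::filter(spread, weights, sides = 1))
print(rbind(spread, centred, trailing))
# On day 5 the centred mean averages days 4 to 6, so it already "knows" day 6;
# the trailing mean averages days 3 to 5 only

# Entering at the close means a position only earns the NEXT day's return,
# so lag the position vector by one day before multiplying by daily returns
positions       <- c(0, 0, -1, -1, 0, 1, 1, 0, 0, 0)
laggedPositions <- c(0, head(positions, -1))
print(rbind(positions, laggedPositions))
```

    The centred series has its missing values split between the start and the end, which is the same effect as the moving average in the plot stopping 45 days before the spread.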


  6. Pingback: Test pour la cointégration, test de Dickey Fuller augmenté… » Traders pro | Traders pro

  7. Thanks for your elegant code. I noticed that your line of code:
    shortPositions <- Reduce(shortPositionFunc,-1*aboveUpperBand+belowMAvg,accumulate=TRUE)
    is meant to apply the function shortPositionFunc to (-1*aboveUpperBand+belowMAvg).
    However, the function shortPositionFunc takes two arguments x and y.
    Is there any typo in the code?

    Thank you for your clarification!

  8. Thanks Gekko for the backtesting code. It is very useful. Couple of comments below:
    1) Another reader has already commented about this above. movingAvg needs to be amended by adding align="right" in order to have the first moving avg number on day 90:
    movingAvg = rollmean(spread,lookback,align="right", na.pad=TRUE)
    2) Since we enter trades at the end of the day, the return on the trade date shouldn’t count. We can simply shift every element in the “positions” vector down by using the “shift” function in the taRifx library.
    Also, I don’t believe the daily return is (aRet - stockPair$hedgeRatio*bRet). Imagine you had a large hedge ratio, i.e. if stock A is priced at $100 and stock B is priced at $10, then the hedgeRatio would be in the neighborhood of 10. Since aRet and bRet are in % terms, the formula won’t work. The daily return should be aRet - bRet * (hedge ratio divided by the dollar neutral ratio).
    See amendments:
    #Calculate spread daily ret
    aRet <- Delt(t[,1],k=1,type="arithmetic")
    bRet <- Delt(t[,2],k=1,type="arithmetic")
    dollarNeutralRatio <- stockPair$a/stockPair$b
    hedgeRatioOVERdollarNeutralRatio <- stockPair$hedgeRatio/shift(dollarNeutralRatio,-1)
    dailyRet <- aRet - bRet*hedgeRatioOVERdollarNeutralRatio
    dailyRet[is.na(dailyRet)] <- 0
    tradingRet <- dailyRet * shift(positions,-1)
    return(tradingRet)
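
    The return amendment can be checked on a few made-up prices. This sketch assumes the $100 / $10 example from the comment with a hedge ratio of 10, and it reads shift(dollarNeutralRatio,-1) as "yesterday's price ratio"; the variable names `naiveRet`, `amendedRet` and `pnlPerDollarOfA` are hypothetical.

```r
# Made-up closing prices: A trades near $100 and B near $10
a <- c(100, 102, 101)
b <- c(10.0, 10.1, 10.3)
hedgeRatio <- 10

aRet <- diff(a) / head(a, -1)   # arithmetic daily returns of A
bRet <- diff(b) / head(b, -1)   # arithmetic daily returns of B

# Formula from the post: percentage returns scaled by a price-level hedge ratio
naiveRet <- aRet - hedgeRatio * bRet

# Amended formula: divide the hedge ratio by yesterday's price ratio A/B
dollarNeutralRatio <- head(a, -1) / head(b, -1)
amendedRet <- aRet - bRet * hedgeRatio / dollarNeutralRatio

# Dollar P&L of long 1 A and short hedgeRatio B, per dollar of A held yesterday
pnlPerDollarOfA <- (diff(a) - hedgeRatio * diff(b)) / head(a, -1)

print(rbind(naiveRet, amendedRet, pnlPerDollarOfA))
```

    The amended return matches the dollar P&L per dollar of A exactly, while the original formula overstates the B leg by roughly the price ratio. For Shell A vs B the two share prices are similar, so the difference there is much smaller than in this example.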

  9. Hi Gekko,

    I am looking for new strategies in equity pair trading that improve on the standard cointegration approach (for instance I started looking into pair trading with copulas, which still seems an “unstable” alternative to cointegration). Do you have any new papers to suggest? Thank you very much and congrats on the great blog.


  10. Hi,
    I am a bit confused by this step:
    tradingRet <- simulateTrading(stockPair)

    When I plotted longPositions and shortPositions along with the spread, bands and moving average lines, I found that there are consecutive long signals and short signals. According to my understanding:

    longPositions <- if spread is below lowerband
    longExit <- if spread is above movAvg while long

    shortPositions <- if spread is above upperband
    shortExit <- if spread is below movAvg while short

    Is this the same thing your code is doing? Please help me understand this part.

  11. Hi Gekko, I read the books of E.P. Chan that talk about this topic and I am a little bit confused about mean reversion. When two assets are cointegrated we are supposing that they will come back to their mean, but is that their moving average or their total mean over a fixed period? I’m getting better results using static parameters than using bollinger bands. I will show you an image with my doubt. Could you write another article on mean reversion? Thanks for all

  12. Hi Gekko. Great code. Could you explain the idea behind this cappedCumSum function in more detail? I do not understand how you specify two input variables when the Reduce() function is given only one parameter. Is it because of the 0?

  13. There is a mistake. Your algorithm looks into the future; the problem is in the rollmean function. The algorithm uses a moving average that includes future days to close the position.

  14. Pingback: Cointegration | QuantSt
