The term statistical arbitrage (stat-arb) encompasses a wide variety of investment strategies that typically aim to exploit a statistical equilibrium relationship between two or more securities. The general principal is that any divergence from the equilibrium is a temporary effect and that bets should be placed on the process reverting to it’s equilibrium.

The major caveat of stat-arb /pairs trading type strategies is that as the divergence from equilibrium grows the trade becomes more desirable, however at some point the divergence will grow so large that one has to concede that the equilibrium relationship no longer exists / the model is broken. Naturally it is desirable to estimate the power of the statistical tools used to determine these relationships and to asses the duration of any observed equilibrium out of sample.

This post will investigate the power of the statistical tests in relation to pairs-trading for the following statistical tests ADF, BVR, HURST, PP, PGFF,

JO-T and JO-E.

The general principal is that for two stocks and they form a stationary, and by definition, mean reverting pair if the following equation holds:

If is between and then and are co-integrated,

is the coefficient of mean reversion. A statistical test must be performed to check if , this is known as a unit root test. If the series contains a unit root it isn’t suitable for pairs trading. There are multiple unit root tests, each running a different test on the residual process. One might be tempted to estimate the AR(1) residual model and check for using the conventional linear regression method calculating the standard t-ratio. However it was shown by Dicky and Fuller [1979] that the t-ratio does not follow the t-distribution, hence non-standard significance tests are needed known as unit root tests.

As with every model there are trade off when determining the training window size, too long a window and the model may contain irrelevant data and be slow to adjust to recent events, too short a window and the model only responds to recent events and forgets about past events quickly. This trade off is problematic in co-integration testing, it was demonstrated in *Clegg, M., January 2014. On the persistence of cointegration in pairs trading* that for a fixed window size the power of most unit root tests decrease as as

tends to 1 from below, for 250 data points with

the barrage of co-integration tests only detect co-integration less than 25% of the time!

Intuitively this makes sense, the slower the process is to revert the more data points will be needed to see the reversion. It is somewhat undesirable that the power of the unit root tests vary depending upon the properties of the underlying process, however it is not required for successful pairs trading that all co-integrated pairs are identified as such the varying power property of unit root tests is largely irrelevant.

What is more interesting is the false positive rate, so pairs identified as mean reverting when they are not, and how persistent the results are.

**Accuracy Test**

Generate 1000 co-integrated time series with and uniformly distributed in the set , and in the set according to Clegg this is similar to the types of stock pairs encountered in reality. Repeat this for different lengths of time series and test to see how many time series get correctly classified as co-integrated / mean reverting using various tests for different pValues.

In the majority of tests PP and PGFF outperform the other methods. When the process was strongly reverting with less than 0.85 the tests PP, PGFF, JO-E and JO-T correctly identified the process as co-integrated / mean reverting more than 75% of the time at pValue 0.01. For some of the weaker reverting pairs with greater than 0.95 the performance of the statistical tests is woeful with just 250 data points.

It is worth bearing in mind that 250 data points is approximatlythe number of trading days in a year, and perhaps gives an indication of how much historical data is needed in a pairs trading strategy.

**False Positive Tests**

Follow the same procedure outlined for the accuracy test but chose in the set to generate time series that isn’t co-integrated. See what percentage of the paths get falsely reported as co-integrated / mean reverting.

I’ve never seen this chart in a text book and was surprised at the results, both HURST and BVR report more false positives as increases! The more the process explodes the more likely the test was to show a false positive!

Thankfully the other tests behave in a reasonable fashion with few false positives.

**Onto the code:**

^{?}View Code RSPLUS

library("egcm") #Has lots of co-integration tests library("ggplot2") #Used for plotting data set.seed(1) #Makes the random number generator start from the same point for everyone so hopefully creates the same results assertTrueFunc <- function(testExpression, messageIfNotTrue){ #Utility method to ensure parameters into other functions are correct if(!testExpression){ cat(paste("Assertion Fail:",messageIfNotTrue,"\n")) stop() } } createTimeSeriesForRho <- function(rho,nDataPoint){ #The library egcm contains a function rcoint that generates co-integrated data: # function (n, alpha = runif(1, -10, 10), beta = runif(1, -10, 10), rho = runif(1, 0, 1), sd_eps = 1, sd_delta = 1, X0 = 0, Y0 = 0) return (rcoint(nDataPoint,rho=rho)) } createTimeSeriesForRhoAndTestForCointegration <- function(rhos, pValues, testMethods, nDataPoints,nMonteCarlo){ combinations <- length(rhos)*length(pValues)*length(testMethods)*length(nDataPoints)*nMonteCarlo results <- data.frame(Rho=numeric(combinations),PValue=numeric(combinations),NDataPoint=numeric(combinations),Test=character(combinations),IsCointegrated=numeric(combinations),stringsAsFactors=F) counter <- 1 for(nDataPoint in nDataPoints){ for(rho in rhos){ print(paste("Progress:", round(100*counter/combinations,2),"%")) for(nMc in seq(1,nMonteCarlo)){ simData <- createTimeSeriesForRho(rho,nDataPoint) for(testMethod in testMethods){ for(pValue in pValues){ isCointegrated <- 0 try({ e <- egcm(simData,p.value=pValue,urtest=testMethod) isCointegrated<-is.cointegrated(e)*1 },silent=TRUE) results[counter,]<-c(rho,pValue,nDataPoint,testMethod,isCointegrated) counter <- counter + 1 } } } } } return (results) } testPowerForCointegratedSeries <- function(rhos, pValues, testMethods, nDataPoints,nMonteCarlo){ assertTrueFunc(max(rhos)<1,"Rho must be less than 1") assertTrueFunc(min(rhos)>-1,"Rho must be greater than -1") assertTrueFunc(max(pValues)<1,"pValue must be less than 1") assertTrueFunc(min(pValues)>0,"pValue must be greater than 0") assertTrueFunc(min(nDataPoints)>0,"nDataPoint must be greater than 0") assertTrueFunc(nMonteCarlo>0,"nMonteCarlo must be greater than 0") results <- createTimeSeriesForRhoAndTestForCointegration(rhos,pValues,testMethods,nDataPoints,nMonteCarlo) results <- aggregate(IsCointegrated~Rho+PValue+NDataPoint+Test,data=results,FUN=function(x){100*mean(as.numeric(x))}) colnames(results) <- c("Rho","PValue","NDataPoint","Test","AccuracyPct") return (results) } testFalsePositiveRateForNoneCointegratedSeries <- function(rhos, pValues, testMethods, nDataPoints,nMonteCarlo){ assertTrueFunc(min(rhos)>1,"Rho must be greater than 1") assertTrueFunc(max(pValues)<1,"pValue must be less than 1") assertTrueFunc(min(pValues)>0,"pValue must be greater than 0") assertTrueFunc(min(nDataPoints)>0,"nDataPoint must be greater than 0") assertTrueFunc(nMonteCarlo>0,"nMonteCarlo must be greater than 0") results <- createTimeSeriesForRhoAndTestForCointegration(rhos,pValues,testMethods,nDataPoints,nMonteCarlo) results <- aggregate(IsCointegrated~Rho+PValue+NDataPoint+Test,data=results,FUN=function(x){100*mean(as.numeric(x))}) colnames(results) <- c("Rho","PValue","NDataPoint","Test","FalsePositiveRatePct") return (results) } nMC <- 1000 rhosCoint <- seq(0.8,0.999,0.025) rhosFalsePositive <- seq(1.001,1.2,0.025) nDataPoints <- c(250,500,750,1000) pValues <- c(0.01,0.05, 0.1) urTests <- c("adf","bvr","hurst","jo-e","jo-t","pp","pgff") tt <- testPowerForCointegratedSeries(rhosCoint,pValues,urTests,nDataPoints,nMC) dev.new() tt$NDataPoints <- factor(tt$NDataPoint,levels=nDataPoints) #Creating a factor to get the chart order in same order as nDataPoints variable ggplot(tt, aes(Rho,AccuracyPct,color=Test))+ geom_line(aes(group=Test))+ facet_grid(PValue ~ NDataPoints,labeller = label_both)+ ggtitle(paste("Co-integration test accuracy for",nMC,"time series simulations"))+ theme(plot.title = element_text(lineheight=.8, face="bold")) qq <- testFalsePositiveRateForNoneCointegratedSeries(rhosFalsePositive,pValues,urTests,nDataPoints,nMC) dev.new() qq$NDataPoints <- factor(qq$NDataPoint,levels=nDataPoints) #Creating a factor to get the chart order in same order as nDataPoints variable ggplot(qq, aes(Rho,FalsePositiveRatePct,color=Test))+ geom_line(aes(group=Test))+ facet_grid(PValue ~ NDataPoints,labeller = label_both)+ ggtitle(paste("Co-integration test false positive rate for",nMC,"time series simulations"))+ theme(plot.title = element_text(lineheight=.8, face="bold")) |