**Method One – Single HMM Each State is a Regime**

The credit for this section must go to the fantastic Systematic Investor blog http://systematicinvestor.wordpress.com/2012/11/01/regime-detection/. The code is well commented and should be self explanatory. Essentially two markets regimes (bull and bear) are simulated, a 2 state HMM is then trained on the data. The forward backward algorithm is then used to calculate the probability of being in a given state at any given time.

**Method Two – Multiple HMMs with multiple states – Each HMM a regime**

Three market regimes are simulated; bull, bear and a sideways market. Three different 2 stage HMM models are trained on each regime. Model 1 is the HMM for the bull market, Model 2 is the HMM for the bear market, and Model 3 is the HMM for a side ways market. A rolling window of 50 days worth of data is passed into each HMM and a log likelihood score produced. The higher the log likelihood the more likely it is that the model generated the observed data.

As can be seen in the above chart, the log likelihood is fairly decent for determining the difference between the bull and bear markets. Sadly the side ways model seems very likely in both the bull and bear cases, it’s log likelihood is fairly stable and doesn’t change per regime.

**Code for method 1:**

library('RHmm') #Load HMM package #Code based upon http://systematicinvestor.wordpress.com/2012/11/01/regime-detection/ bullMarketOne = rnorm( 100, 0.1/365, 0.05/sqrt(365) ) bearMarket = rnorm( 100, -0.2/365, 0.15/sqrt(365)) bullMarketTwo = rnorm( 100, 0.15/365, 0.07/sqrt(365) ) true.states = c(rep(1,100),rep(2,100),rep(1,100)) returns = c( bullMarketOne, bearMarket, bullMarketTwo ) y=returns ResFit = HMMFit(y, nStates=2) #Fit a HMM with 2 states to the data VitPath = viterbi(ResFit, y) #Use the viterbi algorithm to find the most likely state path (of the training data) fb = forwardBackward(ResFit, y) #Forward-backward procedure, compute probabilities # Plot probabilities and implied states layout(1:3) plot(cumsum(returns),ylab="Cumulative Market Return",type="l", main="Fake Market Data") plot(VitPath$states, type='s', main='Implied States', xlab='', ylab='State') matplot(fb$Gamma, type='l', main='Smoothed Probabilities', ylab='Probability') legend(x='topright', c('Bear Market - State 2','Bull Market - State 1'), fill=1:2, bty='n') |

**Code for method 2:**

library('RHmm') #Load HMM package library('zoo') #HMM model 1 (high vol and low vol upwards trend) model1ReturnsFunc <- function(isHighVol){ return(rnorm( 100, 0.1,if(isHighVol){0.15}else{0.02})) } bullLowVol = model1ReturnsFunc(F) bullHighVol = model1ReturnsFunc(T) model1TrainingReturns = c(bullLowVol, bullHighVol) Model1Fit = HMMFit(model1TrainingReturns, nStates=2) #Fit a HMM with 2 states to the data #HMM model 2 (high vol and low vol downwards trend) model2ReturnsFunc <- function(isHighVol){ return(rnorm( 100, -0.1,if(isHighVol){0.15}else{0.02})) } bearLowVol = model2ReturnsFunc(F) bearHighVol = model2ReturnsFunc(T) model2TrainingReturns = c(bearLowVol, bearHighVol) Model2Fit = HMMFit(model2TrainingReturns, nStates=2) #Fit a HMM with 2 states to the data #HMM model 3 (sideways market) model3ReturnsFunc <- function(isHighVol){ return(rnorm( 100, 0.0,if(isHighVol){0.16}else{0.08})) } sidewaysLowVol = model3ReturnsFunc(F) sidewaysHighVol = model3ReturnsFunc(T) model3TrainingReturns = c(sidewaysLowVol, sidewaysHighVol) Model3Fit = HMMFit(model3TrainingReturns, nStates=2) #Fit a HMM with 2 states to the data generateDataFunc <- function(modelSequence,highVolSequence){ results <- c() if(length(modelSequence) != length(highVolSequence)){ print("Model Sequence and Vol Sequence must be the same length"); return(NULL)} for(i in 1:length(modelSequence)){ #Bit rubish having all these IFs here but its easy to understand for novice R users if(modelSequence[i] == 1){ results <- c(results,model1ReturnsFunc(highVolSequence[i])) } if(modelSequence[i] == 2){ results <- c(results,model2ReturnsFunc(highVolSequence[i])) } if(modelSequence[i] == 3){ results <- c(results,model3ReturnsFunc(highVolSequence[i])) } } return(results) } #Create some out of sample data actualModelSequence <- c(1,1,1,3,2,2,1) actualVolRegime <- c(T,T,T,T,T,T,T) outOfSampleData <- generateDataFunc(actualModelSequence,actualVolRegime) #Will take 50 days of data and calculate the rolling log likelihood for each HMM model model1Likelihood <- rollapply(outOfSampleData,50,align="right",na.pad=T,function(x) {forwardBackward(Model1Fit,x)$LLH}) model2Likelihood <- rollapply(outOfSampleData,50,align="right",na.pad=T,function(x) {forwardBackward(Model2Fit,x)$LLH}) model3Likelihood <- rollapply(outOfSampleData,50,align="right",na.pad=T,function(x) {forwardBackward(Model3Fit,x)$LLH}) layout(1:3) plot(cumsum(outOfSampleData),main="Fake Market Data",ylab="Cumulative Returns",type="l") plot(model1Likelihood,type="l",ylab="Log Likelihood of Each Model",main="Log Likelihood for each HMM Model") lines(model2Likelihood,type="l",col="red") lines(model3Likelihood,type="l",col="blue") plot(rep((actualModelSequence==3)*3,each=100),col="blue",type="o",ylim=c(0.8,3.1),ylab="Actual MODEL Number",main="Actual MODEL Sequence") lines(rep((actualModelSequence==2)*2,each=100),col="red",type="o") lines(rep((actualModelSequence==1)*1,each=100),col="black",type="o") legend(x='topright', c('Model 1 - Bull Mkt','Model 2 - Bear Mkt','Model 3 - Side ways Mkt'), col=c("black","red","blue"), bty='n',lty=c(1,1,1)) |

The forward algorithm calculates the likelihood of the data given the model **over all possible** state sequences.

The Viterbi algorithm calculates the likelihood of the data given the model **over the single most likely** state sequence.

The forward algorithm allows for efficient calculation of the likelihood function .

The forward variable is the likelihood of the HMM producing all the observations up to time t

and occupying state at time , it is defined as:

It is calculated recursively by calculating the forward variable for time being in state and then calculating the probability of moving to state at time :

Where is the probability of a jump from state to state , and is the probability of generating feature vector from state .

for and for

For

…… For

……………..

The forward algorithm calculated by summing over all state sequences, it is sometimes preferable to approximate which used all state sequences with which will use the single most likely state sequence. This is known as the Viterbi algorithm, the algorithm finds the most likely state sequence.

The probability of the best partial path of length through the HMM ended at state is defined as: . Where is the best partial path / state sequence.

As with the forward variable

can be calculated recursively

for and for

For

…… For

……………..

…………….. store the preceding node

store the preceding node

The most likely path is found by following the preceding node information backwards that is stored in

The direct calculation of will most likely cause arithmetic underflow errors. The probabilities can become so small that the computer is unable to calculate them correctly. You should instead calculate the log likelihood e.g

]]>This post will develop a general framework for classification tasks using hidden markov models. The tutorial series will cover how to build and train a hidden markov models in R. Initially the maths will be explained, then an example in R provided and then an application on financial data will be explored.

A set of features are derived from data set and a class identified by finding the most likely class given the data

However is unknown, so Bayes’ rule must be used.

Since the maximisation does not depend upon we can ignore it. The terms and , are the likelihood of the data given the class and prior probability of a class respective, both terms are defined by a model. The feature model will be described by the hidden markov model (HMM), each class will have it’s own HMM.

First we need to generate a set of features from the raw data . I will skip this step for now because it is specific to the application of your hidden markov model, for example in finance may be various stock prices and could be a set of technical indicators / volatility calculations applied to the data . HMM’s are popular in speech recognition and typically is a vector describing the characteristics of the frequency spectrum of the speech.

Secondly the feature vector must then be assigned a class from the HMM. This is done the via maximum likelihood estimation, the HMM is a generative model, choose the class that is most likely to have generated the feature vector .

For finance the class might be a market regime (trending/mean reverting) or in speech recognition the class is a word.

The number of states in the HMM

The probability of transitioning from state i to state j

The probability of generating feature vector upon entering state j (provided j is not the entry or exit state)

The HMM may be written as

the observed feature vectors

the specified state sequence

The joint probability is the probability of jumping from one state to the next multiplied by the prob of generating the feature vector in that state:

Where is always the entry state 1, and is always the exit state N.

In the above joint probability calculation we have assumed a state sequence . However this is a latent variable, we do not know it, it is hidden (hence the name HIDDEN markov model)! However if we sum over all possible state sequences we can marginalise it out.

This can be problematic due to the number of possible state sequences (especially in a real-time application), luckily algorithms exist to effectively perform the calculation without needing to explore every state sequence. One such algorithm is the forward algorithm.

This is the output distribution for a given state j. The distribution can be anything you like however it should hopefully match the distribution of the data at state j, and it must be mathematically tractable. The most natural choice at this stage is to assume can be described by the multivariate Gaussian. As a word of caution if the elements of your feature vector are highly correlated then , the covariance matrix, has a lot of parameters to measure. See if you can collapse

to a diagonal matrix.

E.g

We already know how to fit a normal distribution, the MLE for is the mean, and the covariance of the feature vector. However we must only calculate the mean and covariance on feature vectors that came from state j, this is known as Viterbi Segmentation. Viterbi Segmentation means there is a hard assignment between feature vector and the state that generated it, an alternative method is called Balm-Welch which probabilistically assigns feature vectors to multiple states.

State j generated observations starting at

It is not known in advance which state generated which observation vector, fortunately there is an algorithm called the Viterbi algorithm to approximately solve this problem.

The forward algorithm for efficient calculation of and the Viterbi algorithm will be explored in my next post.

]]>I will show how to download Nonfarms Payroll numbers, although it is very easy to modify the code below to download GDP, CPI etc…

The top chart shows non-farms plotted with the S&P 500. It is interesting to note that in the % change charts there is a crash in the market around mid 08 this is then followed by a crash in the non-farms numbers. Although not a very rigorous analysis it looks like non-farms numbers LAG the market.

The second chart regress the % change in payrolls with the % change in the S&P for the month. It is seen in the scatter plot that there is no clear relationship between payroll change and S&P change.

The second regression on the right takes this months payroll change and regress it against next months S&P return, ie try and see if the numbers from this month can tell us anything about the return in the S&P for the coming month. Payrolls don’t look very predictive at the 1month time horizon. I think a more interesting analysis would look at payrolls on the market over 10,20,30min horizons intraday.

Onto the code:

library("quantmod") #To see what the datasets are available from the FED goto the link below #http://research.stlouisfed.org/fred2/ economicData <- new.env() #Make a new environment for quantmod to store data in startDate = as.Date("2000-01-01") #Specify what date to get the prices from getSymbols("PAYEMS",src="FRED",env=economicData,from=startDate) #Payems is non-farms payrolls getSymbols("^GSPC",env=economicData,from=startDate) #S&P 500 economicData$PAYEMS <- window(economicData$PAYEMS,start=startDate) #Window our data (FRED ignores the from parameter above) :@ economicData$GSPC <- window(economicData$GSPC,start=startDate) #Window our data mergedData <- merge(economicData$PAYEMS,Cl(economicData$GSPC),all=FALSE) #join the two datasets based on their SHARED dates #Calculate the % diff mergedPercDiff<- mergedData mergedPercDiff$PAYEMS <- diff(mergedData$PAYEMS)/Lag(mergedData$PAYEMS) mergedPercDiff$GSPC.Close <- diff(mergedData$GSPC.Close)/Lag(mergedData$GSPC.Close) dev.new() par(mfrow=c(2,2)) plot(mergedData$PAYEMS, main="Non-Farm Payrolls",ylab="Thousands of Persons") plot(mergedPercDiff$PAYEMS, main="Non-Farm Payrolls", ylab="% Change") plot(mergedData$GSPC.Close, main="S&P 500 Close",ylab="Close Price") plot(mergedPercDiff$GSPC.Close, main="&P 500 Close",ylab="% Change") #Function to plot data and add regression line doPlot <- function(x,y,title,xlabel,ylabel){ x<-as.vector(x) y<-as.vector(y) regression <- lm(y~x) print(regression) plot(y~x,main=title, xlab=xlabel,ylab=ylabel) abline(regression,col="red",lwd=1.5) legend("bottomleft",paste("y=",regression$coefficients[2],"x+",regression$coefficients[1],sep=""),bg="lightblue") } dev.new() par(mfrow=c(1,2)) doPlot(mergedPercDiff$PAYEMS,mergedPercDiff$GSPC.Close,"Regress Non-Farms Payroll with S&P Monthly Returns","Non-Farms Monthly % Change","S&P 500 Monthly % Change") doPlot(Lag(mergedPercDiff$PAYEMS),mergedPercDiff$GSPC.Close,"Regress Non-Farms Payroll with NEXT MONTH'S S&P Monthly Return","Non-Farms Monthly % Change","S&P 500 Monthly % Change") |

In the last post I speculated that the poor performance of the algo was potentially down to trying to compare the current day and find the most similar days in history, rather we should try to take the last N days and find the most similar period in history.

The code below does exactly that use windowSize to control how big the periods are that we compare.

**Sharpe ratio: -0.591864**

The performance is still poor, perhaps the similarity measure i’m using is rubbish. Maybe using implied vol would be good for identifying market regimes and should be used in the similarity measure.

Unfortunately this algo is very very slow (and gets worse over time since we have more history to look back over), this makes it difficult / time consuming to optimise variables.

Onto the code:

library("quantmod") library("PerformanceAnalytics") library("zoo") #INPUTS marketSymbol <- "^GSPC" nFastLookback <- 30 #The fast signal lookback used in linear regression curve nSlowLookback <- 50 #The slow signal lookback used in linear regression curve nFastVolLookback <- 30 #The fast signal lookback used to calculate the stdev nSlowVolLookback <- 50 #The slow signal lookback used calculate the stdev nFastRSILookback <- 30 #The fast signal lookback used to calculate the stdev nSlowRSILookback <- 50 #The slow signal lookback used calculate the stdev kNearestGroupSize <- 30 #How many neighbours to use normalisedStrengthVolWeight <- 2 #Make some signals more important than others in the MSE normalisedStrengthRegressionWeight <- 1 fastRSICurveWeight <- 2 slowRSICurveWeight <- 0.8 windowSize <- 10 #Compare the last 10 days with the most similar 10 day period in history #Specify dates for downloading data, training models and running simulation startDate = as.Date("2006-08-01") #Specify what date to get the prices from symbolData <- new.env() #Make a new environment for quantmod to store data in stockCleanNameFunc <- function(name){ return(sub("^","",name,fixed=TRUE)) } getSymbols(marketSymbol, env = symbolData, src = "yahoo", from = startDate) cleanName <- stockCleanNameFunc(marketSymbol) mktData <- get(cleanName,symbolData) linearRegressionCurve <- function(data,n){ regression <- function(dataBlock){ fit <-lm(dataBlock~seq(1,length(dataBlock),1)) return(last(fit$fitted.values)) } return (rollapply(data,width=n,regression,align="right",by.column=FALSE,na.pad=TRUE)) } volCurve <- function(data,n){ stdev <- function(dataBlock){ sd(dataBlock) } return (rollapply(data,width=n,stdev,align="right",by.column=FALSE,na.pad=TRUE))^2 } fastRegression <- linearRegressionCurve(Cl(mktData),nFastLookback) slowRegression <- linearRegressionCurve(Cl(mktData),nSlowLookback) normalisedStrengthRegression <- slowRegression / (slowRegression+fastRegression) fastVolCurve <- volCurve(Cl(mktData),nFastVolLookback) slowVolCurve <- volCurve(Cl(mktData),nSlowVolLookback) normalisedStrengthVol <- slowVolCurve / (slowVolCurve+fastVolCurve) fastRSICurve <-RSI(Cl(mktData),nFastRSILookback)/100 #rescale it to be in the same range as the other indicators slowRSICurve <-RSI(Cl(mktData),nSlowRSILookback)/100 #Lets plot the signals just to see what they look like dev.new() par(mfrow=c(2,2)) plot(normalisedStrengthVol,type="l") plot(normalisedStrengthRegression,type="l") plot(fastRSICurve,type="l") plot(slowRSICurve,type="l") #DataMeasure will be used to determine how similar other days are to today #It is used later on for calculate the days which are most similar to today according to MSE measure dataMeasure <- cbind(normalisedStrengthVol*normalisedStrengthVolWeight,normalisedStrengthRegression*normalisedStrengthRegression,fastRSICurve*fastRSICurveWeight,slowRSICurve*slowRSICurveWeight) colnames(dataMeasure) <- c("normalisedStrengthVol","normalisedStrengthRegression","fastRSICurve","slowRSICurve") #Finds the nearest neighbour and calculates the trade signal calculateNearestNeighbourTradeSignal <- function(dataMeasure,K,mktReturns,windowSize){ findKNearestNeighbours <- function(dataMeasure,K,windowSize){ calculateMSE <- function(dataA,dataB){ if(length(dataA) != length(dataB)){ return (Inf) } se <- ((as.vector(as.matrix(dataA)) - as.vector(as.matrix(dataB)))^2) res <- mean(se) if(is.na(res)){ res <- Inf } return (res) } mseScores <- as.data.frame(dataMeasure[,1]) mseScores[,1] <- Inf #Default all the mse scores to inf (we've not calculated them yet) colnames(mseScores) <- c("MSE") indexOfTheMostRecentWindowSizeDays <- seq(max(1,length(dataMeasure[,1])-windowSize),length(dataMeasure[,1])) mostRecentWindowDataMeasure <- dataMeasure[indexOfTheMostRecentWindowSizeDays,] for(i in seq(1,length(dataMeasure[,1]))){ indexHistoricalWindowDataMeasure <- seq(max(1,i-windowSize),i) historicalWindowDataMeasure <- dataMeasure[indexHistoricalWindowDataMeasure,] mseScores[i,1] <- calculateMSE(mostRecentWindowDataMeasure,historicalWindowDataMeasure) # print(paste("MSE is",mseScores[i,1])) } rowNum <- seq(1,length(dataMeasure[,1]),1) tmp <- c("MSE", colnames(dataMeasure)) dataMeasureWithMse <- as.data.frame(cbind(mseScores[,1],dataMeasure)) colnames(dataMeasureWithMse) <- tmp #print(head(mseScores)) #print(head(dataMeasureWithMse)) tmp <- c("rowNum", colnames(dataMeasureWithMse)) dataMeasureWithMse <- cbind(rowNum,dataMeasureWithMse) colnames(dataMeasureWithMse) <- tmp dataMeasureWithMse <- dataMeasureWithMse[order(dataMeasureWithMse[,"MSE"]),] #Starting from the 2nd item as the 1st is the current day (MSE will be 0) want to drop it return (dataMeasureWithMse[seq(2,min(K,length(dataMeasureWithMse[,1]))),]) } calculateTradeSignalFromKNeighbours <- function(mktReturns,kNearestNeighbours){ rowNums <- kNearestNeighbours[,"rowNum"] rowNums <- na.omit(rowNums) if(length(rowNums) <= 1) { return (0) } print("The kNearestNeighbours are:") print(rowNums) #So lets see what happened on the day AFTER our nearest match mktRet <- mktReturns[rowNums+1] # return (sign(sum(mktRet))) return (SharpeRatio.annualized(mktRet)) } kNearestNeighbours <- findKNearestNeighbours(dataMeasure,K,windowSize) tradeSignal <- calculateTradeSignalFromKNeighbours(mktReturns,kNearestNeighbours) return(tradeSignal) } ret <- (Cl(mktData)/Op(mktData))-1 signalLog <- as.data.frame(ret) signalLog[,1] <- 0 colnames(signalLog) <- c("TradeSignal") #Loop through all the days we have data for, and calculate a signal for them using nearest neighbour for(i in seq(1,length(ret))){ print (paste("Simulating trading for day",i,"out of",length(ret),"@",100*i/length(ret),"%")) index <- seq(1,i) signal <- calculateNearestNeighbourTradeSignal(dataMeasure[index,],kNearestGroupSize,ret,windowSize) signalLog[i,1] <- signal } dev.new() tradeRet <- Lag(signalLog[,1])*ret[,1] #Combine todays signal with tomorrows return (no lookforward issues) totalRet <- cbind(tradeRet,ret) colnames(totalRet) <- c("Algo",paste(marketSymbol," Long OpCl Returns")) charts.PerformanceSummary(totalRet,main=paste("K nearest trading algo for",marketSymbol,"kNearestGroupSize=",kNearestGroupSize,"windowSize=",windowSize),geometric=FALSE) print(SharpeRatio.annualized(tradeRet)) |

The algorithm is very simple and can be split into 3 components:

1. A Data Measure – What features / observation describe the current trading day? (vol, rsi, moving avg etc…, don’t forget to normalise your measurements) (variable **dataMeasure**)

2. Error Measure – How to measure the similarity between data measures (just use MSE) this identifies the K-most similar trading days to today (function **calculateMSE)**

3. Convert k-nearest neighbors to trading signal (function **calculateTradeSignalFromKNeighbours**)

In the data measure we look to come up with some quantitative measures that capture information about the current trading today. In the example presented below I’ve used a normalised volatility measure (vol(fast)/(vol(fast)+vol(slow)) where fast and slow indicate the window size, slower = longer window. The same procedure but for linear regression curves is used, additionally i’ve included a fast / slow rsi. We take this measure and compare it to the measures on all of the previous trading days, trying to identify the most similar K days in history.

You should look to normalise your signals in some fashion. The reason you need to do this is so that during the MSE calculation you haven’t unexpectedly put a large weight on one of your measurement variables.

Now that you’ve found a set of trading days that are most similar to your current trading day you still have to determine how to convert those days into a trading signal. In the code I take the Knearest neighbours and look at what occurred on the day after after them. I take the open close return and calculate the sharpe ratio of the K neighbors and use this as the number of contracts to buy the following day. If the K neighbors are unrelated their trading will be erratic and the sharpe ratio close to 0, hence we will only trade a small number of contracts.

This algo is potentially interesting when using vol as one of the data measures, it naturally captures the different regimes in the market. If today is a high vol day, it’ll be compared to the historical days that also have high vol. It is hoped that todays market still behaves in the same fashion as in a historically similar day.

Sadly the performance of this strategy is terrible (could just be poor input parameter selection / poor data measures). I suspect that there are better forms of K-nearest neighbour to use. I take today, and compare it to single days in history. There could be significant gains to be had if I take 1 month of data and find the historical most similar month. This will identify patterns of similar behavior which may be more tradeable. I will investigate this in my next post.

On to the code:

library("quantmod") library("PerformanceAnalytics") library("zoo") #INPUTS marketSymbol <- "ARM.L" nFastLookback <- 30 #The fast signal lookback used in linear regression curve nSlowLookback <- 50 #The slow signal lookback used in linear regression curve nFastVolLookback <- 30 #The fast signal lookback used to calculate the stdev nSlowVolLookback <- 50 #The slow signal lookback used calculate the stdev nFastRSILookback <- 30 #The fast signal lookback used to calculate the stdev nSlowRSILookback <- 50 #The slow signal lookback used calculate the stdev kNearestGroupSize <- 50 #How many neighbours to use normalisedStrengthVolWeight <- 2 #Make some signals more important than others in the MSE normalisedStrengthRegressionWeight <- 1 fastRSICurveWeight <- 2 slowRSICurveWeight <- 0.8 #Specify dates for downloading data, training models and running simulation startDate = as.Date("2006-08-01") #Specify what date to get the prices from symbolData <- new.env() #Make a new environment for quantmod to store data in stockCleanNameFunc <- function(name){ return(sub("^","",name,fixed=TRUE)) } getSymbols(marketSymbol, env = symbolData, src = "yahoo", from = startDate) cleanName <- stockCleanNameFunc(marketSymbol) mktData <- get(cleanName,symbolData) linearRegressionCurve <- function(data,n){ regression <- function(dataBlock){ fit <-lm(dataBlock~seq(1,length(dataBlock),1)) return(last(fit$fitted.values)) } return (rollapply(data,width=n,regression,align="right",by.column=FALSE,na.pad=TRUE)) } volCurve <- function(data,n){ stdev <- function(dataBlock){ sd(dataBlock) } return (rollapply(data,width=n,stdev,align="right",by.column=FALSE,na.pad=TRUE))^2 } fastRegression <- linearRegressionCurve(Cl(mktData),nFastLookback) slowRegression <- linearRegressionCurve(Cl(mktData),nSlowLookback) normalisedStrengthRegression <- slowRegression / (slowRegression+fastRegression) fastVolCurve <- volCurve(Cl(mktData),nFastVolLookback) slowVolCurve <- volCurve(Cl(mktData),nSlowVolLookback) normalisedStrengthVol <- slowVolCurve / (slowVolCurve+fastVolCurve) fastRSICurve <-RSI(Cl(mktData),nFastRSILookback)/100 #rescale it to be in the same range as the other indicators slowRSICurve <-RSI(Cl(mktData),nSlowRSILookback)/100 #Lets plot the signals just to see what they look like dev.new() par(mfrow=c(2,2)) plot(normalisedStrengthVol,type="l") plot(normalisedStrengthRegression,type="l") plot(fastRSICurve,type="l") plot(slowRSICurve,type="l") #DataMeasure will be used to determine how similar other days are to today #It is used later on for calculate the days which are most similar to today according to MSE measure dataMeasure <- cbind(normalisedStrengthVol*normalisedStrengthVolWeight,normalisedStrengthRegression*normalisedStrengthRegression,fastRSICurve*fastRSICurveWeight,slowRSICurve*slowRSICurveWeight) colnames(dataMeasure) <- c("normalisedStrengthVol","normalisedStrengthRegression","fastRSICurve","slowRSICurve") #Finds the nearest neighbour and calculates the trade signal calculateNearestNeighbourTradeSignal <- function(dataMeasure,K,mktReturns){ findKNearestNeighbours <- function(dataMeasure,K){ calculateMSE <- function(dataMeasure){ calculateMSEInner <- function(dataA,dataB){ se <- ((as.matrix(dataA) - as.matrix(dataB))^2) apply(se,1,mean) } #Repeat the last row of dataMeasure multiple times #This is so we can compare dataMeasure[today] with all the previous dates lastMat <- last(dataMeasure) setA <- lastMat[rep(1, length(dataMeasure[,1])),] setB <- dataMeasure mse <- calculateMSEInner(setB,setA) mse[is.na(mse)] <- Inf #Give it a terrible MSE if it's NA colName <- c(colnames(dataMeasure),"MSE") dataMeasure <- cbind(dataMeasure,mse) colnames(dataMeasure) <- colName return (dataMeasure) } rowNum <- seq(1,length(dataMeasure[,1]),1) dataMeasureWithMse <- as.data.frame(calculateMSE(dataMeasure)) tmp <- c("rowNum", colnames(dataMeasureWithMse)) dataMeasureWithMse <- cbind(rowNum,dataMeasureWithMse) colnames(dataMeasureWithMse) <- tmp dataMeasureWithMse <- dataMeasureWithMse[order(dataMeasureWithMse[,"MSE"]),] #Starting from the 2nd item as the 1st is the current day (MSE will be 0) want to drop it return (dataMeasureWithMse[seq(2,min(K,length(dataMeasureWithMse[,1]))),]) } calculateTradeSignalFromKNeighbours <- function(mktReturns,kNearestNeighbours){ rowNums <- kNearestNeighbours[,"rowNum"] rowNums <- na.omit(rowNums) if(length(rowNums) <= 1) { return (0) } print("The kNearestNeighbours are:") print(rowNums) #So lets see what happened on the day AFTER our nearest match mktRet <- mktReturns[rowNums+1] #return (sign(sum(mktRet))) return (SharpeRatio.annualized(mktRet)) } kNearestNeighbours <- findKNearestNeighbours(dataMeasure,K) tradeSignal <- calculateTradeSignalFromKNeighbours(mktReturns,kNearestNeighbours) return(tradeSignal) } ret <- (Cl(mktData)/Op(mktData))-1 signalLog <- as.data.frame(ret) signalLog[,1] <- 0 colnames(signalLog) <- c("TradeSignal") #Loop through all the days we have data for, and calculate a signal for them using nearest neighbour for(i in seq(1,length(ret))){ print (paste("Simulating trading for day",i,"out of",length(ret),"@",100*i/length(ret),"%")) index <- seq(1,i) signal <- calculateNearestNeighbourTradeSignal(dataMeasure[index,],kNearestGroupSize,ret) signalLog[i,1] <- signal } dev.new() tradeRet <- Lag(signalLog[,1])*ret[,1] #Combine todays signal with tomorrows return (no lookforward issues) totalRet <- cbind(tradeRet,ret) colnames(totalRet) <- c("Algo",paste(marketSymbol," Long OpCl Returns")) charts.PerformanceSummary(totalRet,main=paste("K nearest trading algo for",marketSymbol),geometric=FALSE) print(SharpeRatio.annualized(tradeRet)) |

The strategy is simple:

- Calculate a rolling ‘average’ and a rolling ‘deviation’
- If the Close price is greater than the average+n*deviation go short (and close when you cross the mean)
- If the Close price is less than the average-n*deviation go long (and close when you cross the mean)

Two cases will be analysed, one strategy will use a simple moving average(SMA), the other will use the linear regression curve(LRC) for the average. The deviation function will be Standard Devation, Average True Range, and LRCDeviation (same as standard deviation but replace the mean with the LRC).

**Results (Lookback = 20 and Deviation Multiplier = 2:**

Annualized Sharpe Ratio (Rf=0%)

- GSPC = 0.05257118
- Simple Moving Avg – Standard Deviation = 0.2535342
- Simple Moving Avg – Average True Range = 0.1165512
- Simple Moving Avg – LRC Deviation 0.296234
- Linear Regression Curve – Standard Deviation = 0.2818447
- Linear Regression Curve – Average True Range =
**0.5824727** - Linear Regression Curve – LRC Deviation = 0.04672071

**Optimisation analysis:**

Annoyingly the colour scale is different between the two charts, however the sharpe ratio is written in each cell. Lighter colours indicate better performance.

Over a 13year period and trading the GSPC the LRC achieved a sharpe of ~0.6 where as the SMA achieved a sharpe of ~0.3. The LRC appears superior to the SMA.

I will update this post at a later point in time when my optimisation has finished running for the other strategies.

library("quantmod") library("PerformanceAnalytics") library("zoo") library("gplots") #INPUTS marketSymbol <- "^GSPC" nLookback <- 20 #The lookback to calcute the moving average / linear regression curve / average true range / standard deviation nDeviation <- 2 #Specify dates for downloading data, training models and running simulation startDate = as.Date("2000-01-01") #Specify what date to get the prices from symbolData <- new.env() #Make a new environment for quantmod to store data in stockCleanNameFunc <- function(name){ return(sub("^","",name,fixed=TRUE)) } getSymbols(marketSymbol, env = symbolData, src = "yahoo", from = startDate) cleanName <- stockCleanNameFunc(marketSymbol) mktData <- get(cleanName,symbolData) linearRegressionCurve <- function(data,n){ regression <- function(dataBlock){ fit <-lm(dataBlock~seq(1,length(dataBlock),1)) return(last(fit$fitted.values)) } return (rollapply(data,width=n,regression,align="right",by.column=FALSE,na.pad=TRUE)) } linearRegressionCurveStandardDeviation <- function(data,n){ deviation <- function(dataBlock){ fit <-lm(dataBlock~seq(1,length(dataBlock),1)) quasiMean <- (last(fit$fitted.values)) quasiMean <- rep(quasiMean,length(dataBlock)) stDev <- sqrt((1/length(dataBlock))* sum((dataBlock - quasiMean)^2)) return (stDev) } return (rollapply(data,width=n,deviation,align="right",by.column=FALSE,na.pad=TRUE)) } reduceLongTradeEntriesToTradOpenOrClosedSignal <- function(trades){ #Takes something like #000011110000-1-1000011 (1 = go long, -1 = go short) #and turns it into #00001111111100000011 #trades[is.na(trades)] <- 0 out <- trades #copy the datastructure over currentPos <-0 for(i in 1:length(out[,1])){ if((currentPos == 0) & (trades[i,1]==1)){ currentPos <- 1 out[i,1] <- currentPos next } if((currentPos == 1) & (trades[i,1]==-1)){ currentPos <- 0 out[i,1] <- currentPos next } out[i,1] <- currentPos } return(out) } reduceShortTradeEntriesToTradOpenOrClosedSignal <- function(trades){ return(-1*reduceLongTradeEntriesToTradOpenOrClosedSignal(-1*trades)) } generateTradingReturns <- function(mktPrices, nLookback, nDeviation, avgFunction, deviationFunction,title,showGraph=TRUE){ quasiMean <- avgFunction(mktPrices,n=nLookback) quasiDeviation <- deviationFunction(mktPrices,n=nLookback) colnames(quasiMean) <- "QuasiMean" colnames(quasiDeviation) <- "QuasiDeviation" price <- Cl(mktPrices) upperThreshold = quasiMean + nDeviation*quasiDeviation lowerThreshold = quasiMean - nDeviation*quasiDeviation aboveUpperBand <- price>upperThreshold belowLowerBand <- price<lowerThreshold aboveMAvg <- price>quasiMean belowMAvg <- price<quasiMean aboveUpperBand[is.na(aboveUpperBand)]<-0 belowLowerBand[is.na(belowLowerBand)]<-0 aboveMAvg[is.na(aboveMAvg)]<-0 belowMAvg[is.na(belowMAvg)]<-0 rawShort <- (-1)*aboveUpperBand+belowMAvg shortPositions <- reduceShortTradeEntriesToTradOpenOrClosedSignal(rawShort) rawLong <- (-1)*aboveMAvg+belowLowerBand longPositions <- reduceLongTradeEntriesToTradOpenOrClosedSignal(rawLong) positions = longPositions + shortPositions signal <- positions if(showGraph){ dev.new() par(mfrow=c(2,1)) plot(Cl(mktPrices),type="l",main=paste(marketSymbol, "close prices")) lines(upperThreshold,col="red",type="l") lines(lowerThreshold,col="red",type="l") lines(quasiMean,col="blue",type="l") legend('bottomright',c("Close",paste("Band - ",title),paste("Average - ",title)),lty=1, col=c('black', 'red', 'blue'), bty='n', cex=.75) plot(signal) } mktReturns <- Cl(mktPrices)/Lag(Cl(mktPrices)) - 1 tradingReturns <- Lag(signal)*mktReturns tradingReturns[is.na(tradingReturns)] <- 0 colnames(tradingReturns) <- title return (tradingReturns) } strategySMAandSTDEV <- function(mktData,nLookback,nDeviation){ generateTradingReturns(mktData,nLookback,nDeviation,function(x,n) { SMA(Cl(x),n) },function(x,n) { rollapply(Cl(x),width=n, align="right",sd) },"Simple Moving Avg - Standard Deviation",FALSE) } strategySMAandATR <- function(mktData,nLookback,nDeviation){ generateTradingReturns(mktData,nLookback,nDeviation,function(x,n) { SMA(Cl(x),n) },function(x,n) { atr <- ATR(x,n); return(atr$atr) },"Simple Moving Avg - Average True Range",FALSE) } strategySMAandLRCDev <- function(mktData,nLookback,nDeviation){ generateTradingReturns(mktData,nLookback,nDeviation,function(x,n) { SMA(Cl(x),n) },function(x,n) { linearRegressionCurveStandardDeviation(Cl(x),n) },"Simple Moving Avg - LRC Deviation",FALSE) } strategyLRCandSTDEV <- function(mktData,nLookback,nDeviation){ generateTradingReturns(mktData,nLookback,nDeviation,function(x,n) { linearRegressionCurve(Cl(x),n) },function(x,n) { rollapply(Cl(x),width=n, align="right",sd) },"Linear Regression Curve - Standard Deviation",FALSE) } strategyLRCandATR <- function(mktData,nLookback,nDeviation){ generateTradingReturns(mktData,nLookback,nDeviation,function(x,n) { linearRegressionCurve(Cl(x),n) },function(x,n) { atr <- ATR(x,n); return(atr$atr) },"Linear Regression Curve - Average True Range",FALSE) } strategyLRCandLRCDev <- function(mktData,nLookback,nDeviation){ generateTradingReturns(mktData,nLookback,nDeviation,function(x,n) { linearRegressionCurve(Cl(x),n) },function(x,n) { linearRegressionCurveStandardDeviation(Cl(x),n) },"Linear Regression Curve - LRC Deviation",FALSE) } if(TRUE){ bollingerBandsSMAandSTDEVTradingReturns <- strategySMAandSTDEV(mktData,nLookback,nDeviation) bollingerBandsSMAandATRTradingReturns <- strategySMAandATR(mktData,nLookback,nDeviation) bollingerBandsSMAandLRCDevTradingReturns <- strategySMAandLRCDev(mktData,nLookback,nDeviation) bollingerBandsLRCandSTDEVTradingReturns <- strategyLRCandSTDEV(mktData,nLookback,nDeviation) bollingerBandsLRCandATRTradingReturns <- strategyLRCandATR(mktData,nLookback,nDeviation) bollingerBandsLRCandLRCDevTradingReturns <- strategyLRCandLRCDev(mktData,nLookback,nDeviation) mktClClRet <- Cl(mktData)/Lag(Cl(mktData))-1 tradingReturns <- merge(as.zoo(mktClClRet), as.zoo(bollingerBandsSMAandSTDEVTradingReturns), as.zoo(bollingerBandsSMAandATRTradingReturns), as.zoo(bollingerBandsSMAandLRCDevTradingReturns), as.zoo(bollingerBandsLRCandSTDEVTradingReturns), as.zoo(bollingerBandsLRCandATRTradingReturns), as.zoo(bollingerBandsLRCandLRCDevTradingReturns)) dev.new() charts.PerformanceSummary(tradingReturns,main=paste("Mean Reversion using nLookback",nLookback,"and nDeviation",nDeviation,"bands"),geometric=FALSE) print(table.Stats(tradingReturns)) cat("Sharpe Ratio") print(SharpeRatio.annualized(tradingReturns)) } colorFunc <- function(x){ x <- max(-4,min(4,x)) if(x > 0){ colorFunc <- rgb(0,(255*x/4)/255 , 0/255, 1) } else { colorFunc <- rgb((255*(-1*x)/4)/255,0 , 0/255, 1) } } optimiseTradingStrat <- function(mktData,lookbackStart,lookbackEnd,lookbackStep,deviationStart,deviationEnd,deviationStep,strategy,title){ lookbackRange <- seq(lookbackStart,lookbackEnd,lookbackStep) deviationRange <- seq(deviationStart,deviationEnd,deviationStep) combinations <- length(lookbackRange)*length(deviationRange) combLookback <- rep(lookbackRange,each=combinations/length(lookbackRange)) combDeviation <- rep(deviationRange,combinations/length(deviationRange)) optimisationMatrix <- t(rbind(t(combLookback),t(combDeviation),rep(NA,combinations),rep(NA,combinations),rep(NA,combinations))) colnames(optimisationMatrix) <- c("Lookback","Deviation","SharpeRatio","CumulativeReturns","MaxDrawDown") for(i in 1:length(optimisationMatrix[,1])){ print(paste("On run",i,"out of",length(optimisationMatrix[,1]),"nLookback=",optimisationMatrix[i,"Lookback"],"nDeviation=",optimisationMatrix[i,"Deviation"])) runReturns <- strategy(mktData,optimisationMatrix[i,"Lookback"],optimisationMatrix[i,"Deviation"]) optimisationMatrix[i,"SharpeRatio"] <- SharpeRatio.annualized(runReturns) optimisationMatrix[i,"CumulativeReturns"] <- sum(runReturns) optimisationMatrix[i,"MaxDrawDown"] <- maxDrawdown(runReturns,geometric=FALSE) print(optimisationMatrix) } print(optimisationMatrix) dev.new() z <- matrix(optimisationMatrix[,"SharpeRatio"],nrow=length(lookbackRange),ncol=length(deviationRange),byrow=TRUE) colors <- colorFunc(optimisationMatrix[,"SharpeRatio"]) rownames(z) <- lookbackRange colnames(z) <-deviationRange heatmap.2(z, key=TRUE,trace="none",cellnote=round(z,digits=2),Rowv=NA, Colv=NA, scale="column", margins=c(5,10),xlab="Deviation",ylab="Lookback",main=paste("Sharpe Ratio for Strategy",title)) } if(FALSE){ dev.new() plot(Cl(mktData),type="l",main=paste(marketSymbol, "close prices")) lines(SMA(Cl(mktData),n=50),col="red",type="l") lines(linearRegressionCurve(Cl(mktData),n=50),col="blue",type="l") legend('bottomright',c("Close",paste("Simple Moving Average Lookback=50"),paste("Linear Regression Curve Lookback=50")),lty=1, col=c('black', 'red', 'blue'), bty='n', cex=.75) } nLookbackStart <- 20 nLookbackEnd <- 200 nLookbackStep <- 20 nDeviationStart <- 1 nDeviationEnd <- 2.5 nDeviationStep <- 0.1 #optimiseTradingStrat(mktData,nLookbackStart,nLookbackEnd,nLookbackStep,nDeviationStart,nDeviationEnd,nDeviationStep,strategySMAandSTDEV,"AvgFunc=SMA and DeviationFunc=STDEV") #optimiseTradingStrat(mktData,nLookbackStart,nLookbackEnd,nLookbackStep,nDeviationStart,nDeviationEnd,nDeviationStep,strategySMAandATR,"AvgFunc=SMA and DeviationFunc=ATR") #optimiseTradingStrat(mktData,nLookbackStart,nLookbackEnd,nLookbackStep,nDeviationStart,nDeviationEnd,nDeviationStep,strategySMAandLRCDev,"AvgFunc=SMA and DeviationFunc=LRCDev") #optimiseTradingStrat(mktData,nLookbackStart,nLookbackEnd,nLookbackStep,nDeviationStart,nDeviationEnd,nDeviationStep,strategyLRCandSTDEV,"AvgFunc=LRC and DeviationFunc=STDEV") #optimiseTradingStrat(mktData,nLookbackStart,nLookbackEnd,nLookbackStep,nDeviationStart,nDeviationEnd,nDeviationStep,strategyLRCandATR,"AvgFunc=LRC and DeviationFunc=ATR") #doptimiseTradingStrat(mktData,nLookbackStart,nLookbackEnd,nLookbackStep,nDeviationStart,nDeviationEnd,nDeviationStep,strategyLRCandLRCDev,"AvgFunc=LRC and DeviationFunc=LRCDev") |

It is assumed that the price will revert to the moving average hence any price move to the bands is a good entry point. A common problem with this strategy is that the moving average is a LAGGING indicator and is often very slow to track the price moves if a long lookback period is used.

Video 1 presents a technique called “linear regression curves” about 10mins in. Linear regression curves aim to solve the problem of the moving average being slow to track the price.

**Linear Regression Curve vs Simple Moving Average**

See how tightly the blue linear regression curve follows the close price, it’s significantly quicker to identify turns in the market where as the simple moving average has considerable tracking error. The MSE could be taken to quantify the tightness.

**How to calculate the linear regression curve:**

In this example you have 100 closing prices for your given stock. Bar 1 is the oldest price, bar 100 is the most recent price. We will use a 20day regression.

1. Take prices 1-20 and draw the line of best fit through them

2. At the end of your best fit line (so bar 20), draw a little circle

3. Take prices 2-21 and draw the line of best fit through them

4. At the end of your best fit line (so bar 21) draw a little circle

5. Repeat upto bar 100

6. Join all of your little circles, this is your ‘linear regression curve’

So in a nutshell you just join the ends of a rolling linear regression.

A genetic algo consists of three things:

- A gene
- A fitness function
- Methods to breed/mate genes

The gene is typically a binary number, each bit in the binary number controls various parts of your trading strategy. The gene below contains 4 sub gene, a stock gene to select what stock to trade, a strategy gene to select what strategy to use, paramA sets a parameter used in your strategy and paramB sets another parameter to use in your strategy.

**Gene = [StockGene,StrategyGene,ParamA,ParamB]**

Stock Gene | ||
---|---|---|

00 | ||

01 | ||

10 | IBM | |

11 |

Strategy Gene | ||
---|---|---|

0 | Simple Moving Average | |

1 | Exponential Moving Average |

ParamA Gene – Moving Average 1 Lookback | ||
---|---|---|

00 | 10 | |

01 | 20 | |

10 | 30 | |

11 | 40 |

ParamB Gene – Moving Average 2 Lookback | ||
---|---|---|

00 | 15 | |

01 | 25 | |

10 | 35 | |

11 | 45 |

So **Gene = [01,1,00,11]**

Would be stock=Facebook, strategy=Exponential Moving Average,paramA=10,paramB=45].

The strategy rules are simple, if the moving average(length=paramA) > moving average(length=paramB) then go long, and vice versa.

A gene is quantified as a good or bad gene using a fitness function. The success of a genetic trading strategy depends heavily upon your choice of fitness function and whether it makes sense with the strategies you intend to use. You will trade each of the strategies outlined by your active genes and then rank them by their fitness. A good starting point would be to use the sharp ratio as the fitness function.

You need to be careful that you apply the fitness function to statistically significant data. For example if you used a mean reverting strategy that might trade once a month (or what ever your retraining window is), then your fitness is determined by 1 or 2 datapoints!!! This will result in poor genetic optimisation (in my code i’ve commented out a mean reversion strategy test for yourself). Typically what happens is your sharpe ratio from 2 datapoints is very very high merely down to luck. You then mark this as a good gene and trade it the next month with terrible results.

With a genetic algo you need to breed genes, for the rest of this post i’ll assume you are breeding once a month. During breeding you take all of the genes in your gene pool and rank them according to the fitness function. You then select the top N genes and breed them (discard all the other genes they’re of no use).

Breeding consists of two parts:

**Hybridisation** – Take a gene and cut a chunk out of it, you can use whatever random number generator you want to determine the cut locations, swap this chunk with a corresponding chunk from another gene.

Eg.

Old gene: 00110010 and 11100110 (red is the randomly select bits to cut)

New gene: 00100110 and 11110010

You do this for every possible pair of genes in your top N list.

**Mutation** – After hybridisation go through all your genes and randomly flip the bits with an fixed probability. The mutation prevents your strategy from getting locked into an every shrinking gene pool.

For a more detailed explanation with diagrams please see:

http://blog.equametrics.com/ scroll down to Genetic Algorithms and its Application in Trading

**Annualized Sharpe Ratio (Rf=0%) 1.15**

On to the code:

library("quantmod") library("PerformanceAnalytics") library("zoo") #INPUTS topNToSelect <- 5 #Top n genes are selected during the mating, these will be mated with each other mutationProb <- 0.05 #A mutation can occur during the mating, this is the probability of a mutation for individual chromes symbolLst <- c("^GDAXI","^FTSE","^GSPC","^NDX","AAPL","ARMH","JPM","GS") #symbolLst <- c("ADN.L","ADM.L","AGK.L","AMEC.L","AAL.L","ANTO.L","ARM.L","ASHM.L","ABF.L","AZN.L","AV.L","BA.L","BARC.L","BG.L","BLT.L","BP.L","BATS.L","BLND.L","BSY.L","BNZL.L","BRBY.L","CSCG.L","CPI.L","CCL.L","CNA.L","CPG.L","CRH.L","CRDA.L","DGE.L","ENRC.L","EXPN.L","FRES.L","GFS.L","GKN.L","GSK.L","HMSO.L","HL.L","HSBA.L","IAP.L","IMI.L","IMT.L","IHG.L","IAG.L","IPR.L","ITRK.L","ITV.L","JMAT.L","KAZ.L","KGF.L","LAND.L","LGEN.L","LLOY.L","EMG.L","MKS.L","MGGT.L","MRW.L","NG.L","NXT.L","OML.L","PSON.L","PFC.L","PRU.L","RRS.L","RB.L","REL.L","RSL.L","REX.L","RIO.L","RR.L","RBS.L","RDSA.L","RSA.L","SAB.L","SGE.L","SBRY.L","SDR.L","SRP.L","SVT.L","SHP.L","SN.L","SMIN.L","SSE.L","STAN.L","SL.L","TATE.L","TSCO.L","TLW.L","ULVR.L","UU.L","VED.L","VOD.L","WEIR.L","WTB.L","WOS.L","WPP.L","XTA.L") #END INPUTS #Stock gene stockGeneLength <- 3 #8stocks #stockGeneLength<-6 #Allows 2^6 stocks (64) #Strategy gene strateyGeneLength<-2 #Paramter lookback gene parameterLookbackGeneLength<-6 #Calculate the length of our chromozone, chromozone=[gene1,gene2,gene3...] chromozoneLength <- stockGeneLength+strateyGeneLength+parameterLookbackGeneLength #TradingStrategies signalMACross <- function(mktdata, paramA, paramB, avgFunc=SMA){ signal = avgFunc(mktdata,n=paramA)/avgFunc(mktdata,n=paramB) signal[is.na(signal)] <- 0 signal <- (signal>1)*1 #converts bools into ints signal[signal==0] <- (-1) return (signal) } signalBollingerReversion <- function(mktdata, paramA, paramB){ avg <- SMA(mktdata,paramB) std <- 1*rollapply(mktdata, paramB,sd,align="right") shortSignal <- (mktdata > avg+std)*-1 longSignal <- (mktdata < avg-std)*1 signal <- shortSignal+longSignal signal[is.na(signal)]<-0 return (signal) } signalRSIOverBoughtOrSold <- function(mktdata, paramA, paramB){ upperLim <- min(60*(1+paramB/100),90) lowerLim <- max(40*(1-paramB/100),10) rsisignal <- RSI(mktdata,paramB) signal <- ((rsisignal>upperLim)*-1)+((rsisignal<lowerLim)*1) return (signal) } #Gene = [StockGene,StrategyGene,ParamAGene,ParamBGene] #The following functions extract specific parts of the gene getStockGeneFromChromozone <- function(chrome){ return(chrome[,seq(1,stockGeneLength)]) } getStrategyGeneFromChromozone <- function(chrome){ return(chrome[,seq(stockGeneLength+1,stockGeneLength+strateyGeneLength)]) } getParameterLookbackGeneFromChromozone <- function(chrome){ return(chrome[,seq(stockGeneLength+strateyGeneLength+1,stockGeneLength+strateyGeneLength+parameterLookbackGeneLength)]) } #Once parts of the gene have been extracted they are then converted into #lookback values, what stocks to trade, or what strategy to use getStockDataFromChromozone<- function(chrome){ #Basically a binary number to decimal converter gene <- getStockGeneFromChromozone(chrome) index <-sum(chrome*(2^(seq(1,length(gene),1)-1)))+1 #The +1 is to stop 0 since not a valid index cleanName <- sub("^","",symbolLst[index],fixed=TRUE) return (get(cleanName,symbolData)) } getStrategyFromChromozone <- function(chrome){ gene <- matrix(getStrategyGeneFromChromozone(chrome)) if(all(gene==matrix(c(0,0)))){ return (signalMACross) } if(all(gene==matrix(c(0,1)))){ return (function(mktdata,paramA,paramB) {signalMACross(mktdata,paramA,paramB,avgFunc=EMA)}) } if(all(gene==matrix(c(1,0)))){ return (function(mktdata,paramA,paramB) {signalMACross(mktdata,paramA,paramB,avgFunc=ZLEMA)}) # return (signalBollingerReversion) } if(all(gene==matrix(c(1,1)))){ return (function(mktdata,paramA,paramB) {signalMACross(mktdata,paramA,paramB,avgFunc=WMA)}) # return (signalRSIOverBoughtOrSold) } print("nothing found") } getLookbackAFromChromozone <- function(chrome){ gene <- getParameterLookbackGeneFromChromozone(chrome) gene <- gene[,1:3] gene <- matrix(gene) if(all(gene==matrix(c(0,0,0)))){ return (10) } if(all(gene==matrix(c(0,0,1)))){ return (20) } if(all(gene==matrix(c(0,1,0)))){ return (30) } if(all(gene==matrix(c(0,1,1)))){ return (40) } if(all(gene==matrix(c(1,0,0)))){ return (50) } if(all(gene==matrix(c(1,0,1)))){ return (60) } if(all(gene==matrix(c(1,1,0)))){ return (70) } if(all(gene==matrix(c(1,1,1)))){ return (80) } } getLookbackBFromChromozone <- function(chrome){ gene <- getParameterLookbackGeneFromChromozone(chrome) gene <- gene[,4:6] gene <- matrix(gene) if(all(gene==matrix(c(0,0,0)))){ return (15) } if(all(gene==matrix(c(0,0,1)))){ return (25) } if(all(gene==matrix(c(0,1,0)))){ return (35) } if(all(gene==matrix(c(0,1,1)))){ return (45) } if(all(gene==matrix(c(1,0,0)))){ return (55) } if(all(gene==matrix(c(1,0,1)))){ return (65) } if(all(gene==matrix(c(1,1,0)))){ return (75) } if(all(gene==matrix(c(1,1,1)))){ return (85) } } #The more positive the fitness, the better the gene calculateGeneFitnessFromTradingReturns <- function(tradingRet){ tradingFitness <- SharpeRatio.annualized(tradingRet) #tradingFitness <- SharpeRatio.annualized(tradingRet) * (1/maxDrawdown(tradingRet)) #tradingFitness <- max(cumsum(tradingRet))/maxDrawdown(tradingRet) #tradingFitness <- sum((tradingRet>0)*1)/length(tradingRet) #% of trades profitable #tradingFitness <- -1*maxDrawdown(tradingRet) return(tradingFitness) } #This function performs the mating between two chromozones genetricMating <- function(chromozoneFitness,useTopNPerformers,mutationProb){ selectTopNPerformers <- function(chromozoneFitness,useTopNPerformers){ #Ranks the chromozones by their fitness and select the topNPerformers orderedChromozones <- order(chromozoneFitness[,"Fitness"],decreasing=TRUE) orderedChromozones <- chromozoneFitness[orderedChromozones,] ##Often there are lots of overlapping strategies with the same fitness ##We should filter by unique fitness to stop the overweighting of lucky high fitness orderedChromozones <- subset(orderedChromozones, !duplicated(Fitness)) print(orderedChromozones) return(orderedChromozones[seq(1,min(nrow(orderedChromozones),useTopNPerformers)),]) } hybridize <- function(topChromozones,mutationProb){ crossoverFunc <- function(chromeA,chromeB){ chromeA <- chromeA[,!colnames(chromeA) %in% c("Fitness")] chromeB <- chromeB[,!colnames(chromeB) %in% c("Fitness")] #Takes a number of chromes from B and swaps them in to A nCross <- runif(min=0,max=ncol(chromeA)-1,1) #the number of individual chromes to swap swapStartLocation = round(runif(min=1,max=ncol(chromeA),1)) swapLocations <- seq(swapStartLocation,swapStartLocation+nCross) #Can run over the end of our vector, need to wrap around back to start swapLocations <- swapLocations %% ncol(chromeA)+1 #Performs the wrapping chromeA[1,swapLocations] <- chromeB[1,swapLocations] #Performs the swap return (chromeA) } mutateFunc <- function(chrome,mutationProb){ return((round(runif(min=0,max=1,ncol(chrome))<mutationProb)+chrome) %% 2) } #Take each chromozone and mate it with all the others (and it's self) a <- topChromozones[rep(seq(1,nrow(topChromozones)),each=nrow(topChromozones)),] #Repeat each row nrow times b <- topChromozones[rep(seq(1,nrow(topChromozones)),nrow(topChromozones)),] #Repeat whole matrix nrow times #Can this be vectorised (not huge amounts of data anyway so probs not an issue)? res <- matrix(nrow=0,ncol=ncol(a)-1) #The minus 1 is to drop the "Fitness" column for(i in 1:nrow(a)){ res <- rbind(res,mutateFunc(crossoverFunc(a[i,],b[i,]),mutationProb)) } return (res) } topChromozones <- selectTopNPerformers(chromozoneFitness,useTopNPerformers) #return ((hybridize(topChromozones,mutationProb))) #You may want duplicates to give more weight to 'good' genes return (unique(hybridize(topChromozones,mutationProb))) #Remove duplicate genes } #This function takes a chrome/gene and does the according trades #It takes market data and a start and an end date #It does not take responsibility for the mating and ranking of genes doGeneticTrading <- function(mktdata,chrome, startDate, endDate){ signalFunc <-getStrategyFromChromozone(chrome) paramA <- getLookbackAFromChromozone(chrome) paramB <- getLookbackBFromChromozone(chrome) signal <- signalFunc(Op(mktdata),paramA,paramB) opClRet <- (Cl(mktdata)/Op(mktdata)) - 1 tradingReturns = opClRet * signal dataWin <- (paste(startDate,"::",endDate,sep="")) tradingReturns <- tradingReturns[dataWin] colnames(tradingReturns) <- c("TradingRet") return(tradingReturns) } #This function mates genes every month #It also passes those genes into the doGeneticTrading function doTrading <- function(chromelist){ #Function for taking a year and a month and spitting out a clean date cleanDate <- function(y,m){ if(m == 13){ m <- 1 y <- y+1 } if(m < 10){ return(paste(y,paste("0",m,sep=""),sep="-")) } else { return(paste(y,m,sep="-")) } } year <- 2002 month <- 1 totalRet <- 0 fitnessEvoltion <- 0 dev.new() par(mfrow=c(2,1)) #Loop through many years and months for(y in 2002:2010){ for(m in 1:12){ chromeFitness <- as.data.frame(matrix(nrow=0,ncol=ncol(chromelist))) startD <- cleanDate(y-2,m) #Subtracting off 2 years to ensure we pass enough data in(should really be calculated from MA lookback) liveStart <- cleanDate(y,m) liveEnd <- cleanDate(y,m+1) print(paste("Start",startD,"LiveStart",liveStart,"LiveEnd",liveEnd)) dataWin <- (paste(startD,"::",liveEnd,sep="")) monthReturn <- data.frame() #Look through all the active chromes and use them for trading for(cn in 1:nrow(chromelist)){ #USE a try catch just incase there are data issue etc... try({ mktdata <- getStockDataFromChromozone(chromelist[cn,]) tradingRet <- doGeneticTrading(mktdata[dataWin],chromelist[cn,],liveStart,liveEnd) tradingRet <- tradingRet*(1/nrow(chromelist)) #even money given to each strategy tradingFitness <- calculateGeneFitnessFromTradingReturns(tradingRet) if(!is.nan(tradingFitness) && !is.nan(max(tradingRet)) && !is.nan(min(tradingRet))){ if(length(monthReturn) == 0 ){ monthReturn <- tradingRet } else { monthReturn <- cbind(monthReturn,tradingRet) } res <- cbind(chromelist[cn,],tradingFitness) colnames(res) <- c(colnames(chromelist[cn,]),"Fitness") chromeFitness <- rbind(chromeFitness,res) } },silent=FALSE) } print("Month return") #Collapse all the trades from each chromozone into a single P&L for each day in the month monthReturn <- apply(monthReturn,1,sum,na.rm=TRUE) print(monthReturn) currentMonthFitness <- calculateGeneFitnessFromTradingReturns(monthReturn) #Update the running total of P&L totalRet <- c(totalRet,monthReturn) fitnessEvoltion <- c(fitnessEvoltion,currentMonthFitness) plot(cumsum(totalRet)) plot(fitnessEvoltion) #print(chromeFitness) #print(chromeFitness[,"Fitness"]) chromelist <- genetricMating(chromeFitness,topNToSelect,mutationProb) print(paste("There are",nrow(chromelist), "chromes active")) print(paste("Min Fitness:",min(chromeFitness[,"Fitness"]))) print(paste("Max Fitness:",max(chromeFitness[,"Fitness"]))) print(paste("Average Fitness:",mean(chromeFitness[,"Fitness"]))) print(paste("Current Month Fitness:",currentMonthFitness)) } } return (totalRet) } #Specify dates for downloading data, training models and running simulation startDate = as.Date("2000-01-01") #Specify what date to get the prices from symbolData <- new.env() #Make a new environment for quantmod to store data in getSymbols(symbolLst, env = symbolData, src = "yahoo", from = startDate) #Create some genes at random #Make a diag matrix so that each chrome gets activated atleast once startingChromozones <- diag(chromozoneLength) rownames(startingChromozones) <- apply(t(seq(1,chromozoneLength)),2,function(x) { paste("Chrome",x,sep="") } ) fitness <- matrix(runif(min=-1,max=1,nrow(startingChromozones)),nrow=nrow(startingChromozones),ncol=1) colnames(fitness) <- c("Fitness") startingChromozones <- as.data.frame(cbind(startingChromozones,fitness)) print("Before mating") print(startingChromozones) print("After mating") startingChromozones <- genetricMating(startingChromozones,topNToSelect,mutationProb) print(startingChromozones) tradingReturns <- doTrading(startingChromozones) tradingReturns <- as.data.frame((as.matrix(tradingReturns[-1]))) tradingReturns<-as.zoo(tradingReturns) dev.new() charts.PerformanceSummary(tradingReturns,main=paste("Arithmetic Genetic Trading Returns"),geometric=FALSE) print(table.Stats(tradingReturns)) cat("Sharpe Ratio") print(SharpeRatio.annualized(tradingReturns)) |

The code below calculates the rolling standard deviation of returns, ‘the risk’, for the FTSE 100 constituents. It then groups stocks into quartiles by this risk metric, the groups are updated daily. Quartile 1 is the lowest volatility stocks, quartile 2 the highest. An equally weighted ($ amt) index is created for each quartile. According to the above theory Q4 (high vol) should produce the highest cumulative returns.

When using a 1 month lookback for the stdev calculation there is a clear winning index, the lowest vol index (black). Interestingly the 2nd best index is the highest vol index (blue). The graph above is calculated using arithmetic returns.

When using a longer lookback of 250 days, a trading year, the highest vol index is the best performer and the lowest vol index the worst performer.

For short lookback (30days) low vol index was the best performer

For long lookback (250days) high vol index was the best performer

One possible explanation (untested) is that for a short lookback the volatility risk metric is more sensitive to moves in the stock and hence on a news announcement / earnings the stock has a higher likelihood of moving from it’s current index into a higher vol index. Perhaps it isn’t unreasonable to assume that the high vol index contains only the stocks that have had a recent announcement / temporary volatility and are in a period of consolidation or mean reversion. Or to put it another way for short lookbacks the high vol index doesn’t contain the stocks that are permanently highly vol, whereas for long lookbacks any temporary vol deviations are smoothed out.

Below are the same charts as above but for geometric returns.

On to the code:

library("quantmod") library("PerformanceAnalytics") library("zoo") #Script parameters symbolLst <- c("ADN.L","ADM.L","AGK.L","AMEC.L","AAL.L","ANTO.L","ARM.L","ASHM.L","ABF.L","AZN.L","AV.L","BA.L","BARC.L","BG.L","BLT.L","BP.L","BATS.L","BLND.L","BSY.L","BNZL.L","BRBY.L","CSCG.L","CPI.L","CCL.L","CNA.L","CPG.L","CRH.L","CRDA.L","DGE.L","ENRC.L","EXPN.L","FRES.L","GFS.L","GKN.L","GSK.L","HMSO.L","HL.L","HSBA.L","IAP.L","IMI.L","IMT.L","IHG.L","IAG.L","IPR.L","ITRK.L","ITV.L","JMAT.L","KAZ.L","KGF.L","LAND.L","LGEN.L","LLOY.L","EMG.L","MKS.L","MGGT.L","MRW.L","NG.L","NXT.L","OML.L","PSON.L","PFC.L","PRU.L","RRS.L","RB.L","REL.L","RSL.L","REX.L","RIO.L","RR.L","RBS.L","RDSA.L","RSA.L","SAB.L","SGE.L","SBRY.L","SDR.L","SRP.L","SVT.L","SHP.L","SN.L","SMIN.L","SSE.L","STAN.L","SL.L","TATE.L","TSCO.L","TLW.L","ULVR.L","UU.L","VED.L","VOD.L","WEIR.L","WTB.L","WOS.L","WPP.L","XTA.L") #Specify dates for downloading data startDate = as.Date("2000-01-01") #Specify what date to get the prices from symbolData <- new.env() #Make a new environment for quantmod to store data in clClRet <- new.env() downloadedSymbols <- list() for(i in 1:length(symbolLst)){ #Download one stock at a time print(paste(i,"/",length(symbolLst),"Downloading",symbolLst[i])) tryCatch({ getSymbols(symbolLst[i], env = symbolData, src = "yahoo", from = startDate) cleanName <- sub("^","",symbolLst[i],fixed=TRUE) mktData <- get(cleanName,symbolData) print(paste("-Calculating close close returns for:",cleanName)) ret <-(Cl(mktData)/Lag(Cl(mktData)))-1 if(max(abs(ret),na.rm=TRUE)>0.5){ print("-There is a abs(return) > 50% the data is odd lets not use this stock") next; } downloadedSymbols <- c(downloadedSymbols,symbolLst[i]) assign(cleanName,ret,envir = clClRet) }, error = function(e) { print(paste("Couldn't download: ", symbolLst[i])) }) } #Combine all the returns into a zoo object (joins the returns by date) #Not a big fan of this loop, think it's suboptimal zooClClRet <- zoo() for(i in 1:length(downloadedSymbols)){ cleanName <- sub("^","",downloadedSymbols[i],fixed=TRUE) print(paste("Combining the close close returns to the zoo:",cleanName)) if(length(zooClClRet)==0){ zooClClRet <- as.zoo(get(cleanName,clClRet)) } else { zooClClRet <- merge(zooClClRet,as.zoo(get(cleanName,clClRet))) } } print(head(zooClClRet)) #This will take inzoo or data frame #And convert each row into quantiles #Quantile 1 = 0-0.25 #Quantile 2 = 0.25-0.5 etc... quasiQuantileFunction <- function(dataIn){ quantileFun <- function(rowIn){ quant <- quantile(rowIn,na.rm=TRUE) #print(quant) a <- (rowIn<=quant[5]) b <- (rowIn<=quant[4]) c <- (rowIn<=quant[3]) d <- (rowIn<=quant[2]) rowIn[a] <- 4 rowIn[b] <- 3 rowIn[c] <- 2 rowIn[d] <- 1 return(rowIn) } return (apply(dataIn,2,quantileFun)) } avgReturnPerQuantile <- function(returnsData,quantileData){ q1index <- (clClQuantiles==1) q2index <- (clClQuantiles==2) q3index <- (clClQuantiles==3) q4index <- (clClQuantiles==4) q1dat <- returnsData q1dat[!q1index] <- NaN q2dat <- returnsData q2dat[!q2index] <- NaN q3dat <- returnsData q3dat[!q3index] <- NaN q4dat <- returnsData q4dat[!q4index] <- NaN avgFunc <- function(x) { #apply(x,1,median,na.rm=TRUE) #median is more resistant to outliers apply(x,1,mean,na.rm=TRUE) } res <- returnsData[,1:4] #just to maintain the time series (there must be a better way) res[,1] <- avgFunc(q1dat) res[,2] <- avgFunc(q2dat) res[,3] <- avgFunc(q3dat) res[,4] <- avgFunc(q4dat) colnames(res) <- c("Q1","Q2","Q3","Q4") return(res) } nLookback <- 250 #~1year trading calendar clClVol <- rollapply(zooClClRet,nLookback,sd,na.rm=TRUE) clClQuantiles <- quasiQuantileFunction(clClVol) returnPerVolQuantile <- avgReturnPerQuantile(zooClClRet,clClQuantiles) colnames(returnPerVolQuantile) <- c("Q1 min vol","Q2","Q3","Q4 max vol") returnPerVolQuantile[is.nan(returnPerVolQuantile)]<-0 #Assume if there is no return data that it's return is 0 #returnPerVolQuantile[returnPerVolQuantile>0.2] <- 0 #I was having data issues leading to days with 150% returns! This filters them out cumulativeReturnsByQuantile <- apply(returnPerVolQuantile,2,cumsum) dev.new() charts.PerformanceSummary(returnPerVolQuantile,main=paste("Arithmetic Cumulative Returns per Vol Quantile - Lookback=",nLookback),geometric=FALSE) print(table.Stats(returnPerVolQuantile)) cat("Sharpe Ratio") print(SharpeRatio.annualized(returnPerVolQuantile)) dev.new() par(oma=c(0,0,2,0)) par(mfrow=c(3,3)) for(i in seq(2012,2004,-1)){ print(as.Date(paste(i,"-01-01",sep=""))) print(as.Date(paste(i+1,"-01-01",sep=""))) windowedData <- window(as.zoo(returnPerVolQuantile),start=as.Date(paste(i,"-01-01",sep="")),end=as.Date(paste(i+1,"-01-01",sep=""))) chart.CumReturns(windowedData,main=paste("Year",i,"to",i+1),geometric=FALSE) } title(main=paste("Arithmetic Cumulative Returns per Vol Quantile - Lookback=",nLookback),outer=T) dev.new() charts.PerformanceSummary(returnPerVolQuantile,main=paste("Geometric Cumulative Returns per Vol Quantile - Lookback=",nLookback),geometric=TRUE) print(table.Stats(returnPerVolQuantile)) cat("Sharpe Ratio") print(SharpeRatio.annualized(returnPerVolQuantile)) dev.new() par(oma=c(0,0,2,0)) par(mfrow=c(3,3)) for(i in seq(2012,2004,-1)){ print(as.Date(paste(i,"-01-01",sep=""))) print(as.Date(paste(i+1,"-01-01",sep=""))) windowedData <- window(as.zoo(returnPerVolQuantile),start=as.Date(paste(i,"-01-01",sep="")),end=as.Date(paste(i+1,"-01-01",sep=""))) chart.CumReturns(windowedData,main=paste("Year",i,"to",i+1),geometric=TRUE) } title(main=paste("Geometric Cumulative Returns per Vol Quantile - Lookback=",nLookback),outer=T) |