This post will explore how to train hidden markov models in R. The previous posts in this series detailed the maths that power the HMM, fortunately all of this has been implemented for us in the RHmm package. HMMs can be used in two ways for regime detection, the first is to use a single HMM where each state in the HMM is considered a “regime”. The second method is to have multiple HMMs each designed to model an individual regime, the task is then to chose between models by looking at which is the most likely to have generated the data. I will explore both methods.
Method One – Single HMM Each State is a Regime
The credit for this section must go to the fantastic Systematic Investor blog http://systematicinvestor.wordpress.com/2012/11/01/regime-detection/. The code is well commented and should be self explanatory. Essentially two markets regimes (bull and bear) are simulated, a 2 state HMM is then trained on the data. The forward backward algorithm is then used to calculate the probability of being in a given state at any given time.
Method Two – Multiple HMMs with multiple states – Each HMM a regime
Three market regimes are simulated; bull, bear and a sideways market. Three different 2 stage HMM models are trained on each regime. Model 1 is the HMM for the bull market, Model 2 is the HMM for the bear market, and Model 3 is the HMM for a side ways market. A rolling window of 50 days worth of data is passed into each HMM and a log likelihood score produced. The higher the log likelihood the more likely it is that the model generated the observed data.
As can be seen in the above chart, the log likelihood is fairly decent for determining the difference between the bull and bear markets. Sadly the side ways model seems very likely in both the bull and bear cases, it’s log likelihood is fairly stable and doesn’t change per regime.
Code for method 1:
library('RHmm') #Load HMM package #Code based upon http://systematicinvestor.wordpress.com/2012/11/01/regime-detection/ bullMarketOne = rnorm( 100, 0.1/365, 0.05/sqrt(365) ) bearMarket = rnorm( 100, -0.2/365, 0.15/sqrt(365)) bullMarketTwo = rnorm( 100, 0.15/365, 0.07/sqrt(365) ) true.states = c(rep(1,100),rep(2,100),rep(1,100)) returns = c( bullMarketOne, bearMarket, bullMarketTwo ) y=returns ResFit = HMMFit(y, nStates=2) #Fit a HMM with 2 states to the data VitPath = viterbi(ResFit, y) #Use the viterbi algorithm to find the most likely state path (of the training data) fb = forwardBackward(ResFit, y) #Forward-backward procedure, compute probabilities # Plot probabilities and implied states layout(1:3) plot(cumsum(returns),ylab="Cumulative Market Return",type="l", main="Fake Market Data") plot(VitPath$states, type='s', main='Implied States', xlab='', ylab='State') matplot(fb$Gamma, type='l', main='Smoothed Probabilities', ylab='Probability') legend(x='topright', c('Bear Market - State 2','Bull Market - State 1'), fill=1:2, bty='n') |
Code for method 2:
library('RHmm') #Load HMM package library('zoo') #HMM model 1 (high vol and low vol upwards trend) model1ReturnsFunc <- function(isHighVol){ return(rnorm( 100, 0.1,if(isHighVol){0.15}else{0.02})) } bullLowVol = model1ReturnsFunc(F) bullHighVol = model1ReturnsFunc(T) model1TrainingReturns = c(bullLowVol, bullHighVol) Model1Fit = HMMFit(model1TrainingReturns, nStates=2) #Fit a HMM with 2 states to the data #HMM model 2 (high vol and low vol downwards trend) model2ReturnsFunc <- function(isHighVol){ return(rnorm( 100, -0.1,if(isHighVol){0.15}else{0.02})) } bearLowVol = model2ReturnsFunc(F) bearHighVol = model2ReturnsFunc(T) model2TrainingReturns = c(bearLowVol, bearHighVol) Model2Fit = HMMFit(model2TrainingReturns, nStates=2) #Fit a HMM with 2 states to the data #HMM model 3 (sideways market) model3ReturnsFunc <- function(isHighVol){ return(rnorm( 100, 0.0,if(isHighVol){0.16}else{0.08})) } sidewaysLowVol = model3ReturnsFunc(F) sidewaysHighVol = model3ReturnsFunc(T) model3TrainingReturns = c(sidewaysLowVol, sidewaysHighVol) Model3Fit = HMMFit(model3TrainingReturns, nStates=2) #Fit a HMM with 2 states to the data generateDataFunc <- function(modelSequence,highVolSequence){ results <- c() if(length(modelSequence) != length(highVolSequence)){ print("Model Sequence and Vol Sequence must be the same length"); return(NULL)} for(i in 1:length(modelSequence)){ #Bit rubish having all these IFs here but its easy to understand for novice R users if(modelSequence[i] == 1){ results <- c(results,model1ReturnsFunc(highVolSequence[i])) } if(modelSequence[i] == 2){ results <- c(results,model2ReturnsFunc(highVolSequence[i])) } if(modelSequence[i] == 3){ results <- c(results,model3ReturnsFunc(highVolSequence[i])) } } return(results) } #Create some out of sample data actualModelSequence <- c(1,1,1,3,2,2,1) actualVolRegime <- c(T,T,T,T,T,T,T) outOfSampleData <- generateDataFunc(actualModelSequence,actualVolRegime) #Will take 50 days of data and calculate the rolling log likelihood for each HMM model model1Likelihood <- rollapply(outOfSampleData,50,align="right",na.pad=T,function(x) {forwardBackward(Model1Fit,x)$LLH}) model2Likelihood <- rollapply(outOfSampleData,50,align="right",na.pad=T,function(x) {forwardBackward(Model2Fit,x)$LLH}) model3Likelihood <- rollapply(outOfSampleData,50,align="right",na.pad=T,function(x) {forwardBackward(Model3Fit,x)$LLH}) layout(1:3) plot(cumsum(outOfSampleData),main="Fake Market Data",ylab="Cumulative Returns",type="l") plot(model1Likelihood,type="l",ylab="Log Likelihood of Each Model",main="Log Likelihood for each HMM Model") lines(model2Likelihood,type="l",col="red") lines(model3Likelihood,type="l",col="blue") plot(rep((actualModelSequence==3)*3,each=100),col="blue",type="o",ylim=c(0.8,3.1),ylab="Actual MODEL Number",main="Actual MODEL Sequence") lines(rep((actualModelSequence==2)*2,each=100),col="red",type="o") lines(rep((actualModelSequence==1)*1,each=100),col="black",type="o") legend(x='topright', c('Model 1 - Bull Mkt','Model 2 - Bear Mkt','Model 3 - Side ways Mkt'), col=c("black","red","blue"), bty='n',lty=c(1,1,1)) |