Animation in R – Bouncing Ball Simulation

This post fill focus on how to create an animation to visualise the simulation of a physical system, in this case a bouncing ball. Whilst this post is unrelated to trading it will form the basis of future articles. In my next post I will show how to simulate the classic pole balancing / inverted pendulum problem. Machine learning will then be applied to develop a control system for the dynamic systems.

A video of the simulation is below:

Creating Animations

The R package “animation” has been used to create videos of the simulated process. This package requires that FFMpeg is installed on your machine and added to your environmental path. To learn how to add items to your path follow this tutorial at Geeks With Blogs.

The code below demonstrated how to generate a video:

?View Code RSPLUS
oopt = ani.options(ani.width = 1200, ani.height = 800, other.opts = "-pix_fmt yuv420p -b 600k")
saveVideo(runSimulationFunc(),interval = simulation.timestep,ani.options=oopt,"bounce.mp4")
  • ani.width is the width of the video
  • ani.height is the height of the video
  • other.opts are command line arguments that are passed to ffmpeg and can be used to control the bitrate and other quality settings
  • interval specifies in seconds how long to wait between frames
  • runSimulationFunc() is a function that should run your simulation, and charts plotted during the simulation will be added to the video

Drawing Graphics

I have written some functions to make drawing basic graphics easy.

  • createSceneFunc(bottomLeftX, bottomLeftY, width,height) creates a brand new scene to draw objects on, bottomLeftX and bottomLeftY are Cartesian co-ordinates to specify the bottom left corner of the canvas. The width and height variables are used to specify the canvas dimensions.
  • createBoxFunc(topLeftX, topLeftY, width, height, fillColour) draws a box to the current canvas, topLeftX and topLeftY specify the Cartesian co-ordinate of the top left of the box, width and height specify the dimensions and fillColour specifies the colour that fills in the box.
  • createCircleFunc(centerX,centerY,radius,fillColour) draws a circle to the current canvas, centerX and centerY specify the Cartesian co-ordinate of the center on the circle, the radius specifies the radius of the circle and fillColour specifies the colour that fills in the circle.

Simulation Dynamics

The following single period update equations are used:
Position_{t+\Delta t}=Position_{t}+Velocity_{t}*\Delta t
Velocity_{t+\Delta t}=Velocity_{t}+Acceleration_{t}*\Delta t

When a collision is made between the ball and the platform the following update is used:
Velocity_{t+\Delta t}=-\kappa*Velocity_{t}
\kappa=\text{Coefficient of restitution}

Onto the code:

?View Code RSPLUS
library("animation")  #Library to save GIFs
#Function to create a blank canvas / scene for drawing objects onto later
createSceneFunc <- function(bottomLeftX, bottomLeftY, width,height,main="",xlab="",ylab="",ann=T,xaxt=NULL,yaxt=NULL){
        plot(c(bottomLeftX, width), c(bottomLeftY,height), type = "n",ann=ann, xaxt=xaxt, yaxt=yaxt,main=main,xlab=xlab,ylab=ylab )
#Function to draw a box on the scene
createBoxFunc <- function(topLeftX, topLeftY, width, height, fillColour=NA, borderColour="black"){
              col = fillColour, border=borderColour)
#Function to draw a circle on the scene
createCircleFunc <- function(centerX,centerY,radius,fillColour=NA, borderColour="black"){
#Parameters to control the simulation
simulation.timestep = 0.02
simulation.gravity = 1.8
simulation.numoftimesteps = 2000
#Define the size of the scene (used to visualise what is happening in the simulation)
scene.bottomLeftX = 0
scene.bottomLeftY = -1
scene.width = 10
#This is the object the bouncing ball is going to hit
platform.x = scene.bottomLeftX+1
platform.y = 0
platform.width= scene.width - 2
platform.colour = "red"
#This is just a box resting on the left end of the platform to practise drawing things
leftbluebox.x = platform.x
leftbluebox.y = platform.y+0.5
leftbluebox.colour = "blue"
#This is just a box resting on the right end of the platform to practise drawing things
rightbluebox.x = platform.x+platform.width-0.5
rightbluebox.y = platform.y+0.5
rightbluebox.colour = "blue"
#This is the ball that is going to be bouncing
ball.radius = 0.25
ball.yvelocity = 0
ball.yacceleration = 0
ball.coefficientofrestitution = 0.85 #This is a physics term to describe how much velocity as pct is kept after a bounce
#Some variables to store various time series values of the simulation
logger.yposition = rep(NA,simulation.numoftimesteps)
logger.yvelocity = rep(NA,simulation.numoftimesteps)
logger.yacceleration = rep(NA,simulation.numoftimesteps)
#Some settings to control the charts used to plot the logged variables
plotcontrol.yposition.ylim = c(0,10)
plotcontrol.yvelocity.ylim = c(-6,6)
plotcontrol.yacceleration.ylim = c(-simulation.gravity,400)
runSimulationFunc <- function(){
  #Main simulation loop
  for(i in seq(1,simulation.numoftimesteps)){
     #Equations of motion
     ball.yacceleration = -simulation.gravity
     ball.y = ball.y + ball.yvelocity*simulation.timestep
     ball.yvelocity = ball.yvelocity+ball.yacceleration*simulation.timestep
     #Logic to check is there has been a collision between the ball and the platform
     if(ball.y-ball.radius <= platform.y){
        #There has been a collision
        newyvelocity = -ball.yvelocity*ball.coefficientofrestitution
        ball.yacceleration = (newyvelocity - ball.yvelocity)/simulation.timestep
        ball.yvelocity = newyvelocity
        ball.y = ball.radius+platform.y
     #Log the results of the simulation
     logger.yposition[i] <- ball.y
     logger.yvelocity[i] <- ball.yvelocity
     logger.yacceleration[i] <- ball.yacceleration
    #Plot the simulation
    #The layout command arranges the charts
    layout(matrix(c(1,2,1,3,1,4,5,5), 4, 2, byrow = TRUE),heights=c(3,3,3,1))
    par(mar=c(3,4,2,2) + 0.1)
    #Create the scene and draw the various objects
    #Create the scene
                 main="Simulation of Bouncing Ball -",xlab="",ylab="Ball Height",xaxt="n")
    #Draw the platform the ball lands on
    #Draw a box on the left off the platform
    #Draw a box on the right of the platform
    #Draw the ball
    #Plot the logged variables
    plot(logger.yposition, type="l", ylim=plotcontrol.yposition.ylim, ylab="Y POSITION")
    plot(logger.yvelocity, type="l", ylim=plotcontrol.yvelocity.ylim, ylab="Y VELOCITY")
    plot(logger.yacceleration, type="l",ylim=plotcontrol.yacceleration.ylim, ylab="Y ACCELERATION")
    #Plot a progress bar
    plot(-5, xlim = c(1,simulation.numoftimesteps), ylim = c(0, .3), yaxt = "n", xlab = "", ylab = "", main = "Iteration")
    abline(v=i, lwd=5, col = rgb(0, 0, 255, 255, maxColorValue=255))
oopt = ani.options(ani.width = 1200, ani.height = 800, other.opts = "-pix_fmt yuv420p -b 600k")
saveVideo(runSimulationFunc(),interval = simulation.timestep,ani.options=oopt,"bounce.mp4")

Hidden Markov Models – Trend Following – Part 4 of 4

Update: There was a look forward bug in the code when calculating outOfSampleLongReturns and outOfSampleShortReturns, this has been corrected and sadly reduced the sharpe ratio from 3.08 to 0.857. This is why I always publish the code to provide greater transparency than other sites

Part 3 of this series demonstrated how to train a HMM on a toy model, this post will focus on how to actually go about modelling real life data. A trend following strategy will be developed for trading the S&P 500.

In most machine learning classification problems you need a set of training data which has class labels. Whilst we have the market data we don’t have class labels, the first task is therefor to generate the class labels for the training data.

We are wanting to develop a trend following strategy, we need to select parts of the S&P 500 time series and label them as either in an up trend or down trend (could also have no trend label). In theory you could do this by hand, however this would be not be feasible to do over the whole universe of stocks. Instead we can write a program to automatically classify the data for us, how you classify data as in a trend depends upon your definition of trend.

The labels are seen in the chart below, and price with green underneath means it’s been labelled as a long signal, any price with red above it means that it has the short label.


In the code provided I check to see if the current price is the lowest price for the next 10 periods (variable NDayLookforwardLowHigh), if it is then go long, if the variable is the highest over the next 10 periods then close the long. Do the reverse for shorts. It’s a little crude and I imagine there are significantly better ways to classify trends, might be useful to put a volatility constraint on the trends (should hopefully filter through and improve the sharpe ratio).

In addition to labeling the data a set of vector “features” must be generated, the features should contain variables that are beneficial to detecting trends. The feature vector I used has the ratios of open to close price, open to high price, open to low price and all the intermediate combinations. Often it is desirable to model the dynamics of these variables and put the one period change in these variables inside of the feature vector.


The above image shows the likelihood of each market regime given the HMM trained on the same data set. It is reassuring to see that the Long Regime became very unlikely during the crash of 2008.

One of the excellent properties of HMM is that they allow the modelling of situations that have different duration but are of the same class. For example a trend might last for 10 days, and another trend might last for 35 days, we can pass both of these examples into the HMM and it will try and model the duration difference using the internal state transition probabilities.

Out of sample results:


Sharpe Ratio of 0.857

Model Fit:

Long Model

HMMFit(obs = inSampleLongFeaturesList, nStates = 3)

3 states HMM with 7-d gaussian distribution

Baum-Welch algorithm status:
Number of iterations : 21
Last relative variation of LLH function: 0.000001


Initial probabilities:
          Pi 1      Pi 2      Pi 3
  1.690213e-47 0.3734194 0.6265806

Transition matrix:
          State 1   State 2    State 3
State 1 0.4126480 0.3419075 0.24544450
State 2 0.1116068 0.8352273 0.05316591
State 3 0.5475525 0.2303324 0.22211504

Conditionnal distribution parameters:

Distribution parameters:
  State 1
           mean    cov matrix                                                                                    
     0.01565943  1.922717e-04  1.724953e-04  1.785035e-04 -7.870798e-06 -1.764319e-04 -1.687845e-04  6.898374e-06
     1.02210441  1.724953e-04  1.920546e-04  1.736241e-04  2.327852e-06 -1.615346e-04 -1.768012e-04 -1.651055e-05
     1.01805768  1.785035e-04  1.736241e-04  1.777602e-04  2.663971e-06 -1.653924e-04 -1.595871e-04  5.067094e-06
     1.00264545 -7.870798e-06  2.327852e-06  2.663971e-06  1.095711e-05  8.195588e-06  8.473222e-06  3.178401e-07
     0.98502360 -1.764319e-04 -1.615346e-04 -1.653924e-04  8.195588e-06  1.644647e-04  1.589815e-04 -4.749521e-06
     0.98113485 -1.687845e-04 -1.768012e-04 -1.595871e-04  8.473222e-06  1.589815e-04  1.732240e-04  1.542815e-05
     0.99605695  6.898374e-06 -1.651055e-05  5.067094e-06  3.178401e-07 -4.749521e-06  1.542815e-05  2.064964e-05

  State 2
            mean    cov matrix                                                                                    
     0.001670502  3.103878e-05  3.555352e-06  1.781044e-05 -1.336108e-05 -3.092612e-05 -1.670114e-05  1.410578e-05
     1.009361131  3.555352e-06  1.249497e-05  9.181451e-06  5.644685e-06 -3.464184e-06 -6.714792e-06 -3.238512e-06
     1.005638565  1.781044e-05  9.181451e-06  1.714606e-05 -7.256446e-07 -1.770669e-05 -9.748050e-06  7.905439e-06
     1.003957940 -1.336108e-05  5.644685e-06 -7.256446e-07  1.271107e-05  1.335557e-05  7.009564e-06 -6.279198e-06
     0.998346405 -3.092612e-05 -3.464184e-06 -1.770669e-05  1.335557e-05  3.081873e-05  1.660615e-05 -1.409313e-05
     0.994653572 -1.670114e-05 -6.714792e-06 -9.748050e-06  7.009564e-06  1.660615e-05  1.353501e-05 -3.017033e-06
     0.996315167  1.410578e-05 -3.238512e-06  7.905439e-06 -6.279198e-06 -1.409313e-05 -3.017033e-06  1.101157e-05

  State 3
            mean    cov matrix                                                                                    
     -0.01296153  1.481273e-04 -8.848326e-05  2.231101e-05 -1.286390e-04 -1.503760e-04 -3.945685e-05  1.032046e-04
      1.02416584 -8.848326e-05  1.349494e-04  2.353928e-05  1.150009e-04  9.081677e-05 -1.853468e-05 -1.027771e-04
      1.00458706  2.231101e-05  2.353928e-05  3.596043e-05  1.384302e-05 -2.234068e-05 -9.215909e-06  1.259424e-05
      1.01746162 -1.286390e-04  1.150009e-04  1.384302e-05  1.485395e-04  1.338477e-04  3.324363e-05 -9.313984e-05
      1.01283801 -1.503760e-04  9.081677e-05 -2.234068e-05  1.338477e-04  1.555764e-04  4.225851e-05 -1.053565e-04
      0.99347206 -3.945685e-05 -1.853468e-05 -9.215909e-06  3.324363e-05  4.225851e-05  5.013390e-05  8.770049e-06
      0.98098355  1.032046e-04 -1.027771e-04  1.259424e-05 -9.313984e-05 -1.053565e-04  8.770049e-06  1.075952e-04

Log-likelihood: 64226.28
BIC criterium: -127633.2
AIC criterium: -128226.6

Short Model

HMMFit(obs = inSampleShortFeaturesList, nStates = 3)

3 states HMM with 7-d gaussian distribution

Baum-Welch algorithm status:
Number of iterations : 20
Last relative variation of LLH function: 0.000001


Initial probabilities:
          Pi 1      Pi 2      Pi 3
  3.784166e-15 0.1476967 0.8523033

Transition matrix:
          State 1    State 2   State 3
State 1 0.4760408 0.12434214 0.3996170
State 2 0.3068272 0.54976794 0.1434049
State 3 0.5243733 0.06315371 0.4124730

Conditionnal distribution parameters:

Distribution parameters:
  State 1
             mean    cov matrix                                                                                    
     -0.009827328  5.102050e-05 -2.137629e-05  1.047989e-05 -3.999886e-05 -5.043558e-05 -1.834830e-05  3.090145e-05
      1.016726031 -2.137629e-05  2.915402e-05  4.001168e-06  2.551672e-05  2.145395e-05 -3.484775e-06 -2.426317e-05
      1.002951135  1.047989e-05  4.001168e-06  1.070077e-05  4.343448e-07 -1.032399e-05 -3.489498e-06  6.607115e-06
      1.012808350 -3.999886e-05  2.551672e-05  4.343448e-07  4.073945e-05  4.019962e-05  1.502427e-05 -2.422162e-05
      1.009838504 -5.043558e-05  2.145395e-05 -1.032399e-05  4.019962e-05  5.047663e-05  1.847048e-05 -3.082551e-05
      0.996150195 -1.834830e-05 -3.484775e-06 -3.489498e-06  1.502427e-05  1.847048e-05  1.816514e-05 -3.045290e-08
      0.986475577  3.090145e-05 -2.426317e-05  6.607115e-06 -2.422162e-05 -3.082551e-05 -3.045290e-08  2.992073e-05

  State 2
             mean    cov matrix                                                                                    
     -0.005393441  0.0008501205 -1.231927e-04  3.413652e-04 -0.0004927836 -0.0008165848 -3.496732e-04  4.379093e-04
      1.037824136 -0.0001231927  3.546602e-04  7.820615e-05  0.0002160609  0.0001422901 -1.206772e-04 -2.508658e-04
      1.013889133  0.0003413652  7.820615e-05  2.198099e-04 -0.0001068837 -0.0003166324 -1.713360e-04  1.374691e-04
      1.019557602 -0.0004927836  2.160609e-04 -1.068837e-04  0.0003949026  0.0004952245  1.733093e-04 -2.998924e-04
      1.005903113 -0.0008165848  1.422901e-04 -3.166324e-04  0.0004952245  0.0007960268  3.345499e-04 -4.319633e-04
      0.982515481 -0.0003496732 -1.206772e-04 -1.713360e-04  0.0001733093  0.0003345499  2.726893e-04 -5.242376e-05
      0.977179046  0.0004379093 -2.508658e-04  1.374691e-04 -0.0002998924 -0.0004319633 -5.242376e-05  3.611514e-04

  State 3
            mean    cov matrix                                                                                    
     0.003909801  2.934983e-05  1.656072e-05  2.335206e-05 -5.814278e-06 -2.877715e-05 -2.196575e-05  6.761736e-06
     1.010126341  1.656072e-05  1.900251e-05  1.943331e-05  2.909245e-06 -1.625496e-05 -1.574899e-05  4.694645e-07
     1.007300746  2.335206e-05  1.943331e-05  2.554298e-05  2.309188e-06 -2.288733e-05 -1.676155e-05  6.090567e-06
     1.003413014 -5.814278e-06  2.909245e-06  2.309188e-06  8.147512e-06  5.796633e-06  5.191743e-06 -5.902698e-07
     0.996163167 -2.877715e-05 -1.625496e-05 -2.288733e-05  5.796633e-06  2.830213e-05  2.164898e-05 -6.603805e-06
     0.993369564 -2.196575e-05 -1.574899e-05 -1.676155e-05  5.191743e-06  2.164898e-05  2.055797e-05 -1.040800e-06
     0.997202266  6.761736e-06  4.694645e-07  6.090567e-06 -5.902698e-07 -6.603805e-06 -1.040800e-06  5.564118e-06

Log-likelihood: 47728.08
BIC criterium: -94666.79
AIC criterium: -95230.16

Onto the code:

?View Code RSPLUS
library('RHmm') #Load HMM package
#Specify dates for downloading data
startDate = as.Date("2000-01-01") #Specify what date to get the prices from
trainingEndDate = as.Date("2010-01-01") # Specify the date we take as our traning sample
NDayLookforwardLowHigh <- 10 #Parameter used when classifing in sample data as in a trend or not
HmmLikelihoodTestLength <- 5 #How many days of data to calculate the likehood ratio on to compare models
symbolData <- new.env() #Make a new environment for quantmod to store data in
symbol <- "^GSPC" #S&p 500
getSymbols(symbol, env = symbolData, src = "yahoo", from = startDate)
mktdata <- eval(parse(text=paste("symbolData$",sub("^","",symbol,fixed=TRUE))))
mktdata <- head(mktdata,-1) #Hack to fix some stupid duplicate date problem with yahoo
inSampleMktData <-  window(mktdata,start=startDate ,end=trainingEndDate)
outOfSampleMktData <-  window(mktdata,start=trainingEndDate+1)
dailyRet <- Delt(Cl(mktdata),k=1,type="arithmetic") #Daily Returns
dailyRet[] <-0.00001
inSampleDailyRet <-  window(dailyRet,start=startDate ,end=trainingEndDate)
outOfSampleDailyRet <-  window(dailyRet,start=trainingEndDate+1)
ConvertTofullSignal <- function(signal){
    results <- rep(0,length(signal))
    intrade <- F
    for(i in seq(1,length(signal))){
         intrade <- F
        if(signal[i]==1 || intrade){
           intrade <- T
#Generate long trend signal
LongTrendSignal <- rep(0,nrow(inSampleMktData))
for(i in seq(1,nrow(inSampleMktData)-NDayLookforwardLowHigh)){
    dataBlock <- Cl(inSampleMktData[seq(i,i+NDayLookforwardLowHigh),])
    if(coredata(Cl(inSampleMktData[i,])) == min(coredata(dataBlock))){
      LongTrendSignal[i] <- 1
    if(coredata(Cl(inSampleMktData[i,])) == max(coredata(dataBlock))){
      LongTrendSignal[i] <- -1
LongTrendSignal <- ConvertTofullSignal(LongTrendSignal)
#Generate short trend signal
ShortTrendSignal <- rep(0,nrow(inSampleMktData))
for(i in seq(1,nrow(inSampleMktData)-NDayLookforwardLowHigh)){
    dataBlock <- Cl(inSampleMktData[seq(i,i+NDayLookforwardLowHigh),])
    if(coredata(Cl(inSampleMktData[i,])) == max(coredata(dataBlock))){
      ShortTrendSignal[i] <- 1
    if(coredata(Cl(inSampleMktData[i,])) == min(coredata(dataBlock))){
      ShortTrendSignal[i] <- -1
ShortTrendSignal <- ConvertTofullSignal(ShortTrendSignal)
#Plot our signals
LongTrendSignalForPlot <- LongTrendSignal
LongTrendSignalForPlot[LongTrendSignalForPlot==0] <- NaN
LongTrendSignalForPlot <- Cl(inSampleMktData)*LongTrendSignalForPlot - 100
inSampleLongTrendSignalForPlot <-LongTrendSignalForPlot
ShortTrendSignalForPlot <- ShortTrendSignal
ShortTrendSignalForPlot[ShortTrendSignalForPlot==0] <- NaN
ShortTrendSignalForPlot <- Cl(inSampleMktData)*ShortTrendSignalForPlot + 100
inSampleShortTrendSignalForPlot <- ShortTrendSignalForPlot
plot(Cl(inSampleMktData), main="S&P 500 Trend Follow In Sample Training Signals")
legend(x='bottomright', c("S&P 500 Closing Price","Long Signal","Short Signal"),  fill=c("black","green","red"), bty='n')
#Calculate Returns
LongReturns <- Lag(LongTrendSignal)* (inSampleDailyRet)
LongReturns[] <- 0
ShortReturns <- Lag(-1*ShortTrendSignal)* (inSampleDailyRet)
ShortReturns[] <- 0
TotalReturns <- LongReturns + ShortReturns
plot(cumsum(TotalReturns),main="S&P 500 Trend Follow In Sample HMM Training Signals Strategy Returns")
legend(x='bottomright', c("Total Returns","Long Trend Returns","Short Trend Returns"),  fill=c("black","green","red"), bty='n')
#Extracts a list of varying length features for each signal/class label
CreateListOfMatrixFeatures <- function(features,signal){
      results <- list()
      extract <- F
      for(i in seq(1,length(signal))){
          if(signal[i]==1 && !extract){
                startIndex <- i
                extract <- T
          if(signal[i]==0 && extract){
              endIndex <- i-1
              dataBlock <- features[startIndex:endIndex,]
              extract <- F
              results[[length(results)+1]] <- as.matrix(dataBlock)
#HMM Training
#Generate the features that describe the data & split into training and out of sample sets
features <- cbind(dailyRet,Hi(mktdata)/Lo(mktdata),Hi(mktdata)/Op(mktdata),Hi(mktdata)/Cl(mktdata),Op(mktdata)/Cl(mktdata),Lo(mktdata)/Cl(mktdata),Lo(mktdata)/Op(mktdata))
inSampleTrainingFeatures <-  window(features,start=startDate ,end=trainingEndDate)
outOfSampleFeatures <- window(features,start=trainingEndDate+1)
#For each long / short position extract the corresponding features data and create list of them
inSampleLongFeaturesList <- CreateListOfMatrixFeatures(inSampleTrainingFeatures,LongTrendSignal)
inSampleShortFeaturesList <- CreateListOfMatrixFeatures(inSampleTrainingFeatures,ShortTrendSignal)
#Train the HMM models
LongModelFit = HMMFit(inSampleLongFeaturesList, nStates=3)
ShortModelFit = HMMFit(inSampleShortFeaturesList, nStates=3)
#Will take NDayLookforwardLowHigh days of data and calculate the rolling log likelihood for each HMM model
inSampleLongLikelihood <- rollapply(inSampleTrainingFeatures,HmmLikelihoodTestLength,align="right",na.pad=T,by.column=F,function(x) {forwardBackward(LongModelFit,as.matrix(x))$LLH})
inSampleShortLikelihood <- rollapply(inSampleTrainingFeatures,HmmLikelihoodTestLength,align="right",na.pad=T,by.column=F,function(x) {forwardBackward(ShortModelFit,as.matrix(x))$LLH})
outOfSampleLongLikelihood <- rollapply(outOfSampleFeatures,HmmLikelihoodTestLength,align="right",na.pad=T,by.column=F,function(x) {forwardBackward(LongModelFit,as.matrix(x))$LLH})
outOfSampleShortLikelihood <- rollapply(outOfSampleFeatures,HmmLikelihoodTestLength,align="right",na.pad=T,by.column=F,function(x) {forwardBackward(ShortModelFit,as.matrix(x))$LLH})
#Create signals for plot / trading
outOfSampleLongTrendSignalForPlot <- 1*(outOfSampleLongLikelihood > outOfSampleShortLikelihood)
outOfSampleLongTrendSignalForPlot[outOfSampleLongTrendSignalForPlot==0] <- NaN
outOfSampleLongTrendSignalForPlot <- outOfSampleLongTrendSignalForPlot*Cl(outOfSampleMktData)-100
outOfSampleShortTrendSignalForPlot <- 1*(outOfSampleLongLikelihood < outOfSampleShortLikelihood)
outOfSampleShortTrendSignalForPlot <- outOfSampleShortTrendSignalForPlot*Cl(outOfSampleMktData)+100
plot(Cl(inSampleMktData), main="S&P 500 Trend Follow In Sample HMM Training Signals")
legend(x='bottomright', c("S&P 500 Closing Price","Long Signal","Short Signal"),  fill=c("black","green","red"), bty='n')
#tt <- Cl(inSampleMktData)
#tt[,1] <- inSampleLongLikelihood
plot(inSampleLongLikelihood,main="Log Likelihood of each HMM model - In Sample")
legend(x='bottomright', c("Long HMM Likelihood","Short HMM Likelihood"),  fill=c("green","red"), bty='n')
plot(Cl(outOfSampleMktData), main="S&P 500 HMM Trend Following Out of Sample")
legend(x='bottomright', c("S&P 500 Closing Price","Long Signal","Short Signal"),  fill=c("black","green","red"), bty='n')
#tt <- Cl(outOfSampleMktData)
#tt[,1] <- outOfSampleLongLikelihood
plot(outOfSampleLongLikelihood,main="Log Likelihood of each HMM model - Out Of Sample")
legend(x='bottomright', c("Long HMM Likelihood","Short HMM Likelihood"),  fill=c("green","red"), bty='n')
#Calculate Out of Sample Returns
outOfSampleLongReturns <- Lag((1*(outOfSampleLongLikelihood > outOfSampleShortLikelihood)))* (outOfSampleDailyRet)
outOfSampleLongReturns[] <- 0
outOfSampleShortReturns <- Lag(-1*(1*(outOfSampleLongLikelihood < outOfSampleShortLikelihood)))* (outOfSampleDailyRet)
outOfSampleShortReturns[] <- 0
outOfSampleTotalReturns <- outOfSampleLongReturns + outOfSampleShortReturns
outOfSampleTotalReturns[] <- 0
plot(cumsum(outOfSampleTotalReturns),main="S&P 500 HMM Trend Following Out of Sample Strategy Returns")
legend(x='bottomright', c("Total Returns","Long Trend Returns","Short Trend Returns"),  fill=c("black","green","red"), bty='n')

Hidden Markov Models – Examples In R – Part 3 of 4

This post will explore how to train hidden markov models in R. The previous posts in this series detailed the maths that power the HMM, fortunately all of this has been implemented for us in the RHmm package. HMMs can be used in two ways for regime detection, the first is to use a single HMM where each state in the HMM is considered a “regime”. The second method is to have multiple HMMs each designed to model an individual regime, the task is then to chose between models by looking at which is the most likely to have generated the data. I will explore both methods.

Method One – Single HMM Each State is a Regime


The credit for this section must go to the fantastic Systematic Investor blog The code is well commented and should be self explanatory. Essentially two markets regimes (bull and bear) are simulated, a 2 state HMM is then trained on the data. The forward backward algorithm is then used to calculate the probability of being in a given state at any given time.

Method Two – Multiple HMMs with multiple states – Each HMM a regime



Three market regimes are simulated; bull, bear and a sideways market. Three different 2 stage HMM models are trained on each regime. Model 1 is the HMM for the bull market, Model 2 is the HMM for the bear market, and Model 3 is the HMM for a side ways market. A rolling window of 50 days worth of data is passed into each HMM and a log likelihood score produced. The higher the log likelihood the more likely it is that the model generated the observed data.

As can be seen in the above chart, the log likelihood is fairly decent for determining the difference between the bull and bear markets. Sadly the side ways model seems very likely in both the bull and bear cases, it’s log likelihood is fairly stable and doesn’t change per regime.

Code for method 1:

?View Code RSPLUS
library('RHmm') #Load HMM package
#Code based upon
bullMarketOne = rnorm( 100, 0.1/365, 0.05/sqrt(365) )
bearMarket  = rnorm( 100, -0.2/365, 0.15/sqrt(365))
bullMarketTwo = rnorm( 100, 0.15/365, 0.07/sqrt(365) )
true.states = c(rep(1,100),rep(2,100),rep(1,100))
returns = c( bullMarketOne, bearMarket, bullMarketTwo )
ResFit = HMMFit(y, nStates=2) #Fit a HMM with 2 states to the data
VitPath = viterbi(ResFit, y) #Use the viterbi algorithm to find the most likely state path (of the training data)
fb = forwardBackward(ResFit, y) #Forward-backward procedure, compute probabilities
# Plot probabilities and implied states
plot(cumsum(returns),ylab="Cumulative Market Return",type="l", main="Fake Market Data")
plot(VitPath$states, type='s', main='Implied States', xlab='', ylab='State')
matplot(fb$Gamma, type='l', main='Smoothed Probabilities', ylab='Probability')
legend(x='topright', c('Bear Market - State 2','Bull Market - State 1'),  fill=1:2, bty='n')

Code for method 2:

?View Code RSPLUS
library('RHmm') #Load HMM package
#HMM model 1 (high vol and low vol upwards trend)
model1ReturnsFunc <- function(isHighVol){
  return(rnorm( 100, 0.1,if(isHighVol){0.15}else{0.02}))
bullLowVol = model1ReturnsFunc(F)
bullHighVol  = model1ReturnsFunc(T)
model1TrainingReturns = c(bullLowVol, bullHighVol)
Model1Fit = HMMFit(model1TrainingReturns, nStates=2) #Fit a HMM with 2 states to the data
#HMM model 2 (high vol and low vol downwards trend)
model2ReturnsFunc <- function(isHighVol){
  return(rnorm( 100, -0.1,if(isHighVol){0.15}else{0.02}))
bearLowVol = model2ReturnsFunc(F)
bearHighVol  = model2ReturnsFunc(T)
model2TrainingReturns = c(bearLowVol, bearHighVol)
Model2Fit = HMMFit(model2TrainingReturns, nStates=2) #Fit a HMM with 2 states to the data
#HMM model 3 (sideways market)
model3ReturnsFunc <- function(isHighVol){
  return(rnorm( 100, 0.0,if(isHighVol){0.16}else{0.08}))
sidewaysLowVol = model3ReturnsFunc(F)
sidewaysHighVol  = model3ReturnsFunc(T)
model3TrainingReturns = c(sidewaysLowVol, sidewaysHighVol)
Model3Fit = HMMFit(model3TrainingReturns, nStates=2) #Fit a HMM with 2 states to the data
generateDataFunc <- function(modelSequence,highVolSequence){
  results <- c()
  if(length(modelSequence) != length(highVolSequence)){ print("Model Sequence and Vol Sequence must be the same length"); return(NULL)}
  for(i in 1:length(modelSequence)){
    #Bit rubish having all these IFs here but its easy to understand for novice R users
    if(modelSequence[i] == 1){
       results <- c(results,model1ReturnsFunc(highVolSequence[i]))
    if(modelSequence[i] == 2){
       results <- c(results,model2ReturnsFunc(highVolSequence[i]))
    if(modelSequence[i] == 3){
       results <- c(results,model3ReturnsFunc(highVolSequence[i]))
#Create some out of sample data
actualModelSequence <- c(1,1,1,3,2,2,1)
actualVolRegime <- c(T,T,T,T,T,T,T)
outOfSampleData <- generateDataFunc(actualModelSequence,actualVolRegime)
#Will take 50 days of data and calculate the rolling log likelihood for each HMM model
model1Likelihood <- rollapply(outOfSampleData,50,align="right",na.pad=T,function(x) {forwardBackward(Model1Fit,x)$LLH})
model2Likelihood <- rollapply(outOfSampleData,50,align="right",na.pad=T,function(x) {forwardBackward(Model2Fit,x)$LLH})
model3Likelihood <- rollapply(outOfSampleData,50,align="right",na.pad=T,function(x) {forwardBackward(Model3Fit,x)$LLH})
plot(cumsum(outOfSampleData),main="Fake Market Data",ylab="Cumulative Returns",type="l")
plot(model1Likelihood,type="l",ylab="Log Likelihood of Each Model",main="Log Likelihood for each HMM Model")
plot(rep((actualModelSequence==3)*3,each=100),col="blue",type="o",ylim=c(0.8,3.1),ylab="Actual MODEL Number",main="Actual MODEL Sequence")
legend(x='topright', c('Model 1 - Bull Mkt','Model 2 - Bear Mkt','Model 3 - Side ways Mkt'), col=c("black","red","blue"), bty='n',lty=c(1,1,1))

Hidden Markov Models – Forward & Viterbi Algorithm Part 2 of 4

In the previous post the Hidden Markov Model was defined, however efficient algorithms are need to calculate some the probabilities / perform the marginalisation over hidden states. Two algorithms that can be used are the forward algorithm and the Viterbi algorithm.

The forward algorithm calculates the likelihood of the data given the model over all possible state sequences.

The Viterbi algorithm calculates the likelihood of the data given the model over the single most likely state sequence.

The forward algorithm

The forward algorithm allows for efficient calculation of the likelihood function p(\mathbf{O}|\lambda).

The forward variable is the likelihood of the HMM producing all the observations up to time t
and occupying state j at time t, it is defined as:

It is calculated recursively by calculating the forward variable for time t-1 being in state k and then calculating the probability of moving to state j at time t:

Where a_{kj} is the probability of a jump from state k to state j, and b_{j}(\mathbf{o_{t}}) is the probability of generating feature vector \mathbf{o_{t}} from state j.


0.1 Forward Algorithm Initialisation

\alpha_{1}(0)=1,\alpha_{j}(0)=0 for 1<j\leq N and \alpha_{1}(t)=0 for 1<t\leq T

0.2 Recursion

For t=1,2,...,T
…… For j=2,3,...,N-1

0.3 Termination


The Viterbi Algorithm

The forward algorithm calculated p(\mathbf{O}|\lambda) by summing over all state sequences, it is sometimes preferable to approximate p(\mathbf{O}|\lambda) which used all state sequences with \hat{p}(\mathbf{O}|\lambda) which will use the single most likely state sequence. This is known as the Viterbi algorithm, the algorithm finds the most likely state sequence.

\hat{p}(\mathbf{O}|\lambda)=max_{X}[p(\mathbf{O},X|\lambda)]\text{ Where \ensuremath{X} is the most likely state sequence}

The probability of the best partial path of length t through the HMM ended at state j is defined as: \phi_{j}(t)=max_{X^{(t-1)}}[p(\mathbf{o_{1},...,o_{t},}x(t)=j|\lambda)]. Where X^{(t-1)} is the best partial path / state sequence.

As with the forward variable \phi
can be calculated recursively \phi_{j}(t)=max_{i}[\phi_{i}(t-1)a_{ij}b_{j}(\mathbf{o_{t}})]

0.4 Viterbi Algorithm Initialisation

\phi_{1}(0)=1,\phi_{j}(0)=0 for 1<j<N and \phi_{1}(t)=0 for 1\leq t\leq T

0.5 Recursion

For t=1,2,...,T
…… For j=2,3,...,N-1
……………..\phi_{j}(t)=max_{1\leq k<N}[\phi_{k}(t-1)a_{kj}]b_{j}(\mathbf{o_{t}})
…………….. store the preceding node pred(j,t)=k

0.6 Termination

store the preceding node pred(N,T)=k

The most likely path is found by following the preceding node information backwards that is stored in pred(j,t)

Underflow Errors

The direct calculation of p(\mathbf{O}|\lambda) will most likely cause arithmetic underflow errors. The probabilities can become so small that the computer is unable to calculate them correctly. You should instead calculate the log likelihood e.g log(p(\mathbf{O}|\lambda))

Hidden Markov Models – Model Description Part 1 of 4

Hidden Markov Models

This post will develop a general framework for classification tasks using hidden markov models. The tutorial series will cover how to build and train a hidden markov models in R. Initially the maths will be explained, then an example in R provided and then an application on financial data will be explored.

General Pattern Recognition Framework

A set of features \mathbf{o} are derived from data set \mathbf{d} and a class \omega identified by finding the most likely class given the data \mathbf{o}


However P(\omega|\mathbf{o}) is unknown, so Bayes’ rule must be used.


Since the maximisation does not depend upon P(\mathbf{o}) we can ignore it. The terms P(\mathbf{o}|\omega) and P(\omega) , are the likelihood of the data given the class and prior probability of a class respective, both terms are defined by a model. The feature model P(\mathbf{o}|\omega) will be described by the hidden markov model (HMM), each class will have it’s own HMM.

The Task at Hand

First we need to generate a set of features \mathbf{o} from the raw data \mathbf{d}. I will skip this step for now because it is specific to the application of your hidden markov model, for example in finance \mathbf{d} may be various stock prices and \mathbf{o} could be a set of technical indicators / volatility calculations applied to the data \mathbf{d}. HMM’s are popular in speech recognition and typically \mathbf{o} is a vector describing the characteristics of the frequency spectrum of the speech.

Secondly the feature vector \mathbf{o} must then be assigned a class from the HMM. This is done the via maximum likelihood estimation, the HMM is a generative model, choose the class that is most likely to have generated the feature vector \mathbf{o}.
For finance the class might be a market regime (trending/mean reverting) or in speech recognition the class is a word.

Example HMM Specification

hidden markov model

N The number of states in the HMM

a_{ij} The probability of transitioning from state i to state j

b_{j}(\mathbf{o}) The probability of generating feature vector \mathbf{o} upon entering state j (provided j is not the entry or exit state)

The HMM \lambda may be written as \lambda=[N,a_{ij},b_{j}]

\mathbf{O}=[\mathbf{o_{1},}\mathbf{o_{2},o_{T}]} the observed feature vectors

X=[x_{1},x_{2},x_{T}] the specified state sequence

The joint probability is the probability of jumping from one state to the next multiplied by the prob of generating the feature vector in that state:


Where x_{0} is always the entry state 1, and x_{T+1} is always the exit state N.

Likelihood Calculation

In the above joint probability calculation we have assumed a state sequence X. However this is a latent variable, we do not know it, it is hidden (hence the name HIDDEN markov model)! However if we sum over all possible state sequences we can marginalise it out.


This can be problematic due to the number of possible state sequences (especially in a real-time application), luckily algorithms exist to effectively perform the calculation without needing to explore every state sequence. One such algorithm is the forward algorithm.

What is b_{j}(\mathbf{o})?

This is the output distribution for a given state j. The distribution can be anything you like however it should hopefully match the distribution of the data at state j, and it must be mathematically tractable. The most natural choice at this stage is to assume \mathbf{o} can be described by the multivariate Gaussian. As a word of caution if the elements of your feature vector are highly correlated then \Sigma, the covariance matrix, has a lot of parameters to measure. See if you can collapse \Sigma
to a diagonal matrix.

E.g b_{j}(\mathbf{o})\sim N(\mathbf{o};\mu_{j},\Sigma_{j})

How to train b_{j}(\mathbf{o}) / Viterbi Parameter Estimation

We already know how to fit a normal distribution, the MLE for \mu is the mean, and \Sigma the covariance of the feature vector. However we must only calculate the mean and covariance on feature vectors that came from state j, this is known as Viterbi Segmentation. Viterbi Segmentation means there is a hard assignment between feature vector and the state that generated it, an alternative method is called Balm-Welch which probabilistically assigns feature vectors to multiple states.

State j generated observations starting at t_{j}



It is not known in advance which state generated which observation vector, fortunately there is an algorithm called the Viterbi algorithm to approximately solve this problem.

hmm training outline

The forward algorithm for efficient calculation of p(\mathbf{O}|\lambda) and the Viterbi algorithm will be explored in my next post.