The first part of this tutorial can be found here.

**Tracking Gene History through Innovation Numbers**

Part 1 introduced two mutations, link mutate and node mutate, which both add new genes to the genome. Each time a new gene is created (through a topological innovation) a global innovation number is incremented and assigned to that gene.

The global innovation number tracks the historical origin of each gene: if two genes have the same innovation number then they must represent the same topology (although the weights may differ). This is exploited during gene crossover.

**Genome Crossover (Mating)**

Genome crossover takes two parent genomes (let's call them A and B) and creates a new genome (let's call it the child), taking the strongest genes from A and B and copying any topological structures along the way.

During the crossover, genes from both genomes are lined up using their innovation numbers. For each innovation number the gene from the fitter parent is selected and inserted into the child genome. If both parent genomes have the same fitness then the gene is randomly selected from either parent with equal probability. If an innovation number is present in only one parent then the gene is known as a disjoint or excess gene and represents a topological innovation; it too is inserted into the child.
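The matching logic above can be sketched in R. This is a toy illustration only: the list-of-genes representation and the function name are my own assumptions, not taken from a particular NEAT implementation.

```r
# Toy sketch of NEAT crossover: genomes are lists of connection genes keyed by
# innovation number (representation is illustrative, not from the post's code).
crossoverFunc <- function(genesA, genesB, fitnessA, fitnessB) {
  allInnovations <- union(names(genesA), names(genesB))
  child <- list()
  for (innov in allInnovations) {
    inA <- innov %in% names(genesA)
    inB <- innov %in% names(genesB)
    if (inA && inB) {
      # Matching gene: take it from the fitter parent (random pick on a fitness tie)
      if (fitnessA > fitnessB)      child[[innov]] <- genesA[[innov]]
      else if (fitnessB > fitnessA) child[[innov]] <- genesB[[innov]]
      else child[[innov]] <- if (runif(1) < 0.5) genesA[[innov]] else genesB[[innov]]
    } else if (inA) {
      child[[innov]] <- genesA[[innov]]   # disjoint / excess gene from A
    } else {
      child[[innov]] <- genesB[[innov]]   # disjoint / excess gene from B
    }
  }
  child
}
```

Note that the post inserts disjoint/excess genes from either parent; the original NEAT paper takes them only from the fitter parent, so this sketch follows the post's description.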

The image below shows the crossover process for two genomes of the same fitness.

**Speciation**

Speciation takes all the genomes in a given genome pool and attempts to split them into distinct groups known as species. The genomes in each species will have similar characteristics.

A way of measuring the similarity between two genomes is required: if two genomes are "similar" they belong to the same species. A natural measure to use is a weighted sum of the number of disjoint and excess genes (representing topological differences) and the differences in weights between matching genes. If the weighted sum is below some threshold then the genomes are of the same species.
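A minimal sketch of such a similarity test, assuming illustrative coefficients (the constants and threshold below are arbitrary choices, and the NEAT paper additionally normalises the disjoint/excess count by genome size, which is omitted here):

```r
# Sketch of a species compatibility test: weighted sum of the non-matching gene
# count and the mean weight difference of matching genes (constants are assumptions).
sameSpeciesFunc <- function(genesA, genesB, c1 = 1.0, c2 = 0.4, threshold = 3.0) {
  matching    <- intersect(names(genesA), names(genesB))
  nonMatching <- length(union(names(genesA), names(genesB))) - length(matching)
  avgWeightDiff <- if (length(matching) > 0) {
    mean(sapply(matching, function(i) abs(genesA[[i]]$w - genesB[[i]]$w)))
  } else 0
  distance <- c1 * nonMatching + c2 * avgWeightDiff
  distance < threshold
}
```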

The advantage of splitting the genomes into species appears during the genetic evolution step, where genomes with low fitness are culled (removed entirely from the genome pool): **rather than having each genome fight for its place against every other genome in the entire genome pool, we can make it fight for its place against genomes of the same species**. This way a species that forms from a new topological innovation, and might not have a high fitness yet because its weights have not been optimised, will survive the culling.

**Summary of whole process**

- Create a genome pool with n random genomes
- Take each genome and apply to problem / simulation and calculate the genome fitness
- Assign each genome to a species
- In each species cull the genomes removing some of the weaker genomes
- Breed each species (randomly select genomes in the species to either crossover or mutate)
- Repeat all of the above


This algorithm recently went viral in a video called MarI/O, in which a network was developed that was capable of completing the first level of Super Mario; see the video below.

Typically when one chooses to use a neural network they have to decide how many hidden layers there are, the number of neurons in each layer and what connections exist between the neurons. Depending on the nature of the problem it can be very difficult to know what is a sensible topology. Once the topology is chosen it will most likely be trained using back-propagation or a genetic evolution approach and tested. The genetic evolution approach is essentially searching through the space of connection weights and selecting high performing networks and breeding them (this is known as fixed-topology evolution).

The above approach finds optimal connection weights, it’s then down to an “expert” to manually tweak the topology of the network in an attempt to iteratively find better performing networks.

This led to the development of variable-topology training, where both the connection space and structure space are explored. With this came a host of problems such as networks becoming incredibly bushy and complex slowing down the machine learning process. With the genetic approaches it was difficult to track genetic mutations and crossover structure in a meaningful way.

The NEAT algorithm aims to develop a genetic algorithm that searches through neural network weight and structure space and has the following properties:

- Have a genetic representation that allows structure to be crossed over in a meaningful way
- Protect topological innovations that need a few generations to be optimised, so that they don't disappear from the gene pool prematurely
- Minimise topologies throughout training without specially contrived network complexity penalisation functions

A thorough treatment of the algorithm can be found in the paper **Evolving Neural Networks through Augmenting Topologies** by Kenneth O. Stanley and Risto Miikkulainen (http://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf).

The information about the network is represented by a genome, the genome contains node genes and connection genes. The node genes define nodes in the network, the nodes can be inputs (such as a technical indicator), outputs (such as a buy / sell recommendation), or hidden (used by the network for a calculation). The connection genes join nodes in the network together and have a weight attached to them.

Connection genes have an input node, an output node, a weight, an enabled/disabled flag and an innovation number. The innovation number is used to track the history of a gene's evolution and will be explained in more detail in part two.
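One possible way to lay these genes out in R (the field names are my own assumptions, chosen only to illustrate the structure described above):

```r
# Illustrative constructors for the two gene types described in the text.
newConnectionGene <- function(input, output, weight, enabled = TRUE, innovation) {
  list(input = input, output = output, weight = weight,
       enabled = enabled, innovation = innovation)
}
newNodeGene <- function(id, type = c("input", "hidden", "output")) {
  list(id = id, type = match.arg(type))
}

# A tiny genome: one input node wired straight to one output node.
genome <- list(
  nodes = list(newNodeGene(1, "input"), newNodeGene(2, "output")),
  connections = list(newConnectionGene(1, 2, weight = 0.7, innovation = 1))
)
```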

This post will look at some of the mutations that can happen to the network. It is worth noting that each genome has embedded inside it a mutation rate for each type of mutation that can occur; these mutation rates are also randomly increased or decreased as the evolution progresses.

**Point Mutate**

Randomly updates the weight of a randomly selected connection gene

The updates are either:

New Weight = Old Weight +/- Random number between 0 and genome$MutationRate[["Step"]]

or

New Weight = Random number between -2 and 2
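The two update rules can be sketched as below. The 90/10 split between perturbing and re-drawing the weight is an assumption for illustration; the post does not state how the choice between the two rules is made.

```r
# Sketch of point mutate: either nudge the existing weight by up to +/- step,
# or (assumed 10% of the time) replace it with a fresh weight in [-2, 2].
pointMutateFunc <- function(weight, step) {
  if (runif(1) < 0.9) {
    weight + runif(1, -step, step)   # perturb the existing weight
  } else {
    runif(1, -2, 2)                  # replace with a new random weight
  }
}
```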

**Link Mutate**

Randomly adds a new connection to the network with a random weight between -2 and 2

**Node Mutate**

This mutation adds a new node to the network by disabling an existing connection and replacing it with a connection of weight 1, a new node, and a connection with the same weight as the disabled connection. In essence the original connection has been replaced with an identically functioning equivalent.
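Using the same illustrative gene layout as earlier, node mutate can be sketched as follows (the innovation-number bookkeeping is simplified, and the function name is my own):

```r
# Sketch of node mutate: disable a connection, add a hidden node, then bridge it
# with a weight-1 connection in and a copy of the old weight out, so the network
# initially computes the same function as before the mutation.
nodeMutateFunc <- function(genome, connIndex, newNodeId, nextInnovation) {
  old <- genome$connections[[connIndex]]
  genome$connections[[connIndex]]$enabled <- FALSE
  genome$nodes[[length(genome$nodes) + 1]] <- list(id = newNodeId, type = "hidden")
  genome$connections[[length(genome$connections) + 1]] <-
    list(input = old$input, output = newNodeId, weight = 1,
         enabled = TRUE, innovation = nextInnovation)
  genome$connections[[length(genome$connections) + 1]] <-
    list(input = newNodeId, output = old$output, weight = old$weight,
         enabled = TRUE, innovation = nextInnovation + 1)
  genome
}
```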

**Enable Disable Mutate**

Randomly enables and disables connections


A video of the simulation can be found below.

The derivation of the maths follows the approach outlined in the following video; however, I have decided to also model the friction between the cart and the track.

Resolve the forces on the free body diagrams and set them equal to the corresponding accelerations

The acceleration of the pendulum is the acceleration of the cart plus the acceleration of the pendulum relative to the cart

Convert the co-ordinate system back into the x and y components

Substitute the accelerations into equations (1) and (2)

It is undesirable to have an unknown tension T, so eliminate it using a trick.

Substitute equation (1) into equation (0)

Rearranging equations (6) and (5) gives the system equations in known, measurable variables

Both acceleration terms, the cart acceleration and the pole's angular acceleration, depend on each other, which is undesirable; substitute one equation into the other to remove the dependency.

The system can then be simulated using Euler update equations:

On to the code:

```r
library("animation") #Library to save GIFs

#Function to create a blank canvas / scene for drawing objects onto later
createSceneFunc <- function(bottomLeftX, bottomLeftY, width, height, main="", xlab="", ylab="",
                            ann=T, xaxt=NULL, yaxt=NULL, xlim=NULL, ylim=NULL){
  plot(c(bottomLeftX, width), c(bottomLeftY, height), type="n", ann=ann, xaxt=xaxt, yaxt=yaxt,
       xlim=xlim, ylim=ylim, main=main, xlab=xlab, ylab=ylab)
}

#Function to draw a box on the scene
createBoxFunc <- function(topLeftX, topLeftY, width, height, fillColour=NA, borderColour="black"){
  polygon(c(topLeftX, topLeftX+width, topLeftX+width, topLeftX),
          c(topLeftY, topLeftY, topLeftY-height, topLeftY-height),
          col=fillColour, border=borderColour)
}

#Function to draw a circle on the scene
createCircleFunc <- function(centerX, centerY, radius, fillColour=NA, borderColour="black"){
  symbols(centerX, centerY, circles=radius, inches=F, add=T, fg=borderColour, bg=fillColour)
}

drawPoleFunc <- function(fixedEnd.x, fixedEnd.y, poleLength, theta, fillColour=NA, borderColour="black"){
  floatingEnd.x <- fixedEnd.x + poleLength*sin(theta)
  floatingEnd.y <- fixedEnd.y + poleLength*cos(theta)
  polygon(c(fixedEnd.x, floatingEnd.x, floatingEnd.x, fixedEnd.x),
          c(fixedEnd.y, floatingEnd.y, floatingEnd.y, fixedEnd.y),
          col=fillColour, border=borderColour)
}

drawPendulum <- function(fixedEnd.x, fixedEnd.y, poleLength, theta, radius, fillColour=NA, borderColour="black"){
  floatingEnd.x <- fixedEnd.x + poleLength*sin(theta)
  floatingEnd.y <- fixedEnd.y + poleLength*cos(theta)
  createCircleFunc(floatingEnd.x, floatingEnd.y, radius, fillColour, borderColour)
}

#Parameters to control the simulation
simulation.timestep = 0.02
simulation.gravity = 9.8 #meters per second^2
simulation.numoftimesteps = 2000

pole.length = 1 #meters, total pole length
pole.width = 0.2
pole.theta = 1*pi/4
pole.thetaDot = 0
pole.thetaDotDot = 0
pole.colour = "purple"

pendulum.centerX = NA
pendulum.centerY = NA
pendulum.radius = 0.1
pendulum.mass = 1
pendulum.colour = "purple"

cart.width = 0.5
cart.centerX = 0
cart.centerY = 0
cart.height = 0.2
cart.colour = "red"
cart.centerXDot = 0
cart.centerXDotDot = 0
cart.mass = 1
cart.force = 0
cart.mu = 2

track.limit = 2.4 #meters from center
track.x = -track.limit
track.height = 0.01
track.y = 0.5*track.height
track.colour = "blue"

leftBuffer.width = 0.1
leftBuffer.height = 0.2
leftBuffer.x = -track.limit - 0.5*cart.width - leftBuffer.width
leftBuffer.y = 0.5*leftBuffer.height
leftBuffer.colour = "blue"

rightBuffer.width = 0.1
rightBuffer.height = 0.2
rightBuffer.x = track.limit + 0.5*cart.width
rightBuffer.y = 0.5*rightBuffer.height
rightBuffer.colour = "blue"

#Define the size of the scene (used to visualise what is happening in the simulation)
scene.width = 2*max(rightBuffer.x + rightBuffer.width, track.limit + pole.length + pendulum.radius)
scene.bottomLeftX = -0.5*scene.width
scene.height = max(pole.length + pendulum.radius, scene.width)
scene.bottomLeftY = -0.5*scene.height

#Some variables to store various time series values of the simulation
logger.trackposition = rep(NA, simulation.numoftimesteps)
logger.force = rep(NA, simulation.numoftimesteps)
logger.cartvelocity = rep(NA, simulation.numoftimesteps)
logger.poletheta = rep(NA, simulation.numoftimesteps)

#Some settings to control the charts used to plot the logged variables
plotcontrol.trackposition.ylim = c(0, 10)
plotcontrol.force.ylim = c(-6, 6)
plotcontrol.yacceleration.ylim = c(-simulation.gravity, 400)
plotcontrol.poletheta.ylim = c(0, 360)

runSimulationFunc <- function(){
  simulationAborted = FALSE
  #Main simulation loop
  for(i in seq(1, simulation.numoftimesteps)){
    costheta = cos(pole.theta)
    sintheta = sin(pole.theta)
    totalmass = cart.mass + pendulum.mass
    masslength = pendulum.mass*pole.length
    pole.thetaDotDot = (simulation.gravity*totalmass*sintheta + costheta*(cart.force - masslength*pole.thetaDot^2*sintheta - cart.mu*cart.centerXDot)) / (pole.length*(totalmass - pendulum.mass*costheta^2))
    cart.centerXDotDot = (cart.force + masslength*(pole.thetaDotDot*costheta - pole.thetaDot^2*sintheta) - cart.mu*cart.centerXDot) / totalmass

    cart.centerX = cart.centerX + simulation.timestep*cart.centerXDot
    cart.centerXDot = cart.centerXDot + simulation.timestep*cart.centerXDotDot
    pole.theta = pole.theta + simulation.timestep*pole.thetaDot
    pole.thetaDot = pole.thetaDot + simulation.timestep*pole.thetaDotDot

    if(cart.centerX <= track.x | cart.centerX >= (track.x + 2*track.limit)){
      cart.colour = "black"
      simulationAborted = TRUE
    }

    #Log the results of the simulation
    logger.trackposition[i] <- cart.centerX
    logger.force[i] <- cart.force
    logger.cartvelocity[i] <- cart.centerXDot
    logger.poletheta[i] <- pole.theta

    #Plot the simulation
    #The layout command arranges the charts
    layout(matrix(c(1,2,1,3,1,4,1,5,6,6), 5, 2, byrow=TRUE), heights=c(2,2,2,2,1))
    par(mar=c(3,4,2,2) + 0.1)

    #Create the scene and draw the various objects
    createSceneFunc(scene.bottomLeftX, scene.bottomLeftY, scene.width, scene.height,
                    main="Simulation of Inverted Pendulum - www.gekkoquant.com", xlab="", ylab="",
                    xlim=c(-0.5*scene.width, 0.5*scene.width), ylim=c(-0.5*scene.height, 0.5*scene.height))
    createBoxFunc(track.x, track.y, track.limit*2, track.height, track.colour)
    createBoxFunc(leftBuffer.x, leftBuffer.y, leftBuffer.width, leftBuffer.height, leftBuffer.colour)
    createBoxFunc(rightBuffer.x, rightBuffer.y, rightBuffer.width, rightBuffer.height, rightBuffer.colour)
    createBoxFunc(cart.centerX - 0.5*cart.width, cart.centerY + 0.5*cart.height, cart.width, cart.height, cart.colour)
    drawPoleFunc(cart.centerX, cart.centerY, 2*pole.length, pole.theta, pole.colour)
    drawPendulum(cart.centerX, cart.centerY, 2*pole.length, pole.theta, pendulum.radius, pendulum.colour)

    #Plot the logged variables
    plot(logger.trackposition, type="l", ylab="Cart Position")
    plot(logger.force, type="l", ylab="Cart FORCE")
    plot(logger.cartvelocity, type="l", ylab="Cart VELOCITY")
    plot(logger.poletheta*360/(2*pi), type="l", ylab="Pole THETA")

    #Plot a progress bar
    par(mar=c(2,1,1,1))
    plot(-5, xlim=c(1, simulation.numoftimesteps), ylim=c(0, .3), yaxt="n",
         xlab="", ylab="", main="Iteration")
    abline(v=i, lwd=5, col=rgb(0, 0, 255, 255, maxColorValue=255))

    if(simulationAborted){
      break
    }
  }
}

#runSimulationFunc()
oopt = ani.options(ani.width=1200, ani.height=800, other.opts="-pix_fmt yuv420p -b 600k")
saveVideo(runSimulationFunc(), interval=simulation.timestep, ani.options=oopt, video.name="inverted-pendulum.mp4")
ani.options(oopt)
```

A video of the simulation is below:

**Creating Animations**

The R package "animation" has been used to create videos of the simulated process. This package requires that FFmpeg is installed on your machine and added to your environment PATH. To learn how to add items to your path, follow this tutorial at Geeks With Blogs.

The code below demonstrates how to generate a video:

```r
oopt = ani.options(ani.width = 1200, ani.height = 800, other.opts = "-pix_fmt yuv420p -b 600k")
saveVideo(runSimulationFunc(), interval = simulation.timestep, ani.options = oopt, video.name = "bounce.mp4")
ani.options(oopt)
```

- ani.width is the width of the video
- ani.height is the height of the video
- other.opts are command line arguments that are passed to ffmpeg and can be used to control the bitrate and other quality settings
- interval specifies in seconds how long to wait between frames
- runSimulationFunc() is a function that should run your simulation, and charts plotted during the simulation will be added to the video

**Drawing Graphics**

I have written some functions to make drawing basic graphics easy.

- createSceneFunc(bottomLeftX, bottomLeftY, width, height) creates a brand new scene to draw objects on; bottomLeftX and bottomLeftY are Cartesian co-ordinates specifying the bottom left corner of the canvas, and the width and height variables specify the canvas dimensions.
- createBoxFunc(topLeftX, topLeftY, width, height, fillColour) draws a box on the current canvas; topLeftX and topLeftY specify the Cartesian co-ordinate of the top left of the box, width and height specify the dimensions, and fillColour specifies the colour that fills in the box.
- createCircleFunc(centerX, centerY, radius, fillColour) draws a circle on the current canvas; centerX and centerY specify the Cartesian co-ordinate of the center of the circle, radius specifies the radius of the circle, and fillColour specifies the colour that fills in the circle.

**Simulation Dynamics**

The following single period update equations are used:

*When a collision is made between the ball and the platform the following update is used:*
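Since the update equations themselves were shown as images, here is a compact single-step restatement of the Euler update and the restitution bounce, isolated from the plotting code (constants match the full simulation below; the function name is my own):

```r
# One timestep of the bouncing ball: Euler-integrate position and velocity, then,
# if the ball has penetrated the floor, reflect the velocity scaled by the
# coefficient of restitution and clamp the ball back onto the surface.
stepBallFunc <- function(y, v, dt, g = 1.8, floorY = 0, radius = 0.25, e = 0.85) {
  y <- y + v * dt          # position update
  v <- v - g * dt          # velocity update under gravity
  if (y - radius <= floorY) {
    v <- -v * e            # bounce: reverse and damp the velocity
    y <- floorY + radius   # resolve the penetration
  }
  list(y = y, v = v)
}
```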

**Onto the code:**

```r
library("animation") #Library to save GIFs

#Function to create a blank canvas / scene for drawing objects onto later
createSceneFunc <- function(bottomLeftX, bottomLeftY, width, height, main="", xlab="", ylab="",
                            ann=T, xaxt=NULL, yaxt=NULL){
  plot(c(bottomLeftX, width), c(bottomLeftY, height), type="n", ann=ann, xaxt=xaxt, yaxt=yaxt,
       main=main, xlab=xlab, ylab=ylab)
}

#Function to draw a box on the scene
createBoxFunc <- function(topLeftX, topLeftY, width, height, fillColour=NA, borderColour="black"){
  polygon(c(topLeftX, topLeftX+width, topLeftX+width, topLeftX),
          c(topLeftY, topLeftY, topLeftY-height, topLeftY-height),
          col=fillColour, border=borderColour)
}

#Function to draw a circle on the scene
createCircleFunc <- function(centerX, centerY, radius, fillColour=NA, borderColour="black"){
  symbols(centerX, centerY, circles=radius, inches=F, add=T, fg=borderColour, bg=fillColour)
}

#Parameters to control the simulation
simulation.timestep = 0.02
simulation.gravity = 1.8
simulation.numoftimesteps = 2000

#Define the size of the scene (used to visualise what is happening in the simulation)
scene.bottomLeftX = 0
scene.bottomLeftY = -1
scene.width = 10
scene.height = 10

#This is the object the bouncing ball is going to hit
platform.x = scene.bottomLeftX + 1
platform.y = 0
platform.width = scene.width - 2
platform.height = 0.5
platform.colour = "red"

#This is just a box resting on the left end of the platform to practise drawing things
leftbluebox.x = platform.x
leftbluebox.y = platform.y + 0.5
leftbluebox.width = 0.5
leftbluebox.height = 0.5
leftbluebox.colour = "blue"

#This is just a box resting on the right end of the platform to practise drawing things
rightbluebox.x = platform.x + platform.width - 0.5
rightbluebox.y = platform.y + 0.5
rightbluebox.width = 0.5
rightbluebox.height = 0.5
rightbluebox.colour = "blue"

#This is the ball that is going to be bouncing
ball.y = 10
ball.x = 5
ball.radius = 0.25
ball.yvelocity = 0
ball.yacceleration = 0
ball.coefficientofrestitution = 0.85 #This is a physics term to describe how much velocity as pct is kept after a bounce
ball.colour = "purple"

#Some variables to store various time series values of the simulation
logger.yposition = rep(NA, simulation.numoftimesteps)
logger.yvelocity = rep(NA, simulation.numoftimesteps)
logger.yacceleration = rep(NA, simulation.numoftimesteps)

#Some settings to control the charts used to plot the logged variables
plotcontrol.yposition.ylim = c(0, 10)
plotcontrol.yvelocity.ylim = c(-6, 6)
plotcontrol.yacceleration.ylim = c(-simulation.gravity, 400)

runSimulationFunc <- function(){
  #Main simulation loop
  for(i in seq(1, simulation.numoftimesteps)){
    #Equations of motion
    ball.yacceleration = -simulation.gravity
    ball.y = ball.y + ball.yvelocity*simulation.timestep
    ball.yvelocity = ball.yvelocity + ball.yacceleration*simulation.timestep

    #Logic to check if there has been a collision between the ball and the platform
    if(ball.y - ball.radius <= platform.y){
      #There has been a collision
      newyvelocity = -ball.yvelocity*ball.coefficientofrestitution
      ball.yacceleration = (newyvelocity - ball.yvelocity)/simulation.timestep
      ball.yvelocity = newyvelocity
      ball.y = ball.radius + platform.y
    }

    #Log the results of the simulation
    logger.yposition[i] <- ball.y
    logger.yvelocity[i] <- ball.yvelocity
    logger.yacceleration[i] <- ball.yacceleration

    #Plot the simulation
    #The layout command arranges the charts
    layout(matrix(c(1,2,1,3,1,4,5,5), 4, 2, byrow=TRUE), heights=c(3,3,3,1))
    par(mar=c(3,4,2,2) + 0.1)

    #Create the scene and draw the various objects
    createSceneFunc(scene.bottomLeftX, scene.bottomLeftY, scene.width, scene.height,
                    main="Simulation of Bouncing Ball - www.gekkoquant.com", xlab="", ylab="Ball Height", xaxt="n")
    #Draw the platform the ball lands on
    createBoxFunc(platform.x, platform.y, platform.width, platform.height, platform.colour)
    #Draw a box on the left of the platform
    createBoxFunc(leftbluebox.x, leftbluebox.y, leftbluebox.width, leftbluebox.height, leftbluebox.colour)
    #Draw a box on the right of the platform
    createBoxFunc(rightbluebox.x, rightbluebox.y, rightbluebox.width, rightbluebox.height, rightbluebox.colour)
    #Draw the ball
    createCircleFunc(ball.x, ball.y, ball.radius, ball.colour, borderColour=NA)

    #Plot the logged variables
    plot(logger.yposition, type="l", ylim=plotcontrol.yposition.ylim, ylab="Y POSITION")
    plot(logger.yvelocity, type="l", ylim=plotcontrol.yvelocity.ylim, ylab="Y VELOCITY")
    plot(logger.yacceleration, type="l", ylim=plotcontrol.yacceleration.ylim, ylab="Y ACCELERATION")

    #Plot a progress bar
    par(mar=c(2,1,1,1))
    plot(-5, xlim=c(1, simulation.numoftimesteps), ylim=c(0, .3), yaxt="n",
         xlab="", ylab="", main="Iteration")
    abline(v=i, lwd=5, col=rgb(0, 0, 255, 255, maxColorValue=255))
  }
}

oopt = ani.options(ani.width=1200, ani.height=800, other.opts="-pix_fmt yuv420p -b 600k")
saveVideo(runSimulationFunc(), interval=simulation.timestep, ani.options=oopt, video.name="bounce.mp4")
ani.options(oopt)
```

Part 3 of this series demonstrated how to train an HMM on a toy model; this post will focus on how to go about modelling real-life data. A trend following strategy will be developed for trading the S&P 500.

In most machine learning classification problems you need a set of training data with class labels. Whilst we have the market data, we don't have class labels; the first task is therefore to generate class labels for the training data.

We want to develop a trend following strategy, so we need to select parts of the S&P 500 time series and label them as either in an up trend or a down trend (we could also have a no-trend label). In theory you could do this by hand, however this would not be feasible over the whole universe of stocks. Instead we can write a program to automatically classify the data for us; how you classify data as being in a trend depends upon your definition of a trend.

The labels can be seen in the chart below: any price with green underneath has been labelled as a long signal, and any price with red above it has the short label.

In the code provided I check whether the current price is the lowest price over the next 10 periods (variable NDayLookforwardLowHigh); if it is, go long, and if it is instead the highest over the next 10 periods, close the long. The reverse is done for shorts. It's a little crude and I imagine there are significantly better ways to classify trends; it might be useful to put a volatility constraint on the trends (which should hopefully filter through and improve the Sharpe ratio).

In addition to labelling the data, a set of feature vectors must be generated; the features should contain variables that are beneficial for detecting trends. The feature vector I used has the ratios of open to close price, open to high price, open to low price and all the intermediate combinations. Often it is desirable to model the dynamics of these variables and put the one-period change in these variables inside the feature vector.
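As a sketch with plain OHLC vectors (the full code later in the post builds these same columns with quantmod's Op/Hi/Lo/Cl accessors on an xts series; the function name here is my own):

```r
# Build the ratio feature matrix described above from plain open/high/low/close
# vectors: a one-period return plus the pairwise OHLC ratios.
buildFeaturesFunc <- function(op, hi, lo, cl) {
  ret <- c(NA, diff(cl) / head(cl, -1))   # one-period arithmetic return
  cbind(ret, hi / lo, hi / op, hi / cl, op / cl, lo / cl, lo / op)
}
```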

The above image shows the likelihood of each market regime given the HMM trained on the same data set. It is reassuring to see that the Long Regime became very unlikely during the crash of 2008.

One of the excellent properties of HMMs is that they allow the modelling of situations that have different durations but are of the same class. For example, one trend might last for 10 days and another for 35 days; we can pass both of these examples to the HMM and it will try to model the duration difference using its internal state transition probabilities.

**Out of sample results:**

**Sharpe Ratio of 0.857**

**Model Fit:**

**Long Model**

```
HMMFit(obs = inSampleLongFeaturesList, nStates = 3)

Model:
------
3 states HMM with 7-d gaussian distribution

Baum-Welch algorithm status:
----------------------------
Number of iterations : 21
Last relative variation of LLH function: 0.000001

Estimation:
-----------
Initial probabilities:
        Pi 1      Pi 2      Pi 3
1.690213e-47 0.3734194 0.6265806

Transition matrix:
          State 1   State 2    State 3
State 1 0.4126480 0.3419075 0.24544450
State 2 0.1116068 0.8352273 0.05316591
State 3 0.5475525 0.2303324 0.22211504

Conditionnal distribution parameters:

Distribution parameters:
State 1
       mean  cov matrix
 0.01565943  1.922717e-04  1.724953e-04  1.785035e-04 -7.870798e-06 -1.764319e-04 -1.687845e-04  6.898374e-06
 1.02210441  1.724953e-04  1.920546e-04  1.736241e-04  2.327852e-06 -1.615346e-04 -1.768012e-04 -1.651055e-05
 1.01805768  1.785035e-04  1.736241e-04  1.777602e-04  2.663971e-06 -1.653924e-04 -1.595871e-04  5.067094e-06
 1.00264545 -7.870798e-06  2.327852e-06  2.663971e-06  1.095711e-05  8.195588e-06  8.473222e-06  3.178401e-07
 0.98502360 -1.764319e-04 -1.615346e-04 -1.653924e-04  8.195588e-06  1.644647e-04  1.589815e-04 -4.749521e-06
 0.98113485 -1.687845e-04 -1.768012e-04 -1.595871e-04  8.473222e-06  1.589815e-04  1.732240e-04  1.542815e-05
 0.99605695  6.898374e-06 -1.651055e-05  5.067094e-06  3.178401e-07 -4.749521e-06  1.542815e-05  2.064964e-05

State 2
        mean  cov matrix
 0.001670502  3.103878e-05  3.555352e-06  1.781044e-05 -1.336108e-05 -3.092612e-05 -1.670114e-05  1.410578e-05
 1.009361131  3.555352e-06  1.249497e-05  9.181451e-06  5.644685e-06 -3.464184e-06 -6.714792e-06 -3.238512e-06
 1.005638565  1.781044e-05  9.181451e-06  1.714606e-05 -7.256446e-07 -1.770669e-05 -9.748050e-06  7.905439e-06
 1.003957940 -1.336108e-05  5.644685e-06 -7.256446e-07  1.271107e-05  1.335557e-05  7.009564e-06 -6.279198e-06
 0.998346405 -3.092612e-05 -3.464184e-06 -1.770669e-05  1.335557e-05  3.081873e-05  1.660615e-05 -1.409313e-05
 0.994653572 -1.670114e-05 -6.714792e-06 -9.748050e-06  7.009564e-06  1.660615e-05  1.353501e-05 -3.017033e-06
 0.996315167  1.410578e-05 -3.238512e-06  7.905439e-06 -6.279198e-06 -1.409313e-05 -3.017033e-06  1.101157e-05

State 3
       mean  cov matrix
-0.01296153  1.481273e-04 -8.848326e-05  2.231101e-05 -1.286390e-04 -1.503760e-04 -3.945685e-05  1.032046e-04
 1.02416584 -8.848326e-05  1.349494e-04  2.353928e-05  1.150009e-04  9.081677e-05 -1.853468e-05 -1.027771e-04
 1.00458706  2.231101e-05  2.353928e-05  3.596043e-05  1.384302e-05 -2.234068e-05 -9.215909e-06  1.259424e-05
 1.01746162 -1.286390e-04  1.150009e-04  1.384302e-05  1.485395e-04  1.338477e-04  3.324363e-05 -9.313984e-05
 1.01283801 -1.503760e-04  9.081677e-05 -2.234068e-05  1.338477e-04  1.555764e-04  4.225851e-05 -1.053565e-04
 0.99347206 -3.945685e-05 -1.853468e-05 -9.215909e-06  3.324363e-05  4.225851e-05  5.013390e-05  8.770049e-06
 0.98098355  1.032046e-04 -1.027771e-04  1.259424e-05 -9.313984e-05 -1.053565e-04  8.770049e-06  1.075952e-04

Log-likelihood: 64226.28
BIC criterium: -127633.2
AIC criterium: -128226.6
```

**Short Model**

```
HMMFit(obs = inSampleShortFeaturesList, nStates = 3)

Model:
------
3 states HMM with 7-d gaussian distribution

Baum-Welch algorithm status:
----------------------------
Number of iterations : 20
Last relative variation of LLH function: 0.000001

Estimation:
-----------
Initial probabilities:
        Pi 1      Pi 2      Pi 3
3.784166e-15 0.1476967 0.8523033

Transition matrix:
          State 1    State 2   State 3
State 1 0.4760408 0.12434214 0.3996170
State 2 0.3068272 0.54976794 0.1434049
State 3 0.5243733 0.06315371 0.4124730

Conditionnal distribution parameters:

Distribution parameters:
State 1
        mean  cov matrix
-0.009827328  5.102050e-05 -2.137629e-05  1.047989e-05 -3.999886e-05 -5.043558e-05 -1.834830e-05  3.090145e-05
 1.016726031 -2.137629e-05  2.915402e-05  4.001168e-06  2.551672e-05  2.145395e-05 -3.484775e-06 -2.426317e-05
 1.002951135  1.047989e-05  4.001168e-06  1.070077e-05  4.343448e-07 -1.032399e-05 -3.489498e-06  6.607115e-06
 1.012808350 -3.999886e-05  2.551672e-05  4.343448e-07  4.073945e-05  4.019962e-05  1.502427e-05 -2.422162e-05
 1.009838504 -5.043558e-05  2.145395e-05 -1.032399e-05  4.019962e-05  5.047663e-05  1.847048e-05 -3.082551e-05
 0.996150195 -1.834830e-05 -3.484775e-06 -3.489498e-06  1.502427e-05  1.847048e-05  1.816514e-05 -3.045290e-08
 0.986475577  3.090145e-05 -2.426317e-05  6.607115e-06 -2.422162e-05 -3.082551e-05 -3.045290e-08  2.992073e-05

State 2
        mean  cov matrix
-0.005393441  0.0008501205 -1.231927e-04  3.413652e-04 -0.0004927836 -0.0008165848 -3.496732e-04  4.379093e-04
 1.037824136 -0.0001231927  3.546602e-04  7.820615e-05  0.0002160609  0.0001422901 -1.206772e-04 -2.508658e-04
 1.013889133  0.0003413652  7.820615e-05  2.198099e-04 -0.0001068837 -0.0003166324 -1.713360e-04  1.374691e-04
 1.019557602 -0.0004927836  2.160609e-04 -1.068837e-04  0.0003949026  0.0004952245  1.733093e-04 -2.998924e-04
 1.005903113 -0.0008165848  1.422901e-04 -3.166324e-04  0.0004952245  0.0007960268  3.345499e-04 -4.319633e-04
 0.982515481 -0.0003496732 -1.206772e-04 -1.713360e-04  0.0001733093  0.0003345499  2.726893e-04 -5.242376e-05
 0.977179046  0.0004379093 -2.508658e-04  1.374691e-04 -0.0002998924 -0.0004319633 -5.242376e-05  3.611514e-04

State 3
        mean  cov matrix
 0.003909801  2.934983e-05  1.656072e-05  2.335206e-05 -5.814278e-06 -2.877715e-05 -2.196575e-05  6.761736e-06
 1.010126341  1.656072e-05  1.900251e-05  1.943331e-05  2.909245e-06 -1.625496e-05 -1.574899e-05  4.694645e-07
 1.007300746  2.335206e-05  1.943331e-05  2.554298e-05  2.309188e-06 -2.288733e-05 -1.676155e-05  6.090567e-06
 1.003413014 -5.814278e-06  2.909245e-06  2.309188e-06  8.147512e-06  5.796633e-06  5.191743e-06 -5.902698e-07
 0.996163167 -2.877715e-05 -1.625496e-05 -2.288733e-05  5.796633e-06  2.830213e-05  2.164898e-05 -6.603805e-06
 0.993369564 -2.196575e-05 -1.574899e-05 -1.676155e-05  5.191743e-06  2.164898e-05  2.055797e-05 -1.040800e-06
 0.997202266  6.761736e-06  4.694645e-07  6.090567e-06 -5.902698e-07 -6.603805e-06 -1.040800e-06  5.564118e-06

Log-likelihood: 47728.08
BIC criterium: -94666.79
AIC criterium: -95230.16
```

**Onto the code:**

```r
library("quantmod")
library("PerformanceAnalytics")
library('RHmm') #Load HMM package
library('zoo')

#Specify dates for downloading data
startDate = as.Date("2000-01-01") #Specify what date to get the prices from
trainingEndDate = as.Date("2010-01-01") #Specify the date we take as our training sample
NDayLookforwardLowHigh <- 10 #Parameter used when classifying in-sample data as in a trend or not
HmmLikelihoodTestLength <- 5 #How many days of data to calculate the likelihood ratio on to compare models

symbolData <- new.env() #Make a new environment for quantmod to store data in
symbol <- "^GSPC" #S&P 500
getSymbols(symbol, env = symbolData, src = "yahoo", from = startDate)
mktdata <- eval(parse(text=paste("symbolData$",sub("^","",symbol,fixed=TRUE))))
mktdata <- head(mktdata,-1) #Hack to fix some stupid duplicate date problem with yahoo

inSampleMktData <- window(mktdata,start=startDate ,end=trainingEndDate)
outOfSampleMktData <- window(mktdata,start=trainingEndDate+1)

dailyRet <- Delt(Cl(mktdata),k=1,type="arithmetic") #Daily Returns
dailyRet[is.na(dailyRet)] <- 0.00001
inSampleDailyRet <- window(dailyRet,start=startDate ,end=trainingEndDate)
outOfSampleDailyRet <- window(dailyRet,start=trainingEndDate+1)

ConvertTofullSignal <- function(signal){
  results <- rep(0,length(signal))
  intrade <- F
  for(i in seq(1,length(signal))){
    if(signal[i]==-1){
      intrade <- F
    }
    if(signal[i]==1 || intrade){
      results[i] <- 1
      intrade <- T
    }
  }
  return(results)
}

#Generate long trend signal
LongTrendSignal <- rep(0,nrow(inSampleMktData))
for(i in seq(1,nrow(inSampleMktData)-NDayLookforwardLowHigh)){
  dataBlock <- Cl(inSampleMktData[seq(i,i+NDayLookforwardLowHigh),])
  if(coredata(Cl(inSampleMktData[i,])) == min(coredata(dataBlock))){
    LongTrendSignal[i] <- 1
  }
  if(coredata(Cl(inSampleMktData[i,])) == max(coredata(dataBlock))){
    LongTrendSignal[i] <- -1
  }
}
LongTrendSignal <- ConvertTofullSignal(LongTrendSignal)

#Generate short trend signal
ShortTrendSignal <- rep(0,nrow(inSampleMktData))
for(i in seq(1,nrow(inSampleMktData)-NDayLookforwardLowHigh)){
  dataBlock <- Cl(inSampleMktData[seq(i,i+NDayLookforwardLowHigh),])
  if(coredata(Cl(inSampleMktData[i,])) == max(coredata(dataBlock))){
    ShortTrendSignal[i] <- 1
  }
  if(coredata(Cl(inSampleMktData[i,])) == min(coredata(dataBlock))){
    ShortTrendSignal[i] <- -1
  }
}
ShortTrendSignal <- ConvertTofullSignal(ShortTrendSignal)

#Plot our signals
LongTrendSignalForPlot <- LongTrendSignal
LongTrendSignalForPlot[LongTrendSignalForPlot==0] <- NaN
LongTrendSignalForPlot <- Cl(inSampleMktData)*LongTrendSignalForPlot - 100
inSampleLongTrendSignalForPlot <- LongTrendSignalForPlot
ShortTrendSignalForPlot <- ShortTrendSignal
ShortTrendSignalForPlot[ShortTrendSignalForPlot==0] <- NaN
ShortTrendSignalForPlot <- Cl(inSampleMktData)*ShortTrendSignalForPlot + 100
inSampleShortTrendSignalForPlot <- ShortTrendSignalForPlot

dev.new()
layout(1:2)
plot(Cl(inSampleMktData), main="S&P 500 Trend Follow In Sample Training Signals")
lines(inSampleLongTrendSignalForPlot,col="green",type="l")
lines(inSampleShortTrendSignalForPlot,col="red",type="l")
legend(x='bottomright', c("S&P 500 Closing Price","Long Signal","Short Signal"), fill=c("black","green","red"), bty='n')

#Calculate Returns
LongReturns <- Lag(LongTrendSignal)*(inSampleDailyRet)
LongReturns[is.na(LongReturns)] <- 0
ShortReturns <- Lag(-1*ShortTrendSignal)*(inSampleDailyRet)
ShortReturns[is.na(ShortReturns)] <- 0
TotalReturns <- LongReturns + ShortReturns
plot(cumsum(TotalReturns),main="S&P 500 Trend Follow In Sample HMM Training Signals Strategy Returns")
lines(cumsum(LongReturns),col="green")
lines(cumsum(ShortReturns),col="red")
legend(x='bottomright', c("Total Returns","Long Trend Returns","Short Trend Returns"), fill=c("black","green","red"), bty='n')

#Extracts a list of varying length features for each signal/class label
CreateListOfMatrixFeatures <- function(features,signal){
  results <- list()
  extract <- F
  for(i in seq(1,length(signal))){
    if(signal[i]==1 && !extract){
      startIndex <- i
      extract <- T
    }
    if(signal[i]==0 && extract){
      endIndex <- i-1
      dataBlock <- features[startIndex:endIndex,]
      extract <- F
      #print(dataBlock)
      results[[length(results)+1]] <- as.matrix(dataBlock)
    }
  }
  return(results)
}

#HMM Training
#Generate the features that describe the data & split into training and out of sample sets
features <- cbind(dailyRet,Hi(mktdata)/Lo(mktdata),Hi(mktdata)/Op(mktdata),Hi(mktdata)/Cl(mktdata),Op(mktdata)/Cl(mktdata),Lo(mktdata)/Cl(mktdata),Lo(mktdata)/Op(mktdata))
inSampleTrainingFeatures <- window(features,start=startDate ,end=trainingEndDate)
outOfSampleFeatures <- window(features,start=trainingEndDate+1)

#For each long / short position extract the corresponding features data and create a list of them
inSampleLongFeaturesList <- CreateListOfMatrixFeatures(inSampleTrainingFeatures,LongTrendSignal)
inSampleShortFeaturesList <- CreateListOfMatrixFeatures(inSampleTrainingFeatures,ShortTrendSignal)

#Train the HMM models
LongModelFit = HMMFit(inSampleLongFeaturesList, nStates=3)
ShortModelFit = HMMFit(inSampleShortFeaturesList, nStates=3)

#Will take NDayLookforwardLowHigh days of data and calculate the rolling log likelihood for each HMM model
inSampleLongLikelihood <- rollapply(inSampleTrainingFeatures,HmmLikelihoodTestLength,align="right",na.pad=T,by.column=F,function(x) {forwardBackward(LongModelFit,as.matrix(x))$LLH})
inSampleShortLikelihood <- rollapply(inSampleTrainingFeatures,HmmLikelihoodTestLength,align="right",na.pad=T,by.column=F,function(x) {forwardBackward(ShortModelFit,as.matrix(x))$LLH})
outOfSampleLongLikelihood <- rollapply(outOfSampleFeatures,HmmLikelihoodTestLength,align="right",na.pad=T,by.column=F,function(x) {forwardBackward(LongModelFit,as.matrix(x))$LLH})
outOfSampleShortLikelihood <- rollapply(outOfSampleFeatures,HmmLikelihoodTestLength,align="right",na.pad=T,by.column=F,function(x) {forwardBackward(ShortModelFit,as.matrix(x))$LLH})

#Create signals for plot / trading
outOfSampleLongTrendSignalForPlot <- 1*(outOfSampleLongLikelihood > outOfSampleShortLikelihood)
outOfSampleLongTrendSignalForPlot[outOfSampleLongTrendSignalForPlot==0] <- NaN
outOfSampleLongTrendSignalForPlot <- outOfSampleLongTrendSignalForPlot*Cl(outOfSampleMktData)-100
outOfSampleShortTrendSignalForPlot <- 1*(outOfSampleLongLikelihood < outOfSampleShortLikelihood)
outOfSampleShortTrendSignalForPlot[outOfSampleShortTrendSignalForPlot==0] <- NaN
outOfSampleShortTrendSignalForPlot <- outOfSampleShortTrendSignalForPlot*Cl(outOfSampleMktData)+100

dev.new()
layout(1:2)
plot(Cl(inSampleMktData), main="S&P 500 Trend Follow In Sample HMM Training Signals")
lines(inSampleLongTrendSignalForPlot,col="green",type="l")
lines(inSampleShortTrendSignalForPlot,col="red",type="l")
legend(x='bottomright', c("S&P 500 Closing Price","Long Signal","Short Signal"), fill=c("black","green","red"), bty='n')
#tt <- Cl(inSampleMktData)
#tt[,1] <- inSampleLongLikelihood
plot(inSampleLongLikelihood,main="Log Likelihood of each HMM model - In Sample")
lines(inSampleLongLikelihood,col="green")
lines(inSampleShortLikelihood,col="red")
legend(x='bottomright', c("Long HMM Likelihood","Short HMM Likelihood"), fill=c("green","red"), bty='n')

dev.new()
layout(1:3)
plot(Cl(outOfSampleMktData), main="S&P 500 HMM Trend Following Out of Sample")
lines(outOfSampleLongTrendSignalForPlot,col="green",type="l")
lines(outOfSampleShortTrendSignalForPlot,col="red",type="l")
legend(x='bottomright', c("S&P 500 Closing Price","Long Signal","Short Signal"), fill=c("black","green","red"), bty='n')
#tt <- Cl(outOfSampleMktData)
#tt[,1] <- outOfSampleLongLikelihood
plot(outOfSampleLongLikelihood,main="Log Likelihood of each HMM model - Out Of Sample")
lines(outOfSampleLongLikelihood,col="green")
lines(outOfSampleShortLikelihood,col="red")
legend(x='bottomright', c("Long HMM Likelihood","Short HMM Likelihood"), fill=c("green","red"), bty='n')

#Calculate Out of Sample Returns
outOfSampleLongReturns <- Lag((1*(outOfSampleLongLikelihood >
```
outOfSampleShortLikelihood)))* (outOfSampleDailyRet) outOfSampleLongReturns[is.na(outOfSampleLongReturns)] <- 0 outOfSampleShortReturns <- Lag(-1*(1*(outOfSampleLongLikelihood < outOfSampleShortLikelihood)))* (outOfSampleDailyRet) outOfSampleShortReturns[is.na(outOfSampleShortReturns)] <- 0 outOfSampleTotalReturns <- outOfSampleLongReturns + outOfSampleShortReturns outOfSampleTotalReturns[is.na(outOfSampleTotalReturns)] <- 0 plot(cumsum(outOfSampleTotalReturns),main="S&P 500 HMM Trend Following Out of Sample Strategy Returns") lines(cumsum(outOfSampleLongReturns),col="green") lines(cumsum(outOfSampleShortReturns),col="red") legend(x='bottomright', c("Total Returns","Long Trend Returns","Short Trend Returns"), fill=c("black","green","red"), bty='n') print(SharpeRatio.annualized(outOfSampleTotalReturns)) |

**Method One – Single HMM Each State is a Regime**

The credit for this section must go to the fantastic Systematic Investor blog http://systematicinvestor.wordpress.com/2012/11/01/regime-detection/. The code is well commented and should be self-explanatory. Essentially two market regimes (bull and bear) are simulated, and a 2-state HMM is then trained on the data. The forward-backward algorithm is then used to calculate the probability of being in a given state at any given time.

**Method Two – Multiple HMMs with multiple states – Each HMM a regime**

Three market regimes are simulated: bull, bear and a sideways market. Three different 2-state HMM models are trained, one on each regime. Model 1 is the HMM for the bull market, Model 2 is the HMM for the bear market, and Model 3 is the HMM for the sideways market. A rolling window of 50 days' worth of data is passed into each HMM and a log likelihood score produced. The higher the log likelihood, the more likely it is that the model generated the observed data.

As can be seen in the above chart, the log likelihood is fairly decent at determining the difference between the bull and bear markets. Sadly the sideways model seems very likely in both the bull and bear cases; its log likelihood is fairly stable and doesn't change per regime.
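The regime-classification idea above can be sketched in a few lines of Python. This is only a toy stand-in: it scores each rolling window under plain Gaussian models instead of fitted HMMs, and the regime means and standard deviations are made up for illustration, but the mechanism (pick the model with the highest log likelihood of the window) is the same.

```python
import math
import random

def gaussian_loglik(xs, mu, sigma):
    """Log likelihood of the data xs under a N(mu, sigma^2) model."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)

def classify_windows(returns, models, window=50):
    """Score each rolling window under every model and label it with
    the model that gives the highest log likelihood."""
    labels = []
    for t in range(window, len(returns) + 1):
        block = returns[t - window:t]
        scores = [gaussian_loglik(block, mu, s) for mu, s in models]
        labels.append(scores.index(max(scores)))
    return labels

random.seed(1)
# Simulated bull then bear regime returns (made-up parameters)
bull = [random.gauss(0.1, 0.05) for _ in range(200)]
bear = [random.gauss(-0.1, 0.05) for _ in range(200)]
models = [(0.1, 0.05), (-0.1, 0.05)]  # (mean, stdev) for each regime model
labels = classify_windows(bull + bear, models)
```

With well-separated regimes the early windows score highest under the bull model and the late windows under the bear model, mirroring the crossover of the log likelihood curves in the chart.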

**Code for method 1:**

library('RHmm') #Load HMM package
#Code based upon http://systematicinvestor.wordpress.com/2012/11/01/regime-detection/
bullMarketOne = rnorm( 100, 0.1/365, 0.05/sqrt(365) )
bearMarket = rnorm( 100, -0.2/365, 0.15/sqrt(365))
bullMarketTwo = rnorm( 100, 0.15/365, 0.07/sqrt(365) )
true.states = c(rep(1,100),rep(2,100),rep(1,100))
returns = c( bullMarketOne, bearMarket, bullMarketTwo )
y = returns
ResFit = HMMFit(y, nStates=2) #Fit a HMM with 2 states to the data
VitPath = viterbi(ResFit, y) #Use the Viterbi algorithm to find the most likely state path (of the training data)
fb = forwardBackward(ResFit, y) #Forward-backward procedure, compute probabilities

#Plot probabilities and implied states
layout(1:3)
plot(cumsum(returns),ylab="Cumulative Market Return",type="l", main="Fake Market Data")
plot(VitPath$states, type='s', main='Implied States', xlab='', ylab='State')
matplot(fb$Gamma, type='l', main='Smoothed Probabilities', ylab='Probability')
legend(x='topright', c('Bear Market - State 2','Bull Market - State 1'), fill=1:2, bty='n')

**Code for method 2:**

library('RHmm') #Load HMM package
library('zoo')

#HMM model 1 (high vol and low vol upwards trend)
model1ReturnsFunc <- function(isHighVol){
  return(rnorm( 100, 0.1,if(isHighVol){0.15}else{0.02}))
}
bullLowVol = model1ReturnsFunc(F)
bullHighVol = model1ReturnsFunc(T)
model1TrainingReturns = c(bullLowVol, bullHighVol)
Model1Fit = HMMFit(model1TrainingReturns, nStates=2) #Fit a HMM with 2 states to the data

#HMM model 2 (high vol and low vol downwards trend)
model2ReturnsFunc <- function(isHighVol){
  return(rnorm( 100, -0.1,if(isHighVol){0.15}else{0.02}))
}
bearLowVol = model2ReturnsFunc(F)
bearHighVol = model2ReturnsFunc(T)
model2TrainingReturns = c(bearLowVol, bearHighVol)
Model2Fit = HMMFit(model2TrainingReturns, nStates=2) #Fit a HMM with 2 states to the data

#HMM model 3 (sideways market)
model3ReturnsFunc <- function(isHighVol){
  return(rnorm( 100, 0.0,if(isHighVol){0.16}else{0.08}))
}
sidewaysLowVol = model3ReturnsFunc(F)
sidewaysHighVol = model3ReturnsFunc(T)
model3TrainingReturns = c(sidewaysLowVol, sidewaysHighVol)
Model3Fit = HMMFit(model3TrainingReturns, nStates=2) #Fit a HMM with 2 states to the data

generateDataFunc <- function(modelSequence,highVolSequence){
  results <- c()
  if(length(modelSequence) != length(highVolSequence)){
    print("Model Sequence and Vol Sequence must be the same length")
    return(NULL)
  }
  for(i in 1:length(modelSequence)){
    #Bit rubbish having all these IFs here but it's easy to understand for novice R users
    if(modelSequence[i] == 1){ results <- c(results,model1ReturnsFunc(highVolSequence[i])) }
    if(modelSequence[i] == 2){ results <- c(results,model2ReturnsFunc(highVolSequence[i])) }
    if(modelSequence[i] == 3){ results <- c(results,model3ReturnsFunc(highVolSequence[i])) }
  }
  return(results)
}

#Create some out of sample data
actualModelSequence <- c(1,1,1,3,2,2,1)
actualVolRegime <- c(T,T,T,T,T,T,T)
outOfSampleData <- generateDataFunc(actualModelSequence,actualVolRegime)

#Will take 50 days of data and calculate the rolling log likelihood for each HMM model
model1Likelihood <- rollapply(outOfSampleData,50,align="right",na.pad=T,function(x) {forwardBackward(Model1Fit,x)$LLH})
model2Likelihood <- rollapply(outOfSampleData,50,align="right",na.pad=T,function(x) {forwardBackward(Model2Fit,x)$LLH})
model3Likelihood <- rollapply(outOfSampleData,50,align="right",na.pad=T,function(x) {forwardBackward(Model3Fit,x)$LLH})

layout(1:3)
plot(cumsum(outOfSampleData),main="Fake Market Data",ylab="Cumulative Returns",type="l")
plot(model1Likelihood,type="l",ylab="Log Likelihood of Each Model",main="Log Likelihood for each HMM Model")
lines(model2Likelihood,type="l",col="red")
lines(model3Likelihood,type="l",col="blue")
plot(rep((actualModelSequence==3)*3,each=100),col="blue",type="o",ylim=c(0.8,3.1),ylab="Actual MODEL Number",main="Actual MODEL Sequence")
lines(rep((actualModelSequence==2)*2,each=100),col="red",type="o")
lines(rep((actualModelSequence==1)*1,each=100),col="black",type="o")
legend(x='topright', c('Model 1 - Bull Mkt','Model 2 - Bear Mkt','Model 3 - Side ways Mkt'), col=c("black","red","blue"), bty='n',lty=c(1,1,1))

The forward algorithm calculates the likelihood of the data given the model **over all possible** state sequences.

The Viterbi algorithm calculates the likelihood of the data given the model **over the single most likely** state sequence.

The forward algorithm allows for efficient calculation of the likelihood function $p(O|\lambda)$.

The forward variable $\alpha_j(t)$ is the likelihood of the HMM producing all the observations up to time $t$ and occupying state $j$ at time $t$. It is defined as:

$$\alpha_j(t) = P(o_1, \ldots, o_t, x(t) = j \mid \lambda)$$

It is calculated recursively by taking the forward variable for time $t-1$ being in state $i$ and then calculating the probability of moving to state $j$ at time $t$:

$$\alpha_j(t) = \left[\sum_{i} \alpha_i(t-1)\, a_{ij}\right] b_j(o_t)$$

Where $a_{ij}$ is the probability of a jump from state $i$ to state $j$, and $b_j(o_t)$ is the probability of generating feature vector $o_t$ from state $j$.

Initialisation: $\alpha_1(1) = 1$, and $\alpha_j(1) = a_{1j} b_j(o_1)$ for $1 < j < N$

For $t = 2, \ldots, T$:

For $j = 2, \ldots, N-1$: $\alpha_j(t) = \left[\sum_{i} \alpha_i(t-1)\, a_{ij}\right] b_j(o_t)$

Termination: $p(O|\lambda) = \alpha_N(T) = \sum_{i} \alpha_i(T)\, a_{iN}$
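The forward recursion described above can be sketched in Python. For simplicity the sketch uses a standard initial state distribution rather than the non-emitting entry/exit states of the formulas, and it verifies itself against a brute-force sum over every possible state sequence; the transition, emission and observation values are made up for illustration.

```python
import itertools

def forward_likelihood(A, B, pi, obs):
    """Forward algorithm: p(obs | model) summed over all state paths.
    A[i][j] = transition prob, B[j][o] = emission prob of symbol o in
    state j, pi[j] = initial state prob."""
    n = len(pi)
    # Initialisation: alpha_j(1) = pi_j * b_j(o_1)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    # Recursion: alpha_j(t) = (sum_i alpha_i(t-1) * a_ij) * b_j(o_t)
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: sum over the final states
    return sum(alpha)

def brute_force_likelihood(A, B, pi, obs):
    """Check: explicitly sum p(obs, path) over every state sequence."""
    n, T = len(pi), len(obs)
    total = 0.0
    for path in itertools.product(range(n), repeat=T):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, T):
            p *= A[path[t-1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

A = [[0.7, 0.3], [0.4, 0.6]]   # transition probabilities a_ij
B = [[0.9, 0.1], [0.2, 0.8]]   # emission probabilities b_j(o), 2 symbols
pi = [0.5, 0.5]                # initial state distribution
obs = [0, 1, 0, 0]
```

The brute-force check makes the point of the algorithm concrete: the recursion gives exactly the same answer as enumerating all $N^T$ state sequences, at a small fraction of the cost.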

The forward algorithm calculates $p(O|\lambda)$ by summing over all state sequences. It is sometimes preferable to approximate $p(O|\lambda)$, which uses all state sequences, with $\hat{p}(O|\lambda)$, which uses only the single most likely state sequence. This is known as the Viterbi algorithm; the algorithm finds the most likely state sequence.

The probability of the best partial path of length $t$ through the HMM ending at state $j$ is defined as:

$$\phi_j(t) = \max_{X^{(t)}} P(o_1, \ldots, o_t, x(t) = j \mid \lambda)$$

Where $X^{(t)}$ is the best partial path / state sequence ending in state $j$ at time $t$.

As with the forward variable, $\phi_j(t)$ can be calculated recursively:

$$\phi_j(t) = \max_i \left[\phi_i(t-1)\, a_{ij}\right] b_j(o_t)$$

Initialisation: $\phi_1(1) = 1$, and $\phi_j(1) = a_{1j} b_j(o_1)$ for $1 < j < N$

For $t = 2, \ldots, T$:

For $j = 2, \ldots, N-1$: $\phi_j(t) = \max_i \left[\phi_i(t-1)\, a_{ij}\right] b_j(o_t)$, and store the preceding node $\mathrm{pred}_j(t) = \arg\max_i \left[\phi_i(t-1)\, a_{ij}\right]$

Termination: $\phi_N(T) = \max_i \left[\phi_i(T)\, a_{iN}\right]$, and store the preceding node $\mathrm{pred}_N(T)$

The most likely path is found by following backwards the preceding node information stored in $\mathrm{pred}_j(t)$.

The direct calculation of $\phi_j(t)$ will most likely cause arithmetic underflow errors: the probabilities can become so small that the computer is unable to represent them correctly. You should instead calculate the log likelihood, e.g.

$$\log \phi_j(t) = \max_i \left[\log \phi_i(t-1) + \log a_{ij}\right] + \log b_j(o_t)$$
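A Python sketch of the Viterbi recursion, working in the log domain to avoid the underflow problem just described. As with the forward sketch it uses an initial state distribution instead of non-emitting entry/exit states, the model numbers are made up, and it checks its best-path score against a brute-force search over every state sequence.

```python
import itertools
import math

def viterbi_log(A, B, pi, obs):
    """Viterbi in the log domain. Returns (log prob of the best
    state path, the path itself)."""
    n, log = len(pi), math.log
    # Initialisation: phi_j(1) = log pi_j + log b_j(o_1)
    phi = [log(pi[j]) + log(B[j][obs[0]]) for j in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t
    for o in obs[1:]:
        prev, phi, pred = phi, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + log(A[i][j]))
            phi.append(prev[best_i] + log(A[best_i][j]) + log(B[j][o]))
            pred.append(best_i)
        back.append(pred)
    # Backtrace from the best final state using the stored predecessors
    j = max(range(n), key=lambda s: phi[s])
    path = [j]
    for pred in reversed(back):
        j = pred[j]
        path.append(j)
    return max(phi), path[::-1]

def best_path_logprob_brute(A, B, pi, obs):
    """Check: evaluate every state sequence and keep the best log prob."""
    n, T, log = len(pi), len(obs), math.log
    best = -float("inf")
    for p in itertools.product(range(n), repeat=T):
        lp = log(pi[p[0]]) + log(B[p[0]][obs[0]])
        for t in range(1, T):
            lp += log(A[p[t-1]][p[t]]) + log(B[p[t]][obs[t]])
        best = max(best, lp)
    return best

A = [[0.7, 0.3], [0.4, 0.6]]   # transition probabilities a_ij
B = [[0.9, 0.1], [0.2, 0.8]]   # emission probabilities b_j(o)
pi = [0.5, 0.5]                # initial state distribution
obs = [0, 1, 0, 0]
```

Because sums become additions of logs and products become sums, the recursion never multiplies tiny probabilities together, so it stays numerically stable for arbitrarily long observation sequences.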

This post will develop a general framework for classification tasks using hidden Markov models. The tutorial series will cover how to build and train a hidden Markov model in R. Initially the maths will be explained, then an example in R provided, and then an application to financial data will be explored.

A set of features $O$ is derived from the data set and a class identified by finding the most likely class given the data:

$$\hat{\omega} = \arg\max_i P(\omega_i \mid O)$$

However $P(\omega_i \mid O)$ is unknown, so Bayes' rule must be used:

$$P(\omega_i \mid O) = \frac{p(O \mid \omega_i)\, P(\omega_i)}{p(O)}$$

Since the maximisation does not depend upon $p(O)$ we can ignore it. The terms $p(O \mid \omega_i)$ and $P(\omega_i)$ are the likelihood of the data given the class and the prior probability of a class respectively; both terms are defined by a model. The feature model will be described by the hidden Markov model (HMM); each class will have its own HMM.

First we need to generate a set of features $O$ from the raw data. I will skip this step for now because it is specific to the application of your hidden Markov model; for example in finance the raw data may be various stock prices and $O$ could be a set of technical indicators / volatility calculations applied to that data. HMMs are popular in speech recognition, where typically $o_t$ is a vector describing the characteristics of the frequency spectrum of the speech.

Secondly the feature vector $O$ must then be assigned a class from the HMM. This is done via maximum likelihood estimation: the HMM is a generative model, so choose the class that is most likely to have generated the feature vector $O$.

For finance the class might be a market regime (trending/mean reverting) or in speech recognition the class is a word.
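The decision rule just described reduces to a one-line argmax once each class's HMM has scored the observation sequence. A minimal Python sketch, where the per-class log likelihoods and the class priors are hypothetical numbers chosen purely for illustration:

```python
import math

def classify(log_likelihoods, log_priors):
    """Bayes decision rule: pick the class maximising
    log p(O|class) + log P(class); p(O) is the same for every class
    so it drops out of the argmax."""
    scores = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    return scores.index(max(scores))

# Hypothetical per-class HMM log likelihoods for one feature sequence O
log_likelihoods = [-120.4, -118.9, -125.0]
# Hypothetical class priors, e.g. trending / mean-reverting / sideways
log_priors = [math.log(0.5), math.log(0.3), math.log(0.2)]
```

Working with log likelihoods turns the product $p(O|\omega_i) P(\omega_i)$ into a sum, which is both numerically safer and what the forward algorithm naturally returns.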

$N$ is the number of states in the HMM.

$a_{ij}$ is the probability of transitioning from state $i$ to state $j$.

$b_j(o_t)$ is the probability of generating feature vector $o_t$ upon entering state $j$ (provided $j$ is not the entry or exit state).

The HMM may be written as $\lambda = \{N, a_{ij}, b_j(o_t)\}$.

$O = o_1, o_2, \ldots, o_T$ are the observed feature vectors.

$X = x(1), x(2), \ldots, x(T)$ is the specified state sequence.

The joint probability is the probability of jumping from one state to the next multiplied by the probability of generating the feature vector in that state:

$$P(O, X \mid \lambda) = a_{x(0)x(1)} \prod_{t=1}^{T} b_{x(t)}(o_t)\, a_{x(t)x(t+1)}$$

Where $x(0)$ is always the entry state 1, and $x(T+1)$ is always the exit state $N$.

In the above joint probability calculation we have assumed a state sequence $X$. However this is a latent variable: we do not know it, it is hidden (hence the name HIDDEN Markov model)! However, if we sum over all possible state sequences we can marginalise it out:

$$p(O \mid \lambda) = \sum_{X} P(O, X \mid \lambda)$$

This can be problematic due to the number of possible state sequences (especially in a real-time application); luckily algorithms exist to perform the calculation efficiently without needing to explore every state sequence. One such algorithm is the forward algorithm.

$b_j(o_t)$ is the output distribution for a given state $j$. The distribution can be anything you like, however it should hopefully match the distribution of the data at state $j$, and it must be mathematically tractable. The most natural choice at this stage is to assume $b_j(o_t)$ can be described by the multivariate Gaussian. As a word of caution, if the elements of your feature vector are highly correlated then $\Sigma_j$, the covariance matrix, has a lot of parameters to estimate. See if you can collapse $\Sigma_j$ to a diagonal matrix.

E.g.

$$b_j(o_t) = \mathcal{N}(o_t; \mu_j, \Sigma_j)$$

We already know how to fit a normal distribution: the MLE for $\mu_j$ is the mean, and for $\Sigma_j$ the covariance, of the feature vectors. However we must only calculate the mean and covariance on feature vectors that came from state $j$; this is known as Viterbi segmentation. Viterbi segmentation means there is a hard assignment between a feature vector and the state that generated it; an alternative method is called Baum-Welch, which probabilistically assigns feature vectors to multiple states.

Under this segmentation, state $j$ generated the observations starting at the point where the state path entered state $j$.

It is not known in advance which state generated which observation vector; fortunately there is an algorithm, the Viterbi algorithm, to approximately solve this problem.
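Once a hard state assignment is available, the per-state MLE fit is just a grouped mean and variance. A minimal Python sketch on one-dimensional observations, with a made-up state path standing in for the Viterbi output:

```python
def segment_mle(observations, state_path):
    """Viterbi segmentation MLE: each observation is hard-assigned to
    one state, and each state's Gaussian is fitted by the sample mean
    and variance of the observations assigned to it."""
    by_state = {}
    for o, s in zip(observations, state_path):
        by_state.setdefault(s, []).append(o)
    params = {}
    for s, xs in by_state.items():
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        params[s] = (mu, var)
    return params

obs = [0.9, 1.1, 1.0, 5.0, 5.2, 4.8]
path = [0, 0, 0, 1, 1, 1]   # hypothetical hard state assignment
params = segment_mle(obs, path)
```

For multivariate features the same idea applies with the sample mean vector and covariance matrix per state; Baum-Welch instead weights each observation by its state occupation probability rather than assigning it to a single state.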

The forward algorithm for efficient calculation of $p(O|\lambda)$ and the Viterbi algorithm will be explored in my next post.

I will show how to download Non-Farm Payroll numbers, although it is very easy to modify the code below to download GDP, CPI, etc.

The top chart shows non-farm payrolls plotted with the S&P 500. It is interesting to note in the % change charts that there is a crash in the market around mid-2008, which is then followed by a crash in the non-farm numbers. Although not a very rigorous analysis, it looks like non-farm numbers LAG the market.

The second chart regresses the % change in payrolls against the % change in the S&P for the month. The scatter plot shows that there is no clear relationship between payroll change and S&P change.

The second regression, on the right, takes this month's payroll change and regresses it against next month's S&P return, i.e. it tries to see whether the numbers from this month can tell us anything about the return in the S&P for the coming month. Payrolls don't look very predictive at the 1-month time horizon. I think a more interesting analysis would look at the effect of payrolls on the market over 10, 20 and 30 minute horizons intraday.

Onto the code:

library("quantmod")
#To see what datasets are available from the FED goto the link below
#http://research.stlouisfed.org/fred2/
economicData <- new.env() #Make a new environment for quantmod to store data in
startDate = as.Date("2000-01-01") #Specify what date to get the prices from
getSymbols("PAYEMS",src="FRED",env=economicData,from=startDate) #PAYEMS is non-farm payrolls
getSymbols("^GSPC",env=economicData,from=startDate) #S&P 500
economicData$PAYEMS <- window(economicData$PAYEMS,start=startDate) #Window our data (FRED ignores the from parameter above) :@
economicData$GSPC <- window(economicData$GSPC,start=startDate) #Window our data
mergedData <- merge(economicData$PAYEMS,Cl(economicData$GSPC),all=FALSE) #Join the two datasets based on their SHARED dates

#Calculate the % diff
mergedPercDiff <- mergedData
mergedPercDiff$PAYEMS <- diff(mergedData$PAYEMS)/Lag(mergedData$PAYEMS)
mergedPercDiff$GSPC.Close <- diff(mergedData$GSPC.Close)/Lag(mergedData$GSPC.Close)
dev.new()
par(mfrow=c(2,2))
plot(mergedData$PAYEMS, main="Non-Farm Payrolls",ylab="Thousands of Persons")
plot(mergedPercDiff$PAYEMS, main="Non-Farm Payrolls", ylab="% Change")
plot(mergedData$GSPC.Close, main="S&P 500 Close",ylab="Close Price")
plot(mergedPercDiff$GSPC.Close, main="S&P 500 Close",ylab="% Change")

#Function to plot data and add regression line
doPlot <- function(x,y,title,xlabel,ylabel){
  x <- as.vector(x)
  y <- as.vector(y)
  regression <- lm(y~x)
  print(regression)
  plot(y~x,main=title, xlab=xlabel,ylab=ylabel)
  abline(regression,col="red",lwd=1.5)
  legend("bottomleft",paste("y=",regression$coefficients[2],"x+",regression$coefficients[1],sep=""),bg="lightblue")
}
dev.new()
par(mfrow=c(1,2))
doPlot(mergedPercDiff$PAYEMS,mergedPercDiff$GSPC.Close,"Regress Non-Farms Payroll with S&P Monthly Returns","Non-Farms Monthly % Change","S&P 500 Monthly % Change")
doPlot(Lag(mergedPercDiff$PAYEMS),mergedPercDiff$GSPC.Close,"Regress Non-Farms Payroll with NEXT MONTH'S S&P Monthly Return","Non-Farms Monthly % Change","S&P 500 Monthly % Change")

In the last post I speculated that the poor performance of the algo was potentially down to taking the current day and finding the most similar days in history; instead we should take the last N days and find the most similar period in history.

The code below does exactly that; use windowSize to control how big the periods are that we compare.
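The core idea, comparing the most recent window against every historical window of the same length, can be sketched in a few lines of Python. This is a simplification of what the R code further down does: a single 1-D series and plain MSE, rather than the weighted multi-feature distance measure, and the toy series is made up so that the first window is the obvious match.

```python
def mse(a, b):
    """Mean squared error between two equal-length windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def most_similar_period(series, window):
    """Compare the last `window` points against every earlier window of
    the same length and return the start index of the closest match.
    Windows overlapping the most recent one are excluded."""
    recent = series[-window:]
    best_start, best_score = None, float("inf")
    for start in range(len(series) - 2 * window + 1):
        score = mse(series[start:start + window], recent)
        if score < best_score:
            best_start, best_score = start, score
    return best_start

# Toy series: the first four points closely resemble the last four
series = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 0.1, 1.1, 2.1, 3.1]
```

A nearest-neighbour trading rule would then look at what the market did on the day after each matched period, which is what the full strategy below does with its K nearest matches.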

**Sharpe ratio: -0.591864**

The performance is still poor; perhaps the similarity measure I'm using is rubbish. Maybe using implied vol would be good for identifying market regimes and should be used in the similarity measure.

Unfortunately this algo is very, very slow (and gets worse over time, since there is more history to look back over), which makes it difficult and time consuming to optimise the variables.

Onto the code:

library("quantmod")
library("PerformanceAnalytics")
library("zoo")

#INPUTS
marketSymbol <- "^GSPC"
nFastLookback <- 30 #The fast signal lookback used in the linear regression curve
nSlowLookback <- 50 #The slow signal lookback used in the linear regression curve
nFastVolLookback <- 30 #The fast signal lookback used to calculate the stdev
nSlowVolLookback <- 50 #The slow signal lookback used to calculate the stdev
nFastRSILookback <- 30 #The fast signal lookback used to calculate the RSI
nSlowRSILookback <- 50 #The slow signal lookback used to calculate the RSI
kNearestGroupSize <- 30 #How many neighbours to use
normalisedStrengthVolWeight <- 2 #Make some signals more important than others in the MSE
normalisedStrengthRegressionWeight <- 1
fastRSICurveWeight <- 2
slowRSICurveWeight <- 0.8
windowSize <- 10 #Compare the last 10 days with the most similar 10 day period in history

#Specify dates for downloading data, training models and running simulation
startDate = as.Date("2006-08-01") #Specify what date to get the prices from
symbolData <- new.env() #Make a new environment for quantmod to store data in
stockCleanNameFunc <- function(name){
  return(sub("^","",name,fixed=TRUE))
}
getSymbols(marketSymbol, env = symbolData, src = "yahoo", from = startDate)
cleanName <- stockCleanNameFunc(marketSymbol)
mktData <- get(cleanName,symbolData)

linearRegressionCurve <- function(data,n){
  regression <- function(dataBlock){
    fit <- lm(dataBlock~seq(1,length(dataBlock),1))
    return(last(fit$fitted.values))
  }
  return (rollapply(data,width=n,regression,align="right",by.column=FALSE,na.pad=TRUE))
}

volCurve <- function(data,n){
  stdev <- function(dataBlock){ sd(dataBlock) }
  return (rollapply(data,width=n,stdev,align="right",by.column=FALSE,na.pad=TRUE))^2
}

fastRegression <- linearRegressionCurve(Cl(mktData),nFastLookback)
slowRegression <- linearRegressionCurve(Cl(mktData),nSlowLookback)
normalisedStrengthRegression <- slowRegression / (slowRegression+fastRegression)
fastVolCurve <- volCurve(Cl(mktData),nFastVolLookback)
slowVolCurve <- volCurve(Cl(mktData),nSlowVolLookback)
normalisedStrengthVol <- slowVolCurve / (slowVolCurve+fastVolCurve)
fastRSICurve <- RSI(Cl(mktData),nFastRSILookback)/100 #Rescale it to be in the same range as the other indicators
slowRSICurve <- RSI(Cl(mktData),nSlowRSILookback)/100

#Lets plot the signals just to see what they look like
dev.new()
par(mfrow=c(2,2))
plot(normalisedStrengthVol,type="l")
plot(normalisedStrengthRegression,type="l")
plot(fastRSICurve,type="l")
plot(slowRSICurve,type="l")

#DataMeasure will be used to determine how similar other days are to today
#It is used later on to calculate the days which are most similar to today according to the MSE measure
dataMeasure <- cbind(normalisedStrengthVol*normalisedStrengthVolWeight,normalisedStrengthRegression*normalisedStrengthRegressionWeight,fastRSICurve*fastRSICurveWeight,slowRSICurve*slowRSICurveWeight)
colnames(dataMeasure) <- c("normalisedStrengthVol","normalisedStrengthRegression","fastRSICurve","slowRSICurve")

#Finds the nearest neighbours and calculates the trade signal
calculateNearestNeighbourTradeSignal <- function(dataMeasure,K,mktReturns,windowSize){
  findKNearestNeighbours <- function(dataMeasure,K,windowSize){
    calculateMSE <- function(dataA,dataB){
      if(length(dataA) != length(dataB)){ return (Inf) }
      se <- ((as.vector(as.matrix(dataA)) - as.vector(as.matrix(dataB)))^2)
      res <- mean(se)
      if(is.na(res)){ res <- Inf }
      return (res)
    }
    mseScores <- as.data.frame(dataMeasure[,1])
    mseScores[,1] <- Inf #Default all the MSE scores to Inf (we've not calculated them yet)
    colnames(mseScores) <- c("MSE")
    indexOfTheMostRecentWindowSizeDays <- seq(max(1,length(dataMeasure[,1])-windowSize),length(dataMeasure[,1]))
    mostRecentWindowDataMeasure <- dataMeasure[indexOfTheMostRecentWindowSizeDays,]
    for(i in seq(1,length(dataMeasure[,1]))){
      indexHistoricalWindowDataMeasure <- seq(max(1,i-windowSize),i)
      historicalWindowDataMeasure <- dataMeasure[indexHistoricalWindowDataMeasure,]
      mseScores[i,1] <- calculateMSE(mostRecentWindowDataMeasure,historicalWindowDataMeasure)
      # print(paste("MSE is",mseScores[i,1]))
    }
    rowNum <- seq(1,length(dataMeasure[,1]),1)
    tmp <- c("MSE", colnames(dataMeasure))
    dataMeasureWithMse <- as.data.frame(cbind(mseScores[,1],dataMeasure))
    colnames(dataMeasureWithMse) <- tmp
    tmp <- c("rowNum", colnames(dataMeasureWithMse))
    dataMeasureWithMse <- cbind(rowNum,dataMeasureWithMse)
    colnames(dataMeasureWithMse) <- tmp
    dataMeasureWithMse <- dataMeasureWithMse[order(dataMeasureWithMse[,"MSE"]),]
    #Start from the 2nd item as the 1st is the current day (MSE will be 0) so want to drop it
    return (dataMeasureWithMse[seq(2,min(K,length(dataMeasureWithMse[,1]))),])
  }
  calculateTradeSignalFromKNeighbours <- function(mktReturns,kNearestNeighbours){
    rowNums <- kNearestNeighbours[,"rowNum"]
    rowNums <- na.omit(rowNums)
    if(length(rowNums) <= 1) { return (0) }
    print("The kNearestNeighbours are:")
    print(rowNums)
    #So lets see what happened on the day AFTER our nearest match
    mktRet <- mktReturns[rowNums+1]
    # return (sign(sum(mktRet)))
    return (SharpeRatio.annualized(mktRet))
  }
  kNearestNeighbours <- findKNearestNeighbours(dataMeasure,K,windowSize)
  tradeSignal <- calculateTradeSignalFromKNeighbours(mktReturns,kNearestNeighbours)
  return(tradeSignal)
}

ret <- (Cl(mktData)/Op(mktData))-1
signalLog <- as.data.frame(ret)
signalLog[,1] <- 0
colnames(signalLog) <- c("TradeSignal")

#Loop through all the days we have data for, and calculate a signal for them using nearest neighbour
for(i in seq(1,length(ret))){
  print (paste("Simulating trading for day",i,"out of",length(ret),"@",100*i/length(ret),"%"))
  index <- seq(1,i)
  signal <- calculateNearestNeighbourTradeSignal(dataMeasure[index,],kNearestGroupSize,ret,windowSize)
  signalLog[i,1] <- signal
}
dev.new()
tradeRet <- Lag(signalLog[,1])*ret[,1] #Combine today's signal with tomorrow's return (no lookforward issues)
totalRet <- cbind(tradeRet,ret)
colnames(totalRet) <- c("Algo",paste(marketSymbol," Long OpCl Returns"))
charts.PerformanceSummary(totalRet,main=paste("K nearest trading algo for",marketSymbol,"kNearestGroupSize=",kNearestGroupSize,"windowSize=",windowSize),geometric=FALSE)
print(SharpeRatio.annualized(tradeRet))