# Hidden Markov Models – Forward & Viterbi Algorithm Part 2 of 4

In the previous post the Hidden Markov Model was defined, however efficient algorithms are need to calculate some the probabilities / perform the marginalisation over hidden states. Two algorithms that can be used are the forward algorithm and the Viterbi algorithm.

The forward algorithm calculates the likelihood of the data given the model over all possible state sequences.

The Viterbi algorithm calculates the likelihood of the data given the model over the single most likely state sequence.

# The forward algorithm

The forward algorithm allows for efficient calculation of the likelihood function $p(\mathbf{O}|\lambda)$.

The forward variable is the likelihood of the HMM producing all the observations up to time t
and occupying state $j$ at time $t$, it is defined as: $\alpha_{j}(t)=p(\mathbf{o_{1},...,o_{t}},x(t)=j|\lambda)$

It is calculated recursively by calculating the forward variable for time $t-1$ being in state $k$ and then calculating the probability of moving to state $j$ at time $t$: $\alpha_{j}(t)=p(\mathbf{o_{1},...,o_{t}},x(t-1)=k,x(t)=j|\lambda)=\alpha_{k}(t-1)a_{kj}b_{j}(\mathbf{o_{t}})$

Where $a_{kj}$ is the probability of a jump from state $k$ to state $j$, and $b_{j}(\mathbf{o_{t}})$ is the probability of generating feature vector $\mathbf{o_{t}}$ from state $j$. $\alpha_{j}(t)=\sum_{k=1}^{N}p(\mathbf{o_{1},...,o_{t},}x(t-1)=k,x(t)=j|\lambda)$

## 0.1 Forward Algorithm Initialisation $\alpha_{1}(0)=1,\alpha_{j}(0)=0$ for $1 and $\alpha_{1}(t)=0$ for $1

## 0.2 Recursion

For $t=1,2,...,T$
…… For $j=2,3,...,N-1$
…………….. $\alpha_{j}(t)=b_{j}(\mathbf{o_{t}})[\sum_{k=1}^{N-1}\alpha_{k}(t-1)a_{kj}]$

## 0.3 Termination $p(\mathbf{O}|\lambda)=\sum_{k=2}^{N-1}\alpha_{k}(T)a_{kN}$

# The Viterbi Algorithm

The forward algorithm calculated $p(\mathbf{O}|\lambda)$ by summing over all state sequences, it is sometimes preferable to approximate $p(\mathbf{O}|\lambda)$ which used all state sequences with $\hat{p}(\mathbf{O}|\lambda)$ which will use the single most likely state sequence. This is known as the Viterbi algorithm, the algorithm finds the most likely state sequence. $\hat{p}(\mathbf{O}|\lambda)=max_{X}[p(\mathbf{O},X|\lambda)]\text{ Where \ensuremath{X} is the most likely state sequence}$

The probability of the best partial path of length $t$ through the HMM ended at state $j$ is defined as: $\phi_{j}(t)=max_{X^{(t-1)}}[p(\mathbf{o_{1},...,o_{t},}x(t)=j|\lambda)]$. Where $X^{(t-1)}$ is the best partial path / state sequence.

As with the forward variable $\phi$
can be calculated recursively $\phi_{j}(t)=max_{i}[\phi_{i}(t-1)a_{ij}b_{j}(\mathbf{o_{t}})]$

## 0.4 Viterbi Algorithm Initialisation $\phi_{1}(0)=1,\phi_{j}(0)=0$ for $1 and $\phi_{1}(t)=0$ for $1\leq t\leq T$

## 0.5 Recursion

For $t=1,2,...,T$
…… For $j=2,3,...,N-1$
…………….. $\phi_{j}(t)=max_{1\leq k
…………….. store the preceding node $pred(j,t)=k$

## 0.6 Termination $p(\mathbf{O},X|\lambda)=max_{1
store the preceding node $pred(N,T)=k$

The most likely path is found by following the preceding node information backwards that is stored in $pred(j,t)$

# Underflow Errors

The direct calculation of $p(\mathbf{O}|\lambda)$ will most likely cause arithmetic underflow errors. The probabilities can become so small that the computer is unable to calculate them correctly. You should instead calculate the log likelihood e.g $log(p(\mathbf{O}|\lambda))$