
### Hidden Markov Model (HMM) -- from theoretical proof and algorithm implementation to practical application

Contents

A brief history

Modeling

HMM general expression

Observation sequence generation mechanism

Classic problem 1: the forward algorithm

Classic problem 2

Classic problem 3: the Viterbi algorithm

Viterbi algorithm: worked case

Viterbi algorithm: code implementation

HMM model applications

### A brief history

Many accounts say that Markov chains originate with the Russian mathematician Andrey Markov (Андрей Андреевич Марков), in studies published around 1906-1907. To show that independence between random variables is not a necessary condition for the weak law of large numbers and the central limit theorem, he constructed a stochastic process whose variables depend on one another through conditional probabilities, and proved that under certain conditions it converges to a set of limiting values. Once this work appeared, it took on a life of its own, and later researchers built a variety of models on top of it. Paul Ehrenfest and Tatiana Afanasyeva used Markov chains in 1907 to establish the Ehrenfest model of diffusion. In 1912 Henri Poincaré (Jules Henri Poincaré) studied Markov chains on finite groups and obtained the Poincaré inequality. In 1931, Andrey Kolmogorov (Андрей Николаевич Колмогоров), while studying diffusion problems, extended Markov chains to continuous index sets, obtaining continuous-time Markov chains, and derived the formula for their joint distribution functions. Independently of Kolmogorov, Sydney Chapman had obtained the same formula in 1926 while studying Brownian motion; it is now known as the Chapman-Kolmogorov equation. In the 1950s, the Soviet mathematician Eugene Borisovich Dynkin refined Kolmogorov's theory and, through the Dynkin formula, connected stationary Markov processes with martingale processes.

Unfortunately, how the Markov model found its way into today's machine learning is a murkier story, and one that still puzzles me. In the interest of historical accuracy I will not speculate, and will leave that question in its corner of history.

### Modeling

Markov chains (Markov chains) are widely used in machine learning. What are they? What kinds of problems can they solve? Let's start with the simplest everyday example (hereinafter "the case").

Xiaobao has a 3-day holiday. Each day he picks one activity from "outing", "shopping", and "games". If it is sunny, Xiaobao is more likely to choose an outing; if it rains, he is more likely to stay home and play games. Xiaobao knows the following:

（1）Each day's weather depends only on the previous day's weather;

（2）On the first day, the probability of rain is 0.6 and the probability of a sunny day is 0.4;

（3）The weather changes as follows: if it rains one day, the probability that it still rains the next day is 0.7 and the probability that it turns sunny is 0.3; if one day is sunny, then the next day is sunny with probability 0.6 and rainy with probability 0.4;

Xiaobao chooses activities according to the weather as follows:

（a）If it rains, the probabilities of choosing the 3 activities (outing, shopping, games) are 0.1, 0.4, 0.5;

（b）If it is sunny, the probabilities of the 3 activities are 0.6, 0.3, 0.1;

Now, knowing that the activities Xiaobao arranged over the 3-day holiday were, in order, outing, shopping, games, we face the following three problems:

（i）How likely is Xiaobao to arrange his activities this way?

（ii）Are the probability values assumed in (1)~(3) the ones that make Xiaobao most likely to choose these 3 activities? In other words, are there other probability values under which Xiaobao would be even more likely to choose these 3 activities?

（iii）What was the weather on each of the 3 days?

Below we build a model from the case to solve these three problems. The model is called the hidden Markov model (Hidden Markov Model, HMM). Let's start modeling.

（1）The weather — hidden states

Let Y = (y_1, y_2, ..., y_T) denote the weather over T consecutive days. Each y_t can be viewed as a random variable whose possible values form a set S, called the hidden state space — "hidden" because the weather is usually taken to be unobservable. For the case, the weather sequence is Y = (y_1, y_2, y_3) and the state space is S = {rainy, sunny}; if the first day is sunny, then y_1 = sunny.

（2）Activities — observed states

Let X = (x_1, x_2, ..., x_T) denote the activities Xiaobao chooses over the T days. Each x_t is also a random variable, whose possible values form a set O, called the observation state space. For the case, the activity sequence is X = (x_1, x_2, x_3) and the observation state space is O = {outing, shopping, games}.

（3）Homogeneous Markov chain assumption

How the weather evolves can be expressed by the conditional probability P(y_t | y_1, ..., y_{t-1}): the probability of day t's weather given the weather from day 1 through day t-1. By assumption (1) of the case, each day's weather depends only on the previous day's weather, so we have

P(y_t | y_1, ..., y_{t-1}) = P(y_t | y_{t-1}).

This is the homogeneous Markov chain assumption. Collecting the transition probabilities between all hidden states gives an M×M matrix Q = (q_ij), where q_ij = P(y_{t+1} = s_j | y_t = s_i) is the probability of moving from hidden state s_i to hidden state s_j; Q is called the hidden state transition matrix. For the case, the transition matrix is:

| from \ to | rainy | sunny |
| --- | --- | --- |
| rainy | 0.7 | 0.3 |
| sunny | 0.4 | 0.6 |

Note that the hidden state transition matrix does not depend on the time t; it can also be drawn as a state-transition diagram between rainy and sunny.

（4）Observation independence assumption

At any time t, the observed state x_t depends only on the hidden state y_t at that moment; this assumption is made to simplify the model. If the hidden state at time t is s_i and the corresponding observation is o_k, the probability of observing o_k under hidden state s_i is written

g_ik = P(x_t = o_k | y_t = s_i).

Considering all possible state-observation combinations gives an M×N matrix G = (g_ik): the probability of observing o_k given state s_i. G is called the observation state matrix. For the case it is:

| state \ activity | outing | shopping | games |
| --- | --- | --- | --- |
| rainy | 0.1 | 0.4 | 0.5 |
| sunny | 0.6 | 0.3 | 0.1 |

Note that the observation state matrix, too, does not depend on t.

（5）Initial hidden state distribution

Assumption (2) of the case gives the initial state of the weather, that is, the distribution of y_1:

| rainy | sunny |
| --- | --- |
| 0.6 | 0.4 |

This is called the initial hidden state distribution.

With the pieces above, the hidden Markov model can be expressed in the following form:

λ = (F, Q, G),

where F is the initial hidden state distribution, Q is the hidden state transition matrix, and G is the observation state matrix.

The HMM for the case can be pictured as a chain of three hidden weather nodes y_1 → y_2 → y_3, each emitting an observed activity x_t; the arrows indicate dependencies. Readers familiar with probabilistic graphical models will recognize this as the probability graph of an HMM. We now state the three problems from the case formally:

（1）Evaluating the probability of the observation sequence X

Given the model λ = (F, Q, G) and the observation sequence X = (x_1, ..., x_T), compute the probability P(X|λ) that X occurs under the model λ.

（2）Learning the model parameters

Given the observation sequence X, and under the Markov and observation-independence assumptions, find the model λ that maximizes the conditional probability P(X|λ) of the observed sequence, i.e., λ* = argmax_λ P(X|λ).

（3）Prediction, also known as decoding

Given the model λ and the observation sequence X, find the hidden state sequence Y that is most likely given the observation sequence.

These are the three classical problems of the HMM. Fortunately, all of them can be solved.
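Before moving on, it helps to make the case's model concrete. The sketch below (numpy assumed; the variable names F, Q, G mirror the text but are otherwise my own choice) writes down λ = (F, Q, G) and checks that every row is a probability distribution:

```python
import numpy as np

states = ["rainy", "sunny"]              # hidden state space S
views = ["outing", "shopping", "games"]  # observation state space O

F = np.array([0.6, 0.4])                 # initial distribution: P(y1=rainy), P(y1=sunny)
Q = np.array([[0.7, 0.3],                # transition matrix: rows = today's weather,
              [0.4, 0.6]])               #                    columns = tomorrow's weather
G = np.array([[0.1, 0.4, 0.5],           # observation matrix: rows = weather,
              [0.6, 0.3, 0.1]])          #                     columns = activity

# Sanity check: each distribution sums to 1
assert np.isclose(F.sum(), 1.0)
assert np.allclose(Q.sum(axis=1), 1.0)
assert np.allclose(G.sum(axis=1), 1.0)
```

These three arrays are reused in the examples that follow.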

### HMM general expression

Now we give the general expression of the HMM. Let S be the set of all possible hidden states and O the set of all possible observed states:

S = {s_1, s_2, ..., s_M},  O = {o_1, o_2, ..., o_N},

where M is the number of hidden states and N is the number of observed states.

For a sequence of length T, Y is the hidden state sequence and X is the corresponding observation sequence:

Y = (y_1, y_2, ..., y_T),  X = (x_1, x_2, ..., x_T),

where y_t ∈ S and x_t ∈ O.

A note on these sequences: at time t and time t+1 the state and observation sequences are

time t: (y_1, ..., y_t) and (x_1, ..., x_t)

time t+1: (y_1, ..., y_t, y_{t+1}) and (x_1, ..., x_t, x_{t+1})

The relationship between them is that the time-t+1 sequences are just the time-t sequences with the state y_{t+1} and the corresponding observation x_{t+1} appended — we reach time t+1 only after passing through time t, at which point y_{t+1} and its observation appear. Keeping this in mind will be very helpful for the theoretical proofs later.

### Observation sequence generation mechanism

The generation mechanism of the observation sequence describes how, once the model λ = (F, Q, G) is known, the observation sequence X is generated step by step. It helps explain why an HMM can be represented in the form λ = (F, Q, G). Take the model diagram above as an example:

Suppose you are standing at y_1. The initial hidden state distribution is known, and so are the probabilities from hidden state to observation, so at this point you can take one step along the arrow and generate the observation value x_1; the first value of the sequence is now produced. Since the transition probabilities from y_1 to y_2 are known, you can obtain the probability of each hidden state at time 2 — that is, move from y_1 to y_2. Having arrived, generating x_2 from y_2 works just like generating x_1 from y_1, producing the second observation. Proceeding in this way, the whole observation sequence can be generated. The case below is worked out following this idea.

The model's initial hidden state distribution F, hidden state transition matrix Q, and observation state matrix G are as given in the modeling section:

F = (0.6, 0.4) over (rainy, sunny), Q = [[0.7, 0.3], [0.4, 0.6]], G = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]].

Xiaobao's choices can be computed as follows:

（1）Day 1

Compute according to the formula

P(x_1 = o) = Σ_s P(x_1 = o | y_1 = s) P(y_1 = s).

Every term on the right is known — the observation state probabilities and the initial hidden state distribution — so:

P(x1=outing) = P(outing|sunny)P(sunny) + P(outing|rainy)P(rainy) = 0.6*0.4+0.1*0.6 = 0.3

Similarly,

P(x1=shopping) = 0.3*0.4+0.4*0.6 = 0.36,

P(x1=games) = 0.1*0.4+0.5*0.6 = 0.34.

Since P(x1=shopping) has the largest probability, Xiaobao is most likely to choose shopping on the first day.

（2）Day 2

Compute in the following two steps:

P(y2=sunny) = P(y2=sunny|y1=sunny)P(y1=sunny) + P(y2=sunny|y1=rainy)P(y1=rainy)

and similarly P(y2=rainy). Then

P(x2=outing) = P(x2=outing|y2=sunny)P(y2=sunny) + P(x2=outing|y2=rainy)P(y2=rainy)

and in the same way P(x2=shopping) and P(x2=games); finally, make the activity choice according to the results.

（3）Day 3

The computation is the same as for day 2.
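The day-by-day calculation above is just repeated matrix-vector products: multiplying the current hidden state distribution by G gives the day's activity distribution, and multiplying by Q advances the weather distribution to the next day. A minimal numpy sketch (variable names are my own):

```python
import numpy as np

F = np.array([0.6, 0.4])                    # P(y1) over (rainy, sunny)
Q = np.array([[0.7, 0.3], [0.4, 0.6]])      # transition matrix
G = np.array([[0.1, 0.4, 0.5],              # observation matrix over (outing, shopping, games)
              [0.6, 0.3, 0.1]])

p_y = F
for day in range(1, 4):
    p_x = p_y @ G        # marginal distribution of the day's activity
    print(f"day {day}: P(outing, shopping, games) = {np.round(p_x, 4)}")
    p_y = p_y @ Q        # propagate the weather distribution to the next day
```

On day 1 this reproduces the values computed in the text: (0.3, 0.36, 0.34).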

The idea above for generating the observation sequence from an HMM can be described generally as follows:

（1）Generate x_1

From the initial distribution F and the observation matrix G, obtain the first observation x_1; its distribution is

P(x_1 = o_k) = Σ_i F_i g_ik.

（2）Generate x_t (t ≥ 2)

Two steps:

a. from the transition matrix Q and the state probability distribution at time t-1, generate the state probability distribution at time t;

b. treat the state probability distribution at time t as the initial distribution, and repeat step (1) to generate x_t;
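Taken literally, these two steps also give a recipe for sampling a concrete (weather, activity) sequence from the model. A sketch, assuming numpy; the function name `sample_hmm` is mine:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

states = ["rainy", "sunny"]
views = ["outing", "shopping", "games"]
F = np.array([0.6, 0.4])
Q = np.array([[0.7, 0.3], [0.4, 0.6]])
G = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])

def sample_hmm(T):
    """Sample T days of hidden weather and observed activities."""
    y = rng.choice(len(states), p=F)                  # draw y1 from F
    ys, xs = [y], [rng.choice(len(views), p=G[y])]    # emit x1 from G
    for _ in range(T - 1):
        y = rng.choice(len(states), p=Q[y])           # step a: transition
        ys.append(y)
        xs.append(rng.choice(len(views), p=G[y]))     # step b: emit
    return [states[i] for i in ys], [views[i] for i in xs]

weather, activities = sample_hmm(3)
print(weather)
print(activities)
```

Sampling produces one realization of the process; the computations in the text instead track the full probability distributions.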

Next, let's solve the three classic HMM problems.

### Classic problem 1: the forward algorithm

Problem: given the model λ = (F, Q, G) and the observation sequence X = (x_1, ..., x_T), compute P(X|λ).

Note: for convenience, P(X|λ) is abbreviated to P(X); all conditional probabilities involving λ in this section are shortened the same way.

For a sequence of length T there are M^T possible state sequences. Suppose Y = (y_1, ..., y_T) is one of them. Then

P(Y) = P(y_T | y_1, ..., y_{T-1}) P(y_1, ..., y_{T-1}) = P(y_T | y_{T-1}) P(y_1, ..., y_{T-1}),

where the last step follows from the homogeneous Markov assumption: y_T depends only on y_{T-1}. Passing the same argument down the sequence gives

P(Y) = P(y_1) P(y_2|y_1) ... P(y_T|y_{T-1}),

whose factors come from the hidden state transition matrix and the initial hidden state distribution.

Next compute the conditional probability that the hidden state sequence Y generates the observation sequence X:

P(X|Y) = P(x_1|y_1) P(x_2|y_2) ... P(x_T|y_T),

which uses the observation independence assumption — each observed value x_t depends only on the current state y_t — so the factors come from the observation matrix.

With P(Y) and P(X|Y), the joint probability of X and Y can be computed as:

P(X, Y) = P(X|Y) P(Y) = P(y_1) P(x_1|y_1) P(y_2|y_1) P(x_2|y_2) ... P(y_T|y_{T-1}) P(x_T|y_T).

Finally P(X) can be computed as:

P(X) = Σ_Y P(X, Y),

where the sum on the right runs over all possible state sequences (M^T of them). That completes the computation of P(X).

A brief note on the complexity of this method. By the formula for P(X), completing one computation of the observation probability requires about 2T·M^T operations (each Y contributes a product of 2T factors, and there are M^T sequences Y), i.e., O(T·M^T) in big-O notation. With this brute-force approach the cost grows exponentially with T, which is infeasible in practice. In fact, the computations in the "Observation sequence generation mechanism" section were of this brute-force kind. Below is a much cheaper method, called the "forward algorithm".
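For the 3-day case the brute-force sum P(X) = Σ_Y P(X, Y) is still small enough to write out directly — with M = 2 states there are only 2^3 = 8 weather sequences to enumerate. A sketch (numpy and itertools assumed; names are mine):

```python
import itertools
import numpy as np

F = np.array([0.6, 0.4])                    # (rainy, sunny)
Q = np.array([[0.7, 0.3], [0.4, 0.6]])
G = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
obs = [0, 1, 2]                             # outing, shopping, games

total = 0.0
for ys in itertools.product(range(2), repeat=len(obs)):   # every hidden sequence Y
    p = F[ys[0]] * G[ys[0], obs[0]]                       # P(y1) * P(x1|y1)
    for t in range(1, len(obs)):
        p *= Q[ys[t - 1], ys[t]] * G[ys[t], obs[t]]       # P(y_t|y_{t-1}) * P(x_t|y_t)
    total += p

print(total)
```

For the case this yields P(X) ≈ 0.0336, but the same loop over length-T sequences would need M^T iterations — exactly the exponential blow-up described above.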

Write α_t(i) = P(x_1, ..., x_t, y_t = s_i) for the probability that at time t the state is s_i and the observations so far are x_1, ..., x_t; this is called the forward probability. Suppose the forward probabilities α_t(i) at time t have been found for all possible states i = 1, ..., M; we now use recursion to compute the forward probabilities at time t+1.

α_{t+1}(j) denotes the probability that the state at time t+1 is s_j and the observation sequence is (x_1, ..., x_t, x_{t+1}). Splitting over the state at time t, and using the Markov and observation independence assumptions,

α_{t+1}(j) = Σ_{i=1}^{M} P(x_1, ..., x_t, y_t = s_i, y_{t+1} = s_j, x_{t+1}) = [Σ_{i=1}^{M} α_t(i) q_ij] g_j(x_{t+1}),

where g_j(x_{t+1}) is the probability of observing x_{t+1} in state s_j.

So now we have the recurrence between α_t and α_{t+1}. In addition, the initial value of the recurrence, i.e., the forward probability at t = 1, is:

α_1(i) = F_i g_i(x_1).

With the initial values and the recurrence relation, the forward probabilities at any time t can be computed.

What classic problem 1 asks for is, in fact, the probability of the observations at time T:

P(X) = Σ_{i=1}^{M} α_T(i)    (1)

The computational complexity of the algorithm: first, the cost of going from α_t to α_{t+1}. Each α_{t+1}(j) takes M products and additions plus one final product, so going from α_t to α_{t+1} requires about M(M+1) product-and-add operations; passing from t = 1 to t = T therefore costs about (T-1)M(M+1) operations. Together with formula (1), the total is O(T·M^2) in big-O notation — far better in complexity than the brute-force method. In essence, the brute-force method is exhaustive enumeration, while the forward algorithm is recursive.
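The recurrence and formula (1) translate directly into code. A sketch of the forward algorithm (the parameter order mimics the `viterbi` function in the code section below; the function name `forward` is mine):

```python
import numpy as np

def forward(trans_prob, emit_prob, init_prob, views, obs):
    """Forward algorithm: P(obs | model) in O(T * M^2) time."""
    M, T = len(init_prob), len(obs)
    trans = np.asarray(trans_prob, dtype=float)
    emit = np.asarray(emit_prob, dtype=float)

    alpha = np.zeros((T, M))
    alpha[0] = np.asarray(init_prob) * emit[:, views.index(obs[0])]   # alpha_1(i) = F_i g_i(x1)
    for t in range(1, T):
        # alpha_{t+1}(j) = (sum_i alpha_t(i) q_ij) * g_j(x_{t+1})
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, views.index(obs[t])]
    return alpha[T - 1].sum()                                         # formula (1)

# The case, with states ordered (sunny, rainy) as in the code section below
views = ['outing', 'shopping', 'games']
trans_prob = [[0.6, 0.4], [0.3, 0.7]]
emit_prob = [[0.6, 0.3, 0.1], [0.1, 0.4, 0.5]]
init_prob = [0.4, 0.6]
print(forward(trans_prob, emit_prob, init_prob, views, ['outing', 'shopping', 'games']))
```

For the case's observation sequence this returns P(X) ≈ 0.0336.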

### Classic problem 2

Omitted.

### Classic problem 3: the Viterbi algorithm

Problem: given the model λ = (F, Q, G) and the observation sequence X = (x_1, ..., x_T), predict the most likely hidden state sequence. The mathematical expression is:

Y* = argmax_Y P(Y | X, λ).

Note: since P(Y|X) = P(X, Y) / P(X) and X is given, P(X) is a constant, so the problem is equivalent to finding the Y that maximizes the joint probability P(X, Y).

The Viterbi algorithm is usually introduced with a trellis diagram. Each solid circle in the trellis represents a possible state at some time, and each square represents the corresponding observation. Arrows between circles indicate hidden state transitions — for example, an arrow from one state at time t-1 to another state at time t.

Notice that any state sequence Y = (y_1, ..., y_T) corresponds one-to-one to a path in the trellis: the path starts at the circle of state y_1 at time t = 1, follows the arrows to the circle of y_2 at t = 2, and keeps connecting until the circle of y_T at t = T. Denote the corresponding path by I(Y). Solving for Y* therefore amounts to finding the optimal path in the trellis — the one that maximizes the joint probability P(X, Y). This is Viterbi idea 1. Let's look for such a path.

We know there are M^t different paths from time 1 to time t. Classify them by the state at time t: all paths whose hidden state at time t is s_1 form one class, all paths reaching state s_2 at time t form another class, and so on, giving M classes. Finding, among all paths, the one that maximizes P(X, Y) is thereby transformed into first finding, within each class, the path with the largest probability, and then picking the largest among those M candidates. In this way one problem is decomposed into M smaller problems. This is Viterbi idea 2. Now fix the state at time t, say y_t = s_i. Within class i, denote the path that maximizes the probability by I_t(i) and the maximum probability itself by δ_t(i). In addition, we record the state of the second-to-last node on that path — its state at time t-1 — as φ_t(i). For convenience, we reuse the earlier notation:

δ_t(i) = max over y_1, ..., y_{t-1} of P(x_1, ..., x_t, y_1, ..., y_{t-1}, y_t = s_i),

and similarly at time t+1:

δ_{t+1}(j) = max over y_1, ..., y_t of P(x_1, ..., x_{t+1}, y_1, ..., y_t, y_{t+1} = s_j).

Let's derive the relationship between δ_t and δ_{t+1}. Note that x_{t+1} is independent of everything except y_{t+1}, so

P(x_1, ..., x_{t+1}, y_1, ..., y_{t-1}, y_t = s_i, y_{t+1} = s_j) = P(x_1, ..., x_t, y_1, ..., y_{t-1}, y_t = s_i) · q_ij · g_j(x_{t+1})    (2)

Take maxima on both sides of equation (2): first over the variables y_1, ..., y_{t-1}, then over the variable y_t = s_i:

δ_{t+1}(j) = [max_{1≤i≤M} δ_t(i) q_ij] · g_j(x_{t+1})    (3)

where the inner maximum is selected from the M values δ_t(1) q_1j, ..., δ_t(M) q_Mj, and the maximizing index is recorded as φ_{t+1}(j).

This establishes the recursion between δ_t and δ_{t+1}.

The recursion says: if we have found, for every possible state s_i, the maximum probability δ_t(i) from time 1 to time t, then multiplying each by the transition probability q_ij, taking the maximum of those products, and finally multiplying by the observation probability g_j(x_{t+1}) yields the maximum probability δ_{t+1}(j) of reaching state s_j at time t+1.

The optimal path is the one with the largest probability δ_T(i). Suppose the largest is δ_T(i*). Then we can already determine the last two nodes of the optimal path, namely y_T = s_{i*} and y_{T-1} = s_{φ_T(i*)}. How to find the earlier nodes of the optimal path? In fact, the recurrence formula (3) implies a property:

the first t nodes of the optimal path ending at state s_j at time t+1 form precisely the optimal path ending at state s_{φ_{t+1}(j)} at time t.

The reason: once the states i and j are fixed, the joint probability in (2) is directly proportional to the probability of the first-t-node subpath, so if the full path is optimal, its subpath must be the optimal path of its class. Otherwise, suppose some other path I' in that class were optimal; then replacing the subpath with I' would produce a path better than the optimal path — a contradiction.

Based on the above, the optimal path can be found by backtracking through the recorded states φ. The procedure:

（1）find i* that makes δ_T(i) maximal; then s_{i*} is the state at time T;

（2）φ_T(i*), recorded at time T, is the state of the previous node, i.e., the state at time T-1;

（3）the φ value recorded at time T-1 for that state gives the state at time T-2;

...

recursing back to t = 1 yields the whole optimal state sequence Y*.

The basic steps of the Viterbi algorithm:

（1）from the initial values δ_1(i) = F_i g_i(x_1) and the recursion (3), compute δ_t(i) for every time t, recording the previous-node state φ_t(i) along the way;

（2）find the largest among all δ_T(i); its corresponding path is the one we are looking for, which immediately gives the last state;

（3）use the recorded φ values to backtrack through the states at every time, obtaining Y*.

### Viterbi algorithm: worked case

Now let's use the "Xiaobao's holiday" case to get familiar with the Viterbi algorithm.

（1）At time t = 1, the first day, the possible states are {rainy, sunny}, and Xiaobao chose an outing. Since there is only a single one-step path into each first-day state, we have

δ_1(sunny) = P(x1=outing, y1=sunny) = P(x1=outing|y1=sunny)P(y1=sunny) = 0.6*0.4 = 0.24

δ_1(rainy) = P(x1=outing, y1=rainy) = P(x1=outing|y1=rainy)P(y1=rainy) = 0.1*0.6 = 0.06

Because there is no state before the first day, φ_1 does not exist.

（2）At time t = 2, the second day, Xiaobao chose shopping. By the recursion (3):

δ_2(sunny) = P(x2=shopping|y2=sunny) * max{P(y2=sunny|y1=sunny)δ_1(sunny), P(y2=sunny|y1=rainy)δ_1(rainy)}

= 0.3*max{0.6*0.24, 0.3*0.06} = 0.3*max{0.144, 0.018} = 0.0432

φ_2(sunny) = sunny

Similarly,

δ_2(rainy) = P(x2=shopping|y2=rainy) * max{P(y2=rainy|y1=sunny)δ_1(sunny), P(y2=rainy|y1=rainy)δ_1(rainy)}

= 0.4*max{0.4*0.24, 0.7*0.06} = 0.4*max{0.096, 0.042} = 0.0384

φ_2(rainy) = sunny

（3）At time t = 3, the third day, Xiaobao chose games. By the recursion (3):

δ_3(sunny) = P(x3=games|y3=sunny) * max{P(y3=sunny|y2=sunny)δ_2(sunny), P(y3=sunny|y2=rainy)δ_2(rainy)}

= 0.1*max{0.6*0.0432, 0.3*0.0384} = 0.1*max{0.02592, 0.01152} = 0.002592

φ_3(sunny) = sunny

δ_3(rainy) = P(x3=games|y3=rainy) * max{P(y3=rainy|y2=sunny)δ_2(sunny), P(y3=rainy|y2=rainy)δ_2(rainy)}

= 0.5*max{0.4*0.0432, 0.7*0.0384} = 0.5*max{0.01728, 0.02688} = 0.01344

φ_3(rainy) = rainy

Now backtrack:

δ_3(rainy) > δ_3(sunny), so the third day is rainy; φ_3(rainy) = rainy, so the second day is also rainy; and φ_2(rainy) = sunny, so the first day was sunny.

We conclude that the weather over the 3 days was: sunny, rainy, rainy.

### Viterbi algorithm: code implementation

```python
import numpy as np


def viterbi(trans_prob, emit_prob, init_prob, views, states, obs):
    """Viterbi algorithm.

    :param trans_prob: hidden state transition probability matrix
    :param emit_prob: observation probability matrix (emission matrix)
    :param init_prob: initial hidden state distribution
    :param views: set of all possible observations
    :param states: set of all possible hidden states
    :param obs: actual observation sequence
    :return: (max probability, most likely hidden state sequence)
    """
    state_num, obs_len = len(states), len(obs)
    # delta[t, i]: max probability over all paths ending in state i at time t
    delta = np.zeros((obs_len, state_num), dtype=np.float64)
    # phi[t, i]: previous state on the best path ending in state i at time t
    phi = np.zeros((obs_len, state_num), dtype=np.int64)

    # Initialization: delta_1(i) = F_i * g_i(x_1)
    for i in range(state_num):
        delta[0, i] = init_prob[i] * emit_prob[i][views.index(obs[0])]
        phi[0, i] = 0

    # Recursion: delta_{t+1}(j) = max_i(delta_t(i) * q_ij) * g_j(x_{t+1})
    for t in range(1, obs_len):
        for j in range(state_num):
            tmp = [delta[t - 1, k] * trans_prob[k][j] for k in range(state_num)]
            delta[t, j] = max(tmp) * emit_prob[j][views.index(obs[t])]
            phi[t, j] = tmp.index(max(tmp))

    # Termination: probability and final state of the best path
    max_prob = max(delta[obs_len - 1, :])
    last_state = int(np.argmax(delta[obs_len - 1, :]))

    # Backtrack along the recorded phi values to recover the best path
    path = [last_state]
    for t in reversed(range(1, obs_len)):
        path.append(phi[t, path[-1]])

    hidden_states = [states[i] for i in reversed(path)]

    return max_prob, hidden_states


def main():
    # set of all possible hidden states
    states = ('sunny', 'rainy')
    # set of all possible observations
    views = ['outing', 'shopping', 'games']
    # transition probabilities: Q -> Q
    trans_prob = [[0.6, 0.4],
                  [0.3, 0.7]]
    # observation probabilities: Q -> V
    emit_prob = [[0.6, 0.3, 0.1],
                 [0.1, 0.4, 0.5]]
    # initial distribution
    init_prob = [0.4, 0.6]
    # observation sequence
    obs = ['outing', 'shopping', 'games']

    max_prob, hidden_states = viterbi(trans_prob, emit_prob, init_prob, views, states, obs)
    print('Maximum probability: %.5f.' % max_prob)
    print('Hidden state sequence: %s.' % hidden_states)


if __name__ == '__main__':
    main()
```

Running it prints the maximum probability 0.01344 and the hidden sequence ['sunny', 'rainy', 'rainy'], matching the worked case above.




### HMM model applications

Chinese word segmentation

Stock market analysis

