
Hidden Markov Model (HMM) -- from theory and algorithm implementation to practical application

Contents

A brief history

Modeling

General expression of the HMM

Observation sequence generation mechanism

Classic problem one: the forward algorithm

Classic problem two

Classic problem three: the Viterbi algorithm

Solving the case with the Viterbi algorithm

Viterbi algorithm code implementation

HMM model applications


A brief history

        Many accounts trace the Markov chain to the Russian mathematician Andrey Markov (Андрей Андреевич Марков), in work published around 1906-1907. To show that independence between random variables is not a necessary condition for the weak law of large numbers and the central limit theorem, he constructed a stochastic process in which successive variables depend on each other through conditional probabilities, and proved that under certain conditions it converges to a fixed set of vectors. Once published, the idea took on a life of its own, and later researchers built a variety of models on top of it. In 1907, Paul Ehrenfest and Tatiana Afanasyeva used Markov chains to build the Ehrenfest model of diffusion. In 1912, Henri Poincaré studied Markov chains on finite groups and obtained the Poincaré inequality. In 1931, Andrey Kolmogorov (Андрей Николаевич Колмогоров), studying diffusion problems, extended Markov chains to continuous time, obtaining the continuous-time Markov chain, and derived a formula for its joint distribution function. Independently of Kolmogorov, Sydney Chapman had obtained the same formula in 1926 while studying Brownian motion; it later became known as the Chapman-Kolmogorov equation. In the 1950s, the Soviet mathematician Eugene Borisovich Dynkin refined Kolmogorov's theory and, via the Dynkin formula, connected stationary Markov processes with martingale processes.

        Unfortunately, how the Markov model made its way into machine learning is a question that still puzzles me. In the interest of historical accuracy I have not chased it down, so let us leave it in its corner of history.


Modeling

        In machine learning, the Markov chain is widely used. What is it, and what kinds of problems can it solve? Let us start from the simplest everyday example (hereinafter "the case").

        Xiaobao has a 3-day holiday, and each day he picks one activity from "outing", "shopping", and "gaming". If it is sunny, Xiaobao is more likely to go on an outing; if it rains, he will probably stay home and play games. Xiaobao knows the following:

(1) Each day's weather depends only on the previous day's weather;

(2) On the first day, the probability of rain is 0.6 and the probability of a sunny day is 0.4;

(3) The weather changes as follows: if it rained the day before, the next day it rains with probability 0.7 and turns sunny with probability 0.3; if the day before was sunny, the next day is sunny with probability 0.6 and rainy with probability 0.4.

Xiaobao chooses his activity according to the weather as follows:

(a) If it rains, the probabilities of choosing the 3 activities are 0.1, 0.4, and 0.5 respectively;

(b) If it is sunny, the probabilities are 0.6, 0.3, and 0.1.

Now suppose we know that the activities Xiaobao chose over the 3-day holiday were, in order: outing, shopping, gaming. We then face three questions:

(i) How likely was Xiaobao to arrange his activities this way?

(ii) Are the probability values assumed in (1)-(3) the ones that make this sequence of 3 activities most likely? In other words, are there other probability values under which Xiaobao would have been even more likely to choose these 3 activities?

(iii) What was the weather on each of the 3 days?

 

        Below we build a model from the case and use it to solve these three problems. The model is called the hidden Markov model (HMM). Let us begin.

(1) The weather -- the hidden state

        Let y_1,y_2,...,y_T denote the weather on T consecutive days. Each y_t can be regarded as a random variable taking values in S=\{s_1,s_2,...,s_M\}, called the hidden state space, because the weather is usually not observable in the model's sense. For the case, the weather sequence is y_1,y_2,y_3 and the state space is S=\{rainy, sunny\}; if the first day is sunny, then y_1=sunny.

(2) Activities -- the observed state

        Let x_1,x_2,...,x_T denote the activities Xiaobao chooses on T consecutive days. Each x_t is also a random variable, taking values in O=\{o_1,o_2,...,o_N\}, called the observation state space. For the case, the activity sequence is x_1,x_2,x_3 and the observation state space is O=\{outing, shopping, gaming\}.

(3) The homogeneous Markov chain assumption

        The evolution of the weather can be described by the conditional probability P(y_t|y_1,y_2,...,y_{t-1}), i.e. the probability of the weather on day t given the weather on days 1 through t-1. By fact (1) of the case, each day's weather depends only on the previous day's weather, so

P(y_t|y_1,y_2,...,y_{t-1})=P(y_t|y_{t-1})

This is the homogeneous Markov chain assumption. Collecting the transition probabilities between all pairs of hidden states gives the following M\times M matrix:

Q=\begin{bmatrix} q_{11} &q_{12} &... &q_{1M} \\ q_{21}&q_{22} &... &q_{2M} \\ \vdots& \vdots &\ddots & \vdots\\ q_{M1} &q_{M2} &... &q_{MM} \end{bmatrix}

where q_{ij} is the probability of moving from hidden state s_i to hidden state s_j; Q is called the hidden state transition matrix. For the case, with rows and columns ordered as (rainy, sunny), the transition matrix Q is:

Q=\begin{bmatrix} 0.7&0.3 \\ 0.4&0.6 \end{bmatrix}   

Note that the hidden state transition matrix does not depend on the time t. It can be represented by the following figure:

(4) The observation independence assumption

        At any time t, the observed state x_t depends only on the hidden state y_t at the current time; this assumption simplifies the model. If at time t the hidden state is y_t=s_i and the corresponding observed state is x_t=o_j, then the probability of observing x_t under hidden state y_t is denoted g_{ij}:

g_{ij}=P(x_t=o_j|y_t=s_i)

Considering all possible combinations gives an M\times N matrix G:

G=\begin{bmatrix} g_{11} &g_{12} &... &g_{1N} \\ g_{21}&g_{22} &... &g_{2N} \\ \vdots& \vdots &\ddots & \vdots\\ g_{M1} &g_{M2} &... &g_{MN} \end{bmatrix}

where g_{ij} is the probability of observing o_j given state s_i; G is called the observation probability matrix. For the case, with rows ordered as (rainy, sunny) and columns as (outing, shopping, gaming), G is:

G=\begin{bmatrix} 0.1 &0.4 &0.5 \\ 0.6&0.3 & 0.1 \end{bmatrix}.

Note that the observation probability matrix also does not depend on t. It can be represented by the following figure:

(5) The initial hidden state distribution

        Fact (2) of the case gives the initial state of the weather, i.e. the distribution F(y_1) of y_1, as follows:

y_1      rainy    sunny
P(y_1)   0.6      0.4

This is called the initial hidden state distribution.

        With the above ingredients, a hidden Markov model can be written in the following form:

\lambda=(F,Q,G)

where F is the initial hidden state distribution, Q is the hidden state transition matrix, and G is the observation probability matrix.
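As a concrete anchor for the notation, the case's \lambda=(F,Q,G) can be written down as numpy arrays. This is a minimal sketch; the ordering of states as [rainy, sunny] and observations as [outing, shopping, gaming] is just a convention of the sketch (it matches the matrices above), not part of the model.

```python
import numpy as np

# Case parameters, lambda = (F, Q, G).
# State order: [rainy, sunny]; observation order: [outing, shopping, gaming].
F = np.array([0.6, 0.4])               # initial hidden-state distribution P(y_1)
Q = np.array([[0.7, 0.3],              # P(y_t | y_{t-1}); row = previous state
              [0.4, 0.6]])
G = np.array([[0.1, 0.4, 0.5],         # P(x_t | y_t); row = hidden state
              [0.6, 0.3, 0.1]])

# Sanity checks: F, every row of Q, and every row of G must be
# probability distributions (non-negative, summing to 1).
assert np.isclose(F.sum(), 1.0)
assert np.allclose(Q.sum(axis=1), 1.0)
assert np.allclose(G.sum(axis=1), 1.0)
```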

The hidden Markov model of the case can be pictured intuitively as follows:

The diagram shows the relationship between the hidden states and the observed states over the three days, with arrows indicating dependencies. Readers familiar with probabilistic graphical models will recognize this as the probability graph of an HMM. We can now restate the three questions from the case as follows:

(1) Evaluating the probability of an observation sequence X

        Given the model \lambda=(F,Q,G) and an observation sequence X=\{x_1,x_2,...,x_T\}, compute the probability P(X|\lambda) that the model \lambda generates the observation sequence X.

(2) Learning the model parameters

        Given an observation sequence X=\{x_1,x_2,...,x_T\}, and under the Markov and observation independence assumptions, find the model \lambda=(F,Q,G) that maximizes the conditional probability P(X|\lambda) of the observed sequence, i.e. \hat{\lambda}=\arg\max_{\lambda}P(X|\lambda).

(3) Prediction, also known as decoding

        Given the model \lambda=(F,Q,G) and an observation sequence X=\{x_1,x_2,...,x_T\}, find the most likely corresponding hidden state sequence Y=\{y_1,y_2,...,y_T\}.

These are the three classic problems of the HMM. Fortunately, all three can be solved.


General expression of the HMM

        We now give the general expression of the HMM. Let S be the set of all possible hidden states and O the set of all possible observed states:

S=\{s_1,s_2,...,s_M\},O=\{o_1,o_2,...,o_N\}

where M is the number of hidden states and N is the number of observed states.

   For a sequence of length T, let Y be the corresponding hidden state sequence and X the corresponding observation sequence:

X=\{x_1,x_2,...,x_T\},Y=\{y_1,y_2,...,y_T\}

where x_t\in O and y_t\in S.

        A remark on the sequence notation: at times t and t+1, the state and observation sequences are

at time t: Y=\{y_1,y_2,...,y_t\}, X=\{x_1,x_2,...,x_t\}

at time t+1: Y=\{y_1,y_2,...,y_{t+1}\}, X=\{x_1,x_2,...,x_{t+1}\}

The relationship between them is simple: the sequences at time t+1 are just the sequences at time t with the state y_{t+1} and the corresponding observation x_{t+1} appended. Only after passing through time t do we arrive at time t+1, at which point y_{t+1} and x_{t+1} appear. Keeping this in mind will be very helpful for the proofs that follow.


Observation sequence generation mechanism

        The generation mechanism of the observation sequence describes how, once the HMM \lambda=(F,Q,G) is known, the observation sequence X=\{x_1,x_2,...,x_T\} is produced step by step. It helps explain why an HMM can be summarized as \lambda=(F,Q,G). Take the HMM diagram above as an example:

Imagine standing at y_1. The initial hidden state distribution is known, and the probability of each observation given the hidden state is also known, so you can take one step along the arrow and generate the observation x_1; the first value of the sequence is now produced. Because the transition probability from y_1 to y_2 is known, you can then obtain the probability distribution of the hidden state y_2, i.e. move from y_1 to y_2. Having reached y_2, going from y_2 to x_2 is just like going from y_1 to x_1, which produces the second observation. Continuing in this way generates the entire observation sequence X=\{x_1,x_2,...,x_T\}. The case below is generated following this idea.

The initial hidden state distribution, hidden state transition matrix, and observation probability matrix of the model \lambda=(F,Q,G) are as follows:

y_1      rainy    sunny
P(y_1)   0.6      0.4

Q=\begin{bmatrix} 0.7&0.3 \\ 0.4&0.6 \end{bmatrix}

G=\begin{bmatrix} 0.1 &0.4 &0.5 \\ 0.6&0.3 & 0.1 \end{bmatrix}

Xiaobao's choices are generated as follows:

(1) Day 1

Compute according to the following formula:

P(x_1)=\sum_{y_1}P(x_1,y_1)=\sum_{y_1}P(x_1|y_1)P(y_1)

The factors in the last expression are known: P(x_1|y_1) and P(y_1) are the observation probabilities and the initial hidden state distribution. Thus:

P(x1=outing)=P(outing|sunny)P(sunny)+P(outing|rainy)P(rainy)=0.6*0.4+0.1*0.6=0.3

Similarly,

P(x1=shopping)=0.3*0.4+0.4*0.6=0.36,

P(x1=gaming)=0.1*0.4+0.5*0.6=0.34

Because P(x1=shopping) has the largest probability, Xiaobao is most likely to choose shopping on the first day.
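The day-1 marginal above is just a vector-matrix product, which makes it easy to check numerically. A minimal sketch, assuming numpy and the ordering convention states=[rainy, sunny], observations=[outing, shopping, gaming]:

```python
import numpy as np

F = np.array([0.6, 0.4])               # P(y_1): [rainy, sunny]
G = np.array([[0.1, 0.4, 0.5],         # P(x_1 | y_1): rows [rainy, sunny],
              [0.6, 0.3, 0.1]])        # columns [outing, shopping, gaming]

# P(x_1) = sum over y_1 of P(x_1 | y_1) P(y_1)  -- a vector-matrix product.
p_x1 = F @ G
print(p_x1)  # -> [0.3, 0.36, 0.34]; shopping is the most likely first activity
```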

(2) Day 2

Compute in the following two steps:

P(y_2)=\sum_{y_1}P(y_1,y_2)=\sum_{y_1}P(y_2|y_1)P(y_1)

P(x_2)=\sum_{y_2}P(x_2,y_2)=\sum_{y_2}P(x_2|y_2)P(y_2)

P(y2=sunny)=P(y2=sunny|y1=sunny)P(y1=sunny)+P(y2=sunny|y1=rainy)P(y1=rainy)

Compute P(y2=rainy) in the same way.

P(x2=outing)=P(x2=outing|y2=sunny)P(y2=sunny)+P(x2=outing|y2=rainy)P(y2=rainy)

Compute P(x2=shopping) and P(x2=gaming) similarly, and choose the activity according to the results.

(3) Day 3

The calculation is the same as for day 2.

        The idea behind generating an observation sequence from an HMM can be described in general as follows:

(1) Generate x_1

From the initial distribution F and the observation matrix G, obtain the first observation x_1 via:

P(x_1)=\sum_{y_1}P(x_1,y_1)=\sum_{y_1}P(x_1|y_1)P(y_1)

(2) Generate x_t, t=2,...,T

in two steps:

a. From the transition matrix Q and the state distribution of y_{t-1}, generate the state distribution of y_t:

P(y_t)=\sum_{y_{t-1}}P(y_{t-1},y_t)=\sum_{y_{t-1}}P(y_t|y_{t-1})P(y_{t-1})

b. Treat the state distribution of y_t as an initial distribution and repeat step (1) to generate x_t:

P(x_t)=\sum_{y_t}P(x_t,y_t)=\sum_{y_t}P(x_t|y_t)P(y_t)
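The two-step procedure above can be sketched as a short loop: propagate the hidden-state distribution with Q, then project it through G. This is a sketch assuming numpy; the helper name `observation_marginals` and the ordering states=[rainy, sunny], observations=[outing, shopping, gaming] are conventions of the sketch, not part of the model.

```python
import numpy as np

F = np.array([0.6, 0.4])                  # initial P(y_1): [rainy, sunny]
Q = np.array([[0.7, 0.3], [0.4, 0.6]])    # P(y_t | y_{t-1})
G = np.array([[0.1, 0.4, 0.5],            # P(x_t | y_t)
              [0.6, 0.3, 0.1]])

def observation_marginals(F, Q, G, T):
    """Marginals P(x_t) for t = 1..T, following the two-step recipe:
    (a) P(y_t) = sum_{y_{t-1}} P(y_t|y_{t-1}) P(y_{t-1}), then
    (b) P(x_t) = sum_{y_t} P(x_t|y_t) P(y_t)."""
    p_y = F.copy()
    marginals = []
    for t in range(T):
        if t > 0:
            p_y = p_y @ Q          # step (a): propagate the state distribution
        marginals.append(p_y @ G)  # step (b): project onto the observations
    return np.array(marginals)

print(observation_marginals(F, Q, G, T=3))
```

Each row of the result is a distribution over the three activities; the first row reproduces the day-1 values 0.3, 0.36, 0.34 computed by hand above.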


        Next, let us solve the three classic HMM problems.

Classic problem one: the forward algorithm

Problem: given the model \lambda=(F,Q,G) and an observation sequence X=\{x_1,x_2,...,x_T\}, compute P(X|\lambda).

Note: for convenience we abbreviate P(X|\lambda) as P(X); throughout the solution of this problem, all probabilities conditioned on \lambda are abbreviated in the same way.

        There are M^T possible hidden state sequences. Suppose Y=\{y_1,y_2,...,y_T\} is one of them. Then

P(Y)=P(y_1,y_2,...,y_T)=P(y_T|y_1,..,y_{T-1})P(y_1,..,y_{T-1})=P(y_T|y_{T-1})P(y_1,..,y_{T-1})

The last step above follows from the homogeneous Markov assumption: among y_1,...,y_{T-1}, the state y_T depends only on y_{T-1}. Expanding P(y_1,...,y_{T-1}) recursively in the same way gives

P(Y)=P(y_T|y_{T-1})P(y_{T-1}|y_{T-2})...P(y_2|y_1)P(y_1)

So P(Y) can be computed from the hidden state transition matrix and the initial hidden state distribution.

Next we compute the conditional probability P(X|Y) of generating the observation sequence X given the hidden state sequence Y:

P(X|Y)\\ =P(x_1,x_2,...,x_T|y_1,y_2,...,y_T)\\ =P(x_1|y_1,y_2,...,y_T)P(x_2|y_1,y_2,...,y_T)...P(x_T|y_1,y_2,...,y_T)\\ =P(x_1|y_1)P(x_2|y_2)...P(x_T|y_T)

The second step uses the observation independence assumption: each observation x_t depends only on the current state y_t. So P(X|Y) can be computed from the observation matrix.

With P(Y) and P(X|Y) in hand, we can compute the joint probability P(X,Y) of X and Y:

P(X,Y)=P(X|Y)P(Y)

Finally, P(X) is obtained by summing over Y:

P(X)=\sum_{Y}P(X,Y)

The sum on the right runs over all M^T possible hidden state sequences, so

P(X)=\sum_{Y}P(X,Y)=\sum_{Y}P(X|Y)P(Y)\\ =\sum_{Y}P(x_1|y_1)P(x_2|y_2)...P(x_T|y_T)\cdot P(y_T|y_{T-1})P(y_{T-1}|y_{T-2})...P(y_2|y_1)P(y_1)

This completes the computation of P(X).
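The summation over all M^T hidden sequences can be written out directly as an exhaustive enumeration. A sketch assuming numpy and itertools, with the case's parameters, states ordered [rainy, sunny], and observations encoded as outing=0, shopping=1, gaming=2 (the function name `brute_force_likelihood` is hypothetical):

```python
import numpy as np
from itertools import product

F = np.array([0.6, 0.4])                  # P(y_1): [rainy, sunny]
Q = np.array([[0.7, 0.3], [0.4, 0.6]])    # P(y_t | y_{t-1})
G = np.array([[0.1, 0.4, 0.5],            # P(x_t | y_t)
              [0.6, 0.3, 0.1]])

def brute_force_likelihood(F, Q, G, obs):
    """P(X) = sum over all M^T hidden sequences Y of P(X|Y) P(Y)."""
    M, T = len(F), len(obs)
    total = 0.0
    for Y in product(range(M), repeat=T):       # every hidden-state sequence
        p = F[Y[0]] * G[Y[0], obs[0]]           # P(y_1) P(x_1|y_1)
        for t in range(1, T):
            p *= Q[Y[t-1], Y[t]] * G[Y[t], obs[t]]  # P(y_t|y_{t-1}) P(x_t|y_t)
        total += p
    return total

# Xiaobao's observed sequence: outing, shopping, gaming
print(brute_force_likelihood(F, Q, G, [0, 1, 2]))  # -> 0.033612
```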

        Let us briefly examine the computational complexity of this method. By the formula for P(X), computing the probability of a single observation sequence X=\{x_1,x_2,...,x_T\} requires about 2TM^T operations (each Y costs about 2T multiplications, and there are M^T sequences Y), i.e. O(TM^T) in big-O notation. This brute-force computation grows exponentially with T, and is infeasible in practice. In fact, the brute-force approach is exactly what we used in the section on the observation sequence generation mechanism. We now introduce a much cheaper method, called the forward algorithm.

        Suppose at time t the state is y_t=s_i and the observations so far are X_t=\{x_1,x_2,...,x_t\}. Write the probability of this event as P(X_t,y_t); it is called the forward probability. Suppose we have found, for every possible state at time t, the forward probability of the observations X_t. We now compute recursively the forward probabilities P(X_{t+1},y_{t+1}) at time t+1, where X_{t+1}=\{x_1,x_2,...,x_t,x_{t+1}\}:

Let P(X_{t+1},y_t,y_{t+1}) denote the probability that the state at time t is y_t, the state at time t+1 is y_{t+1}, and the observation sequence is X_{t+1}. Then

P(X_{t+1},y_{t+1})

=\sum_{y_t}P(X_{t+1},y_t,y_{t+1})

=\sum_{y_t}P(X_{t+1}|y_t,y_{t+1})P(y_t,y_{t+1})

=\sum_{y_t}P(X_{t},x_{t+1}|y_t,y_{t+1})P(y_t,y_{t+1})

=\sum_{y_t}P(X_{t}|y_t,y_{t+1})P(x_{t+1}|y_t,y_{t+1})P(y_t,y_{t+1})       (by observation independence)

=\sum_{y_t}P(X_{t}|y_t)P(x_{t+1}|y_{t+1})P(y_t,y_{t+1})                                   

=\sum_{y_t}P(X_{t}|y_t)P(x_{t+1}|y_{t+1})P(y_{t+1}|y_t)P(y_t)

=\sum_{y_t}P(X_{t},y_t)P(x_{t+1}|y_{t+1})P(y_{t+1}|y_t)

=P(x_{t+1}|y_{t+1})\sum_{y_t}P(X_{t},y_t)P(y_{t+1}|y_t)

We now have a recurrence relating P(X_{t+1},y_{t+1}) to P(X_t,y_t). The initial value of the recurrence, i.e. the forward probability at t=1, is:

P(X_1,y_1)=P(x_1,y_1)=P(x_1|y_1)P(y_1)

With the initial value and the recurrence, the forward probability P(X_t,y_t) can be computed for any time t.

The quantity P(X) required in classic problem one is simply the probability of observing X_T=\{x_1,x_2,...,x_T\} at time T:

P(X)=P(X_T)=\sum_{y_T}P(X_T,y_T)       (1)

        Now for the computational complexity of the algorithm. First consider the cost of going from P(X_t,y_t) to P(X_{t+1},y_{t+1}):

P(X_{t+1},y_{t+1})=P(x_{t+1}|y_{t+1})\sum_{y_t}P(X_{t},y_t)P(y_{t+1}|y_t)

This takes 2M+1 multiplication and addition operations per state, so going from P(X_{1},y_{1}) to P(X_{T},y_{T}) requires (T-1)(2M+1) operations per state,

and by formula (1), roughly (T-1)(2M+1)M operations in total, i.e. O(TM^2). In terms of complexity this is far better than the brute-force method: brute force is exhaustive enumeration, while the forward algorithm is recursion.
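The initialization, recursion, and termination step above fit in a few lines. A sketch assuming numpy, the case's parameters, states ordered [rainy, sunny], observations encoded outing=0, shopping=1, gaming=2 (the function name `forward_likelihood` is hypothetical):

```python
import numpy as np

F = np.array([0.6, 0.4])                  # P(y_1): [rainy, sunny]
Q = np.array([[0.7, 0.3], [0.4, 0.6]])    # P(y_t | y_{t-1})
G = np.array([[0.1, 0.4, 0.5],            # P(x_t | y_t)
              [0.6, 0.3, 0.1]])

def forward_likelihood(F, Q, G, obs):
    """Forward algorithm: alpha_t(i) = P(x_1..x_t, y_t = s_i); cost O(T M^2)."""
    alpha = F * G[:, obs[0]]              # initialization: P(x_1|y_1) P(y_1)
    for x in obs[1:]:
        # recursion: alpha_{t+1}(j) = P(x_{t+1}|s_j) * sum_i alpha_t(i) Q[i, j]
        alpha = G[:, x] * (alpha @ Q)
    return alpha.sum()                    # termination: P(X) = sum_i alpha_T(i)

print(forward_likelihood(F, Q, G, [0, 1, 2]))  # -> 0.033612
```

The result agrees with the brute-force summation described earlier, at a tiny fraction of the cost for long sequences.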

 

Classic problem two

Omitted.

 

Classic problem three: the Viterbi algorithm

Problem: given the model \lambda=(F,Q,G) and an observation sequence X=\{x_1,x_2,...,x_T\}, predict the most likely hidden state sequence Y=\{y_1,y_2,...,y_T\}. Mathematically:

\hat{Y}=\arg\max_Y P(Y|X)

Note: since P(Y|X)=P(X,Y)/P(X) and X is given, P(X) is a constant, so the problem is equivalent to finding the Y that maximizes the joint probability P(X,Y).

We introduce the Viterbi algorithm with the help of the figure below:

Each solid circle in the figure represents a possible state at some time, and each square represents the corresponding observation. Arrows between solid circles indicate hidden state transitions; for example, the red arrow in the figure indicates a transition from state Y1 at time t-1 to state Y2 at time t.

          Notice that every state sequence Y=\{y_1,y_2,...,y_T\} corresponds one-to-one to a path in the figure: the path starts at the solid circle whose state at t=1 is y_1, follows an arrow to the solid circle whose state at t=2 is y_2, and keeps connecting until the solid circle whose state at t=T is y_T. Denote the corresponding path p=(y_1,y_2,...,y_T). Solving for Y is therefore the same as finding an optimal path p in the graph that maximizes the joint probability P(X,Y). This is idea 1 of the Viterbi algorithm. Let us look for such a path.

        There are M^t distinct paths from time 1 to time t. Group them into M classes by the state at time t: all paths whose hidden state at time t is s_1 form one class, those reaching s_2 at time t form another, and so on; call the classes C_i (i=1,2,...,M). The problem of finding, among all paths, the one maximizing P(X,Y) is thus transformed into first finding, within each of the M classes, the path that maximizes P(X,Y), and then picking the best of those M paths. We have decomposed one problem into M subproblems. This is idea 2 of the Viterbi algorithm. Now fix the state at time t to be s_i. Denote by \hat{p}=(\hat{y}_1,\hat{y}_2,...,\hat{y}_{t-1},s_i) the path in class C_i that maximizes P(X,Y), and by \delta_t(i) the corresponding maximal probability. We also need to record the state \hat{y}_{t-1} of the second-to-last node of that path; write it as \varphi_t(i), i.e. \hat{y}_{t-1}=\varphi_t(i). For brevity, reuse the earlier notation

X_t=\{x_1,x_2,...,x_t\},Y_t=\{y_1,y_2,...,y_t\}

By its definition, \delta_t(i) can be expressed as:

\delta_t(i)=\max_{Y_{t-1}}P(Y_{t-1},y_t=s_i,X_t)

Similarly, at time t+1 the corresponding \delta_{t+1}(i) is:

\delta_{t+1}(i)=\max_{Y_{t}}P(Y_{t},y_{t+1}=s_i,X_{t+1})

We now derive the relationship between \delta_t(i) and \delta_{t+1}(i). Note that, given y_{t+1}, the observation x_{t+1} is independent of x_1,...,x_t and y_1,...,y_t, and y_{t+1} depends only on y_t, so

P(Y_{t},y_{t+1}=s_i,X_{t+1})

=P(Y_{t},y_{t+1}=s_i,X_{t},x_{t+1})

=P(y_{t+1}=s_i,x_{t+1}|X_t,Y_{t})P(X_t,Y_{t})

=P(x_{t+1}|X_t,Y_{t},y_{t+1}=s_i)P(y_{t+1}=s_i|X_t,Y_{t})P(X_t,Y_{t})

=P(x_{t+1}|y_{t+1}=s_i)P(y_{t+1}=s_i|y_{t})P(X_t,Y_{t})   (2)

Take the maximum of both sides of equation (2), first over the variables Y_{t-1} and then over the variable y_t:

\delta_{t+1}(i)=\max_{Y_{t}}P(Y_{t},y_{t+1}=s_i,X_{t+1})

=\max_{y_{t}}\max_{Y_{t-1}}P(Y_{t},y_{t+1}=s_i,X_{t+1})

=\max_{y_{t}}\max_{Y_{t-1}}P(x_{t+1}|y_{t+1}=s_i)P(y_{t+1}=s_i|y_{t})P(X_t,Y_{t})

=P(x_{t+1}|y_{t+1}=s_i)\max_{y_{t}}\max_{Y_{t-1}}P(y_{t+1}=s_i|y_{t})P(X_t,Y_{t})

=P(x_{t+1}|y_{t+1}=s_i)\max_{y_{t}}[P(y_{t+1}=s_i|y_{t})\max_{Y_{t-1}}P(X_t,Y_{t})]

=P(x_{t+1}|y_{t+1}=s_i)\max_{y_{t}}[P(y_{t+1}=s_i|y_{t})\delta_t(y_t)]                            (3)

where the last maximum is taken over the following M values:

\\P(y_{t+1}=s_i|y_{t}=s_1)\delta_t(1),\\P(y_{t+1}=s_i|y_{t}=s_2)\delta_t(2),\\...,\\P(y_{t+1}=s_i|y_{t}=s_M)\delta_t(M)

This establishes the recursion between \delta_t(i) and \delta_{t+1}(i).

The recursion says: if we have found, for every possible state s_j at time t, the maximal probability \delta_t(j), j=1,2,...,M, then multiplying each by the transition probability P(y_{t+1}=s_i|y_{t}=s_j), taking the maximum of those products, and finally multiplying by the observation probability P(x_{t+1}|y_{t+1}=s_i), yields the maximal probability \delta_{t+1}(i) of reaching state s_i at time t+1 starting from time 1. This can be pictured as follows:

The optimal path is the path with the largest probability product. Suppose the largest of the final values is \delta_{t+1}(\hat{y}_{t+1}); then we can already pin down the last two nodes of the optimal path, namely \hat{y}_t=\varphi_{t+1}(\hat{y}_{t+1}) and \hat{y}_{t+1}. How do we find the earlier nodes of the optimal path? In fact, recursion (3) implies the following property:

If p=(\hat{y}_1,...,\hat{y}_{t-1},\hat{y}_t=j,\hat{y}_{t+1}=i) is the optimal path for \delta_{t+1}(i), then the path q=(\hat{y}_1,...,\hat{y}_{t-1},\hat{y}_t=j) formed by its first t nodes is the optimal path for \delta_{t}(j). Indeed, since p is the optimal path for \delta_{t+1}(i), formula (3) reduces to

\delta_{t+1}(i)=P(x_{t+1}|y_{t+1}=i)P(y_{t+1}=i|y_t=j)\delta_t(j)

Once the states i and j are fixed, \delta_{t+1}(i) is proportional to \delta_{t}(j), so q is the optimal path for \delta_{t}(j). Otherwise, suppose the optimal path for \delta_{t}(j) were m=(\hat{k}_1,...,\hat{k}_{t-1},\hat{y}_t=j) with m\neq q; then the path n=(\hat{k}_1,...,\hat{k}_{t-1},\hat{y}_t=j,\hat{y}_{t+1}=i) with n\neq p would be a better path for \delta_{t+1}(i) than p, a contradiction.

 

Based on the above, the optimal path can be recovered by backtracking through the states recorded in \varphi_{t+1}(i), as follows:

(1) \delta_{t+1}(\hat{y}_{t+1}) maximizes P(Y_{t+1},X_{t+1}), so \hat{y}_{t+1} is the state at time t+1;

(2) \varphi_{t+1}(i) records the state of the t-th node on the optimal path ending in state i at time t+1; \varphi_{t+1}(\hat{y}_{t+1}) is therefore the state at time t, written \hat{y}_{t}=\varphi_{t+1}(\hat{y}_{t+1});

(3) the value of \varphi_{t}(\hat{y}_{t}) records the state at time t-1, written \hat{y}_{t-1}=\varphi_{t}(\hat{y}_{t});

...

Applying the recursion \hat{y}_{t}=\varphi_{t+1}(\hat{y}_{t+1}) all the way back to the state at t=1 yields

\hat{Y}=\{\hat{y}_1,\hat{y}_2,...,\hat{y}_T\}

 

The Viterbi algorithm in outline:

(1) From the initial values \delta_1(i) and the recursion, compute \delta_t(i) for every time t, recording along the way the previous-node state \varphi_t(i);

(2) find the largest final value \delta_T(i); its path is the one we seek, which immediately gives the last two node states i and \varphi_T(i);

(3) use the previous states recorded in \varphi_t(i) to backtrack through the states at every time, obtaining \hat{Y}=\{\hat{y}_1,\hat{y}_2,...,\hat{y}_T\}.

 

Solving the case with the Viterbi algorithm

Now let us use the "Xiaobao's holiday" case to get familiar with the Viterbi algorithm.

(1) At time t=1, i.e. the first day, the possible states are {rainy, sunny}, and Xiaobao chose an outing. Therefore:

\delta_1(sunny)=P(x1=outing,y1=sunny)=P(x1=outing|y1=sunny)P(y1=sunny)=0.6*0.4=0.24

\delta_1(rainy)=P(x1=outing,y1=rainy)=P(x1=outing|y1=rainy)P(y1=rainy)=0.1*0.6=0.06

Since there is no state before the first day, \varphi_1(i) does not exist.

 

(2) At time t=2, i.e. the second day, Xiaobao chose shopping. By the recursion:

\delta_2(i)=P(x_{2}|y_{2}=i)\max_{j\in S}P(y_{2}=i|y_1=j)\delta_1(j), so:

\delta_2(sunny)=P(x2=shopping|y2=sunny)*max{P(y2=sunny|y1=sunny)\delta_1(sunny),P(y2=sunny|y1=rainy)\delta_1(rainy)}

            =0.3*max{0.6*0.24,0.3*0.06}=0.3*max{0.144,0.018}=0.0432

\varphi_2(sunny)=sunny

Similarly,

\delta_2(rainy)=P(x2=shopping|y2=rainy)*max{P(y2=rainy|y1=sunny)\delta_1(sunny),P(y2=rainy|y1=rainy)\delta_1(rainy)}

             =0.4*max{0.4*0.24,0.7*0.06}=0.4*max{0.096,0.042}=0.0384

\varphi_2(rainy)=sunny

 

(3) At time t=3, i.e. the third day, Xiaobao chose gaming. By the recursion:

\delta_3(i)=P(x_{3}|y_{3}=i)\max_{j\in S}P(y_{3}=i|y_2=j)\delta_2(j), so:

\delta_3(sunny)=P(x3=gaming|y3=sunny)*max{P(y3=sunny|y2=sunny)\delta_2(sunny),P(y3=sunny|y2=rainy)\delta_2(rainy)}

             =0.1*max{0.6*0.0432,0.3*0.0384}=0.1*max{0.02592,0.01152}=0.002592

\varphi_3(sunny)=sunny

\delta_3(rainy)=P(x3=gaming|y3=rainy)*max{P(y3=rainy|y2=sunny)\delta_2(sunny),P(y3=rainy|y2=rainy)\delta_2(rainy)}

             =0.5*max{0.4*0.0432,0.7*0.0384}=0.5*max{0.01728,0.02688}=0.01344

\varphi_3(rainy)=rainy

 

Now backtrack:

\delta_3(rainy)>\delta_3(sunny), so the third day was rainy; \varphi_3(rainy)=rainy, so the second day was rainy; and \varphi_2(rainy)=sunny, so the first day was sunny.

The inferred weather for the 3 days is therefore: sunny, rainy, rainy.

 

Viterbi algorithm code implementation

import numpy as np


def viterbi(trans_prob, emit_prob, init_prob, views, states, obs):
    """Viterbi algorithm.

    :param trans_prob: state transition probability matrix Q
    :param emit_prob: observation (emission) probability matrix G
    :param init_prob: initial hidden-state distribution F
    :param views: list of all possible observations
    :param states: list of all possible hidden states
    :param obs: the actual observation sequence
    :return: (maximum probability, most likely hidden-state sequence)
    """
    state_num, obs_len = len(states), len(obs)
    delta = np.zeros((obs_len, state_num), dtype=np.float64)  # delta[t, i]
    phi = np.zeros((obs_len, state_num), dtype=np.int64)      # phi[t, i]: best predecessor

    # Initialization: delta_1(i) = F_i * G[i, x_1]
    for i in range(state_num):
        delta[0, i] = init_prob[i] * emit_prob[i][views.index(obs[0])]
        phi[0, i] = 0

    # Recursion: delta_t(j) = G[j, x_t] * max_k delta_{t-1}(k) * Q[k, j]
    for t in range(1, obs_len):
        for j in range(state_num):
            tmp = [delta[t - 1, k] * trans_prob[k][j] for k in range(state_num)]
            delta[t, j] = max(tmp) * emit_prob[j][views.index(obs[t])]
            phi[t, j] = int(np.argmax(tmp))

    # Termination: the most probable final state
    max_prob = float(np.max(delta[obs_len - 1, :]))
    last_state = int(np.argmax(delta[obs_len - 1, :]))

    # Backtrack through phi to recover the optimal path
    path = [last_state]
    for t in reversed(range(1, obs_len)):
        path.append(phi[t, path[-1]])

    hidden_states = [states[i] for i in reversed(path)]
    return max_prob, hidden_states


def main():
    # All possible hidden states
    states = ('sunny', 'rainy')
    # All possible observations
    views = ['outing', 'shopping', 'gaming']
    # Transition probabilities Q: state -> state
    trans_prob = [[0.6, 0.4],
                  [0.3, 0.7]]
    # Observation probabilities G: state -> observation
    emit_prob = [[0.6, 0.3, 0.1],
                 [0.1, 0.4, 0.5]]
    # Initial distribution F
    init_prob = [0.4, 0.6]
    # Observation sequence
    obs = ['outing', 'shopping', 'gaming']

    max_prob, hidden_states = viterbi(trans_prob, emit_prob, init_prob, views, states, obs)
    print('Maximum probability: %.5f.' % max_prob)
    print('Hidden state sequence: %s.' % hidden_states)


if __name__ == '__main__':
    main()

      

 

HMM model applications

Chinese word segmentation

 

Stock market analysis

 

Copyright notice
This article was created by [wx5af853e4b9fed]; when reposting, please include a link to the original. Thank you.
https://cdmana.com/2021/07/20210716095409479t.html
