Catalog
A brief history
Modeling
HMM general expression
Observation sequence generation mechanism
Classic problem 1: the forward algorithm
Classic problem 2
Classic problem 3: the Viterbi algorithm
Viterbi algorithm worked example
Viterbi algorithm code implementation
HMM model applications
A brief history
Many accounts trace Markov chains to the Russian mathematician Andrey Markov (Андрей Андреевич Марков), who in work published in 1906-1907 set out to show that independence between random variables is not a necessary condition for the weak law of large numbers and the central limit theorem. He constructed a stochastic process whose variables depend on one another through conditional probabilities, and proved that under certain conditions it converges to a fixed distribution. Once this work appeared it took on a life of its own, and later researchers built a variety of models on top of it: Paul Ehrenfest and Tatiana Afanasyeva used Markov chains in 1907 to build the Ehrenfest model of diffusion; in 1912 Henri Poincaré studied Markov chains on finite groups and obtained the Poincaré inequality; in 1931, Andrey Kolmogorov (Андрей Николаевич Колмогоров), studying diffusion, extended Markov chains to continuous index sets, obtaining the continuous-time Markov chain and deriving a formula for its joint distribution. Independently of Kolmogorov, Sydney Chapman had obtained the same formula in 1926 while studying Brownian motion; it was later named the Chapman-Kolmogorov equation. In the 1950s the Soviet mathematician Eugene Borisovich Dynkin refined Kolmogorov's theory and, via the Dynkin formula, connected stationary Markov processes to martingale processes.
Unfortunately, how the Markov model came to be applied in today's machine learning is a story that has long puzzled me. In the interest of historical accuracy I have not speculated about it, and will leave it in the corner of history.
Modeling
In machine learning, the Markov chain is widely used. What is it, and what kinds of problems can it solve? Let's start from the simplest everyday example (referred to below as "the case").
Xiaobao has a 3-day holiday, and each day picks one activity from "outing", "shopping", and "gaming". If it is sunny, Xiaobao is more likely to go on an outing; if it rains, Xiaobao will probably stay home and play games. Xiaobao knows the following:
(1) Each day's weather depends only on the previous day's weather;
(2) the probability of rain on the first day is 0.6, and of sun 0.4;
(3) the weather changes as follows: if it rains one day, the probability that it rains again the next day is 0.7 and the probability that it turns sunny is 0.3; if one day is sunny, the next day is sunny with probability 0.6 and rainy with probability 0.4.
Xiaobao chooses activities according to the weather:
(a) if it rains, the probabilities of choosing the 3 activities are 0.1, 0.4, 0.5;
(b) if it is sunny, the probabilities are 0.6, 0.3, 0.1.
Now suppose we know that the activities Xiaobao arranged over the 3-day holiday were, in order: outing, shopping, game. Three questions arise:
(i) How likely was Xiaobao to arrange activities this way?
(ii) Are the probability values assumed in (1)-(3) the ones that make this sequence of 3 activities most likely? In other words, are there other probability values under which Xiaobao would have been even more likely to choose these 3 activities?
(iii) What was the weather on these 3 days?
Below we build a model from the case to solve these three problems. The model is called the hidden Markov model (HMM). Let's start modeling.
(1) The weather — hidden states
Use Y = (y_1, y_2, ..., y_T) to denote the weather over T consecutive days. Each y_t can be regarded as a random variable taking values in S = {s_1, ..., s_M}, which is called the hidden state space, because the weather is assumed to be unobservable. For the case, the weather sequence is Y = (y_1, y_2, y_3), the state space is S = {rain, sunny}, and if the first day is sunny then y_1 = sunny.
(2) The activities — observation states
Use X = (x_1, x_2, ..., x_T) to denote the activities Xiaobao chooses over the T days. Each x_t is also a random variable, taking values in O = {o_1, ..., o_N}, which is called the observation state space. For the case, the activity sequence is X = (x_1, x_2, x_3) and the observation state space is O = {outing, shopping, game}.
(3) Homogeneous Markov chain hypothesis
The probability of weather change can use conditional probability Express , It means in the second 1 To the first day t-1 After days of weather , The first t The probability of what kind of weather will happen in a day . According to... In the case (1), The weather of that day is only related to the weather of the previous day , We have
This is the homogeneous Markov chain hypothesis . According to the possibility of the transition of various states in the implicit state space , The composition is as follows Order matrix :
Indicates the hidden state
Transferred to the
Probability ,
It is called the implicit state transition matrix . For the case , transition matrix
as follows :
Note the implicit state transition matrix and time t irrelevant . You can use the following figure to represent :
(4) Observation independence assumption
At any time t, the observation x_t depends only on the hidden state y_t at that moment; this assumption simplifies the model. If the hidden state at time t is y_t = s_i and the corresponding observation is x_t = o_k, the probability of observing o_k under hidden state s_i is written

g_i(k) = P(x_t = o_k | y_t = s_i)

Considering all possible combinations yields an M×N matrix G = [g_i(k)], where g_i(k) is the probability of observing o_k given state s_i. G is called the observation state matrix. For the case, G is:

             outing   shopping   game
rain         0.1      0.4        0.5
sunny        0.6      0.3        0.1

Note that the observation state matrix does not depend on t. It, too, can be drawn as a diagram from states to observations.
(5) Initial hidden state distribution
Item (2) of the case gives the initial state of the weather, that is, a probability distribution F over the hidden states:

             rain    sunny
F            0.6     0.4

This is called the initial hidden state distribution.
With the above in hand, the hidden Markov model can be written in the form

λ = (F, Q, G)

where F is the initial hidden state distribution, Q is the hidden state transition matrix, and G is the observation state matrix.
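For concreteness, the case's model λ = (F, Q, G) can be written down as arrays. This is a minimal sketch assuming numpy; the integer encodings (state order rain, sunny; observation order outing, shopping, game) are my own convention, not part of the original post.

```python
import numpy as np

# Hidden states: 0 = rain, 1 = sunny
# Observations: 0 = outing, 1 = shopping, 2 = game

F = np.array([0.6, 0.4])            # initial hidden state distribution
Q = np.array([[0.7, 0.3],           # transition matrix: row = today's weather,
              [0.4, 0.6]])          #                    column = tomorrow's weather
G = np.array([[0.1, 0.4, 0.5],      # observation matrix: row = weather,
              [0.6, 0.3, 0.1]])     #                     column = activity

# Every row of Q and G is a probability distribution, so each must sum to 1
assert np.allclose(Q.sum(axis=1), 1.0)
assert np.allclose(G.sum(axis=1), 1.0)
```

The later sections reuse these three arrays, so the same encoding is assumed throughout.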
The case's HMM can be pictured as a graph in which the hidden states y_1, y_2, y_3 for the three days form a chain and each y_t emits the observation x_t; the arrows indicate dependence. Readers familiar with probabilistic graphical models will recognize this as the HMM's probability graph. Now restate the case's three problems in terms of the model:
(1) Evaluating the probability of the observation sequence X
Given the model λ = (F, Q, G) and an observation sequence X = (x_1, x_2, ..., x_T), compute the probability P(X|λ) that the observation sequence X occurs under the model λ.

(2) Learning the model parameters
Given the observation sequence X, and under the Markov and observation independence assumptions, find the model λ = (F, Q, G) that maximizes the conditional probability P(X|λ) of the observed sequence, i.e.:

λ* = argmax_λ P(X|λ)

(3) The prediction problem, also known as the decoding problem
Given the model λ and the observation sequence X, find the hidden state sequence Y = (y_1, y_2, ..., y_T) that is most likely given the observations.

These three questions are the three classical problems of the HMM. Fortunately, all of them can be solved.
HMM general expression
Now we give the general formulation of the HMM. Let S be the set of all possible hidden states and O the set of all possible observation states:

S = {s_1, s_2, ..., s_M},   O = {o_1, o_2, ..., o_N}

where M is the number of hidden states and N is the number of observation states.

For a sequence of length T, let Y be the hidden state sequence and X the corresponding observation sequence:

Y = (y_1, y_2, ..., y_T),   X = (x_1, x_2, ..., x_T)

where y_t ∈ S and x_t ∈ O.
A word about these sequences: at time t the state and observation sequences are (y_1, ..., y_t) and (x_1, ..., x_t); at time t+1 they are (y_1, ..., y_t, y_{t+1}) and (x_1, ..., x_t, x_{t+1}). That is, time t+1 simply appends the state y_{t+1} and its corresponding observation x_{t+1} to what existed at time t, since the only way to arrive at time t+1 is to pass through time t. Keeping this in mind will be very helpful for the proofs later.
Observation sequence generation mechanism
The generation mechanism describes how, once the model λ = (F, Q, G) is known, the observation sequence X = (x_1, ..., x_T) is produced step by step. It also helps explain why the HMM can be expressed in the form λ = (F, Q, G). Take the HMM diagram above as an example.

Suppose you are standing at y_1. Since the initial hidden state distribution is known, and the probabilities from hidden state to observation are also known, you can take one step along the arrow and generate the observation x_1; the first value of the sequence is now produced. Because the state transition probability from y_1 to y_2 is known, you can obtain the probability distribution of the hidden state y_2, that is, you can move from y_1 to y_2. Having reached y_2, going from y_2 to x_2 is just like going from y_1 to x_1, which generates the second observation x_2. Continuing in this way, the whole observation sequence can be generated. The following works through the case with this idea.
The case's model λ = (F, Q, G) has initial hidden state distribution F(rain) = 0.6, F(sunny) = 0.4, with the transition matrix Q and observation matrix G given above. Xiaobao's choices then go as follows:

(1) Day 1
Compute P(x_1 = o_k) = Σ_i P(x_1 = o_k | y_1 = s_i) P(y_1 = s_i); every factor in the sum is known, being an observation probability and the initial hidden state distribution:

P(x1 = outing) = P(outing | sunny)P(sunny) + P(outing | rain)P(rain) = 0.6*0.4 + 0.1*0.6 = 0.3

Similarly,

P(x1 = shopping) = 0.3*0.4 + 0.4*0.6 = 0.36
P(x1 = game) = 0.1*0.4 + 0.5*0.6 = 0.34

Since P(x1 = shopping) has the largest probability, Xiaobao is most likely to choose shopping on the first day.
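The day-1 marginals above amount to one vector-matrix product, which can be checked numerically. A sketch assuming numpy; the arrays restate the case's parameters with my own ordering (rain, sunny / outing, shopping, game):

```python
import numpy as np

F = np.array([0.6, 0.4])             # P(rain), P(sunny) on day 1
G = np.array([[0.1, 0.4, 0.5],       # emission probabilities given rain
              [0.6, 0.3, 0.1]])      # emission probabilities given sunny

# P(x1 = o_k) = sum_i P(o_k | s_i) * P(s_i): marginalize over the weather
p_x1 = F @ G
print(dict(zip(["outing", "shopping", "game"], p_x1)))
# shopping comes out largest (0.36), matching the hand computation
```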
(2) Day 2
Compute in two steps. First,

P(y2 = sunny) = P(y2 = sunny | y1 = sunny)P(y1 = sunny) + P(y2 = sunny | y1 = rain)P(y1 = rain)

and similarly P(y2 = rain). Second,

P(x2 = outing) = P(x2 = outing | y2 = sunny)P(y2 = sunny) + P(x2 = outing | y2 = rain)P(y2 = rain)

and likewise P(x2 = shopping) and P(x2 = game); finally, choose the activity according to the results.

(3) Day 3
The computation is the same as for day 2.
The idea of generating an observation sequence from an HMM can be stated generally:

(1) Generate x1: from the initial distribution F and the observation matrix G, obtain the first observation x1 via P(x_1 = o_k) = Σ_i g_i(k) F(s_i).

(2) Generate x_i (i = 2, ..., T), in two steps:
a. from the transition matrix Q and the state probability distribution of y_{i-1}, compute the state probability distribution of y_i;
b. treating the state probability distribution of y_i as an initial distribution, repeat step (1) to generate x_i.
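The two-step procedure above can also be run as a sampler that draws concrete sequences rather than distributions. A minimal sketch assuming numpy; the function name `sample_hmm` and the integer encodings are my own:

```python
import numpy as np

def sample_hmm(F, Q, G, T, seed=0):
    """Draw a hidden state sequence and observation sequence of length T."""
    rng = np.random.default_rng(seed)
    states, obs = [], []
    y = rng.choice(len(F), p=F)                    # draw y1 from the initial distribution F
    for _ in range(T):
        obs.append(rng.choice(G.shape[1], p=G[y])) # emit x_t from row y of G
        states.append(y)
        y = rng.choice(len(F), p=Q[y])             # transition to y_{t+1} via row y of Q
    return states, obs

# The case's parameters (states: 0 = rain, 1 = sunny; obs: outing, shopping, game)
F = np.array([0.6, 0.4])
Q = np.array([[0.7, 0.3], [0.4, 0.6]])
G = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
states, obs = sample_hmm(F, Q, G, T=3)
```

Note that the sampler follows the generation mechanism literally: first an emission from the current state, then a transition, repeated T times.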
Next, let's solve the HMM's three classic problems.
Classic problem 1: the forward algorithm
Problem: given the model λ = (F, Q, G) and the observation sequence X = (x_1, ..., x_T), find P(X|λ).

Notation: for convenience, write P(X|λ) simply as P(X); throughout the solution of this classical problem, all probabilities are implicitly conditioned on λ.
There are M^T possible hidden state sequences in total. Suppose Y = (y_1, ..., y_T) is one of them. Then

P(Y) = P(y_1) P(y_2 | y_1) P(y_3 | y_2) ... P(y_T | y_{T-1})

The factorization follows by applying the homogeneous Markov chain assumption step by step: each y_t depends only on y_{t-1}. The resulting factors come from the hidden state transition matrix and the initial hidden state distribution.

Next compute the conditional probability that the hidden state sequence Y generates the observation sequence X:

P(X | Y) = P(x_1 | y_1) P(x_2 | y_2) ... P(x_T | y_T)

This factorization uses the observation independence assumption: each observation x_t depends only on the current state y_t, and each factor comes from the observation matrix G.

With P(Y) and P(X | Y) we can compute the joint probability of X and Y:

P(X, Y) = P(X | Y) P(Y)

So finally P(X) can be computed as

P(X) = Σ_Y P(X, Y) = Σ_Y P(X | Y) P(Y)

where the sum on the right runs over all M^T possible state sequences. That completes the calculation of P(X).

A brief word on the computational complexity of this method. By the formula for P(X), computing the probability of one observation sequence requires on the order of 2T · M^T operations (each Y contributes about 2T multiplications, and there are M^T sequences Y in all), i.e. O(T M^T) in big-O notation. With this brute-force calculation the cost grows exponentially in T, which is infeasible in practice. In fact, the <Observation sequence generation mechanism> section above implicitly used this brute-force method. Here is a much cheaper calculation, called the forward algorithm.
Define the forward probability: at time t, the probability that the state is s_i and the observations so far are x_1, ..., x_t,

α_t(i) = P(x_1, x_2, ..., x_t, y_t = s_i)

Suppose we have found the forward probabilities α_t(i) for all possible states at time t. We now derive, by recursion, the forward probability α_{t+1}(j) for each state s_j at time t+1, where j = 1, ..., M:

α_{t+1}(j) = P(x_1, ..., x_t, x_{t+1}, y_{t+1} = s_j)

Decomposing over the state s_i at time t, transitioning to s_j at time t+1, and emitting x_{t+1}, we obtain

α_{t+1}(j) = [ Σ_{i=1}^{M} α_t(i) q_ij ] · g_j(x_{t+1})

(the factor g_j(x_{t+1}) separates out by the observation independence assumption).

This is the recurrence relating α_{t+1} and α_t. In addition, the initial value of the recursion, the forward probability at t = 1, is:

α_1(i) = F(s_i) · g_i(x_1),   i = 1, ..., M

With the initial values and the recurrence relation, the forward probability at any time t can be computed.

The quantity required in classic problem 1 is just the probability, at time T, of having observed X = (x_1, ..., x_T), namely:

P(X) = Σ_{i=1}^{M} α_T(i)    (1)

Now the computational complexity of this algorithm. Going from α_t to α_{t+1}: each α_{t+1}(j) takes about M multiplications and additions, so the whole step from α_t to α_{t+1} takes about M^2 of each; over T steps, and counting formula (1), the total is on the order of T·M^2 operations, i.e. O(T·M^2). In terms of complexity this is far better than the brute-force method. In essence, the brute-force method enumerates, while the forward algorithm recurses.
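The forward recursion translates directly into code. A minimal sketch, again with my numpy encoding of the case (the function name `forward_prob` is my own):

```python
import numpy as np

def forward_prob(F, Q, G, x):
    """P(X | lambda) via the forward algorithm.

    alpha[t, i] = P(x_1, ..., x_t, y_t = s_i), the forward probability.
    """
    T, M = len(x), len(F)
    alpha = np.zeros((T, M))
    alpha[0] = F * G[:, x[0]]                  # initial value: alpha_1(i) = F(s_i) * g_i(x1)
    for t in range(1, T):
        # alpha_{t+1}(j) = [ sum_i alpha_t(i) * q_ij ] * g_j(x_{t+1})
        alpha[t] = (alpha[t - 1] @ Q) * G[:, x[t]]
    return alpha[-1].sum()                     # P(X) = sum_i alpha_T(i)

# The case's parameters (states: 0 = rain, 1 = sunny)
F = np.array([0.6, 0.4])
Q = np.array([[0.7, 0.3], [0.4, 0.6]])
G = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
# Observations: outing (0), shopping (1), game (2) -- the case's holiday
p = forward_prob(F, Q, G, [0, 1, 2])
print(p)   # ≈ 0.0336
```

Each loop iteration is one M×M matrix-vector product, which is where the O(T·M^2) cost comes from.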
Classic problem 2
Omitted here. (In practice this learning problem is typically solved with the Baum-Welch algorithm, a special case of expectation-maximization.)
Classic problem 3: the Viterbi algorithm
Problem: given the model λ = (F, Q, G) and the observation sequence X = (x_1, ..., x_T), predict the most likely hidden state sequence Y* = (y_1*, ..., y_T*). The mathematical expression is:

Y* = argmax_Y P(Y | X)

Note: since P(Y | X) = P(X, Y) / P(X) and X has been given, P(X) is a constant, so the problem is equivalent to finding the Y that maximizes the joint probability P(X, Y).
Let's introduce the Viterbi algorithm with the help of a trellis diagram. In the original figure, each solid circle represents a possible state at some time and each square the corresponding observation; arrows between circles indicate hidden state transitions, e.g. moving from one state at time t-1 to another state at time t.

We notice that every state sequence Y = (y_1, ..., y_T) corresponds one-to-one to a path through this diagram: the path starts at the solid circle for state y_1 at time t = 1, follows an arrow to the circle for y_2 at t = 2, and keeps connecting until it reaches the circle for y_T at t = T. Denote the corresponding path by I. So solving for Y* is really finding an optimal path I* through the diagram that maximizes the joint probability P(X, Y). This is Viterbi idea 1. Let's look for such a path.
From time 1 to time t there are M^t different paths in total. Group them by the state at time t into M classes: all paths whose hidden state at time t is s_1 form one class, those reaching s_2 form another, and so on, giving classes for s_1, ..., s_M. The problem of finding, among all paths, the one that maximizes P(x_1, ..., x_t, y_1, ..., y_t) is thereby transformed: first find, within each of the M classes, the path that maximizes this probability, and then pick the best among those M paths. We have decomposed one problem into M subproblems. This is Viterbi idea 2.

Now fix the state at time t, say y_t = s_i. Within the class of paths ending at s_i, record the path that maximizes the joint probability, and denote the maximal probability by

δ_t(i) = max_{y_1, ..., y_{t-1}} P(x_1, ..., x_t, y_1, ..., y_{t-1}, y_t = s_i)

In addition, we need to record the state of the second-to-last node on that path, i.e. the state at time t-1, written ψ_t(i).
By the definition of δ, the corresponding quantity at time t+1 is:

δ_{t+1}(j) = max_{y_1, ..., y_t} P(x_1, ..., x_{t+1}, y_1, ..., y_t, y_{t+1} = s_j)

Let's derive the relationship between δ_{t+1} and δ_t. Notice that x_{t+1} is independent of y_1, ..., y_t and depends only on y_{t+1}, so for a path that passes through y_t = s_i on its way to y_{t+1} = s_j:

P(x_1, ..., x_{t+1}, y_1, ..., y_{t-1}, y_t = s_i, y_{t+1} = s_j)
  = g_j(x_{t+1}) · q_ij · P(x_1, ..., x_t, y_1, ..., y_{t-1}, y_t = s_i)    (2)

Maximizing both sides of equation (2), first over the variables y_1, ..., y_{t-1} and then over the variable y_t (i.e. over i), gives:

δ_{t+1}(j) = g_j(x_{t+1}) · max_{1≤i≤M} [ δ_t(i) q_ij ]    (3)

the last maximum being selected from the values δ_t(1) q_{1j}, ..., δ_t(M) q_{Mj}. This establishes the recursion between δ_{t+1} and δ_t.

The recursion says: if we have found, from time 1 to time t, the maximal probabilities δ_t(i) for all possible states s_i, then multiplying each by the transition probability q_ij, taking the maximum of these products, and finally multiplying by the observation probability g_j(x_{t+1}) yields δ_{t+1}(j), the maximal probability of any path from time 1 to time t+1 ending in state s_j. This recursion can be pictured directly on the trellis diagram.
The optimal path is the one with the largest probability product. Suppose the largest of δ_T(1), ..., δ_T(M) is δ_T(i); then we can at least determine the last two nodes of the optimal path, namely y_T* = s_i and y_{T-1}* = ψ_T(i). How do we find the earlier nodes of the optimal path? In fact, the recurrence formula (3) implies a property: the first t nodes of the optimal path attaining δ_{t+1}(j) themselves form the optimal path ending at y_t = ψ_{t+1}(j). The reason: once the states i and j are fixed, formula (3) shows that the probability of the whole path is directly proportional to the probability of its first t nodes, so that prefix must itself be the optimal path into s_i. If it were not, then replacing it by the optimal path into s_i would produce a path into s_j with probability larger than δ_{t+1}(j), a contradiction.
Based on the above, the optimal path can be found by tracing back through the recorded states ψ, as follows:

(1) find the i that makes δ_T(i) maximal; s_i is then the state at the final time T;
(2) ψ_T(i), recorded at time T, is the state of node T-1 on the optimal path;
(3) the ψ value recorded at time t gives the state at time t-1, and so on...

Recursing back to the state at time t = 1 yields the whole sequence Y* = (y_1*, ..., y_T*).
The basic steps of the Viterbi algorithm:

(1) from the initial values δ_1(i) = F(s_i) g_i(x_1) and the recursion (3), compute δ_t(i) for every time t, recording at each step the previous node's state ψ_t(i);
(2) find the largest among δ_T(1), ..., δ_T(M); its path is the one we seek, and this immediately gives the last two node states s_i and ψ_T(i);
(3) using the recorded ψ values, trace back through the states at all times to obtain Y*.
Viterbi algorithm worked example
Now let's use the "Xiaobao's holiday" case to get familiar with the Viterbi algorithm.
(1) At time t = 1, the first day, the possible states are {rain, sunny}, and Xiaobao chose an outing. Since there is only a single step from the start into each initial state, we have

δ_1(sunny) = P(x1 = outing, y1 = sunny) = P(x1 = outing | y1 = sunny)P(y1 = sunny) = 0.6*0.4 = 0.24
δ_1(rain) = P(x1 = outing, y1 = rain) = P(x1 = outing | y1 = rain)P(y1 = rain) = 0.1*0.6 = 0.06

Because there is no state before day 1, ψ_1 does not exist.
(2) At time t = 2, the second day, Xiaobao chose shopping. By the recursion δ_2(j) = g_j(shopping) · max_i{δ_1(i) q_ij}, we have:

δ_2(sunny) = P(x2 = shopping | y2 = sunny) * max{P(y2 = sunny | y1 = sunny) δ_1(sunny), P(y2 = sunny | y1 = rain) δ_1(rain)}
           = 0.3 * max{0.6*0.24, 0.3*0.06} = 0.3 * max{0.144, 0.018} = 0.0432
ψ_2(sunny) = sunny

Similarly,

δ_2(rain) = P(x2 = shopping | y2 = rain) * max{P(y2 = rain | y1 = sunny) δ_1(sunny), P(y2 = rain | y1 = rain) δ_1(rain)}
          = 0.4 * max{0.4*0.24, 0.7*0.06} = 0.4 * max{0.096, 0.042} = 0.0384
ψ_2(rain) = sunny
(3) At time t = 3, the third day, Xiaobao chose the game. By the same recursion:

δ_3(sunny) = P(x3 = game | y3 = sunny) * max{P(y3 = sunny | y2 = sunny) δ_2(sunny), P(y3 = sunny | y2 = rain) δ_2(rain)}
           = 0.1 * max{0.6*0.0432, 0.3*0.0384} = 0.1 * max{0.02592, 0.01152} = 0.002592
ψ_3(sunny) = sunny

δ_3(rain) = P(x3 = game | y3 = rain) * max{P(y3 = rain | y2 = sunny) δ_2(sunny), P(y3 = rain | y2 = rain) δ_2(rain)}
          = 0.5 * max{0.4*0.0432, 0.7*0.0384} = 0.5 * max{0.01728, 0.02688} = 0.01344
ψ_3(rain) = rain
Now trace back: since δ_3(rain) > δ_3(sunny), day 3 was rainy; ψ_3(rain) = rain, so day 2 was rainy; ψ_2(rain) = sunny, so day 1 was sunny.

The inferred weather over the 3 days is: sunny, rain, rain.
Viterbi algorithm code implementation
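The original post's code listing under this heading did not survive. Below is a minimal sketch of the Viterbi algorithm for the case (assuming numpy; the function name `viterbi` and the state encoding are my own), which reproduces the backtracked answer sunny, rain, rain:

```python
import numpy as np

def viterbi(F, Q, G, x):
    """Most likely hidden state sequence for observations x, given lambda = (F, Q, G)."""
    T, M = len(x), len(F)
    delta = np.zeros((T, M))            # delta[t, i]: max joint prob of a path ending in state i
    psi = np.zeros((T, M), dtype=int)   # psi[t, i]: argmax predecessor state on that path
    delta[0] = F * G[:, x[0]]           # delta_1(i) = F(s_i) * g_i(x1)
    for t in range(1, T):
        # delta_{t+1}(j) = g_j(x_{t+1}) * max_i [ delta_t(i) * q_ij ]   -- recursion (3)
        trans = delta[t - 1][:, None] * Q      # trans[i, j] = delta_t(i) * q_ij
        psi[t] = trans.argmax(axis=0)
        delta[t] = G[:, x[t]] * trans.max(axis=0)
    # Backtrack from the most probable final state using the recorded psi values
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# The case's parameters (states: 0 = rain, 1 = sunny)
F = np.array([0.6, 0.4])
Q = np.array([[0.7, 0.3], [0.4, 0.6]])
G = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
# Observations: outing (0), shopping (1), game (2)
best = viterbi(F, Q, G, [0, 1, 2])
print([["rain", "sunny"][s] for s in best])   # → ['sunny', 'rain', 'rain']
```

The intermediate `delta` rows match the hand computation above: (0.06, 0.24) on day 1, (0.0384, 0.0432) on day 2, and (0.01344, 0.002592) on day 3.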
HMM model applications
Chinese word segmentation
Stock market analysis
Copyright notice
This article was created by [wx5af853e4b9fed]; please include a link to the original when reposting. Thanks.
https://cdmana.com/2021/07/20210716095409479t.html