This paper introduces several common string matching algorithms , Respectively
- BF Algorithm （Brute Force, It's the brute force algorithm ）
- KMP Algorithm （Knuth-Morria-Pratt Algorithm , In fact, this is what these three people put forward together ）
- BM Algorithm （Boyer-Moore）
- Sunday Algorithm （ from Daniel M.Sunday stay 1990 in ）
Brute Force Algorithm
That's what we call violent algorithms .
seeing the name of a thing one thinks of its function , If there are two strings ：
- M（ Main string , Also known as text string / Be compared /m Characters ）
- N（ Pattern string / Compare it to /n Characters ）
As shown in the figure ：
Compare from left to right , At the beginning of the , If the same M The next one ,N The next one continues to compare with , If not, then M From the second place ,N Compare... From the first place , And so on .
BF The time complexity of the algorithm ：
At worst N The comparison of the two should be circular m Time , And everyone has to be compared , So the worst time complexity is O(n*m)
KMP The algorithm described in text is not intuitive , It needs to be constantly changing , So here I'm moving directly to my brother Carl In the video , It's very clear , You can also focus on a wave of .
next Function construction
KMP Time complexity of ：
We found that if a character matches successfully , The position of the first character of the pattern string remains unchanged , just i++、j++; If the match doesn't match ,i unchanged （ namely i Don't go back ）, The pattern string will skip the matched next [j] Characters . The worst-case scenario for the whole algorithm is , When the first character of the pattern string is at i - j Only when the position is matched successfully , The algorithm ends .
therefore , If the length of the text string is n, The length of the pattern string is m, Then the time complexity of the matching process is O(n), Count it next Of O(m) Time ,KMP The overall time complexity of is O(m + n).
Bad character rules ：
- When a character in a text string does not match a character in a pattern string , We call this mismatch character in the text string a bad character , At this point, the pattern string needs to move to the right , The number of digits moved = Position of bad character in pattern string - The right most position of a bad character in the pattern string .
- Besides , If " Bad characters " Not included in the pattern string , Then the position on the far right is -1（ Don't worry so much , Is to move the pattern string to the next digit of the mismatched character in the text string , It will be shown in the picture later ）
Good Suffix ：
- When the characters are mismatched , Move the number backward = The position of the good suffix in the pattern string - The first time a good suffix appears on the pattern string .
- If the good suffix doesn't reappear in the pattern string , Then for -1.（ And the bad character rule above 2 Agreement ）
1. Last character comparison , Bad characters S, And EXAMPLE There's no S, Move the string directly to the mode S The latter
2. Last character comparison , Bad characters P,EXAMPLE Li Han P, So the pattern string of P With text string P alignment
3. Last character comparison , The same goes forward than , You'll find a good suffix E,LE,PLE,MPLE, Until the bad character I, At this point, use the suffix rule , Because the only good suffix is E Corresponding EXAMPLE The prefix of , So the pattern string of E And... In the text string E alignment
4. Last character comparison , Bad characters P, Same step 2, Backward matching P, Find the end result
- If the character does not appear in the pattern string, it will skip , That is, the number of moving digits = Match string length + 1;
- otherwise , The number of moving bits = Mode string length - The rightmost position of the character ( With 0 Start ) = The distance from the rightmost position of the character in the pattern string to the end of the character + 1.
1. First character comparison , atypism ,“ Bad characters ” Is a space , And there are no spaces in the pattern string , Fly straight back
2. First character comparison , atypism ,“ Bad characters ” by E, There is... In the pattern string E, Take the one on the right E And... In the text string E alignment
3. First character comparison , atypism ,“ Bad characters ” Is a space , And there are no spaces in the pattern string , Fly straight back
4. Match is found , And it can be observed that Sunday Algorithm ratio BM Algorithm faster
Time complexity ：
BF Algorithm ： best O(m+n) , The worst O(mn)
KMP Algorithm ： O(m+n)
BM Algorithm ： best O(n/m) , The worst O(mn)
Sunday Algorithm ： Average O(n) , The worst O(mn), In general Sunday be better than BM Algorithm