编程知识 cdmana.com

KMP (meikaisandu data structure detailed explanation Edition)

Preface

KMP The algorithm is a string matching algorithm , The top priority is next The construction of arrays , The simplicity and magic of its code make it widely concerned .

But it's not hard to find ,acm What I learned KMP And data structures KMP Not the same. o(︶︿︶)o

I wrote about it before acm Version of KMP, Poke it here

Now write a data structure version of KMP, Easy to cope with the coming data structure test ( acutely

Tear your hands next Array

Let's review first acm edition next Array :next[i] It's a partial match , That is, the length of the longest common element of the prefix and suffix

And the data structure version of next An array is when a match fails , Matching string j Where the pointer should point ( namely next[j])

These two are essentially , Mismatches always point to next[j], But because of acm The subscript of the input string is from 0 Start , And the data structures are all from 1 Start , All will make a difference

This paper mainly introduces how to quickly tear a string when you are given a test next Array

Have a look first next The formula for arrays :

 Insert picture description here

This kind of bird formula is only used by fools

Positive solution :

  • First of all, for the first two :next[1] = 0; next[2] = 1;( Be careful , Subscript from 1 Start )
  • Everyone in the back next Value solving : Compare with the previous one
    • Change the previous character With the previous one next Value as the subscript of the corresponding character for comparison
    • equal , Then the next The value is the first one next Value plus 1
    • Unequal , Keep looking forward next Value to compare with the previous one , Until you find something in a bit next The content corresponding to the value is equal to the previous one , Then the value corresponding to this bit plus 1 That is the demand of next value
    • If you find the first one, it doesn't match , It is changed to next The value is 1.

for instance :abaabcac

  1. next[1] = 0

  2. next[2] = 1

  3. seek next[3] To judge whether the previous character is the same as the previous character next Corresponding characters , Discover the difference , At this point, we have matched to the first , It's not the same yet , be next The value is 1

    S[2] != S[next[2]], And match to the first , so next[3] = 1

    aba

    [0, 1, 1]

  4. seek next[4] Then judge the previous character a And The first next[3] Corresponding characters a Compare , Found the same , be next[4] = next[3] + 1 = 2

    S[3] = S[next[3]], so S[4] = S[3] + 1 = 2

    abaa

    [0, 1, 1, 2]

  5. seek next[5] Then judge the first one (4) Of a With the previous (4) Of next[4] Corresponding characters b comparison , Find different , Just keep using the first one (4) The characters of a And next[4] Of the corresponding character next value (2) Corresponding characters a Compare , Found the same , be next[5] = next[next[4]] + 1, That is to say next[5] = next[2] + 1 = 2

    S[4] != S[next[4]] --->. S[4] = S[next[next[4]]], so next[5] = next[next[4]] + 1 = 2

    abaab

    [0, 1, 1, 2, 2]

  6. seek next[6] To judge the fifth b And the fifth next The character corresponding to the value b, Found the same , be next[6] = next[5] + 1

    S[5] = S[next[5]], so next[6] = next[5] + 1 = 3

    abaabc

    [0, 1, 1, 2, 2, 3]

  7. seek next[7] To judge the second 6 Bit c With the first next[6] Character corresponding to bit , Find different , Take the number 6 Bit c With the first next[next[6]] Corresponding a comparison , Find different , And match to the first , so next[7] = 1

    S[6] != S[next[6]]--->next[6] != S[next[next[6]]], And next[next[6]] = 1, That is, it's different to match to the first one , be next[7] = 1

    abaabca

    [0, 1, 1, 2, 2, 3, 1]

  8. seek next[8] To judge the second 7 Bit a And next[7] Corresponding a Compare , Found the same , be next[8] = next[7] + 1

    S[7] = S[next[7]], so next[8] = next[7] + 1 = 2

    abaabcac

    [0, 1, 1, 2, 2, 3, 1, 2]

Tear your hands nextval Array

nextval The array is right next An optimized version of the array

for example :

Match string S:aaaab

Pattern string T:aaabaaaab

Matching string next[] = {0,1, 2, 3, 4}

When the matching string and pattern string are mismatched in the fourth position , Pointing to the pattern string i It is the same. , Pointing to the matching string j It needs to become next[j] , You need to T[4] And S[3] Compare , Will find , Still different , Just let the pointer j Keep jumping , It's worth it , Will find T[4] And S[3] S[2] S[1] All of them are compared , But if we look at each other, we'll find that ,S[1] = S[2] = S[3] = S[4] = a, according to S[4] != T[4], so S[1] 、S[2] 、S[3] None of them is equal to T[4], It's like these three comparisons are useless , This is it. next Where arrays need to be optimized , So I put forward nextval Array to optimize

Tear your hands nextval Arrays have two methods :

Law 1. Try to think :

Imagine matching strings S And mode string T In the i position (1<= i <= S.size()) When it's mismatched , Look at the best case , What is the maximum length that the head of the matching string can overlap the tail of the pattern string , It's actually the offset ( set up S[1] Move to i + 1 The offset of the position representation is 0,S[1] Move to i The offset of the position representation is 1, And so on )

take aaaab for instance :

  1. nextval[1] = 0

  2. When the second character mismatches , The first character is exactly the same

    S:aa

    T:aXYYYYYY(X For the wrong a Any character of , Y For any character )

    We from T The second place of starts with S Compare it to :

    aXYYYY

    aa

    because X Not for a, So matching failed , Continue from T The third place of starts with S Match

    Because from the third place, it's all X, so T It could be aXaa……, And you can match it , According to the definition of the offset we assumed above , We get an offset of 0

  3. The third character mismatch is the same as the second , nextval[3] = 0

  4. The fourth character mismatch is the same as the second , nextval[4] = 0

  5. When the fifth character mismatches , The first four must be exactly the same , so :

    S:aaaab

    T:aaaaXYYYY……(X

    alike , Let's start with the second , Will find :S[2] = T[2],S[3] = T[3],S[4] = T[4], about T[5] He except b You can take anything else , So you can take a, Then the fifth digit can also match , Match success

    aaaaXYYYYY

    aaaab

    In this case, the offset length is 4( Offset string :aaaa)

    so nextval[5] = 4

Law 2: With the help of next Array to find nextval

in general : The difference is next value , If you want to be the same, you can continue to compare , Until you find a different or first , The first one is 0

Take the first example mentioned above to explain :

abaabcac

next= {0, 1, 1, 2, 2, 3, 1, 2}

  1. nextval[1] = 0
  2. S[2] != S[next[2]], so nextval[2] = next[2] = 1
  3. S[3] = S[next[3]] ---> Ran to the first , so nextval[3] = 0
  4. S[4] != S[next[4]], so nextval[4] = next[4] = 2
  5. S[5] = S[next[5]] ---> S[next[5]] != S[next[next[5]]], so S[5] = next[next[5]] = 1
  6. S[6] != S[next[6]], so nextval[6] = next[6] = 3
  7. S[7] = S[next[7]], And ran to the first position , so next[7] = 0
  8. S[8] != S[next[8]], so nextval[8] = next[8] = 2

For both methods , Personal feeling method 2 is much simpler , But the premise is that next The array works out , And it has to be right , Or it's cool (>_<)

Here's another post next Array and nextval Array code :

void getnext(string s){
    s = " " + s;// because next Array from 1 Start , String from 0 Start , So add a space prefix 
    int i = 1, j = 0;
    nextt[1] = 0;
    while (i < s.size()) {
        if(j == 0 || s[i] == s[j]){
            nextt[++i] = ++j;
        }
        else j = nextt[j];
    }
}

void getnextval(string s){
    s = ' ' + s;// Same as above 
    int i = 1, j = 0;
    nextval[1] = 0;
    while (i < s.size()) {
        if(j == 0 || s[i] == s[j]){
            ++i;++j;
            if(s[i] != s[j])nextval[i] = j;
            else nextval[i] = nextval[j];
        }
        else j = nextval[j];
    }
}

I'm not in the data structure class because I didn't attend the class , Just came to write a blog ⁄(⁄ ⁄ ⁄ω⁄ ⁄ ⁄)⁄

版权声明
本文为[Fan and she are lost]所创,转载请带上原文链接,感谢
https://cdmana.com/2021/04/20210430230656304P.html

Scroll to Top