## Preface

KMP The algorithm is a string matching algorithm , The top priority is next The construction of arrays , The simplicity and magic of its code make it widely concerned .

But it's not hard to find ,acm What I learned KMP And data structures KMP Not the same. o(︶︿︶)o

I wrote about it before acm Version of KMP, Poke it here

Now write a data structure version of KMP, Easy to cope with the coming data structure test （ acutely

## Tear your hands next Array

Let's review first acm edition next Array ：next[i] It's a partial match , That is, the length of the longest common element of the prefix and suffix

And the data structure version of next An array is when a match fails , Matching string j Where the pointer should point （ namely next[j]）

These two are essentially , Mismatches always point to next[j], But because of acm The subscript of the input string is from 0 Start , And the data structures are all from 1 Start , All will make a difference

This paper mainly introduces how to quickly tear a string when you are given a test next Array

Have a look first next The formula for arrays ：

This kind of bird formula is only used by fools

### Positive solution ：

• First of all, for the first two ：next[1] = 0; next[2] = 1;（ Be careful , Subscript from 1 Start ）
• Everyone in the back next Value solving ： Compare with the previous one
• Change the previous character With the previous one next Value as the subscript of the corresponding character for comparison
• equal , Then the next The value is the first one next Value plus 1
• Unequal , Keep looking forward next Value to compare with the previous one , Until you find something in a bit next The content corresponding to the value is equal to the previous one , Then the value corresponding to this bit plus 1 That is the demand of next value
• If you find the first one, it doesn't match , It is changed to next The value is 1.

for instance ：abaabcac

1. next[1] = 0

2. next[2] = 1

3. seek next[3] To judge whether the previous character is the same as the previous character next Corresponding characters , Discover the difference , At this point, we have matched to the first , It's not the same yet , be next The value is 1

S[2] != S[next[2]], And match to the first , so next[3] = 1

aba

[0, 1, 1]

4. seek next[4] Then judge the previous character a And The first next[3] Corresponding characters a Compare , Found the same , be next[4] = next[3] + 1 = 2

S[3] = S[next[3]], so S[4] = S[3] + 1 = 2

abaa

[0, 1, 1, 2]

5. seek next[5] Then judge the first one (4) Of a With the previous (4) Of next[4] Corresponding characters b comparison , Find different , Just keep using the first one (4) The characters of a And next[4] Of the corresponding character next value (2) Corresponding characters a Compare , Found the same , be next[5] = next[next[4]] + 1, That is to say next[5] = next[2] + 1 = 2

S[4] != S[next[4]] --->. S[4] = S[next[next[4]]], so next[5] = next[next[4]] + 1 = 2

abaab

[0, 1, 1, 2, 2]

6. seek next[6] To judge the fifth b And the fifth next The character corresponding to the value b, Found the same , be next[6] = next[5] + 1

S[5] = S[next[5]], so next[6] = next[5] + 1 = 3

abaabc

[0, 1, 1, 2, 2, 3]

7. seek next[7] To judge the second 6 Bit c With the first next[6] Character corresponding to bit , Find different , Take the number 6 Bit c With the first next[next[6]] Corresponding a comparison , Find different , And match to the first , so next[7] = 1

S[6] != S[next[6]]--->next[6] != S[next[next[6]]], And next[next[6]] = 1, That is, it's different to match to the first one , be next[7] = 1

abaabca

[0, 1, 1, 2, 2, 3, 1]

8. seek next[8] To judge the second 7 Bit a And next[7] Corresponding a Compare , Found the same , be next[8] = next[7] + 1

S[7] = S[next[7]], so next[8] = next[7] + 1 = 2

abaabcac

[0, 1, 1, 2, 2, 3, 1, 2]

## Tear your hands nextval Array

nextval The array is right next An optimized version of the array

for example ：

Match string S：aaaab

Pattern string T：aaabaaaab

Matching string next[] = {0,1, 2, 3, 4}

When the matching string and pattern string are mismatched in the fourth position , Pointing to the pattern string i It is the same. , Pointing to the matching string j It needs to become next[j] , You need to T[4] And S[3] Compare , Will find , Still different , Just let the pointer j Keep jumping , It's worth it , Will find T[4] And S[3] S[2] S[1] All of them are compared , But if we look at each other, we'll find that ,S[1] = S[2] = S[3] = S[4] = a, according to S[4] != T[4], so S[1] 、S[2] 、S[3] None of them is equal to T[4], It's like these three comparisons are useless , This is it. next Where arrays need to be optimized , So I put forward nextval Array to optimize

### Tear your hands nextval Arrays have two methods ：

#### Law 1. Try to think ：

Imagine matching strings S And mode string T In the i position （1<= i <= S.size()) When it's mismatched , Look at the best case , What is the maximum length that the head of the matching string can overlap the tail of the pattern string , It's actually the offset （ set up S[1] Move to i + 1 The offset of the position representation is 0,S[1] Move to i The offset of the position representation is 1, And so on ）

take aaaab for instance ：

1. nextval[1] = 0

2. When the second character mismatches , The first character is exactly the same

S：aa

T：aXYYYYYY（X For the wrong a Any character of , Y For any character ）

We from T The second place of starts with S Compare it to ：

aXYYYY

aa

because X Not for a, So matching failed , Continue from T The third place of starts with S Match

Because from the third place, it's all X, so T It could be aXaa……, And you can match it , According to the definition of the offset we assumed above , We get an offset of 0

3. The third character mismatch is the same as the second , nextval[3] = 0

4. The fourth character mismatch is the same as the second , nextval[4] = 0

5. When the fifth character mismatches , The first four must be exactly the same , so ：

S：aaaab

T：aaaaXYYYY……（X

alike , Let's start with the second , Will find ：S[2] = T[2],S[3] = T[3],S[4] = T[4], about T[5] He except b You can take anything else , So you can take a, Then the fifth digit can also match , Match success

aaaaXYYYYY

aaaab

In this case, the offset length is 4（ Offset string ：aaaa）

so nextval[5] = 4

#### Law 2： With the help of next Array to find nextval

in general ： The difference is next value , If you want to be the same, you can continue to compare , Until you find a different or first , The first one is 0

Take the first example mentioned above to explain ：

abaabcac

next= {0, 1, 1, 2, 2, 3, 1, 2}

1. nextval[1] = 0
2. S[2] != S[next[2]], so nextval[2] = next[2] = 1
3. S[3] = S[next[3]] ---> Ran to the first , so nextval[3] = 0
4. S[4] != S[next[4]], so nextval[4] = next[4] = 2
5. S[5] = S[next[5]] ---> S[next[5]] != S[next[next[5]]], so S[5] = next[next[5]] = 1
6. S[6] != S[next[6]], so nextval[6] = next[6] = 3
7. S[7] = S[next[7]], And ran to the first position , so next[7] = 0
8. S[8] != S[next[8]], so nextval[8] = next[8] = 2

For both methods , Personal feeling method 2 is much simpler , But the premise is that next The array works out , And it has to be right , Or it's cool (>_<)

Here's another post next Array and nextval Array code ：

``````void getnext(string s){
s = " " + s;// because next Array from 1 Start , String from 0 Start , So add a space prefix
int i = 1, j = 0;
nextt[1] = 0;
while (i < s.size()) {
if(j == 0 || s[i] == s[j]){
nextt[++i] = ++j;
}
else j = nextt[j];
}
}

void getnextval(string s){
s = ' ' + s;// Same as above
int i = 1, j = 0;
nextval[1] = 0;
while (i < s.size()) {
if(j == 0 || s[i] == s[j]){
++i;++j;
if(s[i] != s[j])nextval[i] = j;
else nextval[i] = nextval[j];
}
else j = nextval[j];
}
}
``````

I'm not in the data structure class because I didn't attend the class , Just came to write a blog ⁄(⁄ ⁄ ⁄ω⁄ ⁄ ⁄)⁄

https://cdmana.com/2021/04/20210430230656304P.html

Scroll to Top