## Preface

KMP The algorithm is a string matching algorithm , The top priority is next The construction of arrays , The simplicity and magic of its code make it widely concerned .

But it's not hard to find ,acm What I learned KMP And data structures KMP Not the same. o(︶︿︶)o

I wrote about it before acm Version of KMP, Poke it here

Now write a data structure version of KMP, Easy to cope with the coming data structure test （ acutely

## Tear your hands next Array

Let's review first acm edition next Array ：next[i] It's a partial match , That is, the length of the longest common element of the prefix and suffix

And the data structure version of next An array is when a match fails , Matching string j Where the pointer should point （ namely next[j]）

These two are essentially , Mismatches always point to next[j], But because of acm The subscript of the input string is from 0 Start , And the data structures are all from 1 Start , All will make a difference

This paper mainly introduces how to quickly tear a string when you are given a test next Array

Have a look first next The formula for arrays ： This kind of bird formula is only used by fools

### Positive solution ：

• First of all, for the first two ：next = 0; next = 1;（ Be careful , Subscript from 1 Start ）
• Everyone in the back next Value solving ： Compare with the previous one
• Change the previous character With the previous one next Value as the subscript of the corresponding character for comparison
• equal , Then the next The value is the first one next Value plus 1
• Unequal , Keep looking forward next Value to compare with the previous one , Until you find something in a bit next The content corresponding to the value is equal to the previous one , Then the value corresponding to this bit plus 1 That is the demand of next value
• If you find the first one, it doesn't match , It is changed to next The value is 1.

for instance ：abaabcac

1. next = 0

2. next = 1

3. seek next To judge whether the previous character is the same as the previous character next Corresponding characters , Discover the difference , At this point, we have matched to the first , It's not the same yet , be next The value is 1

S != S[next], And match to the first , so next = 1

aba

[0, 1, 1]

4. seek next Then judge the previous character a And The first next Corresponding characters a Compare , Found the same , be next = next + 1 = 2

S = S[next], so S = S + 1 = 2

abaa

[0, 1, 1, 2]

5. seek next Then judge the first one (4) Of a With the previous (4) Of next Corresponding characters b comparison , Find different , Just keep using the first one (4) The characters of a And next Of the corresponding character next value (2) Corresponding characters a Compare , Found the same , be next = next[next] + 1, That is to say next = next + 1 = 2

S != S[next] --->. S = S[next[next]], so next = next[next] + 1 = 2

abaab

[0, 1, 1, 2, 2]

6. seek next To judge the fifth b And the fifth next The character corresponding to the value b, Found the same , be next = next + 1

S = S[next], so next = next + 1 = 3

abaabc

[0, 1, 1, 2, 2, 3]

7. seek next To judge the second 6 Bit c With the first next Character corresponding to bit , Find different , Take the number 6 Bit c With the first next[next] Corresponding a comparison , Find different , And match to the first , so next = 1

S != S[next]--->next != S[next[next]], And next[next] = 1, That is, it's different to match to the first one , be next = 1

abaabca

[0, 1, 1, 2, 2, 3, 1]

8. seek next To judge the second 7 Bit a And next Corresponding a Compare , Found the same , be next = next + 1

S = S[next], so next = next + 1 = 2

abaabcac

[0, 1, 1, 2, 2, 3, 1, 2]

## Tear your hands nextval Array

nextval The array is right next An optimized version of the array

for example ：

Match string S：aaaab

Pattern string T：aaabaaaab

Matching string next[] = {0,1, 2, 3, 4}

When the matching string and pattern string are mismatched in the fourth position , Pointing to the pattern string i It is the same. , Pointing to the matching string j It needs to become next[j] , You need to T And S Compare , Will find , Still different , Just let the pointer j Keep jumping , It's worth it , Will find T And S S S All of them are compared , But if we look at each other, we'll find that ,S = S = S = S = a, according to S != T, so S 、S 、S None of them is equal to T, It's like these three comparisons are useless , This is it. next Where arrays need to be optimized , So I put forward nextval Array to optimize

### Tear your hands nextval Arrays have two methods ：

#### Law 1. Try to think ：

Imagine matching strings S And mode string T In the i position （1<= i <= S.size()) When it's mismatched , Look at the best case , What is the maximum length that the head of the matching string can overlap the tail of the pattern string , It's actually the offset （ set up S Move to i + 1 The offset of the position representation is 0,S Move to i The offset of the position representation is 1, And so on ）

take aaaab for instance ：

1. nextval = 0

2. When the second character mismatches , The first character is exactly the same

S：aa

T：aXYYYYYY（X For the wrong a Any character of , Y For any character ）

We from T The second place of starts with S Compare it to ：

aXYYYY

aa

because X Not for a, So matching failed , Continue from T The third place of starts with S Match

Because from the third place, it's all X, so T It could be aXaa……, And you can match it , According to the definition of the offset we assumed above , We get an offset of 0

3. The third character mismatch is the same as the second , nextval = 0

4. The fourth character mismatch is the same as the second , nextval = 0

5. When the fifth character mismatches , The first four must be exactly the same , so ：

S：aaaab

T：aaaaXYYYY……（X

alike , Let's start with the second , Will find ：S = T,S = T,S = T, about T He except b You can take anything else , So you can take a, Then the fifth digit can also match , Match success

aaaaXYYYYY

aaaab

In this case, the offset length is 4（ Offset string ：aaaa）

so nextval = 4

#### Law 2： With the help of next Array to find nextval

in general ： The difference is next value , If you want to be the same, you can continue to compare , Until you find a different or first , The first one is 0

Take the first example mentioned above to explain ：

abaabcac

next= {0, 1, 1, 2, 2, 3, 1, 2}

1. nextval = 0
2. S != S[next], so nextval = next = 1
3. S = S[next] ---> Ran to the first , so nextval = 0
4. S != S[next], so nextval = next = 2
5. S = S[next] ---> S[next] != S[next[next]], so S = next[next] = 1
6. S != S[next], so nextval = next = 3
7. S = S[next], And ran to the first position , so next = 0
8. S != S[next], so nextval = next = 2

For both methods , Personal feeling method 2 is much simpler , But the premise is that next The array works out , And it has to be right , Or it's cool (>_<)

Here's another post next Array and nextval Array code ：

``````void getnext(string s){
s = " " + s;// because next Array from 1 Start , String from 0 Start , So add a space prefix
int i = 1, j = 0;
nextt = 0;
while (i < s.size()) {
if(j == 0 || s[i] == s[j]){
nextt[++i] = ++j;
}
else j = nextt[j];
}
}

void getnextval(string s){
s = ' ' + s;// Same as above
int i = 1, j = 0;
nextval = 0;
while (i < s.size()) {
if(j == 0 || s[i] == s[j]){
++i;++j;
if(s[i] != s[j])nextval[i] = j;
else nextval[i] = nextval[j];
}
else j = nextval[j];
}
}
``````

I'm not in the data structure class because I didn't attend the class , Just came to write a blog ⁄(⁄ ⁄ ⁄ω⁄ ⁄ ⁄)⁄

https://cdmana.com/2021/04/20210430230656304P.html

Scroll to Top