编程知识 cdmana.com

Data structure learning (6) -- sorting

( One ) Sort

Sort input is a collection of records , The output is also a collection of records .

There are two main operations in the sorting process : Compare and Move .

1.  Concept

Suppose it contains n The sequence of records is {r1,r2,……,rn}, The corresponding keywords are {k1,k2,……,kn}, To be determined 1,2,……,n An arrangement of p1,p2,……,pn, Make its corresponding keyword meet kp1≤kp2≤……≤kpn( Not decreasing or increasing ) Relationship , Even if the sequence becomes an ordered sequence by keyword {rp1,rp2,……,rpn}, Such an operation is called sorting .

(1)  stability and Instability

In a column of numbers , If Ai=Aj , And Ai be ranked at Aj Before . After sorting algorithm Ai Still in Aj Before . This sort algorithm is stable . contrary , This algorithm is unstable .

(2)  Internal order and External sorting

According to whether all the elements are loaded into memory during sorting . Sorting can be divided into inner sort and outer sort .

Internal order : In the sorting process , All the elements are loaded into memory .

External sorting : Too many records , Can't load all elements into memory . In the whole sorting process, we need to exchange data between internal and external storage for many times .

There are three aspects that affect inwardness

(1)  Time performance

Compare as little as possible and move as little as possible .

(2)  Auxiliary space

Additional storage space required to execute the algorithm .

(3)  The complexity of the algorithm

2.  Bubble sort

(1)  Definition

It's a typical exchange sort . Compare the key words of adjacent records , Exchange if reverse order , Until there are no records in reverse order .

(2)  Algorithm implementation

It's realized by a double loop , among The outer loop is used to control the number of sorting rounds , Generally, the length of the array to be sorted minus 1 Time , Because the last loop only leaves the last array element , There's no need to compare , At the same time, the array has finished sorting . and The inner loop is mainly used to compare the size of each adjacent element in the array , To determine whether to swap positions , The number of comparisons and exchanges decreases with the number of sorting rounds .

(3) 3 Ways of planting

(1)  Mode one : Usually we write the most bubble sort

It's not bubble sort in nature . Because it doesn't meet the definition of bubble sort . It's not a pairwise comparison between neighbors . It is , One element is compared to each of the following elements . If it's in reverse order , Then exchange .

The disadvantage is that : The process of each sort , I just chose No I The correct number up to , It doesn't help the rest of the data so far . such as : It's possible to swap the next smaller value to the last .

  1. public int[] bubbleSort1(int[] array) {  
  2. for (int i = 0; i < array.length - 1; i++) {   
  3. for (int j = i + 1; j < array.length; j++) {  
  4. if (array[i] > array[j]) {  
  5. int tmp = array[i];  
  6. array[i] = array[j];  
  7. array[j] = tmp;  
  8. }  
  9. }  
  10. }  
  11. return array;  
  12. }  

 

(2)  Mode two : Pure bubble sort

The first layer controls the number of layers in the loop . The second floor , Start with the last element , Two adjacent elements are compared in pairs , If the reverse order , Exchange the positions of each other's elements .

  1. public int[] bubbleSort2(int[] array) {  
  2. for (int i = 1; i < array.length; i++) {  
  3. for (int j = array.length - 1; j >=i; j--) {  
  4. if (array[j-1] > array[j]) {  
  5. int tmp = array[j];  
  6. array[j] = array[j-1];  
  7. array[j-1] = tmp;  
  8. }  
  9. }  
  10. }  
  11. return array;  

 

(3)  Mode three : Optimized bubble sorting

-- If it's done in order , Sort at any time after abort . From the back forward , If there is no exchange between two , Show that the rest is in order . Abort sort . Optimized a situation : If it's already in order , Discover and interrupt sorting in time .

  1. public int[] bubbleSort3(int[] array) {  
  2. boolean flag = true;  
  3. for (int i = 1; i < array.length && flag; i++) {  
  4. for (int j = array.length - 1; j >=i; j--) {  
  5. flag = false ;  
  6. if (array[j-1] > array[j]) {  
  7. int tmp = array[j];  
  8. array[j] = array[j-1];  
  9. array[j-1] = tmp;  
  10. flag = true;  
  11. }  
  12. }  
  13. }  
  14. return array;  
  15. }  

 

3.  Simple selection sort

(1)  Definition

Simple sorting is through n-1 Compare it to , Choose the smallest record . And i Exchange elements . The train of thought is : Comparison can be made many times , But there's only one exchange . Can be understood as an optimization of the first bubble sort .

Through many comparisons , The subscript that records the smallest data . Then the minimum value is exchanged with the data of the position that should be saved after sorting .

(2)  Code implementation

The train of thought is : Compare many times , Record n-i The subscript of the smallest data among the elements . One exchange . It's less moving than bubbling .

  1. public class SimpleSelectSort {  
  2. public int[] simpleSelectSort(int[] arry) {  
  3. for (int i = 0; i < arry.length-1; i++) {  
  4. int min = i;  
  5. for (int j = i + 1; j < arry.length; j++) {  
  6. if (arry[min] > arry[j]) {  
  7. min = j;  
  8. }  
  9. }  
  10. if(min!=i){  
  11. int tmp = arry[i];  
  12. arry[i] = arry[min];  
  13. arry[min]= tmp;  
  14. }  
  15. }  
  16. return arry;  
  17. }  

4.  Direct insert sort

(1)  Definition

The basic operation of direct insert sorting is to insert a record into an ordered table , So we get a new one 、 Record number increase 1 The order sheet .

understand : Will be n The elements to be sorted are treated as an ordered table and an unordered table , At first, there is only one element in the ordered table , There are... In the unordered list n-1 Elements , The first element is taken out of the unordered table every time during sorting , Insert it at the specified location in the ordered table , Make it a new ordered list , repeat n-1 The sorting process can be completed .

Code thinking :

One n Order of elements C Column can be understood as “ An ordered sequence of elements A” and “n-1 An unordered sequence of elements B”, be A=C[1],B=C[2,n-2]. from B The first element of begins , take B[0] Put the value of C[0] As a sentinel in , then , This is the sum of A The elements in the middle are compared from back to front , If A[i]> The sentinel value , be A[i] Move back a bit . If A[i]<= The sentinel value . Then loop comparison exits , And put the Sentinel's value into A[i] Location . The specific code implementation is as follows :

(2)  Code implementation
  1. public int[] sort(int[] array) {  
  2. int tmp, j = 0;  
  3. for (int i = 2; i < array.length; i++) {  
  4. if (array[i] < array[i - 1]) { // Need to put array[i] Insert ordered table   
  5. array[0] = array[i];//  Set up a sentry   
  6. for (j = i - 1; array[j] > array[0]; j--) {  
  7. array[j + 1] = array[j];//  Record back   
  8. }  
  9. array[j + 1] = array[0]; //  Insert where it should be inserted   
  10. }  
  11. }  
  12. return array;  
  13. }  

 

5.  Shell Sort

(1)  The basic idea

By dividing the elements to be sorted into subsequences . Using the idea of direct insertion sort to sort subsequences . Then the subsequence is narrowed down , Next, the subsequences are sorted by direct insertion . According to this thought , Until all elements are ordered by keyword .

 

Hill sort is an optimization of direct sort .

The incremental (gap) There is a common way to determine 3 Methods :

1.gap = The number of elements to be sorted , Every time you let gap /2;

2. Take prime numbers , Every time you let gap- -, until gap =1;

3.gap= The number of elements to be sorted , Every time you let gap /3 +1;

After being tested one by one by the great gods , The third method is more efficient .

 

(2)  Specific algorithm implementation

Suppose the elements to be sorted are n individual , The corresponding keywords are a1a2a3…….an, set up gap =4 The elements of are the same subsequence , Then the keyword of the element a1a5….aiai+5an-5 For a subsequence , Empathy ,a2 a6….an-6 It's also a subsequence , Then the keywords of the same subsequence are sorted by direct insertion sort . After the first sorting , Make gap = gap /3 +1, And sort the molecules . And so on , until gap =1, At this point, the entire element is sorted .

 

Hill sort code is as follows

  1. void ShellSort(SqList *L)  
  2. {  
  3. int i,j;  
  4. int increment=L->length;  
  5. do  
  6. {  
  7. increment=increment/3+1;            /*  Incremental sequence  */  
  8. for(i=increment+1;i<=L->length;i++)  
  9. {  
  10. if (L->r[i]<L->r[i-increment])    /*  Need to L->r[i] Insert an ordered increasing quantum table  */  
  11. {  
  12. L->r[0]=L->r[i];             /*  For the time being L->r[0] */  
  13. for(j=i-increment;j>0 && L->r[0]<L->r[j];j-=increment)  
  14. L->r[j+increment]=L->r[j]; /*  Record back , Find insertion location  */  
  15. L->r[j+increment]=L->r[0]; /*  Insert  */  
  16. }  
  17. }  
  18. }  
  19. while(increment>1);  
  20. }  

6.  Heap sort

Heap sort is an optimization of selection sort .

A binary tree has the following properties : Big push : Each node value is greater than or equal to the value of its left and right children . It's called the big top reactor . Small cap pile : The value of each node is smaller than the value of left and right children , It's called the small top reactor .

   If a node is given a sequence traversal from 1 Numbered starting , Then the nodes satisfy the following relationship :

     

(1)  Heap sort algorithm

Let's say we use big push to implement heap sort . The sequence to be sorted is constructed into a big push , here , The maximum value of the entire sequence is at the root node . Move him away , Will remain n-1 A sequence is reconstructed into a heap , This will get n Next largest value of elements . Do it over and over again , You get an ordered sequence .

(2)  Heap sort code

The code is implemented in two layers . The first level is heap sort . The second layer is to build the code of the big top heap .

Suppose the raw data of the heap is shown in the figure below : Convenient program description :

 

first floor : Heap sort

The code is as follows : The core concern is :(A How to build a new heap from an unordered sequence .(B If after the output element , How to adjust the remaining elements to a new heap .

  1. /*  For the sequence table L Make a heap sort  */  
  2. void HeapSort(SqList *L)  
  3. 2 {  
  4. 3  int i;  
  5. 4  for(i=L->length/2;i>0;i--)  /*  hold L Medium r Build it into a big top heap  */  
  6. 5    HeapAdjust(L,i,L->length);  
  7. 6  for(i=L->length;i>1;i--)  
  8. 7  {   
  9. 8    swap(L,1,i);    /* Exchange the heap top record with the last record of the current unsorted subsequence */  
  10. 9    HeapAdjust(L,1,i-1);   /*  take L->r[1..i-1] Readjustment to the big top  */  
  11. 10  }  
  12. 11 } 

The second floor :

(1) Build the big top heap . The idea is from the bottom up , From left to right , Each non leaf node acts as the root node 【4->3->2->1】, Adjust it and the subtree to a big top heap . You can see from the first level of code that ,for(i=L->length/2;i>0;i--) Output i just 4->3->2->1 The order of . Every time you call HeapAdjust, Finally, a big top heap is generated . Solved the problem A The question of .

(2) Adjustment of the GTR .

According to the fact that the heap is a complete binary tree , Root node sequence number i The relationship with leaf node is 2i,2i+1. therefore , It's about comparing left and right children with each other , Get a larger number , Then compare the larger value with the current heap top element . If the maximum value of two children is already less than the value of the top element of the heap , It shows that after adding the top of the heap, it is still a big top heap . If the maximum of two children is larger than the top element , It means that we need to adjust . Exchange the larger values with the top elements . After the analysis, the original heap top elements are changed to the child node , Whether the child's heap as the root node still conforms to the definition of the big top heap . until 2S>j, Out of the loop .

The code that adjusts the big top heap in a single call is like this . See below for specific code :

  1. /*  It is known that L->r[s..m] The key words recorded in the L->r[s] All of them meet the definition of heap , */  
  2. /*  This function adjusts L->r[s] Key words of , send L->r[s..m] Become a big pile  */  
  3. void HeapAdjust(SqList *L,int s,int m)  
  4. 2 {   
  5. 3  int temp,j;  
  6. 4  temp=L->r[s];  
  7. 5  for(j=2*s;j<=m;j*=2)  /*  Filter down the child nodes with larger keywords  */  
  8. 6  {  
  9. 7   if(j<m && L->r[j]<L->r[j+1])  
  10. 8    ++j;   /* j Is the subscript of the larger record in the keyword  */  
  11. 9   if(temp>=L->r[j])  
  12. 10    break;   /* rc Should be inserted in position s On  */  
  13. 11   L->r[s]=L->r[j];  
  14. 12   s=j;  
  15. 13  }  
  16. 14  L->r[s]=temp;   /*  Insert  */  
  17. 15 }

 

7.  Merge sort

“ Merger ” The definition in data structure is to combine two or more ordered tables into a new ordered table .

Merge sort : Using the idea of merging to realize the sorting algorithm . Suppose the initial sequence contains n A record . Can be seen as n An ordered sequence . The length of each sequence is 1, And then merge in pairs , obtain [n/2] The length of the whole extent is 2 perhaps 1 An ordered subsequence of ; And merge in pairs ,......, repeat , Until you get a length of n The ordered sequence of . This sort of sorting is called 2 Path merging and sorting .

(1)  Code thinking

The idea of merging into order : The original sequence is divided into two sequences according to the method of average distribution ,s-m and m-n. Then merge and sort their recursive calls . After we get two ordered sequences , Merge them merge.

Merge The idea of function is relatively simple . Is to traverse an ordered sequence , By way of comparison , Traversing two sequences from small to large at the same time . Move separately A Below the sequence or B The way the sequence subscripts , Take the smallest number from the two sequences . Until one of the sequence data is finished , In addition, the rest of the sequence is copied to the end . We get a complete ordered sequence . What should be noted is :Merge Parameters of (int SR[],int TR[],int s,int m,int n). take ST[s-m] ST[m+1-t] Merge into TR[s-n] in .

(2)  Merge code implementation
  1. /*  take SR[s..t] The order of merging is TR1[s..t] */  
  2. void MSort(int SR[],int TR1[],int s, int t)  
  3. 2 {  
  4. 3  int m;  
  5. 4  int TR2[MAXSIZE+1];  
  6. 5  if(s==t)  
  7. 6   TR1[s]=SR[s];  
  8. 7  else  
  9. 8  {  
  10. 9   m=(s+t)/2;   /*  take SR[s..t] Divide the for SR[s..m] and SR[m+1..t] */  
  11. 10   MSort(SR,TR2,s,m); /*  Recursively put SR[s..m] Merge into ordered TR2[s..m] */  
  12. 11   MSort(SR,TR2,m+1,t); /*  Recursively put SR[m+1..t] Merge into order TR2[m+1..t] */  
  13. 12   Merge(TR2,TR1,s,m,t); /*  take TR2[s..m] and TR2[m+1..t] Merge into TR1[s..t] */  
  14. 13  }  
  15. 14 }

Merge Code implementation

  1. /*  Will be orderly SR[i..m] and SR[m+1..n] Merge into ordered TR[i..n] */  
  2. void Merge(int SR[],int TR[],int i,int m,int n)  
  3. 2 {  
  4. 3  int j,k,l;  
  5. 4  for(j=m+1,k=i;i<=m && j<=n;k++) /*  take SR To be merged into middle and small records TR */  
  6. 5  {  
  7. 6   if (SR[i]<SR[j])  
  8. 7    TR[k]=SR[i++];  
  9. 8   else  
  10. 9    TR[k]=SR[j++];  
  11. 10  }  
  12. 11  if(i<=m)  
  13. 12  {  
  14. 13   for(l=0;l<=m-i;l++)  
  15. 14    TR[k+l]=SR[i+l];  /*  Put the rest SR[i..m] Copied to the TR */  
  16. 15  }  
  17. 16  if(j<=n)  
  18. 17  {  
  19. 18   for(l=0;l<=n-j;l++)  
  20. 19    TR[k+l]=SR[j+l];  /*  Put the rest SR[j..n] Copied to the TR */  
  21. 20  }  
  22. 21 }  

 

8.  Quick sort

(1)  Definition

Quick sort The idea is : The records to be sorted are divided into two parts by one-time sorting . One part of the record has a smaller keyword than the other . You can continue to sort the above rules on the records of these two parts . In order to achieve the purpose of orderly sequence .

Personal understanding : Quick sort algorithm is relatively simple , The core is to select a keyword , And put the records in the sequence that are smaller than the selected keyword , The sequence larger than the keyword is placed on the other side . At the same time, return the subscript of this keyword . The goal is to divide the keyword into two sequences , Call quicksort again recursively . The code to complete this part is the function Partition To achieve .

Partition Functions do , Is to select one of the keywords first , Try to put it in one place , Make the keyword of the data record on the left smaller than it , The key words of the data record on the right are bigger than his . We use such keywords as pivot (pivot

(2)  Code implementation

The code is divided into two function implementations . Quick sort algorithm and partition The function is implemented in two parts .

Fast sorting algorithm :

The general idea is as follows : adopt partition Function divides the sequence into two sorts according to the pivot . Then on this basis, the left and right sequences recursively call the quick sort algorithm to continue sorting . Until you can't check the score 【low>=high】. Here's the picture , Recursive quick sorting of low word list and high word list respectively .

  1. /*  For the sequence table L The subsequence in L->r[low..high] Quick sort  */  
  2. void QSort(SqList *L,int low,int high)  
  3. {   
  4. int pivot;  
  5. if(low<high)  
  6. {  
  7. pivot=Partition(L,low,high);  /*  take L->r[low..high] Split in two , Calculate the pivot value pivot */  
  8. QSort(L,low,pivot-1);   /*   Recursively sort low sub tables  */  
  9. QSort(L,pivot+1,high);   /*   Recursively sort the high sub table  */  
  10. }  

About merge function , First you need to select a data element , As a pivot . And then go forward from the last element , If the tail is bigger than the pivot ,high--, If the tail element is smaller than the pivot , In exchange for low and high On the element . Switch to from low towards high Move , And compare low The elements below and the pivot size . If low The subscript element is small , be low++, No , In exchange for low and high Elements . If low<highlow and high Keep getting closer to , It will eventually coincide , Without coincidence 】, Continue to cycle the above .

Merge The code is as follows :

  1. /*  Exchange order table L Records of neutron tables , Make the pivot record in place , And return to where it is  */  
  2. /*  Now before it ( after ) The records are not big ( Small ) In it . */  
  3. int Partition(SqList *L,int low,int high)  
  4. 2 {   
  5. 3  int pivotkey;  
  6. 4  pivotkey=L->r[low];  /*  Use the first record of the sub table as the pivot record  */  
  7. 5  while(low<high)      /*  Scan alternately from both ends of the table to the middle  */  
  8. 6  {   
  9. 7    while(low<high&&L->r[high]>=pivotkey)  
  10. 8    high--;  
  11. 9    swap(L,low,high); /*  Switch records smaller than pivot records to low end  */  
  12. 10    while(low<high&&L->r[low]<=pivotkey)  
  13. 11    low++;  
  14. 12    swap(L,low,high); /*  Switch records larger than pivot records to high end  */  
  15. 13  }  
  16. 14  return low;     /*  Return to the pivot position  */  
  17. 15 }
(3)  Quick platoon optimization

Optimize the selection pivot

The current pivot is to randomly take the first element . It can be downloaded from lowmidhigh Get the middle sized element in three positions as the pivot .

Optimize the exchange that you don't need

The corner of the array is marked with 0 Where to store pivot, For the sentinel . therefore , There is always an empty position in the sequence , Data can be moved to the desired location by assigning values . The assignment is a little less than the swap operation .

Optimize the sorting of small arrays

Fast sorting has an advantage for sequences with a larger amount of data . For sequences with small amounts of data , It's faster to insert directly . therefore , It can be judged by the total number of data records , Whether to use quick sort or direct insert sort algorithm . Both direct insertion and quick sorting result in a sort binary tree . This idea and spark Is it selected sort Shuffle still ByPass shuffle It's consistent . Select the appropriate algorithm according to the amount of data .

Optimize recursive operations

Because after the first recursion , Variable low It's no use , So you can pivot+1 Assign a value to low, After recycling , One time Partition(L,low,high), Its effect is equivalent to “QSort(L,pivot+1,high);”. The result is the same , But the stack depth can be reduced by using iterative rather than recursive methods , So it improves the overall performance .

版权声明
本文为[It lost schoolboy]所创,转载请带上原文链接,感谢

Scroll to Top