编程知识 cdmana.com

Use Huffman algorithm to compress and decompress files

Use Huffman algorithm to decompress and compress files

Preface : The writer is a sophomore , Using Huffman algorithm to compress and decompress files is an experiment assignment assigned by teachers . Code this program took a lot of time , But I also learned a lot from it , Writing this kind of program which is similar to engineering class and writing algorithm is very different .

part1 principle

Count the characters in the file , The key word is the number of times it appears , Build the Huffman tree , Realize the encoding of characters , Convert a character into a 01 Sequence , Because one character takes up one byte ,8 individual bit, And one byte can hold eight bits 01 code , So replace the characters in the file with bit encoding , The compression of the file is completed . In order to be able to decode , We're going to combine the Huffman tree with 01 The sequence is stored in a file together . Insert picture description here
Huffman tree
When decoding , We restore the Huffman tree stored in the compressed file , Through the document 01 Sequence to find Huffman tree , The original file can be restored .

part2 Compression code implementation

1. Count the number of characters in the file

Count the frequency of characters in text

int Count(string op, string path1, string path2, int tong[]) {
   
   
        // TODO: Statistical file 1 The frequency of character occurrence 
        // path1 It's the source file ,path2 It's the target file 
        ifstream instr(path1, ios::in | ios::binary);
        unsigned int bytebuff = 0;
        char ch;
        while (instr.get(ch)) {
   
   
            bytebuff =(int)(unsigned char)ch;
            tong[bytebuff]++;
        } // Statistics weight complete !
        int LeafNumber = 0;
        for (int i = 0; i <= 256; i++) {
   
   
            if (tong[i] != 0)
                LeafNumber++;
        }
        instr.close();
        cerr << "Count Completed" << endl;
        return LeafNumber;
    }

Use get() Read one byte at a time from the file input stream , Equivalent to one char type . Use the bucket count method to count the number of character occurrences . Here want to use unsigned char, Because the encoding of some characters is greater than 127 Of , Symbol bit 1, Do not apply unsigned It's going to read negative .

2. Build the Huffman tree

struct HuffmanNode {
   
   
    int info; // save 
    int index;
    int weight;
    int parent; // save 
    int left;
    int right;
    char side; // save 
    string BinaryCode;
    friend bool operator>(HuffmanNode f1, HuffmanNode f2) {
   
   
        return f1.weight > f2.weight;
    }
};

Node information of Huffman tree
I like to change node names and structure pointers

typedef HuffmanNode Node;
typedef HuffmanNode *Tree;

Create the Huffman tree :

int CreatHuffmanTree(int tong[], int LeafNumber, Node HuffmanTree[]) {
   
   
        // TODO: Create the Huffman tree 
        int k = 0;
        priority_queue<Node, vector<Node>, greater<Node>> pq;
        // Tree HuffmanTree = new Node[2 * LeafNumber - 1];
        // 0--->LeafNumber - 1             It's a leaf node 
        // leafNumber--->2*LeafNumber - 2  Root node 
        for (int i = 0; i <= 256; i++) {
   
   
            if (tong[i]) {
   
   
                HuffmanTree[k].info = i;
                HuffmanTree[k].index = k;
                HuffmanTree[k].left = HuffmanTree[k].right =
                    HuffmanTree[k].parent = -1;
                HuffmanTree[k].weight = tong[i];
                pq.push(HuffmanTree[k]);
                k++;
            }
        }
        int j = LeafNumber;
        // Building a Huffman tree through priority queues 
        while (pq.size() > 1) {
   
   
            Node t1 = pq.top();
            pq.pop();
            Node t2 = pq.top();
            pq.pop();
            HuffmanTree[t1.index].parent = j;
            HuffmanTree[t2.index].parent = j;
            HuffmanTree[j].index = j;
            HuffmanTree[j].parent = -1;
            // cout<<HuffmanTree[j].index;
            HuffmanTree[j].left = t1.index;
            // cout<<HuffmanTree[j].left;
            HuffmanTree[j].right = t2.index;
            // cout<<HuffmanTree[j].right;
            HuffmanTree[j].weight = t1.weight + t2.weight;
            HuffmanTree[j].info = -127;
            pq.push(HuffmanTree[j]);
            j++;
        }
        j--;
        Node HuffmanTreeHead = pq.top();
        HuffmanTreeHead.parent = -127;
        HuffmanTree[j] = HuffmanTreeHead;
        HuffmanTree[j].info = -127;
        pq.pop();
        cerr << "Creat Huffman Tree Completed" << endl;
        return 0;
    }

It's used here stl Container priority queue priority_queue. Priority queue , The bottom layer is implemented by heap . The team leader must be the one with the highest priority in the current queue .
Because the node of Huffman tree has more than one information , So use the priority queue to count the number of occurrences weight Sort , To overload operators in structs

friend bool operator>(HuffmanNode f1, HuffmanNode f2) {
   
   
        return f1.weight > f2.weight;
    }

3. Encode the characters that appear

int GetCodeNode(Node HuffmanTree[], int LeafNumber) {
   
   
        for (int i = 0; i < 2 * LeafNumber - 2; i++) {
   
   
            if (HuffmanTree[i].info == -127)
                continue;
            int IndexForSearching = i;
            HuffmanTree[i].BinaryCode = "";
            int j = 0;
            while (HuffmanTree[IndexForSearching].parent != -127) {
   
   
                j = HuffmanTree[IndexForSearching].parent;
                if (HuffmanTree[j].left == IndexForSearching)
                    HuffmanTree[i].BinaryCode += '0';
                if (HuffmanTree[j].right == IndexForSearching)
                    HuffmanTree[i].BinaryCode += '1';
                IndexForSearching = j;
            }
            reverse(HuffmanTree[i].BinaryCode.begin(),
                    HuffmanTree[i].BinaryCode.end());
        }
        cerr << "Get Node Code Completed" << endl;
        return 0;
    }

The purpose is to make it easier to encode the file in the next step , But it's going to be slower

4. Code the file according to the code

string Encode(string path1, Node HuffmanTree[], int LeafNumber) {
   
   
        ifstream instr(path1, ios::in | ios::binary);
        char ch;
        unsigned int bytebuff = 0;
        int bitmask = 0x80;
        string HuffmanPath = "";
        while (instr.get(ch)) {
   
   
            bytebuff = (int)(unsigned char)ch;
            int value = bytebuff;
            for (int i = 0; i < LeafNumber; i++) {
   
   
                if (HuffmanTree[i].info == value) {
   
   
                    HuffmanPath += HuffmanTree[i].BinaryCode;
                    break;
                }
            }
        }
        cerr << "Encode Completed" << endl;
        return HuffmanPath;
    }

In the order of the document , Convert the characters in the file to 01 The sequence is saved in a string .

5. Put... In the string 01 String conversion

In a string 0 and 1 It's stored in characters , Save one byte for one 0 or 1, Bit operation , Save all eight bits in a byte 0 and 1, In this way, one byte can store eight 0 and 1

string SwitchStringToBinary(string HuffmanPath, int &Sign) {
   
   
        // TODO: Put... In the string 01 Change the sequence to bit
        //  The last byte has to deal with the extra 0--> hold 0 Put it back 
        string BinaryPath = "";
        int bytebuff = 0;
        int shiftcount = 0;
        for (int i = 0; i < HuffmanPath.size(); i++) {
   
   
            bytebuff += (HuffmanPath[i] == '1' ? 1 : 0);
            bytebuff <<= 1;
            shiftcount++;
            if (shiftcount == 8) {
   
   
                bytebuff >>= 1;
                BinaryPath += (char)bytebuff;
                bytebuff = 0;
                shiftcount = 0;
                if (i == HuffmanPath.size() - 1)
                    break;
                if (i + 8 > HuffmanPath.size()) {
   
   
                    i++;
                    while (i <= HuffmanPath.size() - 1) {
   
   
                        bytebuff += (HuffmanPath[i] == '1' ? 1 : 0);
                        bytebuff <<= 1;
                        shiftcount++;
                        i++;
                    }
                    bytebuff <<= 7 - shiftcount;
                    BinaryPath += (char)bytebuff;
                }
            }
        }
        Sign = 8 - shiftcount;
        cerr << "Switch String To Binary Completed" << endl;
        return BinaryPath;
    }

Pay attention when you write here , Because when we read the string in eight, eight , If the length of the string % 8 != 0, So the end of the string is not even eight bits , Special treatment , Here I put the effective 01 The back of the string is full of 0, A filling 8 position , Then use one Sign To mark the end with a few more 0, hold Sign Put it in the file , In this way, we can decode the last redundant 0 It's been disposed of .

6. The preparation before saving the file

I have a lot of problems here , It's also a program bug The main reason for this . We need to store as little information as possible , Take up less space , Huffman put our files in the tree . What information should be stored so that it can be completely restored when decoding . After trying many algorithms ( At the end of the paper, we will introduce ), I have written countless bug, I used a more stupid method ~:
Node storage parent( Parent node / Parent node ),side( Whether the child node is the left or right child of the parent node ),info( The corresponding leaf node ASCII code )
Get the side

int GetSide(Node HuffmanTree[], int LeafNumber) {
   
   
        for (int i = 0; i < 2 * LeafNumber - 2; i++) {
   
   
            if (i == HuffmanTree[HuffmanTree[i].parent].left)
                HuffmanTree[i].side = 'l';
            if (i == HuffmanTree[HuffmanTree[i].parent].right)
                HuffmanTree[i].side = 'r';
        }
        return 0;
    }

7. write file

int WriteToFile(string path2, Node HuffmanTree[], int LeafNumber,
                    string BinaryPath, int Sign) {
   
   
        // path2 It's the target file to write 
        // Open binary output stream 
        // ShowTable(HuffmanTree, LeafNumber);
        LeafNumber -= 1;
        ofstream outstr(path2, ios::binary);
        outstr.write(reinterpret_cast<char *>(&Sign), sizeof(char));
        outstr.write(reinterpret_cast<char *>(&LeafNumber), sizeof(char));
        LeafNumber += 1;
        for (int i = 0; i < LeafNumber; i++) {
   
   
            // Get the address of the content to write , Convert to char*
            HuffmanTree[i].parent -= LeafNumber;
            outstr.write(reinterpret_cast<char *>(&HuffmanTree[i].info),
                         sizeof(char));
            outstr.write(reinterpret_cast<char *>(&HuffmanTree[i].parent),
                         sizeof(char));
            outstr.write(reinterpret_cast<char *>(&HuffmanTree[i].side),
                         sizeof(char));
        }
        for (int i = LeafNumber; i < 2 * LeafNumber - 1; i++) {
   
   
            HuffmanTree[i].parent -= LeafNumber;
            outstr.write(reinterpret_cast<char *>(&HuffmanTree[i].parent),
                         sizeof(char));
            outstr.write(reinterpret_cast<char *>(&HuffmanTree[i].side),
                         sizeof(char));
        }
        for (int i = 0; i < BinaryPath.size(); i++) {
   
   
            outstr.put(BinaryPath[i]);
        }
        outstr.close();
        cerr << "Write To File Completed" << endl;
        return 0;
    }

Write the tree information and code to the file , I wrote the number of leaf nodes of my Huffman tree in the editing dock LeafNumber, And the one mentioned above Sign. Then there's Huffman tree and file coding .

part3. Decompression code implementation

When you're done compressing , Decompression is basically flat ~. But decompression will encounter a lot of problems , To compress the code inside to change .

1. Reading documents

Tree ReadFile(string path1, unsigned int &LeafNumber, string &SearchPath,
                  int &Sign) {
   
   
        // Sign Is to delete the last few 0
        ifstream instr(path1, ios::in | ios::binary);
        SearchPath = "";
        char ch;
        instr.get(ch);
        Sign = ch;
        instr.get(ch);
        LeafNumber = (int)(unsigned char)ch;
        LeafNumber += 1;
        Tree HuffmanTree = new Node[2 * LeafNumber - 1];
        unsigned int num;
        for (int i = 0; i < LeafNumber; i++) {
   
   
            instr.get(ch);
            num = (int)(unsigned char)ch;
            HuffmanTree[i].info = num;
            instr.get(ch);
            num = (int)(unsigned char)ch;
            HuffmanTree[i].parent = num + LeafNumber;
            instr.get(ch);
            HuffmanTree[i].side = ch;
            // Initialize other information 
            HuffmanTree[i].BinaryCode = "";
            HuffmanTree[i].index = i;
            HuffmanTree[i].left = -1;
            HuffmanTree[i].right = -1;
            HuffmanTree[i].weight = -1;
        }
        for (int i = LeafNumber; i < 2 * LeafNumber - 1; i++) {
   
   
            instr.get(ch);
            num = (int)(unsigned char)ch;
            HuffmanTree[i].parent = num + LeafNumber;
            instr.get(ch);
            HuffmanTree[i].side = ch;
            // Initialize other information 
            HuffmanTree[i].info = -10000;
            HuffmanTree[i].BinaryCode = "";
            HuffmanTree[i].index = i;
            HuffmanTree[i].left = -1;
            HuffmanTree[i].right = -1;
            HuffmanTree[i].weight = -1;
        }
        int bitmask = 0x80;
        while (instr.get(ch)) {
   
   
            num = (int)(unsigned char)ch;
            while (bitmask != 0) {
   
   
                if ((bitmask & num) != 0) {
   
   
                    SearchPath += '1';
                }
                if ((bitmask & num) == 0) {
   
   
                    SearchPath += '0';
                }
                bitmask >>= 1;
            }
            bitmask = 0x80;
        }
        SearchPath.erase(SearchPath.end() - Sign, SearchPath.end());
        instr.close();
        cerr << "Read File Completed" << endl;
        return HuffmanTree;
    }

Read the entire compressed file , Open space, save tree node space . Save the coded part of the file to string Easy to use . Remember to use Sign Put the extra at the end of the code 0 Get rid of .

2. Restore Huffman tree

Create a parent-child for the node just stored in the linear table ( Mother and son / Father and daughter / mother and daughter ) Relationship

int BuiltHuffmanTree(Node HuffmanTree[], int LeafNumber) {
   
   
        for (int i = 0; i < 2 * LeafNumber - 2; i++) {
   
   
            if (HuffmanTree[i].side == 'l') {
   
   
                HuffmanTree[HuffmanTree[i].parent].left = i;
            }
            if (HuffmanTree[i].side == 'r') {
   
   
                HuffmanTree[HuffmanTree[i].parent].right = i;
            }
        }
        cerr << "Built Huffman Tree Completed" << endl;
        return 0;
    }

3. Restore files

According to 01 Sequence , Find... In the Huffman tree , Just write the found characters into the file !

int RestoreFile(string SearchPath, Node HuffmanTree[], string path2,
                    int LeafNumber) {
   
   
        ofstream outstr(path2, ios::out | ios::binary);
        int head = 2 * LeafNumber - 2, now = head;
        for (int i = 0; i < SearchPath.size(); i++) {
   
   
            if (SearchPath[i] == '0') {
   
   
                now = HuffmanTree[now].left;
            }
            if (SearchPath[i] == '1') {
   
   
                now = HuffmanTree[now].right;
            }
            if (HuffmanTree[now].left == -1 && HuffmanTree[now].right == -1) {
   
   
                char res = HuffmanTree[now].info;
                outstr.put(res);
                now = head;
            }
        }
        outstr.close();
        cerr << "Restore File Completed" << endl;
        return 0;
    }

part4. All the code

// writen by spln
// [email protected]
#include <bits/stdc++.h>
using namespace std;
struct HuffmanNode {
   
   
    int info; // save 
    int index;
    int weight;
    int parent; // save 
    int left;
    int right;
    char side; // save 
    string BinaryCode;
    friend bool operator>(HuffmanNode f1, HuffmanNode f2) {
   
   
        return f1.weight > f2.weight;
    }
};
typedef HuffmanNode Node;
typedef HuffmanNode *Tree;
int ShowHelp() {
   
   
    cerr << " Input error , Please input as required :" << endl;
    cerr << "-z/-x  file name 1  file name 2" << endl;
    return 0;
}
class Compression {
   
   
  public:
    int CreatHuffmanTree(int tong[], int LeafNumber, Node HuffmanTree[]) {
   
   
        // TODO: Create the Huffman tree 
        int k = 0;
        priority_queue<Node, vector<Node>, greater<Node>> pq;
        // Tree HuffmanTree = new Node[2 * LeafNumber - 1];
        // 0--->LeafNumber - 1             It's a leaf node 
        // laefNumber--->2*LeafNumber - 2  Root node 
        for (int i = 0; i <= 256; i++) {
   
   
            if (tong[i]) {
   
   
                HuffmanTree[k].info = i;
                HuffmanTree[k].index = k;
                HuffmanTree[k].left = HuffmanTree[k].right =
                    HuffmanTree[k].parent = -1;
                HuffmanTree[k].weight = tong[i];
                pq.push(HuffmanTree[k]);
                k++;
            }
        }
        int j = LeafNumber;
        // Building a Huffman tree through priority queues 
        while (pq.size() > 1) {
   
   
            Node t1 = pq.top();
            pq.pop();
            Node t2 = pq.top();
            pq.pop();
            HuffmanTree[t1.index].parent = j;
            HuffmanTree[t2.index].parent = j;
            HuffmanTree[j].index = j;
            HuffmanTree[j].parent = -1;
            // cout<<HuffmanTree[j].index;
            HuffmanTree[j].left = t1.index;
            // cout<<HuffmanTree[j].left;
            HuffmanTree[j].right = t2.index;
            // cout<<HuffmanTree[j].right;
            HuffmanTree[j].weight = t1.weight + t2.weight;
            HuffmanTree[j].info = -127;
            pq.push(HuffmanTree[j]);
            j++;
        }
        j--;
        Node HuffmanTreeHead = pq.top();
        HuffmanTreeHead.parent = -127;
        HuffmanTree[j] = HuffmanTreeHead;
        HuffmanTree[j].info = -127;
        pq.pop();
        cerr << "Creat Huffman Tree Completed" << endl;
        return 0;
    }
    int GetCodeNode(Node HuffmanTree[], int LeafNumber) {
   
   
        for (int i = 0; i < 2 * LeafNumber - 2; i++) {
   
   
            if (HuffmanTree[i].info == -127)
                continue;
            int IndexForSearching = i;
            HuffmanTree[i].BinaryCode = "";
            int j = 0;
            while (HuffmanTree[IndexForSearching].parent != -127) {
   
   
                j = HuffmanTree[IndexForSearching].parent;
                if (HuffmanTree[j].left == IndexForSearching)
                    HuffmanTree[i].BinaryCode += '0';
                if (HuffmanTree[j].right == IndexForSearching)
                    HuffmanTree[i].BinaryCode += '1';
                IndexForSearching = j;
            }
            reverse(HuffmanTree[i].BinaryCode.begin(),
                    HuffmanTree[i].BinaryCode.end());
        }
        cerr << "Get Node Code Completed" << endl;
        return 0;
    }
    string Encode(string path1, Node HuffmanTree[], int LeafNumber) {
   
   
        ifstream instr(path1, ios::in | ios::binary);
        char ch;
        unsigned int bytebuff = 0;
        int bitmask = 0x80;
        string HuffmanPath = "";
        while (instr.get(ch)) {
   
   
            bytebuff = (int)(unsigned char)ch;
            int value = bytebuff;
            for (int i = 0; i < LeafNumber; i++) {
   
   
                if (HuffmanTree[i].info == value) {
   
   
                    HuffmanPath += HuffmanTree[i].BinaryCode;
                    break;
                }
            }
        }
        cerr << "Encode Completed" << endl;
        return HuffmanPath;
    }
    int ShowTable(Node HuffmanTree[], int LeafNumber) {
   
   
        for (int i = 0; i < 2 * LeafNumber - 1; i++) {
   
   
            cout << "i:" << i << endl;
            cout << HuffmanTree[i].index << "<-index" << endl;
            cout << HuffmanTree[i].info << "<-info" << endl;
            cout << HuffmanTree[i].side << "<-side" << endl;
            cout << HuffmanTree[i].left << "<-left" << endl;
            cout << HuffmanTree[i].right << "<-right" << endl;
            cout << HuffmanTree[i].parent << "<-parent" << endl;
            cout << HuffmanTree[i].weight << "<-weight" << endl;
            cout << HuffmanTree[i].BinaryCode << "<-code" << endl;
        }
        return 0;
    }
    int Count(string op, string path1, string path2, int tong[]) {
   
   
        // TODO: Statistical file 1 The frequency of character occurrence 
        // path1 It's the source file ,path2 It's the target file 
        ifstream instr(path1, ios::in | ios::binary);
        unsigned int bytebuff = 0;
        char ch;
        while (instr.get(ch)) {
   
   
            bytebuff =(int)(unsigned char)ch;
            tong[bytebuff]++;
        } // Statistics weight complete !
        int LeafNumber = 0;
        for (int i = 0; i <= 256; i++) {
   
   
            if (tong[i] != 0)
                LeafNumber++;
        }
        instr.close();
        cerr << "Count Completed" << endl;
        return LeafNumber;
    }
    // Save the Huffman tree to FinalOutputString
    string SwitchStringToBinary(string HuffmanPath, int &Sign) {
   
   
        // TODO: Put... In the string 01 Change the sequence to bit
        //  The last byte has to deal with the extra 0--> hold 0 Put it back 
        string BinaryPath = "";
        int bytebuff = 0;
        int shiftcount = 0;
        for (int i = 0; i < HuffmanPath.size(); i++) {
   
   
            bytebuff += (HuffmanPath[i] == '1' ? 1 : 0);
            bytebuff <<= 1;
            shiftcount++;
            if (shiftcount == 8) {
   
   
                bytebuff >>= 1;
                BinaryPath += (char)bytebuff;
                bytebuff = 0;
                shiftcount = 0;
                if (i == HuffmanPath.size() - 1)
                    break;
                if (i + 8 > HuffmanPath.size()) {
   
   
                    i++;
                    while (i <= HuffmanPath.size() - 1) {
   
   
                        bytebuff += (HuffmanPath[i] == '1' ? 1 : 0);
                        bytebuff <<= 1;
                        shiftcount++;
                        i++;
                    }
                    bytebuff <<= 7 - shiftcount;
                    BinaryPath += (char)bytebuff;
                }
            }
        }
        Sign = 8 - shiftcount;
        cerr << "Switch String To Binary Completed" << endl;
        return BinaryPath;
    }
    int GetSide(Node HuffmanTree[], int LeafNumber) {
   
   
        for (int i = 0; i < 2 * LeafNumber - 2; i++) {
   
   
            if (i == HuffmanTree[HuffmanTree[i].parent].left)
                HuffmanTree[i].side = 'l';
            if (i == HuffmanTree[HuffmanTree[i].parent].right)
                HuffmanTree[i].side = 'r';
        }
        return 0;
    }
    int WriteToFile(string path2, Node HuffmanTree[], int LeafNumber,
                    string BinaryPath, int Sign) {
   
   
        // path2 It's the target file to write 
        // Open binary output stream 
        // ShowTable(HuffmanTree, LeafNumber);
        LeafNumber -= 1;
        ofstream outstr(path2, ios::binary);
        outstr.write(reinterpret_cast<char *>(&Sign), sizeof(char));
        outstr.write(reinterpret_cast<char *>(&LeafNumber), sizeof(char));
        LeafNumber += 1;
        for (int i = 0; i < LeafNumber; i++) {
   
   
            // Get the address of the content to write , Convert to char*
            HuffmanTree[i].parent -= LeafNumber;
            outstr.write(reinterpret_cast<char *>(&HuffmanTree[i].info),
                         sizeof(char));
            outstr.write(reinterpret_cast<char *>(&HuffmanTree[i].parent),
                         sizeof(char));
            outstr.write(reinterpret_cast<char *>(&HuffmanTree[i].side),
                         sizeof(char));
        }
        for (int i = LeafNumber; i < 2 * LeafNumber - 1; i++) {
   
   
            HuffmanTree[i].parent -= LeafNumber;
            outstr.write(reinterpret_cast<char *>(&HuffmanTree[i].parent),
                         sizeof(char));
            outstr.write(reinterpret_cast<char *>(&HuffmanTree[i].side),
                         sizeof(char));
        }
        for (int i = 0; i < BinaryPath.size(); i++) {
   
   
            outstr.put(BinaryPath[i]);
        }
        outstr.close();
        cerr << "Write To File Completed" << endl;
        return 0;
    }
};
class Decompression {
   
   
  public:
    int ShowTable(Node HuffmanTree[], int LeafNumber) {
   
   
        for (int i = 0; i < 2 * LeafNumber - 1; i++) {
   
   
            cout << "i:" << i << endl;
            cout << HuffmanTree[i].index << "<-index" << endl;
            cout << HuffmanTree[i].info << "<-info" << endl;
            cout << HuffmanTree[i].side << "<-side" << endl;
            cout << HuffmanTree[i].left << "<-left" << endl;
            cout << HuffmanTree[i].right << "<-right" << endl;
            cout << HuffmanTree[i].parent << "<-parent" << endl;
        }
        return 0;
    }
    Tree ReadFile(string path1, unsigned int &LeafNumber, string &SearchPath,
                  int &Sign) {
   
   
        // Sign Is to delete the last few 0
        ifstream instr(path1, ios::in | ios::binary);
        SearchPath = "";
        char ch;
        instr.get(ch);
        Sign = ch;
        instr.get(ch);
        LeafNumber = (int)(unsigned char)ch;
        LeafNumber += 1;
        Tree HuffmanTree = new Node[2 * LeafNumber - 1];
        unsigned int num;
        for (int i = 0; i < LeafNumber; i++) {
   
   
            instr.get(ch);
            num = (int)(unsigned char)ch;
            HuffmanTree[i].info = num;
            instr.get(ch);
            num = (int)(unsigned char)ch;
            HuffmanTree[i].parent = num + LeafNumber;
            instr.get(ch);
            HuffmanTree[i].side = ch;
            // Initialize other information 
            HuffmanTree[i].BinaryCode = "";
            HuffmanTree[i].index = i;
            HuffmanTree[i].left = -1;
            HuffmanTree[i].right = -1;
            HuffmanTree[i].weight = -1;
        }
        for (int i = LeafNumber; i < 2 * LeafNumber - 1; i++) {
   
   
            instr.get(ch);
            num = (int)(unsigned char)ch;
            HuffmanTree[i].parent = num + LeafNumber;
            instr.get(ch);
            HuffmanTree[i].side = ch;
            // Initialize other information 
            HuffmanTree[i].info = -10000;
            HuffmanTree[i].BinaryCode = "";
            HuffmanTree[i].index = i;
            HuffmanTree[i].left = -1;
            HuffmanTree[i].right = -1;
            HuffmanTree[i].weight = -1;
        }
        int bitmask = 0x80;
        while (instr.get(ch)) {
   
   
            num = (int)(unsigned char)ch;
            while (bitmask != 0) {
   
   
                if ((bitmask & num) != 0) {
   
   
                    SearchPath += '1';
                }
                if ((bitmask & num) == 0) {
   
   
                    SearchPath += '0';
                }
                bitmask >>= 1;
            }
            bitmask = 0x80;
        }
        SearchPath.erase(SearchPath.end() - Sign, SearchPath.end());
        instr.close();
        cerr << "Read File Completed" << endl;
        return HuffmanTree;
    }
    int BuiltHuffmanTree(Node HuffmanTree[], int LeafNumber) {
   
   
        for (int i = 0; i < 2 * LeafNumber - 2; i++) {
   
   
            if (HuffmanTree[i].side == 'l') {
   
   
                HuffmanTree[HuffmanTree[i].parent].left = i;
            }
            if (HuffmanTree[i].side == 'r') {
   
   
                HuffmanTree[HuffmanTree[i].parent].right = i;
            }
        }
        cerr << "Built Huffman Tree Completed" << endl;
        return 0;
    }
    int RestoreFile(string SearchPath, Node HuffmanTree[], string path2,
                    int LeafNumber) {
   
   
        ofstream outstr(path2, ios::out | ios::binary);
        int head = 2 * LeafNumber - 2, now = head;
        for (int i = 0; i < SearchPath.size(); i++) {
   
   
            if (SearchPath[i] == '0') {
   
   
                now = HuffmanTree[now].left;
            }
            if (SearchPath[i] == '1') {
   
   
                now = HuffmanTree[now].right;
            }
            if (HuffmanTree[now].left == -1 && HuffmanTree[now].right == -1) {
   
   
                char res = HuffmanTree[now].info;
                outstr.put(res);
                now = head;
            }
        }
        outstr.close();
        cerr << "Restore File Completed" << endl;
        return 0;
    }
};
// Parse command line :
int main(int argc, char *argv[]) {
   
   
    // // Defining classes 
    Compression Compress;
    Decompression Decompress;
    if (argc != 4)
        ShowHelp();
    else if (stricmp(argv[1], "-z") == 0)
        cerr << "Zip " << argv[2] << " to " << argv[3] << " ..." << endl;
    else if (stricmp(argv[1], "-x") == 0)
        cerr << "Extract " << argv[2] << " to " << argv[3] << " ..." << endl;
    else {
   
   
        ShowHelp();
        return 0;
    }
    // Assign the path to the string 
    const string op = argv[1];
    const string path1 = argv[2];
    const string path2 = argv[3];
    ifstream instr(path1, ios::in | ios::binary);
    if (!instr) {
   
   
        cerr << "Open File failed" << endl;
        return 0;
    }
    // Path assignment 
    if (op == "-z") {
   
   
        // cerr << "Ziping..." << endl;
        int tong[257] = {
   
   0};
        int LeafNumber =
            Compress.Count(op, path1, path2, tong); // Count the frequency of the characters 
        Tree HuffmanTree = new Node[2 * LeafNumber - 1];          // Initialization tree 
        Compress.CreatHuffmanTree(tong, LeafNumber, HuffmanTree); // Build up trees 
        Compress.GetSide(HuffmanTree, LeafNumber);
        Compress.GetCodeNode(HuffmanTree, LeafNumber); // Get the leaf node code 
        // Compress.ShowTable(HuffmanTree, LeafNumber);
        string HuffmanPath =
            Compress.Encode(path1, HuffmanTree, LeafNumber); // The document is encoded 
        int Sign;
        string BinaryPath =
            Compress.SwitchStringToBinary(HuffmanPath, Sign); // Get binary string 
        Compress.WriteToFile(path2, HuffmanTree, LeafNumber, BinaryPath,
                             Sign); // Write the Huffman tree to a file 
        // Compress.ShowTable(HuffmanTree, LeafNumber);
        cerr << "Compression Completed" << endl;
        return 0;
    } 
    else if (op == "-x") {
   
   
        unsigned int LeafNumber;
        int Sign;
        string SearchPath;
        Tree HuffmanTree =
            Decompress.ReadFile(path1, LeafNumber, SearchPath, Sign);
        Decompress.BuiltHuffmanTree(HuffmanTree, LeafNumber);
        Decompress.RestoreFile(SearchPath, HuffmanTree, path2, LeafNumber);
        // Decompress.ShowTable(HuffmanTree, LeafNumber);
        cerr << "Decompress Completed" << endl;
    } else
        return 0;
    return 0;
}

part5. At the end

In fact, after I wrote the code, I felt that some of the algorithms I used were not very good , But if you want to change it, it's basically refactoring the code , The amount of work is huge .
Let's introduce an algorithm used by big hand students : Don't write Huffman trees to files , It's like python What's in the dictionary is stored in a file : Store the encoding length and encoding of each character . In this way, you can build a map, You can restore the file .
The teacher's algorithm is : The direct opening length is 512 The linear table , The subscript corresponding to each node is the corresponding ASCII code . Save only the parent node in the file , After processing , Use subscript size to distinguish left and right child nodes of the same parent node .

Welcome to technical exchange !!!
If there is a better algorithm, welcome to discuss !

Last , Let's show you a picture and smile
 Insert picture description here

版权声明
本文为[osc_ hjtv1vkc]所创,转载请带上原文链接,感谢
https://cdmana.com/2020/12/20201224114549519y.html

Scroll to Top