
Design a high-concurrency, high-availability HTTP service for IP query based on Flask

Architecture design

The basic architecture is Flask + gunicorn + load balancing. Load balancing comes in two forms: Alibaba Cloud's hardware load balancing service, or software load balancing with nginx. gunicorn is managed with supervisor.

Architecture diagram: software load balancing with nginx

Architecture diagram: Alibaba Cloud hardware load balancing service

Because the Flask app keeps the IP tree and the country, province and city dictionaries in memory, it uses a lot of memory: a single gunicorn worker takes about 300 MB, while nginx's 4 workers use comparatively little (under 100 MB), so the deployment occupies about 1.3 GB of memory in total (i.e. a server with 2 GB of memory is needed). When any gunicorn node goes down or is being upgraded, the other node keeps serving, so the overall service is not affected.

IP database

An IP library (also called an IP address database) is collected by professional technical staff over a long period using a variety of techniques, and is continuously updated, maintained and extended by dedicated staff.

IP database parsing and query code

Implementation based on a binary search tree

import struct
from socket import inet_aton, inet_ntoa
import os
import sys

sys.setrecursionlimit(1000000)

_unpack_V = lambda b: struct.unpack("<L", b)
_unpack_N = lambda b: struct.unpack(">L", b)
_unpack_C = lambda b: struct.unpack("B", b)


class IpTree:
    def __init__(self):
        self.ip_dict = {}
        self.country_codes = {}
        self.china_province_codes = {}
        self.china_city_codes = {}

    def load_country_codes(self, file_name):
        try:
            path = os.path.abspath(file_name)
            with open(path, "rb") as f:
                for line in f.readlines():
                    data = line.split('\t')
                    self.country_codes[data[0]] = data[1]
                    # print self.country_codes
        except Exception as ex:
            print "cannot open file %s: %s" % (file, ex)
            print ex.message
            exit(0)

    def load_china_province_codes(self, file_name):
        try:
            path = os.path.abspath(file_name)
            with open(path, "rb") as f:
                for line in f.readlines():
                    data = line.split('\t')
                    provinces = data[2].split('\r')
                    self.china_province_codes[provinces[0]] = data[0]
                    # print self.china_province_codes
        except Exception as ex:
            print "cannot open file %s: %s" % (file, ex)
            print ex.message
            exit(0)

    def load_china_city_codes(self, file_name):
        try:
            path = os.path.abspath(file_name)
            with open(path, "rb") as f:
                for line in f.readlines():
                    data = line.split('\t')
                    cities = data[3].split('\r')
                    self.china_city_codes[cities[0]] = data[0]
        except Exception as ex:
            print "cannot open file %s: %s" % (file, ex)
            print ex.message
            exit(0)

    def loadfile(self, file_name):
        try:
            ipdot0 = 254
            path = os.path.abspath(file_name)
            with open(path, "rb") as f:
                local_binary0 = f.read()
                local_offset, = _unpack_N(local_binary0[:4])
                local_binary = local_binary0[4:local_offset]
                # 256 nodes
                while ipdot0 >= 0:
                    middle_ip = None
                    middle_content = None
                    lis = []
                    # offset
                    begin_offset = ipdot0 * 4
                    end_offset = (ipdot0 + 1) * 4
                    # index
                    start_index, = _unpack_V(local_binary[begin_offset:begin_offset + 4])
                    start_index = start_index * 8 + 1024
                    end_index, = _unpack_V(local_binary[end_offset:end_offset + 4])
                    end_index = end_index * 8 + 1024
                    while start_index < end_index:
                        content_offset, = _unpack_V(local_binary[start_index + 4: start_index + 7] +
                                                    chr(0).encode('utf-8'))
                        content_length, = _unpack_C(local_binary[start_index + 7])
                        content_offset = local_offset + content_offset - 1024
                        content = local_binary0[content_offset:content_offset + content_length]
                        if middle_content != content and middle_content is not None:
                            contents = middle_content.split('\t')
                            lis.append((middle_ip, (contents[0], self.lookup_country_code(contents[0]),
                                                    contents[1], self.lookup_china_province_code(contents[1]),
                                                    contents[2], self.lookup_china_city_code(contents[2]),
                                                    contents[3], contents[4])))
                        middle_content = content
                        middle_ip = inet_ntoa(local_binary[start_index:start_index + 4])
                        start_index += 8
                    self.ip_dict[ipdot0] = self.generate_tree(lis)
                    ipdot0 -= 1
        except Exception as ex:
            print "cannot open file %s: %s" % (file, ex)
            print ex.message
            exit(0)

    def lookup_country(self, country_code):
        try:
            for item_country, item_country_code in self.country_codes.items():
                if country_code == item_country_code:
                    return item_country, item_country_code
            return 'None', 'None'
        except KeyError:
            return 'None', 'None'

    def lookup_country_code(self, country):
        try:
            return self.country_codes[country]
        except KeyError:
            return 'None'

    def lookup_china_province(self, province_code):
        try:
            for item_province, item_province_code, in self.china_province_codes.items():
                if province_code == item_province_code:
                    return item_province, item_province_code
            return 'None', 'None'
        except KeyError:
            return 'None', 'None'

    def lookup_china_province_code(self, province):
        try:
            return self.china_province_codes[province.encode('utf-8')]
        except KeyError:
            return 'None'

    def lookup_china_city(self, city_code):
        try:
            for item_city, item_city_code in self.china_city_codes.items():
                if city_code == item_city_code:
                    return item_city, item_city_code
            return 'None', 'None'
        except KeyError:
            return 'None', 'None'

    def lookup_china_city_code(self, city):
        try:
            return self.china_city_codes[city]
        except KeyError:
            return 'None'

    def lookup(self, ip):
        ipdot = ip.split('.')
        ipdot0 = int(ipdot[0])
        if ipdot0 < 0 or ipdot0 > 255 or len(ipdot) != 4:
            return None
        try:
            d = self.ip_dict[int(ipdot[0])]
        except KeyError:
            return None
        if d is not None:
            return self.lookup1(inet_aton(ip), d)
        else:
            return None

    def lookup1(self, net_ip, (net_ip1, content, lefts, rights)):
        if net_ip < net_ip1:
            if lefts is None:
                return content
            else:
                return self.lookup1(net_ip, lefts)
        elif net_ip > net_ip1:
            if rights is None:
                return content
            else:
                return self.lookup1(net_ip, rights)
        else:
            return content

    def generate_tree(self, ip_list):
        length = len(ip_list)
        if length > 1:
            lefts = ip_list[:length / 2]
            rights = ip_list[length / 2:]
            (ip, content) = lefts[length / 2 - 1]
            return inet_aton(ip), content, self.generate_tree(lefts), self.generate_tree(rights)
        elif length == 1:
            (ip, content) = ip_list[0]
            return inet_aton(ip), content, None, None
        else:
            return

if __name__ == "__main__":
    import sys

    reload(sys)
    sys.setdefaultencoding('utf-8')
    ip_tree = IpTree()
    ip_tree.load_country_codes("doc/country_list.txt")
    ip_tree.load_china_province_codes("doc/china_province_code.txt")
    ip_tree.load_china_city_codes("doc/china_city_code.txt")
    ip_tree.loadfile("doc/mydata4vipday2.dat")
    print ip_tree.lookup('123.12.23.45')

HTTP requests

The IP query service provides both GET and POST request endpoints.

@ip_app.route('/api/ip_query', methods=['POST'])
def ip_query():
    try:
        ip = request.json['ip']
    except KeyError as e:
        raise InvalidUsage('bad request: no key ip in your request json body. {}'.format(e), status_code=400)
    if not is_ip(ip):
        raise InvalidUsage('{} is not a ip'.format(ip), status_code=400)
    try:
        res = ip_tree.lookup(ip)
    except Exception as e:
        raise InvalidUsage('internal error: {}'.format(e), status_code=500)
    if res is not None:
        return jsonify(res)
    else:
        raise InvalidUsage('no ip info in ip db for ip: {}'.format(ip), status_code=501)


@ip_app.route('/api/ip_query', methods=['GET'])
def ip_query_get():
    try:
        ip = request.values.get('ip')
    except ValueError as e:
        raise InvalidUsage('bad request: no param ip in your request. {}'.format(e), status_code=400)
    if not is_ip(ip):
        raise InvalidUsage('{} is not a ip'.format(ip), status_code=400)
    try:
        res = ip_tree.lookup(ip)
    except Exception as e:
        raise InvalidUsage('internal error: {}'.format(e), status_code=500)
    if res is not None:
        return jsonify(res)
    else:
        raise InvalidUsage('no ip info in ip db for ip: {}'.format(ip), status_code=501)
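
The handlers above rely on application scaffolding that is not shown: the ip_app Flask object, the module-level ip_tree loaded at import time, the is_ip validator and the InvalidUsage exception. A minimal sketch of that scaffolding, assuming the IpTree class shown earlier lives in a module named ip_tree (is_ip and InvalidUsage are reconstructed from how the handlers use them, not the project's actual code):

# ip_query_app.py -- minimal scaffolding sketch for the route handlers above
from flask import Flask, request, jsonify
from socket import inet_aton

from ip_tree import IpTree  # assumed module name for the IpTree class shown earlier

ip_app = Flask(__name__)

# Load the IP database once at import time, so each gunicorn worker holds it in memory.
ip_tree = IpTree()
ip_tree.load_country_codes("doc/country_list.txt")
ip_tree.load_china_province_codes("doc/china_province_code.txt")
ip_tree.load_china_city_codes("doc/china_city_code.txt")
ip_tree.loadfile("doc/mydata4vipday2.dat")


def is_ip(ip):
    # rough IPv4 check: four dot-separated parts accepted by inet_aton
    if not ip or ip.count('.') != 3:
        return False
    try:
        inet_aton(ip)
        return True
    except Exception:
        return False


class InvalidUsage(Exception):
    # carries an HTTP status code so the error handler can build the response
    def __init__(self, message, status_code=400):
        Exception.__init__(self, message)
        self.message = message
        self.status_code = status_code


@ip_app.errorhandler(InvalidUsage)
def handle_invalid_usage(error):
    response = jsonify({'message': error.message})
    response.status_code = error.status_code
    return response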

A POST request must contain a JSON body with a field like the following:

{
	"ip": "165.118.213.9"
}

A GET request takes the form: http://127.0.0.1:5000/api/ip_query?ip=165.118.213.9
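
Assuming the service is running locally on port 5000, both endpoints can be exercised quickly with the requests library (already listed in requirements.txt); host and port here are assumptions for a local development run:

# quick manual check of both endpoints
import requests

print requests.get("http://127.0.0.1:5000/api/ip_query",
                   params={"ip": "165.118.213.9"}).json()
print requests.post("http://127.0.0.1:5000/api/ip_query",
                    json={"ip": "165.118.213.9"}).json()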

Service deployment

Install dependencies

The dependencies in requirements.txt are as follows:

certifi==2017.7.27.1
chardet==3.0.4
click==6.7
Flask==0.12.2
gevent==1.1.1
greenlet==0.4.12
gunicorn==19.7.1
idna==2.5
itsdangerous==0.24
Jinja2==2.9.6
locustio==0.7.5
MarkupSafe==1.0
meld3==1.0.2
msgpack-python==0.4.8
requests==2.18.3
supervisor==3.3.3
urllib3==1.22
Werkzeug==0.12.2

Installation: pip install -r requirements.txt

Configure supervisor

vim /etc/supervisor/conf.d/ip_query_http_service.conf, with the following contents:

[program:ip_query_http_service]
directory = /root/qk_python/ip_query
command = gunicorn -w10 -b0.0.0.0:8080 ip_query_app:ip_app --worker-class gevent
autostart = true
startsecs = 5
autorestart = true
startretries = 3
user = root
stdout_logfile=/root/qk_python/ip_query/log/gunicorn.log
stderr_logfile=/root/qk_python/ip_query/log/gunicorn.err

After adding this configuration, create the directory that holds stdout_logfile and stderr_logfile, otherwise supervisor will fail to start. Then update supervisor so that it launches the ip_query_http_service process.
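
For example, the log directory from the configuration above can be created with:

mkdir -p /root/qk_python/ip_query/log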

#  start supervisord
supervisord -c /etc/supervisor/supervisord.conf

#  reload the supervisor configuration and start the new program
supervisorctl update

For common supervisor operations, see the references at the end of the article.
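
For reference, a few commonly used supervisorctl commands (standard supervisor CLI, shown as a hint rather than an exhaustive list):

#  list managed processes and their state
supervisorctl status
#  restart the gunicorn process group defined above
supervisorctl restart ip_query_http_service
#  follow the program's stdout log
supervisorctl tail -f ip_query_http_service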

Install nginx

If the software load balancing option is used, nginx needs to be installed. For compiling and installing nginx, see the references at the end of the article.

Configure nginx

vim /usr/local/nginx/nginx.conf, and modify the configuration file as follows:

#user  nobody;
# Number of nginx worker processes; recommended to equal the total number of CPU cores.
worker_processes  4;
#error_log  logs/error.log;
#error_log  logs/error.log  notice;
# Global error log definition; level is one of [ debug | info | notice | warn | error | crit ]
error_log  logs/error.log  info;
# PID file
pid        logs/nginx.pid;
# Maximum number of file descriptors a single nginx process may open. In theory this is the system's maximum open file limit (ulimit -n) divided by the number of nginx processes, but nginx does not distribute requests evenly, so keeping it consistent with ulimit -n is recommended.
worker_rlimit_nofile 65535;
events {
    # Event model; use epoll on Linux.
    use epoll;
    # Maximum connections per worker process (total connections = worker_connections * number of worker processes)
    worker_connections  65535;
}

http {
    include       mime.types;
    default_type  application/octet-stream;
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
    '$status $body_bytes_sent "$http_referer" '
    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log  logs/access.log  main;
    sendfile        on;
    #keepalive_timeout  0;
    keepalive_timeout  65;
    tcp_nopush on; # Prevent network congestion 
    tcp_nodelay on; # Prevent network congestion 
    #gzip  on;
    server {
        # The port on which the proxy exposes the service is configured here.
        listen       9000;
        server_name  localhost;
        #charset koi8-r;
        #access_log  logs/host.access.log  main;
        location / {
            #            root   html;
            #            index  index.html index.htm;
            proxy_pass http://127.0.0.1:8000;
            proxy_redirect off;
            proxy_set_header X-Real-IP $remote_addr;
            # The backend web server can obtain the user's real IP from the X-Forwarded-For header
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $host;
            client_max_body_size 10m; # maximum request body size allowed from the client
            client_body_buffer_size 128k; # buffer size used to buffer the client request body
            proxy_buffer_size 4k; # buffer size for the first part (headers) of the upstream response
            proxy_temp_file_write_size 64k; # size of data written to a temp file at a time when buffering upstream responses
        }

        #error_page  404              /404.html;
        # redirect server error pages to the static page /50x.html
        #
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }       
    }
}
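
To actually spread traffic across more than one gunicorn node, which is the point of the soft-load setup, the single proxy_pass target can be replaced with an upstream group. A sketch (the second backend address is purely illustrative):

# upstream group for several gunicorn nodes; addresses are placeholders
upstream ip_query_backend {
    server 127.0.0.1:8080;
    server 192.168.1.12:8080;   # hypothetical second node
}

server {
    listen 9000;
    location / {
        proxy_pass http://ip_query_backend;
    }
}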

Stress testing

Choosing the right tool is a prerequisite for a stress test. Of the tools below, jmeter is usually run on Windows machines; the others are best run on a *nix machine.

Stress testing tool selection

Tool             | Pros and cons                                                                                | Suggestion
ApacheBench (ab) | Easy to use, efficient, complete statistics, low memory pressure on the load machine         | Recommended
locust           | Written in Python, low efficiency, limited by the GIL, requires writing a Python test script | Not recommended
wrk              | Easy to use, efficient, concise statistics, few pitfalls, few errors reported                | Most recommended
jmeter           | Java-based, open sourced by Apache, graphical interface, easy to operate                     | Recommended
webbench         | Easy to use, but does not support POST requests                                              | Average
tsung            | Written in Erlang, many configuration templates, relatively complex                          | Not recommended

I have personally used all six tools above. Below, ab, wrk and jmeter are chosen to briefly explain installation and usage; if you need the other tools, look up how to use them yourself.

ab

Install

apt-get install apache2-utils 

Common options

Option | Meaning
-r     | Do not exit when ab receives socket errors
-t     | Maximum time to spend sending requests
-c     | Concurrency: number of requests issued at one time
-n     | Total number of requests to send
-p     | postfile: file containing the POST data
-T     | content-type: request body type for POST and PUT requests

Usage

Test a GET request

ab -r -t 120 -c 5000 http://127.0.0.1:8080/api/ip_query?ip=165.118.213.9

Test a POST request

ab -r -t 120 -c 5000 -p /tmp/post_data.txt -T 'application/json' http://127.0.0.1:8080/api/ip_query

The file /tmp/post_data.txt contains the data to send, in the format specified by -T; here it is JSON:

{"ip": "125.118.213.9"}

wrk

http://www.restran.net/2016/09/27/wrk-http-benchmark/

Install

apt-get install libssl-dev
git clone https://github.com/wg/wrk.git
cd wrk
make
cp wrk /usr/sbin

Common options

Option    | Meaning
-c        | Number of open connections, i.e. the concurrency level
-d        | Duration of the stress test, i.e. the maximum time to spend sending requests
-t        | Number of threads used on the load machine
-s        | Lua script to load
--latency | Print latency statistics

Usage

Test a GET request

wrk -t10 -c5000 -d120s --latency http://127.0.0.1:8080/api/ip_query?ip=165.118.213.9

Test a POST request

wrk -t50 -c5000 -d120s --latency -s /tmp/wrk_post.lua http://127.0.0.1:8080

The file /tmp/wrk_post.lua is the Lua script to load; it specifies the path, headers and body of the POST request:

request = function()
  path = "/api/ip_query"
  wrk.headers["Content-Type"] = "application/json"
  wrk.body = "{\"ip\":\"125.118.213.9\"}"
  return wrk.format("POST", path)
end

jmeter

Install

Installing jmeter requires JDK 1.8. jmeter can then be downloaded from the Apache website.

Usage

(Figure: xmind mind map of jmeter usage)

The mind map above comes from a testing expert and is very detailed; for the complete xmind file, see the download: jmeter - Zhang Bei.xmind

For jmeter you can also refer to the reference at the end of the article: Using Apache JMeter for concurrent stress testing.

Analysis of stress test results

wrk GET request stress test results

root@ubuntu:/tmp# wrk -t10 -c5000 -d60s --latency http://127.0.0.1:8080/api/ip_query?ip=165.118.213.9
Running 1m test @ http://127.0.0.1:8080/api/ip_query?ip=165.118.213.9
  10 threads and 5000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   897.19ms  322.83ms   1.99s    70.52%
    Req/Sec   318.80    206.03     2.14k    68.84%
  Latency Distribution
     50%  915.29ms
     75%    1.11s 
     90%    1.29s 
     99%    1.57s 
  187029 requests in 1.00m, 51.01MB read
  Socket errors: connect 0, read 0, write 0, timeout 38
Requests/sec:   3113.27
Transfer/sec:    869.53KB

ab GET request stress test results

root@ubuntu:/tmp# ab -r -t 60 -c 5000 http://127.0.0.1:8080/api/ip_query?ip=165.118.213.9
This is ApacheBench, Version 2.3 <$Revision: 1796539 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, https://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests


Server Software:        gunicorn/19.7.1
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /api/ip_query?ip=165.118.213.9
Document Length:        128 bytes

Concurrency Level:      5000
Time taken for tests:   19.617 seconds
Complete requests:      50000
Failed requests:        2
   (Connect: 0, Receive: 0, Length: 1, Exceptions: 1)
Total transferred:      14050000 bytes
HTML transferred:       6400000 bytes
Requests per second:    2548.85 [#/sec] (mean)
Time per request:       1961.668 [ms] (mean)
Time per request:       0.392 [ms] (mean, across all concurrent requests)
Transfer rate:          699.44 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0  597 1671.8      4   15500
Processing:     4  224 201.4    173    3013
Waiting:        4  223 200.1    172    2873
Total:          7  821 1694.4    236   15914

Percentage of the requests served within a certain time (ms)
  50%    236
  66%    383
  75%   1049
  80%   1155
  90%   1476
  95%   3295
  98%   7347
  99%   7551
 100%  15914 (longest request)

jmeter GET request stress test results

Result analysis

The results from the three tools above are basically consistent: RPS (requests per second) is around 3000. The machine used has 4 cores and 4 GB of memory, gunicorn runs 10 workers, and memory usage is 3.2 GB. Only about 3000 requests per second on a single machine of this configuration calls for further analysis of the cause. For now, adding another machine behind the load balancer brings the total above 5000 RPS, which meets the usage requirements.

Notes on stress testing

Number of open files

A stress test generally places requirements on the load machine's open file limit. If more than 1024 open files are needed, the Linux open file limit has to be raised:

#  show current limits (the open file limit included)
ulimit -a
#  raise the open file limit
ulimit -n 500000

SYN Flood attack protection

Linux has the parameter net.ipv4.tcp_syncookies in the /etc/sysctl.conf configuration file. Its default value is 1, meaning the system detects SYN flood attacks and enables protection against them. During a stress test, if a large volume of highly repetitive requests is sent, the machine under load enables SYN cookies once its SYN queue overflows, and a large number of requests then time out and fail. Alibaba Cloud's load balancer also has SYN flood detection and DDoS attack detection, so there are two things to pay attention to when stress testing:

  • During testing, turn off net.ipv4.tcp_syncookies on the load-balanced machines as appropriate (see the sysctl sketch below)
  • When generating test data, avoid large amounts of repetitive data, so the traffic is not identified as an attack
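
A minimal sketch of toggling the setting for the duration of a test (standard sysctl usage; remember to restore the protection afterwards):

#  check the current value
sysctl net.ipv4.tcp_syncookies
#  temporarily disable SYN cookie protection for the test
sysctl -w net.ipv4.tcp_syncookies=0
#  restore protection once the test is finished
sysctl -w net.ipv4.tcp_syncookies=1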

gunicorn Introduction and tuning

For choosing gunicorn, you can refer to the benchmark report: Python WSGI Server performance analysis.

After settling on gunicorn as the WSGI server, you need to choose an appropriate number of workers for the machine and a worker-class for each worker.

Choosing the number of workers

Each worker runs as an independent child process and holds its own copy of the in-memory data, so every worker added or removed changes system memory usage by a significant amount. Initially a single machine ran gunicorn with 3 workers and the system only sustained about 1000 RPS. After expanding to 9 workers, the system sustained about 3000 RPS. So when there is enough memory, you can increase the number of workers.
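
As a rough sizing check, using the roughly 300 MB-per-worker figure from the architecture section (the headroom value below is an assumption, not a measured number):

# back-of-the-envelope worker sizing for the 4 GB stress test machine
worker_memory_mb = 300           # observed footprint of one gunicorn worker
total_memory_mb = 4096           # 4 GB machine used in the stress test
reserved_mb = 800                # assumed headroom for the OS, nginx, etc.
max_workers = (total_memory_mb - reserved_mb) // worker_memory_mb
print max_workers                # -> 10, matching the 10 workers used above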

Choosing the worker-class

You can refer to the two articles in the references at the end of the article: Common gunicorn settings, and Performance comparison of several gunicorn worker classes.

After changing gunicorn's worker-class at startup from the default sync to gevent, the system's RPS roughly doubled.

worker-class | Number of workers | RPS measured with ab
sync         | 3                 | 573.90
gevent       | 3                 | 1011.84

The gevent worker class depends on gevent >= 0.13, so it needs to be installed with pip. The gunicorn command that starts the Flask application then needs to be changed to:

gunicorn -w10 -b0.0.0.0:8080 ip_query_app:ip_app --worker-class gevent
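
Equivalently, these settings can be kept in a gunicorn configuration file, which is a plain Python module (the file name gunicorn_conf.py here is an arbitrary choice), and loaded with gunicorn -c gunicorn_conf.py ip_query_app:ip_app:

# gunicorn_conf.py -- config-file form of the command-line flags above
bind = "0.0.0.0:8080"
workers = 10
worker_class = "gevent"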

Improvements

Improve IP database accuracy

Trade some efficiency for accuracy: with a single IP database, some IPs return no result, and foreign IPs are generally only accurate to the country level. Several IP databases can be combined, weighing their accuracy and coverage, and falling back to another database when the precise address cannot be found in the first.
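
A sketch of such a fallback chain (the wrapper class and the idea of a second database are illustrative; only the IpTree above exists in this project):

# fallback lookup across several IP databases, tried in order of accuracy
class FallbackIpLookup(object):
    def __init__(self, databases):
        # e.g. [primary_ip_tree, secondary_db]; each must expose lookup(ip)
        self.databases = databases

    def lookup(self, ip):
        for db in self.databases:
            result = db.lookup(ip)
            if result is not None:
                return result
        # every database missed this IP
        return None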

Increase single-machine concurrency

Analyse how many operations per second each stage can sustain separately, from the incoming request, through the WSGI server, the application interface, and the IP lookup, then locate the system's bottleneck and improve single-machine concurrency at its root.


Reference material

