
How to build a high-performance server (using nginx as an example)

Methodology

Software level

Increase CPU utilization
  • Use every CPU: set the number of worker processes equal to the number of CPUs

  • Avoid needless context switches between processes

  • Don't yield the CPU voluntarily while there is still work to do

  • Worker processes should not compete with each other for the CPU

  • A context switch costs about 5 µs; if a large number of processes keep switching, the CPU wastes a lot of time on switching instead of useful work

  • Bind worker processes to CPUs

    pidstat -w shows how many context switches a process has made

  • Don't let other processes take resources away

  • Raise the process priority to get a larger CPU time slice

  • Reduce the number of other processes

  • Reduce thundering-herd wakeups

    Scenario: multiple worker processes accept on the same port

    • accept_mutex on (the default before nginx 1.11.3; off since then)

      Worker processes contend for a lock, and at any moment only the worker holding it accepts new connections

    • accept_mutex off

      A connection request wakes every worker process, but only one of them gets the connection. This is the thundering-herd problem. With few worker processes the impact is small, and because there is less lock contention, the system can respond better under high concurrency

    • SO_REUSEPORT

      A kernel 3.9+ feature for handling large numbers of concurrent connections. Once enabled (the reuseport parameter of the listen directive), the kernel distributes connections among the worker processes. Best performance of the three

  • Improve the CPU cache hit rate

    Bind workers to specific CPUs: worker_cpu_affinity cpumask ...;
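
The cpumask arguments of worker_cpu_affinity are bit strings, one per worker, where each set bit selects a CPU. As a minimal sketch (the helper names cpu_affinity_masks and affinity_directive are made up for this illustration), the following Python generates a one-worker-per-CPU line:

```python
def cpu_affinity_masks(cpu_count):
    """Return one bitmask string per worker, each selecting exactly one CPU."""
    masks = []
    for cpu in range(cpu_count):
        # Set bit number `cpu` (counted from the right) and pad with zeros
        # so that every mask has cpu_count digits.
        masks.append(format(1 << cpu, "0{}b".format(cpu_count)))
    return masks


def affinity_directive(cpu_count):
    """Build a worker_cpu_affinity line that pins worker i to CPU i."""
    return "worker_cpu_affinity " + " ".join(cpu_affinity_masks(cpu_count)) + ";"


if __name__ == "__main__":
    print(affinity_directive(4))  # worker_cpu_affinity 0001 0010 0100 1000;
```

This assumes worker_processes is set to the same CPU count; with fewer workers than CPUs, only the first masks would be needed.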

Increase memory utilization
  • Use tcmalloc

    Reduces memory fragmentation

    Handles concurrency better than glibc malloc; the higher the concurrency, the bigger the advantage (for small allocations)

    google-perftools/.../tcmalloc.html

    nginx has to be compiled with it manually (configure option --with-google_perftools_module)

Increase IO utilization
  • Comparison of disk types

    Mechanical hard disk (HDD)

    • Low price
    • Large capacity
    • High BPS (throughput), suitable for sequential reads and writes
    • Low IOPS, not suitable for random reads and writes
    • Long lifespan

    Solid-state disk (SSD)

    • High price
    • Small capacity
    • High BPS
    • High IOPS
    • Limited write endurance
  • Optimize reads

    • sendfile zero copy

      File data goes straight from the kernel's file cache to the socket without being copied through user space

      location /video/{
      	sendfile on;
      	aio on;
      	directio 8m;
      }
      
    • gzip_static

      Compress files ahead of time so that gzip responses can be returned without compressing on the fly

    • RAM disk / SSD

  • Reduce writes

    • empty_gif

      Returns a 1×1 transparent GIF, which keeps the HTTP response short

    • AIO

      While a disk read or write is in progress, the process can handle other work

      aio on|off|threads[=pool];

    • Direct IO: skips one copy through the page cache

      directio size|off;    files larger than size are read with direct IO, which suits large files

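    • Raise the error_log level (so less is logged)

    • Write error_log to memory

      error_log memory:32m debug;

      Logs go to a 32 MB cyclic memory buffer, so only the most recent 32 MB of debug log can be inspected; avoiding disk writes improves performance

    • Turn off access_log

    • Compress access_log

      access_log path [format [buffer=size] [gzip[=level]]];

    • Decide whether to enable proxy buffering (proxy_buffering on|off); when it is on, large responses may be spooled to temporary files on disk

    • Use syslog instead of local disk IO

      Logs are sent over UDP instead of being written to disk, which improves performance

  • Thread pool

    Use a thread pool for IO operations that would otherwise block the worker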
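
The idea behind the thread pool can be illustrated outside nginx. The sketch below is a Python analogy, not nginx's implementation (in nginx the pool is configured with the thread_pool directive and used through aio threads): blocking file reads are handed to a small pool so the submitting thread stays free. The names blocking_read and read_files_with_pool are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def blocking_read(path):
    """A read that may block on disk IO (stands in for any blocking call)."""
    with open(path, "rb") as f:
        return f.read()


def read_files_with_pool(paths, pool_size=4):
    """Offload each blocking read to a pool thread and collect the results."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # submit() returns immediately; the reads run on the pool threads.
        futures = [pool.submit(blocking_read, p) for p in paths]
        # The caller could do other work here before asking for results.
        return [f.result() for f in futures]
```

The pool size of 4 is an arbitrary assumption; the right value depends on how many blocking operations the disks can usefully serve in parallel.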

Increase network bandwidth utilization
  • SYN retry count

    net.ipv4.tcp_syn_retries = 6

  • Local port range

    net.ipv4.ip_local_port_range = 32768 60999

    Can be widened

  • Upstream connection timeout

    proxy_connect_timeout

  • Maximum length of the SYN queue (handshake not yet complete)

    net.ipv4.tcp_max_syn_backlog = 262144

    Can be increased as appropriate

  • Accept queue (handshake completed)

    net.core.somaxconn: system-wide upper limit on the listen backlog queue length

  • When the accept queue overflows, reply with RST right away instead of dropping the packet

    net.ipv4.tcp_abort_on_overflow

  • Length of the queue of received packets that the kernel has not yet processed

    net.core.netdev_max_backlog

  • SYN/ACK retry count

    net.ipv4.tcp_synack_retries

  • Defend against SYN flood attacks

    net.ipv4.tcp_syncookies = 1

    When the SYN queue is full, a new SYN is not queued. Instead the server computes a cookie and returns it to the client; the client sends the cookie back, the server validates it, and if it is valid the connection is established. The side effect is that some optional TCP features stop working, for example window scaling and timestamps

  • System-wide maximum number of file handles

    fs.file-max: maximum number of file handles the operating system can allocate

    fs.file-nr shows allocated / allocated but unused / maximum

  • Per-user maximum number of file handles

    /etc/security/limits.conf

    root soft nofile 63535

    root hard nofile 65535

  • Maximum number of file handles per worker process

    worker_rlimit_nofile number;

  • Maximum number of connections per worker process

    worker_connections number;
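
These limits constrain each other, which is easy to check with a little arithmetic. The sketch below assumes the usual rule of thumb that a proxied client consumes two of a worker's connections (one to the client, one to the upstream) and that every connection needs at least one file descriptor; max_clients and fd_limit_ok are hypothetical helper names, not nginx APIs.

```python
def max_clients(worker_processes, worker_connections, reverse_proxy=False):
    """Upper bound on simultaneous clients for the whole nginx instance."""
    # worker_connections counts every connection, upstream ones included,
    # so a proxied client uses two slots.
    per_client = 2 if reverse_proxy else 1
    return worker_processes * worker_connections // per_client


def fd_limit_ok(worker_rlimit_nofile, worker_connections):
    """Each connection needs at least one descriptor, so this is a lower bound."""
    return worker_rlimit_nofile >= worker_connections


if __name__ == "__main__":
    print(max_clients(4, 1024))                      # static serving
    print(max_clients(4, 1024, reverse_proxy=True))  # proxying
    print(fd_limit_ok(65535, 10240))
```

In practice some headroom above worker_connections is advisable, since log files and served files also consume descriptors.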

  • TCP Fast Open

    When a client reconnects, it presents a cookie obtained on an earlier connection, so data can travel with the SYN and one SYN/ACK round trip (RTT) is saved, establishing the TCP connection faster

    net.ipv4.tcp_fastopen = 0|1|2|3    (0: disabled, 1: client only, 2: server only, 3: both)

    listen address[:port] [fastopen=number];

    fastopen=number limits the maximum length of the TFO connection queue, as a guard against SYN attacks

  • TCP buffer

    net.ipv4.tcp_rmem = 4096 87380 6291456

    net.ipv4.tcp_wmem = 4096 87380 6291456

    net.ipv4.tcp_mem = 1541646 2055528 3083292

    net.ipv4.tcp_moderate_rcvbuf = 1    enables automatic receive-buffer tuning

    listen address[:port] [rcvbuf=size] [sndbuf=size];

    net.ipv4.tcp_adv_win_scale = 1

    Application buffer = buffer / 2^tcp_adv_win_scale

    Receive window = buffer - buffer / 2^tcp_adv_win_scale

    BDP (bandwidth-delay product) = bandwidth * RTT

    buffer=BDP
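
As a worked example of the buffer arithmetic, here is a minimal sketch that takes BDP as bandwidth times round-trip time (the usual definition, assumed here) and applies the tcp_adv_win_scale split for positive scale values. The function names and the 100 Mbit/s, 40 ms link are illustrative assumptions.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes (integer arithmetic, rounded down)."""
    # bits/s * ms / 1000 gives bits in flight; / 8 converts to bytes.
    return bandwidth_bits_per_s * rtt_ms // 8000


def split_buffer(buffer_bytes, adv_win_scale=1):
    """Split a socket buffer into (application buffer, receive window).

    Only valid for adv_win_scale > 0, matching the formulas above.
    """
    app = buffer_bytes // (2 ** adv_win_scale)
    return app, buffer_bytes - app


if __name__ == "__main__":
    print(bdp_bytes(100_000_000, 40))   # bytes in flight on the assumed link
    print(split_buffer(6291456, 1))     # tcp_rmem maximum from the list above
```

With tcp_adv_win_scale = 1 only half of the buffer is advertised as window, so a buffer equal to the BDP advertises a window of half the BDP.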

  • Nagle algorithm

    At most one small packet may be in flight without an ACK on a connection

    Purpose: avoid flooding a connection with small packets and improve network utilization

    Throughput first: enable Nagle, tcp_nodelay off

    Low latency first: disable Nagle, tcp_nodelay on
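
At the socket level, this choice is the TCP_NODELAY option. A small sketch using Python's standard socket module shows the mapping; make_socket and nagle_enabled are names invented for this example, and no connection is made.

```python
import socket


def make_socket(low_latency):
    """Create a TCP socket with Nagle disabled (low latency) or left enabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 disables Nagle; 0 leaves it enabled.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if low_latency else 0)
    return s


def nagle_enabled(sock):
    """Nagle is active exactly when TCP_NODELAY is unset."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) == 0


if __name__ == "__main__":
    s = make_socket(low_latency=True)
    print(nagle_enabled(s))
    s.close()
```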

  • Congestion window

    The actual send rate is bounded by the smaller of the congestion window and the receive window

  • Slow start

    The congestion window grows exponentially: cwnd = cwnd * 2 every RTT

  • Congestion avoidance

    Once the window exceeds the threshold, it grows linearly

  • When congestion occurs

    Congestion is signaled by packet loss:

    RTO timeout: threshold = cwnd/2, cwnd = 1

    Fast Retransmit: cwnd = cwnd/2, threshold = cwnd

  • Fast recovery

    After a Fast Retransmit, cwnd is set to threshold + 3*MSS

  • Optimize slow start

    Raise the initial congestion window, for example to cwnd = 10
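
To see why a larger initial window helps, the rules above can be put into a toy model. This is a deliberately simplified sketch: windows are counted in segments, there is no loss, and the receive window is ignored. The threshold of 64 and the 100-segment transfer are made-up numbers.

```python
def next_cwnd(cwnd, threshold):
    """Window growth per RTT: exponential below the threshold, linear above."""
    if cwnd < threshold:
        return min(cwnd * 2, threshold)   # slow start
    return cwnd + 1                       # congestion avoidance


def on_rto_timeout(cwnd):
    """RTO rule from the list above: returns (new cwnd, new threshold)."""
    return 1, cwnd // 2


def rtts_to_send(segments, initial_cwnd, threshold):
    """Count the round trips needed to send `segments` without any loss."""
    cwnd, sent, rtts = initial_cwnd, 0, 0
    while sent < segments:
        sent += cwnd          # one full window goes out per round trip
        rtts += 1
        cwnd = next_cwnd(cwnd, threshold)
    return rtts


if __name__ == "__main__":
    print(rtts_to_send(100, initial_cwnd=1, threshold=64))
    print(rtts_to_send(100, initial_cwnd=10, threshold=64))
```

In this model the 100-segment transfer takes 7 round trips starting from cwnd = 1 and 4 starting from cwnd = 10, which is the saving that matters most for short responses.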

  • TCP keepalive

    With keepalive enabled, sockets whose peer has gone away are detected and closed promptly, saving system resources

    net.ipv4.tcp_keepalive_time = 7200

    net.ipv4.tcp_keepalive_intvl = 75

    net.ipv4.tcp_keepalive_probes = 9
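
How long a dead peer goes unnoticed follows directly from these three values: the idle time before the first probe, plus one interval for each unanswered probe. A small sketch of that arithmetic, where the function name and the tuned values in the second call are hypothetical:

```python
def keepalive_detection_seconds(time_s=7200, intvl_s=75, probes=9):
    """Worst-case time until an idle connection to a dead peer is closed."""
    return time_s + intvl_s * probes


if __name__ == "__main__":
    # Values from the list above: 7200 + 75 * 9 = 7875 seconds.
    print(keepalive_detection_seconds())
    # Hypothetical tighter settings for comparison.
    print(keepalive_detection_seconds(time_s=600, intvl_s=10, probes=3))
```

With the values listed, detection takes 7875 seconds, a little over two hours, which is why busy servers often shorten tcp_keepalive_time.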

  • TIME_WAIT

    net.ipv4.tcp_orphan_retries = 0

    net.ipv4.tcp_fin_timeout = 60

    net.ipv4.tcp_max_tw_buckets = 262144    maximum number of TIME_WAIT connections; beyond this limit connections are closed directly

  • lingering_close: delayed close

    If the receive buffer still holds unread data from the client when the server closes the connection, an RST is sent, and the client that receives the RST may discard the HTTP response. Lingering close keeps reading the remaining client data for a while before closing

    lingering_close off|on|always;

    reset_timedout_connection on|off;    when a read or write timeout fires, close the connection immediately by sending RST instead of going through the normal close, so resources are released sooner

  • TLS/SSL: optimize the handshake

    ssl_session_cache

  • TLS/SSL session tickets

    nginx encrypts the session information into a ticket and sends it to the client. When the client connects again it presents the ticket, and nginx verifies it and reuses the session

    Advantage: fewer full handshakes and therefore fewer asymmetric cryptographic operations, which improves performance

    Disadvantage: weaker security, so the ticket key has to be rotated frequently

    ssl_session_tickets on|off;

    ssl_session_ticket_key file;

  • Use HTTP persistent (keep-alive) connections

    keepalive_requests number;

  • gzip compression

    Improves network transfer efficiency

    gzip on|off;

  • Use HTTP/2

Function call profiling

google-perftools

pprof --text|--pdf

google_perftools_profiles file;

Hardware

  • Network card: 10-gigabit-class NICs, for example 10G/25G/40G
  • Disk: solid-state disks; pay attention to the IOPS and BPS figures
  • CPU: higher clock frequency, larger cache, better architecture
  • Memory: faster access speed

DNS

Copyright notice
This article was written by [Tan Yingzhi]. Please include a link to the original when reposting. Thank you.
