Upstream timed out (110: Connection timed out) for static content?

Problem description:

I have a setup where two web servers each run nginx both as a load balancer and as a backend. The distribution is Debian Wheezy. The configuration is identical on both servers (quad-core, 32 GB RAM).

TCP

#/etc/sysctl.conf
vm.swappiness=0
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_sack=1
net.ipv4.ip_local_port_range=2000 65535
net.ipv4.tcp_max_syn_backlog=65535
net.core.somaxconn=65535
net.ipv4.tcp_max_tw_buckets=2000000
net.core.netdev_max_backlog=65535
net.ipv4.tcp_rfc1337=1
net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_keepalive_intvl=15
net.ipv4.tcp_keepalive_probes=5
net.core.rmem_default=8388608
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 16384 16777216
net.ipv4.tcp_congestion_control=cubic
net.ipv4.tcp_tw_reuse=1
fs.file-max=3000000

Nginx

#/etc/nginx/nginx.conf
user www-data www-data;
worker_processes 8;
worker_rlimit_nofile 300000;
pid /run/nginx.pid;

events {
        worker_connections 8192;
        use epoll;
        #multi_accept on;
}
http {
        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 10;
        types_hash_max_size 2048;
        server_tokens off;

        open_file_cache max=200000 inactive=20s;
        open_file_cache_valid 30s;
        open_file_cache_min_uses 5;
        open_file_cache_errors on;

        gzip on;
        gzip_vary on;
        gzip_proxied any;
        gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;
        gzip_min_length 10240;
        gzip_disable "MSIE [1-6]\.";
}

server {
    listen <PUBLIC-IPv4>:8080 default_server;
    listen <PUBLIC-IPv6>:8080 default_server;
    listen 127.0.0.1:8080 default_server;
    listen [::1]:8080 default_server;
    server_name backend01.example.com;
    access_log /var/log/nginx/access upstream;
    error_log /var/log/nginx/error;

    root /var/www/project/web;
    index app.php;
    error_page 500 501 502 503 504 505 /50x.html;
    client_max_body_size 8m;

    location ~ /\. { return 403; }
    try_files $uri $uri/ /app.php?$query_string;
    location ~ ^/(config|app_dev|app)\.php(/|$) {
        include fastcgi_params;
        # fastcgi_split_path_info ^(.+\.php)(/.*)$;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass_header Authorization;
        fastcgi_buffers 16 16k;
        fastcgi_buffer_size 32k;
        fastcgi_param HTTPS on;
    }
}

upstream www {
    ip_hash;
    server [::1]:8080;
    server backend02:8080;
}

server {
    listen <LOADBALANCER-IPv4>:443 ssl spdy;
    server_name www.example.com;
    access_log /var/log/nginx/access main;
    error_log /var/log/nginx/error;

    ssl                  on;
    ssl_certificate      /etc/ssl/example.com.crt;
    ssl_certificate_key  /etc/ssl/example.com.key;
    ssl_protocols        TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;
    ssl_ciphers          ECDH+AESGCM:ECDH+AES256:ECDH+AES128:DH+3DES:!ADH:!AECDH:!MD5;
    ssl_session_cache    shared:SSL:20m;
    ssl_session_timeout  10m;

    root /var/www/project/web;
    error_page 500 501 502 503 504 505 /50x.html;
    client_max_body_size 8m;

    location /00_templates { return 403; }
    location / {
        proxy_read_timeout 300;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_pass http://www;
        proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
        proxy_buffers 16 16k;
        proxy_buffer_size 32k;
    }
}

Using

ab -c 200 -n 40000 -q https://www.example.com/static/file.html

why do I get

upstream timed out (110: Connection timed out) while connecting to upstream

in the nginx log? An upstream timeout at 600 concurrent connections, for a static file!? While the ab test is running, I see the following on the first backend node:

# netstat -tan | grep ':8080 ' | awk '{print $6}' | sort | uniq -c
      2 LISTEN
     55 SYN_SENT
  37346 TIME_WAIT

OK, I didn't like reading the manuals, but to answer my own question:

nginx close upstream connection after request

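solved it. So what was the problem? I had configured the upstream to use keepalive, but the nginx documentation says the following options should also be set in the proxy location, because by default nginx speaks HTTP/1.0 to upstream servers and sends "Connection: close", so every proxied request opens and then closes its own backend connection: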

    proxy_http_version 1.1;
    proxy_set_header Connection "";
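For reference, a minimal sketch of how the upstream keepalive setting and the two proxy options fit together, assuming the www upstream and the location / block from the question (listen and ssl directives omitted). The value keepalive 32 is an arbitrary illustrative number, not taken from the original setup. Per the nginx documentation, keepalive has to come after a non-default balancing method such as ip_hash.

```nginx
upstream www {
    ip_hash;                  # non-default balancing method goes before keepalive
    server [::1]:8080;
    server backend02:8080;
    keepalive 32;             # illustrative: max idle upstream connections cached per worker
}

server {
    # listen / ssl / root directives as in the question (omitted here)
    location / {
        proxy_pass http://www;
        proxy_http_version 1.1;             # default is 1.0; upstream keepalive needs 1.1
        proxy_set_header Connection "";     # clear nginx's default "Connection: close"
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 300;
        proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
        proxy_buffers 16 16k;
        proxy_buffer_size 32k;
    }
}
```

Because proxy_set_header directives are only inherited when a level defines none of its own, the Connection header is cleared in the same location that already sets Host and X-Real-IP.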

That's it: the thousands of TIME_WAIT connections on the backend are gone. There are only around 150 now instead of 30-40k.