HAProxy random HTTP 503 errors

Problem description:

We have set up 3 servers:

  • Server A, running Nginx + HAProxy, which does the load balancing
  • Backend server B
  • Backend server C

This is our /etc/haproxy/haproxy.cfg:

global
        log /dev/log   local0
        log 127.0.0.1   local1 notice
        maxconn 40096
        user haproxy
        group haproxy
        daemon

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        retries 3
        option redispatch
        maxconn 2000
        contimeout      50000
        clitimeout      50000
        srvtimeout      50000
        stats enable
        stats uri /lb?stats
        stats realm Haproxy\ Statistics
        stats auth admin:admin
listen statslb :5054 # choose different names for the 2 nodes
        mode http
        stats enable
        stats hide-version
        stats realm Haproxy\ Statistics
        stats uri /
        stats auth admin:admin

listen  Server-A 0.0.0.0:80    
        mode http
        balance roundrobin
        cookie JSESSIONID prefix
        option httpchk HEAD /check.txt HTTP/1.0
        server  Server-B <server.ip>:80 cookie app1inst2 check inter 1000 rise 2 fall 2
        server  Server-C <server.ip>:80 cookie app1inst2 check inter 1000 rise 2 fall 3

All three servers have plenty of RAM and CPU cores to handle requests.

Random HTTP 503 errors are shown when browsing: 503 Service Unavailable - No server is available to handle this request.

The same problem also shows up on the server's console:

Message from syslogd@server-a at Dec 21 18:27:20 ...
 haproxy[1650]: proxy Server-A has no server available!

Note that 90% of the time there are no errors. These errors happen randomly.

I had the same issue. After days of pulling my hair out I found the cause.

I had two HAProxy instances running. One was a leftover "zombie" process that somehow never got killed, perhaps during an update or an haproxy restart. I noticed this when refreshing the /haproxy stats page: the PID shown would switch between two different numbers, and the page served under one of those PIDs had absurd connection stats. To confirm, I ran

netstat -tulpn | grep 80

and saw two haproxy processes listening on port 80.
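Since "grep 80" also matches ports such as 8080 and any PID containing "80", a stricter check can make the duplicate easier to spot. The following is only a sketch: the function name haproxy_listener_pids is made up for illustration, and it assumes the usual Linux net-tools netstat column layout (local address in column 4, state in column 6, PID/program in column 7).

```shell
# Hypothetical helper: reads `netstat -tlpn` style output on stdin and prints
# the distinct PIDs of haproxy processes listening on exactly the given port.
# Assumes net-tools column layout: $4 = local address, $6 = state, $7 = PID/program.
haproxy_listener_pids() {
    port="$1"
    awk -v port="$port" '
        $6 == "LISTEN" && $4 ~ (":" port "$") && $7 ~ /\/haproxy$/ {
            split($7, parts, "/")   # "1650/haproxy" -> parts[1] = "1650"
            print parts[1]
        }' | sort -u
}

# Example usage (needs root to see PIDs of other users):
#   netstat -tlpn | haproxy_listener_pids 80
# More than one PID in the output means more than one haproxy instance
# is bound to that port.
```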

To fix the issue I did a "kill xxxx" where xxxx is the pid with the suspicious statistics.
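Before killing by PID it may be worth double-checking that the PID really belongs to haproxy, since PIDs are easy to mistype. A minimal sketch, assuming a ps that supports "-p" and "-o comm=" (procps on Linux does); the function name kill_if_haproxy is hypothetical:

```shell
# Hypothetical helper: send SIGTERM to a PID only if its command name is haproxy.
kill_if_haproxy() {
    pid="$1"
    # `ps -o comm=` prints just the command name, with no header line.
    name=$(ps -p "$pid" -o comm=) || {
        echo "no process with PID $pid" >&2
        return 1
    }
    if [ "$name" != "haproxy" ]; then
        echo "PID $pid is '$name', not haproxy; not killing" >&2
        return 1
    fi
    kill "$pid"
}

# Example usage, with xxxx being the PID that showed the suspicious statistics:
#   kill_if_haproxy xxxx
```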