HAProxy random HTTP 503 errors
We've set up 3 servers:
- Server A with Nginx + HAProxy doing the load balancing
- Backend server B
- Backend server C
Here is our /etc/haproxy/haproxy.cfg:
global
    log /dev/log local0
    log 127.0.0.1 local1 notice
    maxconn 40096
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    contimeout 50000
    clitimeout 50000
    srvtimeout 50000
    stats enable
    stats uri /lb?stats
    stats realm Haproxy\ Statistics
    stats auth admin:admin

listen statslb :5054   # choose different names for the 2 nodes
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /
    stats auth admin:admin

listen Server-A 0.0.0.0:80
    mode http
    balance roundrobin
    cookie JSESSIONID prefix
    option httpchk HEAD /check.txt HTTP/1.0
    server Server-B <server.ip>:80 cookie app1inst2 check inter 1000 rise 2 fall 2
    server Server-C <server.ip>:80 cookie app1inst2 check inter 1000 rise 2 fall 3
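With this config, a backend is marked DOWN after its `httpchk` fails `fall` times in a row, and a 503 is returned once both backends are down at the same moment. One way to watch for flapping servers is to pull HAProxy's CSV stats and count DOWN entries. This is only a sketch: it assumes the stats listener on port 5054 from the config above, the `admin:admin` credentials, and that the status value lives in column 18 of the CSV (its usual position).

```shell
#!/bin/sh
# Sketch: count servers currently marked DOWN in HAProxy's CSV stats.
# count_down reads the CSV on stdin, so it also works on saved output.
count_down() {
  # Column 18 of HAProxy's stats CSV is "status" (UP/DOWN/...);
  # skip the "# pxname,..." header on line 1.
  awk -F, 'NR > 1 && $18 == "DOWN" { n++ } END { print n + 0 }'
}

# Live usage (commented out -- needs the running HAProxy from above):
# curl -s -u admin:admin 'http://127.0.0.1:5054/;csv' | count_down
```

Running this from cron and alerting when the count is non-zero would show whether the 503s line up with health-check failures.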
All three servers have plenty of RAM and CPU cores to handle the requests.
Random HTTP 503 errors are shown when browsing: 503 Service Unavailable - No server is available to handle this request.
And also on the server's console:
Message from syslogd@server-a at Dec 21 18:27:20 ...
haproxy[1650]: proxy Server-A has no server available!
Note that 90% of the time there are no errors; they happen randomly.
I had the same issue. After days of pulling my hair out, I found the problem.
I had two HAProxy instances running. One was a zombie that somehow never got killed, perhaps during an update or an haproxy restart. I noticed this when refreshing the /haproxy stats page: the PID would alternate between two different numbers, and the page for one of them showed absurd connection stats. To confirm, I ran
netstat -tulpn | grep 80
and saw two haproxy processes listening on port 80.
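This check can be scripted so you spot a stale listener before it causes trouble. The sketch below assumes the `netstat -tulpn` column layout from net-tools (local address in column 4, `PID/program` in the last column); the parsing is split into a function so it works on canned output too.

```shell
#!/bin/sh
# Sketch: list the PIDs of every process bound to port 80.
# More than one unique PID here usually means a stale/zombie listener.
listener_pids() {
  # Keep lines whose local address (column 4) ends in :80, then take
  # the PID out of the trailing "PID/program" column.
  awk '$4 ~ /:80$/ { split($NF, a, "/"); print a[1] }'
}

# Live usage (run as root so netstat can show all PIDs):
# netstat -tulpn | listener_pids | sort -u
```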
To fix the issue I ran "kill xxxx", where xxxx is the PID with the suspicious statistics.
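To avoid leaving a stale process behind on future restarts, HAProxy can be reloaded with its `-sf` flag, which tells the old PIDs to finish serving and exit once the new process has taken over the sockets. A sketch of the invocation, assuming the Debian-style config and pid file paths (adjust for your system); this is an operational fragment, not something to run blindly:

```shell
# Soft reload: start a new haproxy, then ask the old PIDs to drain and exit.
haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid \
        -sf $(cat /var/run/haproxy.pid)
```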