How do I diagnose a RabbitMQ crash on Ubuntu 16?
When I run
sudo service rabbitmq-server status
it reports:
● rabbitmq-server.service - RabbitMQ Messaging Server
Loaded: loaded (/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: enabled)
Active: failed (Result: timeout) since Wed 2018-03-21 19:44:18 UTC; 19min ago
Process: 1100 ExecStartPost=/usr/lib/rabbitmq/bin/rabbitmq-server-wait (code=killed, signal=TERM)
Process: 1099 ExecStart=/usr/sbin/rabbitmq-server (code=killed, signal=TERM)
Main PID: 1099 (code=killed, signal=TERM)
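This suggests that it has crashed or failed to start. However, when I run htop, I see dozens of erlang and beam.smp processes that were started by Rabbit.
In addition, when I restart Rabbit with
sudo service rabbitmq-server restart
it hangs for about five minutes and then finally returns: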
Job for rabbitmq-server.service failed because a timeout was exceeded. See "systemctl status rabbitmq-server.service" and "journalctl -xe" for details.
When I run
journalctl -xe
I see a large number of messages such as:
Mar 21 20:07:48 server1 postfix/error[3719]: 280524B3A: to=<[email protected]>, orig_to=<root>, relay=none, delay=101268, delays=101268/0/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspende
Mar 21 20:07:48 server1 postfix/qmgr[1784]: 2D046FAC: from=<>, size=3126, nrcpt=1 (queue active)
Mar 21 20:07:48 server1 postfix/qmgr[1784]: 2D8AD474F: from=<[email protected]>, size=751, nrcpt=1 (queue active)
Mar 21 20:07:48 server1 postfix/error[3712]: 2ED9D499A: to=<[email protected]>, orig_to=<root>, relay=none, delay=155868, delays=155868/0/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspende
Mar 21 20:07:48 server1 postfix/qmgr[1784]: 2EBCF3D40: from=<>, size=3128, nrcpt=1 (queue active)
Mar 21 20:07:48 server1 postfix/error[3706]: 2D8AD474F: to=<[email protected]>, orig_to=<root>, relay=none, delay=38268, delays=38268/0/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspended:
Mar 21 20:07:48 server1 postfix/error[3716]: 2D046FAC: to=<[email protected]>, relay=none, delay=76240, delays=76240/0/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to porta
Mar 21 20:07:48 server1 postfix/qmgr[1784]: 2C9DE3945: from=<>, size=3134, nrcpt=1 (queue active)
Mar 21 20:07:48 server1 postfix/qmgr[1784]: 2AA2A48B3: from=<[email protected]>, size=751, nrcpt=1 (queue active)
Mar 21 20:07:48 server1 postfix/error[3717]: 2C9DE3945: to=<[email protected]>, relay=none, delay=399644, delays=399644/0/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to po
Mar 21 20:07:48 server1 postfix/error[3701]: 2EBCF3D40: to=<[email protected]>, relay=none, delay=181242, delays=181242/0/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to po
Mar 21 20:07:48 server1 postfix/error[3712]: 2AA2A48B3: to=<[email protected]>, orig_to=<root>, relay=none, delay=59268, delays=59268/0/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspended:
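Am I right to conclude that Rabbit is trying to send a large number of emails, is being blocked, and then crashes? Why would that happen?
Answer 1
I fixed it with: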
sudo killall rabbitmq-server
sudo killall beam.smp
sudo rm -Rf /var/lib/rabbitmq/mnesia/*
sudo service rabbitmq-server start
I also had to re-add my user configuration, but other than that, it came back up.
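Note that removing everything under /var/lib/rabbitmq/mnesia discards the broker's stored state, which is why the users had to be re-added. If you want the option of restoring it, you could copy the directory first and only then run the rm step. A minimal sketch, assuming the default data path from the commands above; backup_dir is a hypothetical helper name, not part of RabbitMQ:

```shell
# Copy a directory to a timestamped sibling and print the backup path.
# "backup_dir" is a hypothetical helper, not a RabbitMQ tool.
# cp -a keeps ownership and permissions when run as root.
backup_dir() {
  src=$1
  dest="${src}.bak.$(date +%Y%m%d%H%M%S)"
  cp -a -- "$src" "$dest" && printf '%s\n' "$dest"
}

# Example (assumes the default path; run as root after the killall steps,
# so nothing is writing to the directory while it is copied):
#   backup_dir /var/lib/rabbitmq/mnesia
```

The original directory is left untouched, so the rest of the fix proceeds exactly as posted.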
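Answer 2
This doesn't look like a "crash"... more like a graceful shutdown caused by a problem. Evidently, the service timed out. I assume that's because it couldn't connect to a remote messaging server. The "emails" you posted indicate that it was trying to send email notifications that failed... which probably also means the postfix mail server isn't configured to relay messages off the box.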
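One way to check where those messages come from is to count the journal lines per program. Every line quoted in the question was written by postfix, not by RabbitMQ. A minimal sketch, assuming journalctl's default short output format, where the fifth field is the program name followed by [pid]; count_by_program is a hypothetical helper name:

```shell
# Count journal lines per program name, most frequent first.
# Assumes the short format: "Mon DD HH:MM:SS host program[pid]: message".
# "count_by_program" is a hypothetical helper for illustration.
count_by_program() {
  awk '{ split($5, a, "["); counts[a[1]]++ }
       END { for (p in counts) print counts[p], p }' | sort -rn
}

# Example usage on a systemd host:
#   journalctl -xe --no-pager | count_by_program
# To see only RabbitMQ's own unit messages instead:
#   journalctl -u rabbitmq-server.service --no-pager -n 50
```

With the Ubuntu packages, the broker's own log files are normally under /var/log/rabbitmq/, and they are likely to be more informative about the start-up timeout than the postfix entries.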