¿Por qué nginx se reinicia automáticamente (y no lo hace)?

¿Por qué nginx se reinicia automáticamente (y no lo hace)?

De vez en cuando, por alguna razón que no entiendo (soy un administrador muy aficionado), nginx se reinicia automáticamente, pero no lo hace, lo que resulta en una interrupción del servicio. Sucedió esta mañana:

$ journalctl -u nginx
-- Logs begin at Mon 2018-09-03 11:46:24 CEST, end at Tue 2018-09-04 09:30:22 CEST. --
Sep 04 07:55:21 vpsXXXXXX.ovh.net systemd[1]: Stopping A high performance web server and a reverse proxy server...
Sep 04 07:55:21 vpsXXXXXX.ovh.net systemd[1]: Stopped A high performance web server and a reverse proxy server.
Sep 04 07:55:27 vpsXXXXXX.ovh.net systemd[1]: Starting A high performance web server and a reverse proxy server...
Sep 04 07:55:27 vpsXXXXXX.ovh.net nginx[29333]: nginx: [warn] could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_siz
Sep 04 07:55:27 vpsXXXXXX.ovh.net nginx[29335]: nginx: [warn] could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_siz
Sep 04 07:55:28 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Sep 04 07:55:28 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Sep 04 07:55:28 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] bind() to 0.0.0.0:5281 failed (98: Address already in use)
Sep 04 07:55:28 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Sep 04 07:55:28 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Sep 04 07:55:28 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] bind() to 0.0.0.0:5281 failed (98: Address already in use)
Sep 04 07:55:29 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Sep 04 07:55:29 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Sep 04 07:55:29 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] bind() to 0.0.0.0:5281 failed (98: Address already in use)
Sep 04 07:55:29 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Sep 04 07:55:29 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Sep 04 07:55:29 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] bind() to 0.0.0.0:5281 failed (98: Address already in use)
Sep 04 07:55:30 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Sep 04 07:55:30 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Sep 04 07:55:30 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] bind() to 0.0.0.0:5281 failed (98: Address already in use)
Sep 04 07:55:30 vpsXXXXXX.ovh.net nginx[29335]: nginx: [emerg] still could not bind()
Sep 04 07:55:30 vpsXXXXXX.ovh.net systemd[1]: nginx.service: Control process exited, code=exited status=1
Sep 04 07:55:30 vpsXXXXXX.ovh.net systemd[1]: Failed to start A high performance web server and a reverse proxy server.
Sep 04 07:55:30 vpsXXXXXX.ovh.net systemd[1]: nginx.service: Unit entered failed state.
Sep 04 07:55:30 vpsXXXXXX.ovh.net systemd[1]: nginx.service: Failed with result 'exit-code'.

1h30 después me doy cuenta de que mi servidor web no funciona, así que todo systemctl restart nginxfunciona bien.

Sep 04 09:23:48 vpsXXXXXX.ovh.net systemd[1]: Starting A high performance web server and a reverse proxy server...
Sep 04 09:23:48 vpsXXXXXX.ovh.net nginx[30003]: nginx: [warn] could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_siz
Sep 04 09:23:48 vpsXXXXXX.ovh.net nginx[30004]: nginx: [warn] could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_siz
Sep 04 09:23:48 vpsXXXXXX.ovh.net systemd[1]: nginx.service: Failed to read PID from file /run/nginx.pid: Invalid argument
Sep 04 09:23:48 vpsXXXXXX.ovh.net systemd[1]: Started A high performance web server and a reverse proxy server.

Desafortunadamente, no registré la última vez que sucedió y los registros aparentemente son demasiado antiguos:

$ zgrep "bind()" /var/log/nginx/*
(just this morning's episode)

... pero estoy seguro de que tuve un problema similar al menos 2 o 3 veces este año (lo cual es aceptable pero molesto).

No parece estar relacionado con un reinicio del servidor:

$ uptime
 09:43:37 up 12 days,  7:00,  1 user,  load average: 0.00, 0.00, 0.00

Este es mi crontab de usuario raíz, no creo que esté relacionado ya que el cronograma no parece coincidir, pero como no puedo pensar en ningún lugar donde buscar, aquí está de todos modos:

# let's encrypt renewal
0 5 1 * * certbot renew --authenticator standalone --installer  nginx --pre-hook "systemctl stop nginx" --post-hook "systemctl start nginx" -n
# automatically update debian
0 5 * * 1 apt -qq update && apt dist-upgrade -qq -y
# couldn't find another way to make sure that datetime.today() actually returns today's date in my flask apps
0 0 * * * systemctl reload uwsgi.service

¿Qué está causando esto?

EDITAR: En /var/log/nginx/error.log, puedo encontrar un poco más de información:

2018/09/04 07:55:19 [warn] 29269#29269: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
2018/09/04 07:55:19 [info] 29269#29269: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:66
2018/09/04 07:55:20 [warn] 29271#29271: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
2018/09/04 07:55:20 [info] 29271#29271: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:66
2018/09/04 07:55:20 [warn] 29273#29273: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
2018/09/04 07:55:20 [info] 29273#29273: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:66
2018/09/04 07:55:26 [warn] 29305#29305: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
2018/09/04 07:55:26 [notice] 29305#29305: signal process started
2018/09/04 07:55:26 [error] 29305#29305: open() "/run/nginx.pid" failed (2: No such file or directory)
2018/09/04 07:55:26 [warn] 29306#29306: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
2018/09/04 07:55:27 [warn] 29333#29333: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
(then the same lines journalctl displays)

¿Por qué sucedería eso open() "/run/nginx.pid" failed (2: No such file or directory)de repente?

EDITAR2: Sucedió de nuevo esta mañana:

$ cat /var/log/nginx/error.log
2018/09/13 10:12:19 [warn] 7230#7230: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
2018/09/13 10:12:19 [info] 7230#7230: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:66
2018/09/13 10:12:31 [warn] 7243#7243: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
2018/09/13 10:12:31 [notice] 7243#7243: signal process started
2018/09/13 10:12:31 [error] 7243#7243: open() "/run/nginx.pid" failed (2: No such file or directory)
2018/09/13 10:12:31 [warn] 7244#7244: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
2018/09/13 10:12:32 [warn] 7247#7247: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
2018/09/13 10:12:32 [info] 7247#7247: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:66
2018/09/13 10:12:32 [warn] 7251#7251: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
2018/09/13 10:12:32 [info] 7251#7251: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:66
2018/09/13 10:12:33 [warn] 7255#7255: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
2018/09/13 10:12:33 [info] 7255#7255: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:66
2018/09/13 10:12:33 [warn] 7256#7256: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
2018/09/13 10:12:33 [emerg] 7256#7256: bind() to 0.0.0.0:443 failed (98: Address already in use)
2018/09/13 10:12:33 [emerg] 7256#7256: bind() to 0.0.0.0:80 failed (98: Address already in use)
2018/09/13 10:12:33 [emerg] 7256#7256: bind() to 0.0.0.0:5281 failed (98: Address already in use)
2018/09/13 10:12:33 [emerg] 7256#7256: bind() to 0.0.0.0:443 failed (98: Address already in use)
2018/09/13 10:12:33 [emerg] 7256#7256: bind() to 0.0.0.0:80 failed (98: Address already in use)
2018/09/13 10:12:33 [emerg] 7256#7256: bind() to 0.0.0.0:5281 failed (98: Address already in use)
2018/09/13 10:12:33 [emerg] 7256#7256: bind() to 0.0.0.0:443 failed (98: Address already in use)
2018/09/13 10:12:33 [emerg] 7256#7256: bind() to 0.0.0.0:80 failed (98: Address already in use)
2018/09/13 10:12:33 [emerg] 7256#7256: bind() to 0.0.0.0:5281 failed (98: Address already in use)
2018/09/13 10:12:33 [emerg] 7256#7256: bind() to 0.0.0.0:443 failed (98: Address already in use)
2018/09/13 10:12:33 [emerg] 7256#7256: bind() to 0.0.0.0:80 failed (98: Address already in use)
2018/09/13 10:12:33 [emerg] 7256#7256: bind() to 0.0.0.0:5281 failed (98: Address already in use)
2018/09/13 10:12:33 [emerg] 7256#7256: bind() to 0.0.0.0:443 failed (98: Address already in use)
2018/09/13 10:12:33 [emerg] 7256#7256: bind() to 0.0.0.0:80 failed (98: Address already in use)
2018/09/13 10:12:33 [emerg] 7256#7256: bind() to 0.0.0.0:5281 failed (98: Address already in use)
2018/09/13 10:12:33 [emerg] 7256#7256: still could not bind()
2018/09/13 10:12:36 [alert] 7245#7245: unlink() "/run/nginx.pid" failed (2: No such file or directory)
-------- this is when I realized my web services are down and manually restart nginx
2018/09/13 11:25:11 [warn] 7578#7578: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
2018/09/13 11:25:11 [info] 7578#7578: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:66
2018/09/13 11:25:12 [warn] 7579#7579: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size

Creo que simplemente agregaré un cronjob que verifique cada 5 minutos si nginxestá activo y lo iniciaré si no, pero esto está tan sucio que me dan ganas de vomitar.

información relacionada