A criança Verniz pareceu entrar em pânico nas primeiras horas da manhã. As seguintes mensagens foram registradas em /var/log/messages:
Jun 30 00:49:12 [ip-hidden] varnishd[32140]: Child (12025) not responding to CLI, killed it.
Jun 30 00:49:12 [ip-hidden] varnishd[32140]: Unexpected reply from ping: 400 CLI communication error (hdr)
Jun 30 00:49:12 [ip-hidden] varnishd[32140]: Child (12025) not responding to CLI, killed it.
Jun 30 00:49:12 [ip-hidden] varnishd[32140]: Unexpected reply from ping: 400 CLI communication error
Jun 30 00:49:12 [ip-hidden] varnishd[32140]: Child (12025) died signal=3
Jun 30 00:49:22 [ip-hidden] varnishd[32140]: Child (22758) Pushing vcls failed:#012Could not load compiled VCL.#012#011dlopen(vcl_reload_2019-04-15T110712.1555326432.399262428/vgc.so) = vcl_reload_2019-04-15T110712.1555326432.399262428/vgc.so: failed to map segment from shared object: Cannot allocate memory
Jun 30 00:49:42 [ip-hidden] varnishd[32140]: Child (22758) died signal=6
Jun 30 00:49:45 [ip-hidden] varnishd[32140]: Child (22758) Last panic at: Sun, 30 Jun 2019 00:49:42 GMT#012"Assert error in child_signal_handler(), mgt/mgt_child.c line 296:#012 Condition(Signal 6 (Aborted) received at 0x1f1000058e6 si_code -6) not true.#012errno = 12 (Cannot allocate memory)#012thread = (ban-lurker)#012version = varnish-4.1.9 revision 5024f60c3a51f537279977989b645e983a946e1a#012ident = Linux,4.9.75-25.55.amzn1.x86_64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll#012now = 43413242.798650 (mono), 1561855776.431004 (real)#012Backtrace:#012#012"
Jun 30 00:49:48 [ip-hidden] varnishd[32140]: Child (22758) said Child starts
Jun 30 00:49:49 [ip-hidden] varnishd[32140]: Child (22758) said libgcc_s.so.1 must be installed for pthread_cancel to work
Cerca de uma hora depois, o seguinte foi registrado:
Jun 30 02:06:41 [ip-hidden] varnishd[32140]: Manager got SIGINT
Jun 30 02:06:42 [ip-hidden] varnishd[28097]: Child (28099) said Child starts
O tempo de atividade de varnishstat
parece apoiar isso.
Provavelmente é inútil pontificar sobre por que a criança caiu, mas seria útil saber por que ela não reiniciou em tempo hábil. Suspeito de um problema de memória, mas há algo que eu possa fazer para evitar isso no futuro?
Para referência, é a versão 4.1.9 em execução no Amazon Linux AMI (2018)