RabbitMQ Helm chart installation on a Kubernetes cluster fails while distributing the Erlang cookie to a node

I am trying to install a RabbitMQ cluster with the Bitnami Helm chart (https://github.com/bitnami/charts/tree/master/bitnami/rabbitmq) on an EKS cluster. When I run the Helm install, the first pod that is created logs the following error:

rabbitmq 13:41:15.99
rabbitmq 13:41:15.99 Welcome to the Bitnami rabbitmq container
rabbitmq 13:41:15.99 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-rabbitmq
rabbitmq 13:41:15.99 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-rabbitmq/issues
rabbitmq 13:41:15.99
rabbitmq 13:41:15.99 INFO  ==> ** Starting RabbitMQ setup **
rabbitmq 13:41:16.01 INFO  ==> Validating settings in RABBITMQ_* env vars..
rabbitmq 13:41:16.03 INFO  ==> Initializing RabbitMQ...
rabbitmq 13:41:16.03 DEBUG ==> Creating environment file...
rabbitmq 13:41:16.03 DEBUG ==> Creating enabled_plugins file...
rabbitmq 13:41:16.04 DEBUG ==> Creating Erlang cookie...
rabbitmq 13:41:16.04 DEBUG ==> Ensuring expected directories/files exist...
rabbitmq 13:41:16.05 INFO  ==> Starting RabbitMQ in background...
Waiting for erlang distribution on node '[email protected]' while OS process '51' is running
2022-04-19 13:41:19.198340+00:00 [info] <0.222.0> Feature flags: list of feature flags found:
2022-04-19 13:41:19.212884+00:00 [info] <0.222.0> Feature flags:   [ ] implicit_default_bindings
2022-04-19 13:41:19.212941+00:00 [info] <0.222.0> Feature flags:   [ ] maintenance_mode_status
2022-04-19 13:41:19.212965+00:00 [info] <0.222.0> Feature flags:   [ ] quorum_queue
2022-04-19 13:41:19.212985+00:00 [info] <0.222.0> Feature flags:   [ ] stream_queue
2022-04-19 13:41:19.213077+00:00 [info] <0.222.0> Feature flags:   [ ] user_limits
2022-04-19 13:41:19.213104+00:00 [info] <0.222.0> Feature flags:   [ ] virtual_host_metadata
2022-04-19 13:41:19.213124+00:00 [info] <0.222.0> Feature flags: feature flag states written to disk: yes
2022-04-19 13:41:19.637051+00:00 [noti] <0.44.0> Application syslog exited with reason: stopped
2022-04-19 13:41:19.637148+00:00 [noti] <0.222.0> Logging: switching to configured handler(s); following messages may not be visible in this log output
2022-04-19 13:41:19.656264+00:00 [noti] <0.222.0> Logging: configured log handlers are now ACTIVE
2022-04-19 13:41:19.904087+00:00 [info] <0.222.0> ra: starting system quorum_queues
2022-04-19 13:41:19.904200+00:00 [info] <0.222.0> starting Ra system: quorum_queues in directory: /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0/quorum/rabbit@rabbitmq-0
2022-04-19 13:41:19.995094+00:00 [info] <0.263.0> ra: meta data store initialised for system quorum_queues. 0 record(s) recovered
2022-04-19 13:41:20.013384+00:00 [noti] <0.268.0> WAL: ra_log_wal init, open tbls: ra_log_open_mem_tables, closed tbls: ra_log_closed_mem_tables
2022-04-19 13:41:20.022921+00:00 [info] <0.222.0> ra: starting system coordination
2022-04-19 13:41:20.022987+00:00 [info] <0.222.0> starting Ra system: coordination in directory: /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0/coordination/rabbit@rabbitmq-0
2022-04-19 13:41:20.026371+00:00 [info] <0.276.0> ra: meta data store initialised for system coordination. 0 record(s) recovered
2022-04-19 13:41:20.026628+00:00 [noti] <0.281.0> WAL: ra_coordination_log_wal init, open tbls: ra_coordination_log_open_mem_tables, closed tbls: ra_coordination_log_closed_mem_tables
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>  Starting RabbitMQ 3.9.8 on Erlang 24.1.2 [jit]
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>  Copyright (c) 2007-2021 VMware, Inc. or its affiliates.
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>  Licensed under the MPL 2.0. Website: https://rabbitmq.com

  ##  ##      RabbitMQ 3.9.8
  ##  ##
  ##########  Copyright (c) 2007-2021 VMware, Inc. or its affiliates.
  ######  ##
  ##########  Licensed under the MPL 2.0. Website: https://rabbitmq.com

  Erlang:      24.1.2 [jit]
  TLS Library: OpenSSL - OpenSSL 1.1.1d  10 Sep 2019

  Doc guides:  https://rabbitmq.com/documentation.html
  Support:     https://rabbitmq.com/contact.html
  Tutorials:   https://rabbitmq.com/getstarted.html
  Monitoring:  https://rabbitmq.com/monitoring.html

  Logs: /opt/bitnami/rabbitmq/var/log/rabbitmq/rabbit@rabbitmq-0_upgrade.log
        <stdout>

  Config file(s): /opt/bitnami/rabbitmq/etc/rabbitmq/rabbitmq.conf

  Starting broker...2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  node           : rabbit@rabbitmq-0
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  home dir       : /opt/bitnami/rabbitmq/.rabbitmq
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  config file(s) : /opt/bitnami/rabbitmq/etc/rabbitmq/rabbitmq.conf
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  cookie hash    : d3Nfp8t690Ln1h811Tuxzw==
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  log(s)         : /opt/bitnami/rabbitmq/var/log/rabbitmq/rabbit@rabbitmq-0_upgrade.log
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>                 : <stdout>
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  database dir   : /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0
2022-04-19 13:41:20.307590+00:00 [info] <0.222.0> Feature flags: list of feature flags found:
2022-04-19 13:41:20.307654+00:00 [info] <0.222.0> Feature flags:   [ ] drop_unroutable_metric
2022-04-19 13:41:20.307681+00:00 [info] <0.222.0> Feature flags:   [ ] empty_basic_get_metric
2022-04-19 13:41:20.307705+00:00 [info] <0.222.0> Feature flags:   [ ] implicit_default_bindings
2022-04-19 13:41:20.307792+00:00 [info] <0.222.0> Feature flags:   [ ] maintenance_mode_status
2022-04-19 13:41:20.307818+00:00 [info] <0.222.0> Feature flags:   [ ] quorum_queue
2022-04-19 13:41:20.307838+00:00 [info] <0.222.0> Feature flags:   [ ] stream_queue
2022-04-19 13:41:20.307908+00:00 [info] <0.222.0> Feature flags:   [ ] user_limits
2022-04-19 13:41:20.307947+00:00 [info] <0.222.0> Feature flags:   [ ] virtual_host_metadata
2022-04-19 13:41:20.307968+00:00 [info] <0.222.0> Feature flags: feature flag states written to disk: yes
Error: operation wait on node [email protected] timed out. Timeout value used: 5000
2022-04-19 13:41:23.299211+00:00 [info] <0.222.0> Running boot step pre_boot defined by app rabbit
2022-04-19 13:41:23.299295+00:00 [info] <0.222.0> Running boot step rabbit_global_counters defined by app rabbit
2022-04-19 13:41:23.299545+00:00 [info] <0.222.0> Running boot step rabbit_osiris_metrics defined by app rabbit
2022-04-19 13:41:23.299746+00:00 [info] <0.222.0> Running boot step rabbit_core_metrics defined by app rabbit
2022-04-19 13:41:23.300299+00:00 [info] <0.222.0> Running boot step rabbit_alarm defined by app rabbit
2022-04-19 13:41:23.304497+00:00 [info] <0.297.0> Memory high watermark set to 12695 MiB (13312088473 bytes) of 31738 MiB (33280221184 bytes) total
2022-04-19 13:41:23.308954+00:00 [info] <0.299.0> Enabling free disk space monitoring
2022-04-19 13:41:23.309007+00:00 [info] <0.299.0> Disk free limit set to 50MB
2022-04-19 13:41:23.312489+00:00 [info] <0.222.0> Running boot step code_server_cache defined by app rabbit
2022-04-19 13:41:23.312650+00:00 [info] <0.222.0> Running boot step file_handle_cache defined by app rabbit
2022-04-19 13:41:23.312958+00:00 [info] <0.302.0> Limiting to approx 65439 file handles (58893 sockets)
2022-04-19 13:41:23.313163+00:00 [info] <0.303.0> FHC read buffering: OFF
2022-04-19 13:41:23.313217+00:00 [info] <0.303.0> FHC write buffering: ON
2022-04-19 13:41:23.313829+00:00 [info] <0.222.0> Running boot step worker_pool defined by app rabbit
2022-04-19 13:41:23.313932+00:00 [info] <0.283.0> Will use 4 processes for default worker pool
2022-04-19 13:41:23.313982+00:00 [info] <0.283.0> Starting worker pool 'worker_pool' with 4 processes in it
2022-04-19 13:41:23.314583+00:00 [info] <0.222.0> Running boot step database defined by app rabbit
2022-04-19 13:41:23.314894+00:00 [info] <0.222.0> Node database directory at /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0 is empty. Assuming we need to join an existing cluster or initialise from scratch...
2022-04-19 13:41:23.314963+00:00 [info] <0.222.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2022-04-19 13:41:23.315110+00:00 [info] <0.222.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2022-04-19 13:41:23.316998+00:00 [noti] <0.44.0> Application mnesia exited with reason: stopped

BOOT FAILED
===========
Exception during startup:

2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> BOOT FAILED
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> ===========
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> Exception during startup:
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> error:{badmatch,{error,enoent}}
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery_k8s:make_request/0, line 121
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery_k8s:list_nodes/0, line 41
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery_k8s:lock/1, line 76
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery:lock/0, line 190
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_mnesia:init_with_lock/3, line 104
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_mnesia:init/0, line 76
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_boot_steps:-run_step/2-lc$^0/1-0-/2, line 41
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_boot_steps:run_step/2, line 46
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
error:{badmatch,{error,enoent}}

    rabbit_peer_discovery_k8s:make_request/0, line 121
    rabbit_peer_discovery_k8s:list_nodes/0, line 41
    rabbit_peer_discovery_k8s:lock/1, line 76
    rabbit_peer_discovery:lock/0, line 190
    rabbit_mnesia:init_with_lock/3, line 104
    rabbit_mnesia:init/0, line 76
    rabbit_boot_steps:-run_step/2-lc$^0/1-0-/2, line 41
    rabbit_boot_steps:run_step/2, line 46

2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>   crasher:
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     initial call: application_master:init/4
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     pid: <0.221.0>
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     registered_name: []
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     exception exit: {{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>       in function  application_master:init/4 (application_master.erl, line 142)
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     ancestors: [<0.220.0>]
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     message_queue_len: 1
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     messages: [{'EXIT',<0.222.0>,normal}]
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     links: [<0.220.0>,<0.44.0>]
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     dictionary: []
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     trap_exit: true
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     status: running
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     heap_size: 2586
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     stack_size: 29
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     reductions: 186
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>   neighbours:
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>
2022-04-19 13:41:24.319087+00:00 [noti] <0.44.0> Application rabbit exited with reason: {{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}})

Crash dump is being written to: /opt/bitnami/rabbitmq/var/log/rabbitmq/erl_crash.dump...done
Waiting for erlang distribution on node '[email protected]' while OS process '51' is running
Error:
process_not_running
Waiting for erlang distribution on node '[email protected]' while OS process '51' is running
Error:
process_not_running

It looks like the Erlang cookie is not being distributed correctly, but after going through several posts I have not reached any conclusion.

If you have any information that could help, I would be grateful if you shared it.

EDIT 1: I opened a shell in the first (and only) pod of the three replicas that should be created and ran rabbitmq-diagnostics erlang_cookie_sources to find out where the Erlang cookie file is stored (/opt/bitnami/rabbitmq/.rabbitmq/.erlang.cookie). I checked whether its content matches the cookie I set in the chart's values.yaml, and it is exactly the same, so I no longer think the problem is the cookie distribution. The failure persists, though. Looking at the logs again, I see that some process is not running (process_not_running); I do not know whether the problem lies there.
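The cookie can also be cross-checked against the `cookie hash` line in the log above without opening the file in the pod. The sketch below assumes that RabbitMQ prints the Base64-encoded MD5 digest of the cookie string; the log itself does not confirm that scheme, so treat a mismatch with caution. The placeholder cookie is hypothetical and should be replaced with the value from values.yaml, without any trailing newline.

```python
import base64
import hashlib


def cookie_hash(cookie: str) -> str:
    """Return Base64(MD5(cookie)), the scheme RabbitMQ is assumed to log."""
    digest = hashlib.md5(cookie.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Hypothetical placeholder: substitute the cookie set in values.yaml.
    print(cookie_hash("REPLACE_WITH_YOUR_ERLANG_COOKIE"))
```

If the printed value equals the `cookie hash` shown at startup, the node is using the cookie from the chart values, which points the investigation away from the cookie.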

1 Answer

The problem was that the service account token was not being mounted into the pods. I changed the serviceAccount section of the chart's values.yaml:

serviceAccount:
  ## @param serviceAccount.create Enable creation of ServiceAccount for RabbitMQ pods
  ##
  create: true
  ## @param serviceAccount.name Name of the created serviceAccount
  ## If not set and create is true, a name is generated using the rabbitmq.fullname template
  ##
  #name: ""
  ## @param serviceAccount.automountServiceAccountToken Auto-mount the service account token in the pod
  ##
  automountServiceAccountToken: true
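
This fits the `{error,enoent}` raised in `rabbit_peer_discovery_k8s:make_request/0`: the Kubernetes peer discovery backend needs the service account credentials to call the API server, and `enoent` means a file it tried to read does not exist. A rough way to confirm the mount, sketched in Python, is to check for the files Kubernetes normally projects under `/var/run/secrets/kubernetes.io/serviceaccount`. The helper name `missing_sa_files` is made up for this example, and the script assumes a pod that runs with the same service account and has Python available (the RabbitMQ image may not, in which case listing that directory by hand tells you the same thing).

```python
from pathlib import Path

# Default mount point Kubernetes uses for service account credentials.
SA_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")
# Files normally projected into that directory.
EXPECTED = ("token", "ca.crt", "namespace")


def missing_sa_files(sa_dir=SA_DIR):
    """Return the expected credential files that are absent from sa_dir."""
    base = Path(sa_dir)
    return [name for name in EXPECTED if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_sa_files()
    if missing:
        print("missing service account files:", ", ".join(missing))
    else:
        print("service account token is mounted")
```

With automountServiceAccountToken set to false, I would expect all three files to be reported as missing; after setting it to true and redeploying, the list should be empty.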
