Установка диаграммы RabbitMQ Helm в кластере Kubernetes приводит к сбою распространения cookie-файла Erlang на узел

Установка диаграммы RabbitMQ Helm в кластере Kubernetes приводит к сбою распространения cookie-файла Erlang на узел

Я пытаюсь установить кластер RabbitMQ через диаграмму Bitnami Helm (https://github.com/bitnami/charts/tree/master/bitnami/rabbitmq) в кластере EKS и при выполнении установки Helm я получаю следующую ошибку в первом созданном модуле:

rabbitmq 13:41:15.99
rabbitmq 13:41:15.99 Welcome to the Bitnami rabbitmq container
rabbitmq 13:41:15.99 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-rabbitmq
rabbitmq 13:41:15.99 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-rabbitmq/issues
rabbitmq 13:41:15.99
rabbitmq 13:41:15.99 INFO  ==> ** Starting RabbitMQ setup **
rabbitmq 13:41:16.01 INFO  ==> Validating settings in RABBITMQ_* env vars..
rabbitmq 13:41:16.03 INFO  ==> Initializing RabbitMQ...
rabbitmq 13:41:16.03 DEBUG ==> Creating environment file...
rabbitmq 13:41:16.03 DEBUG ==> Creating enabled_plugins file...
rabbitmq 13:41:16.04 DEBUG ==> Creating Erlang cookie...
rabbitmq 13:41:16.04 DEBUG ==> Ensuring expected directories/files exist...
rabbitmq 13:41:16.05 INFO  ==> Starting RabbitMQ in background...
Waiting for erlang distribution on node '[email protected]' while OS process '51' is running
2022-04-19 13:41:19.198340+00:00 [info] <0.222.0> Feature flags: list of feature flags found:
2022-04-19 13:41:19.212884+00:00 [info] <0.222.0> Feature flags:   [ ] implicit_default_bindings
2022-04-19 13:41:19.212941+00:00 [info] <0.222.0> Feature flags:   [ ] maintenance_mode_status
2022-04-19 13:41:19.212965+00:00 [info] <0.222.0> Feature flags:   [ ] quorum_queue
2022-04-19 13:41:19.212985+00:00 [info] <0.222.0> Feature flags:   [ ] stream_queue
2022-04-19 13:41:19.213077+00:00 [info] <0.222.0> Feature flags:   [ ] user_limits
2022-04-19 13:41:19.213104+00:00 [info] <0.222.0> Feature flags:   [ ] virtual_host_metadata
2022-04-19 13:41:19.213124+00:00 [info] <0.222.0> Feature flags: feature flag states written to disk: yes
2022-04-19 13:41:19.637051+00:00 [noti] <0.44.0> Application syslog exited with reason: stopped
2022-04-19 13:41:19.637148+00:00 [noti] <0.222.0> Logging: switching to configured handler(s); following messages may not be visible in this log output
2022-04-19 13:41:19.656264+00:00 [noti] <0.222.0> Logging: configured log handlers are now ACTIVE
2022-04-19 13:41:19.904087+00:00 [info] <0.222.0> ra: starting system quorum_queues
2022-04-19 13:41:19.904200+00:00 [info] <0.222.0> starting Ra system: quorum_queues in directory: /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0/quorum/rabbit@rabbitmq-0
2022-04-19 13:41:19.995094+00:00 [info] <0.263.0> ra: meta data store initialised for system quorum_queues. 0 record(s) recovered
2022-04-19 13:41:20.013384+00:00 [noti] <0.268.0> WAL: ra_log_wal init, open tbls: ra_log_open_mem_tables, closed tbls: ra_log_closed_mem_tables
2022-04-19 13:41:20.022921+00:00 [info] <0.222.0> ra: starting system coordination
2022-04-19 13:41:20.022987+00:00 [info] <0.222.0> starting Ra system: coordination in directory: /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0/coordination/rabbit@rabbitmq-0
2022-04-19 13:41:20.026371+00:00 [info] <0.276.0> ra: meta data store initialised for system coordination. 0 record(s) recovered
2022-04-19 13:41:20.026628+00:00 [noti] <0.281.0> WAL: ra_coordination_log_wal init, open tbls: ra_coordination_log_open_mem_tables, closed tbls: ra_coordination_log_closed_mem_tables
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>  Starting RabbitMQ 3.9.8 on Erlang 24.1.2 [jit]
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>  Copyright (c) 2007-2021 VMware, Inc. or its affiliates.
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>  Licensed under the MPL 2.0. Website: https://rabbitmq.com

  ##  ##      RabbitMQ 3.9.8
  ##  ##
  ##########  Copyright (c) 2007-2021 VMware, Inc. or its affiliates.
  ######  ##
  ##########  Licensed under the MPL 2.0. Website: https://rabbitmq.com

  Erlang:      24.1.2 [jit]
  TLS Library: OpenSSL - OpenSSL 1.1.1d  10 Sep 2019

  Doc guides:  https://rabbitmq.com/documentation.html
  Support:     https://rabbitmq.com/contact.html
  Tutorials:   https://rabbitmq.com/getstarted.html
  Monitoring:  https://rabbitmq.com/monitoring.html

  Logs: /opt/bitnami/rabbitmq/var/log/rabbitmq/rabbit@rabbitmq-0_upgrade.log
        <stdout>

  Config file(s): /opt/bitnami/rabbitmq/etc/rabbitmq/rabbitmq.conf

  Starting broker...2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  node           : rabbit@rabbitmq-0
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  home dir       : /opt/bitnami/rabbitmq/.rabbitmq
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  config file(s) : /opt/bitnami/rabbitmq/etc/rabbitmq/rabbitmq.conf
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  cookie hash    : d3Nfp8t690Ln1h811Tuxzw==
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  log(s)         : /opt/bitnami/rabbitmq/var/log/rabbitmq/rabbit@rabbitmq-0_upgrade.log
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>                 : <stdout>
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  database dir   : /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0
2022-04-19 13:41:20.307590+00:00 [info] <0.222.0> Feature flags: list of feature flags found:
2022-04-19 13:41:20.307654+00:00 [info] <0.222.0> Feature flags:   [ ] drop_unroutable_metric
2022-04-19 13:41:20.307681+00:00 [info] <0.222.0> Feature flags:   [ ] empty_basic_get_metric
2022-04-19 13:41:20.307705+00:00 [info] <0.222.0> Feature flags:   [ ] implicit_default_bindings
2022-04-19 13:41:20.307792+00:00 [info] <0.222.0> Feature flags:   [ ] maintenance_mode_status
2022-04-19 13:41:20.307818+00:00 [info] <0.222.0> Feature flags:   [ ] quorum_queue
2022-04-19 13:41:20.307838+00:00 [info] <0.222.0> Feature flags:   [ ] stream_queue
2022-04-19 13:41:20.307908+00:00 [info] <0.222.0> Feature flags:   [ ] user_limits
2022-04-19 13:41:20.307947+00:00 [info] <0.222.0> Feature flags:   [ ] virtual_host_metadata
2022-04-19 13:41:20.307968+00:00 [info] <0.222.0> Feature flags: feature flag states written to disk: yes
Error: operation wait on node [email protected] timed out. Timeout value used: 5000
2022-04-19 13:41:23.299211+00:00 [info] <0.222.0> Running boot step pre_boot defined by app rabbit
2022-04-19 13:41:23.299295+00:00 [info] <0.222.0> Running boot step rabbit_global_counters defined by app rabbit
2022-04-19 13:41:23.299545+00:00 [info] <0.222.0> Running boot step rabbit_osiris_metrics defined by app rabbit
2022-04-19 13:41:23.299746+00:00 [info] <0.222.0> Running boot step rabbit_core_metrics defined by app rabbit
2022-04-19 13:41:23.300299+00:00 [info] <0.222.0> Running boot step rabbit_alarm defined by app rabbit
2022-04-19 13:41:23.304497+00:00 [info] <0.297.0> Memory high watermark set to 12695 MiB (13312088473 bytes) of 31738 MiB (33280221184 bytes) total
2022-04-19 13:41:23.308954+00:00 [info] <0.299.0> Enabling free disk space monitoring
2022-04-19 13:41:23.309007+00:00 [info] <0.299.0> Disk free limit set to 50MB
2022-04-19 13:41:23.312489+00:00 [info] <0.222.0> Running boot step code_server_cache defined by app rabbit
2022-04-19 13:41:23.312650+00:00 [info] <0.222.0> Running boot step file_handle_cache defined by app rabbit
2022-04-19 13:41:23.312958+00:00 [info] <0.302.0> Limiting to approx 65439 file handles (58893 sockets)
2022-04-19 13:41:23.313163+00:00 [info] <0.303.0> FHC read buffering: OFF
2022-04-19 13:41:23.313217+00:00 [info] <0.303.0> FHC write buffering: ON
2022-04-19 13:41:23.313829+00:00 [info] <0.222.0> Running boot step worker_pool defined by app rabbit
2022-04-19 13:41:23.313932+00:00 [info] <0.283.0> Will use 4 processes for default worker pool
2022-04-19 13:41:23.313982+00:00 [info] <0.283.0> Starting worker pool 'worker_pool' with 4 processes in it
2022-04-19 13:41:23.314583+00:00 [info] <0.222.0> Running boot step database defined by app rabbit
2022-04-19 13:41:23.314894+00:00 [info] <0.222.0> Node database directory at /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0 is empty. Assuming we need to join an existing cluster or initialise from scratch...
2022-04-19 13:41:23.314963+00:00 [info] <0.222.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2022-04-19 13:41:23.315110+00:00 [info] <0.222.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2022-04-19 13:41:23.316998+00:00 [noti] <0.44.0> Application mnesia exited with reason: stopped

BOOT FAILED
===========
Exception during startup:

2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> BOOT FAILED
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> ===========
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> Exception during startup:
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> error:{badmatch,{error,enoent}}
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery_k8s:make_request/0, line 121
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery_k8s:list_nodes/0, line 41
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery_k8s:lock/1, line 76
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery:lock/0, line 190
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_mnesia:init_with_lock/3, line 104
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_mnesia:init/0, line 76
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_boot_steps:-run_step/2-lc$^0/1-0-/2, line 41
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_boot_steps:run_step/2, line 46
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
error:{badmatch,{error,enoent}}

    rabbit_peer_discovery_k8s:make_request/0, line 121
    rabbit_peer_discovery_k8s:list_nodes/0, line 41
    rabbit_peer_discovery_k8s:lock/1, line 76
    rabbit_peer_discovery:lock/0, line 190
    rabbit_mnesia:init_with_lock/3, line 104
    rabbit_mnesia:init/0, line 76
    rabbit_boot_steps:-run_step/2-lc$^0/1-0-/2, line 41
    rabbit_boot_steps:run_step/2, line 46

2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>   crasher:
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     initial call: application_master:init/4
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     pid: <0.221.0>
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     registered_name: []
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     exception exit: {{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>       in function  application_master:init/4 (application_master.erl, line 142)
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     ancestors: [<0.220.0>]
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     message_queue_len: 1
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     messages: [{'EXIT',<0.222.0>,normal}]
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     links: [<0.220.0>,<0.44.0>]
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     dictionary: []
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     trap_exit: true
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     status: running
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     heap_size: 2586
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     stack_size: 29
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     reductions: 186
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>   neighbours:
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>
2022-04-19 13:41:24.319087+00:00 [noti] <0.44.0> Application rabbit exited with reason: {{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}})

Crash dump is being written to: /opt/bitnami/rabbitmq/var/log/rabbitmq/erl_crash.dump...done
Waiting for erlang distribution on node '[email protected]' while OS process '51' is running
Error:
process_not_running
Waiting for erlang distribution on node '[email protected]' while OS process '51' is running
Error:
process_not_running

Похоже, что файлы cookie Erlang распространяются неправильно, но после проверки некоторых сообщений я так и не пришел к какому-либо выводу.

Если у вас есть какая-либо информация, которая может быть полезна, я буду признателен, если вы поделитесь ею со мной.

EDIT 1: Я вошел в первый и единственный pod из трех реплик, которые должны быть созданы, запустил, rabbitmq-diagnostics erlang_cookie_sourcesчтобы выяснить, где хранится файл cookie Erland (/opt/bitnami/rabbitmq/.rabbitmq/.erlang.cookie), и проверить, тот ли он, который я указал в values.yaml диаграммы, и он точно такой же, так что в конце концов я думаю, что нет проблем с распространением ключа, но у меня все еще та же проблема. Проверяя журналы еще раз, я вижу, что какой-то процесс не запущен, я не знаю, должна ли быть проблема там.

решение1

Проблема была в токене учетной записи службы, который не был распространен на модули. Я изменил values.yaml диаграммы Helm:

serviceAccount:
  ## @param serviceAccount.create Enable creation of ServiceAccount for RabbitMQ pods
  ##
  create: true
  ## @param serviceAccount.name Name of the created serviceAccount
  ## If not set and create is true, a name is generated using the rabbitmq.fullname template
  ##
  #name: ""
  ## @param serviceAccount.automountServiceAccountToken Auto-mount the service account token in the pod
  ##
  automountServiceAccountToken: true

Связанный контент