AWS pcluster fails with MasterServerWaitCondition "Received FAILURE signal", iptables, and Chef version errors

I'm trying to create an AMI for ParallelCluster. I started from Amazon's base AMI (ami-0436692c7b452bae4, the alinux AMI for us-west-2, which is my region) and modified it slightly by adding a few packages.

However, when I run pcluster create foo --norollback, I get this error:

Beginning cluster creation for cluster: stockAWS
Creating stack named: parallelcluster-stockAWS
Status: parallelcluster-stockAWS - ROLLBACK_IN_PROGRESS                         
Cluster creation failed.  Failed events:
  - AWS::AutoScaling::AutoScalingGroup ComputeFleet Resource creation cancelled
  - AWS::CloudFormation::WaitCondition MasterServerWaitCondition Received FAILURE signal with UniqueId i-booyaa

I then run ssh foo and look at /var/log/cfncluster-init.log, which contains a long error log. At the bottom it says:

2021-07-28 23:16:49,659 [ERROR] Command chef (chef-client --local-mode --config /etc/chef/client.rb --log_level auto --force-formatter --no-color --chef-zero-port 8889 --json-attributes /etc/chef/dna.json --override-runlist aws-parallelcluster::_prep_env) failed
2021-07-28 23:16:49,659 [DEBUG] Command chef output: Starting Chef Client, version 14.2.0
[2021-07-28T23:16:47+00:00] WARN: Run List override has been provided.
[2021-07-28T23:16:47+00:00] WARN: Run List override has been provided.
[2021-07-28T23:16:47+00:00] WARN: Original Run List: [recipe[aws-parallelcluster::slurm_config]]
[2021-07-28T23:16:47+00:00] WARN: Original Run List: [recipe[aws-parallelcluster::slurm_config]]
[2021-07-28T23:16:47+00:00] WARN: Overridden Run List: [recipe[aws-parallelcluster::_prep_env]]
[2021-07-28T23:16:47+00:00] WARN: Overridden Run List: [recipe[aws-parallelcluster::_prep_env]]
resolving cookbooks for run list: ["aws-parallelcluster::_prep_env"]
Synchronizing Cookbooks:
  - aws-parallelcluster (2.5.1)
  - poise-python (1.7.0)
  - tar (2.1.1)
  - selinux (2.1.1)
  - nfs (2.6.4)
  - yum (5.1.0)
  - yum-epel (3.1.0)
  - openssh (2.6.3)
  - apt (7.0.0)
  - hostname (0.4.2)
  - line (2.4.1)
  - ulimit (1.0.0)
  - pyenv (3.1.1)
  - kernel_module (1.1.2)
  - poise (2.8.2)
  - poise-languages (2.1.2)
  - iptables (8.0.0)
  - hostsfile (3.0.1)
  - poise-archive (1.5.0)

Running handlers:
[2021-07-28T23:16:49+00:00] ERROR: Running exception handlers
[2021-07-28T23:16:49+00:00] ERROR: Running exception handlers
Running handlers complete
[2021-07-28T23:16:49+00:00] ERROR: Exception handlers complete
[2021-07-28T23:16:49+00:00] ERROR: Exception handlers complete
Chef Client failed. 0 resources updated in 11 seconds
[2021-07-28T23:16:49+00:00] FATAL: Stacktrace dumped to /etc/chef/local-mode-cache/cache/chef-stacktrace.out
[2021-07-28T23:16:49+00:00] FATAL: Stacktrace dumped to /etc/chef/local-mode-cache/cache/chef-stacktrace.out
[2021-07-28T23:16:49+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2021-07-28T23:16:49+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2021-07-28T23:16:49+00:00] FATAL: Chef::Exceptions::CookbookChefVersionMismatch: Cookbook 'iptables' version '8.0.0' depends on chef version [">= 15.3"], but the running chef version is 14.2.0
[2021-07-28T23:16:49+00:00] FATAL: Chef::Exceptions::CookbookChefVersionMismatch: Cookbook 'iptables' version '8.0.0' depends on chef version [">= 15.3"], but the running chef version is 14.2.0

2021-07-28 23:16:49,659 [ERROR] Error encountered during build of chefPrepEnv: Command chef failed
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/", line 573, in run_config
    CloudFormationCarpenter(config, self._auth_config).build(worklog)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/", line 273, in build
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/", line 127, in apply
    raise ToolError(u"Command %s failed" % name)
cfnbootstrap.construction_errors.ToolError: Command chef failed
2021-07-28 23:16:49,661 [ERROR] -----------------------BUILD FAILED!------------------------
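Because every WARN, ERROR, and FATAL line appears twice in the output above, the actual cause is easy to lose in the noise. A minimal Python sketch, assuming the log text has exactly the format shown above, that collapses duplicate FATAL lines and pulls out the fields of the CookbookChefVersionMismatch message. The names unique_fatal_messages and find_version_mismatch are hypothetical helpers for illustration, not part of ParallelCluster, cfn-init, or Chef.

```python
import re

# Assumed format: matches the CookbookChefVersionMismatch text as it appears in the log above.
MISMATCH_RE = re.compile(
    r"Cookbook '(?P<cookbook>[^']+)' version '(?P<cookbook_version>[^']+)' "
    r'depends on chef version \["(?P<constraint>[^"]+)"\], '
    r"but the running chef version is (?P<running>\S+)"
)


def unique_fatal_messages(log_text):
    """Return FATAL messages in order of first appearance, dropping repeats."""
    marker = "FATAL: "
    seen = []
    for line in log_text.splitlines():
        idx = line.find(marker)
        if idx == -1:
            continue
        msg = line[idx + len(marker):].strip()
        # The log above prints each line twice, so keep only the first copy.
        if msg not in seen:
            seen.append(msg)
    return seen


def find_version_mismatch(log_text):
    """Return the cookbook/Chef version fields of the first mismatch, or None."""
    for msg in unique_fatal_messages(log_text):
        match = MISMATCH_RE.search(msg)
        if match:
            return match.groupdict()
    return None
```

Fed the contents of /var/log/cfncluster-init.log, this would report cookbook iptables at version 8.0.0 requiring Chef ">= 15.3" while Chef 14.2.0 is running. Note that the version in this message belongs to the iptables Chef cookbook, not to the iptables binary.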

If I run iptables --version, I get v1.8.4, and the same is true when I run it with sudo. The Chef version is 14.2.0.
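The error compares two different numbers than the ones checked here: the constraint ">= 15.3" declared by the iptables cookbook (version 8.0.0) against the running Chef client (14.2.0). The iptables binary version (v1.8.4) does not enter into it. A simplified Python sketch of that comparison, assuming a single ">=" constraint with purely numeric dotted versions; satisfies_min_version is a hypothetical helper, and Chef itself evaluates constraints with its own version-constraint logic, which supports more operators than this.

```python
def parse_version(text):
    """Turn a dotted version such as '14.2.0' into (14, 2, 0)."""
    return tuple(int(part) for part in text.split("."))


def satisfies_min_version(running, constraint):
    """Check one '>= X.Y[.Z]' constraint; other operators are deliberately rejected."""
    op, _, required = constraint.partition(" ")
    if op != ">=":
        raise ValueError(f"unsupported operator: {op!r}")
    run = parse_version(running)
    req = parse_version(required)
    # Pad the shorter tuple with zeros so (15, 3) compares like (15, 3, 0).
    width = max(len(run), len(req))
    run += (0,) * (width - len(run))
    req += (0,) * (width - len(req))
    return run >= req
```

With the values from the log, satisfies_min_version("14.2.0", ">= 15.3") is False, which is the condition Chef reports as CookbookChefVersionMismatch.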

The frustrating part is that creating a ParallelCluster stack from the stock AWS AMI produces exactly the same behavior. What is going on here?