Dead node in Pacemaker cluster: "Couldn't complete CIB registration"

I have a two-node Pacemaker cluster that no longer works after an upgrade. Package versions are pacemaker 1.1.16-1~bpo8+ and corosync 2.4.2-3~bpo8+1 under Debian Jessie.

Pacemaker is still able to start on one node. crm_node -l then lists that node as online and the second one as lost.

Pacemaker can no longer start on the second node. The following log messages in /var/log/corosync/logfile seem pertinent:

cib: info: validate_with_relaxng: Creating RNG parser context
pacemakerd: error: pcmk_child_exit: The cib process (1234) exited: Key has expired (127)
pacemakerd: notice: pcmk_process_exit: Respawning failed child process: cib
...
cib: info: validate_with_relaxng: Creating RNG parser context
pacemakerd: error: pcmk_child_exit: The cib process (1235) exited: Key has expired (127)
pacemakerd: notice: pcmk_process_exit: Respawning failed child process: cib
...
crmd: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
...
crmd: warning: do_cib_control: Couldn't complete CIB registration 16 times... pause and retry 
crmd: notice: crm_shutdown: Shutting down cluster resource manager | limit=1200000ms
pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete

So it appears that the second node attempts CIB registration and aborts the Pacemaker start after 16 failed attempts, and that the first node considers the second one dead, perhaps because it cannot register.

How can one get out of a situation like this?

Solution

The root cause turned out to be an outdated version of the package libpe-rules2, which provides the shared library libpe_rules.so.2. The pacemaker package from jessie-backports only declares a dependency on libpe-rules2 >= 1.0.10 (perhaps a bug in the current package description), whereas the current version of libpe-rules2 (also from jessie-backports) is 1.1.16.
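
Because libpe-rules2 is built from the same source package as pacemaker, a quick way to spot this kind of skew is to compare their installed versions. The following Python sketch does that with dpkg-query. The rule that the upstream parts of the two versions should match is an assumption used for illustration, not something the answer above states, and the package list contains only libpe-rules2 because that is the only library package named here; extend it with whatever other Pacemaker library packages are installed on the node.

```python
"""Sketch: flag Pacemaker library packages whose upstream version differs
from the installed pacemaker package (assumes a Debian system with dpkg-query)."""
import subprocess

# Assumption: only libpe-rules2 is named in the answer; add other Pacemaker
# library packages installed on the node as needed.
LIBRARY_PACKAGES = ["libpe-rules2"]


def upstream_version(debian_version):
    """Strip the epoch ("1:") and the Debian revision ("-1~bpo8+1")."""
    version = debian_version.split(":", 1)[-1]
    return version.rsplit("-", 1)[0]


def installed_version(package):
    """Return the version reported by dpkg-query, or None if unavailable."""
    try:
        out = subprocess.check_output(
            ["dpkg-query", "-W", "-f=${Version}", package],
            stderr=subprocess.DEVNULL, universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.strip() or None


def mismatches(reference, packages):
    """Return {package: version} for packages missing or off the reference upstream version."""
    ref = upstream_version(reference)
    return {pkg: ver for pkg, ver in packages.items()
            if ver is None or upstream_version(ver) != ref}


def main():
    pacemaker = installed_version("pacemaker")
    if pacemaker is None:
        print("pacemaker does not appear to be installed")
        return
    found = {pkg: installed_version(pkg) for pkg in LIBRARY_PACKAGES}
    for pkg, ver in sorted(mismatches(pacemaker, found).items()):
        print("{}: {} (pacemaker is {})".format(pkg, ver or "not installed", pacemaker))


if __name__ == "__main__":
    main()
```

On a node in the state described in the question, this would be expected to report libpe-rules2 until it is upgraded. Upstream-version equality is deliberately cruder than dpkg's own version comparison; it only answers "were these built from the same release?".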

The older version of the library made the cib process fail because of undefined symbols in the dynamic library. This was revealed by starting pacemakerd (and, in effect, cib) under strace -f. Upgrading with apt-get install libpe-rules2 resolved the situation.
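
To repeat that diagnosis, one option is to run pacemakerd (with the pacemaker service stopped) under something like strace -f -s 256 -o /tmp/pacemakerd.trace pacemakerd. Here -f follows child processes such as cib, -s raises the string limit so the loader's error message is not cut off at strace's default of 32 characters, and -o writes the trace to a file whose path is arbitrary. The Python sketch below then extracts the "symbol lookup error ... undefined symbol" messages that glibc's dynamic linker writes to stderr. It is an illustration rather than part of the original answer, and the exact message format it matches is an assumption about glibc. This also appears to explain the odd "Key has expired (127)" log line: glibc's loader exits with status 127 when it cannot resolve a symbol, and on Linux errno 127 is EKEYEXPIRED, whose text is "Key has expired", so a log that renders the exit status as an errno string shows exactly that.

```python
"""Sketch: list dynamic-linker "undefined symbol" errors found in strace output."""
import re
import sys

# Assumed glibc loader message format:
#   "<program>: symbol lookup error: <library>: undefined symbol: <name>"
# In strace output it appears inside a write(2, "...") call.
UNDEFINED = re.compile(
    r"symbol lookup error: ([^:]+): undefined symbol: ([A-Za-z0-9_.@]+)")


def undefined_symbols(lines):
    """Return unique (library, symbol) pairs in the order they first appear."""
    found = []
    for line in lines:
        for pair in UNDEFINED.findall(line):
            if pair not in found:
                found.append(pair)
    return found


def main(argv):
    if len(argv) != 2:
        print("usage: {} STRACE_OUTPUT_FILE".format(argv[0]))
        return
    with open(argv[1], errors="replace") as trace:
        for library, symbol in undefined_symbols(trace):
            print("{}: undefined symbol {}".format(library, symbol))


if __name__ == "__main__":
    main(sys.argv)
```

The regular expression stops the symbol name at the first character that cannot be part of a C identifier, so the escaped "\n" that strace prints at the end of the written string is not included.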

Связанный контент