slurm - Is it possible to query slurmctld/slurmd to find out whether they are using the right version of slurm.conf?

The problem I'm facing is that slurmctld and slurmd are out of sync about which slurm.conf file they are using, so we get this:

error: Node node1 appears to have a different slurm.conf than the slurmctld.  This could cause issues with communication and functionality.  Please review both files and make sure they are the same.  If this is expected ignore, and set DebugFlags=NO_CONF_HASH in your slurm.conf.
error: Node node2 appears to have a different slurm.conf than the slurmctld.  This could cause issues with communication and functionality.  Please review both files and make sure they are the same.  If this is expected ignore, and set DebugFlags=NO_CONF_HASH in your slurm.conf.
error: Node node3 appears to have a different slurm.conf than the slurmctld.  This could cause issues with communication and functionality.  Please review both files and make sure they are the same.  If this is expected ignore, and set DebugFlags=NO_CONF_HASH in your slurm.conf.
error: Node node4 appears to have a different slurm.conf than the slurmctld.  This could cause issues with communication and functionality.  Please review both files and make sure they are the same.  If this is expected ignore, and set DebugFlags=NO_CONF_HASH in your slurm.conf.

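Is there a way (other than parsing the log errors) to query slurmctld and slurmd individually about the configuration they are running, to find out whether any of them needs to be restarted or reconfigured? I think getting a hash would be enough to compare them with each other.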

Update: it would also be handy to know when the slurm.conf file was read.

Answer 1

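I suggest using configless mode, enabled in slurm.conf. You will still get error messages in the Slurm logs when the daemons start, but they can be safely ignored. All slurmd daemons fetch the correct configuration from the Slurm controller.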
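
As a minimal sketch of that setup, assuming a Slurm release with configless support (20.02 or later): the controller's slurm.conf enables the feature, and each slurmd is pointed at the controller instead of a local file. The host name ctl-host and the port 6817 below are placeholders, not values taken from the question.

```
# slurm.conf on the controller only (restart slurmctld after adding this line)
SlurmctldParameters=enable_configless

# On each compute node, start slurmd without a local slurm.conf and point it
# at the controller. ctl-host and 6817 are placeholder values:
#   slurmd --conf-server ctl-host:6817
```

With this in place there is only one slurm.conf to keep current. After editing it on the controller, running scontrol reconfigure should make the slurmd daemons pick up the new version.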
