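I am trying to set up DHCP failover while still allowing PXE boot from one of the DHCP servers. As the DHCP specification requires, I set up separate pools for "regular" DHCP and for PXE boot. My failover configuration works, but the part of the DHCP configuration that should respond to PXE requests no longer does.
Background: I recently upgraded to AlmaLinux 9 (from CentOS 7), which runs ISC DHCP 4.4. In the old configuration I had no DHCP failover and allowed PXE boot from the entire pool. Because of our site's history of hardware failures, I want to set up DHCP failover.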
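For the purposes of this configuration, let's call the system that should respond to PXE requests the "primary" DHCP server. Here is a fragment of /etc/dhcp/dhcpd.conf from that server. Note that I set up a separate pool to handle PXE/BOOTP queries. (Please excuse the didactic tone of the comments. They are written for whoever takes on this sysadmin work.)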
authoritative; # Send out acknowledgements to DHCP client queries.
failover peer "dhcp-failover" {
primary; # declare this to be the primary server
address 10.4.7.9;
port 647;
peer address 10.4.7.210;
peer port 647;
# How many seconds to wait before we assume that the other has failed.
max-response-delay 30;
# How many BNDUPD messages to send before receiving BNDACK.
max-unacked-updates 10;
# How many seconds to wait before disabling load balancing.
load balance max seconds 3;
# Maximum Client Lead Time = How long a lease may be renewed
# without contacting the other DHCP peer.
mclt 1800;
# The split between primary and secondary. 128 means a
# 50% split between peers; 255 means the primary handles
# everything until it fails.
split 128;
}
# This is the primary DHCP server. Respond to BOOTP requests.
allow booting;
allow bootp;
option domain-name "company.example.com";
option time-offset -18000; # Eastern Standard Time
# Is this a DHCP query (as opposed to a BOOTP query)?
class "dhcp" {
match if exists dhcp-message-type;
}
class "pxe" {
match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";
}
subnet 10.4.0.0 netmask 255.255.0.0 {
default-lease-time 86400; # one day (in seconds)
option subnet-mask 255.255.0.0;
option broadcast-address 10.4.255.255;
option routers 10.4.0.1;
option domain-name-servers 10.4.7.7, 10.4.7.29;
option domain-name "company.example.com";
option time-offset -18000; # Eastern Standard Time
option ntp-servers 10.4.7.105, 10.4.7.7, 10.4.7.29;
pool {
failover peer "dhcp-failover";
deny dynamic bootp clients;
deny members of "pxe";
range 10.4.45.1 10.4.45.250; # DHCP pool on private network
}
# A separate pool for BOOTP services.
pool {
range dynamic-bootp 10.4.45.251 10.4.45.255; # DHCP pool on private network
allow dynamic bootp clients;
deny members of "dhcp";
allow members of "pxe";
next-server 10.4.7.9; # On which system the bootp filename is located.
if substring (option vendor-class-identifier, 0, 9) = "PXEClient" {
if substring(option vendor-class-identifier,15,5) = "00007" {
log(info,"UEFI PXE Boot - private network");
filename "pxelinux/grubx64.efi"; # The file to load for EFI systems.
}
else {
log(info,"BIOS PXE Boot - private network");
filename "pxelinux.0"; # The file to load via bootp for BIOS systems.
}
}
}
}
Here is /etc/dhcp/dhcpd.conf from the failover/secondary/non-PXE server:
authoritative; # Send out acknowledgements to DHCP client queries.
failover peer "dhcp-failover" {
secondary; # declare this to be the secondary server
address 10.4.7.210;
port 647;
peer address 10.4.7.9;
peer port 647;
# How many seconds to wait before we assume that the other has failed.
max-response-delay 30;
# How many BNDUPD messages to send before receiving BNDACK.
max-unacked-updates 10;
# How many seconds to wait before disabling load balancing.
load balance max seconds 3;
}
# Make sure that this failover DHCP server does _not_
# respond to bootp.
deny bootp;
option domain-name "company.example.com";
option time-offset -18000; # Eastern Standard Time
# Is this a DHCP query (as opposed to a BOOTP query)?
class "dhcp" {
match if exists dhcp-message-type;
}
class "pxe" {
match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";
}
subnet 10.4.0.0 netmask 255.255.0.0 {
default-lease-time 86400; # one day (in seconds)
option subnet-mask 255.255.0.0;
option broadcast-address 10.4.255.255;
option routers 10.4.0.1;
option domain-name-servers 10.4.7.7, 10.4.7.29;
option domain-name "company.example.com";
option time-offset -18000; # Eastern Standard Time
option ntp-servers 10.4.7.105, 10.4.7.7, 10.4.7.29;
# Note that there are a few IP addresses in the range of the primary
# server that are not included here. This is for BOOTP, which is
# not handled by the secondary server.
pool {
failover peer "dhcp-failover";
deny dynamic bootp clients;
deny members of "pxe";
range 10.4.45.1 10.4.45.250; # DHCP pool on private network
}
}
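I know I've overdone it with the "dhcp" and "pxe" classes. I added them while trying to solve the problem. They have had no effect, other than introducing the "peer holds all free leases" log message shown below.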
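Here is what I see in the log on the "primary" server. Note that 52:54:00:31:f2:7f is the MAC address of the test system that I set to boot via PXE; it then "gives up" and boots from disk.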
Sep 8 14:20:46 dhcpd dhcpd[17922]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
Sep 8 14:20:49 dhcpd dhcpd[17922]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
Sep 8 14:20:57 dhcpd dhcpd[17922]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
Sep 8 14:21:13 dhcpd dhcpd[17922]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
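Here is the log from the "secondary" server. It is consistent with the delay of about a minute between the client first starting up (trying to locate a PXE server) and switching over to booting the operating system, which then obtains a DHCP address in the usual way.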
Sep 8 14:20:46 dhcpdsec dhcpd[67768]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
Sep 8 14:20:46 dhcpdsec dhcpd[67768]: bind update on 10.4.45.183 from dhcp-failover rejected: incoming update is less critical than outgoing update
Sep 8 14:20:49 dhcpdsec dhcpd[67768]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
Sep 8 14:20:57 dhcpdsec dhcpd[67768]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
Sep 8 14:21:13 dhcpdsec dhcpd[67768]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
Sep 8 14:22:03 dhcpdsec dhcpd[67768]: DHCPREQUEST for 10.4.45.183 from 52:54:00:31:f2:7f via enp7s0
Sep 8 14:22:04 dhcpdsec dhcpd[67768]: DHCPACK on 10.4.45.183 to 52:54:00:31:f2:7f via enp7s0
Through trial and error in earlier tests, I confirmed that the value of substring (option vendor-class-identifier, 0, 9) is indeed PXEClient.
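One way to check what a client actually sends is to have dhcpd log the option. This is a minimal sketch, assuming the statement is placed at global scope or inside the subnet block; it is an illustration only and not necessarily how the tests above were done.

```
# Illustrative only: log the raw vendor class identifier sent by each client, if any.
if exists vendor-class-identifier {
    log(info, concat("vendor-class-identifier: ", option vendor-class-identifier));
}
```

A PXE client typically reports a string of the form PXEClient:Arch:00007:UNDI:003016, so the first nine characters are PXEClient and the five characters starting at offset 15 carry the architecture code.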
I have tried stopping the dhcpd daemon on both machines and manually editing the entries for 52:54:00:31:f2:7f in /var/lib/dhcpd/dhcpd.leases. It made no difference.
Any ideas?
Edit: It occurred to me that posting my previous DHCP configuration (without failover) might help. PXE boot worked fine with it:
subnet 10.4.0.0 netmask 255.255.0.0 {
range dynamic-bootp 10.4.45.1 10.4.45.254; # DHCP pool on private network
default-lease-time 86400; # one day (in seconds)
option subnet-mask 255.255.0.0;
option broadcast-address 10.4.255.255;
option routers 10.4.0.1;
option domain-name-servers 10.4.7.7, 10.4.7.29;
option domain-name "nevis.columbia.edu";
option time-offset -18000; # Eastern Standard Time
option ntp-servers 10.4.7.105, 10.4.7.7, 10.4.7.29;
next-server 10.4.7.9; # On which system the bootp filename is located.
if substring (option vendor-class-identifier, 0, 9) = "PXEClient" {
if substring(option vendor-class-identifier,15,5) = "00007" {
log(info,"UEFI PXE Boot - private network");
filename "pxelinux/grubx64.efi"; # The file to load for EFI systems.
}
else {
log(info,"BIOS PXE Boot - private network");
filename "pxelinux.0"; # The file to load via bootp for BIOS systems.
}
}
}
Answer 1
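After much experimentation, I found the answer: it turns out that the order of the access-control statements within a pool matters.
Here are the class definitions again, repeated from my original post: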
class "dhcp" {
match if exists dhcp-message-type;
}
class "pxe" {
match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";
}
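Below is the subnet definition that works for my primary DHCP server. The main differences between this and the configuration in my original post are the position of the range statement relative to any allow or deny statements, and that I define the "pxe" pool first. The original failover lines are unchanged.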
subnet 10.4.0.0 netmask 255.255.0.0 {
default-lease-time 86400; # one day (in seconds)
option subnet-mask 255.255.0.0;
option broadcast-address 10.4.255.255;
option routers 10.4.0.1;
option domain-name-servers 10.4.7.7, 10.4.7.29;
option domain-name "company.example.com";
option time-offset -18000; # Eastern Standard Time
option ntp-servers 10.4.7.105, 10.4.7.7, 10.4.7.29;
next-server 10.4.7.9; # On which system the bootp filename is located.
if substring (option vendor-class-identifier, 0, 9) = "PXEClient" {
if option architecture-type = 00:07 {
filename "uefi/grubx64.efi"; # The file to load for EFI systems.
}
else {
filename "pxelinux/pxelinux.0"; # The file to load via bootp for BIOS systems.
}
}
# A separate pool for PXE services.
pool {
range dynamic-bootp 10.4.45.251 10.4.45.255; # DHCP pool on private network
allow dynamic bootp clients;
allow members of "pxe";
}
# The "regular" DHCP pool.
pool {
failover peer "dhcp-failover";
range 10.4.45.1 10.4.45.250; # DHCP pool on private network
deny dynamic bootp clients;
deny members of "pxe";
}
}
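The test on option architecture-type in the block above presumably depends on a declaration elsewhere in the file, since as far as I know ISC dhcpd does not define an option by that name on its own. A minimal sketch, assuming the conventional declaration for option code 93 (the PXE client system architecture option from RFC 4578), placed at global scope before the subnet block:

```
# Assumed declaration, not shown in the original post:
# option 93 carries the client architecture as a 16-bit value.
option architecture-type code 93 = unsigned integer 16;
```

With this declaration, a 64-bit UEFI client that reports architecture 7 matches the comparison against 00:07, and clients that do not match fall through to the BIOS filename.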
Here are the revised subnet lines from my secondary DHCP server's configuration, although these changes probably don't matter:
subnet 10.4.0.0 netmask 255.255.0.0 {
default-lease-time 86400; # one day (in seconds)
option subnet-mask 255.255.0.0;
option broadcast-address 10.4.255.255;
option routers 10.4.0.1;
option domain-name-servers 10.4.7.7, 10.4.7.29;
option domain-name "company.example.com";
option time-offset -18000; # Eastern Standard Time
option ntp-servers 10.4.7.105, 10.4.7.7, 10.4.7.29;
# Note that there are a few IP addresses in the range of the primary
# server that are not included here. This is for PXE, which is
# not handled by the secondary server.
pool {
failover peer "dhcp-failover";
deny dynamic bootp clients;
range 10.4.45.1 10.4.45.250; # DHCP pool on private network
}
}
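I now have a setup with both DHCP failover and PXE boot, which I use for installing and repairing operating systems, and which accommodates both BIOS and EFI systems. I hope others find the lines above useful!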