I'm trying to set up DHCP failover with PXE booting allowed on one of the DHCP servers. As required by the DHCP specification, I've set up separate pools for "regular" DHCP and for PXE booting. My failover setup works fine, but the DHCP configuration that is supposed to respond to PXE requests no longer works.
Background: I recently upgraded to AlmaLinux 9 (from CentOS 7), which runs ISC DHCP 4.4. In the older setup I had no DHCP failover and allowed PXE booting from the entire pool. Because of a history of hardware failures at our site, I want to set up DHCP failover.
For the purposes of this setup, let's call the system that should respond to PXE requests the "primary" DHCP server. Here is a fragment of /etc/dhcp/dhcpd.conf from that server. Note that I set up a separate pool just to handle the PXE/BOOTP queries. (Please excuse the didactic tone of the comments. They are meant for me as I do my sysadmin tasks.)
authoritative; # Send out acknowledgements to DHCP client queries.
failover peer "dhcp-failover" {
primary; # declare this to be the primary server
address 10.4.7.9;
port 647;
peer address 10.4.7.210;
peer port 647;
# How many seconds to wait before we assume that the other has failed.
max-response-delay 30;
# How many BNDUPD messages to send before receiving BNDACK.
max-unacked-updates 10;
# How many seconds to wait before disabling load balancing.
load balance max seconds 3;
# Maximum Client Lead Time = How long a lease may be renewed
# without contacting the other DHCP peer.
mclt 1800;
# The split between primary and secondary. 128 means a
# 50% split between peers; 255 means the primary handles
# everything until it fails.
split 128;
}
# This is the primary DHCP server. Respond to BOOTP requests.
allow booting;
allow bootp;
option domain-name "company.example.com";
option time-offset -18000; # Eastern Standard Time
# Is this a DHCP query (as opposed to a BOOTP query)?
class "dhcp" {
match if exists dhcp-message-type;
}
class "pxe" {
match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";
}
subnet 10.4.0.0 netmask 255.255.0.0 {
default-lease-time 86400; # one day (in seconds)
option subnet-mask 255.255.0.0;
option broadcast-address 10.4.255.255;
option routers 10.4.0.1;
option domain-name-servers 10.4.7.7, 10.4.7.29;
option domain-name "company.example.com";
option time-offset -18000; # Eastern Standard Time
option ntp-servers 10.4.7.105, 10.4.7.7, 10.4.7.29;
pool {
failover peer "dhcp-failover";
deny dynamic bootp clients;
deny members of "pxe";
range 10.4.45.1 10.4.45.250; # DHCP pool on private network
}
# A separate pool for BOOTP services.
pool {
range dynamic-bootp 10.4.45.251 10.4.45.255; # DHCP pool on private network
allow dynamic bootp clients;
deny members of "dhcp";
allow members of "pxe";
next-server 10.4.7.9; # On which system the bootp filename is located.
if substring (option vendor-class-identifier, 0, 9) = "PXEClient" {
if substring(option vendor-class-identifier,15,5) = "00007" {
log(info,"UEFI PXE Boot - private network");
filename "pxelinux/grubx64.efi"; # The file to load for EFI systems.
}
else {
log(info,"BIOS PXE Boot - private network");
filename "pxelinux.0"; # The file to load via bootp for BIOS systems.
}
}
}
}
This is /etc/dhcp/dhcpd.conf on the failover/secondary/non-PXE server:
authoritative; # Send out acknowledgements to DHCP client queries.
failover peer "dhcp-failover" {
secondary; # declare this to be the secondary server
address 10.4.7.210;
port 647;
peer address 10.4.7.9;
peer port 647;
# How many seconds to wait before we assume that the other has failed.
max-response-delay 30;
# How many BNDUPD messages to send before receiving BNDACK.
max-unacked-updates 10;
# How many seconds to wait before disabling load balancing.
load balance max seconds 3;
}
# Make sure that this failover DHCP server does _not_
# respond to bootp.
deny bootp;
option domain-name "company.example.com";
option time-offset -18000; # Eastern Standard Time
# Is this a DHCP query (as opposed to a BOOTP query)?
class "dhcp" {
match if exists dhcp-message-type;
}
class "pxe" {
match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";
}
subnet 10.4.0.0 netmask 255.255.0.0 {
default-lease-time 86400; # one day (in seconds)
option subnet-mask 255.255.0.0;
option broadcast-address 10.4.255.255;
option routers 10.4.0.1;
option domain-name-servers 10.4.7.7, 10.4.7.29;
option domain-name "company.example.com";
option time-offset -18000; # Eastern Standard Time
option ntp-servers 10.4.7.105, 10.4.7.7, 10.4.7.29;
# Note that there are a few IP addresses in the range of the primary
# server that are not included here. This is for BOOTP, which is
# not handled by the secondary server.
pool {
failover peer "dhcp-failover";
deny dynamic bootp clients;
deny members of "pxe";
range 10.4.45.1 10.4.45.250; # DHCP pool on private network
}
}
I know I'm overdoing it with the "dhcp" and "pxe" classes. I added them while trying to fix the problem. They had no effect, except to produce the "peer holds all free leases" log messages below.
This is what I see in the logs on the "primary" server. Note that 52:54:00:31:f2:7f is the MAC address of a test system that I set up to try PXE booting before "giving up" and booting from disk.
Sep 8 14:20:46 dhcpd dhcpd[17922]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
Sep 8 14:20:49 dhcpd dhcpd[17922]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
Sep 8 14:20:57 dhcpd dhcpd[17922]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
Sep 8 14:21:13 dhcpd dhcpd[17922]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
This is from the log on the "secondary" server. It is consistent with the delay of about a minute between the client first powering on, while it tries to locate a PXE server, and the point where it falls back to booting its installed operating system and acquires a DHCP address in the usual way.
Sep 8 14:20:46 dhcpdsec dhcpd[67768]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
Sep 8 14:20:46 dhcpdsec dhcpd[67768]: bind update on 10.4.45.183 from dhcp-failover rejected: incoming update is less critical than outgoing update
Sep 8 14:20:49 dhcpdsec dhcpd[67768]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
Sep 8 14:20:57 dhcpdsec dhcpd[67768]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
Sep 8 14:21:13 dhcpdsec dhcpd[67768]: DHCPDISCOVER from 52:54:00:31:f2:7f via enp7s0: peer holds all free leases
Sep 8 14:22:03 dhcpdsec dhcpd[67768]: DHCPREQUEST for 10.4.45.183 from 52:54:00:31:f2:7f via enp7s0
Sep 8 14:22:04 dhcpdsec dhcpd[67768]: DHCPACK on 10.4.45.183 to 52:54:00:31:f2:7f via enp7s0
After flailing around in earlier tests, I've confirmed that the value of substring (option vendor-class-identifier, 0, 9) is indeed PXEClient.
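A check like this can be done with a temporary log statement in dhcpd.conf. The following is a minimal sketch with an arbitrary message text, placed at global scope; it writes the first nine bytes of the vendor class to syslog for every client that sends one.

```
# Temporary debugging aid (illustrative; the message text is arbitrary).
# Logs the first 9 bytes of the vendor class for every client that sends it.
if exists vendor-class-identifier {
    log(info, concat("vendor-class-identifier starts with: ",
                     substring(option vendor-class-identifier, 0, 9)));
}
```

The line should be removed again once the value has been confirmed, since it logs on every matching packet.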
I've already tried stopping the dhcpd daemon on both machines and manually editing the entries for 52:54:00:31:f2:7f in /var/lib/dhcpd/dhcpd.leases. No change.
Any ideas?
Edit: It occurred to me that it might help to post my previous DHCP configuration, without failover. PXE booting worked fine with it:
subnet 10.4.0.0 netmask 255.255.0.0 {
range dynamic-bootp 10.4.45.1 10.4.45.254; # DHCP pool on private network
default-lease-time 86400; # one day (in seconds)
option subnet-mask 255.255.0.0;
option broadcast-address 10.4.255.255;
option routers 10.4.0.1;
option domain-name-servers 10.4.7.7, 10.4.7.29;
option domain-name "nevis.columbia.edu";
option time-offset -18000; # Eastern Standard Time
option ntp-servers 10.4.7.105, 10.4.7.7, 10.4.7.29;
next-server 10.4.7.9; # On which system the bootp filename is located.
if substring (option vendor-class-identifier, 0, 9) = "PXEClient" {
if substring(option vendor-class-identifier,15,5) = "00007" {
log(info,"UEFI PXE Boot - private network");
filename "pxelinux/grubx64.efi"; # The file to load for EFI systems.
}
else {
log(info,"BIOS PXE Boot - private network");
filename "pxelinux.0"; # The file to load via bootp for BIOS systems.
}
}
}
Answer 1
I found an answer after a lot of experimentation: it turns out that the order of the access-control statements within a pool matters.
Here is a repeat of the class definitions from my original post:
class "dhcp" {
match if exists dhcp-message-type;
}
class "pxe" {
match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";
}
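One thing to keep in mind about these two classes: a PXE client boots by sending ordinary DHCP packets (the DHCPDISCOVER lines in the logs above), so it carries dhcp-message-type as well as the PXEClient vendor class and lands in both classes. The dhcpd.conf man page says that when a pool has both allow and deny statements, a client must match the allow list and must not match the deny list. Under that reading, the PXE pool from the original post could never serve a PXE client, which may matter as much as the statement order. An annotated sketch of that pool, trimmed to the access-control lines (the comments are added here for illustration):

```
pool {
    range dynamic-bootp 10.4.45.251 10.4.45.255;
    allow dynamic bootp clients;
    deny members of "dhcp";    # a PXE client sends DHCPDISCOVER, so it matches "dhcp" ...
    allow members of "pxe";    # ... and a matching deny excludes it despite this allow
}
```

The working PXE pool shown below no longer contains the deny members of "dhcp" line.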
Here is the subnet definition that works on my primary DHCP server. The main differences between this and the configuration in my original post are the order of the range statements relative to the allow or deny statements, and that I define the "pxe" pool first. The original failover lines remain unchanged.
subnet 10.4.0.0 netmask 255.255.0.0 {
default-lease-time 86400; # one day (in seconds)
option subnet-mask 255.255.0.0;
option broadcast-address 10.4.255.255;
option routers 10.4.0.1;
option domain-name-servers 10.4.7.7, 10.4.7.29;
option domain-name "company.example.com";
option time-offset -18000; # Eastern Standard Time
option ntp-servers 10.4.7.105, 10.4.7.7, 10.4.7.29;
next-server 10.4.7.9; # On which system the bootp filename is located.
if substring (option vendor-class-identifier, 0, 9) = "PXEClient" {
if option architecture-type = 00:07 {
filename "uefi/grubx64.efi"; # The file to load for EFI systems.
}
else {
filename "pxelinux/pxelinux.0"; # The file to load via bootp for BIOS systems.
}
}
# A separate pool for PXE services.
pool {
range dynamic-bootp 10.4.45.251 10.4.45.255; # DHCP pool on private network
allow dynamic bootp clients;
allow members of "pxe";
}
# The "regular" DHCP pool.
pool {
failover peer "dhcp-failover";
range 10.4.45.1 10.4.45.250; # DHCP pool on private network
deny dynamic bootp clients;
deny members of "pxe";
}
}
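The if test above relies on an option named architecture-type. As far as I know that name is not one of ISC dhcpd's predefined options, so it needs a declaration elsewhere in dhcpd.conf, and that declaration is not shown in the fragments here. A minimal sketch, assuming the name maps to DHCP option 93 (the client system architecture type from RFC 4578) as a 16-bit integer:

```
# Assumed declaration (not shown in the fragments above); typically placed
# at global scope near the top of dhcpd.conf, before the option is used.
option architecture-type code 93 = unsigned integer 16;
```

With a declaration like this, the comparison against 00:07 tests the two-byte architecture value sent by the client, instead of parsing it out of the vendor class string as the original configuration did.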
Here are the revised subnet lines in my secondary DHCP server's configuration, though these changes probably don't matter:
subnet 10.4.0.0 netmask 255.255.0.0 {
default-lease-time 86400; # one day (in seconds)
option subnet-mask 255.255.0.0;
option broadcast-address 10.4.255.255;
option routers 10.4.0.1;
option domain-name-servers 10.4.7.7, 10.4.7.29;
option domain-name "company.example.com";
option time-offset -18000; # Eastern Standard Time
option ntp-servers 10.4.7.105, 10.4.7.7, 10.4.7.29;
# Note that there are a few IP addresses in the range of the primary
# server that are not included here. This is for PXE, which is
# not handled by the secondary server.
pool {
failover peer "dhcp-failover";
deny dynamic bootp clients;
range 10.4.45.1 10.4.45.250; # DHCP pool on private network
}
}
I now have a setup with DHCP failover and PXE booting for installing/repairing an operating system, which accommodates both BIOS and EFI systems. I hope someone finds the lines above useful!