How to debug services that cannot be reached from outside an on-premises Kubernetes cluster via the load balancer IP

I have an on-premises Kubernetes (v1.24) cluster with CRI-O as the container runtime, Calico as the CNI, and MetalLB for the load balancer IPs.

The masters and workers run Rocky Linux 9 with SELinux enabled and firewalld disabled, and kube-proxy uses IPVS mode.

I configured BGP peering with a Mikrotik router, and I can see that the IP ranges I configured are advertised to the router: its routes section has an entry showing

10.16.0.0/28 reachable through bridge

The external IP of my test Nginx service is 10.16.0.1. Curling that IP from the masters, from the workers, and from inside pods works and returns the default Nginx welcome page, but curling it from my laptop just hangs until it times out, and the SELinux audit logs show no violations.

Running nmap against the IP lists port 80 (reported as filtered in the output below), pinging the IP works, and traceroute shows the correct response. As a sanity check, I also deleted the service and ran nmap, ping, and traceroute again, and they stopped working, as expected.

# the commands below were run on my laptop,
# which is connected to the local network;
# the results are the same when I run them
# from other devices on the network

# ---

❯ nmap -T4 10.16.0.1
Starting Nmap 7.94 ( https://nmap.org ) at 2023-09-23 11:01 +07
Nmap scan report for 10.16.0.1
Host is up (0.0048s latency).
Not shown: 997 closed tcp ports (conn-refused)
PORT    STATE    SERVICE
22/tcp  open     ssh
80/tcp  filtered http
179/tcp open     bgp

# ---

# the 192.168.88.43 is local master node ip
❯ ping 10.16.0.1
PING 10.16.0.1 (10.16.0.1): 56 data bytes
64 bytes from 10.16.0.1: icmp_seq=0 ttl=64 time=9.144 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 c3b4   0 0000  3f  01 94cc 192.168.88.111  10.16.0.1

64 bytes from 10.16.0.1: icmp_seq=1 ttl=64 time=3.003 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 c9cc   0 0000  3f  01 8eb4 192.168.88.111  10.16.0.1

64 bytes from 10.16.0.1: icmp_seq=2 ttl=64 time=3.209 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 305b   0 0000  3f  01 2826 192.168.88.111  10.16.0.1

64 bytes from 10.16.0.1: icmp_seq=3 ttl=64 time=2.557 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 25d8   0 0000  3f  01 32a9 192.168.88.111  10.16.0.1

64 bytes from 10.16.0.1: icmp_seq=4 ttl=64 time=3.594 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 4016   0 0000  3f  01 186b 192.168.88.111  10.16.0.1

64 bytes from 10.16.0.1: icmp_seq=5 ttl=64 time=2.974 ms

64 bytes from 10.16.0.1: icmp_seq=6 ttl=64 time=4.397 ms
^C
--- 10.16.0.1 ping statistics ---
7 packets transmitted, 7 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 2.557/4.125/9.144/2.119 ms

# --

# 192.168.88.1 is my default gateway
❯ traceroute 10.16.0.1
traceroute to 10.16.0.1 (10.16.0.1), 64 hops max, 52 byte packets
 1  192.168.88.1 (192.168.88.1)  3.669 ms  2.713 ms  2.552 ms
 2  10.16.0.1 (10.16.0.1)  3.303 ms  3.292 ms  3.145 ms

Looking up the nginx information

❯ kubectl get pods -o wide
NAME                                 READY   STATUS    RESTARTS        AGE   IP              NODE           NOMINATED NODE   READINESS GATES
nginx                                1/1     Running   0               39m   172.16.29.154   k8s-worker-0   <none>           <none>

❯ kubectl get svc
NAME                 TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                               AGE                              42h
kubernetes           ClusterIP      10.96.0.1      <none>        443/TCP                               2d5h
nginx                LoadBalancer   10.97.153.41   10.16.0.1     80:30422/TCP                          40m

❯ kubectl get endpoints
NAME                 ENDPOINTS                                                               AGE                                                    42h
kubernetes           192.168.88.43:6443                                                      2d5h
nginx                172.16.29.154:80                                                        41m

❯ kubectl describe service nginx
Name:                     nginx
Namespace:                default
Labels:                   <none>
Annotations:              metallb.universe.tf/address-pool: public
                          metallb.universe.tf/ip-allocated-from-pool: public
Selector:                 app=nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.97.153.41
IPs:                      10.97.153.41
LoadBalancer Ingress:     10.16.0.1
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  30422/TCP
Endpoints:                172.16.29.154:80
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     31552
Events:
  Type    Reason       Age   From                Message
  ----    ------       ----  ----                -------
  Normal  IPAllocated  41m   metallb-controller  Assigned IP ["10.16.0.1"]
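
The describe output above lists External Traffic Policy: Local and HealthCheck NodePort: 31552. With that policy, kube-proxy is documented to answer on the health-check port with HTTP 200 on nodes that host a ready endpoint and with 503 on nodes that do not, so probing each node shows which of them will accept external traffic for the service. The following is only a sketch: classify_health_status and check_node are hypothetical helper names, and it assumes the node IPs are reachable from wherever it runs.

```shell
#!/usr/bin/env bash

# Turn the HTTP status returned on a Service's HealthCheck NodePort
# into a short description. (Hypothetical helper name.)
classify_health_status() {
  case "$1" in
    200) echo "local endpoint present" ;;
    503) echo "no local endpoint" ;;
    000) echo "no HTTP response" ;;   # curl prints 000 when nothing answered
    *)   echo "unexpected status $1" ;;
  esac
}

# Probe one node. Arguments: node IP, health-check port.
# Assumption: the node IP is reachable from the machine running this.
check_node() {
  local node_ip="$1" port="$2" status
  # -o /dev/null discards the body; -w prints only the status code.
  status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 \
    "http://${node_ip}:${port}/healthz")"
  echo "${node_ip}: $(classify_health_status "${status}")"
}
```

For example, check_node 192.168.88.43 31552 would probe the master node listed above; the same call can be repeated with the worker's address, which the question does not show.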

Calico configurations

❯ kubectl describe bgppeer
Name:         global-peer
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  projectcalico.org/v3
Kind:         BGPPeer
Metadata:
  Creation Timestamp:  2023-09-22T06:37:35Z
  Resource Version:    430164
  UID:                 62d549ac-b7ed-47ab-bc2c-b94de8a08939
Spec:
  As Number:  65530
  Filters:
    default
  Peer IP:  192.168.88.1
Events:     <none>

❯ kubectl describe bgpconfiguration
Name:         default
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  projectcalico.org/v3
Kind:         BGPConfiguration
Metadata:
  Creation Timestamp:  2023-09-22T06:37:24Z
  Resource Version:    839696
  UID:                 fd6aef3e-0a4c-4ecc-afe4-09395b76d107
Spec:
  As Number:                  65500
  Node To Node Mesh Enabled:  false
  Service Load Balancer I Ps:
    Cidr:  10.16.0.0/28
Events:    <none>

❯ kubectl describe bgpfilter
Name:         default
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  projectcalico.org/v3
Kind:         BGPFilter
Metadata:
  Creation Timestamp:  2023-09-22T06:37:34Z
  Resource Version:    434512
  UID:                 df8b5aeb-e25e-481c-9403-fcc54727eea8
Spec:
  exportV4:
    Action:          Reject
    Cidr:            10.16.0.0/28
    Match Operator:  NotIn
Events:              <none>

❯ kubectl describe installation
Name:         default
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  operator.tigera.io/v1
Kind:         Installation
Metadata:
  Creation Timestamp:  2023-09-21T05:32:53Z
  Finalizers:
    tigera.io/operator-cleanup
  Generation:        3
  Resource Version:  1052996
  UID:               2f7928cd-3815-42be-9483-512f2f39fcf9
Spec:
  Calico Network:
    Bgp:         Enabled
    Host Ports:  Enabled
    Ip Pools:
      Block Size:          26
      Cidr:                172.16.0.0/16
      Disable BGP Export:  false
      Encapsulation:       None
      Nat Outgoing:        Disabled
      Node Selector:       all()
    Linux Dataplane:       Iptables
    Multi Interface Mode:  None
    nodeAddressAutodetectionV4:
      First Found:  true
  Cni:
    Ipam:
      Type:                    Calico
    Type:                      Calico
  Control Plane Replicas:      2
  Flex Volume Path:            /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
  Kubelet Volume Plugin Path:  /var/lib/kubelet
  Logging:
    Cni:
      Log File Max Age Days:  30
      Log File Max Count:     10
      Log File Max Size:      100Mi
      Log Severity:           Info
  Node Update Strategy:
    Rolling Update:
      Max Unavailable:  1
    Type:               RollingUpdate
  Non Privileged:       Disabled
  Variant:              Calico
Status:
  Calico Version:  v3.26.1
  Computed:
    Calico Network:
      Bgp:         Enabled
      Host Ports:  Enabled
      Ip Pools:
        Block Size:          26
        Cidr:                172.16.0.0/16
        Disable BGP Export:  false
        Encapsulation:       None
        Nat Outgoing:        Disabled
        Node Selector:       all()
      Linux Dataplane:       Iptables
      Multi Interface Mode:  None
      nodeAddressAutodetectionV4:
        First Found:  true
    Cni:
      Ipam:
        Type:                    Calico
      Type:                      Calico
    Control Plane Replicas:      2
    Flex Volume Path:            /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
    Kubelet Volume Plugin Path:  /var/lib/kubelet
    Logging:
      Cni:
        Log File Max Age Days:  30
        Log File Max Count:     10
        Log File Max Size:      100Mi
        Log Severity:           Info
    Node Update Strategy:
      Rolling Update:
        Max Unavailable:  1
      Type:               RollingUpdate
    Non Privileged:       Disabled
    Variant:              Calico
  Conditions:
    Last Transition Time:  2023-09-23T15:08:30Z
    Message:
    Observed Generation:   3
    Reason:                Unknown
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2023-09-23T15:08:30Z
    Message:
    Observed Generation:   3
    Reason:                Unknown
    Status:                False
    Type:                  Ready
    Last Transition Time:  2023-09-23T15:08:30Z
    Message:               DaemonSet "calico-system/calico-node" is not available (awaiting 1 nodes)
    Observed Generation:   3
    Reason:                ResourceNotReady
    Status:                True
    Type:                  Progressing
  Mtu:                     1450
  Variant:                 Calico
Events:                    <none>

calicoctl node status shows

[screenshot of the calicoctl node status output]

MetalLB configuration

❯ kubectl -n metallb-system describe ipaddresspool
Name:         public
Namespace:    metallb-system
Labels:       <none>
Annotations:  <none>
API Version:  metallb.io/v1beta1
Kind:         IPAddressPool
Metadata:
  Creation Timestamp:  2023-09-22T07:57:01Z
  Generation:          3
  Resource Version:    434511
  UID:                 aa2d0e08-4fee-4ce1-a822-9bf4f52a3a3c
Spec:
  Addresses:
    10.16.0.0/28
  Auto Assign:       true
  Avoid Buggy I Ps:  true
Events:              <none>

So how do I debug this, and why can't I reach the Nginx service from outside the cluster?

Update

Running tcpdump -n -i ens18 host 10.16.0.1 on both the master and the worker node, and then running ping 10.16.0.1 and arping -I 10.16.0.1 from another machine outside the cluster, gives me the following on the master and worker nodes.

# ping only show on master node
# 192.168.88.111 is the other machine(my laptop)
dropped privs to tcpdump
tcpdump: listening on ens18, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:31:29.842416 IP (tos 0x0, ttl 63, id 33297, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.88.111 > 10.16.0.1: ICMP echo request, id 14677, seq 0, length 64
15:31:29.842465 IP (tos 0x0, ttl 64, id 38235, offset 0, flags [none], proto ICMP (1), length 84)
    10.16.0.1 > 192.168.88.111: ICMP echo reply, id 14677, seq 0, length 64
15:31:30.846173 IP (tos 0x0, ttl 63, id 14503, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.88.111 > 10.16.0.1: ICMP echo request, id 14677, seq 1, length 64
15:31:30.846206 IP (tos 0x0, ttl 64, id 38836, offset 0, flags [none], proto ICMP (1), length 84)
    10.16.0.1 > 192.168.88.111: ICMP echo reply, id 14677, seq 1, length 64
# arping received 0 responses
# arping showing both on master and worker node
15:33:08.269603 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:09.269759 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:10.269768 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:11.269725 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:12.269575 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:13.269695 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:14.269685 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:15.269715 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:16.269754 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46

Answer

I would suggest asking this in the Calico Users Slack.

In your BGP configuration I see that you have disabled the full node-to-node mesh. Is that intentional? Are you using route reflectors? Node To Node Mesh Enabled: false

I would start by checking the BGP route on the Mikrotik. Something like the following (adjust it if you are running an older RouterOS version):

/ip/route/print detail from=[find gateway~"^192.168.88.63[0x00-0 xff]*"]

If that looks correct, I would check whether the LB is reachable from the Mikrotik itself, using your own load balancer IP in place of the example address:

/tool/fetch http://10.43.1.1
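
The same reachability check can be repeated from a Linux host outside the cluster with a bounded timeout, which separates a silent drop (timeout) from an active refusal. This is a minimal sketch: explain_curl_exit and probe_lb are hypothetical helper names, and only the curl exit codes 7 (failed to connect) and 28 (operation timed out) are interpreted.

```shell
#!/usr/bin/env bash

# Map a curl exit code to a short description. (Hypothetical helper name.)
explain_curl_exit() {
  case "$1" in
    0)  echo "connected and got a response" ;;
    7)  echo "failed to connect" ;;   # for example, connection refused
    28) echo "timed out" ;;           # packets were likely dropped silently
    *)  echo "curl exit code $1" ;;
  esac
}

# Probe a URL with bounded timeouts so a hang cannot last indefinitely.
probe_lb() {
  local url="$1" rc=0
  curl -s -o /dev/null --connect-timeout 5 --max-time 10 "${url}" || rc=$?
  echo "${url}: $(explain_curl_exit "${rc}")"
}
```

Running probe_lb http://10.16.0.1 from the laptop and from a cluster node gives two results that can be compared side by side.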

Then check the BIRD protocol status. Try birdcl from the calico-node pods:

kubectl exec -n calico-system ds/calico-node -c calico-node -- birdcl show protocols

kubectl exec -n calico-system ds/calico-node -c calico-node -- birdcl show route
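
As far as I know, kubectl exec against ds/calico-node runs in only one pod of the DaemonSet, so the commands above report on a single node. A small loop can run the same birdcl query on every calico-node pod. This is a sketch under assumptions: for_each_calico_node is a hypothetical helper name, and it assumes the pods carry the label k8s-app=calico-node.

```shell
#!/usr/bin/env bash

# Run a birdcl command in every calico-node pod and label each output
# with the pod it came from. (Hypothetical helper name.)
for_each_calico_node() {
  local pod
  # Assumption: calico-node pods are labelled k8s-app=calico-node.
  kubectl get pods -n calico-system -l k8s-app=calico-node -o name |
  while read -r pod; do
    echo "== ${pod}"
    # </dev/null keeps kubectl from consuming the pod list on stdin.
    kubectl exec -n calico-system "${pod}" -c calico-node -- birdcl "$@" </dev/null
  done
}
```

For example, for_each_calico_node show protocols and for_each_calico_node show route repeat the two checks above on every node.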
