How to debug a service that cannot be accessed from outside an on-premises Kubernetes cluster using the load balancer IP

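I have an on-premises Kubernetes (v1.24) cluster set up with cri-o as the container runtime, calico as the CNI, and metallb for load balancer IPs.

The master and worker nodes run rockylinux9 with selinux enabled and the firewall disabled, and kube-proxy uses ipvs mode.

I have set up BGP peering with a Mikrotik router, and I can see that the IP range I configured is advertised to the router: its routes section contains a record like this: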

10.16.0.0/28 reachable through bridge

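My test Nginx service has the external IP 10.16.0.1. Curling that IP from the master, from the worker, and from inside a pod returns the default Nginx welcome page, but curling it from my laptop hangs until it times out. The selinux audit log shows no violations.

nmap lists port 80 on the IP (reported as filtered rather than open), ping to the IP works, and traceroute shows a correct response. As a sanity check, I deleted the service and repeated nmap, ping, and traceroute; as expected, they stopped working.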

# the commands below were run on my laptop,
# which is connected to the local network;
# the results are the same when run from
# other devices on the network

# ---

❯ nmap -T4 10.16.0.1
Starting Nmap 7.94 ( https://nmap.org ) at 2023-09-23 11:01 +07
Nmap scan report for 10.16.0.1
Host is up (0.0048s latency).
Not shown: 997 closed tcp ports (conn-refused)
PORT    STATE    SERVICE
22/tcp  open     ssh
80/tcp  filtered http
179/tcp open     bgp

# ---

# 192.168.88.43 is the local master node IP
❯ ping 10.16.0.1
PING 10.16.0.1 (10.16.0.1): 56 data bytes
64 bytes from 10.16.0.1: icmp_seq=0 ttl=64 time=9.144 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 c3b4   0 0000  3f  01 94cc 192.168.88.111  10.16.0.1

64 bytes from 10.16.0.1: icmp_seq=1 ttl=64 time=3.003 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 c9cc   0 0000  3f  01 8eb4 192.168.88.111  10.16.0.1

64 bytes from 10.16.0.1: icmp_seq=2 ttl=64 time=3.209 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 305b   0 0000  3f  01 2826 192.168.88.111  10.16.0.1

64 bytes from 10.16.0.1: icmp_seq=3 ttl=64 time=2.557 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 25d8   0 0000  3f  01 32a9 192.168.88.111  10.16.0.1

64 bytes from 10.16.0.1: icmp_seq=4 ttl=64 time=3.594 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 4016   0 0000  3f  01 186b 192.168.88.111  10.16.0.1

64 bytes from 10.16.0.1: icmp_seq=5 ttl=64 time=2.974 ms

64 bytes from 10.16.0.1: icmp_seq=6 ttl=64 time=4.397 ms
^C
--- 10.16.0.1 ping statistics ---
7 packets transmitted, 7 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 2.557/4.125/9.144/2.119 ms

# --

# 192.168.88.1 is my default gateway
❯ traceroute 10.16.0.1
traceroute to 10.16.0.1 (10.16.0.1), 64 hops max, 52 byte packets
 1  192.168.88.1 (192.168.88.1)  3.669 ms  2.713 ms  2.552 ms
 2  10.16.0.1 (10.16.0.1)  3.303 ms  3.292 ms  3.145 ms
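
The `filtered` state that nmap reports for port 80 typically means the probe got no usable reply, unlike `closed`, where the host answers with a reset. That matches curl hanging until it times out. A minimal sketch for telling these cases apart from any client, assuming bash and the coreutils `timeout` command are available; `probe_tcp` is a hypothetical helper name, not part of any tool:

```shell
# probe_tcp HOST PORT [SECONDS]  (hypothetical helper)
# Prints "open" if the TCP handshake completes, "timeout" if nothing
# answers within SECONDS, and "closed" for a refusal or another immediate error.
probe_tcp() {
  local host=$1 port=$2 secs=${3:-3}
  # bash's /dev/tcp pseudo-device attempts a plain TCP connect
  timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
  case $? in
    0)   echo "open" ;;
    124) echo "timeout" ;;  # exit status coreutils timeout uses when the limit is hit
    *)   echo "closed" ;;
  esac
}
```

Given the nmap result above, `probe_tcp 10.16.0.1 80` would be expected to print `timeout` from the laptop and `open` from a cluster node.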

Checking the nginx info

❯ kubectl get pods -o wide
NAME                                 READY   STATUS    RESTARTS        AGE   IP              NODE           NOMINATED NODE   READINESS GATES
nginx                                1/1     Running   0               39m   172.16.29.154   k8s-worker-0   <none>           <none>

❯ kubectl get svc
NAME                 TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                               AGE
kubernetes           ClusterIP      10.96.0.1      <none>        443/TCP                               2d5h
nginx                LoadBalancer   10.97.153.41   10.16.0.1     80:30422/TCP                          40m

❯ kubectl get endpoints
NAME                 ENDPOINTS                                                               AGE
kubernetes           192.168.88.43:6443                                                      2d5h
nginx                172.16.29.154:80                                                        41m

❯ kubectl describe service nginx
Name:                     nginx
Namespace:                default
Labels:                   <none>
Annotations:              metallb.universe.tf/address-pool: public
                          metallb.universe.tf/ip-allocated-from-pool: public
Selector:                 app=nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.97.153.41
IPs:                      10.97.153.41
LoadBalancer Ingress:     10.16.0.1
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  30422/TCP
Endpoints:                172.16.29.154:80
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     31552
Events:
  Type    Reason       Age   From                Message
  ----    ------       ----  ----                -------
  Normal  IPAllocated  41m   metallb-controller  Assigned IP ["10.16.0.1"]
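
The service uses External Traffic Policy: Local, so a node only accepts external traffic for 10.16.0.1 if it hosts a ready endpoint. Here the only endpoint (172.16.29.154) runs on k8s-worker-0, while the ICMP redirects in the ping output point at 192.168.88.43, the master, which may be worth ruling out. kube-proxy reports this per-node state on the HealthCheck NodePort (31552 above): it answers with HTTP 200 when the node has a local endpoint and 503 when it has none. A rough sketch of that check, assuming curl is installed; both helper names are hypothetical:

```shell
# local_endpoint_status NODE_IP HEALTH_PORT  (hypothetical helper)
# Prints the HTTP status code returned by kube-proxy's service health check,
# or 000 if the port cannot be reached within 3 seconds.
local_endpoint_status() {
  curl -s -o /dev/null -m 3 -w '%{http_code}' "http://$1:$2/healthz"
}

# interpret_status CODE  (hypothetical helper)
# Translates the status code into what it says about the node.
interpret_status() {
  case $1 in
    200) echo "node has a local endpoint" ;;
    503) echo "node has no local endpoint" ;;
    *)   echo "health check not reachable (status $1)" ;;
  esac
}
```

For example, `interpret_status "$(local_endpoint_status 192.168.88.43 31552)"` checks the master node; repeating it with the worker's IP shows which node can actually serve the load balancer IP.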

Calico configuration

❯ kubectl describe bgppeer
Name:         global-peer
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  projectcalico.org/v3
Kind:         BGPPeer
Metadata:
  Creation Timestamp:  2023-09-22T06:37:35Z
  Resource Version:    430164
  UID:                 62d549ac-b7ed-47ab-bc2c-b94de8a08939
Spec:
  As Number:  65530
  Filters:
    default
  Peer IP:  192.168.88.1
Events:     <none>

❯ kubectl describe bgpconfiguration
Name:         default
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  projectcalico.org/v3
Kind:         BGPConfiguration
Metadata:
  Creation Timestamp:  2023-09-22T06:37:24Z
  Resource Version:    839696
  UID:                 fd6aef3e-0a4c-4ecc-afe4-09395b76d107
Spec:
  As Number:                  65500
  Node To Node Mesh Enabled:  false
  Service Load Balancer I Ps:
    Cidr:  10.16.0.0/28
Events:    <none>

❯ kubectl describe bgpfilter
Name:         default
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  projectcalico.org/v3
Kind:         BGPFilter
Metadata:
  Creation Timestamp:  2023-09-22T06:37:34Z
  Resource Version:    434512
  UID:                 df8b5aeb-e25e-481c-9403-fcc54727eea8
Spec:
  exportV4:
    Action:          Reject
    Cidr:            10.16.0.0/28
    Match Operator:  NotIn
Events:              <none>

❯ kubectl describe installation
Name:         default
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  operator.tigera.io/v1
Kind:         Installation
Metadata:
  Creation Timestamp:  2023-09-21T05:32:53Z
  Finalizers:
    tigera.io/operator-cleanup
  Generation:        3
  Resource Version:  1052996
  UID:               2f7928cd-3815-42be-9483-512f2f39fcf9
Spec:
  Calico Network:
    Bgp:         Enabled
    Host Ports:  Enabled
    Ip Pools:
      Block Size:          26
      Cidr:                172.16.0.0/16
      Disable BGP Export:  false
      Encapsulation:       None
      Nat Outgoing:        Disabled
      Node Selector:       all()
    Linux Dataplane:       Iptables
    Multi Interface Mode:  None
    nodeAddressAutodetectionV4:
      First Found:  true
  Cni:
    Ipam:
      Type:                    Calico
    Type:                      Calico
  Control Plane Replicas:      2
  Flex Volume Path:            /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
  Kubelet Volume Plugin Path:  /var/lib/kubelet
  Logging:
    Cni:
      Log File Max Age Days:  30
      Log File Max Count:     10
      Log File Max Size:      100Mi
      Log Severity:           Info
  Node Update Strategy:
    Rolling Update:
      Max Unavailable:  1
    Type:               RollingUpdate
  Non Privileged:       Disabled
  Variant:              Calico
Status:
  Calico Version:  v3.26.1
  Computed:
    Calico Network:
      Bgp:         Enabled
      Host Ports:  Enabled
      Ip Pools:
        Block Size:          26
        Cidr:                172.16.0.0/16
        Disable BGP Export:  false
        Encapsulation:       None
        Nat Outgoing:        Disabled
        Node Selector:       all()
      Linux Dataplane:       Iptables
      Multi Interface Mode:  None
      nodeAddressAutodetectionV4:
        First Found:  true
    Cni:
      Ipam:
        Type:                    Calico
      Type:                      Calico
    Control Plane Replicas:      2
    Flex Volume Path:            /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
    Kubelet Volume Plugin Path:  /var/lib/kubelet
    Logging:
      Cni:
        Log File Max Age Days:  30
        Log File Max Count:     10
        Log File Max Size:      100Mi
        Log Severity:           Info
    Node Update Strategy:
      Rolling Update:
        Max Unavailable:  1
      Type:               RollingUpdate
    Non Privileged:       Disabled
    Variant:              Calico
  Conditions:
    Last Transition Time:  2023-09-23T15:08:30Z
    Message:
    Observed Generation:   3
    Reason:                Unknown
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2023-09-23T15:08:30Z
    Message:
    Observed Generation:   3
    Reason:                Unknown
    Status:                False
    Type:                  Ready
    Last Transition Time:  2023-09-23T15:08:30Z
    Message:               DaemonSet "calico-system/calico-node" is not available (awaiting 1 nodes)
    Observed Generation:   3
    Reason:                ResourceNotReady
    Status:                True
    Type:                  Progressing
  Mtu:                     1450
  Variant:                 Calico
Events:                    <none>

calicoctl node status

[screenshot of the calicoctl node status output; image not included]

MetalLB configuration

❯ kubectl -n metallb-system describe ipaddresspool
Name:         public
Namespace:    metallb-system
Labels:       <none>
Annotations:  <none>
API Version:  metallb.io/v1beta1
Kind:         IPAddressPool
Metadata:
  Creation Timestamp:  2023-09-22T07:57:01Z
  Generation:          3
  Resource Version:    434511
  UID:                 aa2d0e08-4fee-4ce1-a822-9bf4f52a3a3c
Spec:
  Addresses:
    10.16.0.0/28
  Auto Assign:       true
  Avoid Buggy I Ps:  true
Events:              <none>

So how do I debug this, and why can't I access the Nginx service from outside the cluster?

Update 0

I ran tcpdump -n -i ens18 host 10.16.0.1 on both the master and worker nodes, then ran ping 10.16.0.1 and arping -I 10.16.0.1 from another machine outside the cluster. This is what the master and worker nodes showed:

# the ping packets only show up on the master node
# 192.168.88.111 is the other machine (my laptop)
dropped privs to tcpdump
tcpdump: listening on ens18, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:31:29.842416 IP (tos 0x0, ttl 63, id 33297, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.88.111 > 10.16.0.1: ICMP echo request, id 14677, seq 0, length 64
15:31:29.842465 IP (tos 0x0, ttl 64, id 38235, offset 0, flags [none], proto ICMP (1), length 84)
    10.16.0.1 > 192.168.88.111: ICMP echo reply, id 14677, seq 0, length 64
15:31:30.846173 IP (tos 0x0, ttl 63, id 14503, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.88.111 > 10.16.0.1: ICMP echo request, id 14677, seq 1, length 64
15:31:30.846206 IP (tos 0x0, ttl 64, id 38836, offset 0, flags [none], proto ICMP (1), length 84)
    10.16.0.1 > 192.168.88.111: ICMP echo reply, id 14677, seq 1, length 64
# arping received 0 responses
# the arping requests show up on both the master and worker nodes
15:33:08.269603 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:09.269759 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:10.269768 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:11.269725 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:12.269575 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:13.269695 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:14.269685 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:15.269715 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:16.269754 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
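
The capture above only covers ICMP and ARP. Since ping works and HTTP does not, it may be more telling to capture TCP port 80 and see whether the laptop's SYN reaches a node and whether any SYN-ACK leaves it. A sketch under the assumption that the same interface (ens18) carries the traffic; `capture_http` and `summarize_handshake` are hypothetical helpers:

```shell
# capture_http IFACE VIP [COUNT]  (hypothetical helper, needs root)
# Prints up to COUNT (default 20) TCP port-80 packets to or from VIP, then exits.
capture_http() {
  tcpdump -n -c "${3:-20}" -i "$1" "host $2 and tcp port 80"
}

# summarize_handshake  (hypothetical helper)
# Reads tcpdump text on stdin and counts SYN and SYN-ACK packets.
summarize_handshake() {
  awk '
    /Flags \[S\],/   { syn++ }      # connection attempts
    /Flags \[S\.\],/ { synack++ }   # replies from the server side
    END { printf "syn=%d synack=%d\n", syn, synack }
  '
}
```

Running `capture_http ens18 10.16.0.1 | summarize_handshake` on each node while curling from the laptop shows where the handshake stops: SYNs with no SYN-ACK on a node suggest that node receives the traffic but does not answer it.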

Answer 1

I'd suggest asking this on the Calico Users Slack.

I see that you have disabled the full mesh in your BGPConfiguration. Is that intentional? Are you using route reflectors?

Node To Node Mesh Enabled: false

I'd start by checking the BGP routes on the Mikrotik, with something like the following (adjust it if you are running an older version of RouterOS):

/ip/route/print detail from=[find gateway~"^192.168.88.63[0x00-0 xff]*"]

If that looks correct, I'd check whether the LB can be reached from the Mikrotik itself:

/tool/fetch http://10.43.1.1

Then check the BIRD protocol status and routes using birdcl in the calico-node pods:

kubectl exec -n calico-system ds/calico-node -c calico-node -- birdcl show protocols

kubectl exec -n calico-system ds/calico-node -c calico-node -- birdcl show route
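
Note that `kubectl exec` against `ds/calico-node` runs in only one pod of the DaemonSet. With the node-to-node mesh disabled and a global peer, each node keeps its own session with the router, so it may be worth running birdcl on every node. A minimal sketch, assuming the calico-node pods carry the usual `k8s-app=calico-node` label; `for_each_calico_node` is a hypothetical helper:

```shell
# for_each_calico_node COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND inside the calico-node container of every calico-node pod.
for_each_calico_node() {
  kubectl get pods -n calico-system -l k8s-app=calico-node -o name |
  while read -r pod; do
    echo "== $pod"
    # stdin is redirected so the exec cannot swallow the remaining pod names
    kubectl exec -n calico-system "$pod" -c calico-node -- "$@" </dev/null
  done
}
```

For example, `for_each_calico_node birdcl show protocols` prints the session state per node, and `for_each_calico_node birdcl show route` shows which nodes hold a route for the load balancer range.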
