(screenshot: the service with the load balancer IP address)
I have a local Kubernetes cluster (v1.24) with cri-o as the container runtime, Calico as the CNI, and MetalLB providing the load balancer IPs.
The master and worker nodes run Rocky Linux 9 with SELinux enabled and firewalld disabled, and kube-proxy runs in IPVS mode.
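To rule out the basics, the kube-proxy mode can be read back from its ConfigMap (a minimal check, assuming a kubeadm-style install where kube-proxy is configured via the kube-proxy ConfigMap):
# verify kube-proxy really runs in IPVS mode
❯ kubectl -n kube-system get configmap kube-proxy -o yaml | grep 'mode:'
# on a node, the IPVS virtual servers should then be listed
❯ sudo ipvsadm -Ln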
I set up BGP peering with a Mikrotik router and can see that the IP ranges I configured are advertised to it; the router's routes section shows an entry like
10.16.0.0/28 reachable through bridge
My test nginx service's external IP is 10.16.0.1. Curling that IP from the master nodes, the worker nodes, and from inside pods works and returns the default nginx welcome page, but curling from my laptop just hangs until it times out, and the SELinux audit logs show no denials.
An nmap scan of the IP shows the host up (port 80 is reported as filtered), pinging the IP works, and traceroute also returns the expected path. As a sanity check I also deleted the service and re-ran nmap, ping, and traceroute, and everything stopped working, as expected.
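For concreteness, the failing test is just plain curl (a reconstruction of the commands, not a paste from the original session):
# succeeds from any master/worker node or from inside a pod
❯ curl http://10.16.0.1
# the same request from the laptop hangs until the timeout
❯ curl -m 10 http://10.16.0.1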
# the commands below run on my laptop,
# which is connected to the local network;
# the results are the same when run from
# other devices on the network
# ---
❯ nmap -T4 10.16.0.1
Starting Nmap 7.94 ( https://nmap.org ) at 2023-09-23 11:01 +07
Nmap scan report for 10.16.0.1
Host is up (0.0048s latency).
Not shown: 997 closed tcp ports (conn-refused)
PORT STATE SERVICE
22/tcp open ssh
80/tcp filtered http
179/tcp open bgp
# ---
# 192.168.88.43 is the local master node IP
❯ ping 10.16.0.1
PING 10.16.0.1 (10.16.0.1): 56 data bytes
64 bytes from 10.16.0.1: icmp_seq=0 ttl=64 time=9.144 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 0054 c3b4 0 0000 3f 01 94cc 192.168.88.111 10.16.0.1
64 bytes from 10.16.0.1: icmp_seq=1 ttl=64 time=3.003 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 0054 c9cc 0 0000 3f 01 8eb4 192.168.88.111 10.16.0.1
64 bytes from 10.16.0.1: icmp_seq=2 ttl=64 time=3.209 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 0054 305b 0 0000 3f 01 2826 192.168.88.111 10.16.0.1
64 bytes from 10.16.0.1: icmp_seq=3 ttl=64 time=2.557 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 0054 25d8 0 0000 3f 01 32a9 192.168.88.111 10.16.0.1
64 bytes from 10.16.0.1: icmp_seq=4 ttl=64 time=3.594 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 0054 4016 0 0000 3f 01 186b 192.168.88.111 10.16.0.1
64 bytes from 10.16.0.1: icmp_seq=5 ttl=64 time=2.974 ms
64 bytes from 10.16.0.1: icmp_seq=6 ttl=64 time=4.397 ms
^C
--- 10.16.0.1 ping statistics ---
7 packets transmitted, 7 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 2.557/4.125/9.144/2.119 ms
# ---
# 192.168.88.1 is my default gateway
❯ traceroute 10.16.0.1
traceroute to 10.16.0.1 (10.16.0.1), 64 hops max, 52 byte packets
1 192.168.88.1 (192.168.88.1) 3.669 ms 2.713 ms 2.552 ms
2 10.16.0.1 (10.16.0.1) 3.303 ms 3.292 ms 3.145 ms
Checking the nginx info:
❯ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 39m 172.16.29.154 k8s-worker-0 <none> <none>
❯ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 2d5h
nginx LoadBalancer 10.97.153.41 10.16.0.1 80:30422/TCP 40m
❯ kubectl get endpoints
NAME ENDPOINTS AGE
kubernetes 192.168.88.43:6443 2d5h
nginx 172.16.29.154:80 41m
❯ kubectl describe service nginx
Name: nginx
Namespace: default
Labels: <none>
Annotations: metallb.universe.tf/address-pool: public
metallb.universe.tf/ip-allocated-from-pool: public
Selector: app=nginx
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.97.153.41
IPs: 10.97.153.41
LoadBalancer Ingress: 10.16.0.1
Port: <unset> 80/TCP
TargetPort: 80/TCP
NodePort: <unset> 30422/TCP
Endpoints: 172.16.29.154:80
Session Affinity: None
External Traffic Policy: Local
HealthCheck NodePort: 31552
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal IPAllocated 41m metallb-controller Assigned IP ["10.16.0.1"]
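Note the External Traffic Policy: Local in that output: with that policy, only nodes hosting a ready endpoint pod (here, only k8s-worker-0) answer for the service IP, while other nodes drop the traffic. As a quick experiment one could flip it to Cluster and retest from the laptop (a sketch, not from the original session):
# let every node forward to the pod, then retry the curl
❯ kubectl patch svc nginx -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'
❯ curl -m 10 http://10.16.0.1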
Calico configuration:
❯ kubectl describe bgppeer
Name: global-peer
Namespace:
Labels: <none>
Annotations: <none>
API Version: projectcalico.org/v3
Kind: BGPPeer
Metadata:
Creation Timestamp: 2023-09-22T06:37:35Z
Resource Version: 430164
UID: 62d549ac-b7ed-47ab-bc2c-b94de8a08939
Spec:
As Number: 65530
Filters:
default
Peer IP: 192.168.88.1
Events: <none>
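That describe output corresponds to a BGPPeer manifest roughly like this (reconstructed from the fields above, not the author's original file):
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: global-peer
spec:
  asNumber: 65530
  peerIP: 192.168.88.1
  filters:
    - default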
❯ kubectl describe bgpconfiguration
Name: default
Namespace:
Labels: <none>
Annotations: <none>
API Version: projectcalico.org/v3
Kind: BGPConfiguration
Metadata:
Creation Timestamp: 2023-09-22T06:37:24Z
Resource Version: 839696
UID: fd6aef3e-0a4c-4ecc-afe4-09395b76d107
Spec:
As Number: 65500
Node To Node Mesh Enabled: false
Service Load Balancer I Ps:
Cidr: 10.16.0.0/28
Events: <none>
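Reconstructed the same way, the BGP configuration is roughly:
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  asNumber: 65500
  nodeToNodeMeshEnabled: false
  serviceLoadBalancerIPs:
    - cidr: 10.16.0.0/28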
❯ kubectl describe bgpfilter
Name: default
Namespace:
Labels: <none>
Annotations: <none>
API Version: projectcalico.org/v3
Kind: BGPFilter
Metadata:
Creation Timestamp: 2023-09-22T06:37:34Z
Resource Version: 434512
UID: df8b5aeb-e25e-481c-9403-fcc54727eea8
Spec:
exportV4:
Action: Reject
Cidr: 10.16.0.0/28
Match Operator: NotIn
Events: <none>
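And the filter, which rejects any IPv4 route export that does not fall inside the service CIDR, reconstructs to roughly:
apiVersion: projectcalico.org/v3
kind: BGPFilter
metadata:
  name: default
spec:
  exportV4:
    - action: Reject
      matchOperator: NotIn
      cidr: 10.16.0.0/28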
❯ kubectl describe installation
Name: default
Namespace:
Labels: <none>
Annotations: <none>
API Version: operator.tigera.io/v1
Kind: Installation
Metadata:
Creation Timestamp: 2023-09-21T05:32:53Z
Finalizers:
tigera.io/operator-cleanup
Generation: 3
Resource Version: 1052996
UID: 2f7928cd-3815-42be-9483-512f2f39fcf9
Spec:
Calico Network:
Bgp: Enabled
Host Ports: Enabled
Ip Pools:
Block Size: 26
Cidr: 172.16.0.0/16
Disable BGP Export: false
Encapsulation: None
Nat Outgoing: Disabled
Node Selector: all()
Linux Dataplane: Iptables
Multi Interface Mode: None
nodeAddressAutodetectionV4:
First Found: true
Cni:
Ipam:
Type: Calico
Type: Calico
Control Plane Replicas: 2
Flex Volume Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
Kubelet Volume Plugin Path: /var/lib/kubelet
Logging:
Cni:
Log File Max Age Days: 30
Log File Max Count: 10
Log File Max Size: 100Mi
Log Severity: Info
Node Update Strategy:
Rolling Update:
Max Unavailable: 1
Type: RollingUpdate
Non Privileged: Disabled
Variant: Calico
Status:
Calico Version: v3.26.1
Computed:
Calico Network:
Bgp: Enabled
Host Ports: Enabled
Ip Pools:
Block Size: 26
Cidr: 172.16.0.0/16
Disable BGP Export: false
Encapsulation: None
Nat Outgoing: Disabled
Node Selector: all()
Linux Dataplane: Iptables
Multi Interface Mode: None
nodeAddressAutodetectionV4:
First Found: true
Cni:
Ipam:
Type: Calico
Type: Calico
Control Plane Replicas: 2
Flex Volume Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
Kubelet Volume Plugin Path: /var/lib/kubelet
Logging:
Cni:
Log File Max Age Days: 30
Log File Max Count: 10
Log File Max Size: 100Mi
Log Severity: Info
Node Update Strategy:
Rolling Update:
Max Unavailable: 1
Type: RollingUpdate
Non Privileged: Disabled
Variant: Calico
Conditions:
Last Transition Time: 2023-09-23T15:08:30Z
Message:
Observed Generation: 3
Reason: Unknown
Status: False
Type: Degraded
Last Transition Time: 2023-09-23T15:08:30Z
Message:
Observed Generation: 3
Reason: Unknown
Status: False
Type: Ready
Last Transition Time: 2023-09-23T15:08:30Z
Message: DaemonSet "calico-system/calico-node" is not available (awaiting 1 nodes)
Observed Generation: 3
Reason: ResourceNotReady
Status: True
Type: Progressing
Mtu: 1450
Variant: Calico
Events: <none>
calicoctl node status shows: (screenshot of the command's output)
MetalLB configuration:
❯ kubectl -n metallb-system describe ipaddresspool
Name: public
Namespace: metallb-system
Labels: <none>
Annotations: <none>
API Version: metallb.io/v1beta1
Kind: IPAddressPool
Metadata:
Creation Timestamp: 2023-09-22T07:57:01Z
Generation: 3
Resource Version: 434511
UID: aa2d0e08-4fee-4ce1-a822-9bf4f52a3a3c
Spec:
Addresses:
10.16.0.0/28
Auto Assign: true
Avoid Buggy I Ps: true
Events: <none>
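The equivalent manifest for that pool would be roughly (again a reconstruction):
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: public
  namespace: metallb-system
spec:
  addresses:
    - 10.16.0.0/28
  autoAssign: true
  avoidBuggyIPs: true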
So how do I debug this? Why can't I reach the nginx service from outside the cluster?
Update 0
Running tcpdump -n -i ens18 host 10.16.0.1 on the master and worker nodes, then running ping 10.16.0.1 and arping -I 10.16.0.1 from another machine outside the cluster, gives me the following on the master and worker nodes:
# ping traffic only shows up on the master node
# 192.168.88.111 is the other machine (my laptop)
dropped privs to tcpdump
tcpdump: listening on ens18, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:31:29.842416 IP (tos 0x0, ttl 63, id 33297, offset 0, flags [none], proto ICMP (1), length 84)
192.168.88.111 > 10.16.0.1: ICMP echo request, id 14677, seq 0, length 64
15:31:29.842465 IP (tos 0x0, ttl 64, id 38235, offset 0, flags [none], proto ICMP (1), length 84)
10.16.0.1 > 192.168.88.111: ICMP echo reply, id 14677, seq 0, length 64
15:31:30.846173 IP (tos 0x0, ttl 63, id 14503, offset 0, flags [none], proto ICMP (1), length 84)
192.168.88.111 > 10.16.0.1: ICMP echo request, id 14677, seq 1, length 64
15:31:30.846206 IP (tos 0x0, ttl 64, id 38836, offset 0, flags [none], proto ICMP (1), length 84)
10.16.0.1 > 192.168.88.111: ICMP echo reply, id 14677, seq 1, length 64
# arping received 0 responses
# the arping requests show up on both the master and worker nodes
15:33:08.269603 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:09.269759 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:10.269768 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:11.269725 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:12.269575 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:13.269695 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:14.269685 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:15.269715 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:16.269754 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
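The same capture can be pointed at the HTTP path that actually hangs; a sketch of that (my addition, reusing the ens18 interface from the capture above):
# watch where the laptop's SYNs land and whether anything answers
❯ tcpdump -n -i ens18 'host 10.16.0.1 and tcp port 80'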
Answer 1
I would suggest asking this question in the Calico Users Slack.
In the BGP settings I see that you have disabled the full mesh. Is that intentional? Are you using route reflectors?
Node To Node Mesh Enabled: false
I would start by checking the BGP route on the Mikrotik, with something like this (adjust if you are running an older RouterOS version):
/ip/route/print detail from=[find gateway~"^192.168.88.63[0x00-0xff]*"]
If that looks right, I would check whether the LB is reachable from the Mikrotik itself:
/tool/fetch http://10.43.1.1
Then check the BIRD protocol status; try birdcl from the calico-node pods:
kubectl exec -n calico-system ds/calico-node -c calico-node -- birdcl show protocols
kubectl exec -n calico-system ds/calico-node -c calico-node -- birdcl show route
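In healthy output, show protocols should list the Mikrotik peer in the Established state, and show route should include the 10.16.0.0/28 service range (my reading of standard BIRD behaviour, not output from the original thread).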