How to debug a service not being accessible from outside the cluster
I'm setting up an on-premises Kubernetes (v1.24) cluster with CRI-O as the container runtime, Calico as the CNI, and MetalLB for load-balancer IPs.
The master and worker nodes run Rocky Linux 9 with SELinux enabled and firewalld disabled, and kube-proxy runs in IPVS mode.
I've set up BGP peering with my MikroTik router, and the configured IP range is advertised to the router; its route section shows a record like this:
10.16.0.0/28 reachable through bridge
The external IP of my test Nginx service is 10.16.0.1.
Curling that IP from the master, the workers, or from inside a pod returns the default Nginx welcome page, but curling it from my laptop hangs until it times out, and the SELinux audit log shows no violations.
An nmap of the IP shows port 80 open, ping to the IP works, and traceroute shows the expected path. As a sanity check, if I delete the service and run nmap, ping, and traceroute again, they stop working, as expected.
# the commands below were run on my laptop,
# which is connected to the local network;
# the results are the same when run from
# other devices on the network
# ---
❯ nmap -T4 10.16.0.1
Starting Nmap 7.94 ( https://nmap.org ) at 2023-09-23 11:01 +07
Nmap scan report for 10.16.0.1
Host is up (0.0048s latency).
Not shown: 997 closed tcp ports (conn-refused)
PORT STATE SERVICE
22/tcp open ssh
80/tcp filtered http
179/tcp open bgp
# ---
# 192.168.88.43 is the local master node IP
❯ ping 10.16.0.1
PING 10.16.0.1 (10.16.0.1): 56 data bytes
64 bytes from 10.16.0.1: icmp_seq=0 ttl=64 time=9.144 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 0054 c3b4 0 0000 3f 01 94cc 192.168.88.111 10.16.0.1
64 bytes from 10.16.0.1: icmp_seq=1 ttl=64 time=3.003 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 0054 c9cc 0 0000 3f 01 8eb4 192.168.88.111 10.16.0.1
64 bytes from 10.16.0.1: icmp_seq=2 ttl=64 time=3.209 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 0054 305b 0 0000 3f 01 2826 192.168.88.111 10.16.0.1
64 bytes from 10.16.0.1: icmp_seq=3 ttl=64 time=2.557 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 0054 25d8 0 0000 3f 01 32a9 192.168.88.111 10.16.0.1
64 bytes from 10.16.0.1: icmp_seq=4 ttl=64 time=3.594 ms
92 bytes from 192.168.88.1: Redirect Host(New addr: 192.168.88.43)
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 0054 4016 0 0000 3f 01 186b 192.168.88.111 10.16.0.1
64 bytes from 10.16.0.1: icmp_seq=5 ttl=64 time=2.974 ms
64 bytes from 10.16.0.1: icmp_seq=6 ttl=64 time=4.397 ms
^C
--- 10.16.0.1 ping statistics ---
7 packets transmitted, 7 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 2.557/4.125/9.144/2.119 ms
# --
# 192.168.88.1 is my default gateway
❯ traceroute 10.16.0.1
traceroute to 10.16.0.1 (10.16.0.1), 64 hops max, 52 byte packets
1 192.168.88.1 (192.168.88.1) 3.669 ms 2.713 ms 2.552 ms
2 10.16.0.1 (10.16.0.1) 3.303 ms 3.292 ms 3.145 ms
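For completeness, the curl from the laptop that hangs is just this (the --max-time value here is arbitrary, added so the command gives up instead of blocking forever):
❯ curl -v --max-time 10 http://10.16.0.1/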
Checking the nginx info:
❯ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 39m 172.16.29.154 k8s-worker-0 <none> <none>
❯ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 2d5h
nginx LoadBalancer 10.97.153.41 10.16.0.1 80:30422/TCP 40m
❯ kubectl get endpoints
NAME ENDPOINTS AGE
kubernetes 192.168.88.43:6443 2d5h
nginx 172.16.29.154:80 41m
❯ kubectl describe service nginx
Name: nginx
Namespace: default
Labels: <none>
Annotations: metallb.universe.tf/address-pool: public
metallb.universe.tf/ip-allocated-from-pool: public
Selector: app=nginx
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.97.153.41
IPs: 10.97.153.41
LoadBalancer Ingress: 10.16.0.1
Port: <unset> 80/TCP
TargetPort: 80/TCP
NodePort: <unset> 30422/TCP
Endpoints: 172.16.29.154:80
Session Affinity: None
External Traffic Policy: Local
HealthCheck NodePort: 31552
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal IPAllocated 41m metallb-controller Assigned IP ["10.16.0.1"]
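For reference, the Service was created from a manifest roughly like this (reconstructed from the describe output above, so any field not shown there is an assumption):
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: default
  annotations:
    metallb.universe.tf/address-pool: public
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: nginx
  ports:
  - port: 80          # service port
    targetPort: 80    # container port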
Calico configuration
❯ kubectl describe bgppeer
Name: global-peer
Namespace:
Labels: <none>
Annotations: <none>
API Version: projectcalico.org/v3
Kind: BGPPeer
Metadata:
Creation Timestamp: 2023-09-22T06:37:35Z
Resource Version: 430164
UID: 62d549ac-b7ed-47ab-bc2c-b94de8a08939
Spec:
As Number: 65530
Filters:
default
Peer IP: 192.168.88.1
Events: <none>
❯ kubectl describe bgpconfiguration
Name: default
Namespace:
Labels: <none>
Annotations: <none>
API Version: projectcalico.org/v3
Kind: BGPConfiguration
Metadata:
Creation Timestamp: 2023-09-22T06:37:24Z
Resource Version: 839696
UID: fd6aef3e-0a4c-4ecc-afe4-09395b76d107
Spec:
As Number: 65500
Node To Node Mesh Enabled: false
Service Load Balancer I Ps:
Cidr: 10.16.0.0/28
Events: <none>
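In manifest form, those two resources correspond roughly to the following (reconstructed from the describe output; not the exact files I applied):
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: global-peer
spec:
  peerIP: 192.168.88.1   # the Mikrotik router
  asNumber: 65530
  filters:
  - default
---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  asNumber: 65500
  nodeToNodeMeshEnabled: false
  serviceLoadBalancerIPs:
  - cidr: 10.16.0.0/28   # the load-balancer range advertised by Calico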
❯ kubectl describe bgpfilter
Name: default
Namespace:
Labels: <none>
Annotations: <none>
API Version: projectcalico.org/v3
Kind: BGPFilter
Metadata:
Creation Timestamp: 2023-09-22T06:37:34Z
Resource Version: 434512
UID: df8b5aeb-e25e-481c-9403-fcc54727eea8
Spec:
exportV4:
Action: Reject
Cidr: 10.16.0.0/28
Match Operator: NotIn
Events: <none>
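The filter in manifest form (again reconstructed from the output above). With matchOperator NotIn and action Reject, it rejects every exported IPv4 route that is not inside 10.16.0.0/28, so only the load-balancer range should be advertised to the peer:
apiVersion: projectcalico.org/v3
kind: BGPFilter
metadata:
  name: default
spec:
  exportV4:
  - action: Reject
    matchOperator: NotIn
    cidr: 10.16.0.0/28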
❯ kubectl describe installation
Name: default
Namespace:
Labels: <none>
Annotations: <none>
API Version: operator.tigera.io/v1
Kind: Installation
Metadata:
Creation Timestamp: 2023-09-21T05:32:53Z
Finalizers:
tigera.io/operator-cleanup
Generation: 3
Resource Version: 1052996
UID: 2f7928cd-3815-42be-9483-512f2f39fcf9
Spec:
Calico Network:
Bgp: Enabled
Host Ports: Enabled
Ip Pools:
Block Size: 26
Cidr: 172.16.0.0/16
Disable BGP Export: false
Encapsulation: None
Nat Outgoing: Disabled
Node Selector: all()
Linux Dataplane: Iptables
Multi Interface Mode: None
nodeAddressAutodetectionV4:
First Found: true
Cni:
Ipam:
Type: Calico
Type: Calico
Control Plane Replicas: 2
Flex Volume Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
Kubelet Volume Plugin Path: /var/lib/kubelet
Logging:
Cni:
Log File Max Age Days: 30
Log File Max Count: 10
Log File Max Size: 100Mi
Log Severity: Info
Node Update Strategy:
Rolling Update:
Max Unavailable: 1
Type: RollingUpdate
Non Privileged: Disabled
Variant: Calico
Status:
Calico Version: v3.26.1
Computed:
Calico Network:
Bgp: Enabled
Host Ports: Enabled
Ip Pools:
Block Size: 26
Cidr: 172.16.0.0/16
Disable BGP Export: false
Encapsulation: None
Nat Outgoing: Disabled
Node Selector: all()
Linux Dataplane: Iptables
Multi Interface Mode: None
nodeAddressAutodetectionV4:
First Found: true
Cni:
Ipam:
Type: Calico
Type: Calico
Control Plane Replicas: 2
Flex Volume Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
Kubelet Volume Plugin Path: /var/lib/kubelet
Logging:
Cni:
Log File Max Age Days: 30
Log File Max Count: 10
Log File Max Size: 100Mi
Log Severity: Info
Node Update Strategy:
Rolling Update:
Max Unavailable: 1
Type: RollingUpdate
Non Privileged: Disabled
Variant: Calico
Conditions:
Last Transition Time: 2023-09-23T15:08:30Z
Message:
Observed Generation: 3
Reason: Unknown
Status: False
Type: Degraded
Last Transition Time: 2023-09-23T15:08:30Z
Message:
Observed Generation: 3
Reason: Unknown
Status: False
Type: Ready
Last Transition Time: 2023-09-23T15:08:30Z
Message: DaemonSet "calico-system/calico-node" is not available (awaiting 1 nodes)
Observed Generation: 3
Reason: ResourceNotReady
Status: True
Type: Progressing
Mtu: 1450
Variant: Calico
Events: <none>
Output of calicoctl node status (screenshot omitted)
MetalLB configuration
❯ kubectl -n metallb-system describe ipaddresspool
Name: public
Namespace: metallb-system
Labels: <none>
Annotations: <none>
API Version: metallb.io/v1beta1
Kind: IPAddressPool
Metadata:
Creation Timestamp: 2023-09-22T07:57:01Z
Generation: 3
Resource Version: 434511
UID: aa2d0e08-4fee-4ce1-a822-9bf4f52a3a3c
Spec:
Addresses:
10.16.0.0/28
Auto Assign: true
Avoid Buggy I Ps: true
Events: <none>
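In manifest form (reconstructed from the describe output above):
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: public
  namespace: metallb-system
spec:
  addresses:
  - 10.16.0.0/28
  autoAssign: true
  avoidBuggyIPs: true
MetalLB is only used to allocate the address here; the route advertisement itself is done by Calico via the serviceLoadBalancerIPs setting shown earlier.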
So how can I debug this? Why can't I reach the Nginx service from outside the cluster?
Update 0
I ran tcpdump -n -i ens18 host 10.16.0.1 on both the master and the worker node, and from another machine outside the cluster I ran ping 10.16.0.1 and arping -I 10.16.0.1; the capture on the nodes shows the following:
# the ping traffic only shows up on the master node
# 192.168.88.111 is the other machine (my laptop)
dropped privs to tcpdump
tcpdump: listening on ens18, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:31:29.842416 IP (tos 0x0, ttl 63, id 33297, offset 0, flags [none], proto ICMP (1), length 84)
192.168.88.111 > 10.16.0.1: ICMP echo request, id 14677, seq 0, length 64
15:31:29.842465 IP (tos 0x0, ttl 64, id 38235, offset 0, flags [none], proto ICMP (1), length 84)
10.16.0.1 > 192.168.88.111: ICMP echo reply, id 14677, seq 0, length 64
15:31:30.846173 IP (tos 0x0, ttl 63, id 14503, offset 0, flags [none], proto ICMP (1), length 84)
192.168.88.111 > 10.16.0.1: ICMP echo request, id 14677, seq 1, length 64
15:31:30.846206 IP (tos 0x0, ttl 64, id 38836, offset 0, flags [none], proto ICMP (1), length 84)
10.16.0.1 > 192.168.88.111: ICMP echo reply, id 14677, seq 1, length 64
# arping received 0 responses
# the arping requests show up on both the master and the worker node
15:33:08.269603 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:09.269759 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:10.269768 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:11.269725 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:12.269575 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:13.269695 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:14.269685 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:15.269715 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
15:33:16.269754 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.16.0.1 (Broadcast) tell 192.168.88.63, length 46
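One more thing that can be checked directly on the nodes is whether the load-balancer IP is actually programmed by kube-proxy. This is only a sketch, assuming kube-proxy's IPVS mode binds service IPs to its usual kube-ipvs0 dummy interface and that ipvsadm is installed:
# run as root on both the master and the worker node
❯ ip addr show dev kube-ipvs0 | grep 10.16.0.1
❯ ipvsadm -Ln -t 10.16.0.1:80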
Answer 1
I'd recommend asking this on the Calico Users Slack.
I see that the node-to-node full mesh is disabled in your BGPConfiguration; is that intentional? Are you using a route reflector?
Node To Node Mesh Enabled: false
First, check the BGP routes on the Mikrotik (adjust the command if you are running an older RouterOS version):
/ip/route/print detail from=[find gateway~"^192.168.88.63[0x00-0xff]*"]
If that looks right, check whether the LB is reachable via the Mikrotik:
/tool/fetch http://10.43.1.1
Then check the BIRD protocol status, and try birdcl from the calico-node pods:
kubectl exec -n calico-system ds/calico-node -c calico-node -- birdcl show protocols
kubectl exec -n calico-system ds/calico-node -c calico-node -- birdcl show route