Keepalived does not forward traffic to the BACKUP node after Kubernetes cluster setup

2024-6-23

System structure:

10.10.1.86: Kubernetes master node
10.10.1.87: Kubernetes worker 1 node; `keepalived` MASTER node
10.10.1.88: Kubernetes worker 2 node; `keepalived` BACKUP node
10.10.1.90: VIP, load balanced to .87 & .88; implemented by `keepalived`.

This Kubernetes cluster is a dev environment for testing NetFlow log collection.

What I want to achieve is:

All routers/switches first send their NetFlow logs to .90
Then `keepalived` load balancing (`lb_kind: NAT`) distributes them to .87 & .88, the two Kubernetes workers.
A `NodePort` service catches this traffic into the Kubernetes cluster and does the rest of the data parsing.

Something like:

        |                {OS Network}                   |   {Kubernetes Network}

                                K8s Worker -> filebeat -> logstash (deployments)
                              /
<data> -> [VIP] load balance
                              \ 
                                K8s Worker -> filebeat -> logstash (deployments)

filebeat.yml (I have tested that everything after filebeat works, so I use the `file` output here to narrow down the root cause.)

# cat filebeat.yml
filebeat.inputs:

- type: tcp
  max_message_size: 10MiB
  host: "0.0.0.0:5100"

- type: udp
  max_message_size: 10KiB
  host: "0.0.0.0:5150"




#output.logstash:
#  hosts: ["10.10.1.87:30044", "10.10.1.88:30044"]
output.file:
  path: "/tmp/"
  filename: tmp-filebeat.out

Kubernetes

The master and workers are 3 VMs in my private environment, not on a GCP or AWS provider.
Version:

# kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:25:06Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}

Service

# cat logstash.service.yaml
apiVersion: v1
kind: Service
metadata:
  name: logstash-service
spec:
  type: NodePort
  selector:
    app: logstash
  ports:
    - port: 9514
      name: tcp-port
      targetPort: 9514
      nodePort: 30044

Everything works fine once the data gets into Kubernetes.
The problem is in the VIP load balancing, which is not forwarding.

Keepalived conf

!Configuration File for keepalived
global_defs {
  router_id proxy1   # `proxy 2` at the other node
}


vrrp_instance VI_1 {
  state MASTER       # `BACKUP` at the other node
  interface ens160
  virtual_router_id 41
  priority 100       # `50` at the other node
  advert_int 1
  virtual_ipaddress {
    10.10.1.90/23
  }
}

virtual_server 10.10.1.90 5100 {
  delay_loop 30
  lb_algo rr
  lb_kind NAT
  protocol TCP
  persistence_timeout 0

  real_server 10.10.1.87 5100 {
    weight 1
  }
  real_server 10.10.1.88 5100 {
    weight 1
  }
}
virtual_server 10.10.1.90 5150 {
  delay_loop 30
  lb_algo rr
  lb_kind NAT
  protocol UDP
  persistence_timeout 0

  real_server 10.10.1.87 5150 {
    weight 1
  }
  real_server 10.10.1.88 5150 {
    weight 1
  }
}

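Works before Kubernetes cluster setup

Both .87 & .88 have `keepalived` installed, and `rr` (RoundRobin) load balancing works fine (TCP and UDP).
Just in case, I stopped the `keepalived` service (`systemctl stop keepalived`) while setting up the Kubernetes cluster.

Problem occurred after Kubernetes cluster setup

Only the MASTER node .87 gets traffic forwarded to it; the VIP cannot forward to the BACKUP node .88.
The data forwarded to MASTER is successfully caught by the Kubernetes `NodePort` and deployments.

Problem tested with `nc`:

`nc`: only the node holding the VIP (the MASTER node) receives forwarded traffic; when `rr` picks BACKUP, the connection just times out.
I also ran `nc -l 5100` on both servers, and only the MASTER node received anything.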

# echo "test" | nc 10.10.1.90 5100
# echo "test" | nc 10.10.1.90 5100
Ncat: Connection timed out.
# echo "test" | nc 10.10.1.90 5100
# echo "test" | nc 10.10.1.90 5100
Ncat: Connection timed out.
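
The alternating success and timeout above can be reproduced with more attempts and an explicit tally. This is a minimal sketch, assuming Python 3 is available on a client machine outside the two workers; `probe_tcp` and its parameters are hypothetical names used only for this illustration, not part of any tool mentioned in the question.

```python
import socket
from collections import Counter


def probe_tcp(host, port, attempts=4, timeout=3.0, payload=b"test\n"):
    """Open `attempts` separate TCP connections and tally how each one ends."""
    results = Counter()
    for _ in range(attempts):
        try:
            # One fresh connection per attempt, like repeated `nc` runs,
            # so the rr scheduler selects a real server every time.
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(payload)
            results["ok"] += 1
        except socket.timeout:
            results["timeout"] += 1
        except OSError as exc:
            # Refused, unreachable, etc. are counted under the exception name.
            results[type(exc).__name__] += 1
    return results
```

Calling `probe_tcp("10.10.1.90", 5100, attempts=10)` against the VIP would show whether the failures follow the strict every-other-connection pattern that round robin over one working and one failing real server would produce.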

Some info

Package versions

# rpm -qa |grep keepalived
keepalived-1.3.5-19.el7.x86_64

Kubernetes CNI: Calico

# kubectl get pod -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
calico-kube-controllers-b656ddcfc-wnkcj   1/1     Running   2          78d
calico-node-vnf4d                         1/1     Running   8          78d
calico-node-xgzd5                         1/1     Running   1          78d
calico-node-zt25t                         1/1     Running   8          78d
coredns-558bd4d5db-n6hnn                  1/1     Running   2          78d
coredns-558bd4d5db-zz2rb                  1/1     Running   2          78d
etcd-a86.axv.bz                           1/1     Running   2          78d
kube-apiserver-a86.axv.bz                 1/1     Running   2          78d
kube-controller-manager-a86.axv.bz        1/1     Running   2          78d
kube-proxy-ddwsr                          1/1     Running   2          78d
kube-proxy-hs4dx                          1/1     Running   3          78d
kube-proxy-qg2nq                          1/1     Running   1          78d
kube-scheduler-a86.axv.bz                 1/1     Running   2          78d

`ipvsadm` (same result on .87 and .88)

# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.10.1.90:5100 rr
  -> 10.10.1.87:5100              Masq    1      0          0
  -> 10.10.1.88:5100              Masq    1      0          0
UDP  10.10.1.90:5150 rr
  -> 10.10.1.87:5150              Masq    1      0          0
  -> 10.10.1.88:5150              Masq    1      0          0
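
While repeating the `nc` test, it may help to watch whether the `ActiveConn`/`InActConn` counters for .88 change at all. The sketch below is only an illustration: `parse_ipvsadm` is a hypothetical helper that assumes the column layout of the `ipvsadm -ln` output shown above.

```python
def parse_ipvsadm(text):
    """Map each (protocol, VIP:port) service to its real servers and counters."""
    services = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] in ("TCP", "UDP"):
            # Virtual service line, e.g. "TCP  10.10.1.90:5100 rr"
            current = (fields[0], fields[1])
            services[current] = []
        elif (current is not None and len(fields) == 6 and fields[0] == "->"
              and all(f.isdigit() for f in fields[3:])):
            # Real server line: -> address forward weight active inactive
            services[current].append({
                "address": fields[1],
                "forward": fields[2],
                "weight": int(fields[3]),
                "active": int(fields[4]),
                "inactive": int(fields[5]),
            })
    return services
```

Feeding it the text of `ipvsadm -ln` taken before and after a batch of test connections gives two dictionaries that can be compared per real server.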

SELinux is always `Permissive`
Stopping `firewalld` does not help; it still does not work.
`sysctl` difference:

# before:
net.ipv4.conf.all.accept_redirects = 1
net.ipv4.conf.all.forwarding = 0
net.ipv4.conf.all.route_localnet = 0
net.ipv4.conf.default.forwarding = 0
net.ipv4.conf.lo.forwarding = 0
net.ipv4.ip_forward = 0

# after
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.forwarding = 1
net.ipv4.conf.all.route_localnet = 1
net.ipv4.conf.default.forwarding = 1
net.ipv4.conf.lo.forwarding = 1
net.ipv4.ip_forward = 1
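
A before/after comparison like the one above can be generated mechanically from two saved dumps. This is a rough sketch, assuming each dump is plain `key = value` text such as the output of `sysctl -a`; `parse_sysctl` and `diff_sysctl` are hypothetical helper names.

```python
def parse_sysctl(text):
    """Turn 'key = value' lines into a dict, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_sysctl(before, after):
    """Return {key: (old, new)} for every key whose value differs."""
    old, new = parse_sysctl(before), parse_sysctl(after)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }
```

A key that exists in only one dump shows up with `None` on the other side, which makes settings introduced by the cluster setup easy to spot.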

I am not sure what else I can check at this point. Please advise. Thank you!
