Patroni failover on interconnect failure

2024-6-23

3 data centers:

Patroni version: 2.1.4

PostgreSQL version: 14.4

etcd version: 3.3.11

DC	Server	Name	Host	Status
1st	Patroni	patroni-s11	172.16.0.2	Leader
1st	Patroni	patroni-s12	172.16.0.3	Sync Standby
1st	ETCD	etcd-s11	172.16.0.4	Leader
2nd	Patroni	patroni-s21	172.16.1.2	Replica
2nd	Patroni	patroni-s22	172.16.1.3	Replica
2nd	ETCD	etcd-s21	172.16.1.4	Follower
3rd	Patroni	patroni-s31	172.16.2.2	Replica
3rd	ETCD	etcd-s31	172.16.2.4	Follower

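I simulated an interconnect failure between the 1st and 2nd data centers. Both DCs are up, but the 1st and the 2nd cannot "see" each other.

In this case the Patroni leader still remains in the 1st DC, but the servers in the 2nd DC are no longer synchronized with the cluster. If you trust the cluster status, everything is fine and there is no replication lag between the servers. In reality, none of the changes made on the master reach the replicas in the 2nd data center.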

[user@patroni-s11 ~]$ sudo patronictl -c /etc/patroni/patroni.yml list
2022-12-01 16:00:00,015 - ERROR - Request to server 172.16.1.4:2379 failed: MaxRetryError("HTTPConnectionPool(host='172.16.1.4', port=2379): Max retries exceeded with url: /v2/keys/service/patroni_cluster/?recursive=true (Caused by ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')))",)
+ Cluster: patroni_cluster (7117639577766255236) ---+---------+-----+-----------+
| Member          | Host          | Role         | State   |  TL | Lag in MB |
+-----------------+---------------+--------------+---------+-----+-----------+
| patroni-s11     | 172.16.0.2    | Leader       | running | 103 |           |
| patroni-s12     | 172.16.0.3    | Sync Standby | running | 103 |         0 |
| patroni-s21     | 172.16.1.2    | Replica      | running | 103 |         0 |
| patroni-s22     | 172.16.1.3    | Replica      | running | 103 |         0 |
| patroni-s31     | 172.16.2.2    | Replica      | running | 103 |         0 |
+-----------------+---------------+--------------+---------+-----+-----------+
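
One way to cross-check the "Lag in MB" column is to compare WAL positions directly. The sketch below is only an illustration: it assumes the two LSN strings were collected by hand, for example with `SELECT pg_current_wal_lsn()` on the leader and `SELECT pg_last_wal_replay_lsn()` on a replica, and the sample values are hypothetical, not taken from this cluster.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/5000060' to an absolute byte position."""
    high, low = lsn.split("/")
    # An LSN is printed as two hexadecimal numbers: the high and low 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Number of WAL bytes the replica is behind the primary."""
    return lsn_to_int(primary_lsn) - lsn_to_int(replica_lsn)


# Hypothetical sample values (assumption, not output from the cluster above).
primary_lsn = "0/5000060"
replica_lsn = "0/3000060"
print(lag_bytes(primary_lsn, replica_lsn))  # 33554432 bytes, i.e. 32 MB
```

A non-zero and growing result for a 2nd DC replica, while `patronictl list` still shows 0, would suggest the displayed lag is stale.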

The etcd servers are still up, and the etcd leader also remains in the 1st DC.

[user@etcd-s11 ~]$ sudo etcdctl cluster-health
failed to check the health of member a85c06b926e6c6c8 on 172.16.1.4:2379: Get 172.16.1.4:2379/health: read tcp 10.220.0.3:38836->172.16.1.4:2379: read: connection reset by peer
member 261f8081db14d568 is healthy: got healthy result from 172.16.0.4:2379
member a85c06b926e6c6c8 is unreachable: [172.16.1.4:2379] are all unreachable
member b87bd1df518cc9e4 is healthy: got healthy result from 172.16.2.4:2379
cluster is degraded

[user@etcd-s11 ~]$ sudo etcdctl member list
261f8081db14d568: name=etcd-s11 peerURLs=172.16.0.4:2380 clientURLs=172.16.0.4:2379 isLeader=true
a85c06b926e6c6c8: name=etcd-s21 peerURLs=172.16.1.4:2380 clientURLs=172.16.1.4:2379 isLeader=false
b87bd1df518cc9e4: name=etcd-s31 peerURLs=172.16.2.4:2380 clientURLs=172.16.2.4:2379 isLeader=false

However, etcd in the 3rd data center reports that the cluster is healthy.

[user@etcd-s31 ~]$ sudo etcdctl cluster-health
member 261f8081db14d568 is healthy: got healthy result from http://172.16.0.4:2379
member a85c06b926e6c6c8 is healthy: got healthy result from http://172.16.1.4:2379
member b87bd1df518cc9e4 is healthy: got healthy result from http://172.16.2.4:2379
cluster is healthy

I expected the leaders to move to the servers in the 3rd DC.

Can Patroni/etcd change the leader in this case?

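Answer 1

First of all, the quorum is 5/2 rounded up, i.e. 3 servers. That is satisfied as long as site 1 + site 3 are up, so the behavior you saw is the expected behavior.

It would be different if site 1 + site 3 did not meet the quorum.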
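
The arithmetic above can be sketched as follows. This is a minimal illustration, not Patroni or etcd code: the member counts per data center come from the table in the question, while the names `majority` and `has_quorum` are hypothetical helpers. The answer counts the five Patroni nodes; the same rule applied to the three etcd members gives a majority of 2.

```python
def majority(n: int) -> int:
    """Smallest number of members that forms a strict majority of n."""
    return n // 2 + 1


def has_quorum(members: dict, reachable_dcs: set) -> bool:
    """True if the members in the reachable data centers form a majority."""
    total = sum(members.values())
    visible = sum(members[dc] for dc in reachable_dcs)
    return visible >= majority(total)


# Members per data center, taken from the table in the question.
patroni_nodes = {"dc1": 2, "dc2": 2, "dc3": 1}
etcd_members = {"dc1": 1, "dc2": 1, "dc3": 1}

print(majority(5))  # 3, the "5/2 rounded up" from the answer

# With the DC1 <-> DC2 link cut, DC1 still sees DC3.
print(has_quorum(patroni_nodes, {"dc1", "dc3"}))  # True
print(has_quorum(etcd_members, {"dc1", "dc3"}))   # True

# A data center that could see only itself would not have a majority.
print(has_quorum(etcd_members, {"dc2"}))          # False
```

Because site 1 + site 3 still form a majority in both counts, nothing forces the existing leaders in the 1st DC to step down.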
