
We have 3 servers in our Gluster pool. Each machine has the following specs (OVH Advanced STOR-2 Gen 2):
- AMD Ryzen 7 Pro 3700 - 8c/16t - 3.6 GHz/4.4 GHz
- 6 x 14 TB disks (WD DC HC530, CMR; see the Western Digital datasheet)
- 2 additional drives for the system
- 64 GB ECC 2933 MHz
The details below are from one machine, but the others should be similar, if not identical.
System:
- Ubuntu 22.04.1 LTS
- 5.15.0-69-generic
zfs version:
- zfs-2.1.5-1ubuntu6~22.04.1
- zfs-kmod-2.1.5-1ubuntu6~22.04.1
Controller:
2b:00.0 Mass storage controller [0180]: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
    Subsystem: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:1000]
    Kernel driver in use: mpt3sas
    Kernel modules: mpt3sas
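For reference, the controller details above can be re-queried with lspci (assuming it keeps the same PCI address):
#> lspci -nnk -s 2b:00.0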
I don't know how the controller is configured (HBA or not), but:
- no hardware RAID is available on this class of machine
- I can access each drive's SMART data (so if it is not an HBA, it is at least JBOD; a quick check is sketched below).
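A quick way to sanity-check the passthrough is that SMART answers per disk; a minimal sketch, assuming the six data disks are sda through sdf:
#> # per-disk SMART health only answers when drives are passed through individually
#> for d in /dev/sd{a..f}; do echo "== $d =="; smartctl -H "$d"; done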
One of our drives failed and was replaced. That is when we noticed a problem:
#> zpool status; echo; date
  pool: storage
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Apr 18 20:44:53 2023
        17.4T scanned at 27.9M/s, 17.4T issued at 27.9M/s, 17.7T total
        3.31T resilvered, 98.12% done, 03:29:17 to go
config:

        NAME                                      STATE     READ WRITE CKSUM
        storage                                   DEGRADED     0     0     0
          raidz2-0                                DEGRADED     0     0     0
            wwn-0x5000cca2ad235164                ONLINE       0     0     0
            wwn-0x5000cca28f4ec59c                ONLINE       0     0     0
            wwn-0x5000cca2ad29cc1c                ONLINE       0     0     0
            wwn-0x5000cca2a31743d4                ONLINE       0     0     0
            wwn-0x5000cca2a40f9b00                ONLINE       0     0     0
            replacing-5                           DEGRADED     0     0     0
              9949261471066455025                 UNAVAIL      0     0     0  was /dev/disk/by-id/wwn-0x5000cca2ad2eba3c-part1
              scsi-SWDC_WUH721414AL5201_9LKLGWSG  ONLINE       0     0     0  (resilvering)

errors: No known data errors

Wed Apr 26 10:44:02 UTC 2023
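For completeness, the resilver throttling knobs can be inspected like this (a sketch, assuming the OpenZFS 2.1 module parameter names on Linux; they live under /sys/module/zfs/parameters and can be changed at runtime):
#> grep . /sys/module/zfs/parameters/zfs_resilver_min_time_ms \
        /sys/module/zfs/parameters/zfs_scan_legacy \
        /sys/module/zfs/parameters/zfs_vdev_scrub_max_active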
I forgot to show you the pool usage:
zpool list
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
storage  76.4T  18.3T  58.1T        -         -    15%    23%  1.00x  ONLINE  -
Fri Apr 28 14:32:58 UTC 2023
Even though Gluster is running alongside, I don't think it accounts for much throughput.
Then there is the very high latency. In dmesg I see output like this:
[Tue Apr 25 10:30:31 2023] INFO: task txg_sync:1985 blocked for more than 120 seconds.
[Tue Apr 25 10:30:31 2023] Tainted: P O 5.15.0-69-generic #76-Ubuntu
[Tue Apr 25 10:30:31 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Apr 25 10:30:31 2023] task:txg_sync state:D stack: 0 pid: 1985 ppid: 2 flags:0x00004000
[Tue Apr 25 10:30:31 2023] Call Trace:
[Tue Apr 25 10:30:31 2023] <TASK>
[Tue Apr 25 10:30:31 2023] __schedule+0x24e/0x590
[Tue Apr 25 10:30:31 2023] schedule+0x69/0x110
[Tue Apr 25 10:30:31 2023] cv_wait_common+0xf8/0x130 [spl]
[Tue Apr 25 10:30:31 2023] ? wait_woken+0x70/0x70
[Tue Apr 25 10:30:31 2023] __cv_wait+0x15/0x20 [spl]
[Tue Apr 25 10:30:31 2023] arc_read+0x1e1/0x15c0 [zfs]
[Tue Apr 25 10:30:31 2023] ? arc_evict_cb_check+0x20/0x20 [zfs]
[Tue Apr 25 10:30:31 2023] dsl_scan_visitbp+0x4f5/0xcf0 [zfs]
[Tue Apr 25 10:30:31 2023] dsl_scan_visitbp+0x333/0xcf0 [zfs]
[Tue Apr 25 10:30:31 2023] dsl_scan_visitbp+0x333/0xcf0 [zfs]
[Tue Apr 25 10:30:31 2023] dsl_scan_visitbp+0x333/0xcf0 [zfs]
[Tue Apr 25 10:30:31 2023] dsl_scan_visitbp+0x333/0xcf0 [zfs]
[Tue Apr 25 10:30:31 2023] dsl_scan_visitbp+0x333/0xcf0 [zfs]
[Tue Apr 25 10:30:31 2023] dsl_scan_visitbp+0x813/0xcf0 [zfs]
[Tue Apr 25 10:30:31 2023] dsl_scan_visit_rootbp+0xe8/0x160 [zfs]
[Tue Apr 25 10:30:31 2023] dsl_scan_visitds+0x15d/0x4b0 [zfs]
[Tue Apr 25 10:30:31 2023] ? __kmalloc_node+0x166/0x3a0
[Tue Apr 25 10:30:31 2023] ? do_raw_spin_unlock+0x9/0x10 [spl]
[Tue Apr 25 10:30:31 2023] ? __raw_spin_unlock+0x9/0x10 [spl]
[Tue Apr 25 10:30:31 2023] ? __list_add+0x17/0x40 [spl]
[Tue Apr 25 10:30:31 2023] ? do_raw_spin_unlock+0x9/0x10 [spl]
[Tue Apr 25 10:30:31 2023] ? __raw_spin_unlock+0x9/0x10 [spl]
[Tue Apr 25 10:30:31 2023] ? tsd_hash_add+0x145/0x180 [spl]
[Tue Apr 25 10:30:31 2023] ? tsd_set+0x98/0xd0 [spl]
[Tue Apr 25 10:30:31 2023] dsl_scan_visit+0x1ae/0x2c0 [zfs]
[Tue Apr 25 10:30:31 2023] dsl_scan_sync+0x412/0x910 [zfs]
[Tue Apr 25 10:30:31 2023] spa_sync_iterate_to_convergence+0x124/0x1f0 [zfs]
[Tue Apr 25 10:30:31 2023] spa_sync+0x2dc/0x5b0 [zfs]
[Tue Apr 25 10:30:31 2023] txg_sync_thread+0x266/0x2f0 [zfs]
[Tue Apr 25 10:30:31 2023] ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[Tue Apr 25 10:30:31 2023] thread_generic_wrapper+0x64/0x80 [spl]
[Tue Apr 25 10:30:31 2023] ? __thread_exit+0x20/0x20 [spl]
[Tue Apr 25 10:30:31 2023] kthread+0x12a/0x150
[Tue Apr 25 10:30:31 2023] ? set_kthread_struct+0x50/0x50
[Tue Apr 25 10:30:31 2023] ret_from_fork+0x22/0x30
[Tue Apr 25 10:30:31 2023] </TASK>
Not very frequent, but still too many (uptime: 11:10:30 up 7 days, 15:27):
[Thu Apr 20 07:47:49 2023] INFO: task txg_sync:1985 blocked for more than 120 seconds.
[Thu Apr 20 09:08:22 2023] INFO: task txg_sync:1985 blocked for more than 120 seconds.
[Thu Apr 20 09:38:35 2023] INFO: task txg_sync:1985 blocked for more than 120 seconds.
[Thu Apr 20 10:16:51 2023] INFO: task txg_sync:1985 blocked for more than 120 seconds.
[Thu Apr 20 10:26:55 2023] INFO: task txg_sync:1985 blocked for more than 120 seconds.
[Fri Apr 21 07:57:48 2023] INFO: task txg_sync:1985 blocked for more than 120 seconds.
[Fri Apr 21 08:58:13 2023] INFO: task txg_sync:1985 blocked for more than 120 seconds.
[Fri Apr 21 09:32:27 2023] INFO: task txg_sync:1985 blocked for more than 120 seconds.
[Fri Apr 21 10:00:39 2023] INFO: task txg_sync:1985 blocked for more than 120 seconds.
[Tue Apr 25 10:30:31 2023] INFO: task txg_sync:1985 blocked for more than 120 seconds.
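The txg sync times can also be watched directly via the spl kstats; a minimal sketch, assuming the pool name storage (the otime/qtime/wtime/stime columns are in nanoseconds):
#> tail -n 5 /proc/spl/kstat/zfs/storage/txgs
Long entries there should line up with the hung-task warnings above.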
The disks are fairly young, and there is nothing notable here:
smart_WUH721414AL5201_81G8L1SV.log : power on : 259 days, 21:21:00
smart_WUH721414AL5201_9LHD9Z7G.log : power on : 197 days, 19:08:00
smart_WUH721414AL5201_9LKLGWSG.log : power on : 7 days, 17:25:00
smart_WUH721414AL5201_QBGDTMXT.log : power on : 255 days, 21:44:00
smart_WUH721414AL5201_Y6GME43C.log : power on : 346 days, 22:59:00
smart_WUH721414AL5201_Y6GRZLKC.log : power on : 197 days, 12:56:00
iostat is not alarming (even though the output is averaged, iowait never goes much above 10-15%):
avg-cpu: %user %nice %system %iowait %steal %idle
0.13 0.00 1.49 6.65 0.00 91.73
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sda 369.80 6115.05 0.04 0.01 5.25 16.54 10.81 328.36 0.01 0.06 0.77 30.38 0.00 0.00 0.00 0.00 0.00 0.00 0.19 7.87 1.95 80.10
sdb 412.40 6215.40 0.02 0.01 5.43 15.07 10.69 328.41 0.01 0.06 0.78 30.72 0.00 0.00 0.00 0.00 0.00 0.00 0.19 10.06 2.25 88.23
sdc 395.09 6004.60 0.02 0.00 5.49 15.20 10.72 328.56 0.01 0.07 0.78 30.66 0.00 0.00 0.00 0.00 0.00 0.00 0.19 8.66 2.18 85.42
sdd 412.57 6229.91 0.02 0.01 5.57 15.10 10.34 328.57 0.01 0.05 0.84 31.77 0.00 0.00 0.00 0.00 0.00 0.00 0.19 14.77 2.31 90.34
sde 374.34 6150.81 0.03 0.01 5.33 16.43 10.74 328.43 0.01 0.06 0.78 30.58 0.00 0.00 0.00 0.00 0.00 0.00 0.19 8.47 2.01 81.84
sdf 25.72 113.11 0.00 0.00 2.82 4.40 219.12 5713.02 0.09 0.04 1.25 26.07 0.00 0.00 0.00 0.00 0.00 0.00 0.18 49.05 0.36 27.09
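One caveat on the numbers above: without an interval argument, the first iostat report averages everything since boot, which can hide spikes. A sketch for sampling current behaviour instead (the 60-second window and count are arbitrary):
#> iostat -dx 60 5    # 5 reports at 60 s intervals; ignore the first (since-boot) one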
But the zpool latency is another story (all disks show waits of over 1 second):
zpool iostat -w
storage total_wait disk_wait syncq_wait asyncq_wait
latency read write read write read write read write scrub trim
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1ns 0 0 0 0 0 0 0 0 0 0
[...] # all zeros
127ns 0 0 0 0 0 0 0 0 0 0
255ns 0 0 0 0 1.17K 0 12 2.43K 1.61M 0
511ns 0 0 0 0 616K 484K 19.4K 1.39M 121M 0
1us 0 0 0 0 4.65M 895K 205K 10.4M 126M 0
2us 0 0 0 0 11.9M 49.4K 209K 2.74M 9.56M 0
4us 0 0 0 0 7.17M 4.99K 42.5K 168K 2.76M 0
8us 0 0 0 0 53.0K 1.37K 657 163K 3.96M 0
16us 72 0 315 0 11.7K 374 533 172K 2.84M 0
32us 61.3M 4 65.2M 36 768 8 201 269K 4.36M 0
65us 60.8M 96 60.6M 566 236 0 343 426K 6.04M 0
131us 79.4M 271 81.6M 879 80 0 734 824K 7.92M 0
262us 35.0M 443K 175M 464K 116 0 5.64K 2.11M 11.2M 0
524us 36.0M 8.21M 186M 50.5M 44 0 5.09K 5.27M 5.60M 0
1ms 17.5M 9.73M 58.8M 59.5M 89 0 2.63K 5.75M 3.15M 0
2ms 5.30M 13.9M 39.0M 41.3M 114 0 2.31K 9.08M 3.73M 0
4ms 6.29M 15.3M 97.1M 15.8M 176 0 3.48K 11.8M 6.05M 0
8ms 13.5M 12.3M 201M 2.59M 277 0 6.76K 9.97M 9.68M 0
16ms 26.7M 8.84M 198M 779K 334 0 8.13K 8.10M 14.9M 0
33ms 36.1M 9.82M 75.3M 275K 218 0 6.30K 9.17M 23.4M 0
67ms 41.8M 10.8M 12.5M 48.9K 215 0 2.79K 10.5M 37.1M 0
134ms 59.3M 9.52M 1.92M 9.46K 213 0 680 9.33M 57.2M 0
268ms 88.0M 7.43M 543K 893 272 0 121 7.35M 86.7M 0
536ms 132M 7.58M 389K 140 521 0 19 7.55M 131M 0
1s 190M 9.42M 21.4K 59 795 0 8 9.41M 189M 0
2s 205M 12.8M 2.09K 16 1.26K 0 0 12.8M 204M 0
4s 110M 17.9M 565 8 1.36K 0 0 17.9M 109M 0
8s 33.2M 15.0M 0 0 955 0 0 15.0M 33.0M 0
17s 11.3M 2.38M 0 0 269 0 0 2.38M 11.3M 0
34s 3.40M 18.4K 0 0 81 0 0 18.4K 3.40M 0
68s 392K 0 0 0 30 0 0 0 391K 0
137s 31.4K 0 0 0 13 0 0 0 31.4K 0
--------------------------------------------------------------------------------
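The same caveat applies here: with no interval, zpool iostat -w accumulates totals since the pool was imported, so latency from earlier in the resilver never leaves the histogram. Sampling over a window (a sketch, 60-second intervals) shows what it looks like right now:
#> zpool iostat -w storage 60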
The 1 s buckets, per device:
zpool iostat -vw | awk '/^1s/'
1s 191M 9.42M 21.5K 59 795 0 8 9.41M 190M 0
1s 191M 9.42M 21.5K 59 795 0 8 9.41M 190M 0
1s 31.6M 0 3.72K 0 117 0 0 0 31.4M 0
1s 41.1M 0 4.06K 0 198 0 0 0 41.0M 0
1s 41.8M 0 4.31K 0 171 0 0 0 41.7M 0
1s 40.3M 15 4.14K 0 204 0 0 15 40.2M 0
1s 35.8M 2 3.98K 0 105 0 0 1 35.6M 0
1s 1.91K 9.42M 1.29K 59 0 0 8 9.41M 554 0
1s 0 0 0 0 0 0 0 0 0 0
1s 1.91K 9.42M 1.29K 59 0 0 8 9.41M 554 0
Apologies for all this information; I tried to be as concise as I could.
To be clear, I am trying to find the cause of this crawling speed, but everything I can see looks fine:
- the smartctl reports are not worrying (discussed with OVH)
- I was admittedly shocked to see ECC corrections at all, but I doubt that 18 drives are all bad:
smart_WUH721414AL5201_81G8L1SV.log - total errors corrected -> 193
smart_WUH721414AL5201_9LHD9Z7G.log - total errors corrected -> 4
smart_WUH721414AL5201_QBGDTMXT.log - total errors corrected -> 6
smart_WUH721414AL5201_Y6GME43C.log - total errors corrected -> 13
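For reference, the "total errors corrected" figures come from the SCSI error counter log pages; on SAS drives like these, they can be re-read per disk with, for example:
#> smartctl -l error /dev/sda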
Anyway, I am at a loss here. If you know where to look, or if you need more information, just ask!
Thanks for your time.
Edit: added the zpool list output.
- the resilver finally finished: Fri Apr 28 03:33:32 2023