I'm seeing the error below in dmesg for a RAID array. How can I find out which drive in the RAID is bad?
[Fri Aug 26 19:31:13 2022] EXT4-fs warning (device md0): ext4_end_bio:349: I/O error 10 writing to inode 100728932 starting block 1514702321)
[Fri Aug 26 19:31:13 2022] buffer_io_error: 80 callbacks suppressed
[Fri Aug 26 19:31:13 2022] Buffer I/O error on device md0, logical block 1514702124
[Fri Aug 26 19:31:13 2022] Buffer I/O error on device md0, logical block 1514702125
I also ran a health check with smartctl on all the disks that are part of the RAID, and they all came back OK, so I was thinking I could reseat the disks once and that should work. Still, I'd like a long-term way to identify the underlying faulty disk, or any command that could help me verify what is triggering the problem.
Answer 1
I wrote a script to check my RAID after having problems with it. It turned out I needed to lock the SATA cables in place, because they were working loose and making my drives appear to have errors.
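Cabling trouble like this often shows up in the SMART attribute table even when the overall health verdict is PASSED. Here is a minimal sketch, assuming the drive reports attribute 199 (UDMA_CRC_Error_Count) and that `smartctl -A` prints the raw value in the tenth column. The function name `crc_error_count` is made up for illustration, and `/dev/sda` in the usage comment is a placeholder.

```shell
#!/bin/bash
# Print the raw value of SMART attribute 199 (UDMA_CRC_Error_Count)
# from `smartctl -A` output supplied on stdin.
# Prints nothing if the drive does not report that attribute.
crc_error_count() {
    awk '$1 == 199 { print $10 }'
}

# Usage sketch (needs root and smartmontools); /dev/sda is a placeholder:
#   smartctl -A /dev/sda | crc_error_count
```

A count that keeps climbing between runs would point at the link (cable, connector, port) rather than the platters, but treat that as a hint, not a diagnosis.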
Anyway, here is the script I run, with an example of it in action below:
The script:
#!/bin/bash
# Check for root
if [ "$EUID" -ne 0 ]; then
    echo "Please run $0 as root"
    echo ""
    echo "example:"
    echo "sudo $0"
    exit 1
fi
# Check for smartmontools (a missing command exits with 127, not 1,
# so test for the binary instead of inspecting the exit status)
if ! command -v smartctl > /dev/null; then
    echo "smartmontools is not installed. Please install it with the following command:"
    echo ""
    echo "sudo apt install smartmontools"
    exit 1
fi
# Print details for every active md array (already running as root, so no sudo)
awk '/: active/ {print $1}' /proc/mdstat | while read -r drv
do
    mdadm -D "/dev/$drv"
done
echo ""
# Create drive array
drives=( $(smartctl --scan | awk '{print $1}') )
# Loop through array and check each drive
for ((i=0; i < ${#drives[@]}; ++i))
do
    model=$(smartctl -a "${drives[$i]}" | grep -i "device model:" | awk '{print substr($0,index($0,$3))}')
    serial=$(smartctl -a "${drives[$i]}" | grep -i "serial number:" | awk '{print $NF}')
    result=$(smartctl -H "${drives[$i]}" | awk '/overall-health/ {print $NF}')
    echo -n "${drives[$i]} Model: $model Serial: $serial SMART: $result"
    # Strip the /dev/ prefix, then count kernel log error lines naming this disk
    j=$(echo "${drives[$i]}" | cut -d/ -f3)
    echo -n " Errors: "
    grep -i error /var/log/kern.log 2>/dev/null | grep "$j," | wc -l
done
Example:
terrance@Intrepid:~$ sudo ./drive_check.bsh
/dev/md0:
Version : 1.2
Creation Time : Wed Dec 27 18:06:03 2017
Raid Level : raid1
Array Size : 484323328 (461.89 GiB 495.95 GB)
Used Dev Size : 484323328 (461.89 GiB 495.95 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Thu Sep 1 08:12:59 2022
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : bitmap
Name : Intrepid:root (local to host Intrepid)
UUID : f9b257fc:d64f97c7:95581e88:004e3a4b
Events : 71486
Number Major Minor RaidDevice State
2 8 161 0 active sync /dev/sdk1
1 8 1 1 active sync /dev/sda1
/dev/md2:
Version : 1.2
Creation Time : Wed Dec 27 18:18:25 2017
Raid Level : raid1
Array Size : 3927040 (3.75 GiB 4.02 GB)
Used Dev Size : 3927040 (3.75 GiB 4.02 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Sun Aug 7 10:59:07 2022
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : resync
Name : Intrepid:swap (local to host Intrepid)
UUID : 2cdfcb03:e5e0c30f:d68d4e20:37b50e41
Events : 191
Number Major Minor RaidDevice State
2 8 165 0 active sync /dev/sdk5
1 8 5 1 active sync /dev/sda5
/dev/md1:
Version : 1.2
Creation Time : Tue Feb 3 01:16:55 2015
Raid Level : raid5
Array Size : 15627542528 (14.55 TiB 16.00 TB)
Used Dev Size : 3906885632 (3.64 TiB 4.00 TB)
Raid Devices : 5
Total Devices : 5
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Thu Sep 1 07:39:42 2022
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Consistency Policy : bitmap
Name : Intrepid:1 (local to host Intrepid)
UUID : 3bb988cb:d5270497:36e75f46:67a9bc65
Events : 1155019
Number Major Minor RaidDevice State
0 8 81 0 active sync /dev/sdf1
1 8 97 1 active sync /dev/sdg1
2 8 113 2 active sync /dev/sdh1
3 8 129 3 active sync /dev/sdi1
5 8 145 4 active sync /dev/sdj1
/dev/sda Model: MAXTOR STM3500630A Serial: 9QG9152W SMART: PASSED Errors: 0
/dev/sdf Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4EJPD3EXP SMART: PASSED Errors: 0
/dev/sdg Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4E5UZUKPY SMART: PASSED Errors: 0
/dev/sdh Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4E3XCP660 SMART: PASSED Errors: 0
/dev/sdi Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4E7ZRRN8U SMART: PASSED Errors: 0
/dev/sdj Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4EJXKY26C SMART: PASSED Errors: 0
/dev/sdk Model: ST3500418AS Serial: 6VM1HTNN SMART: PASSED Errors: 0
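In the output above every member is "active sync" and Failed Devices is 0. When md does kick a member out of an array, /proc/mdstat marks that member with an (F) flag. The sketch below lists such members per array, assuming the usual mdstat layout where each array line starts with the md name followed by " : " and members look like sda1[1]. The function name `failed_members` is hypothetical.

```shell
#!/bin/bash
# Read /proc/mdstat-style text on stdin and print "<array> <member>"
# for every member device flagged as faulty with (F).
failed_members() {
    awk '/^md[^ ]* : / {
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\[[0-9]+\].*\(F\)/) {
                member = $i
                sub(/\[.*/, "", member)   # strip the "[n](F)" suffix
                print $1, member
            }
        }
    }'
}

# Usage sketch:
#   failed_members < /proc/mdstat
```

An empty result only means md has not marked anything faulty yet. I/O errors like the ones in the question can still appear before a member is failed out, so it is worth checking alongside the per-disk SMART and kernel log counts.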