How do I identify a bad disk in a RAID setup on Linux?

I am seeing the errors below in dmesg for a RAID array. How can I find out which drive in the RAID is bad?

[Fri Aug 26 19:31:13 2022] EXT4-fs warning (device md0): ext4_end_bio:349: I/O error 10 writing to inode 100728932 starting block 1514702321)
[Fri Aug 26 19:31:13 2022] buffer_io_error: 80 callbacks suppressed
[Fri Aug 26 19:31:13 2022] Buffer I/O error on device md0, logical block 1514702124
[Fri Aug 26 19:31:13 2022] Buffer I/O error on device md0, logical block 1514702125

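I also ran a health check with smartctl on every disk that is part of the RAID, and they all came back OK. So I assumed that reseating the disks once would get things working, but I would like a long-term way to identify the underlying bad disk, or commands that could help me work out what is causing the problem.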

Answer 1

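I have had problems with my own RAID, so I wrote a script that checks it. It turned out that I needed locking SATA cables: the old ones were working loose, which made it look as if the drives were failing.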
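A cabling problem like this tends to show up as interface CRC errors rather than as a failed SMART health check, which would be consistent with smartctl reporting every disk as OK. On many ATA drives that counter is SMART attribute 199 (usually named UDMA_CRC_Error_Count, though naming varies by vendor). The following is a minimal sketch, not part of the original script; the function names `crc_errors` and `scan_all` are made up for illustration, and it assumes the standard ten-column `smartctl -A` attribute table.

```shell
#!/bin/bash
# Sketch only: crc_errors and scan_all are illustrative names.

# Print the raw value of SMART attribute 199 (UDMA_CRC_Error_Count)
# from `smartctl -A` output on stdin, or "n/a" if no such row is present.
crc_errors() {
    awk '$1 == 199 { print $10; found = 1 }
         END { if (!found) print "n/a" }'
}

# Report the CRC counter for every drive smartctl can see (needs root).
scan_all() {
    local drive
    for drive in $(smartctl --scan | awk '{print $1}'); do
        echo "$drive CRC errors: $(smartctl -A "$drive" | crc_errors)"
    done
}

# Only touch real hardware when explicitly asked to.
if [ "${1:-}" = "--scan" ]; then
    scan_all
fi
```

Run it as root with `--scan`. A raw value that keeps climbing between runs points at the cable or connector rather than the disk itself.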

Anyway, here is the script I run, along with an example of it in action.

Script:

#!/bin/bash

# Check for root
if [ "$EUID" -ne 0 ]; then
  echo "Please run $0 as root"
  echo ""
  echo "example:"
  echo "sudo $0"
  exit 1
fi

# Check for smartmontools (a missing command exits with 127, so test for the binary itself)
if ! command -v smartctl > /dev/null 2>&1; then
    echo "smartmontools is not installed.  Please install it with the following command:"
    echo ""
    echo "sudo apt install smartmontools"
    exit 1
fi

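# Show details for every active md array listed in /proc/mdstat
# (no sudo needed here: the script already requires root)
awk '/: active/ {print $1}' /proc/mdstat | while read -r drv
do
    mdadm -D "/dev/$drv"
done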

echo ""

# Build the list of drives that smartctl can see
mapfile -t drives < <(smartctl --scan | awk '{print $1}')

# Loop through the list and report model, serial, SMART health and kernel log errors
for drive in "${drives[@]}"
do
    info=$(smartctl -i "$drive")    # query the identity section once per drive
    model=$(grep -i "device model:" <<< "$info" | awk '{print substr($0,index($0,$3))}')
    serial=$(grep -i "serial number:" <<< "$info" | awk '{print $NF}')
    result=$(smartctl -H "$drive" | awk '/overall-health/ {print $NF}')
    echo -n "$drive Model: $model Serial: $serial SMART: $result"
    # Short device name (e.g. sda) used to match lines in the kernel log
    j=$(echo "$drive" | cut -d/ -f3); echo -n " Errors: "
    grep -i error /var/log/kern.log 2>/dev/null | grep -c "$j,"
done

Example:

terrance@Intrepid:~$ sudo ./drive_check.bsh 
/dev/md0:
           Version : 1.2
     Creation Time : Wed Dec 27 18:06:03 2017
        Raid Level : raid1
        Array Size : 484323328 (461.89 GiB 495.95 GB)
     Used Dev Size : 484323328 (461.89 GiB 495.95 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu Sep  1 08:12:59 2022
             State : clean 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : Intrepid:root  (local to host Intrepid)
              UUID : f9b257fc:d64f97c7:95581e88:004e3a4b
            Events : 71486

    Number   Major   Minor   RaidDevice State
       2       8      161        0      active sync   /dev/sdk1
       1       8        1        1      active sync   /dev/sda1
/dev/md2:
           Version : 1.2
     Creation Time : Wed Dec 27 18:18:25 2017
        Raid Level : raid1
        Array Size : 3927040 (3.75 GiB 4.02 GB)
     Used Dev Size : 3927040 (3.75 GiB 4.02 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Sun Aug  7 10:59:07 2022
             State : clean 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : resync

              Name : Intrepid:swap  (local to host Intrepid)
              UUID : 2cdfcb03:e5e0c30f:d68d4e20:37b50e41
            Events : 191

    Number   Major   Minor   RaidDevice State
       2       8      165        0      active sync   /dev/sdk5
       1       8        5        1      active sync   /dev/sda5
/dev/md1:
           Version : 1.2
     Creation Time : Tue Feb  3 01:16:55 2015
        Raid Level : raid5
        Array Size : 15627542528 (14.55 TiB 16.00 TB)
     Used Dev Size : 3906885632 (3.64 TiB 4.00 TB)
      Raid Devices : 5
     Total Devices : 5
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu Sep  1 07:39:42 2022
             State : clean 
    Active Devices : 5
   Working Devices : 5
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : Intrepid:1  (local to host Intrepid)
              UUID : 3bb988cb:d5270497:36e75f46:67a9bc65
            Events : 1155019

    Number   Major   Minor   RaidDevice State
       0       8       81        0      active sync   /dev/sdf1
       1       8       97        1      active sync   /dev/sdg1
       2       8      113        2      active sync   /dev/sdh1
       3       8      129        3      active sync   /dev/sdi1
       5       8      145        4      active sync   /dev/sdj1

/dev/sda Model: MAXTOR STM3500630A Serial: 9QG9152W SMART: PASSED Errors: 0
/dev/sdf Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4EJPD3EXP SMART: PASSED Errors: 0
/dev/sdg Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4E5UZUKPY SMART: PASSED Errors: 0
/dev/sdh Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4E3XCP660 SMART: PASSED Errors: 0
/dev/sdi Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4E7ZRRN8U SMART: PASSED Errors: 0
/dev/sdj Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4EJXKY26C SMART: PASSED Errors: 0
/dev/sdk Model: ST3500418AS Serial: 6VM1HTNN SMART: PASSED Errors: 0
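In the run above every array is clean and reports `Failed Devices : 0`. When md has actually kicked a member out, /proc/mdstat marks that device with `(F)` and shows an underscore in the status brackets (for example `[U_]` instead of `[UU]`). As a rough sketch under those assumptions, the two helpers below pull that information out of mdstat-format text; `failed_members` and `degraded_arrays` are hypothetical names, not mdadm commands.

```shell
#!/bin/bash
# Sketch only: both helpers read the file given as $1 (default /proc/mdstat).

# Print "mdX: /dev/NAME" for every member device flagged as faulty "(F)".
failed_members() {
    awk '/^md[^ ]* :/ {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(F\)/) {
                dev = $i
                sub(/\[.*/, "", dev)    # strip the "[2](F)" suffix
                print $1 ": /dev/" dev
            }
    }' "${1:-/proc/mdstat}"
}

# Print the name of every array whose status brackets contain "_",
# meaning at least one slot has no working device.
degraded_arrays() {
    awk '/^md[^ ]* :/        { name = $1 }
         /\[[U_]*_[U_]*\]/   { print name }' "${1:-/proc/mdstat}"
}
```

Calling `failed_members` and `degraded_arrays` with no argument reads the live /proc/mdstat. The device names they print can then be matched against the serial numbers in the per-drive listing to find the physical disk.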
