xfs: can't read superblock

I am getting the following error:

[root@mediaserv ~]# mount /dev/mapper/media1 /media
mount: /media: can't read superblock on /dev/mapper/media1.

This is Fedora 33. I have a RAID5 of eight 8 TB WD Red drives running on an Adaptec 7805Q RAID controller; this is /dev/sdc. I have one GPT partition, /dev/sdc1, which is encrypted with LUKSv2 and holds an XFS filesystem.

[root@mediaserv ~]# lsblk /dev/sdc
NAME       MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sdc          8:32   1 50.9T  0 disk
└─sdc1       8:33   1 50.9T  0 part
  └─media1 253:0    0 50.9T  0 crypt
[root@mediaserv ~]#

The RAID ended up in degraded mode. Most likely I bumped a cable on the first drive while installing a new fan. In any case, after booting, it ran in degraded mode for several hours before I noticed. I shut it down, booted into single-user mode from a rescue image, and let it run to rebuild the array. That took about 14 hours.

When booting it again, I am prompted for the partition's LUKS passphrase, but then it just sits there. I let it run for about 8 hours, not sure whether something was being fixed in the background.

I booted from rescue again. I commented out the filesystem in /etc/crypttab and /etc/fstab, and I am now able to log in to the system without the /media filesystem mounted.

I was able to run cryptsetup luksOpen /dev/sdc1 media1 successfully; the partition appears to decrypt without error.

When I run the mount command (above), I get the following in /var/log/messages:

Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#340 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#340 Sense Key : Hardware Error [current]
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#340 Add. Sense: Internal target failure
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#340 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan  5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#341 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#341 Sense Key : Hardware Error [current]
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#341 Add. Sense: Internal target failure
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#341 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan  5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan  5 10:23:00 mediaserv kernel: Buffer I/O error on dev dm-0, logical block 0, async page read
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#342 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#342 Sense Key : Hardware Error [current]
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#342 Add. Sense: Internal target failure
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#342 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan  5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan  5 10:23:00 mediaserv kernel: EXT4-fs (dm-0): unable to read superblock
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#343 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#343 Sense Key : Hardware Error [current]
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#343 Add. Sense: Internal target failure
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#343 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan  5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan  5 10:23:00 mediaserv kernel: EXT4-fs (dm-0): unable to read superblock
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#344 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#344 Sense Key : Hardware Error [current]
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#344 Add. Sense: Internal target failure
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#344 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan  5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan  5 10:23:00 mediaserv kernel: EXT4-fs (dm-0): unable to read superblock
Jan  5 10:23:00 mediaserv kernel: ISOFS: unsupported/invalid hardware sector size 4096
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#345 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#345 Sense Key : Hardware Error [current]
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#345 Add. Sense: Internal target failure
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#345 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan  5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan  5 10:23:00 mediaserv kernel: FAT-fs (dm-0): unable to read boot sector
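
A log this repetitive is easier to read once it is grouped by failing sector. The following is a minimal sketch, not a diagnostic tool: it embeds only a short excerpt of the lines above, and the function name `failing_sectors` is made up for illustration. The EXT4-fs, ISOFS and FAT-fs lines are what one would expect when mount is run without a filesystem type and probes several filesystems in turn.

```python
import re
from collections import Counter

# Excerpt copied from the kernel log above (not the full log).
LOG = """\
Jan  5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jan  5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan  5 10:23:00 mediaserv kernel: EXT4-fs (dm-0): unable to read superblock
"""

# Matches the device, the 512-byte sector number and the operation name.
PATTERN = re.compile(
    r"blk_update_request: .*?, dev (\w+), sector (\d+) op \S+:\((\w+)\)"
)


def failing_sectors(text):
    """Count block-layer errors per (device, sector, operation)."""
    counts = Counter()
    for match in PATTERN.finditer(text):
        dev, sector, op = match.groups()
        counts[(dev, int(sector), op)] += 1
    return counts


summary = failing_sectors(LOG)
for (dev, sector, op), count in summary.items():
    print(f"{dev}: {op} of sector {sector} failed {count} time(s)")
```

Fed the whole log instead of the excerpt, the same function would show whether every failed read targets the same sector or whether the errors are spread across the device.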

I tried running xfs_repair, but I have not tried the -L option yet.

[root@mediaserv ~]# xfs_repair /dev/mapper/media1
Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1

fatal error -- Remote I/O error

I'm not sure where to go next; I'm worried that I could run the wrong command and cause more damage. Any help would certainly be appreciated.

Thanks!

-Mike

EDIT:

After further investigation, I don't think this is a superblock problem; I think that error occurred because I did not specify the filesystem type in the mount command. Running it again more properly, I get:

[root@mediaserv ~]# mount -t xfs /dev/mapper/media1 /media
mount: /media: mount(2) system call failed: Remote I/O error.

Which puts the following in my /var/log/messages:

Jan  5 12:15:43 mediaserv kernel: sd 12:0:0:0: [sdc] tag#838 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan  5 12:15:43 mediaserv kernel: sd 12:0:0:0: [sdc] tag#838 Sense Key : Hardware Error [current]
Jan  5 12:15:43 mediaserv kernel: sd 12:0:0:0: [sdc] tag#838 Add. Sense: Internal target failure
Jan  5 12:15:43 mediaserv kernel: sd 12:0:0:0: [sdc] tag#838 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan  5 12:15:43 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
Jan  5 12:15:43 mediaserv kernel: XFS (dm-0): SB validate failed with error -121.

I'm not sure how to interpret this. Bad data starting at sector 34816?

EDIT #2:

Regarding the health of the RAID array: as I mentioned, it went into degraded mode with the lost drive. I took it out of service and put it in single-user mode while the RAID rebuilt. The following is the output of the Adaptec tool after the rebuild (I trimmed it to be less verbose):

arcconf getconfig 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
   Controller Status                        : Optimal
   Controller Mode                          : RAID (Expose RAW)
   Controller Model                         : Adaptec ASR7805Q
   Performance Mode                         : Big Block Bypass
   --------------------------------------------------------
   RAID Properties
   --------------------------------------------------------
   Logical devices/Failed/Degraded          : 1/0/0
   Copyback                                 : Disabled
   Automatic Failover                       : Enabled
   Background consistency check             : Disabled
   Background consistency check period      : 0
----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical Device number 0
   Logical Device name                      : media
   Block Size of member drives              : 4K Bytes
   RAID level                               : 5
   Status of Logical Device                 : Optimal
   Size                                     : 53387257 MB
   Parity space                             : 7626751 MB
   Stripe-unit size                         : 1024 KB
   Interface Type                           : Serial ATA
   Device Type                              : HDD
   Read-cache setting                       : Enabled
   Read-cache status                        : On
   Write-cache setting                      : On when protected by battery/ZMM
   Write-cache status                       : On
   maxCache read cache setting              : Enabled
   maxCache read cache status               : Off
   maxCache write cache setting             : Disabled
   maxCache write cache status              : Off
   Partitioned                              : Yes
   Protected by Hot-Spare                   : No
   Bootable                                 : Yes
   Failed stripes                           : Yes
   Power settings                           : Disabled
----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
      Device #0
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
      Device #1
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
      Device #2
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
      Device #3
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
      Device #4
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
      Device #5
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
      Device #6
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
      Device #7
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
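
The figures in that output can be cross-checked with a little arithmetic. This is a rough sketch that hard-codes a few lines from the listing above; the helper name `parse_fields` is made up, and it assumes that arcconf's "MB" are binary megabytes, which is what makes the result line up with the 50.9T shown by lsblk.

```python
# A few fields copied from the arcconf output above.
ARCCONF_EXCERPT = """\
   RAID level                               : 5
   Status of Logical Device                 : Optimal
   Size                                     : 53387257 MB
   Parity space                             : 7626751 MB
   Failed stripes                           : Yes
"""

DRIVES = 8  # from the description of the array


def parse_fields(text):
    """Turn 'Name : value' lines into a dict, ignoring everything else."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


fields = parse_fields(ARCCONF_EXCERPT)
size_mb = int(fields["Size"].split()[0])
parity_mb = int(fields["Parity space"].split()[0])

# RAID5 over N drives leaves N-1 drives' worth of usable space.
print("size == (drives - 1) * parity:", size_mb == (DRIVES - 1) * parity_mb)
print("size in TiB (assuming binary MB):", round(size_mb / 1024**2, 1))
print("failed stripes reported:", fields["Failed stripes"])
```

The geometry is consistent with an 8-drive RAID5. The one field in the excerpt that does not read as healthy is "Failed stripes : Yes"; whether that is connected to the read errors is not something this check can establish.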

This is the SMART status of each of the drives in the array:

[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,0" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,1" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,2" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,3" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,4" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,5" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,6" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,7" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
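
The eight commands above differ only in the last number of the `-d` argument, so they can be generated in a loop. A minimal sketch, assuming smartctl is on the PATH and the script runs as root; the function names are made up for illustration, and nothing touches the hardware until `report()` is called.

```python
import subprocess


def smartctl_command(slot, device="/dev/sdc"):
    """Build the same smartctl invocation used above for one drive slot."""
    return ["smartctl", "-a", "-d", f"aacraid,0,0,{slot}", device]


def health_line(slot):
    """Run smartctl for one slot and return its overall-health line."""
    # smartctl uses its exit status as a bit mask, so a non-zero status
    # is not treated as a failure of the script here.
    result = subprocess.run(
        smartctl_command(slot), capture_output=True, text=True, check=False
    )
    for line in result.stdout.splitlines():
        if "health" in line:
            return line.strip()
    return "no health line in smartctl output"


def report(slots=range(8)):
    """Print the health line for every drive behind the controller."""
    for slot in slots:
        print(f"slot {slot}: {health_line(slot)}")
```

Calling `report()` should reproduce the eight lines above. The overall-health verdict is a coarse pass/fail, so the full `-a` output for each slot (reallocated and pending sector counts, for example) may be worth reading as well.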

HOWEVER, just a few hours ago, while looking through the logs, I found the following:

Jan  4 08:25:25 mediaserv kernel: sd 12:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=9s
Jan  4 08:25:25 mediaserv kernel: sd 12:0:0:0: [sdc] tag#0 Sense Key : Hardware Error [current]
Jan  4 08:25:25 mediaserv kernel: sd 12:0:0:0: [sdc] tag#0 Add. Sense: Internal target failure
Jan  4 08:25:25 mediaserv kernel: sd 12:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 01 60 2f 5c bf 00 00 00 20 00 00
Jan  4 08:25:25 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 47269471736 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 0

Five of the above in a row, which still continue to appear in the logs, and the following at the same time the machine lost the filesystem:

Jan  4 08:26:32 mediaserv kernel: aacraid: Host adapter abort request.#012aacraid: Outstanding commands on (12,0,0,0):
Jan  4 08:26:32 mediaserv kernel: aacraid: Host adapter abort request.#012aacraid: Outstanding commands on (12,0,0,0):
Jan  4 08:26:32 mediaserv kernel: aacraid: Host adapter abort request.#012aacraid: Outstanding commands on (12,0,0,0):
Jan  4 08:26:55 mediaserv kernel: aacraid: Host adapter abort request.#012aacraid: Outstanding commands on (12,0,0,0):
Jan  4 08:26:55 mediaserv kernel: aacraid: Host bus reset request. SCSI hang ?
Jan  4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: outstanding cmd: midlevel-0
Jan  4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: outstanding cmd: lowlevel-0
Jan  4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: outstanding cmd: error handler-0
Jan  4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: outstanding cmd: firmware-56
Jan  4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: outstanding cmd: kernel-0
Jan  4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: Controller reset type is 3
Jan  4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: Issuing IOP reset
Jan  4 08:27:30 mediaserv kernel: aacraid 0000:02:00.0: IOP reset succeeded
Jan  4 08:27:30 mediaserv kernel: aacraid: Comm Interface type2 enabled
Jan  4 08:27:56 mediaserv kernel: aacraid 0000:02:00.0: Scheduling bus rescan

The interesting thing to note is that the array went into degraded mode and, 10 hours and 15 minutes later, the above happened. So the array problem and the XFS filesystem problem occurred hours apart. And although the array and the drives report healthy now, I am still receiving the "FAILED Result" block above.
