RAID5 needs a drive re-added after every reboot

I have a 3-disk mdadm RAID5 array with a total of 4 TB on an Ubuntu 14.04.3 LTS server.

Since a kernel panic caused by a device unrelated to the array (which has already been replaced), the array comes up degraded as [UU_] after every reboot. The temporary workaround I found is to run mdadm --add /dev/md0 /dev/sdd1, after which the array starts rebuilding and finishes correctly:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdd1[4] sdb1[3] sdc1[1]
      3906763776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>
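
To spot the degraded state right after boot without reading the output by eye, the member status in /proc/mdstat can be scanned for an underscore. This is only a minimal sketch, assuming the two-line layout shown above; `degraded_arrays` is a hypothetical helper name, not something shipped with mdadm.

```shell
# degraded_arrays: read /proc/mdstat-style text on stdin and print the name
# of every md array whose member status (e.g. [UU_]) contains a missing slot.
# Hypothetical helper; assumes the status follows the "mdN :" header line.
degraded_arrays() {
  awk '
    /^md[0-9]+ :/          { name = $1 }     # remember the current array
    /\[[U_]*_[U_]*\]/      { print name }    # status with at least one "_"
  '
}

# Typical use on a live system:
#   degraded_arrays < /proc/mdstat
```

With the [UUU] output above it prints nothing; with a [UU_] status it prints md0.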

But I have to do this on every reboot, and I noticed that the device numbers look wrong: 4, 3 and 1 instead of 2, 1 and 0.

root@Bt-Networks-Server:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Fri Aug  1 00:53:53 2014
     Raid Level : raid5
     Array Size : 3906763776 (3725.78 GiB 4000.53 GB)
  Used Dev Size : 1953381888 (1862.89 GiB 2000.26 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Mon Oct 26 17:40:43 2015
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : Bt-Networks-Server:0  (local to host Bt-Networks-Server)
           UUID : 4e860a9e:0b433a00:54d2c991:78ca3d15
         Events : 83137

    Number   Major   Minor   RaidDevice State
       4       8       49        0      active sync   /dev/sdd1
       1       8       33        1      active sync   /dev/sdc1
       3       8       17        2      active sync   /dev/sdb1

I also found the following in dmesg about a non-fresh disk being kicked from the array:

[    2.430966] md: raid1 personality registered for level 1
[    2.500019] raid6: sse2x1    3900 MB/s
[    2.568110] raid6: sse2x2    4957 MB/s
[    2.582322] md: bind<sdc1>
[    2.583992] md: bind<sdd1>
[    2.608030] usb 6-2: new low-speed USB device number 2 using uhci_hcd
[    2.619248] md: bind<sdb1>
[    2.620098] md: kicking non-fresh sdd1 from array!
[    2.620103] md: unbind<sdd1>
[    2.636013] raid6: sse2x4    6926 MB/s
[    2.636015] raid6: using algorithm sse2x4 (6926 MB/s)
[    2.636017] raid6: using ssse3x2 recovery algorithm
[    2.637624] xor: measuring software checksum speed
[    2.664021] usb 7-1: new low-speed USB device number 2 using uhci_hcd
[    2.676012]    prefetch64-sse: 10026.000 MB/sec
[    2.716011]    generic_sse:  8868.000 MB/sec
[    2.716013] xor: using function: prefetch64-sse (10026.000 MB/sec)
[    2.717321] async_tx: api initialized (async)
[    2.725129] md: raid6 personality registered for level 6
[    2.725131] md: raid5 personality registered for level 5
[    2.725133] md: raid4 personality registered for level 4
[    2.728509] md: export_rdev(sdd1)
[    2.729556] md/raid:md0: device sdb1 operational as raid disk 2
[    2.729559] md/raid:md0: device sdc1 operational as raid disk 1
[    2.729927] md/raid:md0: allocated 0kB
[    2.729976] md/raid:md0: raid level 5 active with 2 out of 3 devices, algorithm 2
[    2.729983] RAID conf printout:
[    2.729984]  --- level:5 rd:3 wd:2
[    2.729986]  disk 1, o:1, dev:sdc1
[    2.729988]  disk 2, o:1, dev:sdb1
[    2.730030] md0: detected capacity change from 0 to 4000526106624
[    2.731863] md: raid10 personality registered for level 10
[    2.755618]  md0: unknown partition table
[    2.812332] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
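
The relevant line is easy to miss among the boot messages. A "kicking non-fresh" message generally means that member's superblock event counter is behind the rest of the array. The sketch below only pulls the kicked device names out of a dmesg dump; `kicked_members` is a hypothetical helper name, and the pattern assumes the exact message wording seen in the log above.

```shell
# kicked_members: read dmesg text on stdin and print each device that md
# kicked as "non-fresh" during assembly (one name per line).
# Hypothetical helper; relies on the message format shown in this log.
kicked_members() {
  sed -n 's/.*md: kicking non-fresh \([^ ]*\) from array!.*/\1/p'
}

# Typical use on a live system:
#   dmesg | kicked_members
```

For each device it reports, the "Events" value printed by mdadm --examine /dev/<device> can then be compared with the same value on the other members.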

I have already double-checked mdadm.conf by regenerating the ARRAY line with:

root@Bt-Networks-Server:~# mdadm --detail --scan
ARRAY /dev/md/0 metadata=1.2 name=Bt-Networks-Server:0 UUID=4e860a9e:0b433a00:54d2c991:78ca3d15

then saving the output to the config file and running update-initramfs -u.
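
The same check can be scripted by comparing the UUIDs on both sides. This is a rough sketch under the assumption that each array is declared on a single uncommented ARRAY line containing UUID=...; `array_uuids` is a hypothetical helper name.

```shell
# array_uuids: read ARRAY definitions on stdin (mdadm.conf or the output of
# "mdadm --detail --scan") and print the UUID of each uncommented ARRAY line.
# Hypothetical helper; commented "#ARRAY" lines are ignored on purpose.
array_uuids() {
  sed -n 's/^ARRAY .*UUID=\([0-9a-fA-F:]*\).*/\1/p' | sort -u
}

# Typical comparison on a live system (both commands should print the same list):
#   mdadm --detail --scan | array_uuids
#   array_uuids < /etc/mdadm/mdadm.conf
```

Matching lists only confirm that the config names the right array; they say nothing about why a member is rejected at assembly time.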

Is there any way to avoid having to re-add the disk and rebuild/resync the array on every reboot?

Thanks!

EDIT:

Contents of /etc/mdadm/mdadm.conf:

root@Bt-Networks-Server:~# cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays

# This file was auto-generated on Thu, 31 Jul 2014 23:42:00 -0300
# by mkconf $Id$
#ARRAY /dev/md/Bt-Networks-Server:0 metadata=1.2 name=Bt-Networks-Server:0 UUID=4e860a9e:0b433a00:54d2c991:78ca3d15
ARRAY /dev/md/0 metadata=1.2 UUID=4e860a9e:0b433a00:54d2c991:78ca3d15 name=Bt-Networks-Server:0

I searched dmesg and found this log related to the recovery:

[  185.105099] md: export_rdev(sdd1)
[  185.220543] md: bind<sdd1>
[  185.320114] RAID conf printout:
[  185.320118]  --- level:5 rd:3 wd:2
[  185.320121]  disk 0, o:1, dev:sdd1
[  185.320123]  disk 1, o:1, dev:sdc1
[  185.320124]  disk 2, o:1, dev:sdb1
[  185.320272] md: recovery of RAID array md0
[  185.320276] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  185.320278] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  185.320281] md: using 128k window, over a total of 1953381888k.
[ 1009.812057] EXT4-fs (md0): recovery complete
[ 1009.896520] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)
[ 1109.136229] perf interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[19295.440128] md: md0: recovery done.
[19295.607089] RAID conf printout:
[19295.607096]  --- level:5 rd:3 wd:3
[19295.607099]  disk 0, o:1, dev:sdd1
[19295.607101]  disk 1, o:1, dev:sdc1
[19295.607103]  disk 2, o:1, dev:sdb1
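
To put a number on what each reboot costs, the rebuild duration can be read off the timestamps of the "recovery of RAID array" and "recovery done" lines. A minimal sketch, assuming the message wording shown above and a single recovery per log; `recovery_seconds` is a hypothetical helper name.

```shell
# recovery_seconds: read dmesg text on stdin and print the whole number of
# seconds between the start and the end of an md recovery.
# Hypothetical helper; uses the kernel timestamp at the start of each line.
recovery_seconds() {
  awk '
    # Return the first decimal number on the line, i.e. the [seconds.micros] stamp.
    function stamp(line) {
      match(line, /[0-9]+\.[0-9]+/)
      return substr(line, RSTART, RLENGTH) + 0
    }
    /md: recovery of RAID array/    { start = stamp($0) }
    /md: md[0-9]+: recovery done/   { if (start != "") printf "%d\n", stamp($0) - start }
  '
}

# Typical use on a live system:
#   dmesg | recovery_seconds
```

For the log above this gives 19110 seconds, roughly 5.3 hours of rebuilding per reboot.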

I also found entries for a successful periodic check of the array:

[501643.369779] md: data-check of RAID array md0
[501643.369784] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[501643.369786] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[501643.369791] md: using 128k window, over a total of 1953381888k.
[518452.072029] md: md0: data-check done.
