mdadm array assembly causes the kernel to hang

I have two large mdadm arrays that I combine into a single volume group. I was recently adding new drives to the second array when I had a power outage. Normally I have no real trouble getting arrays to restart or resume a reshape after an interruption, but this time I am having a lot of trouble.

System details: CentOS 6.8 x64

Linux myserver 2.6.32-642.4.2.el6.x86_64 #1 SMP Tue Aug 23 19:58:13 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

When I first boot, the array is assembled but not started because it is degraded. I cannot start it with:

mdadm -R /dev/md126

So I stop it:

mdadm -S /dev/md126

If I assemble it normally, it does not start:

mdadm --assemble --scan
mdadm: device 26 in /dev/md/128 has wrong state in superblock, but /dev/sdn seems ok
mdadm: /dev/md/128 assembled from 12 drives and 2 spares - not enough to start the array while not clean - consider --force.
mdadm: No arrays found in config file or automatically

So I assemble it with:

mdadm --assemble --scan --force
mdadm: clearing FAULTY flag for device 1 in /dev/md/128 for /dev/sdn
mdadm: Marking array /dev/md/128 as 'clean'
mdadm: /dev/md/128 has been started with 12 drives (out of 13) and 2 spares.

At this point my session hangs. If I log in to the machine again, though, I can run commands while the other session stays hung.

cat /proc/mdstat

md128 : active raid6 sdf[0] sdl[9](S) sdn[14](S) sdo[6] sdm[7] sdc[10] sdb[11] sdd[12] sde[13] sdk[5] sdj[4] sdi[3] sdh[2] sdg[1]
      23441323008 blocks super 1.2 level 6, 512k chunk, algorithm 2 [13/12] [UUUUUUUUUUUU_]
      [>....................]  reshape =  0.0% (128496/3906887168) finish=133658963.6min speed=0K/sec
      bitmap: 1/30 pages [4KB], 65536KB chunk

md127 : active (auto-read-only) raid6 sdaa[2] sdab[13] sdac[0] sdad[15] sdy[4] sdx[5] sdz[14] sdw[6] sdr[9] sds[8] sdq[10] sdp[16] sdu[11] sdv[7] sdt[12]
      50789533184 blocks super 1.2 level 6, 512k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]

unused devices: <none>

The reshape never gets past that point. It is always stuck there.
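To confirm that the reshape really is stalled rather than just slow, it can help to sample /proc/mdstat a few times and compare the numbers. The following is only a sketch: the function name mdstat_progress is made up for illustration, and it assumes the progress line looks like the one shown above.

```shell
# Print "<array> <action> <percent> <speed>" for every array in
# mdstat-format text on stdin that has a reshape, recovery or resync running.
# mdstat_progress is a hypothetical helper name, not part of mdadm.
mdstat_progress() {
    awk '
        # Remember the array name from lines such as "md128 : active raid6 ..."
        /^md[0-9]+ :/ { array = $1 }
        # Progress lines look like "[>....]  reshape =  0.0% (...) finish=... speed=0K/sec"
        /(reshape|recovery|resync) *= *[0-9.]+%/ {
            action = ""; pct = ""; speed = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(reshape|recovery|resync)$/) action = $i
                if ($i ~ /%$/) pct = $i
                if ($i ~ /^speed=/) speed = substr($i, 7)
            }
            print array, action, pct, speed
        }
    '
}

# Usage (run from a second session, a minute or so apart, and compare):
#   mdstat_progress < /proc/mdstat
```

If the percentage and the speed stay at the same values across samples, the reshape thread is not making progress.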

The following error appears in dmesg:

created bitmap (30 pages) for device md128
md128: bitmap initialized from disk: read 2 pages, set 1 of 59615 bits
md128: detected capacity change from 0 to 24003914760192
md: reshape of RAID array md128
md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
 md128:
md: using 128k window, over a total of 3906887168k.
 unknown partition table
INFO: task mdadm:2357 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.4.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdadm         D 0000000000000002     0  2357   2179 0x00000004
 ffff880821773688 0000000000000082 0000000000000000 0000000000000086
 ffff880821773618 ffffffff810633a3 0000013e9de36948 ffff88081f96d440
 ffff88081dc65ed0 000000010010516a ffff88081d713068 ffff880821773fd8
Call Trace:
 [<ffffffff810633a3>] ? __wake_up+0x53/0x70
 [<ffffffffa0240805>] get_active_stripe+0x2d5/0x880 [raid456]
 [<ffffffff81130d00>] ? mempool_alloc+0x20/0x140
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff810a6bce>] ? prepare_to_wait+0x4e/0x80
 [<ffffffffa02455ef>] make_request+0x19f/0xcb0 [raid456]
 [<ffffffff812ab7de>] ? __sg_alloc_table+0x7e/0x130
 [<ffffffff81056cc5>] ? gup_pte_range+0xe5/0x130
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8143066f>] md_make_request+0xdf/0x240
 [<ffffffff8127ddb0>] generic_make_request+0x240/0x5a0
 [<ffffffff811d941c>] ? do_direct_IO+0x57c/0xfa0
 [<ffffffff811d60cb>] ? bio_alloc_bioset+0x5b/0xf0
 [<ffffffff8127e180>] submit_bio+0x70/0x120
 [<ffffffff811daabd>] __blockdev_direct_IO_newtrunc+0xc7d/0x1270
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811db127>] __blockdev_direct_IO+0x77/0xe0
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811d73a7>] blkdev_direct_IO+0x57/0x60
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff8113028b>] generic_file_aio_read+0x6bb/0x700
 [<ffffffff8143877f>] ? md_ioctl+0x31f/0x1ac0
 [<ffffffff811d7f00>] ? blkdev_open+0x0/0xc0
 [<ffffffff81196d47>] ? __dentry_open+0x257/0x380
 [<ffffffff811d6891>] blkdev_aio_read+0x51/0x80
 [<ffffffff81199a6a>] do_sync_read+0xfa/0x140
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811d66bc>] ? block_ioctl+0x3c/0x40
 [<ffffffff811af522>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff811af6c4>] ? do_vfs_ioctl+0x84/0x580
 [<ffffffff8123aa06>] ? security_file_permission+0x16/0x20
 [<ffffffff8119a365>] vfs_read+0xb5/0x1a0
 [<ffffffff8119b116>] ? fget_light_pos+0x16/0x50
 [<ffffffff8119a6b1>] sys_read+0x51/0xb0
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task md128_reshape:2428 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.4.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
md128_reshape D 0000000000000001     0  2428      2 0x00000000
 ffff88081fdcfa60 0000000000000046 0000000000000000 0000000000000086
 ffff88081fdcf9f0 ffffffff810633a3 0000013d983220e0 ffff88081f96d440
 ffff88081dc65ed0 0000000100103f9f ffff88081ea1e5f8 ffff88081fdcffd8
Call Trace:
 [<ffffffff810633a3>] ? __wake_up+0x53/0x70
 [<ffffffffa0240805>] get_active_stripe+0x2d5/0x880 [raid456]
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa02447c0>] reshape_request+0x440/0xa20 [raid456]
 [<ffffffff810633a3>] ? __wake_up+0x53/0x70
 [<ffffffffa02450b2>] sync_request+0x312/0x3a0 [raid456]
 [<ffffffff81430ff7>] md_do_sync+0x6c7/0xd60
 [<ffffffff81431ae5>] md_thread+0x115/0x150
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff814319d0>] ? md_thread+0x0/0x150
 [<ffffffff810a640e>] kthread+0x9e/0xc0
 [<ffffffff8100c28a>] child_rip+0xa/0x20
 [<ffffffff810a6370>] ? kthread+0x0/0xc0
 [<ffffffff8100c280>] ? child_rip+0x0/0x20
INFO: task blkid:2430 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.4.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
blkid         D 0000000000000002     0  2430      1 0x00000000
 ffff8808211d36f8 0000000000000082 0000000000000000 0000000000000082
 ffff8808211d3688 ffffffff810633a3 0000013d9831af63 ffff88081f96d440
 ffff88081dc65ed0 0000000100104045 ffff88081e615ad8 ffff8808211d3fd8
Call Trace:
 [<ffffffff810633a3>] ? __wake_up+0x53/0x70
 [<ffffffffa0240805>] get_active_stripe+0x2d5/0x880 [raid456]
 [<ffffffff8142af00>] ? try_module_get+0x30/0xb0
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff810a6bce>] ? prepare_to_wait+0x4e/0x80
 [<ffffffffa02455ef>] make_request+0x19f/0xcb0 [raid456]
 [<ffffffff81340b4b>] ? mix_pool_bytes_extract+0x16b/0x180
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81151319>] ? zone_statistics+0x99/0xc0
 [<ffffffff8143066f>] md_make_request+0xdf/0x240
 [<ffffffff8127ddb0>] generic_make_request+0x240/0x5a0
 [<ffffffff81130ba5>] ? mempool_alloc_slab+0x15/0x20
 [<ffffffff81130d43>] ? mempool_alloc+0x63/0x140
 [<ffffffff8127e180>] submit_bio+0x70/0x120
 [<ffffffff811cfc8d>] submit_bh+0x11d/0x1f0
 [<ffffffff811d2a9c>] block_read_full_page+0x27c/0x3a0
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811513ee>] ? __inc_zone_page_state+0x2e/0x30
 [<ffffffff81145480>] ? __lru_cache_add+0x40/0x90
 [<ffffffff811d7498>] blkdev_readpage+0x18/0x20
 [<ffffffff811441aa>] __do_page_cache_readahead+0x20a/0x210
 [<ffffffff81144251>] force_page_cache_readahead+0x71/0xa0
 [<ffffffff81144773>] page_cache_sync_readahead+0x43/0x50
 [<ffffffff81130128>] generic_file_aio_read+0x558/0x700
 [<ffffffff811d6891>] blkdev_aio_read+0x51/0x80
 [<ffffffff81199a6a>] do_sync_read+0xfa/0x140
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8115f3d0>] ? unmap_region+0x110/0x130
 [<ffffffff8123aa06>] ? security_file_permission+0x16/0x20
 [<ffffffff8119a365>] vfs_read+0xb5/0x1a0
 [<ffffffff8119b116>] ? fget_light_pos+0x16/0x50
 [<ffffffff8119a6b1>] sys_read+0x51/0xb0
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task mdadm:2357 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.4.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdadm         D 0000000000000002     0  2357   2179 0x00000004
 ffff880821773688 0000000000000082 0000000000000000 0000000000000086
 ffff880821773618 ffffffff810633a3 0000013e9de36948 ffff88081f96d440
 ffff88081dc65ed0 000000010010516a ffff88081d713068 ffff880821773fd8
Call Trace:
 [<ffffffff810633a3>] ? __wake_up+0x53/0x70
 [<ffffffffa0240805>] get_active_stripe+0x2d5/0x880 [raid456]
 [<ffffffff81130d00>] ? mempool_alloc+0x20/0x140
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff810a6bce>] ? prepare_to_wait+0x4e/0x80
 [<ffffffffa02455ef>] make_request+0x19f/0xcb0 [raid456]
 [<ffffffff812ab7de>] ? __sg_alloc_table+0x7e/0x130
 [<ffffffff81056cc5>] ? gup_pte_range+0xe5/0x130
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8143066f>] md_make_request+0xdf/0x240
 [<ffffffff8127ddb0>] generic_make_request+0x240/0x5a0
 [<ffffffff811d941c>] ? do_direct_IO+0x57c/0xfa0
 [<ffffffff811d60cb>] ? bio_alloc_bioset+0x5b/0xf0
 [<ffffffff8127e180>] submit_bio+0x70/0x120
 [<ffffffff811daabd>] __blockdev_direct_IO_newtrunc+0xc7d/0x1270
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811db127>] __blockdev_direct_IO+0x77/0xe0
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811d73a7>] blkdev_direct_IO+0x57/0x60
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff8113028b>] generic_file_aio_read+0x6bb/0x700
 [<ffffffff8143877f>] ? md_ioctl+0x31f/0x1ac0
 [<ffffffff811d7f00>] ? blkdev_open+0x0/0xc0
 [<ffffffff81196d47>] ? __dentry_open+0x257/0x380
 [<ffffffff811d6891>] blkdev_aio_read+0x51/0x80
 [<ffffffff81199a6a>] do_sync_read+0xfa/0x140
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811d66bc>] ? block_ioctl+0x3c/0x40
 [<ffffffff811af522>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff811af6c4>] ? do_vfs_ioctl+0x84/0x580
 [<ffffffff8123aa06>] ? security_file_permission+0x16/0x20
 [<ffffffff8119a365>] vfs_read+0xb5/0x1a0
 [<ffffffff8119b116>] ? fget_light_pos+0x16/0x50
 [<ffffffff8119a6b1>] sys_read+0x51/0xb0
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task md128_reshape:2428 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.4.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
md128_reshape D 0000000000000001     0  2428      2 0x00000000
 ffff88081fdcfa60 0000000000000046 0000000000000000 0000000000000086
 ffff88081fdcf9f0 ffffffff810633a3 0000013d983220e0 ffff88081f96d440
 ffff88081dc65ed0 0000000100103f9f ffff88081ea1e5f8 ffff88081fdcffd8
Call Trace:
 [<ffffffff810633a3>] ? __wake_up+0x53/0x70
 [<ffffffffa0240805>] get_active_stripe+0x2d5/0x880 [raid456]
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa02447c0>] reshape_request+0x440/0xa20 [raid456]
 [<ffffffff810633a3>] ? __wake_up+0x53/0x70
 [<ffffffffa02450b2>] sync_request+0x312/0x3a0 [raid456]
 [<ffffffff81430ff7>] md_do_sync+0x6c7/0xd60
 [<ffffffff81431ae5>] md_thread+0x115/0x150
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff814319d0>] ? md_thread+0x0/0x150
 [<ffffffff810a640e>] kthread+0x9e/0xc0
 [<ffffffff8100c28a>] child_rip+0xa/0x20
 [<ffffffff810a6370>] ? kthread+0x0/0xc0
 [<ffffffff8100c280>] ? child_rip+0x0/0x20
INFO: task blkid:2430 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.4.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
blkid         D 0000000000000002     0  2430      1 0x00000000
 ffff8808211d36f8 0000000000000082 0000000000000000 0000000000000082
 ffff8808211d3688 ffffffff810633a3 0000013d9831af63 ffff88081f96d440
 ffff88081dc65ed0 0000000100104045 ffff88081e615ad8 ffff8808211d3fd8
Call Trace:
 [<ffffffff810633a3>] ? __wake_up+0x53/0x70
 [<ffffffffa0240805>] get_active_stripe+0x2d5/0x880 [raid456]
 [<ffffffff8142af00>] ? try_module_get+0x30/0xb0
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff810a6bce>] ? prepare_to_wait+0x4e/0x80
 [<ffffffffa02455ef>] make_request+0x19f/0xcb0 [raid456]
 [<ffffffff81340b4b>] ? mix_pool_bytes_extract+0x16b/0x180
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81151319>] ? zone_statistics+0x99/0xc0
 [<ffffffff8143066f>] md_make_request+0xdf/0x240
 [<ffffffff8127ddb0>] generic_make_request+0x240/0x5a0
 [<ffffffff81130ba5>] ? mempool_alloc_slab+0x15/0x20
 [<ffffffff81130d43>] ? mempool_alloc+0x63/0x140
 [<ffffffff8127e180>] submit_bio+0x70/0x120
 [<ffffffff811cfc8d>] submit_bh+0x11d/0x1f0
 [<ffffffff811d2a9c>] block_read_full_page+0x27c/0x3a0
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811513ee>] ? __inc_zone_page_state+0x2e/0x30
 [<ffffffff81145480>] ? __lru_cache_add+0x40/0x90
 [<ffffffff811d7498>] blkdev_readpage+0x18/0x20
 [<ffffffff811441aa>] __do_page_cache_readahead+0x20a/0x210
 [<ffffffff81144251>] force_page_cache_readahead+0x71/0xa0
 [<ffffffff81144773>] page_cache_sync_readahead+0x43/0x50
 [<ffffffff81130128>] generic_file_aio_read+0x558/0x700
 [<ffffffff811d6891>] blkdev_aio_read+0x51/0x80
 [<ffffffff81199a6a>] do_sync_read+0xfa/0x140
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8115f3d0>] ? unmap_region+0x110/0x130
 [<ffffffff8123aa06>] ? security_file_permission+0x16/0x20
 [<ffffffff8119a365>] vfs_read+0xb5/0x1a0
 [<ffffffff8119b116>] ? fget_light_pos+0x16/0x50
 [<ffffffff8119a6b1>] sys_read+0x51/0xb0
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task mdadm:2481 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.4.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdadm         D 0000000000000003     0  2481   2459 0x00000004
 ffff8808226ef688 0000000000000086 0000000000000000 0000000000000086
 ffff8808226ef618 ffffffff810633a3 0000014fb75098d5 ffff88081f96d440
 ffff88081dc65ed0 0000000100117098 ffff88081e7f7068 ffff8808226effd8
Call Trace:
 [<ffffffff810633a3>] ? __wake_up+0x53/0x70
 [<ffffffffa0240805>] get_active_stripe+0x2d5/0x880 [raid456]
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff810a6bce>] ? prepare_to_wait+0x4e/0x80
 [<ffffffffa02455ef>] make_request+0x19f/0xcb0 [raid456]
 [<ffffffff8117ec7c>] ? transfer_objects+0x5c/0x80
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8117f66e>] ? cache_alloc_refill+0x9e/0x240
 [<ffffffff8143066f>] md_make_request+0xdf/0x240
 [<ffffffff8127ddb0>] generic_make_request+0x240/0x5a0
 [<ffffffff811d941c>] ? do_direct_IO+0x57c/0xfa0
 [<ffffffff81151319>] ? zone_statistics+0x99/0xc0
 [<ffffffff811d60cb>] ? bio_alloc_bioset+0x5b/0xf0
 [<ffffffff8127e180>] submit_bio+0x70/0x120
 [<ffffffff811daabd>] __blockdev_direct_IO_newtrunc+0xc7d/0x1270
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811db127>] __blockdev_direct_IO+0x77/0xe0
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811d73a7>] blkdev_direct_IO+0x57/0x60
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff8113028b>] generic_file_aio_read+0x6bb/0x700
 [<ffffffff8143877f>] ? md_ioctl+0x31f/0x1ac0
 [<ffffffff811d7f00>] ? blkdev_open+0x0/0xc0
 [<ffffffff81196d47>] ? __dentry_open+0x257/0x380
 [<ffffffff811d6891>] blkdev_aio_read+0x51/0x80
 [<ffffffff81199a6a>] do_sync_read+0xfa/0x140
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811d66bc>] ? block_ioctl+0x3c/0x40
 [<ffffffff811af522>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff811af6c4>] ? do_vfs_ioctl+0x84/0x580
 [<ffffffff8123aa06>] ? security_file_permission+0x16/0x20
 [<ffffffff8119a365>] vfs_read+0xb5/0x1a0
 [<ffffffff8119b116>] ? fget_light_pos+0x16/0x50
 [<ffffffff8119a6b1>] sys_read+0x51/0xb0
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task pvscan:2518 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.4.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pvscan        D 0000000000000003     0  2518   2505 0x00000000
 ffff8808205e3688 0000000000000086 0000000000000000 0000000000000086
 ffff8808205e3618 ffffffff810633a3 000001550b5d4884 ffff88081f96d440
 ffff88081dc65ed0 000000010011ca13 ffff88081ddf5ad8 ffff8808205e3fd8
Call Trace:
 [<ffffffff810633a3>] ? __wake_up+0x53/0x70
 [<ffffffffa0240805>] get_active_stripe+0x2d5/0x880 [raid456]
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff810a6bce>] ? prepare_to_wait+0x4e/0x80
 [<ffffffffa02455ef>] make_request+0x19f/0xcb0 [raid456]
 [<ffffffff812ab7de>] ? __sg_alloc_table+0x7e/0x130
 [<ffffffff81056cc5>] ? gup_pte_range+0xe5/0x130
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8143066f>] md_make_request+0xdf/0x240
 [<ffffffff8127ddb0>] generic_make_request+0x240/0x5a0
 [<ffffffff811d941c>] ? do_direct_IO+0x57c/0xfa0
 [<ffffffff811d60cb>] ? bio_alloc_bioset+0x5b/0xf0
 [<ffffffff8127e180>] submit_bio+0x70/0x120
 [<ffffffff811daabd>] __blockdev_direct_IO_newtrunc+0xc7d/0x1270
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811db127>] __blockdev_direct_IO+0x77/0xe0
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811d73a7>] blkdev_direct_IO+0x57/0x60
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff8113028b>] generic_file_aio_read+0x6bb/0x700
 [<ffffffff811d7ef0>] ? blkdev_get+0x10/0x20
 [<ffffffff811d7f00>] ? blkdev_open+0x0/0xc0
 [<ffffffff81196d47>] ? __dentry_open+0x257/0x380
 [<ffffffff811d6891>] blkdev_aio_read+0x51/0x80
 [<ffffffff81199a6a>] do_sync_read+0xfa/0x140
 [<ffffffff81397a3f>] ? scsi_device_put+0x2f/0x40
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811d66bc>] ? block_ioctl+0x3c/0x40
 [<ffffffff811af522>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff811af6c4>] ? do_vfs_ioctl+0x84/0x580
 [<ffffffff8123aa06>] ? security_file_permission+0x16/0x20
 [<ffffffff8119a365>] vfs_read+0xb5/0x1a0
 [<ffffffff8119b116>] ? fget_light_pos+0x16/0x50
 [<ffffffff8119a6b1>] sys_read+0x51/0xb0
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task mdadm:2357 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.4.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdadm         D 0000000000000002     0  2357   2179 0x00000004
 ffff880821773688 0000000000000082 0000000000000000 0000000000000086
 ffff880821773618 ffffffff810633a3 0000013e9de36948 ffff88081f96d440
 ffff88081dc65ed0 000000010010516a ffff88081d713068 ffff880821773fd8
Call Trace:
 [<ffffffff810633a3>] ? __wake_up+0x53/0x70
 [<ffffffffa0240805>] get_active_stripe+0x2d5/0x880 [raid456]
 [<ffffffff81130d00>] ? mempool_alloc+0x20/0x140
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff810a6bce>] ? prepare_to_wait+0x4e/0x80
 [<ffffffffa02455ef>] make_request+0x19f/0xcb0 [raid456]
 [<ffffffff812ab7de>] ? __sg_alloc_table+0x7e/0x130
 [<ffffffff81056cc5>] ? gup_pte_range+0xe5/0x130
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8143066f>] md_make_request+0xdf/0x240
 [<ffffffff8127ddb0>] generic_make_request+0x240/0x5a0
 [<ffffffff811d941c>] ? do_direct_IO+0x57c/0xfa0
 [<ffffffff811d60cb>] ? bio_alloc_bioset+0x5b/0xf0
 [<ffffffff8127e180>] submit_bio+0x70/0x120
 [<ffffffff811daabd>] __blockdev_direct_IO_newtrunc+0xc7d/0x1270
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811db127>] __blockdev_direct_IO+0x77/0xe0
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811d73a7>] blkdev_direct_IO+0x57/0x60
 [<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
 [<ffffffff8113028b>] generic_file_aio_read+0x6bb/0x700
 [<ffffffff8143877f>] ? md_ioctl+0x31f/0x1ac0
 [<ffffffff811d7f00>] ? blkdev_open+0x0/0xc0
 [<ffffffff81196d47>] ? __dentry_open+0x257/0x380
 [<ffffffff811d6891>] blkdev_aio_read+0x51/0x80
 [<ffffffff81199a6a>] do_sync_read+0xfa/0x140
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811d66bc>] ? block_ioctl+0x3c/0x40
 [<ffffffff811af522>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff811af6c4>] ? do_vfs_ioctl+0x84/0x580
 [<ffffffff8123aa06>] ? security_file_permission+0x16/0x20
 [<ffffffff8119a365>] vfs_read+0xb5/0x1a0
 [<ffffffff8119b116>] ? fget_light_pos+0x16/0x50
 [<ffffffff8119a6b1>] sys_read+0x51/0xb0
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task md128_reshape:2428 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.4.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
md128_reshape D 0000000000000001     0  2428      2 0x00000000
 ffff88081fdcfa60 0000000000000046 0000000000000000 0000000000000086
 ffff88081fdcf9f0 ffffffff810633a3 0000013d983220e0 ffff88081f96d440
 ffff88081dc65ed0 0000000100103f9f ffff88081ea1e5f8 ffff88081fdcffd8
Call Trace:
 [<ffffffff810633a3>] ? __wake_up+0x53/0x70
 [<ffffffffa0240805>] get_active_stripe+0x2d5/0x880 [raid456]
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa02447c0>] reshape_request+0x440/0xa20 [raid456]
 [<ffffffff810633a3>] ? __wake_up+0x53/0x70
 [<ffffffffa02450b2>] sync_request+0x312/0x3a0 [raid456]
 [<ffffffff81430ff7>] md_do_sync+0x6c7/0xd60
 [<ffffffff81431ae5>] md_thread+0x115/0x150
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff814319d0>] ? md_thread+0x0/0x150
 [<ffffffff810a640e>] kthread+0x9e/0xc0
 [<ffffffff8100c28a>] child_rip+0xa/0x20
 [<ffffffff810a6370>] ? kthread+0x0/0xc0
 [<ffffffff8100c280>] ? child_rip+0x0/0x20
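The hung-task detector repeats its report for the same process every time the timeout expires, so the log above is long even though only a handful of processes are involved. In every trace the first resolved frame is get_active_stripe in raid456. A small filter can reduce such a log to one line per blocked task. This is a sketch only, and the function name hung_tasks is made up for illustration.

```shell
# Read kernel log text on stdin and print "<task> <pid> <reports>" for each
# distinct blocked task, in order of first appearance.
# hung_tasks is a hypothetical helper name.
hung_tasks() {
    # Keep only the "INFO: task NAME:PID blocked for more than ..." lines
    # and reduce them to "NAME PID".
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than.*/\1 \2/p' |
        awk '
            # Count how often each "NAME PID" pair was reported.
            { n[$0]++; if (n[$0] == 1) order[++k] = $0 }
            END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }
        '
}

# Usage:
#   dmesg | hung_tasks
```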

Now pvscan and any other command that touches the md array hang. Basically, the only thing I can do is reboot to get back to a state where I can control the array. Even the reboot hangs near the end, at which point I have to do a hard reset.

Does anyone have any idea how to resolve this? I tried booting from an Ubuntu live CD, but when I tried to assemble the array, everything froze.

Answer 1

In the end, I was able to recover this array by running the following command:

mdadm --create /dev/md126 --level=6 --raid-devices=14 --name=gigantor:128 /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdo /dev/sdm missing missing /dev/sdc /dev/sdb /dev/sdd /dev/sde --assume-clean

Then I re-added the two devices:

mdadm --add /dev/md126 /dev/sdn
mdadm --add /dev/md126 /dev/sdl
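Re-creating an array with --assume-clean only works if the devices are listed in their original slot order, and --create overwrites the old superblocks, so it is worth saving the mdadm --examine output of every member first. The sketch below turns a list of slot numbers into the device order for the command line. The function name member_order is made up, and the collection loop in the comments assumes the 1.2 superblock prints a line of the form "Device Role : Active device N".

```shell
# stdin:  lines of "<slot> <device>", in any order
# $1:     total number of raid devices
# stdout: devices in slot order on one line, "missing" for empty slots
# member_order is a hypothetical helper name, not part of mdadm.
member_order() {
    awk -v n="$1" '
        { dev[$1] = $2 }
        END {
            line = ""
            for (i = 0; i < n; i++) {
                d = (i in dev) ? dev[i] : "missing"
                line = line (i ? " " : "") d
            }
            print line
        }
    '
}

# Usage sketch (the device glob is an example; adjust it to the real members):
#   for d in /dev/sd[b-o]; do
#       mdadm --examine "$d" > "examine-$(basename "$d").txt"   # keep a copy
#       slot=$(sed -n 's/^ *Device Role : Active device \([0-9]*\)$/\1/p' \
#              "examine-$(basename "$d").txt")
#       [ -n "$slot" ] && echo "$slot $d"
#   done | member_order 14
```

Devices that report themselves as spares, or that have no readable superblock, produce no slot line and end up as missing, which matches how the two dropped devices were handled in the command above.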

Then, because my LVM metadata was also damaged, I had to recover it:

pvcreate --uuid "kvEA4X-vobg-2Ipz-ITF1-ZhtW-Ewej-6liKVx" --restorefile /etc/lvm/backup/vg_data /dev/md126
vgcfgrestore vg_data
lvchange -ay /dev/vg_data/lvm0
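The UUID passed to pvcreate has to be the one the volume group expects for that physical volume, and it can be read from the metadata backup file. The sketch below assumes the usual text layout of an LVM backup, with a physical_volumes section containing id and device entries. The function name pv_uuids is made up for illustration.

```shell
# stdin:  contents of an LVM metadata backup, e.g. /etc/lvm/backup/vg_data
# stdout: one "<pv-uuid> <device-hint>" line per physical volume
# pv_uuids is a hypothetical helper name, not an LVM command.
pv_uuids() {
    awk '
        # Only look at id/device entries inside the physical_volumes section,
        # so the volume group id and logical volume ids are skipped.
        /physical_volumes/ { in_pv = 1; next }
        /logical_volumes/  { in_pv = 0 }
        in_pv && $1 == "id" && $2 == "=" { id = $3; gsub(/"/, "", id) }
        in_pv && $1 == "device" && $2 == "=" {
            dev = $3; gsub(/"/, "", dev)
            print id, dev
        }
    '
}

# Usage:
#   pv_uuids < /etc/lvm/backup/vg_data
#   vgcfgrestore --list vg_data    # shows which backup/archive files exist
```

The device printed next to each UUID is only the hint recorded when the backup was written, so it may not match the current device name.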

Then I ran xfs_check on the logical volume, and it said I needed to run a repair and discard the log:

xfs_repair -L /dev/mapper/vg_data-lvm0

After this repair I was able to mount the logical volume, and my data was intact.

The array underneath my LVM is now rebuilding:

Personalities : [raid6] [raid5] [raid4]
md126 : active raid6 sdm[14] sdl[7] sdk[15] sdn[6] sdj[5] sdi[4] sdg[2] sde[0] sdf[1] sdh[3] sdc[12] sda[11] sdd[13] sdb[10]
      46882646016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [14/12] [UUUUUUUU__UUUU]
      [==========>..........]  recovery = 53.6% (2097061864/3906887168) finish=2448.0min speed=12321K/sec
      bitmap: 0/30 pages [0KB], 65536KB chunk

md127 : active raid6 sdac[15] sdaa[13] sdz[2] sdab[0] sdq[9] sdy[14] sdx[4] sdr[8] sdp[10] sdo[16] sdv[6] sdu[7] sdw[5] sds[12] sdt[11]
      50789533184 blocks super 1.2 level 6, 512k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]

unused devices: <none>

What a pain....
