I have two large mdadm arrays that I combine into a single volume group. I recently added new drives to the second array and then had a power outage. Normally I have no real trouble restarting the arrays or resuming a reshape after an interruption, but this time I am having a lot of problems.
System details: CentOS 6.8 x64
Linux myserver 2.6.32-642.4.2.el6.x86_64 #1 SMP Tue Aug 23 19:58:13 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
When I first boot, the array is assembled but does not start because it is degraded. I cannot start it with:
mdadm -R /dev/md126
I then stop it:
mdadm -S /dev/md126
If I assemble it normally, it will not start:
mdadm --assemble --scan
mdadm: device 26 in /dev/md/128 has wrong state in superblock, but /dev/sdn seems ok
mdadm: /dev/md/128 assembled from 12 drives and 2 spares - not enough to start the array while not clean - consider --force.
mdadm: No arrays found in config file or automatically
So I assemble it with:
mdadm --assemble --scan --force
mdadm: clearing FAULTY flag for device 1 in /dev/md/128 for /dev/sdn
mdadm: Marking array /dev/md/128 as 'clean'
mdadm: /dev/md/128 has been started with 12 drives (out of 13) and 2 spares.
At this point my session hangs. If I log back in to the machine, though, I can still run commands while the other session stays hung.
cat /proc/mdstat
md128 : active raid6 sdf[0] sdl[9](S) sdn[14](S) sdo[6] sdm[7] sdc[10] sdb[11] sdd[12] sde[13] sdk[5] sdj[4] sdi[3] sdh[2] sdg[1]
23441323008 blocks super 1.2 level 6, 512k chunk, algorithm 2 [13/12] [UUUUUUUUUUUU_]
[>....................] reshape = 0.0% (128496/3906887168) finish=133658963.6min speed=0K/sec
bitmap: 1/30 pages [4KB], 65536KB chunk
md127 : active (auto-read-only) raid6 sdaa[2] sdab[13] sdac[0] sdad[15] sdy[4] sdx[5] sdz[14] sdw[6] sdr[9] sds[8] sdq[10] sdp[16] sdu[11] sdv[7] sdt[12]
50789533184 blocks super 1.2 level 6, 512k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]
unused devices: <none>
The reshape never gets past this point. It is always stuck there.
In dmesg I get the following errors:
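To tell a truly stalled reshape from one that is merely slow, the position counter in /proc/mdstat can be sampled twice and compared. The following is only a rough sketch: the function names `md_progress` and `md_is_stalled`, the `MDSTAT` override variable and the 60 second default interval are made up for illustration, and it assumes the mdstat layout shown above.

```shell
#!/bin/sh
# Print "<action> <done> <total>" for one array, reading mdstat-formatted
# text on stdin. Prints nothing if the array has no sync action running.
md_progress() {
    awk -v dev="$1" '
        $1 == dev && $2 == ":"  { in_dev = 1; next }   # header line of our array
        in_dev && /^md[0-9]+ :/ { in_dev = 0 }         # header of the next array
        in_dev && /(reshape|recovery|resync) *=/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(reshape|recovery|resync)$/) action = $i
                if ($i ~ /^\([0-9]+\/[0-9]+\)$/) {     # the "(done/total)" field
                    gsub(/[()]/, "", $i)
                    split($i, p, "/")
                    print action, p[1], p[2]
                    exit
                }
            }
        }'
}

# Return success if the position did not move between two samples.
# $1 = array name, $2 = seconds to wait (illustrative default: 60).
# MDSTAT is a hypothetical override so the function can read a saved copy.
md_is_stalled() {
    first=$(md_progress "$1" < "${MDSTAT:-/proc/mdstat}")
    sleep "${2:-60}"
    second=$(md_progress "$1" < "${MDSTAT:-/proc/mdstat}")
    [ -n "$first" ] && [ "$first" = "$second" ]
}
```

For example, `md_is_stalled md128 120 && echo "md128 is not advancing"` would compare two samples taken two minutes apart.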
created bitmap (30 pages) for device md128
md128: bitmap initialized from disk: read 2 pages, set 1 of 59615 bits
md128: detected capacity change from 0 to 24003914760192
md: reshape of RAID array md128
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
md128:
md: using 128k window, over a total of 3906887168k.
unknown partition table
INFO: task mdadm:2357 blocked for more than 120 seconds.
Not tainted 2.6.32-642.4.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdadm D 0000000000000002 0 2357 2179 0x00000004
ffff880821773688 0000000000000082 0000000000000000 0000000000000086
ffff880821773618 ffffffff810633a3 0000013e9de36948 ffff88081f96d440
ffff88081dc65ed0 000000010010516a ffff88081d713068 ffff880821773fd8
Call Trace:
[<ffffffff810633a3>] ? __wake_up+0x53/0x70
[<ffffffffa0240805>] get_active_stripe+0x2d5/0x880 [raid456]
[<ffffffff81130d00>] ? mempool_alloc+0x20/0x140
[<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff810a6bce>] ? prepare_to_wait+0x4e/0x80
[<ffffffffa02455ef>] make_request+0x19f/0xcb0 [raid456]
[<ffffffff812ab7de>] ? __sg_alloc_table+0x7e/0x130
[<ffffffff81056cc5>] ? gup_pte_range+0xe5/0x130
[<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8143066f>] md_make_request+0xdf/0x240
[<ffffffff8127ddb0>] generic_make_request+0x240/0x5a0
[<ffffffff811d941c>] ? do_direct_IO+0x57c/0xfa0
[<ffffffff811d60cb>] ? bio_alloc_bioset+0x5b/0xf0
[<ffffffff8127e180>] submit_bio+0x70/0x120
[<ffffffff811daabd>] __blockdev_direct_IO_newtrunc+0xc7d/0x1270
[<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
[<ffffffff811db127>] __blockdev_direct_IO+0x77/0xe0
[<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
[<ffffffff811d73a7>] blkdev_direct_IO+0x57/0x60
[<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
[<ffffffff8113028b>] generic_file_aio_read+0x6bb/0x700
[<ffffffff8143877f>] ? md_ioctl+0x31f/0x1ac0
[<ffffffff811d7f00>] ? blkdev_open+0x0/0xc0
[<ffffffff81196d47>] ? __dentry_open+0x257/0x380
[<ffffffff811d6891>] blkdev_aio_read+0x51/0x80
[<ffffffff81199a6a>] do_sync_read+0xfa/0x140
[<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff811d66bc>] ? block_ioctl+0x3c/0x40
[<ffffffff811af522>] ? vfs_ioctl+0x22/0xa0
[<ffffffff811af6c4>] ? do_vfs_ioctl+0x84/0x580
[<ffffffff8123aa06>] ? security_file_permission+0x16/0x20
[<ffffffff8119a365>] vfs_read+0xb5/0x1a0
[<ffffffff8119b116>] ? fget_light_pos+0x16/0x50
[<ffffffff8119a6b1>] sys_read+0x51/0xb0
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task md128_reshape:2428 blocked for more than 120 seconds.
Not tainted 2.6.32-642.4.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
md128_reshape D 0000000000000001 0 2428 2 0x00000000
ffff88081fdcfa60 0000000000000046 0000000000000000 0000000000000086
ffff88081fdcf9f0 ffffffff810633a3 0000013d983220e0 ffff88081f96d440
ffff88081dc65ed0 0000000100103f9f ffff88081ea1e5f8 ffff88081fdcffd8
Call Trace:
[<ffffffff810633a3>] ? __wake_up+0x53/0x70
[<ffffffffa0240805>] get_active_stripe+0x2d5/0x880 [raid456]
[<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa02447c0>] reshape_request+0x440/0xa20 [raid456]
[<ffffffff810633a3>] ? __wake_up+0x53/0x70
[<ffffffffa02450b2>] sync_request+0x312/0x3a0 [raid456]
[<ffffffff81430ff7>] md_do_sync+0x6c7/0xd60
[<ffffffff81431ae5>] md_thread+0x115/0x150
[<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff814319d0>] ? md_thread+0x0/0x150
[<ffffffff810a640e>] kthread+0x9e/0xc0
[<ffffffff8100c28a>] child_rip+0xa/0x20
[<ffffffff810a6370>] ? kthread+0x0/0xc0
[<ffffffff8100c280>] ? child_rip+0x0/0x20
INFO: task blkid:2430 blocked for more than 120 seconds.
Not tainted 2.6.32-642.4.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
blkid D 0000000000000002 0 2430 1 0x00000000
ffff8808211d36f8 0000000000000082 0000000000000000 0000000000000082
ffff8808211d3688 ffffffff810633a3 0000013d9831af63 ffff88081f96d440
ffff88081dc65ed0 0000000100104045 ffff88081e615ad8 ffff8808211d3fd8
Call Trace:
[<ffffffff810633a3>] ? __wake_up+0x53/0x70
[<ffffffffa0240805>] get_active_stripe+0x2d5/0x880 [raid456]
[<ffffffff8142af00>] ? try_module_get+0x30/0xb0
[<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff810a6bce>] ? prepare_to_wait+0x4e/0x80
[<ffffffffa02455ef>] make_request+0x19f/0xcb0 [raid456]
[<ffffffff81340b4b>] ? mix_pool_bytes_extract+0x16b/0x180
[<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81151319>] ? zone_statistics+0x99/0xc0
[<ffffffff8143066f>] md_make_request+0xdf/0x240
[<ffffffff8127ddb0>] generic_make_request+0x240/0x5a0
[<ffffffff81130ba5>] ? mempool_alloc_slab+0x15/0x20
[<ffffffff81130d43>] ? mempool_alloc+0x63/0x140
[<ffffffff8127e180>] submit_bio+0x70/0x120
[<ffffffff811cfc8d>] submit_bh+0x11d/0x1f0
[<ffffffff811d2a9c>] block_read_full_page+0x27c/0x3a0
[<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
[<ffffffff811513ee>] ? __inc_zone_page_state+0x2e/0x30
[<ffffffff81145480>] ? __lru_cache_add+0x40/0x90
[<ffffffff811d7498>] blkdev_readpage+0x18/0x20
[<ffffffff811441aa>] __do_page_cache_readahead+0x20a/0x210
[<ffffffff81144251>] force_page_cache_readahead+0x71/0xa0
[<ffffffff81144773>] page_cache_sync_readahead+0x43/0x50
[<ffffffff81130128>] generic_file_aio_read+0x558/0x700
[<ffffffff811d6891>] blkdev_aio_read+0x51/0x80
[<ffffffff81199a6a>] do_sync_read+0xfa/0x140
[<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8115f3d0>] ? unmap_region+0x110/0x130
[<ffffffff8123aa06>] ? security_file_permission+0x16/0x20
[<ffffffff8119a365>] vfs_read+0xb5/0x1a0
[<ffffffff8119b116>] ? fget_light_pos+0x16/0x50
[<ffffffff8119a6b1>] sys_read+0x51/0xb0
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task mdadm:2481 blocked for more than 120 seconds.
Not tainted 2.6.32-642.4.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdadm D 0000000000000003 0 2481 2459 0x00000004
ffff8808226ef688 0000000000000086 0000000000000000 0000000000000086
ffff8808226ef618 ffffffff810633a3 0000014fb75098d5 ffff88081f96d440
ffff88081dc65ed0 0000000100117098 ffff88081e7f7068 ffff8808226effd8
Call Trace:
[<ffffffff810633a3>] ? __wake_up+0x53/0x70
[<ffffffffa0240805>] get_active_stripe+0x2d5/0x880 [raid456]
[<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff810a6bce>] ? prepare_to_wait+0x4e/0x80
[<ffffffffa02455ef>] make_request+0x19f/0xcb0 [raid456]
[<ffffffff8117ec7c>] ? transfer_objects+0x5c/0x80
[<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8117f66e>] ? cache_alloc_refill+0x9e/0x240
[<ffffffff8143066f>] md_make_request+0xdf/0x240
[<ffffffff8127ddb0>] generic_make_request+0x240/0x5a0
[<ffffffff811d941c>] ? do_direct_IO+0x57c/0xfa0
[<ffffffff81151319>] ? zone_statistics+0x99/0xc0
[<ffffffff811d60cb>] ? bio_alloc_bioset+0x5b/0xf0
[<ffffffff8127e180>] submit_bio+0x70/0x120
[<ffffffff811daabd>] __blockdev_direct_IO_newtrunc+0xc7d/0x1270
[<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
[<ffffffff811db127>] __blockdev_direct_IO+0x77/0xe0
[<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
[<ffffffff811d73a7>] blkdev_direct_IO+0x57/0x60
[<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
[<ffffffff8113028b>] generic_file_aio_read+0x6bb/0x700
[<ffffffff8143877f>] ? md_ioctl+0x31f/0x1ac0
[<ffffffff811d7f00>] ? blkdev_open+0x0/0xc0
[<ffffffff81196d47>] ? __dentry_open+0x257/0x380
[<ffffffff811d6891>] blkdev_aio_read+0x51/0x80
[<ffffffff81199a6a>] do_sync_read+0xfa/0x140
[<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff811d66bc>] ? block_ioctl+0x3c/0x40
[<ffffffff811af522>] ? vfs_ioctl+0x22/0xa0
[<ffffffff811af6c4>] ? do_vfs_ioctl+0x84/0x580
[<ffffffff8123aa06>] ? security_file_permission+0x16/0x20
[<ffffffff8119a365>] vfs_read+0xb5/0x1a0
[<ffffffff8119b116>] ? fget_light_pos+0x16/0x50
[<ffffffff8119a6b1>] sys_read+0x51/0xb0
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task pvscan:2518 blocked for more than 120 seconds.
Not tainted 2.6.32-642.4.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pvscan D 0000000000000003 0 2518 2505 0x00000000
ffff8808205e3688 0000000000000086 0000000000000000 0000000000000086
ffff8808205e3618 ffffffff810633a3 000001550b5d4884 ffff88081f96d440
ffff88081dc65ed0 000000010011ca13 ffff88081ddf5ad8 ffff8808205e3fd8
Call Trace:
[<ffffffff810633a3>] ? __wake_up+0x53/0x70
[<ffffffffa0240805>] get_active_stripe+0x2d5/0x880 [raid456]
[<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff810a6bce>] ? prepare_to_wait+0x4e/0x80
[<ffffffffa02455ef>] make_request+0x19f/0xcb0 [raid456]
[<ffffffff812ab7de>] ? __sg_alloc_table+0x7e/0x130
[<ffffffff81056cc5>] ? gup_pte_range+0xe5/0x130
[<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8143066f>] md_make_request+0xdf/0x240
[<ffffffff8127ddb0>] generic_make_request+0x240/0x5a0
[<ffffffff811d941c>] ? do_direct_IO+0x57c/0xfa0
[<ffffffff811d60cb>] ? bio_alloc_bioset+0x5b/0xf0
[<ffffffff8127e180>] submit_bio+0x70/0x120
[<ffffffff811daabd>] __blockdev_direct_IO_newtrunc+0xc7d/0x1270
[<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
[<ffffffff811db127>] __blockdev_direct_IO+0x77/0xe0
[<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
[<ffffffff811d73a7>] blkdev_direct_IO+0x57/0x60
[<ffffffff811d6320>] ? blkdev_get_block+0x0/0x20
[<ffffffff8113028b>] generic_file_aio_read+0x6bb/0x700
[<ffffffff811d7ef0>] ? blkdev_get+0x10/0x20
[<ffffffff811d7f00>] ? blkdev_open+0x0/0xc0
[<ffffffff81196d47>] ? __dentry_open+0x257/0x380
[<ffffffff811d6891>] blkdev_aio_read+0x51/0x80
[<ffffffff81199a6a>] do_sync_read+0xfa/0x140
[<ffffffff81397a3f>] ? scsi_device_put+0x2f/0x40
[<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff811d66bc>] ? block_ioctl+0x3c/0x40
[<ffffffff811af522>] ? vfs_ioctl+0x22/0xa0
[<ffffffff811af6c4>] ? do_vfs_ioctl+0x84/0x580
[<ffffffff8123aa06>] ? security_file_permission+0x16/0x20
[<ffffffff8119a365>] vfs_read+0xb5/0x1a0
[<ffffffff8119b116>] ? fget_light_pos+0x16/0x50
[<ffffffff8119a6b1>] sys_read+0x51/0xb0
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
Now commands such as pvscan, and any other command that touches the md array, hang. Essentially, the only thing I can do is reboot to get back to a state where I can control the array. Even the reboot hangs at the end, at which point I have to do a hard reset.
Does anyone have any idea how to resolve this? I tried booting from an Ubuntu live CD, but when I tried to assemble the array, everything froze.
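A long hung-task log like the one above is easier to read when condensed to which tasks are blocked and where. Here is a small sketch that does this, assuming the log has the same format as the paste (no timestamp prefix on each line); `hung_tasks` and `top_frames` are names invented for this example.

```shell
#!/bin/sh
# List each distinct task named in a hung-task warning, with how many
# times it was reported. Reads kernel log text on stdin.
hung_tasks() {
    sed -n 's/.*INFO: task \([^ ]*\) blocked for more than.*/\1/p' |
        LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}

# For every "Call Trace:" section, print the first frame that is not
# marked with "?" (i.e. the innermost frame the unwinder is sure about).
top_frames() {
    awk '/Call Trace:/     { want = 1; next }
         want && /\] [^?]/ { print $2; want = 0 }'
}
```

Running `dmesg | hung_tasks` and `dmesg | top_frames | sort | uniq -c` on the log above would show that every blocked task is waiting in `get_active_stripe` in the raid456 module.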
Answer 1
In the end, I managed to recover the array by running the following command:
mdadm --create /dev/md126 --level=6 --raid-devices=14 --name=gigantor:128 /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdo /dev/sdm missing missing /dev/sdc /dev/sdb /dev/sdd /dev/sde --assume-clean
Then I added the two devices back:
mdadm --add /dev/md126 /dev/sdn
mdadm --add /dev/md126 /dev/sdl
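A note for anyone attempting a similar re-create: `--create` with `--assume-clean` writes new superblocks, so it only preserves data if the device order, chunk size, metadata version and data offset match what the array had before. It therefore seems prudent to save each member's `mdadm --examine` output first. The sketch below is one way to do that; the function names, the output directory and the device list are placeholders, and the field labels are assumed to be those printed for v1.x metadata.

```shell
#!/bin/sh
# Save the raw superblock dump of every member before changing anything.
# $1 = output directory (placeholder), remaining args = member devices.
save_superblocks() {
    outdir=$1; shift
    mkdir -p "$outdir" || return 1
    for dev in "$@"; do
        mdadm --examine "$dev" > "$outdir/$(basename "$dev").examine" 2>&1
    done
}

# Condense one saved dump (on stdin) to "role | data offset | chunk size",
# the values that have to be reproduced when re-creating the array.
examine_summary() {
    awk -F' : ' '
        /Device Role :/ { role  = $2 }
        /Data Offset :/ { off   = $2 }
        /Chunk Size :/  { chunk = $2 }
        END { print role " | " off " | " chunk }'
}
```

A possible use would be `save_superblocks /root/md-backup /dev/sd[b-o]` followed by `examine_summary < /root/md-backup/sdf.examine` for each member, to confirm the slot order before running `--create`.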
Then, since my LVM metadata was also corrupted, I had to recover that:
pvcreate --uuid "kvEA4X-vobg-2Ipz-ITF1-ZhtW-Ewej-6liKVx" --restorefile /etc/lvm/backup/vg_data /dev/md126
vgcfgrestore vg_data
lvchange -ay /dev/vg_data/lvm0
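The UUID given to `pvcreate --uuid` has to be the physical volume id recorded in the metadata backup. Here is a minimal sketch for listing those ids, assuming the usual text layout of files under /etc/lvm/backup (a `physical_volumes { pv0 { id = "..." device = "..." } }` section); `pv_ids` is a made-up helper name.

```shell
#!/bin/sh
# Print "<pv name> <id> <device hint>" for each physical volume found in
# an LVM metadata backup file ($1).
pv_ids() {
    awk '
        /physical_volumes[ \t]*\{/ { in_pvs = 1; depth = 1; next }
        in_pvs {
            if ($0 ~ /\{/) { depth++; if (depth == 2) name = $1 }
            if ($1 == "id")     { gsub(/"/, "", $3); id  = $3 }
            if ($1 == "device") { gsub(/"/, "", $3); dev = $3 }
            if ($0 ~ /\}/) {
                depth--
                if (depth == 1) print name, id, dev   # end of one pvN block
                if (depth == 0) in_pvs = 0            # end of the whole section
            }
        }' "$1"
}
```

For example, `pv_ids /etc/lvm/backup/vg_data` should list the id to pass to `pvcreate --uuid` next to the device it was last seen on.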
Then I ran xfs_check on the logical volume, and it told me I needed to run a repair and zero the log:
xfs_repair -L /dev/mapper/vg_data-lvm0
Once it was repaired, I was able to mount the logical volume, and my data was intact.
The array is now rebuilding:
Personalities : [raid6] [raid5] [raid4]
md126 : active raid6 sdm[14] sdl[7] sdk[15] sdn[6] sdj[5] sdi[4] sdg[2] sde[0] sdf[1] sdh[3] sdc[12] sda[11] sdd[13] sdb[10]
46882646016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [14/12] [UUUUUUUU__UUUU]
[==========>..........] recovery = 53.6% (2097061864/3906887168) finish=2448.0min speed=12321K/sec
bitmap: 0/30 pages [0KB], 65536KB chunk
md127 : active raid6 sdac[15] sdaa[13] sdz[2] sdab[0] sdq[9] sdy[14] sdx[4] sdr[8] sdp[10] sdo[16] sdv[6] sdu[7] sdw[5] sds[12] sdt[11]
50789533184 blocks super 1.2 level 6, 512k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]
unused devices: <none>
What a pain...