Debian/testing Linux on encrypted LVM - filesystem-intensive operations freeze the system and kill the systemd journal service


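I use Debian/testing Linux on several machines. For security reasons I always install it on encrypted LVM. I usually use ext4 filesystems of 1TB to 3TB in size.

Unfortunately, this has a very unpleasant side effect.

When performing filesystem-intensive operations (e.g., compiling a few students' Buildroot projects in parallel, archiving 20GB of data into tar.xz format, or installing software such as Xilinx Vivado, which writes ca. 130GB to the filesystem), the system freezes for ca. 2 minutes. Running dmesg then shows nasty messages like the following: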

[ 8648.672075] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
[ 8648.672083] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[ 8648.672140] systemd[1]: systemd-journald.service: Unit process 7708 (systemd-journal) remains running after unit stopped.
[ 8648.672299] systemd[1]: Failed to start Journal Service.
[ 8648.672679] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 14.
[ 8648.672911] systemd[1]: Stopped Journal Service.
[ 8648.672980] systemd[1]: systemd-journald.service: Found left-over process 7708 (systemd-journal) in control group while starting unit. Ignoring.
[ 8648.672983] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
[ 8648.673699] systemd[1]: Starting Journal Service...
[ 8738.922289] systemd[1]: systemd-journald.service: start operation timed out. Terminating.
[ 8828.923063] systemd[1]: systemd-journald.service: State 'stop-sigterm' timed out. Killing.
[ 8828.923105] systemd[1]: systemd-journald.service: Killing process 7854 (systemd-journal) with signal SIGKILL.
[ 8828.923141] systemd[1]: systemd-journald.service: Killing process 7708 (systemd-journal) with signal SIGKILL.
[ 8919.173428] systemd[1]: systemd-journald.service: Processes still around after SIGKILL. Ignoring.
[ 9009.423787] systemd[1]: systemd-journald.service: State 'final-sigterm' timed out. Killing.
[ 9009.423831] systemd[1]: systemd-journald.service: Killing process 7854 (systemd-journal) with signal SIGKILL.
[ 9099.674142] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
[ 9099.674173] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[ 9099.674241] systemd[1]: systemd-journald.service: Unit process 7854 (systemd-journal) remains running after unit stopped.
[ 9099.674477] systemd[1]: Failed to start Journal Service.
[ 9099.674924] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 15.
[ 9099.675102] systemd[1]: Stopped Journal Service.
[ 9099.675185] systemd[1]: systemd-journald.service: Found left-over process 7854 (systemd-journal) in control group while starting unit. Ignoring.
[ 9099.675209] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
[ 9099.675958] systemd[1]: Starting Journal Service...

Interestingly, the problem occurs more often on machines with more RAM (I use machines with 16GB (Intel i7 CPU), 32GB (Intel i7 CPU) or 64GB (Intel Xeon E-2176M CPU) of RAM).

I have found older reports of similar problems.

Based on those reports, I checked that I am using the deadline scheduler:

# cat /sys/block/sda/queue/scheduler
[mq-deadline] none
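
The check above covers only sda. A minimal sketch for listing the active scheduler of every block device follows; the helper name `active_scheduler` is made up for this example, and device-mapper devices may show no bracketed entry at all.

```shell
#!/bin/sh
# Print the active I/O scheduler for every block device, not only sda.

# active_scheduler (hypothetical helper): reads a scheduler line such as
# "[mq-deadline] none" on stdin and prints the bracketed (active) entry.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue            # glob did not match, or not readable
    dev=${f#/sys/block/}               # strip the leading path
    dev=${dev%%/*}                     # keep only the device name
    # Devices without a bracketed entry print an empty value.
    printf '%s: %s\n' "$dev" "$(active_scheduler < "$f")"
done
```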

I am also using a fairly recent kernel:

# uname -a
Linux WZabHP 5.10.0-8-amd64 #1 SMP Debian 5.10.46-2 (2021-07-20) x86_64 GNU/Linux

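I even tried switching to the RT kernel on one of those machines (32GB, i7 CPU), but it did not help. What is worse, the freezes result in filesystem corruption (this happened twice more after I stopped using the RT kernel).

What may be the cause of the described problem? Which settings should I check or adjust to cure it?

Update

I have found another set of posts suggesting that the virtual memory parameters may be related to the described problem.

However, tuning vm.dirty_background_ratio and vm.dirty_ratio seems to require an analysis of the disk speed.

Update 2

Before the problem with the dying journald occurs, the following appears in the kernel log (these are the first timeout-related errors):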
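
Before any tuning, it may help to see what the two ratios mean in absolute terms on a given machine. The sketch below is a rough estimate only: it applies the percentages to MemTotal, whereas the kernel works from available memory, so the real limits are somewhat lower. The helper name `dirty_limit_mib` is made up for this example. If vm.dirty_bytes or vm.dirty_background_bytes is set instead, the corresponding ratio reads as 0.

```shell
#!/bin/sh
# Rough estimate of how much dirty page cache may accumulate before
# background writeback starts and before writers are throttled.
# Assumption: the ratio is applied to MemTotal (the kernel really uses
# "available" memory, so the true figures are somewhat lower).

# dirty_limit_mib MEM_KB RATIO_PERCENT -> threshold in MiB (integer)
dirty_limit_mib() {
    echo $(( $1 * $2 / 100 / 1024 ))
}

if [ -r /proc/meminfo ] && [ -r /proc/sys/vm/dirty_ratio ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    bg=$(cat /proc/sys/vm/dirty_background_ratio)
    hard=$(cat /proc/sys/vm/dirty_ratio)
    echo "background writeback starts near $(dirty_limit_mib "$mem_kb" "$bg") MiB"
    echo "writers are throttled near $(dirty_limit_mib "$mem_kb" "$hard") MiB"
fi
```

Because the limits are percentages, the absolute backlog grows with installed RAM. That might be one reason why the machines with more memory are affected more often, though this is only a guess.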

Jul 26 20:44:38 WZabHP kernel: [  484.449978] INFO: task StreamTrans #6:2335 blocked for more than 120 seconds.
Jul 26 20:44:38 WZabHP kernel: [  484.449981]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:44:38 WZabHP kernel: [  484.449981] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:44:38 WZabHP kernel: [  484.449982] task:StreamTrans #6  state:D stack:    0 pid: 2335 ppid:  1945 flags:0x00004000
Jul 26 20:44:38 WZabHP kernel: [  484.449985] Call Trace:
Jul 26 20:44:38 WZabHP kernel: [  484.449989]  __schedule+0x282/0x870
Jul 26 20:44:38 WZabHP kernel: [  484.449991]  schedule+0x46/0xb0
Jul 26 20:44:38 WZabHP kernel: [  484.449997]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:44:38 WZabHP kernel: [  484.450000]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:44:38 WZabHP kernel: [  484.450013]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:44:38 WZabHP kernel: [  484.450016]  __x64_sys_fsync+0x34/0x60
Jul 26 20:44:38 WZabHP kernel: [  484.450017]  do_syscall_64+0x33/0x80
Jul 26 20:44:38 WZabHP kernel: [  484.450019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:44:38 WZabHP kernel: [  484.450021] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [  484.450022] RSP: 002b:00007fd2b8d528c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:44:38 WZabHP kernel: [  484.450023] RAX: ffffffffffffffda RBX: 00007fd2af8539d0 RCX: 00007fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [  484.450024] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000003f
Jul 26 20:44:38 WZabHP kernel: [  484.450025] RBP: 00007fd2af506ba0 R08: 0000000000000000 R09: 00007fd2cbbe4f28
Jul 26 20:44:38 WZabHP kernel: [  484.450025] R10: 000200050000002a R11: 0000000000000293 R12: 0000000000000000
Jul 26 20:44:38 WZabHP kernel: [  484.450026] R13: 00000000000000c4 R14: 00007fd2caa6a6f1 R15: 00000000000000c4
Jul 26 20:44:38 WZabHP kernel: [  484.450030] INFO: task mozStorage #2:2430 blocked for more than 120 seconds.
Jul 26 20:44:38 WZabHP kernel: [  484.450031]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:44:38 WZabHP kernel: [  484.450031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:44:38 WZabHP kernel: [  484.450032] task:mozStorage #2   state:D stack:    0 pid: 2430 ppid:  1945 flags:0x00004000
Jul 26 20:44:38 WZabHP kernel: [  484.450033] Call Trace:
Jul 26 20:44:38 WZabHP kernel: [  484.450034]  __schedule+0x282/0x870
Jul 26 20:44:38 WZabHP kernel: [  484.450036]  schedule+0x46/0xb0
Jul 26 20:44:38 WZabHP kernel: [  484.450039]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:44:38 WZabHP kernel: [  484.450040]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:44:38 WZabHP kernel: [  484.450047]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:44:38 WZabHP kernel: [  484.450049]  __x64_sys_fsync+0x34/0x60
Jul 26 20:44:38 WZabHP kernel: [  484.450050]  do_syscall_64+0x33/0x80
Jul 26 20:44:38 WZabHP kernel: [  484.450051]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:44:38 WZabHP kernel: [  484.450052] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [  484.450053] RSP: 002b:00007fd2b334c300 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:44:38 WZabHP kernel: [  484.450054] RAX: ffffffffffffffda RBX: 00007fd2b34a6858 RCX: 00007fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [  484.450055] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000073
Jul 26 20:44:38 WZabHP kernel: [  484.450055] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
Jul 26 20:44:38 WZabHP kernel: [  484.450056] R10: 00007ffd1278a080 R11: 0000000000000293 R12: 00000000000001f7
Jul 26 20:44:38 WZabHP kernel: [  484.450057] R13: 00007fd2aeffbd40 R14: 00007fd2b34a67a0 R15: 000000000000004e
Jul 26 20:48:40 WZabHP kernel: [  726.109957] INFO: task jbd2/dm-2-8:591 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.109959]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.109960] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.109961] task:jbd2/dm-2-8     state:D stack:    0 pid:  591 ppid:     2 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [  726.109963] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.109967]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.109969]  ? out_of_line_wait_on_bit_lock+0xb0/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.109970]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.109971]  io_schedule+0x42/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.109973]  bit_wait_io+0xd/0x50
Jul 26 20:48:40 WZabHP kernel: [  726.109974]  __wait_on_bit+0x2a/0x90
Jul 26 20:48:40 WZabHP kernel: [  726.109975]  out_of_line_wait_on_bit+0x92/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.109977]  ? var_wake_function+0x20/0x20
Jul 26 20:48:40 WZabHP kernel: [  726.109982]  jbd2_journal_commit_transaction+0x16b3/0x1ad0 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.109987]  kjournald2+0xab/0x270 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.109988]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.109991]  ? load_superblock.part.0+0xb0/0xb0 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.109993]  kthread+0x11b/0x140
Jul 26 20:48:40 WZabHP kernel: [  726.109994]  ? __kthread_bind_mask+0x60/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.109996]  ret_from_fork+0x22/0x30
Jul 26 20:48:40 WZabHP kernel: [  726.109999] INFO: task journal-offline:2579 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.109999]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110001] task:journal-offline state:D stack:    0 pid: 2579 ppid:     1 flags:0x00004324
Jul 26 20:48:40 WZabHP kernel: [  726.110002] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110003]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110004]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110007]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110009]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110021]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110024]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110025]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110027]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110029] RIP: 0033:0x7f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [  726.110030] RSP: 002b:00007f68be229cf0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110031] RAX: ffffffffffffffda RBX: 000056427bbd8870 RCX: 00007f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [  726.110032] RDX: 00007f68c5a2b000 RSI: 00007f68c7649414 RDI: 000000000000001a
Jul 26 20:48:40 WZabHP kernel: [  726.110032] RBP: 00007f68c764bd30 R08: 0000000000000000 R09: 00007f68be22a700
Jul 26 20:48:40 WZabHP kernel: [  726.110033] R10: 0000000000000014 R11: 0000000000000293 R12: 0000000000000002
Jul 26 20:48:40 WZabHP kernel: [  726.110034] R13: 00007ffe691a51df R14: 00007f68be229e00 R15: 000056427bbf9920
Jul 26 20:48:40 WZabHP kernel: [  726.110035] INFO: task journal-offline:2580 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110036]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110036] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110037] task:journal-offline state:D stack:    0 pid: 2580 ppid:     1 flags:0x00004324
Jul 26 20:48:40 WZabHP kernel: [  726.110038] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110039]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110040]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110043]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110045]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110051]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110053]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110054]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110056]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110056] RIP: 0033:0x7f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [  726.110057] RSP: 002b:00007f68bda28cf0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110058] RAX: ffffffffffffffda RBX: 000056427bc095d0 RCX: 00007f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [  726.110059] RDX: 00007f68c622b000 RSI: 00007f68c7649414 RDI: 0000000000000026
Jul 26 20:48:40 WZabHP kernel: [  726.110059] RBP: 00007f68c764bd30 R08: 0000000000000000 R09: 00007f68bda29700
Jul 26 20:48:40 WZabHP kernel: [  726.110060] R10: 0000000000000014 R11: 0000000000000293 R12: 0000000000000002
Jul 26 20:48:40 WZabHP kernel: [  726.110061] R13: 00007ffe691a51df R14: 00007f68bda28e00 R15: 0000000000802000
Jul 26 20:48:40 WZabHP kernel: [  726.110064] INFO: task NetworkManager:1090 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110065]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110065] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110066] task:NetworkManager  state:D stack:    0 pid: 1090 ppid:     1 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [  726.110067] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110068]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110069]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110072]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110073]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110080]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110081]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110083]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110084]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110085] RIP: 0033:0x7fd15da1c5eb
Jul 26 20:48:40 WZabHP kernel: [  726.110085] RSP: 002b:00007ffedee571c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110086] RAX: ffffffffffffffda RBX: 00007fd15cd06928 RCX: 00007fd15da1c5eb
Jul 26 20:48:40 WZabHP kernel: [  726.110087] RDX: 0000000000000184 RSI: 000055c28ee17560 RDI: 000000000000001a
Jul 26 20:48:40 WZabHP kernel: [  726.110088] RBP: 000000000000001a R08: 0000000000000000 R09: 00007ffedee572d0
Jul 26 20:48:40 WZabHP kernel: [  726.110088] R10: 0000000000000184 R11: 0000000000000293 R12: 00007ffedee572d0
Jul 26 20:48:40 WZabHP kernel: [  726.110089] R13: 000055c28ed78fd0 R14: 000055c28ee176e4 R15: 0000000000000000
Jul 26 20:48:40 WZabHP kernel: [  726.110104] INFO: task Permission:2299 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110105]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110106] task:Permission      state:D stack:    0 pid: 2299 ppid:  1945 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [  726.110107] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110108]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110109]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110112]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110113]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110120]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110122]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110123]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110124]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110125] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110125] RSP: 002b:00007fd2beffe3e0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110126] RAX: ffffffffffffffda RBX: 00007fd2bf0582e0 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110127] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000089
Jul 26 20:48:40 WZabHP kernel: [  726.110128] RBP: 0000000000000002 R08: 0000000000000000 R09: 00007fd2cec47af0
Jul 26 20:48:40 WZabHP kernel: [  726.110128] R10: 00007ffd1278a080 R11: 0000000000000293 R12: 00000000000001f5
Jul 26 20:48:40 WZabHP kernel: [  726.110129] R13: 0000000000000000 R14: 0000000000010400 R15: 000000000000003f
Jul 26 20:48:40 WZabHP kernel: [  726.110131] INFO: task StreamTrans #3:2319 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110132]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110132] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110133] task:StreamTrans #3  state:D stack:    0 pid: 2319 ppid:  1945 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [  726.110134] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110135]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110136]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110139]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110140]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110146]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110148]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110149]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110150]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110151] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110152] RSP: 002b:00007fd2bba8e8c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110153] RAX: ffffffffffffffda RBX: 00007fd2af8539d0 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110153] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000003f
Jul 26 20:48:40 WZabHP kernel: [  726.110154] RBP: 00007fd2aecd16d0 R08: 0000000000000000 R09: 00007fd2cbbe4f28
Jul 26 20:48:40 WZabHP kernel: [  726.110155] R10: 0002000500000049 R11: 0000000000000293 R12: 0000000000000000
Jul 26 20:48:40 WZabHP kernel: [  726.110155] R13: 00000000000000c4 R14: 00007fd2caa6a6f1 R15: 00000000000000c4
Jul 26 20:48:40 WZabHP kernel: [  726.110157] INFO: task QuotaManager IO:2333 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110158]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110159] task:QuotaManager IO state:D stack:    0 pid: 2333 ppid:  1945 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [  726.110160] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110161]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110162]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110165]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110166]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110172]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110173]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110175]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110176]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110177] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110177] RSP: 002b:00007fd2b8d93020 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110178] RAX: ffffffffffffffda RBX: 00007fd2ba6f52e0 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110179] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000047
Jul 26 20:48:40 WZabHP kernel: [  726.110179] RBP: 0000000000000002 R08: 0000000000000000 R09: 00007fd2b8c1b910
Jul 26 20:48:40 WZabHP kernel: [  726.110180] R10: 00007ffd1278a080 R11: 0000000000000293 R12: 00000000000001f5
Jul 26 20:48:40 WZabHP kernel: [  726.110180] R13: 0000000000000000 R14: 0000000000000a00 R15: 000000000000003b
Jul 26 20:48:40 WZabHP kernel: [  726.110184] INFO: task mozStorage #2:2430 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110185]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110185] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110186] task:mozStorage #2   state:D stack:    0 pid: 2430 ppid:  1945 flags:0x00000000
Jul 26 20:48:40 WZabHP kernel: [  726.110187] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110188]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110189]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110192]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110193]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110196]  __jbd2_journal_force_commit+0x5d/0xb0 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110199]  jbd2_journal_force_commit+0x1d/0x30 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110205]  ext4_sync_file+0x2c4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110206]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110207]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110209]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110209] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110210] RSP: 002b:00007fd2b334c300 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110211] RAX: ffffffffffffffda RBX: 00007fd2b34a6858 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110212] RDX: 00000000000a0000 RSI: 00007fd2b334c0f0 RDI: 0000000000000072
Jul 26 20:48:40 WZabHP kernel: [  726.110212] RBP: 0000000000000002 R08: 0000000000000000 R09: 000000000000003e
Jul 26 20:48:40 WZabHP kernel: [  726.110213] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
Jul 26 20:48:40 WZabHP kernel: [  726.110213] R13: 00007fd2aeffbd40 R14: 00007fd2b34a67a0 R15: 000000000000004e

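So the problems are related to slow syncing of the filesystem. The disk throughput is sufficient. However, the problem appears on installations with encrypted volumes. The CPU performance (especially on the Xeon machine) should be sufficient to perform the encryption fast enough.

Another scarce resource needed for encryption is truly random data. Indeed, the machines I have access to right now seem to have neither rng-tools5 nor rng-tools-debian installed. That is probably the cause of the problem.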
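
All the blocked tasks in the trace are waiting in fsync for a jbd2 commit. One way to see whether a large backlog of dirty pages is queued ahead of them is to sample the Dirty and Writeback counters during the heavy write. This is a minimal sketch, and the helper name `meminfo_kb` is made up for this example.

```shell
#!/bin/sh
# meminfo_kb FIELD [FILE]: print the value (in kB) of a /proc/meminfo field.
# The optional FILE argument exists only to make the helper testable.
meminfo_kb() {
    awk -v key="$1:" '$1 == key {print $2}' "${2:-/proc/meminfo}"
}

# Take one sample; rerun it (or wrap it in a loop) while the write is running.
if [ -r /proc/meminfo ]; then
    echo "Dirty=$(meminfo_kb Dirty) kB  Writeback=$(meminfo_kb Writeback) kB"
fi
```

If Dirty climbs to several gigabytes and then drains slowly while the system is frozen, that would point at writeback rather than at the CPU.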
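
The entropy hypothesis can be checked before installing anything by reading the kernel's entropy estimate while the heavy write is running. The sketch below also reports which of the packages mentioned above are installed. The threshold `LOW_WATER` and the helper `entropy_verdict` are arbitrary names and values chosen for illustration, not recommended settings.

```shell
#!/bin/sh
# Sketch: check the entropy hypothesis. LOW_WATER is an arbitrary,
# hypothetical threshold used for illustration only.
LOW_WATER=${LOW_WATER:-256}

# entropy_verdict BITS -> "low" or "ok" relative to LOW_WATER
entropy_verdict() {
    if [ "$1" -lt "$LOW_WATER" ]; then echo low; else echo ok; fi
}

f=/proc/sys/kernel/random/entropy_avail
if [ -r "$f" ]; then
    bits=$(cat "$f")
    echo "entropy_avail=$bits ($(entropy_verdict "$bits"))"
fi

# Report which of the packages mentioned in the post are installed.
if command -v dpkg-query >/dev/null 2>&1; then
    for pkg in rng-tools5 rng-tools-debian haveged; do
        if dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null \
                | grep -q '^install ok installed'; then
            echo "$pkg: installed"
        else
            echo "$pkg: not installed"
        fi
    done
fi
```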

Answer 1

I have found a similar problem described at https://askubuntu.com/questions/1406444/what-is-causing-my-system-to-stall-freeze-corrupt-data-when-using-lvm-luks .

One of the fixes described there eliminated the system freezes during large writes:

cryptsetup  --perf-no_write_workqueue refresh name_of_the_mapping

(in my case: cryptsetup --perf-no_write_workqueue refresh sda6_crypt)

The option can be made permanent with:

cryptsetup  --perf-no_write_workqueue --persistent refresh name_of_the_mapping
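
To confirm that the mapping really runs with the flag after the refresh, the device-mapper table can be inspected. This is a minimal sketch: `MAPPING` and `has_crypt_flag` are names made up for this example, and the default mapping name is just the placeholder used above. As far as I know, --persistent stores the flag only in LUKS2 headers, and the flag itself needs a kernel with dm-crypt workqueue bypass support (5.9 or later).

```shell
#!/bin/sh
# Sketch: confirm that the dm-crypt mapping runs with no_write_workqueue.
# MAPPING defaults to the placeholder name used in the post.
MAPPING=${MAPPING:-name_of_the_mapping}

# has_crypt_flag FLAG (hypothetical helper): succeeds if FLAG appears as a
# whole word in the device-mapper table line read from stdin.
has_crypt_flag() {
    tr -s ' ' '\n' | grep -qx -- "$1"
}

# dmsetup needs root and an existing mapping; skip quietly otherwise.
if command -v dmsetup >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    if dmsetup table "$MAPPING" 2>/dev/null | has_crypt_flag no_write_workqueue; then
        echo "$MAPPING: no_write_workqueue is active"
    else
        echo "$MAPPING: flag not active (or mapping not found)"
    fi
fi
```

Run it as root, for example with MAPPING=sda6_crypt. `cryptsetup status` on the mapping should also list the active flags.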

I have reported the problem as a bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1016474 . However, I doubt that this is a bug in the cryptsetup package.

So the complete solution is:

  1. Install the haveged package.
  2. Use the --perf-no_write_workqueue option for the encrypted LVM mapping.

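Answer 2

The problem seems to be related to a lack of entropy in the system (this was found in the comments). That would explain why the problem occurs on systems with encrypted LVM: true random bytes are needed to encrypt the written data.

I have installed rng-tools5 and haveged (as suggested by @michalng, thanks!), and the problem seems to be reduced. However, more thorough testing is needed to verify whether this is really the right solution for the problem.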
