Debian/Linux auf einem verschlüsselten LVM testen – dateisystemintensive Vorgänge führen zum Einfrieren des Systems und zum Absturz des System-Journald-Dienstes

Debian/Linux auf einem verschlüsselten LVM testen – dateisystemintensive Vorgänge führen zum Einfrieren des Systems und zum Absturz des System-Journald-Dienstes

Ich verwende Debian/Test-Linux auf meinen mehreren Maschinen. Aus Sicherheitsgründen installiere ich es immer auf einem verschlüsselten LVM. Normalerweise verwende ich die ext4-Dateisysteme mit einer Größe von 1 TB bis 3 TB.

Leider verursacht es einige sehr unangenehme Nebenwirkungen.

Wenn ich dateisystemintensive Operationen durchführe (z. B. paralleles Kompilieren einiger Buildroot-Projekte von Studenten, Archivieren von 20 GB Daten im tar.xz-Format oder Installieren einer Software wie Xilinx Vivado, die ca. 130 GB in das Dateisystem schreibt), friert das System regelmäßig für ca. 2 Minuten ein. Wenn ich dmesg ausführe, erhalte ich die folgenden nervigen Meldungen:

[ 8648.672075] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
[ 8648.672083] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[ 8648.672140] systemd[1]: systemd-journald.service: Unit process 7708 (systemd-journal) remains running after unit stopped.
[ 8648.672299] systemd[1]: Failed to start Journal Service.
[ 8648.672679] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 14.
[ 8648.672911] systemd[1]: Stopped Journal Service.
[ 8648.672980] systemd[1]: systemd-journald.service: Found left-over process 7708 (systemd-journal) in control group while starting unit. Ignoring.
[ 8648.672983] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
[ 8648.673699] systemd[1]: Starting Journal Service...
[ 8738.922289] systemd[1]: systemd-journald.service: start operation timed out. Terminating.
[ 8828.923063] systemd[1]: systemd-journald.service: State 'stop-sigterm' timed out. Killing.
[ 8828.923105] systemd[1]: systemd-journald.service: Killing process 7854 (systemd-journal) with signal SIGKILL.
[ 8828.923141] systemd[1]: systemd-journald.service: Killing process 7708 (systemd-journal) with signal SIGKILL.
[ 8919.173428] systemd[1]: systemd-journald.service: Processes still around after SIGKILL. Ignoring.
[ 9009.423787] systemd[1]: systemd-journald.service: State 'final-sigterm' timed out. Killing.
[ 9009.423831] systemd[1]: systemd-journald.service: Killing process 7854 (systemd-journal) with signal SIGKILL.
[ 9099.674142] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
[ 9099.674173] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[ 9099.674241] systemd[1]: systemd-journald.service: Unit process 7854 (systemd-journal) remains running after unit stopped.
[ 9099.674477] systemd[1]: Failed to start Journal Service.
[ 9099.674924] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 15.
[ 9099.675102] systemd[1]: Stopped Journal Service.
[ 9099.675185] systemd[1]: systemd-journald.service: Found left-over process 7854 (systemd-journal) in control group while starting unit. Ignoring.
[ 9099.675209] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
[ 9099.675958] systemd[1]: Starting Journal Service...

Interessant ist, dass das Problem häufiger auf Maschinen mit größerem RAM auftritt (ich verwende Maschinen mit folgenden RAM-Kapazitäten: 16 GB (Intel i7-CPU), 32 GB (Intel i7-CPU) oder 64 GB (Intel Xeon E-2176M-CPU)).

Ich habe alte ähnliche Problemberichte gefunden wie:

Auf Grundlage des oben Gesagten habe ich geprüft, dass ich den Terminplaner verwende:

# cat /sys/block/sda/queue/scheduler
[mq-deadline] none

Ich verwende auch einen ziemlich neuen Kernel:

# uname -a
Linux WZabHP 5.10.0-8-amd64 #1 SMP Debian 5.10.46-2 (2021-07-20) x86_64 GNU/Linux

Ich habe sogar versucht, auf einer dieser Maschinen (32 GB, i7-CPU) auf den RT-Kernel umzustellen, aber das hat nicht geholfen. Schlimmer noch: Das Einfrieren führt zu einer Beschädigung des Dateisystems (der Effekt wiederholte sich zweimal, danach habe ich aufgehört, den RT-Kernel zu verwenden).

Was kann die Ursache für das beschriebene Problem sein? Welche Einstellungen muss ich überprüfen/anpassen, um das Problem zu beheben?

Aktualisieren

Ich habe noch eine weitere Reihe von Beiträgen gefunden, die darauf schließen lassen, dass die Parameter des virtuellen Speichers mit dem beschriebenen Problem in Verbindung stehen könnten:

Es scheint jedoch, dass für die Feinabstimmung vm.dirty_background_ratioeine vm.dirty_ratioAnalyse der Festplattengeschwindigkeit erforderlich ist.

Aktualisierung 2 Bevor das Problem mit dem Sterben journaldauftritt, erscheint Folgendes im Kernel-Protokoll (das sind die ersten Timeout-bezogenen Fehler):

Jul 26 20:44:38 WZabHP kernel: [  484.449978] INFO: task StreamTrans #6:2335 blocked for more than 120 seconds.
Jul 26 20:44:38 WZabHP kernel: [  484.449981]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:44:38 WZabHP kernel: [  484.449981] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:44:38 WZabHP kernel: [  484.449982] task:StreamTrans #6  state:D stack:    0 pid: 2335 ppid:  1945 flags:0x00004000
Jul 26 20:44:38 WZabHP kernel: [  484.449985] Call Trace:
Jul 26 20:44:38 WZabHP kernel: [  484.449989]  __schedule+0x282/0x870
Jul 26 20:44:38 WZabHP kernel: [  484.449991]  schedule+0x46/0xb0
Jul 26 20:44:38 WZabHP kernel: [  484.449997]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:44:38 WZabHP kernel: [  484.450000]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:44:38 WZabHP kernel: [  484.450013]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:44:38 WZabHP kernel: [  484.450016]  __x64_sys_fsync+0x34/0x60
Jul 26 20:44:38 WZabHP kernel: [  484.450017]  do_syscall_64+0x33/0x80
Jul 26 20:44:38 WZabHP kernel: [  484.450019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:44:38 WZabHP kernel: [  484.450021] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [  484.450022] RSP: 002b:00007fd2b8d528c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:44:38 WZabHP kernel: [  484.450023] RAX: ffffffffffffffda RBX: 00007fd2af8539d0 RCX: 00007fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [  484.450024] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000003f
Jul 26 20:44:38 WZabHP kernel: [  484.450025] RBP: 00007fd2af506ba0 R08: 0000000000000000 R09: 00007fd2cbbe4f28
Jul 26 20:44:38 WZabHP kernel: [  484.450025] R10: 000200050000002a R11: 0000000000000293 R12: 0000000000000000
Jul 26 20:44:38 WZabHP kernel: [  484.450026] R13: 00000000000000c4 R14: 00007fd2caa6a6f1 R15: 00000000000000c4
Jul 26 20:44:38 WZabHP kernel: [  484.450030] INFO: task mozStorage #2:2430 blocked for more than 120 seconds.
Jul 26 20:44:38 WZabHP kernel: [  484.450031]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:44:38 WZabHP kernel: [  484.450031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:44:38 WZabHP kernel: [  484.450032] task:mozStorage #2   state:D stack:    0 pid: 2430 ppid:  1945 flags:0x00004000
Jul 26 20:44:38 WZabHP kernel: [  484.450033] Call Trace:
Jul 26 20:44:38 WZabHP kernel: [  484.450034]  __schedule+0x282/0x870
Jul 26 20:44:38 WZabHP kernel: [  484.450036]  schedule+0x46/0xb0
Jul 26 20:44:38 WZabHP kernel: [  484.450039]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:44:38 WZabHP kernel: [  484.450040]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:44:38 WZabHP kernel: [  484.450047]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:44:38 WZabHP kernel: [  484.450049]  __x64_sys_fsync+0x34/0x60
Jul 26 20:44:38 WZabHP kernel: [  484.450050]  do_syscall_64+0x33/0x80
Jul 26 20:44:38 WZabHP kernel: [  484.450051]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:44:38 WZabHP kernel: [  484.450052] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [  484.450053] RSP: 002b:00007fd2b334c300 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:44:38 WZabHP kernel: [  484.450054] RAX: ffffffffffffffda RBX: 00007fd2b34a6858 RCX: 00007fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [  484.450055] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000073
Jul 26 20:44:38 WZabHP kernel: [  484.450055] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
Jul 26 20:44:38 WZabHP kernel: [  484.450056] R10: 00007ffd1278a080 R11: 0000000000000293 R12: 00000000000001f7
Jul 26 20:44:38 WZabHP kernel: [  484.450057] R13: 00007fd2aeffbd40 R14: 00007fd2b34a67a0 R15: 000000000000004e
Jul 26 20:48:40 WZabHP kernel: [  726.109957] INFO: task jbd2/dm-2-8:591 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.109959]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.109960] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.109961] task:jbd2/dm-2-8     state:D stack:    0 pid:  591 ppid:     2 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [  726.109963] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.109967]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.109969]  ? out_of_line_wait_on_bit_lock+0xb0/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.109970]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.109971]  io_schedule+0x42/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.109973]  bit_wait_io+0xd/0x50
Jul 26 20:48:40 WZabHP kernel: [  726.109974]  __wait_on_bit+0x2a/0x90
Jul 26 20:48:40 WZabHP kernel: [  726.109975]  out_of_line_wait_on_bit+0x92/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.109977]  ? var_wake_function+0x20/0x20
Jul 26 20:48:40 WZabHP kernel: [  726.109982]  jbd2_journal_commit_transaction+0x16b3/0x1ad0 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.109987]  kjournald2+0xab/0x270 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.109988]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.109991]  ? load_superblock.part.0+0xb0/0xb0 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.109993]  kthread+0x11b/0x140
Jul 26 20:48:40 WZabHP kernel: [  726.109994]  ? __kthread_bind_mask+0x60/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.109996]  ret_from_fork+0x22/0x30
Jul 26 20:48:40 WZabHP kernel: [  726.109999] INFO: task journal-offline:2579 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.109999]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110001] task:journal-offline state:D stack:    0 pid: 2579 ppid:     1 flags:0x00004324
Jul 26 20:48:40 WZabHP kernel: [  726.110002] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110003]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110004]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110007]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110009]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110021]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110024]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110025]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110027]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110029] RIP: 0033:0x7f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [  726.110030] RSP: 002b:00007f68be229cf0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110031] RAX: ffffffffffffffda RBX: 000056427bbd8870 RCX: 00007f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [  726.110032] RDX: 00007f68c5a2b000 RSI: 00007f68c7649414 RDI: 000000000000001a
Jul 26 20:48:40 WZabHP kernel: [  726.110032] RBP: 00007f68c764bd30 R08: 0000000000000000 R09: 00007f68be22a700
Jul 26 20:48:40 WZabHP kernel: [  726.110033] R10: 0000000000000014 R11: 0000000000000293 R12: 0000000000000002
Jul 26 20:48:40 WZabHP kernel: [  726.110034] R13: 00007ffe691a51df R14: 00007f68be229e00 R15: 000056427bbf9920
Jul 26 20:48:40 WZabHP kernel: [  726.110035] INFO: task journal-offline:2580 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110036]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110036] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110037] task:journal-offline state:D stack:    0 pid: 2580 ppid:     1 flags:0x00004324
Jul 26 20:48:40 WZabHP kernel: [  726.110038] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110039]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110040]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110043]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110045]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110051]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110053]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110054]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110056]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110056] RIP: 0033:0x7f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [  726.110057] RSP: 002b:00007f68bda28cf0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110058] RAX: ffffffffffffffda RBX: 000056427bc095d0 RCX: 00007f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [  726.110059] RDX: 00007f68c622b000 RSI: 00007f68c7649414 RDI: 0000000000000026
Jul 26 20:48:40 WZabHP kernel: [  726.110059] RBP: 00007f68c764bd30 R08: 0000000000000000 R09: 00007f68bda29700
Jul 26 20:48:40 WZabHP kernel: [  726.110060] R10: 0000000000000014 R11: 0000000000000293 R12: 0000000000000002
Jul 26 20:48:40 WZabHP kernel: [  726.110061] R13: 00007ffe691a51df R14: 00007f68bda28e00 R15: 0000000000802000
Jul 26 20:48:40 WZabHP kernel: [  726.110064] INFO: task NetworkManager:1090 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110065]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110065] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110066] task:NetworkManager  state:D stack:    0 pid: 1090 ppid:     1 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [  726.110067] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110068]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110069]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110072]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110073]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110080]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110081]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110083]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110084]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110085] RIP: 0033:0x7fd15da1c5eb
Jul 26 20:48:40 WZabHP kernel: [  726.110085] RSP: 002b:00007ffedee571c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110086] RAX: ffffffffffffffda RBX: 00007fd15cd06928 RCX: 00007fd15da1c5eb
Jul 26 20:48:40 WZabHP kernel: [  726.110087] RDX: 0000000000000184 RSI: 000055c28ee17560 RDI: 000000000000001a
Jul 26 20:48:40 WZabHP kernel: [  726.110088] RBP: 000000000000001a R08: 0000000000000000 R09: 00007ffedee572d0
Jul 26 20:48:40 WZabHP kernel: [  726.110088] R10: 0000000000000184 R11: 0000000000000293 R12: 00007ffedee572d0
Jul 26 20:48:40 WZabHP kernel: [  726.110089] R13: 000055c28ed78fd0 R14: 000055c28ee176e4 R15: 0000000000000000
Jul 26 20:48:40 WZabHP kernel: [  726.110104] INFO: task Permission:2299 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110105]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110106] task:Permission      state:D stack:    0 pid: 2299 ppid:  1945 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [  726.110107] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110108]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110109]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110112]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110113]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110120]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110122]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110123]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110124]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110125] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110125] RSP: 002b:00007fd2beffe3e0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110126] RAX: ffffffffffffffda RBX: 00007fd2bf0582e0 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110127] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000089
Jul 26 20:48:40 WZabHP kernel: [  726.110128] RBP: 0000000000000002 R08: 0000000000000000 R09: 00007fd2cec47af0
Jul 26 20:48:40 WZabHP kernel: [  726.110128] R10: 00007ffd1278a080 R11: 0000000000000293 R12: 00000000000001f5
Jul 26 20:48:40 WZabHP kernel: [  726.110129] R13: 0000000000000000 R14: 0000000000010400 R15: 000000000000003f
Jul 26 20:48:40 WZabHP kernel: [  726.110131] INFO: task StreamTrans #3:2319 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110132]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110132] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110133] task:StreamTrans #3  state:D stack:    0 pid: 2319 ppid:  1945 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [  726.110134] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110135]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110136]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110139]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110140]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110146]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110148]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110149]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110150]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110151] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110152] RSP: 002b:00007fd2bba8e8c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110153] RAX: ffffffffffffffda RBX: 00007fd2af8539d0 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110153] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000003f
Jul 26 20:48:40 WZabHP kernel: [  726.110154] RBP: 00007fd2aecd16d0 R08: 0000000000000000 R09: 00007fd2cbbe4f28
Jul 26 20:48:40 WZabHP kernel: [  726.110155] R10: 0002000500000049 R11: 0000000000000293 R12: 0000000000000000
Jul 26 20:48:40 WZabHP kernel: [  726.110155] R13: 00000000000000c4 R14: 00007fd2caa6a6f1 R15: 00000000000000c4
Jul 26 20:48:40 WZabHP kernel: [  726.110157] INFO: task QuotaManager IO:2333 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110158]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110159] task:QuotaManager IO state:D stack:    0 pid: 2333 ppid:  1945 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [  726.110160] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110161]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110162]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110165]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110166]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110172]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110173]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110175]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110176]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110177] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110177] RSP: 002b:00007fd2b8d93020 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110178] RAX: ffffffffffffffda RBX: 00007fd2ba6f52e0 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110179] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000047
Jul 26 20:48:40 WZabHP kernel: [  726.110179] RBP: 0000000000000002 R08: 0000000000000000 R09: 00007fd2b8c1b910
Jul 26 20:48:40 WZabHP kernel: [  726.110180] R10: 00007ffd1278a080 R11: 0000000000000293 R12: 00000000000001f5
Jul 26 20:48:40 WZabHP kernel: [  726.110180] R13: 0000000000000000 R14: 0000000000000a00 R15: 000000000000003b
Jul 26 20:48:40 WZabHP kernel: [  726.110184] INFO: task mozStorage #2:2430 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110185]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110185] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110186] task:mozStorage #2   state:D stack:    0 pid: 2430 ppid:  1945 flags:0x00000000
Jul 26 20:48:40 WZabHP kernel: [  726.110187] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110188]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110189]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110192]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110193]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110196]  __jbd2_journal_force_commit+0x5d/0xb0 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110199]  jbd2_journal_force_commit+0x1d/0x30 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110205]  ext4_sync_file+0x2c4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110206]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110207]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110209]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110209] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110210] RSP: 002b:00007fd2b334c300 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110211] RAX: ffffffffffffffda RBX: 00007fd2b34a6858 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110212] RDX: 00000000000a0000 RSI: 00007fd2b334c0f0 RDI: 0000000000000072
Jul 26 20:48:40 WZabHP kernel: [  726.110212] RBP: 0000000000000002 R08: 0000000000000000 R09: 000000000000003e
Jul 26 20:48:40 WZabHP kernel: [  726.110213] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
Jul 26 20:48:40 WZabHP kernel: [  726.110213] R13: 00007fd2aeffbd40 R14: 00007fd2b34a67a0 R15: 000000000000004e

Das Problem hängt also mit der zu geringen Synchronisation des Dateisystems zusammen. Der Datendurchsatz der Festplatte ist ausreichend. Das Problem tritt jedoch bei Installationen mit verschlüsselten Volumes auf. Die CPU-Leistung (insbesondere bei der Xeon-Maschine) ist mehr als ausreichend, um die Verschlüsselung schnell genug durchzuführen.

Eine weitere knappe Ressource, die für die Verschlüsselung benötigt wird, sind die echten Zufallsdaten. Tatsächlich schien auf dem System, auf das ich jetzt zugreifen kann, weder rng-tools5 noch rng-tools-debian installiert zu sein. Das ist wahrscheinlich die Ursache des Problems.

Antwort1

Ich habe das ähnliche Problem gefunden, das inhttps://askubuntu.com/questions/1406444/was-führt-dazu-dass-mein-System-stalliert-einfriert-beschädigte-Daten-bei-der-Verwendung-von-lvm-luks.

Eine der hier beschriebenen Modifikationen hat das Einfrieren des Systems bei massiven Schreibvorgängen verhindert.

cryptsetup  --perf-no_write_workqueue refresh name_of_the_mapping

(in meinem Fall war es: cryptsetup --perf-no_write_workqueue refresh sda6_crypt)

Die Option kann wie folgt dauerhaft gesetzt werden:

cryptsetup  --perf-no_write_workqueue --persistent refresh name_of_the_mapping

Ich habe das Problem als Fehler gemeldethttps://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1016474, es ist jedoch fraglich, ob es sich hierbei um einen Fehler im Cryptsetup-Paket handelt.

Die vollständige Lösung scheint also zu sein:

  1. Installation des havegedPakets
  2. Verwenden der --perf-no_write_workqueueOption im Mapping des verschlüsselten LVM.

Antwort2

Es scheint, dass das Problem (wie in den Kommentaren beschrieben) mit einem Mangel an Entropie im System zusammenhängt. Das erklärt, warum das Problem in Systemen mit verschlüsseltem LVM auftritt. Die echten Zufallsbytes werden für die Verschlüsselung der geschriebenen Daten benötigt.

Ich habe das installiert rng-tools5und haveged(wie @michalng vorschlägt, danke!), und es scheint, dass das Problem reduziert ist. Es sind jedoch weitere gründliche Tests erforderlich, um zu bestätigen, dass dies wirklich die richtige Lösung für dieses Problem ist.

verwandte Informationen