Troubleshooting an Ubuntu 22.10 hard crash on a cool system with no heavy load

I am running Ubuntu 22.10 on my home desktop machine.

My system crashes at seemingly random intervals, with no immediate cause that I can identify. Without any warning, and without any specific action that could trigger it, the computer simply shuts off and reboots, as if someone had pressed the "Reset" button on the case/BIOS.

This is my sensors output immediately after one of these crashes:

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +58.6°C  
Tccd1:        +46.2°C  
Tccd2:        +44.5°C  

nvme-pci-2200
Adapter: PCI adapter
Composite:    +46.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
Sensor 1:     +46.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +56.9°C  (low  = -273.1°C, high = +65261.8°C)

nvme-pci-2300
Adapter: PCI adapter
Composite:    +56.9°C  (low  =  -0.1°C, high = +89.8°C)
                       (crit = +94.8°C)

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +42.0°C  

nct6797-isa-0a20
Adapter: ISA adapter
in0:            1.26 V  (min =  +0.00 V, max =  +1.74 V)
in1:          1000.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:            3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:            3.31 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:            1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:          160.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:          672.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:            3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:            3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:            1.84 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:           0.00 V  (min =  +0.00 V, max =  +0.00 V)
in11:         456.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:           1.10 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:         680.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:           1.53 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:            0 RPM  (min =    0 RPM)
fan2:         1086 RPM  (min =    0 RPM)
fan3:            0 RPM  (min =    0 RPM)
fan4:            0 RPM  (min =    0 RPM)
fan5:          699 RPM  (min =    0 RPM)
fan6:          969 RPM  (min =    0 RPM)
fan7:         1422 RPM  (min =    0 RPM)
SYSTIN:        +47.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = CPU diode
CPUTIN:        +41.0°C  (high = +108.0°C, hyst = +90.0°C)  sensor = thermistor
AUXTIN0:       +45.0°C  (high = +108.0°C, hyst = +90.0°C)  sensor = thermistor
AUXTIN1:      -128.0°C    sensor = thermistor
AUXTIN2:       +62.0°C    sensor = thermistor
AUXTIN3:        -2.0°C    sensor = thermistor
Virtual_TEMP:  +58.0°C  
Virtual_TEMP:  +59.0°C  
Virtual_TEMP:  +58.0°C  
Virtual_TEMP:  +58.0°C  
TSI0_TEMP:     +58.5°C  
intrusion0:   ALARM
intrusion1:   ALARM
beep_enable:  disabled

nvme-pci-0100
Adapter: PCI adapter
Composite:    +45.9°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +45.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +48.9°C  (low  = -273.1°C, high = +65261.8°C)
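
A note on the odd NVMe limits above: the values -273.1°C and +65261.8°C appear consistent with raw 16-bit Kelvin fields that were never set (0 and 0xFFFF), rather than with real thresholds. That reading is an assumption based on the arithmetic only, which this small sketch reproduces:

```python
def kelvin_to_celsius(kelvin):
    """Convert a temperature in kelvin to degrees Celsius."""
    return kelvin - 273.15


# Assumption: the limits are unsigned 16-bit Kelvin values,
# so "unset" would show up as either 0 or 0xFFFF (65535).
for raw in (0, 0xFFFF):
    print(f"raw {raw:>5} K -> {kelvin_to_celsius(raw):+.2f} °C")
```

If that assumption holds, those limits carry no information, and only the Composite high/crit values are meaningful for the drives.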

This is my journalctl -b -1 -e output, showing that nothing was logged before the crash/reboot:

mag 29 10:49:14 bwian-MS-7C35 gnome-shell[2742]: Window manager warning: Overwriting existing binding of keysym 38 with keysym 38 (keycode 11).
mag 29 10:49:14 bwian-MS-7C35 gnome-shell[2742]: Window manager warning: Overwriting existing binding of keysym 39 with keysym 39 (keycode 12).
mag 29 10:50:04 bwian-MS-7C35 gnome-shell[2742]: Can't update stage views actor <unnamed>[<MetaWindowGroup>:0x5650fd7ec680] is on because it needs an allocation.
mag 29 10:50:04 bwian-MS-7C35 gnome-shell[2742]: Can't update stage views actor <unnamed>[<MetaWindowActorX11>:0x5650fdea1ba0] is on because it needs an allocation.
mag 29 10:50:04 bwian-MS-7C35 gnome-shell[2742]: Can't update stage views actor <unnamed>[<MetaSurfaceActorX11>:0x5651001067c0] is on because it needs an allocation.
mag 29 10:50:25 bwian-MS-7C35 gnome-shell[2742]: Can't update stage views actor <unnamed>[<MetaSurfaceActorX11>:0x5651001067c0] is on because it needs an allocation.
mag 29 10:52:10 bwian-MS-7C35 gnome-shell[2742]: Can't update stage views actor <unnamed>[<MetaWindowGroup>:0x5650fd7ec680] is on because it needs an allocation.
mag 29 10:52:10 bwian-MS-7C35 gnome-shell[2742]: Can't update stage views actor <unnamed>[<MetaWindowActorX11>:0x5650ffdcd1f0] is on because it needs an allocation.
mag 29 10:52:10 bwian-MS-7C35 gnome-shell[2742]: Can't update stage views actor <unnamed>[<MetaSurfaceActorX11>:0x565109af6f60] is on because it needs an allocation.
mag 29 10:53:08 bwian-MS-7C35 [email protected][2742]: Microsoft Teams - Preview1, Impossible to lookup icon for 'Microsoft Teams - Preview1_13-panel' in path /tmp/.org.chr>
mag 29 10:53:08 bwian-MS-7C35 [email protected][2742]: unable to update icon for Microsoft Teams - Preview1
mag 29 10:53:15 bwian-MS-7C35 [email protected][2742]: Microsoft Teams - Preview1, Impossible to lookup icon for 'Microsoft Teams - Preview1_14-panel' in path /tmp/.org.chr>
mag 29 10:53:15 bwian-MS-7C35 [email protected][2742]: unable to update icon for Microsoft Teams - Preview1
mag 29 10:55:01 bwian-MS-7C35 CRON[29178]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
mag 29 10:55:01 bwian-MS-7C35 CRON[29179]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
mag 29 10:55:01 bwian-MS-7C35 CRON[29178]: pam_unix(cron:session): session closed for user root

kern.log also shows nothing that I consider relevant:

May 29 09:48:09 bwian-MS-7C35 kernel: [ 2121.745635] kauditd_printk_skb: 7 callbacks suppressed
May 29 09:48:09 bwian-MS-7C35 kernel: [ 2121.745638] audit: type=1400 audit(1685346489.868:116): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libreoffice-oosplash" pid=18136 comm="apparmor_parser"
May 29 09:48:09 bwian-MS-7C35 kernel: [ 2121.767162] audit: type=1400 audit(1685346489.888:117): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libreoffice-senddoc" pid=18140 comm="apparmor_parser"
May 29 09:48:12 bwian-MS-7C35 kernel: [ 2124.796003] audit: type=1400 audit(1685346492.916:118): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libreoffice-soffice" pid=18143 comm="apparmor_parser"
May 29 09:48:12 bwian-MS-7C35 kernel: [ 2124.822358] audit: type=1400 audit(1685346492.944:119): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libreoffice-soffice//gpg" pid=18143 comm="apparmor_parser"
May 29 09:48:12 bwian-MS-7C35 kernel: [ 2124.846377] audit: type=1400 audit(1685346492.968:120): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libreoffice-xpdfimport" pid=18182 comm="apparmor_parser"
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] Linux version 5.19.0-42-generic (buildd@lcy02-amd64-019) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.2.0-3ubuntu1) 12.2.0, GNU ld (GNU Binutils for Ubuntu) 2.39) #43-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 18:21:28 UTC 2023 (Ubuntu 5.19.0-42.43-generic 5.19.17)
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.19.0-42-generic root=UUID=ea1660b0-ea10-41d0-baa8-bc942fb21e02 ro quiet splash vt.handoff=7
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] KERNEL supported cpus:
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000]   Intel GenuineIntel
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000]   AMD AuthenticAMD
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000]   Hygon HygonGenuine
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000]   Centaur CentaurHauls
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000]   zhaoxin   Shanghai  
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] signal: max sigframe size: 1776
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] BIOS-provided physical RAM map:
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000009d81fff] usable
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] BIOS-e820: [mem 0x0000000009d82000-0x0000000009ffffff] reserved
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] BIOS-e820: [mem 0x000000000a000000-0x000000000a1fffff] usable
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] BIOS-e820: [mem 0x000000000a200000-0x000000000a20ffff] ACPI NVS
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] BIOS-e820: [mem 0x000000000a210000-0x00000000cacb0fff] usable
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] BIOS-e820: [mem 0x00000000cacb1000-0x00000000cb0a8fff] reserved
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] BIOS-e820: [mem 0x00000000cb0a9000-0x00000000cb10cfff] ACPI data
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] BIOS-e820: [mem 0x00000000cb10d000-0x00000000ccc0cfff] ACPI NVS
May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] BIOS-e820: [mem 0x00000000ccc0d000-0x00000000cdbfefff] reserved
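
In an excerpt like this, the reboot shows up as the point where the kernel uptime in square brackets drops back to 0.000000. A minimal sketch for locating those points in a longer kern.log, assuming the timestamp format shown above (the function name and the shortened sample lines are illustrative only):

```python
import re

# Matches the uptime stamp the kernel prints, e.g. "kernel: [ 2124.846377]".
UPTIME = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def find_boot_boundaries(lines):
    """Return (last_line_before, first_line_after) pairs where uptime resets."""
    boundaries = []
    previous_uptime = None
    previous_line = None
    for line in lines:
        match = UPTIME.search(line)
        if match is None:
            continue  # not a kernel line with an uptime stamp
        uptime = float(match.group(1))
        # A lower uptime than the previous kernel line means a new boot started.
        if previous_uptime is not None and uptime < previous_uptime:
            boundaries.append((previous_line, line))
        previous_uptime, previous_line = uptime, line
    return boundaries


if __name__ == "__main__":
    sample = [
        "May 29 09:48:12 bwian-MS-7C35 kernel: [ 2124.846377] audit: type=1400",
        "May 29 11:04:23 bwian-MS-7C35 kernel: [    0.000000] Linux version 5.19.0-42-generic",
    ]
    for before, after in find_boot_boundaries(sample):
        print("last line before reset:", before)
        print("first line after reset:", after)
```

When the last line before a boundary is ordinary activity rather than a shutdown message, that is consistent with an abrupt reset or power loss, although it does not identify which component caused it.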

Additional notes:

  • There is nothing I can do to reproduce the problem
  • Temperatures seem fine, the thermal paste was replaced last year, and I am using a high-performance Noctua NX-15 cooler. The PC is free of dust.
  • The system is up to date
  • Sometimes the problem seems related to heavy load, on either the GPU or the CPU
  • ...but stress tests did not reproduce the crash
  • Memtest found no problems
  • I also ran some benchmark and stress-test programs to try to put load on the PSU, but they had no wizard or automated tests, and I am not sure I did it correctly. I would be glad if someone could give me a way to test my power supply
  • The logs look clean, as if the system could not write anything in time before rebooting
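
Because the logs stop without warning, one option is to record sensor readings continuously and force each record to disk, so that the last sample before a reset survives. The following is only a sketch: the file name, the 2-second interval, and the function names are assumptions, and it relies on the sensors command from lm-sensors being installed.

```python
import os
import subprocess
import time
from datetime import datetime


def append_durably(path, text):
    """Append text and fsync so the data reaches the disk before returning."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(text)
        handle.flush()
        os.fsync(handle.fileno())


def snapshot(command=("sensors",)):
    """Return the command's stdout, or a short note if it cannot be run."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return f"(command not found: {command[0]})\n"
    return result.stdout


def log_forever(path="sensor-trace.log", interval=2.0):
    """Write one timestamped snapshot every `interval` seconds until stopped."""
    while True:
        stamp = datetime.now().isoformat(timespec="seconds")
        append_durably(path, f"--- {stamp}\n{snapshot()}\n")
        time.sleep(interval)
```

Calling log_forever() in a terminal and reading the tail of sensor-trace.log after the next crash shows the voltages, fan speeds, and temperatures from a few seconds before the reset, instead of only the values after reboot.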

My suspicion is that the power supply or the motherboard may be failing, but I have never been able to tell for certain which one is the problem.

How can I determine which piece of hardware is failing?

Answer 1

  1. Swap in a known-working power supply to check whether Ubuntu still crashes.

  2. If you do not have one and cannot borrow one, install Windows 10 in a dual-boot configuration. Windows has more easy-to-use stress-testing programs to choose from. Run them to see whether Windows crashes. If it runs without problems, you can almost rule out a hardware problem.

  3. Switch between the Nouveau and proprietary GPU drivers, and try different versions as well.

  4. Consider trying Ubuntu 20.04, or desktop environments other than Cinnamon, such as KDE, Xfce, or LXDE, to see whether the problem persists.
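
For the stress-testing step, a sustained CPU load can also be generated on Ubuntu itself. The sketch below wraps stress-ng, which is an assumption here (it is not mentioned in the question and must be installed separately); the function names and the 10-minute default are illustrative. It covers only the CPU side, so a GPU benchmark would need to run at the same time to load the power supply more fully.

```python
import shutil
import subprocess


def cpu_stress_command(seconds, workers=0):
    """Build a stress-ng command line; --cpu 0 means one worker per online CPU."""
    return [
        "stress-ng",
        "--cpu", str(workers),
        "--timeout", f"{seconds}s",
        "--metrics-brief",
    ]


def run_cpu_stress(seconds=600):
    """Run the CPU stress test and return the exit code of stress-ng."""
    if shutil.which("stress-ng") is None:
        raise RuntimeError("stress-ng was not found on PATH")
    return subprocess.run(cpu_stress_command(seconds), check=False).returncode
```

Surviving such a run does not prove that the power supply is healthy, so swapping in a known-working unit as in step 1 remains the more conclusive test.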
