
I have been using Pop!_OS (currently the latest version, based on Ubuntu 20.04) on my Lenovo X1 Extreme Gen 1 very happily for a while (2-3 years), but I recently ran into what is possibly a hardware problem related to the SSD (the laptop crashes randomly and throws ext4-fs and systemd-journald errors) that persists after a fresh install. I am attaching some screenshots below, and I will also include the error log entries I was able to find in the log directory.
Diagnostics (fdisk, fsck):
pop-os@pop-os:~$ sudo fdisk -l
Disk /dev/loop0: 2.24 GiB, 2400944128 bytes, 4689344 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/nvme0n1: 953.89 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: SAMSUNG MZVLB1T0HALR-000L7
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 5FCEDA12-D1BA-4EEF-B174-C7F4C4F7ACFC
Device Start End Sectors Size Type
/dev/nvme0n1p1 4096 1023998 1019903 498M EFI System
/dev/nvme0n1p2 1024000 9412606 8388607 4G Microsoft basic data
/dev/nvme0n1p3 9412608 1992016558 1982603951 945.4G Linux filesystem
/dev/nvme0n1p4 1992016560 2000405166 8388607 4G Linux swap
(Note: the "Microsoft basic data" partition is the leftover Windows recovery partition.)
sudo fsck -CvMf /dev/nvme0n1p3
fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
366095 inodes used (0.59%, out of 61964288)
2849 non-contiguous files (0.8%)
412 non-contiguous directories (0.1%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 326954/107
11214214 blocks used (4.53%, out of 247825493)
0 bad blocks
2 large files
287132 regular files
36849 directories
7 character device files
0 block device files
0 fifos
91242 links
42092 symbolic links (39013 fast symbolic links)
6 sockets
------------
457328 files
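The fsck run comes back clean, and its summary can be cross-checked against the fdisk table. The following is a minimal Python sketch of that arithmetic, assuming an ext4 block size of 4096 bytes (the fsck output does not state the block size, so that constant is an assumption); all other numbers are copied from the output above.

```python
SECTOR_BYTES = 512          # from the fdisk output
ASSUMED_BLOCK_BYTES = 4096  # assumption: common ext4 block size, not shown by fsck


def percent(used: int, total: int) -> float:
    """Return used/total as a percentage."""
    return 100.0 * used / total


p3_sectors = 1982603951    # size of /dev/nvme0n1p3 in the fdisk table
total_blocks = 247825493   # "out of" value in the fsck block summary
used_blocks = 11214214
used_inodes, total_inodes = 366095, 61964288

# How many whole filesystem blocks fit into the partition?
blocks_that_fit = p3_sectors * SECTOR_BYTES // ASSUMED_BLOCK_BYTES

print("block count matches partition size:", blocks_that_fit == total_blocks)
print(f"{percent(used_blocks, total_blocks):.2f}% of blocks used")
print(f"{percent(used_inodes, total_inodes):.2f}% of inodes used")
print(f"{used_blocks * ASSUMED_BLOCK_BYTES / 1e9:.1f} GB in use")
```

If the assumed block size is right, the filesystem fills the partition exactly and the percentages agree with what fsck printed, so nothing in the filesystem metadata itself looks inconsistent.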
SMART test:
=== START OF INFORMATION SECTION ===
Model Number: SAMSUNG MZVLB1T0HALR-000L7
Firmware Version: 5L2QEXA7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 1,024,209,543,168 [1.02 TB]
Unallocated NVM Capacity: 0
Controller ID: 4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Utilization: 47,027,638,272 [47.0 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 8881b2cb9e
Local Time is: Mon Aug 3 16:34:10 2020 UTC
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 81 Celsius
Critical Comp. Temp. Threshold: 82 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 7.02W - - 0 0 0 0 0 0
1 + 6.30W - - 1 1 1 1 0 0
2 + 3.50W - - 2 2 2 2 0 0
3 - 0.0760W - - 3 3 3 3 210 1200
4 - 0.0050W - - 4 4 4 4 2000 8000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 39 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 1%
Data Units Read: 22,730,197 [11.6 TB]
Data Units Written: 39,001,161 [19.9 TB]
Host Read Commands: 280,072,901
Host Write Commands: 496,008,535
Controller Busy Time: 1,454
Power Cycles: 2,705
Power On Hours: 1,567
Unsafe Shutdowns: 226
Media and Data Integrity Errors: 0
Error Information Log Entries: 2,071
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 39 Celsius
Temperature Sensor 2: 41 Celsius
Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
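The overall SMART result is PASSED, but a few counters in this output (unsafe shutdowns, error log entries) are easier to judge as ratios. Below is a rough Python sketch that pulls the integer counters out of pasted smartctl text. The function name and the sample string are illustrative only; the sample lines are copied from the output above. One NVMe data unit is taken to be 1000 blocks of 512 bytes, which matches the 11.6 TB that smartctl prints for reads.

```python
import re

SMART_TEXT = """\
Critical Warning: 0x00
Available Spare: 100%
Percentage Used: 1%
Data Units Read: 22,730,197 [11.6 TB]
Data Units Written: 39,001,161 [19.9 TB]
Power Cycles: 2,705
Power On Hours: 1,567
Unsafe Shutdowns: 226
Media and Data Integrity Errors: 0
Error Information Log Entries: 2,071
"""

DATA_UNIT_BYTES = 512 * 1000  # one NVMe data unit = 1000 blocks of 512 bytes


def parse_counters(text: str) -> dict:
    """Return the leading integer of every 'Key: value' line that has one."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        match = re.match(r"\s*([\d,]+)", value)
        if sep and match:
            counters[key.strip()] = int(match.group(1).replace(",", ""))
    return counters


counters = parse_counters(SMART_TEXT)
written_tb = counters["Data Units Written"] * DATA_UNIT_BYTES / 1e12
unsafe_ratio = counters["Unsafe Shutdowns"] / counters["Power Cycles"]

print(f"written: {written_tb:.2f} TB")
print(f"unsafe shutdowns: {unsafe_ratio:.1%} of power cycles")
print(f"media errors: {counters['Media and Data Integrity Errors']}")
print(f"error log entries: {counters['Error Information Log Entries']}")
```

The drive reports zero media errors, so the 226 unsafe shutdowns are more likely a symptom of the crashes than a sign of worn flash. The error log entry count may include benign rejected commands on some drives, so by itself it does not prove a fault.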
For more detail, I searched the logs for the keywords nvme and ext4-fs. Entries like the following stand out:
/var/log/kern.log:Aug 3 19:01:43 pop-os kernel: [ 237.251085] blk_update_request: I/O error, dev nvme0n1, sector 1209397344 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
...
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 2.115859] nvme0n1: p1 p2 p3 p4
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 3.868483] EXT4-fs (nvme0n1p3): INFO: recovery required on readonly filesystem
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 3.868483] EXT4-fs (nvme0n1p3): write access will be enabled during recovery
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 3.894018] EXT4-fs (nvme0n1p3): orphan cleanup on readonly fs
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 3.904196] EXT4-fs (nvme0n1p3): 227 orphan inodes deleted
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 3.904197] EXT4-fs (nvme0n1p3): recovery complete
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 3.916157] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 4.235950] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 5.580150] FAT-fs (nvme0n1p2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 5.580956] FAT-fs (nvme0n1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 3 22:39:48 pop-os kernel: [ 47.658007] blk_update_request: I/O error, dev nvme0n1, sector 1209399024 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
/var/log/kern.log:Aug 5 07:16:47 pop-os kernel: [ 2.018779] nvme0n1: p1 p2 p3 p4
/var/log/kern.log:Aug 5 07:16:47 pop-os kernel: [ 3.839434] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
/var/log/kern.log:Aug 5 07:16:47 pop-os kernel: [ 4.149146] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro
/var/log/kern.log:Aug 5 07:16:47 pop-os kernel: [ 5.006306] FAT-fs (nvme0n1p2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 5 07:16:47 pop-os kernel: [ 5.006685] FAT-fs (nvme0n1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 9 15:03:31 pop-os kernel: [ 2.105116] nvme0n1: p1 p2 p3 p4
/var/log/kern.log:Aug 9 15:03:31 pop-os kernel: [ 3.892947] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
/var/log/kern.log:Aug 9 15:03:31 pop-os kernel: [ 4.183333] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro
/var/log/kern.log:Aug 9 15:03:31 pop-os kernel: [ 4.682363] FAT-fs (nvme0n1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 9 15:03:31 pop-os kernel: [ 4.683046] FAT-fs (nvme0n1p2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 2.111633] nvme0n1: p1 p2 p3 p4
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 3.817532] EXT4-fs (nvme0n1p3): INFO: recovery required on readonly filesystem
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 3.817532] EXT4-fs (nvme0n1p3): write access will be enabled during recovery
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 3.827850] EXT4-fs (nvme0n1p3): recovery complete
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 3.832040] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 4.169487] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 5.442449] FAT-fs (nvme0n1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 5.444632] FAT-fs (nvme0n1p2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 2.078927] nvme0n1: p1 p2 p3 p4
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 3.845395] EXT4-fs (nvme0n1p3): INFO: recovery required on readonly filesystem
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 3.845396] EXT4-fs (nvme0n1p3): write access will be enabled during recovery
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 4.026435] EXT4-fs (nvme0n1p3): orphan cleanup on readonly fs
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 4.026557] EXT4-fs (nvme0n1p3): 16 orphan inodes deleted
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 4.026557] EXT4-fs (nvme0n1p3): recovery complete
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 4.037091] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 4.352561] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 5.140268] FAT-fs (nvme0n1p2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 5.176295] FAT-fs (nvme0n1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 2.063656] nvme0n1: p1 p2 p3 p4
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 3.861041] EXT4-fs (nvme0n1p3): INFO: recovery required on readonly filesystem
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 3.861041] EXT4-fs (nvme0n1p3): write access will be enabled during recovery
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 3.876059] EXT4-fs (nvme0n1p3): recovery complete
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 3.880170] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 4.200170] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 5.109084] FAT-fs (nvme0n1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 5.131469] FAT-fs (nvme0n1p2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
grep: /var/log/private: Is a directory
grep: /var/log/speech-dispatcher: Is a directory
/var/log/syslog:Aug 3 18:58:00 pop-os kernel: [ 2.092722] nvme0n1: p1 p2 p3 p4
/var/log/syslog:Aug 3 18:58:00 pop-os kernel: [ 3.780347] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
/var/log/syslog:Aug 3 18:58:00 pop-os kernel: [ 4.089493] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro
Binary file /var/log/syslog matches
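To see where the failed reads land, the sector numbers in the I/O error lines can be mapped onto the partition table from fdisk. Here is a small Python sketch of that lookup; the partition boundaries and log lines are copied from the output above (with the timestamp prefix dropped), and the helper names are made up for illustration.

```python
import re

# (name, first_sector, last_sector) copied from the fdisk output
PARTITIONS = [
    ("nvme0n1p1", 4096, 1023998),
    ("nvme0n1p2", 1024000, 9412606),
    ("nvme0n1p3", 9412608, 1992016558),
    ("nvme0n1p4", 1992016560, 2000405166),
]

LOG_LINES = [
    "blk_update_request: I/O error, dev nvme0n1, sector 1209397344 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0",
    "blk_update_request: I/O error, dev nvme0n1, sector 1209399024 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0",
]

IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+) op 0x[0-9a-f]+:\((\w+)\)")


def locate(sector):
    """Return (partition name, sector offset inside it), or (None, None)."""
    for name, first, last in PARTITIONS:
        if first <= sector <= last:
            return name, sector - first
    return None, None


def failed_requests(lines):
    """Yield (operation, absolute sector, partition, offset) per I/O error line."""
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            sector = int(match.group(2))
            yield (match.group(3), sector) + locate(sector)


results = list(failed_requests(LOG_LINES))
for op, sector, part, offset in results:
    print(f"{op} failed at sector {sector} -> {part} (+{offset} sectors)")
```

Both failed reads fall inside nvme0n1p3 and lie 1,680 sectors (840 KiB) apart. That could point to a localized bad region, but it could equally mean the same file was being read both times.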
In dmesg I also saw a bunch of temperature-related errors, although I do not know how serious they are (given that the threshold is 81 °C, that is a bit worrying).
[ 3.417048] kernel: mce: CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417049] kernel: mce: CPU9: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417050] kernel: mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417050] kernel: mce: CPU9: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417091] kernel: mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417093] kernel: mce: CPU7: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417094] kernel: mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417095] kernel: mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417096] kernel: mce: CPU4: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417097] kernel: mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417098] kernel: mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417099] kernel: mce: CPU10: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417100] kernel: mce: CPU8: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417101] kernel: mce: CPU11: Package temperature above threshold, cpu clock throttled (total events = 1)
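A quick way to see how many distinct throttle events these lines represent is to group them by scope and look at the timestamps. This is a minimal Python sketch over a few of the lines above; the regular expression and function name are just one way to do it.

```python
import re
from collections import defaultdict

DMESG_SAMPLE = """\
[ 3.417048] kernel: mce: CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417049] kernel: mce: CPU9: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417050] kernel: mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417101] kernel: mce: CPU11: Package temperature above threshold, cpu clock throttled (total events = 1)
"""

EVENT = re.compile(
    r"\[\s*([\d.]+)\] kernel: mce: CPU(\d+): (Core|Package) temperature above threshold"
)


def summarize(text):
    """Group reporting CPUs by scope and return the first/last timestamp."""
    cpus_by_scope = defaultdict(set)
    stamps = []
    for stamp, cpu, scope in EVENT.findall(text):
        stamps.append(float(stamp))
        cpus_by_scope[scope].add(int(cpu))
    window = (min(stamps), max(stamps)) if stamps else None
    return dict(cpus_by_scope), window


cpus_by_scope, window = summarize(DMESG_SAMPLE)
print("CPUs per scope:", cpus_by_scope)
print("first/last timestamp (s since boot):", window)
```

All of the messages fall within a fraction of a millisecond about 3.4 s after boot, and each says "total events = 1", so they probably describe a single package-level event reported once per logical CPU. These messages concern the CPU; the 81 °C threshold in the SMART output belongs to the SSD, which reported 39 °C.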
Finally, when I install Pop!_OS (which I have done many times over the last month because of this problem), the installer fails at the extraction stage in roughly one out of two attempts. It works when I retry a couple of times without changing anything about the live USB or any of the installation settings, so this also looks like a random read/write error. The installation log also seems to show an input/output error. Notable entries include:
Jul 31 21:30:16 pop-os kernel: [ 163.161995] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Jul 31 21:30:16 pop-os kernel: [ 163.254016] nvme 0000:71:00.0: Refused to change power state, currently in D3
Jul 31 21:30:16 pop-os kernel: [ 163.254502] nvme nvme0: Removing after probe failure status: -19
Jul 31 21:30:16 pop-os kernel: [ 163.346070] blk_update_request: I/O error, dev nvme0n1, sector 38805760 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
Jul 31 21:30:16 pop-os kernel: [ 163.347594] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:309: I/O error 10 writing to inode 33423367 (offset 9043968 size 2306048 starting block 1219744)
Jul 31 21:30:16 pop-os kernel: [ 163.347601] Buffer I/O error on device nvme0n1p3, logical block 43168
Jul 31 21:30:16 pop-os kernel: [ 163.347610] Buffer I/O error on device nvme0n1p3, logical block 43169
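In this excerpt the controller disappears first and the I/O errors follow. Two of the fields can be decoded mechanically, as in the Python sketch below. It rests on two assumptions: that a register reading of all ones usually means the device has stopped responding on the PCIe bus, and that the "status" value is a negated Linux errno. The function name is illustrative.

```python
import errno
import re

LINES = [
    "nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff",
    "nvme 0000:71:00.0: Refused to change power state, currently in D3",
    "nvme nvme0: Removing after probe failure status: -19",
]


def decode(line):
    """Return human-readable notes for the fields recognized in one log line."""
    notes = []
    regs = re.search(r"CSTS=0x([0-9a-f]+), PCI_STATUS=0x([0-9a-f]+)", line)
    if regs and all(set(value) == {"f"} for value in regs.groups()):
        # Assumption: all-ones reads typically mean the device is unreachable.
        notes.append("registers read as all ones: device likely not responding")
    status = re.search(r"failure status: -(\d+)", line)
    if status:
        code = int(status.group(1))
        # Assumption: the kernel reports a negated errno here.
        notes.append(f"errno {code}: {errno.errorcode.get(code, 'unknown')}")
    return notes


for line in LINES:
    print(line, "->", decode(line))
```

On Linux, errno 19 is ENODEV ("no such device"). Together with the message about power state D3, this suggests the drive dropped off the bus rather than returning bad data. That is consistent with a power-management or connection problem, although it does not rule out a failing controller.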
Extra information:
I also ran some of the memory diagnostics that ship with the laptop, and none of them reported any errors (I am not posting them here unless someone asks).
After each reinstall of Linux, the problem reappears after a while (although I think it takes longer to appear if I keep the disk as empty as possible). The "refresh" option in the installer does not seem to help either; I have to reinstall Linux completely.
ATTEMPTED FIXES:
When I do a fresh install of Linux, the problem usually reappears after a few days, and it seems to be related to how much data I put on the drive. If I deliberately keep it to a minimum, it seems to take longer before the crash happens. The last crash occurred while I was doing some semi-intensive read and write operations (a couple of hundred MB) from Python.
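Since the last crash happened during read/write work in Python, a controlled version of that workload might help reproduce it on demand. The following is a minimal sketch, not a proper burn-in tool: it writes random chunks to a scratch file, forces them to disk, reads them back and compares SHA-256 digests. The function name and defaults are arbitrary. One caveat is that the read-back may be served from the page cache rather than the SSD unless the file is larger than RAM or the caches are dropped first.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, chunks=8, chunk_bytes=1 << 20):
    """Write random chunks to a scratch file in `directory`, then re-read them.

    Returns the indices of chunks that could not be read or whose digest
    differs from what was written. Write errors propagate as OSError.
    """
    expected = []
    bad = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            for _ in range(chunks):
                data = os.urandom(chunk_bytes)
                expected.append(hashlib.sha256(data).digest())
                handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # ask the kernel to push data to the device
        with open(path, "rb") as handle:
            for index, digest in enumerate(expected):
                try:
                    handle.seek(index * chunk_bytes)
                    data = handle.read(chunk_bytes)
                except OSError:
                    bad.append(index)  # unreadable chunk
                    continue
                if hashlib.sha256(data).digest() != digest:
                    bad.append(index)  # silent corruption
    finally:
        os.remove(path)
    return bad


if __name__ == "__main__":
    # A few hundred MiB, similar in size to the workload described above.
    print("bad chunks:", write_and_verify(tempfile.gettempdir(), chunks=300))
```

Running it while watching `dmesg` in another terminal would show whether the kernel logs an I/O error or a controller reset at the moment a chunk fails.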
There is a hint in the Arch Linux wiki (https://wiki.archlinux.org/index.php/Solid_state_drive/NVMe) that says:
Samsung drive errors on Linux 4.10
On Linux 4.10, drive errors can occur and cause system instability. This seems to be the result of a power-saving state that the drive cannot use. Adding the kernel parameter nvme_core.default_ps_max_latency_us=5500 [4][5] disables the lowest power-saving state, preventing write errors.
Mine is also a Samsung (see the drive details in the diagnostics above), so I did what is suggested, but it does not seem to help.
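It may help to check what the value 5500 actually rules out on this particular drive. The Python sketch below applies the limit to the "Supported Power States" table from the SMART output. It assumes the kernel skips any non-operational state whose latency exceeds the limit, and that a limit of 0 turns autonomous power state transitions off entirely. Whether the comparison uses exit latency alone or entry plus exit is also an assumption, so the sketch supports both. The function names are illustrative.

```python
# (state, operational, entry_latency_us, exit_latency_us), copied from the
# "Supported Power States" table in the smartctl output
POWER_STATES = [
    (0, True, 0, 0),
    (1, True, 0, 0),
    (2, True, 0, 0),
    (3, False, 210, 1200),
    (4, False, 2000, 8000),
]

PARAM = "nvme_core.default_ps_max_latency_us"


def idle_states_allowed(max_latency_us, states=POWER_STATES, use_total=False):
    """Return the non-operational states still reachable under the limit."""
    if max_latency_us == 0:
        return []  # assumption: 0 disables autonomous transitions altogether
    allowed = []
    for state, operational, entry_us, exit_us in states:
        if operational:
            continue  # only non-operational states are idle targets
        latency = entry_us + exit_us if use_total else exit_us
        if latency <= max_latency_us:
            allowed.append(state)
    return allowed


def parse_cmdline(cmdline):
    """Return the configured limit from a kernel command line, or None."""
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == PARAM and value.isdigit():
            return int(value)
    return None


limit = parse_cmdline("quiet splash nvme_core.default_ps_max_latency_us=5500")
print("limit:", limit)
print("idle states still allowed:", idle_states_allowed(limit))
print("idle states with limit 0:", idle_states_allowed(0))
```

Under these assumptions, 5500 only excludes state 4, so the drive can still drop into state 3. A limit of 0 would keep it out of both, which is a variation that could be tried, although nothing here shows that it fixes this case. To confirm that the parameter took effect after a reboot, the running command line is in /proc/cmdline, and the current value should be readable at /sys/module/nvme_core/parameters/default_ps_max_latency_us.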
I updated all my firmware through LVFS, although there were no SSD updates there, mostly BIOS. This fixed some other problems but not this one.
Beyond that, I do not have many ideas on how to proceed, since I know almost nothing about hardware and I do not want to enter random kernel parameters that I do not understand.
I can upload full logs on request.
Answer 1
It looks like you have a problem with the NVMe hardware device. You could try booting from a rescue USB image and running badblocks on the NVMe device, or, if there is a Samsung diagnostic tool for erasing/testing the device, could you run that?
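For illustration, the idea behind a read-only surface scan can be sketched in a few lines of Python. This is not badblocks and is no substitute for it: it simply reads a file or block device in chunks and records the offsets where the kernel returns an error. The function name, chunk size and failure cap are arbitrary choices, and reads may be satisfied from the page cache.

```python
import os


def read_scan(path, chunk_bytes=1 << 20, max_failures=100):
    """Read `path` sequentially; return (bytes_read, offsets_that_failed)."""
    failed = []
    total = 0
    offset = 0
    fd = os.open(path, os.O_RDONLY)  # read-only: nothing on the device is modified
    try:
        while len(failed) < max_failures:
            try:
                data = os.pread(fd, chunk_bytes, offset)
            except OSError:
                failed.append(offset)  # unreadable chunk: note it and move on
                offset += chunk_bytes
                continue
            if not data:
                break  # end of file or device
            total += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return total, failed


if __name__ == "__main__":
    import sys

    # Example invocation (needs root for a raw device): python3 scan.py /dev/nvme0n1
    if len(sys.argv) > 1:
        print(read_scan(sys.argv[1]))
```

The failure cap stops the loop if the device vanishes mid-scan, as in the "controller is down" log entries, where every subsequent read would fail.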