![Lenovo X1 Extreme Gen 1: ext4-fs and systemd-journald errors persist after a fresh install](https://rvso.com/image/178474/Lenovo%20x1%20extreme%20gen%201%20%E3%81%AE%20ext4-fs%20%E3%81%8A%E3%82%88%E3%81%B3%20system-journald%20%E3%82%A8%E3%83%A9%E3%83%BC%E3%81%8C%E6%96%B0%E8%A6%8F%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%BC%E3%83%AB%E5%BE%8C%E3%82%82%E7%B6%99%E7%B6%9A%E3%81%99%E3%82%8B.png)
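I have been very happily using Pop!_OS (currently the latest version, based on Ubuntu 20.04) on a Lenovo X1 Extreme Gen 1 for a while (2 to 3 years), but I recently ran into what is probably an SSD-related hardware problem: the laptop crashes at random with ext4-fs and systemd-journald errors. A fresh install does not fix it. I am attaching some screenshots below, and I am also posting the errors I found in the log directory.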
Diagnostics (fdisk, fsck):
pop-os@pop-os:~$ sudo fdisk -l
Disk /dev/loop0: 2.24 GiB, 2400944128 bytes, 4689344 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/nvme0n1: 953.89 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: SAMSUNG MZVLB1T0HALR-000L7
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 5FCEDA12-D1BA-4EEF-B174-C7F4C4F7ACFC
Device Start End Sectors Size Type
/dev/nvme0n1p1 4096 1023998 1019903 498M EFI System
/dev/nvme0n1p2 1024000 9412606 8388607 4G Microsoft basic data
/dev/nvme0n1p3 9412608 1992016558 1982603951 945.4G Linux filesystem
/dev/nvme0n1p4 1992016560 2000405166 8388607 4G Linux swap
(Note: the Microsoft basic data partition is a recovery partition left over from Windows.)
sudo fsck -CvMf /dev/nvme0n1p3
fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
366095 inodes used (0.59%, out of 61964288)
2849 non-contiguous files (0.8%)
412 non-contiguous directories (0.1%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 326954/107
11214214 blocks used (4.53%, out of 247825493)
0 bad blocks
2 large files
287132 regular files
36849 directories
7 character device files
0 block device files
0 fifos
91242 links
42092 symbolic links (39013 fast symbolic links)
6 sockets
------------
457328 files
SMART test:
=== START OF INFORMATION SECTION ===
Model Number: SAMSUNG MZVLB1T0HALR-000L7
Firmware Version: 5L2QEXA7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 1,024,209,543,168 [1.02 TB]
Unallocated NVM Capacity: 0
Controller ID: 4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Utilization: 47,027,638,272 [47.0 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 8881b2cb9e
Local Time is: Mon Aug 3 16:34:10 2020 UTC
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 81 Celsius
Critical Comp. Temp. Threshold: 82 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 7.02W - - 0 0 0 0 0 0
1 + 6.30W - - 1 1 1 1 0 0
2 + 3.50W - - 2 2 2 2 0 0
3 - 0.0760W - - 3 3 3 3 210 1200
4 - 0.0050W - - 4 4 4 4 2000 8000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 39 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 1%
Data Units Read: 22,730,197 [11.6 TB]
Data Units Written: 39,001,161 [19.9 TB]
Host Read Commands: 280,072,901
Host Write Commands: 496,008,535
Controller Busy Time: 1,454
Power Cycles: 2,705
Power On Hours: 1,567
Unsafe Shutdowns: 226
Media and Data Integrity Errors: 0
Error Information Log Entries: 2,071
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 39 Celsius
Temperature Sensor 2: 41 Celsius
Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
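For anyone who wants to compare these counters across reinstalls, here is a minimal sketch that pulls the numeric fields out of the text above. `SMART_TEXT` and `parse_counters` are illustrative names, not part of smartctl, and the sample values are simply copied from the output above.

```python
import re

# A few lines copied from the SMART/Health section above.
SMART_TEXT = """\
Critical Warning: 0x00
Available Spare: 100%
Percentage Used: 1%
Power Cycles: 2,705
Unsafe Shutdowns: 226
Media and Data Integrity Errors: 0
Error Information Log Entries: 2,071
"""


def parse_counters(text):
    """Map each 'Name: value' line to an int when the value starts with a number."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        match = re.match(r"\s*(0x[0-9a-fA-F]+|[\d,]+)", value)
        if not match:
            continue
        raw = match.group(1)
        if raw.lower().startswith("0x"):
            counters[name.strip()] = int(raw, 16)
        else:
            # Drop the thousands separators that smartctl prints.
            counters[name.strip()] = int(raw.replace(",", ""))
    return counters


if __name__ == "__main__":
    counters = parse_counters(SMART_TEXT)
    ratio = counters["Unsafe Shutdowns"] / counters["Power Cycles"]
    print(f"unsafe shutdowns per power cycle: {ratio:.1%}")
    print("media errors:", counters["Media and Data Integrity Errors"])
    print("error log entries:", counters["Error Information Log Entries"])
```

With the numbers above, roughly 8% of power cycles ended in an unsafe shutdown, while the media error counter is still 0.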
To get more information, I searched the logs for keywords, namely nvme and ext4-fs. The notable entries are the following:
/var/log/kern.log:Aug 3 19:01:43 pop-os kernel: [ 237.251085] blk_update_request: I/O error, dev nvme0n1, sector 1209397344 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
...
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 2.115859] nvme0n1: p1 p2 p3 p4
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 3.868483] EXT4-fs (nvme0n1p3): INFO: recovery required on readonly filesystem
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 3.868483] EXT4-fs (nvme0n1p3): write access will be enabled during recovery
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 3.894018] EXT4-fs (nvme0n1p3): orphan cleanup on readonly fs
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 3.904196] EXT4-fs (nvme0n1p3): 227 orphan inodes deleted
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 3.904197] EXT4-fs (nvme0n1p3): recovery complete
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 3.916157] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 4.235950] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 5.580150] FAT-fs (nvme0n1p2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 3 22:39:08 pop-os kernel: [ 5.580956] FAT-fs (nvme0n1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 3 22:39:48 pop-os kernel: [ 47.658007] blk_update_request: I/O error, dev nvme0n1, sector 1209399024 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
/var/log/kern.log:Aug 5 07:16:47 pop-os kernel: [ 2.018779] nvme0n1: p1 p2 p3 p4
/var/log/kern.log:Aug 5 07:16:47 pop-os kernel: [ 3.839434] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
/var/log/kern.log:Aug 5 07:16:47 pop-os kernel: [ 4.149146] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro
/var/log/kern.log:Aug 5 07:16:47 pop-os kernel: [ 5.006306] FAT-fs (nvme0n1p2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 5 07:16:47 pop-os kernel: [ 5.006685] FAT-fs (nvme0n1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 9 15:03:31 pop-os kernel: [ 2.105116] nvme0n1: p1 p2 p3 p4
/var/log/kern.log:Aug 9 15:03:31 pop-os kernel: [ 3.892947] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
/var/log/kern.log:Aug 9 15:03:31 pop-os kernel: [ 4.183333] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro
/var/log/kern.log:Aug 9 15:03:31 pop-os kernel: [ 4.682363] FAT-fs (nvme0n1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 9 15:03:31 pop-os kernel: [ 4.683046] FAT-fs (nvme0n1p2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 2.111633] nvme0n1: p1 p2 p3 p4
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 3.817532] EXT4-fs (nvme0n1p3): INFO: recovery required on readonly filesystem
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 3.817532] EXT4-fs (nvme0n1p3): write access will be enabled during recovery
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 3.827850] EXT4-fs (nvme0n1p3): recovery complete
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 3.832040] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 4.169487] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 5.442449] FAT-fs (nvme0n1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 10 13:35:55 pop-os kernel: [ 5.444632] FAT-fs (nvme0n1p2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 2.078927] nvme0n1: p1 p2 p3 p4
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 3.845395] EXT4-fs (nvme0n1p3): INFO: recovery required on readonly filesystem
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 3.845396] EXT4-fs (nvme0n1p3): write access will be enabled during recovery
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 4.026435] EXT4-fs (nvme0n1p3): orphan cleanup on readonly fs
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 4.026557] EXT4-fs (nvme0n1p3): 16 orphan inodes deleted
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 4.026557] EXT4-fs (nvme0n1p3): recovery complete
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 4.037091] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 4.352561] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 5.140268] FAT-fs (nvme0n1p2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 11 00:03:10 pop-os kernel: [ 5.176295] FAT-fs (nvme0n1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 2.063656] nvme0n1: p1 p2 p3 p4
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 3.861041] EXT4-fs (nvme0n1p3): INFO: recovery required on readonly filesystem
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 3.861041] EXT4-fs (nvme0n1p3): write access will be enabled during recovery
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 3.876059] EXT4-fs (nvme0n1p3): recovery complete
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 3.880170] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 4.200170] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 5.109084] FAT-fs (nvme0n1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
/var/log/kern.log:Aug 11 10:12:22 pop-os kernel: [ 5.131469] FAT-fs (nvme0n1p2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
grep: /var/log/private: Is a directory
grep: /var/log/speech-dispatcher: Is a directory
/var/log/syslog:Aug 3 18:58:00 pop-os kernel: [ 2.092722] nvme0n1: p1 p2 p3 p4
/var/log/syslog:Aug 3 18:58:00 pop-os kernel: [ 3.780347] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
/var/log/syslog:Aug 3 18:58:00 pop-os kernel: [ 4.089493] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro
Binary file /var/log/syslog matches
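To see where the failing reads land, the reported sectors can be compared with the partition table from fdisk. This is a rough sketch that assumes the log line format shown above; `find_io_errors` and `partition_for` are hypothetical helper names, and the sample lines are copied from this post.

```python
import re

# Start and end sectors (inclusive) copied from the fdisk output above.
PARTITIONS = {
    "nvme0n1p1": (4096, 1023998),
    "nvme0n1p2": (1024000, 9412606),
    "nvme0n1p3": (9412608, 1992016558),
    "nvme0n1p4": (1992016560, 2000405166),
}

# Matches the block-layer error lines as they appear in kern.log above.
IO_ERROR = re.compile(
    r"blk_update_request: I/O error, dev (?P<dev>\S+), "
    r"sector (?P<sector>\d+) op \S+:\((?P<op>\w+)\)"
)

SAMPLE = [
    "Aug 3 19:01:43 pop-os kernel: [ 237.251085] blk_update_request: I/O error, "
    "dev nvme0n1, sector 1209397344 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0",
    "Aug 3 22:39:48 pop-os kernel: [ 47.658007] blk_update_request: I/O error, "
    "dev nvme0n1, sector 1209399024 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0",
]


def find_io_errors(lines):
    """Yield (device, sector, operation) for each block-layer I/O error line."""
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            yield match["dev"], int(match["sector"]), match["op"]


def partition_for(sector):
    """Return the partition whose sector range contains `sector`, or None."""
    for name, (start, end) in PARTITIONS.items():
        if start <= sector <= end:
            return name
    return None


if __name__ == "__main__":
    for dev, sector, op in find_io_errors(SAMPLE):
        print(dev, op, sector, partition_for(sector))
```

Both sampled sectors fall inside /dev/nvme0n1p3, the Linux filesystem partition.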
In dmesg I also saw many temperature-related errors, but I do not know how serious they are (although, given that the threshold is 81 °C, I am a little worried).
[ 3.417048] kernel: mce: CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417049] kernel: mce: CPU9: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417050] kernel: mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417050] kernel: mce: CPU9: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417091] kernel: mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417093] kernel: mce: CPU7: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417094] kernel: mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417095] kernel: mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417096] kernel: mce: CPU4: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417097] kernel: mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417098] kernel: mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417099] kernel: mce: CPU10: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417100] kernel: mce: CPU8: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 3.417101] kernel: mce: CPU11: Package temperature above threshold, cpu clock throttled (total events = 1)
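Finally, when installing Pop!_OS (which I have done quite a few times in the past month because of this problem), the installer fails at the extraction stage roughly every other time. Retrying a few times without changing the live USB or the install settings at all eventually works, so it looks like random read/write errors. The installation log also appears to record input/output errors. The notable entries are the following: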
Jul 31 21:30:16 pop-os kernel: [ 163.161995] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Jul 31 21:30:16 pop-os kernel: [ 163.254016] nvme 0000:71:00.0: Refused to change power state, currently in D3
Jul 31 21:30:16 pop-os kernel: [ 163.254502] nvme nvme0: Removing after probe failure status: -19
Jul 31 21:30:16 pop-os kernel: [ 163.346070] blk_update_request: I/O error, dev nvme0n1, sector 38805760 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
Jul 31 21:30:16 pop-os kernel: [ 163.347594] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:309: I/O error 10 writing to inode 33423367 (offset 9043968 size 2306048 starting block 1219744)
Jul 31 21:30:16 pop-os kernel: [ 163.347601] Buffer I/O error on device nvme0n1p3, logical block 43168
Jul 31 21:30:16 pop-os kernel: [ 163.347610] Buffer I/O error on device nvme0n1p3, logical block 43169
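Additional information:
I also ran some of the memory diagnostic tests that ship with the laptop, and none of them returned an error (I will not post them here unless someone asks).
Every time I reinstall Linux, the problem comes back after a while (though I think keeping the disk as empty as possible makes it take longer to appear). The installer's "refresh" option does not seem to help either; I have to reinstall Linux completely.
Attempted fixes:
After a fresh Linux install, the problem usually comes back within a few days. It seems related to how much data I have put on the disk: when I deliberately keep it to a minimum, it seems to take longer before a crash occurs. The last crash happened while I was running a moderate read/write workload (around 100 MB) in Python.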
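One hint comes from the Arch Linux Wiki (https://wiki.archlinux.org/index.php/Solid_state_drive/NVMe):
Samsung drive errors on Linux 4.10
On Linux 4.10, drive errors can occur, causing system instability. This seems to be the result of a power-saving state that the drive cannot use. Adding the kernel parameter nvme_core.default_ps_max_latency_us=5500 [4][5] disables the lowest power-saving state, preventing write errors.
Mine is also a Samsung (see the drive details above), so I did as suggested, but it does not seem to help.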
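For what it is worth, the effect of that parameter can be estimated from the "Supported Power States" table in the SMART output. The sketch below assumes that the nvme driver only enters a non-operational state automatically when its entry plus exit latency fits within the limit. That is an assumption about the driver's behaviour, not something confirmed in this thread. `POWER_STATES`, `allowed_idle_states`, and `current_limit` are made-up names, and the sysfs path is where module parameters are normally exposed.

```python
# (state, operational, entry_latency_us, exit_latency_us), copied from the
# "Supported Power States" table in the smartctl output above
# ("+" in the Op column means operational).
POWER_STATES = [
    (0, True, 0, 0),
    (1, True, 0, 0),
    (2, True, 0, 0),
    (3, False, 210, 1200),
    (4, False, 2000, 8000),
]

# Assumed location of the live module parameter.
PARAM_PATH = "/sys/module/nvme_core/parameters/default_ps_max_latency_us"


def allowed_idle_states(max_latency_us):
    """Non-operational states whose entry + exit latency fits the limit.

    Assumption: this mirrors how the driver is understood to choose
    automatic power state transitions; it is not driver code.
    """
    return [
        state
        for state, operational, enter_us, exit_us in POWER_STATES
        if not operational and enter_us + exit_us <= max_latency_us
    ]


def current_limit(path=PARAM_PATH):
    """Return the limit currently in effect, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("limit in effect:", current_limit())
    for limit in (0, 5500):
        print(limit, "->", allowed_idle_states(limit))
```

Under that assumption, 5500 would still permit state 3 (210 + 1200 µs) and only rule out state 4 (2000 + 8000 µs), which matches the wiki's description of disabling the lowest power-saving state. A limit of 0 would leave no idle state allowed at all. Reading the sysfs value is also a quick way to check that the boot parameter was actually applied.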
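I updated all firmware through LVFS, but there was no SSD update there; it was mainly the BIOS. That fixed a few other problems, but not this one.
However, I know very little about hardware and do not want to enter random kernel parameters I know nothing about, so I do not have many ideas about how to proceed. I can update the post with complete logs on request.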
Answer 1
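It looks like there is a problem with the NVMe hardware itself. How about booting from a recovery USB image and running badblocks on the NVMe device, or, if Samsung has a diagnostic tool that can erase and test the device, running that?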
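If badblocks is not at hand, a crude read-only pass can be sketched in Python. This only approximates a read test, on the assumption that failing reads surface as OSError. It writes nothing, some reads may be served from the page cache rather than the drive, and `read_scan` is a hypothetical helper, not a replacement for badblocks or a vendor tool.

```python
import os
import sys


def read_scan(path, chunk_size=1024 * 1024):
    """Read `path` front to back; return (bytes_covered, failed_offsets).

    Opens the target read-only. Chunks that raise OSError are recorded
    by their starting offset and skipped.
    """
    failed = []
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, length, offset)
            except OSError:
                failed.append(offset)
                offset += length
                continue
            if not data:
                break  # unexpected end of data
            offset += len(data)
    finally:
        os.close(fd)
    return offset, failed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: sudo python3 read_scan.py /dev/nvme0n1  (from a live USB)
    covered, failed = read_scan(sys.argv[1])
    print(f"covered {covered} bytes, {len(failed)} unreadable chunks")
    for bad_offset in failed:
        print("read error at byte offset", bad_offset)
```

Dividing a failing byte offset by 512 gives the sector number, which can be compared with the sectors in the kernel log.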