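I am trying to figure out whether my hard drive is about to die. I looked at the SMART values and it seems it might be, but it still reads and writes data just fine, and no new errors have appeared.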
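Previously, 197 Current_Pending_Sector had a value of 8, but after wiping the drive it went back to 0, and 196 Reallocated_Event_Count is 0.
Does this mean the drive itself is fine and this was just a temporary system problem?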
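Also of concern is 188 Command_Timeout, which has a value of 1 and is defined as:
"A count of aborted operations due to HDD timeout. Normally this attribute value should be equal to zero, and if the value is far above zero, then most likely there are serious problems with the power supply or an oxidized data cable."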
I have been doing some low-level programming and have had to force my computer to shut down about 50 times.
I assume the 191 G-Sense_Error_Rate value of 438 is fine; I think it was caused by moving the laptop while the hard drive was running.
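What is really interesting is that my Windows partition stopped booting and could not be mounted on another Windows or Linux machine, yet it mounted just fine on OSX, which let me recover my files. I reinstalled and copied my data back onto it, and it seems to work just fine. OSX is on another drive.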
H2O:~ jeremiah$ smartctl -a /dev/disk1
smartctl 6.3 2014-07-26 r3976 [x86_64-apple-darwin14.1.0] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: HGST HTS541075A9E680
Serial Number: JD13021X0A00GK
LU WWN Device Id: 5 000cca 764c48bc4
Firmware Version: JA2OA590
User Capacity: 750,156,374,016 bytes [750 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Wed Mar 11 21:59:30 2015 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 45) seconds.
Offline data collection
capabilities: (0x51) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 164) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 086 062 Pre-fail Always - 0
2 Throughput_Performance 0x0025 100 100 040 Pre-fail Offline - 0
3 Spin_Up_Time 0x0023 169 100 033 Pre-fail Always - 1
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 981
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x002f 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0025 100 100 040 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 095 095 000 Old_age Always - 2586
10 Spin_Retry_Count 0x0033 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 851
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 001 000 Old_age Always - 144929376764360
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 1
190 Airflow_Temperature_Cel 0x0022 069 050 045 Old_age Always - 31 (Min/Max 24/31)
191 G-Sense_Error_Rate 0x0032 099 099 000 Old_age Always - 438
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 2031647
193 Load_Cycle_Count 0x0032 089 089 000 Old_age Always - 115337
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 100 100 000 Old_age Always - 0
223 Load_Retry_Count 0x002a 100 100 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 456 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 456 occurred at disk power-on lifetime: 2549 hours (106 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 38 8d 62 00 Error: UNC 8 sectors at LBA = 0x00628d38 = 6458680
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 38 8d 62 40 00 00:00:34.282 READ DMA EXT
25 00 08 38 8d 62 40 00 00:00:30.471 READ DMA EXT
25 00 08 38 8d 62 40 00 00:00:26.660 READ DMA EXT
25 00 08 38 8d 62 40 00 00:00:22.849 READ DMA EXT
2f 00 01 10 00 00 00 00 00:00:22.849 READ LOG EXT
Error 455 occurred at disk power-on lifetime: 2549 hours (106 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 38 8d 62 00 Error: UNC 8 sectors at LBA = 0x00628d38 = 6458680
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 38 8d 62 40 00 00:00:30.471 READ DMA EXT
25 00 08 38 8d 62 40 00 00:00:26.660 READ DMA EXT
25 00 08 38 8d 62 40 00 00:00:22.849 READ DMA EXT
2f 00 01 10 00 00 00 00 00:00:22.849 READ LOG EXT
60 08 a8 38 8d 62 40 00 00:00:19.060 READ FPDMA QUEUED
Error 454 occurred at disk power-on lifetime: 2549 hours (106 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 38 8d 62 00 Error: UNC 8 sectors at LBA = 0x00628d38 = 6458680
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 38 8d 62 40 00 00:00:26.660 READ DMA EXT
25 00 08 38 8d 62 40 00 00:00:22.849 READ DMA EXT
2f 00 01 10 00 00 00 00 00:00:22.849 READ LOG EXT
60 08 a8 38 8d 62 40 00 00:00:19.060 READ FPDMA QUEUED
60 08 a0 30 8d 62 40 00 00:00:19.059 READ FPDMA QUEUED
Error 453 occurred at disk power-on lifetime: 2549 hours (106 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 38 8d 62 00 Error: UNC 8 sectors at LBA = 0x00628d38 = 6458680
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 38 8d 62 40 00 00:00:22.849 READ DMA EXT
2f 00 01 10 00 00 00 00 00:00:22.849 READ LOG EXT
60 08 a8 38 8d 62 40 00 00:00:19.060 READ FPDMA QUEUED
60 08 a0 30 8d 62 40 00 00:00:19.059 READ FPDMA QUEUED
60 08 98 28 8d 62 40 00 00:00:19.059 READ FPDMA QUEUED
Error 452 occurred at disk power-on lifetime: 2548 hours (106 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 08 38 8d 62 00 Error: UNC at LBA = 0x00628d38 = 6458680
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 a8 38 8d 62 40 00 00:00:19.060 READ FPDMA QUEUED
60 08 a0 30 8d 62 40 00 00:00:19.059 READ FPDMA QUEUED
60 08 98 28 8d 62 40 00 00:00:19.059 READ FPDMA QUEUED
60 08 90 20 8d 62 40 00 00:00:19.059 READ FPDMA QUEUED
60 08 88 18 8d 62 40 00 00:00:19.059 READ FPDMA QUEUED
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Answer 1
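"197 Current_Pending_Sector had a value of 8, but after wiping the drive it went back to 0, and 196 Reallocated_Event_Count is 0."
This means that at some point the drive had trouble reading some sectors, but that those sectors have shown no problems since you wiped the drive. When you overwrote the whole drive with new data, the sectors went from pending reallocation back to normal, and the drive was probably happy with the writes, since the sectors have not been reallocated at this point. You should run a long SMART self-test (which usually includes a surface scan) to verify, but this was most likely a glitch, possibly related to moving the computer while the drive was running.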
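The error log in the question gives a hint about how large the troubled area was. The sketch below decodes the registers of "Error 456" the same way smartctl's own summary line does, assuming the standard 28-bit LBA register layout, and checks whether the 8 failing logical sectors line up with a single 4096-byte physical sector. The variable names are illustrative only, and the result is merely consistent with one small bad spot; it does not prove it.

```python
# Register values copied from "Error 456" in the question's smartctl error log.
SN, CL, CH, DH = 0x38, 0x8D, 0x62, 0x00
SECTOR_COUNT = 0x08              # SC register: sectors covered by the failed read
LOGICAL_SECTOR_BYTES = 512       # "Sector Sizes" line, logical
PHYSICAL_SECTOR_BYTES = 4096     # "Sector Sizes" line, physical

# ASSUMPTION: 28-bit LBA layout (low nibble of DH, then CH, CL, SN).
lba = ((DH & 0x0F) << 24) | (CH << 16) | (CL << 8) | SN

logical_per_physical = PHYSICAL_SECTOR_BYTES // LOGICAL_SECTOR_BYTES
starts_on_physical_boundary = lba % logical_per_physical == 0
fits_one_physical_sector = (
    starts_on_physical_boundary and SECTOR_COUNT == logical_per_physical
)

print(f"LBA: {lba} (0x{lba:08x})")
print(f"logical sectors per physical sector: {logical_per_physical}")
print(f"failing range is one aligned physical sector: {fits_one_physical_sector}")
```

The decoded LBA matches the 6458680 that smartctl prints, and the 8 logical sectors starting there cover exactly one aligned physical sector, which would fit the earlier Current_Pending_Sector count of 8.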
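"Also of concern is 188 Command_Timeout, which has a value of 1"
Not worth worrying about. The drive reports nearly 2,600 power-on hours, and there has been a single command timeout in that time. Command timeouts are handled by the operating system, either by retrying the failed command or by failing the I/O operation, so if this were an ongoing problem, you would know about it. It may or may not be related to the 8 pending sectors.
I would be worried if that number started to rise significantly, but a single-digit timeout count with no other signs of trouble in system operation would not worry me.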
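"I have been doing some low-level programming and have had to force my computer to shut down about 50 times."
That should not affect the physical drive to any degree worth worrying about, although it could affect logical data consistency (file system corruption and the like).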
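Also, from sawdust's comment:
"You should run the short and extended self-tests. The large number of ID#187 Reported_Uncorrect errors indicates a problem. There seem to have been a large number of uncorrectable read errors about 40 hours ago."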
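That is a good point, but we do not know how the raw value is encoded. What we do know is that the normalized "value" is currently 100, the worst value seen is 1, and the threshold (used to report that the drive has failed or that failure is imminent) is 0. At the present time, the drive does not consider this attribute anything to worry about. 1.45e14 read errors sounds almost impossibly high: by its own admission, the drive has about (750 GB, 4 KiB/sector) 183 million sectors. To reach the number of read failures reported as the raw value, each sector would have had to fail about 791,000 times during the reported 2,586 power-on hours, which works out to one complete read failure of all sectors roughly every 11 to 12 seconds. That is simply a ridiculous number (in ten seconds you can read only a small fraction of the whole disk surface), so we can conclude with high confidence that for this drive and attribute 187, the raw value is something other than a simple integer count.
The raw value is probably split into two parts, with the high or low bits encoding the actual value and the other bits encoding something else. The hexadecimal representation of this attribute's raw value is 83D0 0005 01C8, and the run of zeros in the middle does hint at such an encoding; while it is certainly possible, it seems unlikely that a random error count would have such a long run of zeros in the middle. For example, if we take the lower bits (501C8 hex), we get 328,136 reported errors, which, while still a lot, sounds much more believable.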
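The arithmetic above can be reproduced with a short script. This is only a sketch: the capacity, physical sector size, power-on hours and raw value are copied from the smartctl output in the question, and the split of the raw value into separate bit fields is a guess, not a documented encoding for this drive.

```python
# Sanity check of the raw value of attribute 187, using figures from the
# smartctl output in the question.
CAPACITY_BYTES = 750_156_374_016   # "User Capacity"
SECTOR_BYTES = 4096                # physical sector size
POWER_ON_HOURS = 2586              # attribute 9, raw value
RAW_187 = 144_929_376_764_360      # attribute 187, raw value

sectors = CAPACITY_BYTES // SECTOR_BYTES
failures_per_sector = RAW_187 / sectors
seconds_per_full_pass = POWER_ON_HOURS * 3600 / failures_per_sector

# ASSUMPTION: the 48-bit raw value packs several fields; these masks are
# just two plausible ways of slicing off the low part.
low_20_bits = RAW_187 & 0xFFFFF    # the "501C8" picked out in the answer
low_16_bits = RAW_187 & 0xFFFF

print(f"sectors:               {sectors:,}")
print(f"failures per sector:   {failures_per_sector:,.0f}")
print(f"seconds per full pass: {seconds_per_full_pass:.1f}")
print(f"raw value in hex:      {RAW_187:012X}")
print(f"low 20 bits:           {low_20_bits:,}")
print(f"low 16 bits:           {low_16_bits}")
```

Read as a plain count, the raw value would require every one of the roughly 183 million sectors to fail to read again every 11 to 12 seconds. The low 16 bits come out as 456, the same number as the ATA Error Count in the error log above, which may or may not be a coincidence.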
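Bottom line: SMART can be a great monitoring tool, but it is not designed to catch and report every problem. Some drives keep running fine long after SMART says they should have failed completely, while others fail catastrophically with SMART still reporting that everything is fine, even after the failure. Understand SMART data for what it is: an early warning system and a status report, not some absolute truth about the drive's health. Also, you must read raw values with a critical eye, because the encoding of those values is implementation-defined. Instead, you should look at how the reported "value" compares with the drive's "threshold" value, because those are meant to be defined in a meaningful way by the manufacturer for the specific drive.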
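As a rough illustration of comparing "value" against "threshold", the sketch below parses a few attribute rows copied from the question and applies the usual rule that an attribute counts as failed when its normalized value is at or below its threshold. The function names and status labels are made up for this example, and the parsing assumes the whitespace-separated column order shown in the smartctl header line.

```python
# A few attribute rows copied from the smartctl output in the question.
SAMPLE_ROWS = """\
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 001 000 Old_age Always - 144929376764360
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 1
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
"""


def parse_row(line):
    """Pick out ID, name and the three normalized columns from one row."""
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
    }


def status(attr):
    """Label an attribute by comparing VALUE and WORST against THRESH."""
    if attr["value"] <= attr["thresh"]:
        return "failing now"
    if attr["worst"] <= attr["thresh"]:
        return "failed in the past"
    return "ok"


attributes = [parse_row(line) for line in SAMPLE_ROWS.splitlines()]
for attr in attributes:
    print(f'{attr["id"]:>3} {attr["name"]:<24} '
          f'value={attr["value"]:>3} worst={attr["worst"]:>3} '
          f'thresh={attr["thresh"]:>3} -> {status(attr)}')
```

All four rows come out as "ok", including attribute 187: its worst value of 1 is still above its threshold of 0, which matches the empty WHEN_FAILED column in the original output.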
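If you are worried about those earlier pending (which basically means "found hard to read") sectors, run a full surface scan through SMART. If they come back as "pending", it may be worth considering whether to replace the drive, but the simple fact is that nearly all drives develop some bad sectors over their lifetime, and they carry a number of spare sectors to compensate by reallocating the bad ones. However, reallocation does require knowing the data, so if a sector fails to read, it can only be reallocated during a write to that sector.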
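One way to start such a scan is the extended self-test that smartctl itself points to ("To run self-tests, use: smartctl -t"). The sketch below wraps the two relevant invocations, "smartctl -t long" to start the test and "smartctl -l selftest" to read the log afterwards. The helper names are hypothetical, the device path is whatever the drive is on your system (/dev/disk1 in the question), and the commands usually need administrator rights.

```python
import shutil
import subprocess


def selftest_commands(device):
    """Return the smartctl invocations: start a long test, then read the log."""
    return [
        ["smartctl", "-t", "long", device],      # start the extended self-test
        ["smartctl", "-l", "selftest", device],  # print the self-test log
    ]


def run_smartctl(command):
    """Run one smartctl command and return its text output."""
    if shutil.which("smartctl") is None:
        raise RuntimeError("smartctl not found on PATH")
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout


def start_long_selftest(device):
    """Kick off the extended self-test; the drive runs it in the background."""
    start_cmd, _ = selftest_commands(device)
    return run_smartctl(start_cmd)


def read_selftest_log(device):
    """Fetch the self-test log once the test has had time to finish."""
    _, log_cmd = selftest_commands(device)
    return run_smartctl(log_cmd)
```

For the drive in the question, the output lists a recommended polling time of 164 minutes for the extended routine, so something like start_long_selftest("/dev/disk1") followed by read_selftest_log("/dev/disk1") a few hours later would show whether any sector was still unreadable.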