Simple mdadm RAID 1 not activating spare

Question 1

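Doing this simply chucks the drive into the array without actually doing anything with it, i.e. it is a member of the array but not active in it. By default, this turns it into a spare: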

sudo mdadm /dev/md0 --add /dev/sdb1
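
To confirm that the new member really did end up as a spare, one option is to look for the (S) flag in /proc/mdstat. The following is only a minimal sketch: the function name list_spares is made up for illustration, and it assumes the usual mdstat layout in which each array line looks like "md0 : active raid1 sdb1[2](S) sda1[0]".

```shell
#!/bin/sh
# Hypothetical helper: print "<array> <member>" for every member flagged (S)
# (spare) in mdstat-style text read from stdin.
list_spares() {
    awk '$2 == ":" && $1 ~ /^md/ {
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\(S\)$/) {
                name = $i
                sub(/\[.*$/, "", name)   # drop the "[2](S)" suffix
                print $1, name
            }
        }
    }'
}

# Usage on a live system:
#   list_spares < /proc/mdstat
```

If the command prints nothing, no array currently lists a spare.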

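If you have a spare, you can activate it by forcing the array's active drive count to grow. With 3 drives and 2 expected to be active, you need to increase the active count to 3.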

sudo mdadm --grow /dev/md0 --raid-devices=3

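The raid array driver will notice that you are "short" a drive and then look for a spare. Finding the spare, it will integrate it into the array as an active drive. Open a spare terminal and let this rather crude command line run in it to keep tabs on the re-sync progress. Be sure to type it as one line or use the line continuation character (\), and once the rebuild finishes, just press Ctrl-C in the terminal.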

while true; do sleep 60; clear; sudo mdadm --detail /dev/md0; echo; cat /proc/mdstat; done
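
A slightly less crude variant could pull just the progress figure out of /proc/mdstat and stop by itself once nothing is rebuilding. This is a sketch under assumptions: the names resync_progress and wait_for_resync are invented here, and the parsing relies on the progress line containing "recovery = NN.N%" or "resync = NN.N%", as it usually does.

```shell
#!/bin/sh
# Hypothetical helper: print "recovery 12.6%" (or "resync ...") for each
# running rebuild found in mdstat-style text on stdin; print nothing when idle.
resync_progress() {
    awk '{
        for (i = 1; i < NF; i++)
            if (($i == "recovery" || $i == "resync") && $(i + 1) == "=")
                print $i, $(i + 2)
    }'
}

# Poll once a minute and return as soon as no rebuild is reported any more.
wait_for_resync() {
    while p=$(resync_progress < /proc/mdstat) && [ -n "$p" ]; do
        echo "$p"
        sleep 60
    done
}

# Usage on a live system:
#   wait_for_resync
```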

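Your array will now have two active drives that are in sync, but because there are not 3 drives, it will not be 100% clean. Remove the failed drive, then resize the array. Note that the --grow flag is a bit of a misnomer, since it can mean either growing or shrinking: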

sudo mdadm /dev/md0 --fail /dev/{failed drive}
sudo mdadm /dev/md0 --remove /dev/{failed drive}
sudo mdadm --grow /dev/md0 --raid-devices=2
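
Afterwards it is worth checking that the array reports the state and device counts you expect. Here is a small sketch for picking single fields out of mdadm --detail output; detail_field is a made-up name, and it assumes the "Name : value" layout that mdadm --detail normally prints.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "Name : value" line from
# `mdadm --detail` output on stdin, e.g. State or Raid Devices.
detail_field() {
    awk -F' : ' -v key="$1" '{
        k = $1
        gsub(/^ +| +$/, "", k)       # the field name is padded with spaces
        if (k == key) {
            v = $2
            gsub(/^ +| +$/, "", v)
            print v
            exit
        }
    }'
}

# Usage on a live system:
#   sudo mdadm --detail /dev/md0 | detail_field 'State'
#   sudo mdadm --detail /dev/md0 | detail_field 'Raid Devices'
```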

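Regarding errors, a link problem with a drive (i.e. the PATA/SATA port, the cable, or the drive connector) is not enough to trigger a failover to a hot spare, because the kernel will typically switch to using the other "good" drive while it resets the link to the "bad" drive. I know this because I run a 3-drive array, 2 hot and 1 spare, and one of the drives recently decided to barf a bit into the logs. When I tested all the drives in the array, all 3 passed the "long" version of the SMART test, so the platters, mechanical components, and onboard controller are not the problem, which leaves a flaky link cable or a bad SATA port. Perhaps this is what you are seeing. Try switching the drive to a different motherboard port, or use a different cable, and see if things improve.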

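A follow-up: I finished expanding the mirror to 3 drives, failed and removed the flaky drive from the md array, hot-swapped the cable for a new one (the motherboard supports this) and re-added the drive. Upon re-add, it immediately started a re-sync of the drive. So far, not a single error has appeared in the log, despite the drive being heavily used. So, yes, drive cables can go flaky.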
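
One rough way to spot this kind of link trouble is to count link resets per ATA port in the kernel log. The sketch below assumes the log contains messages of the form "ataN: hard resetting link", which is what libata typically prints; the exact wording may differ between kernel versions, and link_resets is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: count "ataN: hard resetting link" messages per port
# in kernel log text on stdin and print "<port> <count>", sorted by port.
link_resets() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^ata[0-9]+:$/ && $(i + 1) == "hard" && $(i + 2) == "resetting") {
                port = $i
                sub(/:$/, "", port)
                count[port]++
            }
    }
    END { for (p in count) print p, count[p] }' | sort
}

# Usage on a live system:
#   dmesg | link_resets
```

A port that shows up here repeatedly while its drive passes the long SMART test points at the cable or the port rather than the drive.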

Question 2

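I had exactly the same problem, and in my case I found that the active raid disk suffered read errors during synchronization. As a result, the new disk was never successfully synced and stayed marked as a spare.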

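You might want to check /var/log/messages and your other system logs for errors. It is also a good idea to check the disk's SMART state:

1) Run the short test:

smartctl -t short /dev/sda

2) Display the test results:

smartctl -l selftest /dev/sda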

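In my case this returned something like this:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      7564         27134728
# 2  Short offline       Completed: read failure       90%      7467         1408449701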

I had to boot a live distro and manually copy the data from the defective disk to the new (currently "spare") one.

Question 3

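I had exactly the same problem and always assumed that the second disk, the one I wanted to re-add to the array, had errors. But it was my original disk that had read errors.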

You can check it with smartctl -t short /dev/sdX and see the result a few minutes later with smartctl -l selftest /dev/sdX. For me it looked like this:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       20%     25151         734566647
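
If you want to check this from a script, the failed entries can be pulled out of the self-test log. A minimal sketch, assuming the log layout shown above, where the LBA of the first error is the last column; failed_selftests is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: print the LBA_of_first_error of every self-test log
# entry that ended with a read failure (input: `smartctl -l selftest` output).
failed_selftests() {
    awk '/^# *[0-9]+/ && /read failure/ { print $NF }'
}

# Usage on a live system:
#   sudo smartctl -l selftest /dev/sdX | failed_selftests
```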

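I tried to fix them with this manual. That was fun :-). I know you have checked both disks for errors, but I think your problem is that the disk still in the md array has read errors, so adding a second disk fails.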

Update

You should additionally run smartctl -a /dev/sdX. If you see Current_Pending_Sector > 0, something is wrong:

197 Current_Pending_Sector  0x0012   098   098   000    Old_age   Always       -       69
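
The same check can be scripted by reading the RAW_VALUE column of the attribute table. This is only a sketch: smart_raw is a made-up name, and it assumes the usual ten-column attribute table, with the raw value in the tenth column.

```shell
#!/bin/sh
# Hypothetical helper: print the RAW_VALUE of the SMART attribute named in $1
# (input: the attribute table from `smartctl -A` or `smartctl -a`).
smart_raw() {
    awk -v name="$1" '$2 == name { print $10; exit }'
}

# Usage on a live system:
#   pending=$(sudo smartctl -A /dev/sdX | smart_raw Current_Pending_Sector)
#   [ "${pending:-0}" -gt 0 ] && echo "pending sectors: $pending"
```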

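For me this was definitely the problem. I had removed a disk from the raid just for testing, and the resync could not complete because of read failures. The sync aborted. When I checked the disk that was still in the raid array, smartctl reported problems.

I could fix them with the manual above and saw the number of pending sectors go down. But there were too many, and it is a long and boring procedure, so I used my backup and restored the data on a different server.

As you did not have the chance to use SMART, I guess your self-test did not reveal those broken sectors.

For me it is a lesson learned: check your disks before you remove one from your array.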

Question 4

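Update (May 24, 2015): After three years, I investigated the true cause of the RAID 1 array being degraded.

tl;dr: One of the drives was bad, and I did not notice because I had only run a full surface test on the good drive.

Three years ago, I did not think to check any logs for I/O issues. Had I thought to check /var/log/syslog, I would have seen something like this when mdadm gave up on rebuilding the array: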

May 24 14:08:32 node51 kernel: [51887.853786] sd 8:0:0:0: [sdi] Unhandled sense code
May 24 14:08:32 node51 kernel: [51887.853794] sd 8:0:0:0: [sdi]
May 24 14:08:32 node51 kernel: [51887.853798] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 24 14:08:32 node51 kernel: [51887.853802] sd 8:0:0:0: [sdi]
May 24 14:08:32 node51 kernel: [51887.853805] Sense Key : Medium Error [current]
May 24 14:08:32 node51 kernel: [51887.853812] sd 8:0:0:0: [sdi]
May 24 14:08:32 node51 kernel: [51887.853815] Add. Sense: Unrecovered read error
May 24 14:08:32 node51 kernel: [51887.853819] sd 8:0:0:0: [sdi] CDB:
May 24 14:08:32 node51 kernel: [51887.853822] Read(10): 28 00 00 1b 6e 00 00 00 01 00
May 24 14:08:32 node51 kernel: [51887.853836] end_request: critical medium error, dev sdi, sector 14381056
May 24 14:08:32 node51 kernel: [51887.853849] Buffer I/O error on device sdi, logical block 1797632

To get that output in the log, I sought to the first problematic LBA (14381058 in my case) with the following command:

root@node51 [~]# dd if=/dev/sdi of=/dev/zero bs=512 count=1 skip=14381058
dd: error reading ‘/dev/sdi’: Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 7.49287 s, 0.0 kB/s

No wonder md gave up! It cannot rebuild an array from a bad drive.
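
The numbers in the kernel log line up if the "logical block" in the Buffer I/O error is counted in 4096-byte units while the sector in the end_request line is counted in 512-byte units. That unit size is my inference from the ratio of the two values, not something the log states. A quick sketch of the arithmetic, with made-up helper names:

```shell
#!/bin/sh
# Convert a 512-byte sector number to the 4096-byte block that contains it
# (the 4096-byte block size is an assumption inferred from the log above).
sector_to_block() {
    echo $(( $1 * 512 / 4096 ))
}

# Convert a hex LBA, e.g. the bytes "00 1b 6e 00" from the Read(10) CDB
# written without spaces, to decimal.
hex_to_dec() {
    echo $(( 0x$1 ))
}

sector_to_block 14381056   # sector from the end_request line
sector_to_block 14381058   # LBA passed to dd via skip=
hex_to_dec 001b6e00        # LBA bytes from the Read(10) CDB
```

All three calls print 1797632, the logical block named in the Buffer I/O error, so the log entry and the dd read refer to the same 4 KiB region.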

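Newer technology (better smartmontools hardware compatibility?) has allowed me to get S.M.A.R.T. information out of the drive, including the last five errors (of 1393 errors so far):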

root@node51 [~]# smartctl -a /dev/sdi
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-43-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 5K3000
Device Model:     Hitachi HDS5C3020ALA632
Serial Number:    ML2220FA040K9E
LU WWN Device Id: 5 000cca 36ac1d394
Firmware Version: ML6OA800
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5940 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun May 24 14:13:35 2015 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART STATUS RETURN: incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (21438) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 358) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   136   136   054    Pre-fail  Offline      -       93
  3 Spin_Up_Time            0x0007   172   172   024    Pre-fail  Always       -       277 (Average 362)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       174
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       8
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   146   146   020    Pre-fail  Offline      -       29
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       22419
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       161
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       900
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       900
194 Temperature_Celsius     0x0002   127   127   000    Old_age   Always       -       47 (Min/Max 19/60)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       8
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       30
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       2

SMART Error Log Version: 1
ATA Error Count: 1393 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1393 occurred at disk power-on lifetime: 22419 hours (934 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 02 70 db 00  Error: UNC 6 sectors at LBA = 0x00db7002 = 14381058

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 00 70 db 40 00   1d+03:59:34.096  READ DMA EXT
  25 00 08 00 70 db 40 00   1d+03:59:30.334  READ DMA EXT
  b0 d5 01 09 4f c2 00 00   1d+03:57:59.057  SMART READ LOG
  b0 d5 01 06 4f c2 00 00   1d+03:57:58.766  SMART READ LOG
  b0 d5 01 01 4f c2 00 00   1d+03:57:58.476  SMART READ LOG

Error 1392 occurred at disk power-on lifetime: 22419 hours (934 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 02 70 db 00  Error: UNC 6 sectors at LBA = 0x00db7002 = 14381058

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 00 70 db 40 00   1d+03:59:30.334  READ DMA EXT
  b0 d5 01 09 4f c2 00 00   1d+03:57:59.057  SMART READ LOG
  b0 d5 01 06 4f c2 00 00   1d+03:57:58.766  SMART READ LOG
  b0 d5 01 01 4f c2 00 00   1d+03:57:58.476  SMART READ LOG
  b0 d5 01 00 4f c2 00 00   1d+03:57:58.475  SMART READ LOG

Error 1391 occurred at disk power-on lifetime: 22419 hours (934 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 02 70 db 00  Error: UNC 6 sectors at LBA = 0x00db7002 = 14381058

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 00 70 db 40 00   1d+03:56:28.228  READ DMA EXT
  25 00 08 00 70 db 40 00   1d+03:56:24.549  READ DMA EXT
  25 00 08 00 70 db 40 00   1d+03:56:06.711  READ DMA EXT
  25 00 10 f0 71 db 40 00   1d+03:56:06.711  READ DMA EXT
  25 00 f0 00 71 db 40 00   1d+03:56:06.710  READ DMA EXT

Error 1390 occurred at disk power-on lifetime: 22419 hours (934 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 02 70 db 00  Error: UNC 6 sectors at LBA = 0x00db7002 = 14381058

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 00 70 db 40 00   1d+03:56:24.549  READ DMA EXT
  25 00 08 00 70 db 40 00   1d+03:56:06.711  READ DMA EXT
  25 00 10 f0 71 db 40 00   1d+03:56:06.711  READ DMA EXT
  25 00 f0 00 71 db 40 00   1d+03:56:06.710  READ DMA EXT
  25 00 10 f0 70 db 40 00   1d+03:56:06.687  READ DMA EXT

Error 1389 occurred at disk power-on lifetime: 22419 hours (934 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 02 70 db 00  Error: UNC 6 sectors at LBA = 0x00db7002 = 14381058

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 00 70 db 40 00   1d+03:56:06.711  READ DMA EXT
  25 00 10 f0 71 db 40 00   1d+03:56:06.711  READ DMA EXT
  25 00 f0 00 71 db 40 00   1d+03:56:06.710  READ DMA EXT
  25 00 10 f0 70 db 40 00   1d+03:56:06.687  READ DMA EXT
  25 00 f0 00 70 db 40 00   1d+03:56:03.026  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     21249         14381058

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

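Ahh... that would do it.

Now, I have solved this question in three easy steps:

1) Become a system administrator in three years.
2) Check the logs.
3) Come back to Super User and laugh at my approach from three years ago.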

Update (July 19, 2015): For anyone who is curious, the drive finally ran out of sectors to remap:

root@node51 [~]# smartctl -a /dev/sdg
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-43-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 5K3000
Device Model:     Hitachi HDS5C3020ALA632
Serial Number:    ML2220FA040K9E
LU WWN Device Id: 5 000cca 36ac1d394
Firmware Version: ML6OA800
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5940 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Jul 19 14:00:33 2015 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART STATUS RETURN: incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 117) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (21438) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 358) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   099   099   016    Pre-fail  Always       -       2
  2 Throughput_Performance  0x0005   136   136   054    Pre-fail  Offline      -       93
  3 Spin_Up_Time            0x0007   163   163   024    Pre-fail  Always       -       318 (Average 355)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       181
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 1978
  7 Seek_Error_Rate         0x000b   086   086   067    Pre-fail  Always       -       1245192
  8 Seek_Time_Performance   0x0005   146   146   020    Pre-fail  Offline      -       29
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       23763
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       167
192 Power-Off_Retract_Count 0x0032   092   092   000    Old_age   Always       -       10251
193 Load_Cycle_Count        0x0012   092   092   000    Old_age   Always       -       10251
194 Temperature_Celsius     0x0002   111   111   000    Old_age   Always       -       54 (Min/Max 19/63)
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       2927
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       33
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       2

SMART Error Log Version: 1
ATA Error Count: 2240 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2240 occurred at disk power-on lifetime: 23763 hours (990 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 f0 18 0f 2f 00  Error: IDNF 240 sectors at LBA = 0x002f0f18 = 3084056

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 f0 18 0f 2f 40 00      00:25:01.942  WRITE DMA EXT
  35 00 f0 28 0e 2f 40 00      00:25:01.168  WRITE DMA EXT
  35 00 f0 38 0d 2f 40 00      00:25:01.157  WRITE DMA EXT
  35 00 f0 48 0c 2f 40 00      00:25:01.147  WRITE DMA EXT
  35 00 f0 58 0b 2f 40 00      00:25:01.136  WRITE DMA EXT

Error 2239 occurred at disk power-on lifetime: 23763 hours (990 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 5a 4e f7 2e 00  Error: IDNF 90 sectors at LBA = 0x002ef74e = 3077966

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 f0 b8 f6 2e 40 00      00:24:57.967  WRITE DMA EXT
  35 00 f0 c8 f5 2e 40 00      00:24:57.956  WRITE DMA EXT
  35 00 f0 d8 f4 2e 40 00      00:24:57.945  WRITE DMA EXT
  35 00 f0 e8 f3 2e 40 00      00:24:57.934  WRITE DMA EXT
  35 00 f0 f8 f2 2e 40 00      00:24:57.924  WRITE DMA EXT

Error 2238 occurred at disk power-on lifetime: 23763 hours (990 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 40 a8 c6 2e 00  Error: IDNF 64 sectors at LBA = 0x002ec6a8 = 3065512

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 f0 f8 c5 2e 40 00      00:24:49.444  WRITE DMA EXT
  35 00 f0 08 c5 2e 40 00      00:24:49.433  WRITE DMA EXT
  35 00 f0 18 c4 2e 40 00      00:24:49.422  WRITE DMA EXT
  35 00 f0 28 c3 2e 40 00      00:24:49.412  WRITE DMA EXT
  35 00 f0 38 c2 2e 40 00      00:24:49.401  WRITE DMA EXT

Error 2237 occurred at disk power-on lifetime: 23763 hours (990 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 ea be ba 2e 00  Error: IDNF 234 sectors at LBA = 0x002ebabe = 3062462

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 f0 b8 ba 2e 40 00      00:24:39.263  WRITE DMA EXT
  35 00 f0 c8 b9 2e 40 00      00:24:38.885  WRITE DMA EXT
  35 00 f0 d8 b8 2e 40 00      00:24:38.874  WRITE DMA EXT
  35 00 f0 e8 b7 2e 40 00      00:24:38.862  WRITE DMA EXT
  35 00 f0 f8 b6 2e 40 00      00:24:38.852  WRITE DMA EXT

Error 2236 occurred at disk power-on lifetime: 23763 hours (990 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 86 c2 2a 2e 00  Error: IDNF 134 sectors at LBA = 0x002e2ac2 = 3025602

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 f0 58 2a 2e 40 00      00:24:25.605  WRITE DMA EXT
  35 00 f0 68 29 2e 40 00      00:24:25.594  WRITE DMA EXT
  35 00 f0 78 28 2e 40 00      00:24:25.583  WRITE DMA EXT
  35 00 f0 88 27 2e 40 00      00:24:25.572  WRITE DMA EXT
  35 00 f0 98 26 2e 40 00      00:24:25.561  WRITE DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Completed: read failure       50%     23763         869280
# 2  Extended offline    Completed without error       00%     22451         -
# 3  Short offline       Completed without error       00%     22439         -
# 4  Extended offline    Completed: read failure       90%     21249         14381058
1 of 2 failed self-tests are outdated by newer successful extended offline self-test # 2

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
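
In a dump this long, the line that matters is easy to miss: attribute 5 is marked FAILING_NOW in the WHEN_FAILED column. A small sketch that lists only attributes whose WHEN_FAILED column is not "-". The name failing_attributes is made up, and the filter on the 0x flag column is there to skip the error-log rows, which also start with a number.

```shell
#!/bin/sh
# Hypothetical helper: print "<attribute> <when_failed>" for each SMART
# attribute row whose WHEN_FAILED column (9th field) is not "-".
# Rows are recognised by a numeric ID and a 0x... FLAG column.
failing_attributes() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^0x/ && NF >= 10 && $9 != "-" { print $2, $9 }'
}

# Usage on a live system:
#   sudo smartctl -a /dev/sdX | failing_attributes
```

Run against the output above, this should print only the Reallocated_Sector_Ct row.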

Answer