Automatic disconnection of RAID array

A RAID array connected to my system (Debian 7) via USB frequently disconnects for no apparent reason. When it is first connected, the device is detected and the array can be initialized, mounted, read, written and unmounted as normal. After a short time (minutes to hours), however, the component disks invariably disappear from /dev and from the output of fdisk -l, and they stay inaccessible until the device (i.e. the RAID enclosure) is power-cycled.

Judging by the output in /var/log/messages, the problem appears to be a reset of the USB device. After this unprovoked reset, the system repeatedly tries to reattach the device, assigning it a higher USB device number each time, and finally gives up after five reset attempts.
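A minimal sketch of how this escalation can be traced in the log, assuming lines in the "usb 1-1.3: ..." format quoted in this post. The function names usb_device_history and reset_counts are hypothetical; the code only extracts each new, reset and disconnect event together with its USB device number, so the climbing device numbers and repeated resets are easy to see.

```python
import re
from collections import Counter

# Matches the three kernel message types seen in the excerpts in this post.
EVENT_RE = re.compile(
    r"usb (?P<port>[\d.-]+): "
    r"(?:(?P<kind>new|reset) high-speed USB device number (?P<num1>\d+)"
    r"|USB disconnect, device number (?P<num2>\d+))"
)


def usb_device_history(lines):
    """Return (event, device_number) tuples in log order."""
    events = []
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue  # skip SCSI, mtp-probe and other unrelated lines
        if m.group("kind"):
            events.append((m.group("kind"), int(m.group("num1"))))
        else:
            events.append(("disconnect", int(m.group("num2"))))
    return events


def reset_counts(events):
    """Count how many resets each USB device number received."""
    return Counter(num for kind, num in events if kind == "reset")
```

Against the real log this could be run as, for example, with open("/var/log/messages") as f: print(reset_counts(usb_device_history(f))), assuming read permission on that file.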

What causes the device to reset itself? I suspect the fault lies with the USB controller. How can this unwanted automatic reset be prevented?

The following excerpts from /var/log/messages show the array's typical behavior after initialization and during the subsequent reset:

Initial connection:

Jun 19 19:38:51 hostname kernel: [406823.308418] usb 1-1.3: new high-speed USB device number 24 using ehci_hcd
Jun 19 19:38:51 hostname kernel: [406823.401317] usb 1-1.3: New USB device found, idVendor=152d, idProduct=2351
Jun 19 19:38:51 hostname kernel: [406823.401330] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=5
Jun 19 19:38:51 hostname kernel: [406823.401338] usb 1-1.3: Product: USB to ATA/ATAPI Bridge
Jun 19 19:38:51 hostname kernel: [406823.401345] usb 1-1.3: Manufacturer: JMicron
Jun 19 19:38:51 hostname kernel: [406823.401350] usb 1-1.3: SerialNumber: DCC3..........
Jun 19 19:38:51 hostname kernel: [406823.402469] scsi16 : usb-storage 1-1.3:1.0
Jun 19 19:38:51 hostname mtp-probe: checking bus 1, device 24: "/sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3"
Jun 19 19:38:51 hostname mtp-probe: bus: 1, device: 24 was not an MTP device
Jun 19 19:38:52 hostname kernel: [406824.400835] scsi 16:0:0:0: Direct-Access     WDC WD20 EFRX-68AX9N0          PQ: 0 ANSI: 5
Jun 19 19:38:52 hostname kernel: [406824.401450] scsi 16:0:0:1: Direct-Access     WDC WD20 EFRX-68AX9N0          PQ: 0 ANSI: 5
Jun 19 19:38:52 hostname kernel: [406824.402433] sd 16:0:0:0: Attached scsi generic sg2 type 0
Jun 19 19:38:52 hostname kernel: [406824.402583] sd 16:0:0:1: Attached scsi generic sg3 type 0
Jun 19 19:38:52 hostname kernel: [406824.662288] sd 16:0:0:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Jun 19 19:38:52 hostname kernel: [406824.662789] sd 16:0:0:1: [sde] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Jun 19 19:38:52 hostname kernel: [406824.663573] sd 16:0:0:0: [sdd] Write Protect is off
Jun 19 19:38:52 hostname kernel: [406824.664356] sd 16:0:0:1: [sde] Write Protect is off
Jun 19 19:38:52 hostname kernel: [406824.665087] sd 16:0:0:0: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Jun 19 19:38:52 hostname kernel: [406824.666295] sd 16:0:0:1: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Jun 19 19:38:52 hostname kernel: [406824.705355]  sdd: sdd1
Jun 19 19:38:52 hostname kernel: [406824.740148]  sde: sde1
Jun 19 19:38:52 hostname kernel: [406824.743667] sd 16:0:0:0: [sdd] Attached SCSI disk
Jun 19 19:38:52 hostname kernel: [406824.746756] sd 16:0:0:1: [sde] Attached SCSI disk
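The capacity lines above can be checked with simple arithmetic: 3907029168 blocks of 512 bytes are 2,000,398,934,016 bytes, i.e. about 2.0004 TB (10^12 bytes) or about 1.8194 TiB (2^40 bytes). A small sketch that parses such a line (disk_capacity and truncate2 are hypothetical helper names); truncating, rather than rounding, to two decimals reproduces the "2.00 TB/1.81 TiB" shown in the log.

```python
import math
import re

# Matches e.g. "[sdd] 3907029168 512-byte logical blocks"
BLOCKS_RE = re.compile(
    r"\[(?P<dev>sd[a-z]+)\] (?P<blocks>\d+) (?P<size>\d+)-byte logical blocks"
)


def disk_capacity(line):
    """Return (device, bytes, TB, TiB) from an sd capacity line, or None."""
    m = BLOCKS_RE.search(line)
    if m is None:
        return None
    total = int(m.group("blocks")) * int(m.group("size"))
    return m.group("dev"), total, total / 10**12, total / 2**40


def truncate2(x):
    """Truncate (not round) to two decimals, as the figures in the log appear."""
    return math.floor(x * 100) / 100
```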

During the reset:

Jun 19 20:05:25 hostname kernel: [408416.587392] usb 1-1.3: reset high-speed USB device number 24 using ehci_hcd
Jun 19 20:05:25 hostname kernel: [408416.679688] usb 1-1.3: device firmware changed
Jun 19 20:05:25 hostname kernel: [408416.679852] sd 16:0:0:0: Device offlined - not ready after error recovery
Jun 19 20:05:25 hostname kernel: [408416.679942] usb 1-1.3: USB disconnect, device number 24
Jun 19 20:05:25 hostname kernel: [408416.767366] usb 1-1.3: new high-speed USB device number 25 using ehci_hcd
Jun 19 20:05:25 hostname kernel: [408416.860214] usb 1-1.3: New USB device found, idVendor=152d, idProduct=2351
Jun 19 20:05:25 hostname kernel: [408416.860225] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=5
Jun 19 20:05:25 hostname kernel: [408416.860232] usb 1-1.3: Product: USB to ATA/ATAPI Bridge
Jun 19 20:05:25 hostname kernel: [408416.860237] usb 1-1.3: Manufacturer: JMicron
Jun 19 20:05:25 hostname kernel: [408416.860241] usb 1-1.3: SerialNumber: 152D.............
Jun 19 20:05:25 hostname kernel: [408416.861634] scsi17 : usb-storage 1-1.3:1.0
Jun 19 20:05:25 hostname mtp-probe: checking bus 1, device 25: "/sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3"
Jun 19 20:05:25 hostname mtp-probe: bus: 1, device: 25 was not an MTP device
Jun 19 20:05:47 hostname kernel: [408438.591825] usb 1-1.3: reset high-speed USB device number 25 using ehci_hcd
Jun 19 20:05:57 hostname kernel: [408448.750695] usb 1-1.3: reset high-speed USB device number 25 using ehci_hcd
Jun 19 20:06:02 hostname kernel: [408453.911854] usb 1-1.3: reset high-speed USB device number 25 using ehci_hcd
Jun 19 20:06:03 hostname kernel: [408454.359608] usb 1-1.3: reset high-speed USB device number 25 using ehci_hcd
Jun 19 20:06:03 hostname kernel: [408454.807394] usb 1-1.3: reset high-speed USB device number 25 using ehci_hcd
Jun 19 20:06:04 hostname kernel: [408455.295168] usb 1-1.3: reset high-speed USB device number 25 using ehci_hcd
Jun 19 20:06:04 hostname kernel: [408455.711201] scsi 17:0:0:0: Device offlined - not ready after error recovery
Jun 19 20:06:04 hostname kernel: [408455.711448] usb 1-1.3: USB disconnect, device number 25
Jun 19 20:06:04 hostname kernel: [408455.786917] usb 1-1.3: new high-speed USB device number 26 using ehci_hcd
Jun 19 20:06:05 hostname kernel: [408456.234679] usb 1-1.3: new high-speed USB device number 27 using ehci_hcd
Jun 19 20:06:05 hostname kernel: [408456.686418] usb 1-1.3: new high-speed USB device number 28 using ehci_hcd
Jun 19 20:06:06 hostname kernel: [408457.174018] usb 1-1.3: new high-speed USB device number 29 using ehci_hcd

The attempt to assign USB device number 29 is the last attempt to revive the disk array. To reattach the device at this point, the RAID enclosure must be power-cycled or its cable unplugged and plugged back in.
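The bracketed kernel timestamps (seconds since boot) also show how the retries escalate before the kernel gives up. A minimal sketch (reset_intervals is a hypothetical name) that computes the gaps between consecutive resets of one device number; for device number 25 in the excerpt above it yields roughly 10.2 s, 5.2 s and then three gaps of under half a second.

```python
import re

# Bracketed kernel timestamp followed by a reset message for a USB device.
RESET_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] usb [\d.-]+: "
    r"reset high-speed USB device number (?P<num>\d+)"
)


def reset_intervals(lines, device_number):
    """Seconds elapsed between consecutive resets of one USB device number."""
    stamps = [
        float(m.group("ts"))
        for m in map(RESET_RE.search, lines)
        if m and int(m.group("num")) == device_number
    ]
    # Pair each timestamp with the next one and report the difference.
    return [round(b - a, 3) for a, b in zip(stamps, stamps[1:])]
```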

Update: Recently the device seemed to reset shortly after a resync. I'm not sure whether this helps, but the resulting error messages from /var/log/messages are listed below:

Jul  5 02:55:02 hdac kernel: [135732.758796] md: md0: resync done.
Jul  5 03:12:04 hdac kernel: [136754.176970] usb 1-1.3: reset high-speed USB device number 10 using ehci_hcd
Jul  5 03:12:04 hdac kernel: [136754.269537] usb 1-1.3: device firmware changed
Jul  5 03:12:04 hdac kernel: [136754.269995] usb 1-1.3: USB disconnect, device number 10
Jul  5 03:12:04 hdac kernel: [136754.269998] sd 6:0:0:1: Device offlined - not ready after error recovery
Jul  5 03:12:04 hdac kernel: [136754.348882] usb 1-1.3: new high-speed USB device number 11 using ehci_hcd
Jul  5 03:12:04 hdac kernel: [136754.442408] usb 1-1.3: New USB device found, idVendor=152d, idProduct=2351
Jul  5 03:12:04 hdac kernel: [136754.442419] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=5
Jul  5 03:12:04 hdac kernel: [136754.442425] usb 1-1.3: Product: USB to ATA/ATAPI Bridge
Jul  5 03:12:04 hdac kernel: [136754.442430] usb 1-1.3: Manufacturer: JMicron
Jul  5 03:12:04 hdac kernel: [136754.442434] usb 1-1.3: SerialNumber: 152D....
Jul  5 03:12:04 hdac kernel: [136754.443581] scsi7 : usb-storage 1-1.3:1.0
Jul  5 03:12:04 hdac mtp-probe: checking bus 1, device 11: "/sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3"
Jul  5 03:12:04 hdac mtp-probe: bus: 1, device: 11 was not an MTP device
Jul  5 03:12:26 hdac kernel: [136776.197464] usb 1-1.3: reset high-speed USB device number 11 using ehci_hcd
Jul  5 03:12:36 hdac kernel: [136786.356068] usb 1-1.3: reset high-speed USB device number 11 using ehci_hcd
Jul  5 03:12:41 hdac kernel: [136791.517429] usb 1-1.3: reset high-speed USB device number 11 using ehci_hcd
Jul  5 03:12:42 hdac kernel: [136791.965221] usb 1-1.3: reset high-speed USB device number 11 using ehci_hcd
Jul  5 03:12:42 hdac kernel: [136792.412974] usb 1-1.3: reset high-speed USB device number 11 using ehci_hcd
Jul  5 03:12:43 hdac kernel: [136792.900701] usb 1-1.3: reset high-speed USB device number 11 using ehci_hcd
Jul  5 03:12:43 hdac kernel: [136793.316865] scsi 7:0:0:0: Device offlined - not ready after error recovery
Jul  5 03:12:43 hdac kernel: [136793.317120] usb 1-1.3: USB disconnect, device number 11
Jul  5 03:12:43 hdac kernel: [136793.388401] usb 1-1.3: new high-speed USB device number 12 using ehci_hcd
Jul  5 03:12:44 hdac kernel: [136793.836359] usb 1-1.3: new high-speed USB device number 13 using ehci_hcd
Jul  5 03:12:44 hdac kernel: [136794.283982] usb 1-1.3: new high-speed USB device number 14 using ehci_hcd
Jul  5 03:12:45 hdac kernel: [136794.776418] usb 1-1.3: new high-speed USB device number 15 using ehci_hcd
Jul  5 07:46:51 hdac rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2234" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Jul  5 07:46:51 hdac kernel: [153233.051604] md: super_written gets error=-19, uptodate=0
Jul  5 07:46:51 hdac kernel: [153233.080561] lost page write due to I/O error on md0
Jul  5 07:46:51 hdac kernel: [153233.082218] lost page write due to I/O error on md0
Jul  5 07:46:51 hdac kernel: [153233.088909] lost page write due to I/O error on md0
Jul  5 07:46:51 hdac kernel: [153233.088972] lost page write due to I/O error on md0
Jul  5 07:46:51 hdac kernel: [153233.089030] lost page write due to I/O error on md0
Jul  5 07:46:51 hdac kernel: [153233.089084] lost page write due to I/O error on md0
Jul  5 07:46:51 hdac kernel: [153233.089139] lost page write due to I/O error on md0
Jul  5 07:46:51 hdac kernel: [153233.089193] lost page write due to I/O error on md0
Jul  5 07:46:51 hdac kernel: [153233.089247] lost page write due to I/O error on md0
Jul  5 07:46:51 hdac kernel: [153233.089300] lost page write due to I/O error on md0
Jul  5 07:46:52 hdac kernel: [153233.308299] md: super_written gets error=-19, uptodate=0
Jul  5 15:04:02 hdac kernel: [179450.340233] md: super_written gets error=-19, uptodate=0
Jul  5 15:04:02 hdac kernel: [179450.340549] quiet_error: 101 callbacks suppressed
Jul  5 15:04:02 hdac kernel: [179450.340566] lost page write due to I/O error on md0
Jul  5 15:04:02 hdac kernel: [179450.340774] lost page write due to I/O error on md0
Jul  5 15:04:03 hdac kernel: [179450.541182] md: super_written gets error=-19, uptodate=0
Jul  5 15:04:08 hdac kernel: [179455.698562] md: super_written gets error=-19, uptodate=0
Jul  5 15:04:08 hdac kernel: [179455.699059] lost page write due to I/O error on md0
Jul  5 15:04:08 hdac kernel: [179455.902387] md: super_written gets error=-19, uptodate=0
Jul  5 15:04:32 hdac kernel: [179479.848336] md: super_written gets error=-19, uptodate=0
Jul  5 15:04:32 hdac kernel: [179479.848803] lost page write due to I/O error on md0
Jul  5 15:04:32 hdac kernel: [179479.848832] lost page write due to I/O error on md0
Jul  5 15:04:32 hdac kernel: [179480.049689] md: super_written gets error=-19, uptodate=0
Jul  5 15:20:11 hdac kernel: [180418.849041] md: super_written gets error=-19, uptodate=0
Jul  5 15:20:11 hdac kernel: [180418.852710] lost page write due to I/O error on md0
Jul  5 15:20:12 hdac kernel: [180419.056405] md: super_written gets error=-19, uptodate=0
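The kernel reports failures as negative errno values, so "error=-19" can be decoded with Python's standard errno module: on Linux, 19 is ENODEV ("No such device"). That is consistent with md presumably still trying to write its superblock to member disks that had already disappeared. A small sketch (decode_kernel_error is a hypothetical name); note that the message text from os.strerror depends on the platform's C library.

```python
import errno
import os
import re

# e.g. "md: super_written gets error=-19, uptodate=0"
ERR_RE = re.compile(r"error=-(?P<code>\d+)")


def decode_kernel_error(line):
    """Map a negative kernel error code in a log line to (errno name, message)."""
    m = ERR_RE.search(line)
    if m is None:
        return None
    code = int(m.group("code"))
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)
```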
