Follow-up to: How can I check for bad blocks on an LVM physical volume?
The title pretty much sums it up. I have a box partitioned with a normal /boot partition and an LVM physical volume that fills the rest of the disk. Inside LVM there is one volume group holding a root partition, a /home partition, and a swap partition.
When LVM creates the device nodes in /dev/mapper, it creates the swap and /home nodes fine. However, it usually hangs while trying to create the root device node. This happens from a live CD (pvscan; vgscan; vgchange -ay is what I used, IIRC) and also from the initial RAM disk, which prevents the box from booting. I also tried it from the initrd recovery shell (lvm pvscan; lvm vgscan; lvm vgchange -ay is what I used, IIRC), and it fails the same way.
Sometimes vgchange -ay does actually create the root device node (after a very long delay) but never exits, leaving me to kill it manually. When that happens I try to mount the device, but the mount always hangs indefinitely. Note that while either of these commands is running, the console shows a bunch of messages about a failed "READ DMA" command or something like that.
I've run smartctl -a /dev/sda a couple of times. Each time it shows a fair number of errors about bad blocks (IIRC), but in the end it says the drive is healthy.
I've put up a pastebin of dmesg from the affected machine. The logs come from booting an Arch Linux live CD and then running pvscan; vgscan; vgchange -ay. vgchange -ay hung forever this time, and I eventually killed it. Here's the tail of dmesg, for posterity (and this is why I don't just use a pastebin):
[ 46.332920] end_request: I/O error, dev fd0, sector 0
[ 58.503496] end_request: I/O error, dev fd0, sector 0
[167992.304649] EXT4-fs (sda1): recovery complete
[167992.304660] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[168092.874016] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[168163.318923] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[168459.839738] end_request: I/O error, dev fd0, sector 0
[168472.010337] end_request: I/O error, dev fd0, sector 0
[168614.642035] bio: create slab <bio-2> at 2
[168630.045526] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168630.045649] ata1.00: BMDMA stat 0x65
[168630.045710] ata1.00: failed command: READ DMA
[168630.045787] ata1.00: cmd c8/00:08:00:10:10/00:00:00:00:00/e6 tag 0 dma 4096 in
res 51/40:08:00:10:10/00:00:00:00:00/e6 Emask 0x9 (media error)
[168630.046006] ata1.00: status: { DRDY ERR }
[168630.046071] ata1.00: error: { UNC }
[168630.066286] ata1.00: configured for UDMA/100
[168630.079493] ata1.01: configured for UDMA/66
[168630.079514] sd 0:0:0:0: [sda] Unhandled sense code
[168630.079517] sd 0:0:0:0: [sda]
[168630.079520] Result: hostbyte=0x00 driverbyte=0x08
[168630.079523] sd 0:0:0:0: [sda]
[168630.079525] Sense Key : 0x3 [current] [descriptor]
[168630.079530] Descriptor sense data with sense descriptors (in hex):
[168630.079532] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[168630.079544] 06 10 10 00
[168630.079549] sd 0:0:0:0: [sda]
[168630.079551] ASC=0x11 ASCQ=0x4
[168630.079554] sd 0:0:0:0: [sda] CDB:
[168630.079556] cdb[0]=0x28: 28 00 06 10 10 00 00 00 08 00
[168630.079567] end_request: I/O error, dev sda, sector 101715968
[168630.079665] Buffer I/O error on device dm-3, logical block 0
[168630.079775] ata1: EH complete
[168634.564062] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168634.564165] ata1.00: BMDMA stat 0x64
[168634.564225] ata1.00: failed command: READ DMA
[168634.564301] ata1.00: cmd c8/00:08:80:0f:10/00:00:00:00:00/e6 tag 0 dma 4096 in
res 51/10:00:83:0f:10/00:00:00:00:00/e6 Emask 0x81 (invalid argument)
[168634.564527] ata1.00: status: { DRDY ERR }
[168634.564592] ata1.00: error: { IDNF }
[168634.584336] ata1.00: configured for UDMA/100
[168634.597559] ata1.01: configured for UDMA/66
[168634.597578] ata1: EH complete
[168639.087353] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168639.087462] ata1.00: BMDMA stat 0x64
[168639.087521] ata1.00: failed command: READ DMA
[168639.087596] ata1.00: cmd c8/00:08:80:0f:10/00:00:00:00:00/e6 tag 0 dma 4096 in
res 51/10:00:83:0f:10/00:00:00:00:00/e6 Emask 0x81 (invalid argument)
[168639.087822] ata1.00: status: { DRDY ERR }
[168639.087886] ata1.00: error: { IDNF }
[168639.105791] ata1.00: configured for UDMA/100
[168639.118999] ata1.01: configured for UDMA/66
[168639.119017] ata1: EH complete
[168645.896986] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168645.897095] ata1.00: BMDMA stat 0x64
[168645.897155] ata1.00: failed command: READ DMA
[168645.900373] ata1.00: cmd c8/00:08:80:0f:10/00:00:00:00:00/e6 tag 0 dma 4096 in
res 51/40:00:83:0f:10/00:00:00:00:00/e6 Emask 0x9 (media error)
[168645.906936] ata1.00: status: { DRDY ERR }
[168645.910263] ata1.00: error: { UNC }
[168645.931315] ata1.00: configured for UDMA/100
[168645.944504] ata1.01: configured for UDMA/66
[168645.944525] sd 0:0:0:0: [sda] Unhandled sense code
[168645.944529] sd 0:0:0:0: [sda]
[168645.944531] Result: hostbyte=0x00 driverbyte=0x08
[168645.944534] sd 0:0:0:0: [sda]
[168645.944537] Sense Key : 0x3 [current] [descriptor]
[168645.944541] Descriptor sense data with sense descriptors (in hex):
[168645.944543] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[168645.944554] 06 10 0f 83
[168645.944559] sd 0:0:0:0: [sda]
[168645.944561] ASC=0x11 ASCQ=0x4
[168645.944564] sd 0:0:0:0: [sda] CDB:
[168645.944566] cdb[0]=0x28: 28 00 06 10 0f 80 00 00 08 00
[168645.944578] end_request: I/O error, dev sda, sector 101715843
[168645.947946] Buffer I/O error on device dm-2, logical block 10485744
[168645.951439] ata1: EH complete
[168650.445911] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168650.449275] ata1.00: BMDMA stat 0x65
[168650.452579] ata1.00: failed command: READ DMA
[168650.455873] ata1.00: cmd c8/00:08:00:10:10/00:00:00:00:00/e6 tag 0 dma 4096 in
res 51/40:08:00:10:10/00:00:00:00:00/e6 Emask 0x9 (media error)
[168650.462537] ata1.00: status: { DRDY ERR }
[168650.465714] ata1.00: error: { UNC }
[168650.486063] ata1.00: configured for UDMA/100
[168650.499326] ata1.01: configured for UDMA/66
[168650.499344] sd 0:0:0:0: [sda] Unhandled sense code
[168650.499348] sd 0:0:0:0: [sda]
[168650.499350] Result: hostbyte=0x00 driverbyte=0x08
[168650.499353] sd 0:0:0:0: [sda]
[168650.499355] Sense Key : 0x3 [current] [descriptor]
[168650.499360] Descriptor sense data with sense descriptors (in hex):
[168650.499362] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[168650.499373] 06 10 10 00
[168650.499378] sd 0:0:0:0: [sda]
[168650.499380] ASC=0x11 ASCQ=0x4
[168650.499383] sd 0:0:0:0: [sda] CDB:
[168650.499385] cdb[0]=0x28: 28 00 06 10 10 00 00 00 08 00
[168650.499396] end_request: I/O error, dev sda, sector 101715968
[168650.502757] Buffer I/O error on device dm-3, logical block 0
[168650.506189] ata1: EH complete
[168798.816025] usb 9-2: new high-speed USB device number 2 using ehci-pci
This is only the tail of the log, where the errors started, because I hit the post length limit. To see the whole thing, look at the pastebin.
Apologies for the lack of specifics, but I'm not in front of the affected box right now.
Answer 1
Based on the additional information you provided, it looks like you have a failing disk (bad blocks). You can try to work around these problems if you like, but I would seriously consider replacing the drive.
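The failing sectors can be read straight off the dmesg output in the question. As a minimal sketch (Python 3; the helper name `decode_read10` is made up for illustration and is not part of any tool), the `CDB:` lines are SCSI READ(10) commands, where bytes 2 to 5 hold the starting LBA and bytes 7 to 8 the number of sectors requested. The four bytes printed at the end of the sense data (`06 10 0f 83`) are likewise a sector number in hex:

```python
def decode_read10(cdb_hex):
    """Return (start_lba, sector_count) from a READ(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2..5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return lba, count

# CDB values copied from the log in the question
print(decode_read10("28 00 06 10 10 00 00 00 08 00"))  # (101715968, 8)
print(decode_read10("28 00 06 10 0f 80 00 00 08 00"))  # (101715840, 8)

# Sense-data information field from the second failure
print(int("06100f83", 16))                             # 101715843
```

These agree with the `end_request: I/O error, dev sda, sector ...` lines, so the sector numbers to chase are 101715843 and 101715968.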
If you want to work around the problem, you basically have to find the LVM physical extents that sit on top of the bad blocks and add those physical extents to a logical volume that must never be used.
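Finding the extent is plain arithmetic once three numbers are known: the sector offset inside the PV, the offset of the first extent (`pe_start`), and the extent size. The sketch below (Python 3) uses a hypothetical layout, since the question does not give the real one: a PV partition starting at disk sector 1026048, a first extent at 1 MiB, and 4 MiB extents. The real values would come from `sfdisk -d`, `pvs -o+pe_start`, and `pvdisplay`.

```python
def sector_to_pe(sect, pe_start, pe_size):
    """Map a sector offset inside a PV to (extent index, offset in extent).

    All three arguments are counted in 512-byte sectors.
    """
    if sect < pe_start:
        raise ValueError("sector lies in the PV metadata area")
    return divmod(sect - pe_start, pe_size)

# Hypothetical layout, for illustration only
part_start = 1026048   # first disk sector of the PV partition (assumed)
pe_start = 2048        # 1 MiB in 512-byte sectors (assumed)
pe_size = 8192         # 4 MiB in 512-byte sectors (assumed)

bad_lba = 101715843    # whole-disk sector reported in dmesg
pe, off = sector_to_pe(bad_lba - part_start, pe_start, pe_size)
print("physical extent", pe, "offset", off, "sectors")
```

The extent index is then looked up in the segment map printed by `pvdisplay -m` to see which logical volume owns it.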
There is actually a fairly recent email thread on the linux-lvm mailing list about this very topic (I'd read the whole thread; it contains a lot of information):
https://www.redhat.com/archives/linux-lvm/2012-November/msg00033.html
In this specific message, it looks like someone wrote a Python script to help with the task:
https://www.redhat.com/archives/linux-lvm/2012-November/msg00038.html
Having helped people in such situations (where at least the internet worked), I used the attached script to help find affected LVs and files.
#!/usr/bin/python
# Identify partition, LV, file containing a sector
# Copyright (C) 2010,2012 Stuart D. Gathman
# Shared under GNU Public License v2 or later
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
import sys
from subprocess import Popen,PIPE
ID_LVM = 0x8e
ID_LINUX = 0x83
ID_EXT = 0x05
ID_RAID = 0xfd
def idtoname(id):
    if id == ID_LVM: return "Linux LVM"
    if id == ID_LINUX: return "Linux Filesystem"
    if id == ID_EXT: return "Extended Partition"
    if id == ID_RAID: return "Software RAID"
    return hex(id)

class Segment(object):
    __slots__ = ('pe1st','pelst','lvpath','le1st','lelst')
    def __init__(self,pe1st,pelst):
        self.pe1st = pe1st
        self.pelst = pelst
    def __str__(self):
        return "Seg:%d-%d:%s:%d-%d" % (
            self.pe1st,self.pelst,self.lvpath,self.le1st,self.lelst)
def cmdoutput(cmd):
    p = Popen(cmd, shell=True, stdout=PIPE)
    try:
        for ln in p.stdout:
            yield ln
    finally:
        p.stdout.close()
        p.wait()

def icheck(fs,blk):
    "Return inum from block number, or 0 if free space."
    for ln in cmdoutput("debugfs -R 'icheck %d' '%s' 2>/dev/null"%(blk,fs)):
        b,i = ln.strip().split(None,1)
        if not b[0].isdigit(): continue
        if int(b) == blk:
            if i.startswith('<'):
                return 0
            return int(i)
    raise ValueError('%s: invalid block: %d'%(fs,blk))

def ncheck(fs,inum):
    "Return filename from inode number, or None if not linked."
    for ln in cmdoutput("debugfs -R 'ncheck %d' '%s' 2>/dev/null"%(inum,fs)):
        i,n = ln.strip().split(None,1)
        if not i[0].isdigit(): continue
        if int(i) == inum:
            return n
    return None

def blkid(fs):
    "Return dictionary of block device attributes"
    d = {}
    for ln in cmdoutput("blkid -o export '%s'"%fs):
        k,v = ln.strip().split('=',1)
        d[k] = v
    return d
def getpvmap(pv):
    pe_start = 192 * 2
    pe_size = None
    seg = None
    segs = []
    for ln in cmdoutput("pvdisplay --units k -m %s"%pv):
        a = ln.strip().split()
        if not a: continue
        if a[0] == 'Physical' and a[4].endswith(':'):
            pe1st = int(a[2])
            pelst = int(a[4][:-1])
            seg = Segment(pe1st,pelst)
        elif seg and a[0] == 'Logical':
            if a[1] == 'volume':
                seg.lvpath = a[2]
            elif a[1] == 'extents':
                seg.le1st = int(a[2])
                seg.lelst = int(a[4])
                segs.append(seg)
        elif a[0] == 'PE' and a[1] == 'Size':
            if a[2] == "(KByte)":
                pe_size = int(a[3]) * 2
            elif a[3] == 'KiB':
                pe_size = int(float(a[2])) * 2
    if segs:
        for ln in cmdoutput("pvs --units k -o+pe_start %s"%pv):
            a = ln.split()
            if a[0] == pv:
                lst = a[-1]
                if lst.lower().endswith('k'):
                    pe_start = int(float(lst[:-1]))*2
                    return pe_start,pe_size,segs
    return None

def findlv(pv,sect):
    res = getpvmap(pv)
    if not res: return None
    pe_start,pe_size,m = res
    if sect < pe_start:
        raise Exception("Bad sector in PV metadata area")
    pe = int((sect - pe_start)/pe_size)
    pebeg = pe * pe_size + pe_start
    peoff = sect - pebeg
    for s in m:
        if s.pe1st <= pe <= s.pelst:
            le = s.le1st + pe - s.pe1st
            return s.lvpath,le * pe_size + peoff
def getmdmap():
    with open('/proc/mdstat','rt') as fp:
        m = []
        for ln in fp:
            if ln.startswith('md'):
                a = ln.split(':')
                raid = a[0].strip()
                devs = []
                a = a[1].split()
                for d in a[2:]:
                    devs.append(d.split('[')[0])
                m.append((raid,devs))
        return m

def parse_sfdisk(s):
    for ln in s:
        try:
            part,desc = ln.split(':')
            if part.startswith('/dev/'):
                d = {}
                for p in desc.split(','):
                    name,val = p.split('=')
                    name = name.strip()
                    if name.lower() == 'id':
                        d[name] = int(val,16)
                    else:
                        d[name] = int(val)
                yield part.strip(),d
        except ValueError:
            continue

def findpart(wd,lba):
    s = cmdoutput("sfdisk -d %s"%wd)
    parts = [ (part,d['start'],d['size'],d['Id']) for part,d in parse_sfdisk(s) ]
    for part,start,sz,Id in parts:
        if Id == ID_EXT: continue
        if start <= lba < start + sz:
            return part,lba - start,Id
    return None
if __name__ == '__main__':
    wd = sys.argv[1]
    lba = int(sys.argv[2])
    print wd,lba,"Whole Disk"
    res = findpart(wd,lba)
    if not res:
        print "LBA is outside any partition"
        sys.exit(1)
    part,sect,Id = res
    print part,sect,idtoname(Id)
    if Id == ID_LVM:
        bd,sect = findlv(part,sect)
        # FIXME: problems if LV is snapshot
    elif Id == ID_LINUX:
        bd = part
    else:
        if Id == ID_RAID:
            for md,devs in getmdmap():
                for dev in devs:
                    if part == "/dev/"+dev:
                        part = "/dev/"+md
                        break
                else: continue
                break
        res = findlv(part,sect)
        if res:
            print "PV =",part
            bd,sect = res
        else:
            bd = part
    blksiz = 4096
    blk = int(sect * 512 / blksiz)
    p = blkid(bd)
    try:
        t = p['TYPE']
    except KeyError:
        print bd,p
        raise
    print "fs=%s block=%d %s"%(bd,blk,t)
    if t.startswith('ext'):
        inum = icheck(bd,blk)
        if inum:
            fn = ncheck(bd,inum)
            print "file=%s inum=%d"%(fn,inum)
        else:
            print "<free space>"
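The script as posted uses Python 2 print statements, and it takes the whole-disk device and the failing LBA as its two arguments. On a system that only ships Python 3, the print statements need to become `print()` calls, and the pipe helper needs to return text rather than bytes so that the string parsing keeps working. A possible adaptation of `cmdoutput` is sketched below; it is an assumption about what a port would need, untested against the rest of the script:

```python
from subprocess import Popen, PIPE

def cmdoutput(cmd):
    """Yield the lines of a shell command's stdout as text (str), not bytes."""
    # universal_newlines=True makes the pipe deliver decoded text lines
    p = Popen(cmd, shell=True, stdout=PIPE, universal_newlines=True)
    try:
        for ln in p.stdout:
            yield ln
    finally:
        p.stdout.close()
        p.wait()

# Quick check with a harmless command
for line in cmdoutput("echo hello"):
    print(line.rstrip())
```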