LVM trava ao tentar criar meu nó de dispositivo raiz

LVM trava ao tentar criar meu nó de dispositivo raiz

Siga paraComo posso verificar se há blocos defeituosos em um volume físico LVM?

O título resume principalmente tudo. Basicamente, tenho uma caixa particionada com uma /bootpartição normal e, em seguida, um volume físico LVM preenchendo o restante da unidade. No LVM eu tenho um grupo de volumes com uma partição raiz, uma /homepartição e uma partição swap.

Quando o LVM cria os nós do dispositivo /dev/mapper, ele cria as partições swap e home perfeitamente. No entanto, geralmente ele trava ao tentar criar o nó do dispositivo raiz. Isso acontece a partir de um live CD ( pvscan; vgscan; vgchange -ayfoi o que usei, IIRC) e também do ramdisk inicial, evitando que a caixa inicialize. Também tentei no shell de recuperação initrd ( lvm pvscan; lvm vgscan; lvm vgchange -ayé o que usei, IIRC), que também falha da mesma maneira.

Às vezes, vgchange -ayele cria o nó do dispositivo raiz (depois de um atraso muito longo), mas nunca sai, deixando-me matá-lo manualmente. Quando isso acontece tento montar o aparelho, mas ele sempre trava indefinidamente. Observe que enquanto ambos os comandos estão em execução, o console exibe um monte de mensagens sobre falha no comando "READ DMA" ou algo assim.

Já corri smartctl -a /dev/sdaalgumas vezes. Cada vez que ele apresenta uma boa quantidade de erros sobre blocos defeituosos (IIRC), mas no final das contas indica que a unidade está em boas condições.

eu coloqueiuma pastana dmesgmáquina afetada. Os logs vêm da inicialização de um live cd do Arch Linux e da execução do pvscan; vgscan; vgchange -ay. vgchange -aydesta vez ficou pendurado para sempre e acabei matando-o. Aqui está o final de dmesg, para a posteridade (e então eu [não uso um pastebin2):

[   46.332920] end_request: I/O error, dev fd0, sector 0
[   58.503496] end_request: I/O error, dev fd0, sector 0
[167992.304649] EXT4-fs (sda1): recovery complete
[167992.304660] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[168092.874016] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[168163.318923] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[168459.839738] end_request: I/O error, dev fd0, sector 0
[168472.010337] end_request: I/O error, dev fd0, sector 0
[168614.642035] bio: create slab <bio-2> at 2
[168630.045526] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168630.045649] ata1.00: BMDMA stat 0x65
[168630.045710] ata1.00: failed command: READ DMA
[168630.045787] ata1.00: cmd c8/00:08:00:10:10/00:00:00:00:00/e6 tag 0 dma 4096 in
         res 51/40:08:00:10:10/00:00:00:00:00/e6 Emask 0x9 (media error)
[168630.046006] ata1.00: status: { DRDY ERR }
[168630.046071] ata1.00: error: { UNC }
[168630.066286] ata1.00: configured for UDMA/100
[168630.079493] ata1.01: configured for UDMA/66
[168630.079514] sd 0:0:0:0: [sda] Unhandled sense code
[168630.079517] sd 0:0:0:0: [sda]  
[168630.079520] Result: hostbyte=0x00 driverbyte=0x08
[168630.079523] sd 0:0:0:0: [sda]  
[168630.079525] Sense Key : 0x3 [current] [descriptor]
[168630.079530] Descriptor sense data with sense descriptors (in hex):
[168630.079532]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
[168630.079544]         06 10 10 00 
[168630.079549] sd 0:0:0:0: [sda]  
[168630.079551] ASC=0x11 ASCQ=0x4
[168630.079554] sd 0:0:0:0: [sda] CDB: 
[168630.079556] cdb[0]=0x28: 28 00 06 10 10 00 00 00 08 00
[168630.079567] end_request: I/O error, dev sda, sector 101715968
[168630.079665] Buffer I/O error on device dm-3, logical block 0
[168630.079775] ata1: EH complete
[168634.564062] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168634.564165] ata1.00: BMDMA stat 0x64
[168634.564225] ata1.00: failed command: READ DMA
[168634.564301] ata1.00: cmd c8/00:08:80:0f:10/00:00:00:00:00/e6 tag 0 dma 4096 in
         res 51/10:00:83:0f:10/00:00:00:00:00/e6 Emask 0x81 (invalid argument)
[168634.564527] ata1.00: status: { DRDY ERR }
[168634.564592] ata1.00: error: { IDNF }
[168634.584336] ata1.00: configured for UDMA/100
[168634.597559] ata1.01: configured for UDMA/66
[168634.597578] ata1: EH complete
[168639.087353] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168639.087462] ata1.00: BMDMA stat 0x64
[168639.087521] ata1.00: failed command: READ DMA
[168639.087596] ata1.00: cmd c8/00:08:80:0f:10/00:00:00:00:00/e6 tag 0 dma 4096 in
         res 51/10:00:83:0f:10/00:00:00:00:00/e6 Emask 0x81 (invalid argument)
[168639.087822] ata1.00: status: { DRDY ERR }
[168639.087886] ata1.00: error: { IDNF }
[168639.105791] ata1.00: configured for UDMA/100
[168639.118999] ata1.01: configured for UDMA/66
[168639.119017] ata1: EH complete
[168645.896986] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168645.897095] ata1.00: BMDMA stat 0x64
[168645.897155] ata1.00: failed command: READ DMA
[168645.900373] ata1.00: cmd c8/00:08:80:0f:10/00:00:00:00:00/e6 tag 0 dma 4096 in
         res 51/40:00:83:0f:10/00:00:00:00:00/e6 Emask 0x9 (media error)
[168645.906936] ata1.00: status: { DRDY ERR }
[168645.910263] ata1.00: error: { UNC }
[168645.931315] ata1.00: configured for UDMA/100
[168645.944504] ata1.01: configured for UDMA/66
[168645.944525] sd 0:0:0:0: [sda] Unhandled sense code
[168645.944529] sd 0:0:0:0: [sda]  
[168645.944531] Result: hostbyte=0x00 driverbyte=0x08
[168645.944534] sd 0:0:0:0: [sda]  
[168645.944537] Sense Key : 0x3 [current] [descriptor]
[168645.944541] Descriptor sense data with sense descriptors (in hex):
[168645.944543]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
[168645.944554]         06 10 0f 83 
[168645.944559] sd 0:0:0:0: [sda]  
[168645.944561] ASC=0x11 ASCQ=0x4
[168645.944564] sd 0:0:0:0: [sda] CDB: 
[168645.944566] cdb[0]=0x28: 28 00 06 10 0f 80 00 00 08 00
[168645.944578] end_request: I/O error, dev sda, sector 101715843
[168645.947946] Buffer I/O error on device dm-2, logical block 10485744
[168645.951439] ata1: EH complete
[168650.445911] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168650.449275] ata1.00: BMDMA stat 0x65
[168650.452579] ata1.00: failed command: READ DMA
[168650.455873] ata1.00: cmd c8/00:08:00:10:10/00:00:00:00:00/e6 tag 0 dma 4096 in
         res 51/40:08:00:10:10/00:00:00:00:00/e6 Emask 0x9 (media error)
[168650.462537] ata1.00: status: { DRDY ERR }
[168650.465714] ata1.00: error: { UNC }
[168650.486063] ata1.00: configured for UDMA/100
[168650.499326] ata1.01: configured for UDMA/66
[168650.499344] sd 0:0:0:0: [sda] Unhandled sense code
[168650.499348] sd 0:0:0:0: [sda]  
[168650.499350] Result: hostbyte=0x00 driverbyte=0x08
[168650.499353] sd 0:0:0:0: [sda]  
[168650.499355] Sense Key : 0x3 [current] [descriptor]
[168650.499360] Descriptor sense data with sense descriptors (in hex):
[168650.499362]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
[168650.499373]         06 10 10 00 
[168650.499378] sd 0:0:0:0: [sda]  
[168650.499380] ASC=0x11 ASCQ=0x4
[168650.499383] sd 0:0:0:0: [sda] CDB: 
[168650.499385] cdb[0]=0x28: 28 00 06 10 10 00 00 00 08 00
[168650.499396] end_request: I/O error, dev sda, sector 101715968
[168650.502757] Buffer I/O error on device dm-3, logical block 0
[168650.506189] ata1: EH complete
[168798.816025] usb 9-2: new high-speed USB device number 2 using ehci-pci

Este é apenas o fim do log, onde começaram os erros, pois atingi o limite de postagens. Para ver tudo, dê uma olhada no pastebin.

Peço desculpas por não fornecer informações específicas, mas não estou na frente da caixa afetada no momento.

Responder1

Pelas informações extras que você forneceu, parece que você tem uma unidade defeituosa (blocos defeituosos). Você pode tentar solucionar esses problemas, se desejar, mas consideraria seriamente substituir a unidade.

Se você quiser contornar o problema, basicamente você terá que encontrar as extensões físicas do LVM que ficam sobre os blocos defeituosos e adicionar essas extensões físicas a um volume lógico que não deve ser usado.

Na verdade, existe uma cadeia de e-mail bastante recente na lista de discussão linux-lvm sobre esse assunto (eu li toda a cadeia, ela contém muitas informações):

https://www.redhat.com/archives/linux-lvm/2012-November/msg00033.html


Nesta mensagem específica, parece que alguém criou um script python para ajudar na tarefa:

https://www.redhat.com/archives/linux-lvm/2012-November/msg00038.html

Tendo ajudado pessoas em tais situações (onde pelo menos a Internet estava funcionando), usei o script anexo para ajudar a encontrar LVs e arquivos afetados.

#!/usr/bin/python
# Identify partition, LV, file containing a sector 

# Copyright (C) 2010,2012 Stuart D. Gathman
# Shared under GNU Public License v2 or later
#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; either version 2 of the License, or
#   (at your option) any later version.

#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.

#   You should have received a copy of the GNU General Public License along
#   with this program; if not, write to the Free Software Foundation, Inc.,
#   51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

import sys
from subprocess import Popen,PIPE

ID_LVM = 0x8e
ID_LINUX = 0x83
ID_EXT = 0x05
ID_RAID = 0xfd

def idtoname(id):
  if id == ID_LVM: return "Linux LVM"
  if id == ID_LINUX: return "Linux Filesystem"
  if id == ID_EXT: return "Extended Partition"
  if id == ID_RAID: return "Software RAID"
  return hex(id)

class Segment(object):
  __slots__ = ('pe1st','pelst','lvpath','le1st','lelst')
  def __init__(self,pe1st,pelst):
    self.pe1st = pe1st;
    self.pelst = pelst;
  def __str__(self):
    return "Seg:%d-%d:%s:%d-%d" % (
      self.pe1st,self.pelst,self.lvpath,self.le1st,self.lelst)

def cmdoutput(cmd):
  p = Popen(cmd, shell=True, stdout=PIPE)
  try:
    for ln in p.stdout:
      yield ln
  finally:
    p.stdout.close()
    p.wait()

def icheck(fs,blk):
  "Return inum from block number, or 0 if free space."
  for ln in cmdoutput("debugfs -R 'icheck %d' '%s' 2>/dev/null"%(blk,fs)):
    b,i = ln.strip().split(None,1)
    if not b[0].isdigit(): continue
    if int(b) == blk:
      if i.startswith('<'):
    return 0
      return int(i)
  raise ValueError('%s: invalid block: %d'%(fs,blk))

def ncheck(fs,inum):
  "Return filename from inode number, or None if not linked."
  for ln in cmdoutput("debugfs -R 'ncheck %d' '%s' 2>/dev/null"%(inum,fs)):
    i,n = ln.strip().split(None,1)
    if not i[0].isdigit(): continue
    if int(i) == inum:
      return n
  return None

def blkid(fs):
  "Return dictionary of block device attributes"
  d = {}
  for ln in cmdoutput("blkid -o export '%s'"%fs):
    k,v = ln.strip().split('=',1)
    d[k] = v
  return d

def getpvmap(pv):
  pe_start = 192 * 2
  pe_size = None
  seg = None
  segs = []
  for ln in cmdoutput("pvdisplay --units k -m %s"%pv):
    a = ln.strip().split()
    if not a: continue
    if a[0] == 'Physical' and a[4].endswith(':'):
      pe1st = int(a[2])
      pelst = int(a[4][:-1])
      seg = Segment(pe1st,pelst)
    elif seg and a[0] == 'Logical':
      if a[1] == 'volume':
    seg.lvpath = a[2]
      elif a[1] == 'extents':
    seg.le1st = int(a[2])
    seg.lelst = int(a[4])
    segs.append(seg)
    elif a[0] == 'PE' and a[1] == 'Size':
      if a[2] == "(KByte)":
    pe_size = int(a[3]) * 2
      elif a[3] == 'KiB':
    pe_size = int(float(a[2])) * 2
  if segs:
    for ln in cmdoutput("pvs --units k -o+pe_start %s"%pv):
      a = ln.split()
      if a[0] == pv:
        lst = a[-1]
    if lst.lower().endswith('k'):
      pe_start = int(float(lst[:-1]))*2
      return pe_start,pe_size,segs
  return None

def findlv(pv,sect):
  res = getpvmap(pv)
  if not res: return None
  pe_start,pe_size,m = res
  if sect < pe_start:
    raise Exception("Bad sector in PV metadata area")
  pe = int((sect - pe_start)/pe_size)
  pebeg = pe * pe_size + pe_start
  peoff = sect - pebeg
  for s in m:
    if s.pe1st <= pe <= s.pelst:
      le = s.le1st + pe - s.pe1st
      return s.lvpath,le * pe_size + peoff

def getmdmap():
  with open('/proc/mdstat','rt') as fp:
    m = []
    for ln in fp:
      if ln.startswith('md'):
    a = ln.split(':')
    raid = a[0].strip()
    devs = []
    a = a[1].split()
    for d in a[2:]:
      devs.append(d.split('[')[0])
    m.append((raid,devs))
    return m

def parse_sfdisk(s):
  for ln in s:
    try:
      part,desc = ln.split(':')
      if part.startswith('/dev/'):
        d = {}
        for p in desc.split(','):
      name,val = p.split('=')
      name = name.strip()
      if name.lower() == 'id':
        d[name] = int(val,16)
      else:
        d[name] = int(val)
    yield part.strip(),d
    except ValueError:
      continue

def findpart(wd,lba):
  s = cmdoutput("sfdisk -d %s"%wd)
  parts = [ (part,d['start'],d['size'],d['Id']) for part,d in parse_sfdisk(s) ]
  for part,start,sz,Id in parts:
    if Id == ID_EXT: continue
    if start <= lba < start + sz:
      return part,lba - start,Id
  return None

if __name__ == '__main__':
  wd = sys.argv[1]
  lba = int(sys.argv[2])
  print wd,lba,"Whole Disk"
  res = findpart(wd,lba)
  if not res:
    print "LBA is outside any partition"
    sys.exit(1)
  part,sect,Id = res
  print part,sect,idtoname(Id)
  if Id == ID_LVM:
    bd,sect = findlv(part,sect)
    # FIXME: problems if LV is snapshot
  elif Id == ID_LINUX:
    bd = part
  else:
    if Id == ID_RAID:
      for md,devs in getmdmap():
    for dev in devs:
      if part == "/dev/"+dev:
        part = "/dev/"+md
        break
        else: continue
    break
    res = findlv(part,sect)
    if res:
      print "PV =",part
      bd,sect = res
    else:
      bd = part
  blksiz = 4096
  blk = int(sect * 512 / blksiz)
  p = blkid(bd)
  try:
    t = p['TYPE']
  except:
    print bd,p
    raise
  print "fs=%s block=%d %s"%(bd,blk,t)
  if t.startswith('ext'):
    inum = icheck(bd,blk)
    if inum:
      fn = ncheck(bd,inum)
      print "file=%s inum=%d"%(fn,inum)
    else:
      print "<free space>"

informação relacionada