Unbinding the GPU from the radeon driver without restarting lightdm

I use a bash script to rebind my GPU for KVM PCI passthrough, but it requires stopping lightdm in order to unbind the GPU from, or bind it to, the radeon driver. If I don't stop lightdm, the whole system locks up after a few seconds and I can't even log in over SSH to see what is happening. There must be some way to detach the driver safely. I'm using kernel 4.1.6, since 4.2 currently breaks PCI passthrough.

I tried removing the radeon module before unbinding, but it didn't work.

modprobe --remove-dependencies radeon

I suspect this is because it is being used by these modules, which for some reason were not removed:

lsmod | grep radeon
radeon               1589248  0
ttm                    94208  1 radeon
i2c_algo_bit           16384  2 i915,radeon
drm_kms_helper        126976  2 i915,radeon
drm                   352256  7 ttm,i915,drm_kms_helper,radeon
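
When reading a listing like this, the third column of lsmod is the module's use count and the fourth names the modules that depend on it. A minimal sketch for pulling those two fields out, assuming a made-up helper name `mod_holders` that reads lsmod-style text on stdin:

```bash
#!/bin/bash
# mod_holders MODULE: read lsmod-style text on stdin and print
# "<use count> <holders>" for MODULE ("-" when no holders are listed).
# The helper name is hypothetical and exists only for this example.
mod_holders() {
    awk -v m="$1" '$1 == m { print $3, ($4 == "" ? "-" : $4) }'
}

# Typical use on a live system:
#   lsmod | mod_holders radeon
```

Applied to the listing above, this reports a use count of 0 and no holders for radeon, and `1 radeon` for ttm. The other modules shown are ones that radeon depends on, not modules that hold radeon.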

There were many stack traces like this one. Some came from sysfs/group.c and the rest from drm. This looks like a memory management problem. I'm not sure how I would go about unbinding it properly.
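
Since the lockup only happens while lightdm is running, one thing that may be worth checking before an unbind is which processes still have the GPU's device nodes open. The sketch below scans /proc for open files under a directory. `holders_of` is a hypothetical helper, and /dev/dri as the location of the DRM nodes is an assumption about the system. It only lists candidates and does not by itself make the unbind safe.

```bash
#!/bin/bash
# holders_of DIR: print the PIDs of processes that have a file under DIR open,
# found by resolving the symlinks in /proc/<pid>/fd.
# Run as root to also see processes owned by other users.
holders_of() {
    local dir="$1" fd target pid
    for fd in /proc/[0-9]*/fd/*; do
        # Unreadable fd directories leave the glob unexpanded; readlink then
        # fails and the entry is skipped.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            "$dir"/*)
                pid=${fd#/proc/}
                echo "${pid%%/*}"
                ;;
        esac
    done | sort -un
}

# Example (assumed path): holders_of /dev/dri
```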

WARNING: CPU: 3 PID: 10935 at /home/kernel/COD/linux/drivers/gpu/drm/radeon/radeon_object.c:83 radeon_ttm_bo_destroy+0xea/0xf0 [radeon]()
Modules linked in: pci_stub joydev binfmt_misc arc4 nls_iso8859_1 eeepc_wmi asus_wmi sparse_keymap ath9k ath9k_common intel_rapl iosf_mbi amdkfd x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek amd_iommu_v2 snd_hda_co$
CPU: 3 PID: 10935 Comm: echo Tainted: G        W       4.1.6-040106-generic #201508170230
Hardware name: ASUS All Series/Z97-E/USB 3.1, BIOS 0403 04/07/2015
ffffffffc08d62a0 ffff88010656fa38 ffffffff817d1363 0000000000000000
0000000000000000 ffff88010656fa78 ffffffff81079c3a ffff88012d9d1ec0
ffff880220a6f868 ffff880220a6f800 0000000000002480 ffff880220a6f868
Call Trace:
[<ffffffff817d1363>] dump_stack+0x45/0x57
[<ffffffff81079c3a>] warn_slowpath_common+0x8a/0xc0
[<ffffffff81079d2a>] warn_slowpath_null+0x1a/0x20
[<ffffffffc07bf5ba>] radeon_ttm_bo_destroy+0xea/0xf0 [radeon]
[<ffffffffc042e4d9>] ttm_bo_release_list+0xa9/0x180 [ttm]
[<ffffffffc04351e0>] ? ttm_bo_man_put_node+0x40/0x50 [ttm]
[<ffffffffc042e6cd>] ttm_bo_release+0x11d/0x2b0 [ttm]
[<ffffffff81507816>] ? __dev_printk+0x46/0xa0
[<ffffffffc042e889>] ttm_bo_unref+0x29/0x30 [ttm]
[<ffffffffc07bfada>] radeon_bo_unref+0x2a/0x50 [radeon]
[<ffffffffc07d4cdb>] radeon_gem_object_free+0x4b/0x50 [radeon]
[<ffffffffc00254a7>] drm_gem_object_free+0x27/0x30 [drm]
[<ffffffffc07bff78>] radeon_bo_force_delete+0x128/0x130 [radeon]
[<ffffffffc07d4ebe>] radeon_gem_fini+0xe/0x10 [radeon]
[<ffffffffc083ebad>] si_fini+0xbd/0x110 [radeon]
[<ffffffffc07a1612>] radeon_device_fini+0x42/0x140 [radeon]
[<ffffffffc07a3d40>] radeon_driver_unload_kms+0x50/0x70 [radeon]
[<ffffffffc002a8cd>] drm_dev_unregister+0x2d/0xc0 [drm]
[<ffffffffc002af87>] drm_put_dev+0x27/0x80 [drm]
[<ffffffffc079f295>] radeon_pci_remove+0x15/0x20 [radeon]
[<ffffffff8140193f>] pci_device_remove+0x3f/0xc0
[<ffffffff8150b297>] __device_release_driver+0x87/0x120
[<ffffffff8150b353>] device_release_driver+0x23/0x30
[<ffffffff8150a04d>] unbind_store+0xbd/0xe0
[<ffffffff81509484>] drv_attr_store+0x24/0x40
[<ffffffff8127478d>] sysfs_kf_write+0x3d/0x50
[<ffffffff81273c3a>] kernfs_fop_write+0x12a/0x180
[<ffffffff811f8d98>] __vfs_write+0x28/0x100
[<ffffffff811fba19>] ? __sb_start_write+0x49/0xf0
[<ffffffff81320993>] ? security_file_permission+0x23/0xa0
[<ffffffff811f9499>] vfs_write+0xa9/0x1b0
[<ffffffff817d6f66>] ? mutex_lock+0x16/0x37
[<ffffffff811fa2a6>] SyS_write+0x46/0xb0
[<ffffffff81067240>] ? do_page_fault+0x30/0x80
[<ffffffff817d8f32>] system_call_fastpath+0x16/0x75

Here is my current script for those interested. (Run it from a tty console, outside the X session.)

#!/bin/bash

read -n3 -rsp "Restart lightdm to unbind the GPU? [yes] " res
test "$res" != 'yes' && exit 1
echo

sudo service lightdm stop
# "sudo echo ... > file" performs the redirection as the calling user,
# so the writes go through "sudo tee" instead.
echo "1002 683d" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id > /dev/null
echo "1002 aab0" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id > /dev/null
echo "0000:01:00.0" | sudo tee /sys/bus/pci/devices/0000:01:00.0/driver/unbind > /dev/null
echo "0000:01:00.1" | sudo tee /sys/bus/pci/devices/0000:01:00.1/driver/unbind > /dev/null
echo "0000:01:00.0" | sudo tee /sys/bus/pci/drivers/vfio-pci/bind > /dev/null
echo "0000:01:00.1" | sudo tee /sys/bus/pci/drivers/vfio-pci/bind > /dev/null
echo "1002 683d" | sudo tee /sys/bus/pci/drivers/vfio-pci/remove_id > /dev/null
echo "1002 aab0" | sudo tee /sys/bus/pci/drivers/vfio-pci/remove_id > /dev/null
sudo service lightdm start

echo "Rebind Audio"
sudo modprobe pci_stub
echo "8086 8ca0" | sudo tee /sys/bus/pci/drivers/pci-stub/new_id > /dev/null
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/snd_hda_intel/unbind > /dev/null
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/pci-stub/bind > /dev/null
echo "8086 8ca0" | sudo tee /sys/bus/pci/drivers/pci-stub/remove_id > /dev/null

# Check if VM drive is mounted
if ! grep -qs '/media/ljosalfur/VM' /proc/mounts; then
    echo "Attempting to mount VM drive. I don't know how though."
    #sudo mkdir /media/ljosalfur/VM
    #sudo mount /dev/disk/by-id/0BD253F0-EF7F-6F40-BDD8-FABF85161762 /media/ljosalfur/VM
fi

sudo kvm -monitor stdio -vnc :0 \
-m 6G -mem-path /dev/hugepages \
-drive if=pflash,format=raw,file=./OVMF.fd -rtc base=localtime \
-cpu host -smp 6,sockets=1,cores=6,threads=1 \
-device vfio-pci,host=01:00.0,multifunction=on,x-vga=on \
-device vfio-pci,host=01:00.1 \
-device pci-assign,host=00:1b.0 \
-drive file=/media/ljosalfur/VM/vm7.img,format=raw,cache=writethrough \
-smb /media/ljosalfur \
-usb -usbdevice host:046d:c24a -show-cursor \
-usb -usbdevice host:1b1c:1b08

echo
echo "Re-Rebind Audio"
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/pci-stub/unbind > /dev/null
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/snd_hda_intel/bind > /dev/null

echo "Unbind GPU from vfio-pci"
echo "0000:01:00.0" | sudo tee /sys/bus/pci/drivers/vfio-pci/unbind > /dev/null
echo "0000:01:00.1" | sudo tee /sys/bus/pci/drivers/vfio-pci/unbind > /dev/null

read -n3 -rsp "Restart lightdm to rebind the GPU? [yes] " ress
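# A parenthesised "(exit 1)" only leaves a subshell; a plain "exit 1" stops the script.
test "$ress" != 'yes' && exit 1
echo
echo "0000:01:00.0" | sudo tee /sys/bus/pci/drivers/radeon/bind > /dev/null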
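
Between the unbind and bind steps in a script like this, it can help to confirm which driver currently owns a device, since sysfs exposes that as the `driver` symlink in the device directory. A small sketch, assuming a hypothetical helper `current_driver`; the optional second argument is not part of sysfs and exists only so the function can be tried against a fake directory tree:

```bash
#!/bin/bash
# current_driver DEV [SYSFS_ROOT]: print the name of the driver bound to a PCI
# device, or "none" if no driver is bound. SYSFS_ROOT defaults to /sys.
current_driver() {
    local dev="$1" root="${2:-/sys}"
    local link="$root/bus/pci/devices/$dev/driver"
    if [ -e "$link" ]; then
        # The symlink points at the driver's directory; its last path
        # component is the driver name.
        basename "$(readlink -f "$link")"
    else
        echo none
    fi
}

# Example: current_driver 0000:01:00.0
```

Printing this before and after each step shows whether the device actually moved between radeon and vfio-pci.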

Answer 1

I finally found a way to do it using the information in this thread: https://www.reddit.com/r/VFIO/comments/41pm1q/my_bash_script_for_rebinding_a_secondary_nvidia/

The working script is:

#!/bin/bash
# Which device and which related HDMI audio device. They're usually in pairs.
export VGA_DEVICE=0000:01:00.0
export AUDIO_DEVICE=0000:01:00.1
 
export VGA_DRIVER=radeon
export AUDIO_DRIVER=snd_hda_intel
 
# Passing through USB devices. Querying bus address and feeding that to QEMU
# instead of the device ID, so you can yank and replug the keyboard to regain
# control.
export KEYBOARD="1b1c:1b08"
export MOUSE="046d:c24a"
 
# Unbinds a device and loads the driver specified.
flipdriver() {
    dev="$1"
    driver="$2"
 
    if [ -z "$driver" ] || [ -z "$dev" ];
    then
        return 1
    fi
 
    vendor=$(cat /sys/bus/pci/devices/${dev}/vendor)
    device=$(cat /sys/bus/pci/devices/${dev}/device)
 
    echo -n Unbinding $vendor:$device ...
 
    if [ -e /sys/bus/pci/devices/${dev}/driver ]; then
        echo ${dev} > /sys/bus/pci/devices/${dev}/driver/unbind
        while [ -e /sys/bus/pci/devices/${dev}/driver ]; do
            sleep 0.5
            echo -n .
        done
    fi
    echo " OK!"
 
    echo -n Binding \'$driver\' to $vendor:$device ...
    echo ${vendor} ${device} > /sys/bus/pci/drivers/${driver}/new_id
 
    echo " OK!"
 
    return 0
}
 
# Common error message
fliperror()
{
    echo "Couldn't perform required driver switch'n'bait!"
    exit 1
}
 
# Xorg shouldn't run.
if [ -n "$( ps -C xinit | grep xinit )" ];
then
    echo "Don't run this inside Xorg!"
    exit 1
fi
 
# Unbind specified graphics card and audio device.
echo "Pulling the plug on the specified passthrough devices..."
flipdriver $VGA_DEVICE vfio-pci
flipdriver $AUDIO_DEVICE vfio-pci
 
export QEMU_PA_SAMPLES=128
export QEMU_AUDIO_DRV=alsa
 
# Get the bus addresses for keyboard and mouse.
export QEMU_KEYB=$( lsusb | sed -n 's/Bus \([0-9]*\) Device \([0-9]*\): ID '$KEYBOARD'.*/-device usb-host,bus=xhci.0,hostbus=\1,hostaddr=\2/p' )
export QEMU_MOUS=$( lsusb | sed -n 's/Bus \([0-9]*\) Device \([0-9]*\): ID '$MOUSE'.*/-device usb-host,bus=xhci.0,hostbus=\1,hostaddr=\2 -show-cursor/p' )
 
# Check if VM drive is mounted
if ! grep -qs '/media/user/VM' /proc/mounts; then
    echo "Attempting to mount VM drive."
    sudo mount /dev/sdc1
fi
 
#network stuff
tunctl -t vmtap10
ip link set dev vmtap10 address 42:42:42:42:42:10
ifconfig vmtap10 192.168.42.1 up
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
iptables -A FORWARD -i vmtap10 -j ACCEPT
iptables -A FORWARD -o vmtap10 -m state --state RELATED,ESTABLISHED -j ACCEPT
 
#pactl set-sink-volume 2 50%
 
echo Starting virtual machine...
sleep 0.2
 
# QEMU stuff
sudo kvm -monitor stdio -vnc :1 -vga none \
-drive if=pflash,format=raw,file=./OVMF.fd -rtc base=localtime \
-m 4G -mem-path /dev/hugepages \
-cpu host -smp sockets=1,cores=6,threads=1 \
-soundhw ac97 \
-device vfio-pci,host=01:00.0 \
-usb -usbdevice host:046d:c215 \
-usb -device nec-usb-xhci,id=xhci \
$QEMU_KEYB \
$QEMU_MOUS \
-net nic,macaddr=42:42:42:42:42:42 -net tap,ifname=vmtap10,script=no,downscript=no,vhost=on \
-drive file=/media/user/VM/vm10.img,format=qcow2,cache=writeback \
-smb /media/ljosalfur \
-cdrom /home/user/Downloads/virtio-win-0.1.105.iso
 
# Rebind the devices for the host.
echo Adios vfio, reloading the host drivers for the passedthrough devices...
flipdriver $AUDIO_DEVICE $AUDIO_DRIVER
flipdriver $VGA_DEVICE $VGA_DRIVER
 
iptables -F
iptables -t nat -F POSTROUTING
ip link delete vmtap10
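
The keyboard and mouse lookup in the script turns a line of lsusb output into a QEMU `usb-host` argument keyed on bus and device address. The sketch below isolates that step so it can be tried on its own. `usb_host_arg` is a hypothetical wrapper around the same sed expression, and the sample line (bus number, device number, description) is invented for illustration.

```bash
#!/bin/bash
# usb_host_arg ID: read lsusb-style lines on stdin and print a QEMU usb-host
# argument for the first device whose vendor:product ID matches ID.
usb_host_arg() {
    sed -n 's/^Bus \([0-9]*\) Device \([0-9]*\): ID '"$1"'.*/-device usb-host,bus=xhci.0,hostbus=\1,hostaddr=\2/p' | head -n 1
}

# Sample input in lsusb format; on a real system use: lsusb | usb_host_arg 1b1c:1b08
echo 'Bus 003 Device 007: ID 1b1c:1b08 Example keyboard' | usb_host_arg 1b1c:1b08
```

If the device is not plugged in, the function prints nothing, so the corresponding variable in the script ends up empty and QEMU starts without that device.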
