Unbinding a GPU from the radeon driver without restarting lightdm


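I use a bash script to rebind my GPU for KVM PCI passthrough, but to unbind it from (or bind it back to) the radeon driver I have to stop lightdm. If I don't stop lightdm, the whole system hangs after a few seconds, and I can't even SSH in to see what is going on. There must be a way to detach the driver safely. I'm on kernel 4.1.6, because 4.2 currently breaks PCI passthrough.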

I tried removing the radeon driver before unbinding, but that didn't work.

modprobe --remove-dependencies radeon

I think this is because it is still in use by the following modules, which for some reason were not removed.

lsmod | grep radeon
radeon               1589248  0
ttm                    94208  1 radeon
i2c_algo_bit           16384  2 i915,radeon
drm_kms_helper        126976  2 i915,radeon
drm                   352256  7 ttm,i915,drm_kms_helper,radeon
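
In `lsmod` output the third column is the module's reference count and the fourth lists the modules that hold it. Read that way, the listing above shows `radeon` with a count of 0, while `ttm` is held by `radeon`. A minimal sketch for pulling those two fields out for one module follows; the function name `module_users` is hypothetical and not part of the original script.

```shell
# Hypothetical helper: print the reference count and holders of one module.
# Reads lsmod-style text on stdin (columns: name, size, count, holders).
module_users() {
    awk -v mod="$1" '
        $1 == mod {
            # An empty fourth column means no other module holds this one.
            printf "%s refcount=%s users=%s\n", $1, $3, ($4 == "" ? "-" : $4)
        }'
}

# Typical use on a live system:
#   lsmod | module_users radeon
```

A count of 0 only means no other module depends on it; processes that have the device open, such as a display server, are not listed in this column.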

I got a lot of stack traces like the one below. Some came from sysfs/group.c, the rest from drm. It looks like a memory management problem. I'm not sure how to unbind the device properly.

WARNING: CPU: 3 PID: 10935 at /home/kernel/COD/linux/drivers/gpu/drm/radeon/radeon_object.c:83 radeon_ttm_bo_destroy+0xea/0xf0 [radeon]()
Modules linked in: pci_stub joydev binfmt_misc arc4 nls_iso8859_1 eeepc_wmi asus_wmi sparse_keymap ath9k ath9k_common intel_rapl iosf_mbi amdkfd x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek amd_iommu_v2 snd_hda_co$
CPU: 3 PID: 10935 Comm: echo Tainted: G        W       4.1.6-040106-generic #201508170230
Hardware name: ASUS All Series/Z97-E/USB 3.1, BIOS 0403 04/07/2015
ffffffffc08d62a0 ffff88010656fa38 ffffffff817d1363 0000000000000000
0000000000000000 ffff88010656fa78 ffffffff81079c3a ffff88012d9d1ec0
ffff880220a6f868 ffff880220a6f800 0000000000002480 ffff880220a6f868
Call Trace:
[<ffffffff817d1363>] dump_stack+0x45/0x57
[<ffffffff81079c3a>] warn_slowpath_common+0x8a/0xc0
[<ffffffff81079d2a>] warn_slowpath_null+0x1a/0x20
[<ffffffffc07bf5ba>] radeon_ttm_bo_destroy+0xea/0xf0 [radeon]
[<ffffffffc042e4d9>] ttm_bo_release_list+0xa9/0x180 [ttm]
[<ffffffffc04351e0>] ? ttm_bo_man_put_node+0x40/0x50 [ttm]
[<ffffffffc042e6cd>] ttm_bo_release+0x11d/0x2b0 [ttm]
[<ffffffff81507816>] ? __dev_printk+0x46/0xa0
[<ffffffffc042e889>] ttm_bo_unref+0x29/0x30 [ttm]
[<ffffffffc07bfada>] radeon_bo_unref+0x2a/0x50 [radeon]
[<ffffffffc07d4cdb>] radeon_gem_object_free+0x4b/0x50 [radeon]
[<ffffffffc00254a7>] drm_gem_object_free+0x27/0x30 [drm]
[<ffffffffc07bff78>] radeon_bo_force_delete+0x128/0x130 [radeon]
[<ffffffffc07d4ebe>] radeon_gem_fini+0xe/0x10 [radeon]
[<ffffffffc083ebad>] si_fini+0xbd/0x110 [radeon]
[<ffffffffc07a1612>] radeon_device_fini+0x42/0x140 [radeon]
[<ffffffffc07a3d40>] radeon_driver_unload_kms+0x50/0x70 [radeon]
[<ffffffffc002a8cd>] drm_dev_unregister+0x2d/0xc0 [drm]
[<ffffffffc002af87>] drm_put_dev+0x27/0x80 [drm]
[<ffffffffc079f295>] radeon_pci_remove+0x15/0x20 [radeon]
[<ffffffff8140193f>] pci_device_remove+0x3f/0xc0
[<ffffffff8150b297>] __device_release_driver+0x87/0x120
[<ffffffff8150b353>] device_release_driver+0x23/0x30
[<ffffffff8150a04d>] unbind_store+0xbd/0xe0
[<ffffffff81509484>] drv_attr_store+0x24/0x40
[<ffffffff8127478d>] sysfs_kf_write+0x3d/0x50
[<ffffffff81273c3a>] kernfs_fop_write+0x12a/0x180
[<ffffffff811f8d98>] __vfs_write+0x28/0x100
[<ffffffff811fba19>] ? __sb_start_write+0x49/0xf0
[<ffffffff81320993>] ? security_file_permission+0x23/0xa0
[<ffffffff811f9499>] vfs_write+0xa9/0x1b0
[<ffffffff817d6f66>] ? mutex_lock+0x16/0x37
[<ffffffff811fa2a6>] SyS_write+0x46/0xb0
[<ffffffff81067240>] ? do_page_fault+0x30/0x80
[<ffffffff817d8f32>] system_call_fastpath+0x16/0x75

For those interested, here is my current script (run from a tty console, outside the X session):

#!/bin/bash

read -n3 -rsp "Restart lightdm to unbind the GPU? [yes] " res
test "$res" != 'yes' && exit 1
echo

sudo service lightdm stop
# "sudo echo ... > file" opens the file as the calling user, so write through tee instead.
echo "1002 683d" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id > /dev/null
echo "1002 aab0" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id > /dev/null
echo "0000:01:00.0" | sudo tee /sys/bus/pci/devices/0000:01:00.0/driver/unbind > /dev/null
echo "0000:01:00.1" | sudo tee /sys/bus/pci/devices/0000:01:00.1/driver/unbind > /dev/null
echo "0000:01:00.0" | sudo tee /sys/bus/pci/drivers/vfio-pci/bind > /dev/null
echo "0000:01:00.1" | sudo tee /sys/bus/pci/drivers/vfio-pci/bind > /dev/null
echo "1002 683d" | sudo tee /sys/bus/pci/drivers/vfio-pci/remove_id > /dev/null
echo "1002 aab0" | sudo tee /sys/bus/pci/drivers/vfio-pci/remove_id > /dev/null
sudo service lightdm start

echo "Rebind Audio"
sudo modprobe pci_stub
echo "8086 8ca0" | sudo tee /sys/bus/pci/drivers/pci-stub/new_id > /dev/null
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/snd_hda_intel/unbind > /dev/null
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/pci-stub/bind > /dev/null
echo "8086 8ca0" | sudo tee /sys/bus/pci/drivers/pci-stub/remove_id > /dev/null

# Check if VM drive is mounted
if ! grep -qs '/media/ljosalfur/VM' /proc/mounts; then
echo "Attempting to mount VM drive. I don't know how though."
#sudo mkdir /media/ljosalfur/VM
#sudo mount /dev/disk/by-id/0BD253F0-EF7F-6F40-BDD8-FABF85161762 /media/ljosalfur/VM
fi

sudo kvm -monitor stdio -vnc :0 \
-m 6G -mem-path /dev/hugepages \
-drive if=pflash,format=raw,file=./OVMF.fd -rtc base=localtime \
-cpu host -smp 6,sockets=1,cores=6,threads=1 \
-device vfio-pci,host=01:00.0,multifunction=on,x-vga=on \
-device vfio-pci,host=01:00.1 \
-device pci-assign,host=00:1b.0 \
-drive file=/media/ljosalfur/VM/vm7.img,format=raw,cache=writethrough \
-smb /media/ljosalfur \
-usb -usbdevice host:046d:c24a -show-cursor \
-usb -usbdevice host:1b1c:1b08

echo
echo "Re-Rebind Audio"
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/pci-stub/unbind > /dev/null
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/snd_hda_intel/bind > /dev/null

echo "Unbind GPU from vfio-pci"
echo "0000:01:00.0" | sudo tee /sys/bus/pci/drivers/vfio-pci/unbind > /dev/null
echo "0000:01:00.1" | sudo tee /sys/bus/pci/drivers/vfio-pci/unbind > /dev/null

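read -n3 -rsp "Restart lightdm to rebind the GPU? [yes] " ress
# A bare exit is needed here; "(exit 1)" only leaves a subshell and the script continues.
test "$ress" != 'yes' && exit 1
echo
echo "0000:01:00.0" | sudo tee /sys/bus/pci/drivers/radeon/bind > /dev/null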
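
Each bind and unbind step in a script like this assumes the device is attached to a particular driver at that moment. A minimal sketch of how that could be checked first, by reading the `driver` symlink in sysfs, follows. The function name `current_driver` and the `SYSFS_PCI` override are hypothetical, added only so the function can be tried against a scratch directory.

```shell
# Hypothetical helper: print the driver a PCI device is bound to, or "none".
# SYSFS_PCI defaults to the real sysfs location; override it only for dry runs.
current_driver() {
    drv_link="${SYSFS_PCI:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$drv_link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/radeon
        basename "$(readlink "$drv_link")"
    else
        echo "none"
    fi
}

# Typical use on a live system:
#   current_driver 0000:01:00.0
```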

Answer 1

I finally found a way to do this, using information from this thread: https://www.reddit.com/r/VFIO/comments/41pm1q/my_bash_script_for_rebound_a_secondary_nvidia/

Here is the working script:

#!/bin/bash
# Which device and which related HDMI audio device. They're usually in pairs.
export VGA_DEVICE=0000:01:00.0
export AUDIO_DEVICE=0000:01:00.1
 
export VGA_DRIVER=radeon
export AUDIO_DRIVER=snd_hda_intel
 
# Passing through USB devices. Querying bus address and feeding that to QEMU
# instead of the device ID, so you can yank and replug the keyboard to regain
# control.
export KEYBOARD="1b1c:1b08"
export MOUSE="046d:c24a"
 
# Unbinds a device and loads the driver specified.
flipdriver() {
    dev="$1"
    driver="$2"
 
    if [ -z "$driver" ] || [ -z "$dev" ];
    then
        return 1
    fi
 
    vendor=$(cat /sys/bus/pci/devices/${dev}/vendor)
    device=$(cat /sys/bus/pci/devices/${dev}/device)
 
    echo -n Unbinding $vendor:$device ...
 
    if [ -e /sys/bus/pci/devices/${dev}/driver ]; then
        echo ${dev} > /sys/bus/pci/devices/${dev}/driver/unbind
        while [ -e /sys/bus/pci/devices/${dev}/driver ]; do
            sleep 0.5
            echo -n .
        done
    fi
    echo " OK!"
 
    echo -n Binding \'$driver\' to $vendor:$device ...
    echo ${vendor} ${device} > /sys/bus/pci/drivers/${driver}/new_id
 
    echo " OK!"
 
    return 0
}
 
# Common error message
fliperror()
{
    echo "Couldn't perform required driver switch'n'bait!"
    exit 1
}
 
# Xorg shouldn't run.
if [ -n "$( ps -C xinit | grep xinit )" ];
then
    echo "Don't run this inside Xorg!"
    exit 1
fi
 
# Unbind specified graphics card and audio device.
echo "Pulling the plug on the specified passthrough devices..."
flipdriver $VGA_DEVICE vfio-pci
flipdriver $AUDIO_DEVICE vfio-pci
 
export QEMU_PA_SAMPLES=128
export QEMU_AUDIO_DRV=alsa
 
# Get the bus addresses for keyboard and mouse.
export QEMU_KEYB=$( lsusb | sed -n 's/Bus \([0-9]*\) Device \([0-9]*\): ID '$KEYBOARD'.*/-device usb-host,bus=xhci.0,hostbus=\1,hostaddr=\2/p' )
export QEMU_MOUS=$( lsusb | sed -n 's/Bus \([0-9]*\) Device \([0-9]*\): ID '$MOUSE'.*/-device usb-host,bus=xhci.0,hostbus=\1,hostaddr=\2 -show-cursor/p' )
 
# Check if VM drive is mounted
if ! grep -qs '/media/user/VM' /proc/mounts; then
echo "Attempting to mount VM drive."
sudo mount /dev/sdc1
fi
 
#network stuff
tunctl -t vmtap10
ip link set dev vmtap10 address 42:42:42:42:42:10
ifconfig vmtap10 192.168.42.1 up
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
iptables -A FORWARD -i vmtap10 -j ACCEPT
iptables -A FORWARD -o vmtap10 -m state --state RELATED,ESTABLISHED -j ACCEPT
 
#pactl set-sink-volume 2 50%
 
echo Starting virtual machine...
sleep 0.2
 
# QEMU stuff
sudo kvm -monitor stdio -vnc :1 -vga none \
-drive if=pflash,format=raw,file=./OVMF.fd -rtc base=localtime \
-m 4G -mem-path /dev/hugepages \
-cpu host -smp sockets=1,cores=6,threads=1 \
-soundhw ac97 \
-device vfio-pci,host=01:00.0 \
-usb -usbdevice host:046d:c215 \
-usb -device nec-usb-xhci,id=xhci \
$QEMU_KEYB \
$QEMU_MOUS \
-net nic,macaddr=42:42:42:42:42:42 -net tap,ifname=vmtap10,script=no,downscript=no,vhost=on \
-drive file=/media/user/VM/vm10.img,format=qcow2,cache=writeback \
-smb /media/ljosalfur \
-cdrom /home/user/Downloads/virtio-win-0.1.105.iso
 
# Rebind the devices for the host.
echo Adios vfio, reloading the host drivers for the passedthrough devices...
flipdriver $AUDIO_DEVICE $AUDIO_DRIVER
flipdriver $VGA_DEVICE $VGA_DRIVER
 
iptables -F
iptables -t nat -F POSTROUTING
ip link delete vmtap10
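
The `flipdriver` function above waits for the `driver` symlink to disappear after writing to `unbind`, and it loops forever if the unbind never completes. A bounded variant of that wait could be sketched as follows. The name `wait_for_unbind`, the default of 20 attempts, and the `SYSFS_PCI` override are assumptions for illustration, not part of the original script.

```shell
# Hypothetical helper: wait until a PCI device has no driver bound.
# Returns 0 once the driver symlink is gone, 1 after the attempts run out.
# SYSFS_PCI defaults to the real sysfs location; override it only for dry runs.
wait_for_unbind() {
    wfu_dev="$1"
    wfu_tries="${2:-20}"
    wfu_link="${SYSFS_PCI:-/sys/bus/pci/devices}/${wfu_dev}/driver"
    while [ -e "$wfu_link" ]; do
        wfu_tries=$((wfu_tries - 1))
        if [ "$wfu_tries" -le 0 ]; then
            return 1
        fi
        sleep 0.5
    done
    return 0
}

# Typical use, after writing the device address to .../driver/unbind:
#   wait_for_unbind 0000:01:00.0 || echo "unbind did not complete"
```

Giving up after a fixed number of attempts lets the caller report the failure instead of leaving the console stuck in the loop.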
