Multiple GPUs and fans on Linux

I have two GTX 1080 Ti cards, both Founders Edition, in an Ubuntu 18.04 machine. I mainly use them to train neural networks.

Right now I basically have two problems:

  1. Setting Coolbits (even with --enable-all-gpus) lets me set fan speed and clocks only for the GPU that is connected to the monitor.

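  2. I don't want to set the fan speed statically. Instead, I'd like to set a dynamic profile: fan speed (%) vs. temperature. Note that in automatic mode, under load, one of the 1080 Ti cards often reaches 89-90 °C, despite throttling and despite the case being roomy.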

Information about my configuration:

inxi -b
System:    Host: nimrod Kernel: 4.15.0-46-generic x86_64 bits: 64
           Desktop: Xfce 4.12.3 Distro: Ubuntu 18.04.2 LTS
Machine:   Device: desktop Mobo: FUJITSU model: D3128-B2 v: S26361-D3128-B2 serial: N/A
           UEFI: FUJITSU // American Megatrends v: V4.6.5.4 R1.8.0 for D3128-B2x date: 06/28/2018
CPU:       10 core Intel Xeon E5-2680 v2 (-MT-MCP-) speed/max: 2269/3600 MHz
Graphics:  Card-1: Advanced Micro Devices [AMD/ATI] Park [Mobility Radeon HD 5430]
           Card-2: NVIDIA GP102 [GeForce GTX 1080 Ti]
           Card-3: NVIDIA GP102 [GeForce GTX 1080 Ti]
           Display Server: x11 (X.Org 1.19.6 )
           drivers: modesetting,nvidia,ati,radeon,nouveau (unloaded: fbdev,vesa)
           Resolution: [email protected]
           OpenGL: renderer: GeForce GTX 1080 Ti/PCIe/SSE2
           version: 4.6.0 NVIDIA 415.27
Network:   Card: Intel 82579LM Gigabit Network Connection (Lewisville)
           driver: e1000e
Drives:    HDD Total Size: 2262.5GB (9.5% used)
Info:      Processes: 413 Uptime: 10 min Memory: 3677.2/96560.4MB
           Client: Shell (bash) inxi: 2.3.56 

Nvidia-smi:

Mon Mar 25 04:19:30 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.27       Driver Version: 415.27       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 23%   39C    P8    10W / 250W |      2MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:04:00.0  On |                  N/A |
| 31%   57C    P0    69W / 250W |    204MiB / 11176MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      1465      G   /usr/lib/xorg/Xorg                           201MiB |
+-----------------------------------------------------------------------------+

And finally, my xorg.conf:

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 415.27

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    Screen      1  "Screen1" RightOf "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor1"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1080 Ti"
    BusID          "PCI:3:0:0"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1080 Ti"
    BusID          "PCI:4:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "31"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Device1"
    Monitor        "Monitor1"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "31"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Note that Coolbits is set for both of them.

Can you help me?

Thanks! :)

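Answer 1

I went through exactly the same thing last week. It's the driver's fault. Try version 390 or 430; those are the two versions I have confirmed to work on Arch with two 1080 Ti cards.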
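To see which driver branch is actually installed before swapping versions, something like the following could be used. It is only a sketch: `REPORTED_WORKING` simply restates the two branches mentioned above (it is not an official compatibility list), and the helper names are made up for illustration. It relies on the `driver_version` field of `nvidia-smi --query-gpu`.

```python
import subprocess

# Branches this answer reports as working with two 1080 Ti cards.
# This is anecdotal, not an official compatibility list.
REPORTED_WORKING = {390, 430}


def driver_major(version):
    """Return the major branch of a version string such as '415.27'."""
    return int(version.strip().split(".")[0])


def installed_driver_version():
    """Ask nvidia-smi for the driver version (one line per GPU; take the first)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        universal_newlines=True,
    )
    return out.splitlines()[0].strip()


def is_reported_working(version):
    """True if the version belongs to one of the branches named in this answer."""
    return driver_major(version) in REPORTED_WORKING
```

For the 415.27 driver shown in the question, `is_reported_working("415.27")` is false, which matches the behaviour described there.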

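It was really hard to track down the problem. At first I thought my motherboard didn't support SLI, so I used a different motherboard and enabled SLI, and then I could set the fan speed for both GPUs. However, with SLI the two cards use the same memory on both GPUs. That is unacceptable, because SLI makes the batch size smaller. So I disabled SLI, and once again I could not set the fan speed on both cards. Then I tried changing my NVIDIA driver, and it worked.

Damn NVIDIA: when I swapped in the other motherboard, I damaged the LGA socket on the first one and burned out an i5-9400F because of the broken socket. I know this was due to my own carelessness, but if it weren't for the NVIDIA driver bug I wouldn't have suffered.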
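Once the driver lets you control the fans on both cards, the dynamic profile asked for in the question (fan % vs. temperature) could be sketched roughly like this. Treat it as a starting point under several assumptions, not a tested tool: the curve points are made up, it assumes X is running with Coolbits enabled for both GPUs, that each card has a single fan whose `[fan:N]` index matches its `[gpu:N]` index, and that `nvidia-smi` lists the GPUs in the same order as `nvidia-settings`. It uses the `GPUFanControlState` and `GPUTargetFanSpeed` attributes of `nvidia-settings`.

```python
import subprocess
import time

# Hypothetical curve: (temperature in C, fan speed in %), sorted by temperature.
CURVE = [(40, 30), (60, 50), (75, 80), (85, 100)]


def fan_percent(temp_c, curve=CURVE):
    """Linearly interpolate the fan speed for a temperature, clamped at both ends."""
    if temp_c <= curve[0][0]:
        return curve[0][1]
    if temp_c >= curve[-1][0]:
        return curve[-1][1]
    for (t0, s0), (t1, s1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return round(s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0))


def read_temps():
    """Return one temperature per GPU, in nvidia-smi order."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return [int(value) for value in out.split()]


def settings_args(index, percent):
    """Build the nvidia-settings call: manual fan control plus target speed.

    Assumes fan N belongs to GPU N (one fan per card).
    """
    return [
        "nvidia-settings",
        "-a", "[gpu:{}]/GPUFanControlState=1".format(index),
        "-a", "[fan:{}]/GPUTargetFanSpeed={}".format(index, percent),
    ]


def run_forever(interval=5):
    """Poll the temperatures and apply the curve every `interval` seconds."""
    while True:
        for index, temp in enumerate(read_temps()):
            subprocess.run(settings_args(index, fan_percent(temp)), check=False)
        time.sleep(interval)
```

Calling `run_forever()` from a session that has access to the X display starts the loop. When you stop it, the fans stay at the last speed that was set, so set `GPUFanControlState=0` afterwards to hand control back to the driver.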