nvidia-smi 必須由 root 運作才能被一般使用者使用

nvidia-smi 必須由 root 運作才能被一般使用者使用

在新建的 Ubuntu 16.04 機器上,nvidia-smi以普通使用者身分執行失敗

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

以 root 身分運行

$ sudo nvidia-smi
[sudo] password for hanxue: 
Fri Jul 19 10:05:49 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   38C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   33C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   31C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   31C    P0    28W / 250W |      0MiB / 16276MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

隨後以一般使用者身分運作即可

$ nvidia-smi
Fri Jul 19 10:09:00 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   40C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   35C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   33C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   33C    P0    27W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

nvidia-smi是否存在需要先由 root 使用者執行的錯誤配置,是否有解決方案?例如手動載入 NVIDIA 核心模組

相關內容