
On a freshly set up Ubuntu 16.04 machine, running nvidia-smi
as a normal user fails:
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Running it as root works:
$ sudo nvidia-smi
[sudo] password for hanxue:
Fri Jul 19 10:05:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   38C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   33C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   31C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   31C    P0    28W / 250W |      0MiB / 16276MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
After that, running it as a normal user works as well:
$ nvidia-smi
Fri Jul 19 10:09:00 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   40C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   35C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   33C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   33C    P0    27W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Is there a misconfiguration that requires nvidia-smi
to be run by root first, and is there a workaround for it, for example manually loading the NVIDIA kernel modules?
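One way to narrow this down is to check, as the normal user, whether the nvidia kernel module is loaded and whether the /dev/nvidia* device nodes exist, once before and once after running sudo nvidia-smi. The script below is a minimal diagnostic sketch, assuming a Linux host with the NVIDIA driver installed; the function names nvidia_module_loaded and nvidia_device_nodes are hypothetical helpers, not part of any NVIDIA tool.

```shell
#!/bin/sh
# Diagnostic sketch (assumes a Linux host with the NVIDIA driver installed).
# Run as the normal user, once before and once after `sudo nvidia-smi`.
# The function names below are hypothetical helpers, not part of any tool.

# Succeeds if the "nvidia" kernel module is loaded. /proc/modules is the
# file lsmod reads; the optional argument overrides it (useful for testing).
nvidia_module_loaded() {
    grep -q '^nvidia ' "${1:-/proc/modules}" 2>/dev/null
}

# Prints every existing /dev/nvidia* device node, one per line.
# Prints nothing if the glob matches no file.
nvidia_device_nodes() {
    for node in /dev/nvidia*; do
        if [ -e "$node" ]; then
            echo "$node"
        fi
    done
}

if nvidia_module_loaded; then
    echo "nvidia kernel module: loaded"
else
    echo "nvidia kernel module: not loaded"
fi

nodes=$(nvidia_device_nodes)
if [ -n "$nodes" ]; then
    echo "device nodes:"
    echo "$nodes"
else
    echo "device nodes: none"
fi
```

Comparing the two runs shows whether it is the module, the device nodes, or both that only appear after root has run nvidia-smi. If the module is reported as not loaded, sudo modprobe nvidia is the usual way to load it manually; whether that alone also creates the /dev/nvidia* nodes may depend on how the driver package sets them up, which would need to be verified on this machine.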