
On a freshly built Ubuntu 16.04 machine, running nvidia-smi
as a normal user fails:
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Running it as root works:
$ sudo nvidia-smi
[sudo] password for hanxue:
Fri Jul 19 10:05:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:3B:00.0 Off | 0 |
| N/A 38C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... Off | 00000000:5E:00.0 Off | 0 |
| N/A 33C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P100-PCIE... Off | 00000000:86:00.0 Off | 0 |
| N/A 31C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P100-PCIE... Off | 00000000:AF:00.0 Off | 0 |
| N/A 31C P0 28W / 250W | 0MiB / 16276MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
After that, running it as a normal user works as well:
$ nvidia-smi
Fri Jul 19 10:09:00 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:3B:00.0 Off | 0 |
| N/A 40C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... Off | 00000000:5E:00.0 Off | 0 |
| N/A 35C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P100-PCIE... Off | 00000000:86:00.0 Off | 0 |
| N/A 33C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P100-PCIE... Off | 00000000:AF:00.0 Off | 0 |
| N/A 33C P0 27W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Is there a misconfiguration that requires root to run nvidia-smi
first, and is there a fix for it? For example, should the NVIDIA kernel modules be loaded manually?
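The sketch below is a minimal diagnostic, not a fix. It could help narrow down what changes between the failing run and the working run. It makes three assumptions:

* the driver's kernel modules appear in /proc/modules under names starting with "nvidia";
* the device files live under /dev with names starting with "nvidia" (for example nvidiactl, nvidia0);
* checking read/write access with os.access is a meaningful test for the invoking user.

The helper names nvidia_modules_loaded and nvidia_device_nodes are hypothetical and only illustrate the idea.

```python
import glob
import os


def nvidia_modules_loaded(proc_modules_text):
    """Return loaded kernel module names that start with 'nvidia'.

    Each line of /proc/modules begins with the module name followed by
    whitespace; the 'nvidia' prefix is an assumption about driver naming.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("nvidia"):
            names.append(fields[0])
    return names


def nvidia_device_nodes(dev_dir="/dev"):
    """List entries in dev_dir whose names start with 'nvidia' (assumed naming)."""
    pattern = os.path.join(dev_dir, "nvidia*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def main():
    # Read the kernel's module list; treat an unreadable file as "nothing loaded".
    try:
        with open("/proc/modules") as f:
            modules = nvidia_modules_loaded(f.read())
    except OSError:
        modules = []
    nodes = nvidia_device_nodes()
    print("loaded nvidia modules:", modules or "none")
    print("nvidia device nodes:", nodes or "none")
    for name in nodes:
        path = os.path.join("/dev", name)
        # os.access checks permissions for the real UID, i.e. the user running this script
        ok = os.access(path, os.R_OK | os.W_OK)
        print(f"  {path}: read/write for this user = {ok}")


if __name__ == "__main__":
    main()
```

One possible use is to run the script as the normal user twice: once before sudo nvidia-smi and once after. The output would then show whether the module list, the device nodes, or their permissions differ between the two states.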