Configuré un clúster warewulf en centos 7 y instalé openmpi-x86_64-1.10.0-10.el7, además también instalé mpich. Cuando ejecuto mpirun con openmpi, aparece el siguiente error, lo mismo con mpich funciona perfectamente. Cambiar el n0000 al maestro del clúster también funciona, pero no se ejecuta en el nodo.
mpirun -n 1 -host n0000 echo $HOSTNAME
[n0000.cluster:01719] [[24772,0],1] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------
A continuación puede encontrar las salidas de la dirección IP del clúster y del servidor. También he echado un vistazo ahttps://www.open-mpi.org/community/lists/users/2015/09/27643.phpdonde se describe un problema similar, pero no creo tener interfaces en la misma subred.
Server:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:2e:ee:c2 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
valid_lft 85746sec preferred_lft 85746sec
inet6 fe80::a00:27ff:fe2e:eec2/64 scope link
valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:76:b9:e7 brd ff:ff:ff:ff:ff:ff
inet 10.1.1.1/24 brd 10.1.1.255 scope global enp0s8
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe76:b9e7/64 scope link
valid_lft forever preferred_ft forever
Node:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/24 scope host lo
valid_lft forever preferred_lft forever
inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:57:46:ce brd ff:ff:ff:ff:ff:ff
inet 10.1.1.10/24 brd 10.1.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe57:46ce/64 scope link
valid_lft forever preferred_lft forever
¿Alguna idea?
Respuesta1
¿Dónde instalaste OpenFOAM?
¿Lo instaló en /opt/ o /home/username/OpenFOAM?
Si hizo lo primero, el nodo de cálculo no puede encontrar su ubicación principal (/opt/)