mpi4py: communicating between spawned processes

Question:

I have one process running a program called t1.py, which spawns three other processes, all of which run t2.py. I want to broadcast a value from the spawned process with rank 0 to the other two spawned processes. However, when bcast is called, the program blocks. Any idea why this happens, and how do I fix it?

t1.py

from mpi4py import MPI
import sys

sub_comm = MPI.COMM_SELF.Spawn(sys.executable, args=['t2.py'], maxprocs=3)
print('hi')

t2.py

from mpi4py import MPI

comm = MPI.Comm.Get_parent()

print('ho ', comm.Get_rank())
a = comm.bcast(comm.Get_rank(), root=0)
print(a)

Output

hi
ho  2
ho  0
ho  1

If you just want the children to talk to each other, you can use MPI.COMM_WORLD:

a = MPI.COMM_WORLD.bcast(MPI.COMM_WORLD.Get_rank(), root=0)

By printing MPI.COMM_WORLD.Get_rank(), ' of ', MPI.COMM_WORLD.Get_size(), you can check that the children's MPI.COMM_WORLD is limited to the children.
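As a minimal sketch of that check, the helper below formats the rank and size of any communicator. The name describe is made up for illustration and is not part of mpi4py; it only relies on Get_rank() and Get_size(), which mpi4py communicators provide.

```python
def describe(comm):
    """Return 'rank of size' for a communicator.

    Hypothetical helper for illustration; not an mpi4py API.
    """
    return "{} of {}".format(comm.Get_rank(), comm.Get_size())

# In t2.py one would call, for instance:
#   from mpi4py import MPI
#   print(describe(MPI.COMM_WORLD))
```

Called this way from t2.py, each of the three children should report a size of 3, since the parent is not a member of their MPI.COMM_WORLD.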

Now, let's investigate why comm.bcast(...) fails when comm is obtained by comm=MPI.Comm.Get_parent(). Judging by the size and ranks of this communicator, it seems very similar to MPI.COMM_WORLD. In fact, comm is very different from MPI.COMM_WORLD: it is an intercommunicator. More precisely, it is the way a parent can talk to its children. Collective communications can be used, but all processes, both the parent and its children, must call the function. In the question, the parent never calls bcast(), so the children wait forever. Please carefully read the MPI standard, in particular sections 5.2.2 and 5.2.3 about Intercommunicator Collective Operations. Regarding bcast(), the direction (parent to child or child to parent) and the sending process are specified through the root argument: in the sending group, the sender passes MPI.ROOT and any other process passes MPI.PROC_NULL, while the processes of the receiving group pass the rank of the sender in its own group. Lastly, an intracommunicator can be built from an intercommunicator by using Merge() (corresponding to MPI_Intercomm_merge()). In this intracommunicator, parents and children do not belong to two different groups: they are processes characterized by their unique rank, as usual.
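The choice of the root argument for an intercommunicator bcast() can be sketched as a small function. This is only an illustration: bcast_root_argument and its parameters are hypothetical names, not part of mpi4py, while MPI.ROOT and MPI.PROC_NULL are the actual mpi4py constants.

```python
def bcast_root_argument(in_sending_group, is_sender, sender_rank):
    """Pick the `root` argument for bcast() on an intercommunicator.

    Hypothetical helper for illustration; not part of mpi4py.
    """
    if not in_sending_group:
        # Receiving group: pass the sender's rank within the remote group.
        return sender_rank
    # Only the sending group needs the MPI constants, so import lazily.
    from mpi4py import MPI
    # The sender passes MPI.ROOT; the other processes of its group
    # pass MPI.PROC_NULL and take no part in the data transfer.
    return MPI.ROOT if is_sender else MPI.PROC_NULL
```

With the layout from the question, the parent would pass root=bcast_root_argument(True, True, 0), which is MPI.ROOT, and each child would pass root=bcast_root_argument(False, False, 0), which is 0, the parent's rank in its own one-process group.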

Here are modified versions of t1.py and t2.py, in which a bcast() on the intercommunicator is performed. The intercommunicator is then merged with Merge(), and bcast() is called on the resulting intracommunicator as usual.

t1.py

from mpi4py import MPI
import sys

sub_comm = MPI.COMM_SELF.Spawn(sys.executable, args=['t2.py'], maxprocs=3)

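val = 42
# Intercommunicator broadcast: the parent is the sender, so it passes MPI.ROOT
sub_comm.bcast(val, MPI.ROOT)

# Merge parent and children into one intracommunicator;
# high=False orders the parent first, so it gets rank 0
common_comm = sub_comm.Merge(False)
print('parent in common_comm ', common_comm.Get_rank(), ' of  ', common_comm.Get_size())
#MPI_Intercomm_merge(parentcomm,1,&intracomm);

val = 13
# Regular broadcast on the merged intracommunicator, sent by rank 0 (the parent)
c = common_comm.bcast(val, root=0)
print("value from rank 0 in common_comm", c)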

t2.py

from mpi4py import MPI

comm = MPI.Comm.Get_parent()

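print('ho ', comm.Get_rank(), ' of  ', comm.Get_size(), ' ', MPI.COMM_WORLD.Get_rank(), ' of  ', MPI.COMM_WORLD.Get_size())
# Broadcast among the children only
a = MPI.COMM_WORLD.bcast(MPI.COMM_WORLD.Get_rank(), root=0)
print("value from other child", a)

print("comm.Is_inter", comm.Is_inter())
# Intercommunicator broadcast: root=0 is the parent's rank in the remote group;
# the value passed by the receivers is not used
b = comm.bcast(comm.Get_rank(), root=0)
print("value from parent", b)

# high=True orders the children after the parent in the merged communicator
common_comm = comm.Merge(True)
print("common_comm.Is_inter", common_comm.Is_inter())
print('common_comm ', common_comm.Get_rank(), ' of  ', common_comm.Get_size())

# Regular broadcast from rank 0 (the parent); receivers get the parent's value
c = common_comm.bcast(0, root=0)
print("value from rank 0 in common_comm", c)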