Issue with Gatherv on multiple nodes


Ankit Ankit

Jun 30, 2020, 11:13:44 AM
to mpi4py
Hi Lisandro,

The following code gathers NumPy arrays of varying size from different processes. When I run it on a single node (a PBS node with 48 cores), it works fine. However, when I run it on multiple nodes, the gathered data is incorrect. Can you please help me solve this problem?

#--------------------------------------------------------------------------------------------------------
import numpy as np
from numpy.linalg import norm
from mpi4py import MPI

Comm = MPI.COMM_WORLD
N_Workers = Comm.Get_size()
Rank = Comm.Get_rank()

# Per-rank chunk sizes, displacements, and total length of the gathered vector.
RefDataLen = int(1e4)
VecLenList = RefDataLen*np.arange(1, N_Workers+1)
VecDisplList = np.array([np.sum(VecLenList[:i]) for i in range(N_Workers)])
N_GatheredVec = np.sum(VecLenList)

# Every rank builds the full set of chunks so the root can verify the gathered result.
DataList = []
for i in range(N_Workers):
    Data = np.arange(VecLenList[i])*1e-3
    DataList.append(Data)

# Repeat the gather 10 times and print the error norm on the root.
for i in range(10):

    if Rank == 0:
        GatheredVec = np.zeros(N_GatheredVec)
        Comm.Gatherv(DataList[Rank], (GatheredVec, VecLenList, VecDisplList, MPI.DOUBLE), 0)
        print(norm(GatheredVec - np.hstack(DataList)))
    else:
        Comm.Gatherv(DataList[Rank], None, 0)
#--------------------------------------------------------------------------------------------------------

Output for single-node run (<=48 cores):
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0

Output for multi-node run (>48 cores):
0.0
0.0
990524.488609363
990524.488609363
990524.488609363
990524.488609363
990524.488609363
990524.488609363
990524.488609363
990524.488609363
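
As a cross-check, the same data can also be gathered with the pickle-based lowercase gather (a minimal sketch, slower than Gatherv, but it avoids the buffer/count/displacement path entirely):

#--------------------------------------------------------------------------------------------------------
# Cross-check sketch: gather the per-rank chunks with the pickle-based lowercase gather.
import numpy as np
from numpy.linalg import norm
from mpi4py import MPI

Comm = MPI.COMM_WORLD
N_Workers = Comm.Get_size()
Rank = Comm.Get_rank()

RefDataLen = int(1e4)
VecLenList = RefDataLen*np.arange(1, N_Workers+1)

# Each rank only sends its own chunk.
Data = np.arange(VecLenList[Rank])*1e-3

for i in range(10):
    Pieces = Comm.gather(Data, root=0)  # list of per-rank arrays on the root, None elsewhere
    if Rank == 0:
        GatheredVec = np.hstack(Pieces)
        Reference = np.hstack([np.arange(n)*1e-3 for n in VecLenList])
        print(norm(GatheredVec - Reference))
#--------------------------------------------------------------------------------------------------------

If this variant keeps printing 0.0 on multiple nodes while the Gatherv version does not, that points more firmly at the buffer-based collective of the backend MPI.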


Kind regards,
Ankit






Lisandro Dalcin

Jun 30, 2020, 2:13:51 PM
to mpi...@googlegroups.com
On Tue, 30 Jun 2020 at 18:13, Ankit Ankit <ankitn...@gmail.com> wrote:
> Hi Lisandro,
>
> The following code gathers NumPy arrays of varying size from different processes. When I run it on a single node (a PBS node with 48 cores), it works fine. However, when I run it on multiple nodes, the gathered data is incorrect.

If it works fine on a single node but not on multiple nodes, then the problem is most likely deep down in the backend MPI implementation.
 
> Can you please help me solve this problem?

You provided almost no additional information, though even if you had, I doubt I could say much more, as I do not have access to the machine. You should really ask the IT staff of the computing infrastructure you are using for help.
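
A quick way to see which backend MPI your job actually ran against (a minimal sketch; MPI.Get_library_version requires an MPI-3 library) is:

from mpi4py import MPI
import mpi4py

if MPI.COMM_WORLD.Get_rank() == 0:
    print("mpi4py version:", mpi4py.__version__)   # version of the Python bindings
    print("MPI standard  :", MPI.Get_version())    # (major, minor) of the MPI standard
    print(MPI.Get_library_version())               # backend MPI library identification string

Including that output when you report the issue usually helps.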

PS: As time passes, I get more and more emails (many of them to my personal address) with large code pastes asking me to spot bugs. I don't want to be hostile, but it is a bit too much; I'm not a human debugger!


--
Lisandro Dalcin
============
Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

Ankit Ankit

Jun 30, 2020, 8:30:47 PM
to mpi...@googlegroups.com
No worries. Thanks, Lisandro.
