Populating DistMatrix<T,U1,V1> from file/Matrix<T>/DistMatrix<T,U2,V2>

ANJU KAMBADUR

May 6, 2013, 11:10:57 AM

to Elemental Development

Hey Guys,

(1) Is there code to populate a DistMatrix from a Matrix? Alternatively, is there a way to populate a DistMatrix from another DistMatrix with a different grid/communicator/group?

(2) Would the folks here suggest using the [MC,MR] distribution as much as possible? Would many of the BLAS-2, BLAS-3, and LAPACK operations implicitly convert DistMatrices with other distributions into [MC,MR]?

I am trying to use Elemental to parallelize (quick and dirty) some operations in a MATLAB code-base. The only shared file system available is NFS, so there is not much point in setting up parallel IO. Right now, I am thinking of using HDF5 to go from MATLAB to C++. Any suggestions from folks who have faced a similar situation are welcome.

- Anju
--------------------------------------------
Prabhanjan Kambadur
Research Staff Member
Business Analytics and Mathematical Sciences
IBM TJ Watson Research Center
Room 30-229 A

Jack Poulson

May 6, 2013, 12:08:41 PM

to elemen...@googlegroups.com

Dear Anju,

On Mon, May 6, 2013 at 8:10 AM, ANJU KAMBADUR <pkam...@us.ibm.com> wrote:

(1) Is there code to populate a DistMatrix from a Matrix? Alternatively, is there a way to populate a DistMatrix from another DistMatrix with a different grid/communicator/group?

Is the matrix populated by every MPI process, or just one?

If it is populated by all of the processes, then you need only perform local copies on each process and this is easy enough, e.g.:

// Assuming an existing m x n Matrix ASeq, populated on every process
DistMatrix<double> A( m, n, grid );
for( int jLocal=0; jLocal<A.LocalWidth(); ++jLocal )
{
    // Global column index of the jLocal'th locally owned column
    const int j = A.RowShift() + jLocal*A.RowStride();
    for( int iLocal=0; iLocal<A.LocalHeight(); ++iLocal )
    {
        // Global row index of the iLocal'th locally owned row
        const int i = A.ColShift() + iLocal*A.ColStride();
        A.SetLocal( iLocal, jLocal, ASeq.Get(i,j) );
    }
}

If the Matrix is only populated on a single process, then you ideally should perform a single MPI_Scatter(v) in order to appropriately spread the data amongst the processes. However, if you want a "quick and dirty" solution, you can first broadcast (via MPI_Bcast) the matrix and then run the previous code sample.

(2) Would the folks here suggest using the [MC,MR] distribution as much as possible? Would many of the BLAS-2, BLAS-3, and LAPACK operations implicitly convert DistMatrices with other distributions into [MC,MR]?

Most routines assume an [MC,MR] distribution since a 2D distribution is typically required for scalability. There are situations where this is not the case, e.g., tall-skinny matrix operations, but typically [MC,MR] is best.

Michael Grant is working towards something that would handle the automatic redistribution. You can see his fork here: https://github.com/mcg1969/Elemental

Once things are working and stabilized, the plan is to try to pull this effort into the main branch. The "Auto" prefix will likely be changed to avoid confusion with std::auto_ptr.

I am trying to use Elemental to parallelize (quick and dirty) some operations in a MATLAB code-base. The only shared file system available is NFS, so there is not much point in setting up parallel IO. Right now, I am thinking of using HDF5 to go from MATLAB to C++. Any suggestions from folks who have faced a similar situation are welcome.

Hopefully someone else will chime in, as I have no experience with this. In my humble opinion, it is usually best to rewrite applications for MPI rather than bolting MPI on top of existing codebases, but I understand that sometimes this is not an option.

Jack

ANJU KAMBADUR

May 6, 2013, 12:49:52 PM

to elemen...@googlegroups.com

Thanks for the comments, Jack. For simplicity, let us assume that rank 0 is the only one that has access to the file system. Even for the quick and dirty method, MPI_Bcast is too expensive, so MPI_Scatterv might be best. However, here is where I would like some help:

In your code sample, you assume that ASeq has the elements that are locally needed by each process. Files are typically written in column-major order. It is easier to scatter this for a [*,VC] distribution. However, to take a single contiguous buffer and scatter it to suit [MC,MR] seems non-trivial. So, we could read the file in as a *permuted* [*,VC] DistMatrix and then use DistMatrix conversion. However, it would be nice to do it without that additional communication step and not require the column permutation.

In short, it would be great to go from:

(1) An un-partitioned column-major buffer/file to a DistMatrix<T,MC,MR>
(2) A 1D partitioned column-major buffer/file to a DistMatrix<T,MC,MR>. In this case, each process would own a *contiguous* set of columns.

Any thoughts?

- Anju


Jack Poulson

May 6, 2013, 1:14:48 PM

to elemen...@googlegroups.com

Hi Anju,

On Mon, May 6, 2013 at 9:49 AM, ANJU KAMBADUR <pkam...@us.ibm.com> wrote:

Thanks for the comments, Jack. For simplicity, let us assume that rank 0 is the only one that has access to the file system. Even for the quick and dirty method, MPI_Bcast is too expensive, so MPI_Scatterv might be best. However, here is where I would like some help:

I assume that it is too expensive because you have many MPI ranks per machine and cannot afford to store the entire matrix on every process (but you can store it on one process)?

In your code sample, you assume that ASeq has the elements that are locally needed by each process. Files are typically written in column-major order. It is easier to scatter this for a [*,VC] distribution. However, to take a single contiguous buffer and scatter it to suit [MC,MR] seems non-trivial. So, we could read the file in as a *permuted* [*,VC] DistMatrix and then use DistMatrix conversion. However, it would be nice to do it without that additional communication step and not require the column permutation.

It would be better to add in a local packing step which combines the portions of the Matrix which need to be shipped off to each process via the MPI_Scatter(v) call. Note that, since each process will receive roughly the same amount of data, it is best to avoid MPI_Scatterv by padding the message lengths up to the maximum.

In short, it would be great to go from:

(1) An un-partitioned column-major buffer/file to a DistMatrix<T,MC,MR>
(2) A 1D partitioned column-major buffer/file to a DistMatrix<T,MC,MR>. In this case, each process would own a *contiguous* set of columns.

Any thoughts?

Starting from a single file should be avoided as it is inherently unscalable, and so I do not plan on implementing it myself. However, if you or someone else would like to contribute code, I would be happy to check it into the library. A proper solution should allow for chunking the communications into smaller pieces since, in general, the entire matrix may not fit in core. I am not sure if this is the case for you.

Jack
