access all elements of a row of a trilinos sparse matrix


Simon

Dec 28, 2023, 2:22:38 PM
to deal.II User Group
Dear all,


I want to compute the scalar product of individual rows of a
TrilinosWrappers::SparseMatrix with a TrilinosWrappers::MPI::Vector.
I implemented this by looping over the elements of each row via iterators
(it = systemMatrix.begin(row), itend = systemMatrix.end(row),
then while (it != itend) { ... ++it; }).
Inside the loop, I accumulate it->value() (the matrix element) times
vec(it->column()) (the corresponding vector element).
I compute this scalar product for a set of rows and store the results in a FullMatrix,
which I finally write to a file from the master process
after gathering the local full matrices from all processors.
Attached is a small example which demonstrates this in practice.

In a parallel program, vec(it->column()) only works if it->column() is a locally
owned element of the vector, so I wrapped this read access in
if (vec.in_local_range(it->column())).
Another problem is the following:
Say row 5 lives on processor 0 and has non-zero entries at columns 10,15,20,25,30.
Columns 10,15,20 live on processor 0 as well but columns 25,30 live on processor 1.
Then I will never add the contributions to the scalar product from columns 25 and 30, because on processor 1,
systemMatrix.begin(5) == systemMatrix.end(5),
as also stated in the documentation.

How can I deal with this?


Best,
Simon

example(1).cc

Wolfgang Bangerth

Dec 29, 2023, 1:02:45 PM
to dea...@googlegroups.com
On 12/28/23 12:22, Simon wrote:
>
> I want to build the scalar product between rows of a
> TrilinosWrappers::SparseMatrix and a TrilinosWrappers::MPI::Vector.

This sounds like you're computing a matrix-vector product -- the dot products
you compute are the elements of the resulting vector.

As you have learned, it isn't entirely trivial to determine how information
must be transmitted in parallel for these kinds of operations. But it is also
not necessary: The parallel matrix classes all have matrix-vector ("vmult")
operations implemented. Why not use those?

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/


Simon Wiesheier

Dec 29, 2023, 1:39:55 PM
to dea...@googlegroups.com
Yes, basically I want to compute a matrix-vector product.
But if my system has, say, 500k DoFs, I only need about 1k elements of the result of vmult, because the number of rows I am interested in is quite small -- they correspond to a small set of DoFs living on the boundary.

For efficiency reasons, I decided to loop manually over the rows I am interested in. 
In a serial program, that worked smoothly. But if I get the gist of your message, vmult and extracting the relevant elements is the only way to do this in parallel, right? 

Thank you,
Simon

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dealii+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dealii/8e748b05-a6da-45d6-bd2a-23b552c818e6%40colostate.edu.

Wolfgang Bangerth

Dec 29, 2023, 5:33:38 PM
to dea...@googlegroups.com
On 12/29/23 11:39, Simon Wiesheier wrote:
>
> For efficiency reasons, I decided to loop manually over the rows I am
> interested in.
> In a serial program, that worked smoothly. But if I get the gist of your
> message, vmult and extracting the relevant elements is the only way to do this
> in parallel, right?

If you only need certain rows, you still have to do the same basic steps a
parallel vmult has to do, except that you can restrict yourself to a subset of
the rows. What I meant to say is that it isn't trivial to determine which
elements you need to communicate. You might have to read through the source
code of PETSc's or Trilinos's vmult functions to understand the sequence of
events. I would not be surprised if a parallel vmult runs to several hundred
lines of code.

It is certainly *easier* to just do the full vmult with a vector, and then
pick out the entries you actually need. Whether that's a performance problem
is something you should benchmark -- it is not worth optimizing operations for
which you do not already know that they are a bottleneck.