tracing memory use with pbdR


Cristina Montañola

Jul 4, 2019, 3:01:10 AM7/4/19
to rbigdatap...@googlegroups.com
Hello everyone,

I would like to know if there is any way to trace memory usage and consumption during execution in pbdR. I'm running different tests to check the performance of some simple code with different matrix sizes. The problem is that large matrices trigger an out-of-memory error even though, according to my calculations, they should fit. So I suspect pbdR actually needs extra memory for internal operations, and I cannot find any documentation about this in the manuals.

Is there any library or function in the pbdR packages that helps trace memory usage?

Best regards,

Cristina

Ostrouchov, George

Jul 4, 2019, 11:02:04 PM7/4/19
to Cristina Montañola, RBigDataProgramming

Hi Cristina,

Consider looking at https://github.com/shinra-dev/memuse. While this is not specific to distributed computing, you can get sizes of local objects. Then you control printing of the information from different ranks via comm.cat() or comm.print().
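To make the suggestion concrete, here is a minimal sketch of combining memuse with pbdMPI's per-rank printing. It assumes the pbdMPI and memuse packages are installed and that the script is launched under MPI (e.g. `mpirun -np 4 Rscript sizes.R`); the matrix dimensions are arbitrary illustration values.

```r
## Sketch: report each rank's local object size with memuse + comm.print().
## Assumes pbdMPI and memuse are installed and the script runs under mpirun.
suppressMessages(library(pbdMPI))
library(memuse)

init()

## Each rank builds its own local piece; sizes can differ across ranks.
n <- 1000 + 100 * comm.rank()
x <- matrix(rnorm(n * n), nrow = n)

## memuse::object.size() returns a human-readable size; all.rank = TRUE
## makes comm.print() show the value from every rank, not just rank 0.
comm.print(memuse::object.size(x), all.rank = TRUE)

finalize()
```

Printing from all ranks this way avoids the usual MPI pitfall of every process writing to stdout at once.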

Also, all other R memory use tools, like those described in http://adv-r.had.co.nz/memory.html, are available for local memory use reporting.

Regards,

George

--
Programming with Big Data in R
Simplifying Scalability
http://r-pbd.org/

wrathematics

Jul 10, 2019, 11:04:32 AM7/10/19
to rbigdatap...@googlegroups.com
Hi Cristina,

Is it running out of memory during the construction of the distributed matrix, or in performing operations on it?

Some operations are much more expensive than others. For example, all of the matrix factorizations modify the data. Because of R's copy-on-modify semantics, this requires a copy to be made, so the data is modified only in the copy. R itself has the same problem as pbdR in this sense.

Usually you don't want to consume more than 1/3 of your memory on any given rank with the matrix itself. If you can give me some more information about the operations you're using, I can try to give some insight into the number of copies/extra data the operation needs. You can also try using memuse::Sys.procmem() to track the memory consumption of the process in smaller scale tests.
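As a rough sketch of the small-scale tracking Drew describes, the following measures process memory before and after a factorization with memuse::Sys.procmem(). It assumes the memuse package is installed; Sys.procmem() reads the operating system's per-process accounting, so the numbers include R's own overhead, and the matrix size here is just an illustration.

```r
## Sketch: watch process memory grow across an expensive operation.
## Assumes the memuse package is installed (not specific to pbdR).
library(memuse)

before <- Sys.procmem()$size   # current resident memory of this R process

x <- matrix(rnorm(2000 * 2000), nrow = 2000)  # ~30 MiB of doubles
qr_x <- qr(x)  # factorizations copy the data, so expect roughly double

after <- Sys.procmem()$size
print(after - before)          # approximate memory attributable to x + qr_x
```

Running a sweep of sizes this way in serial R can tell you the per-rank multiplier to budget for before scaling up, which is where the "no more than 1/3 of memory in the matrix itself" rule of thumb comes from.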

Best
-Drew
