RAM usage on multiple GPUs

Mark Dunscomb

Aug 25, 2017, 9:41:50 AM
to gprMax-users
Hi All,
   Q: Is gprMax capable of combining RAM from several GPUs to process a single model?

   The manual discusses using MPI in conjunction with GPUs, but it's not clear whether or not the code is capable of combining RAM from multiple cards to process a single model. My understanding is that not all code is written to jointly use RAM from multiple GPUs, so before we build a multi-GPU system, my IT folks are asking me to confirm that building one will help my RAM woes.

Thanks for any answers!!  

Mark Dunscomb

Aug 25, 2017, 10:47:48 AM
to gprMax-users
Additional background from my IT folks:

The video cards are individual components and can't share their onboard RAM. Dual processors like the Intel Xeon can share each other's RAM because each processor has a RAM controller that is linked to the other processor's RAM controller. This doesn't exist on the video cards. Even when used in an SLI configuration, where they are linked together in a master-slave arrangement, they are still treated as individual compute components and do not have access to each other's onboard RAM (i.e. the master divides the compute tasks among the individual slave cards, which do the computation on their segment and transfer the result back to the master for presentation).

 

A number of applications do use multiple GPUs for computing. I suggest you check with the company supplying the application to see if they have a way of accessing the video card RAM as a single logical pool.

Antonis Giannopoulos

Aug 25, 2017, 12:57:45 PM
to gprMax-users
Hi Mark,

The quick answer is NO. 

The only way to use multiple computing units - in this case CPUs/cores - at the moment is using OpenMP parallelisation (i.e. multi-threading). The benefit there is that communication between CPUs/cores happens directly and easily, as they have access to the same memory (RAM). Such parallelisation is not easy if every computing unit has its own memory; it can then only be achieved by exchanging the appropriate information using some kind of message-passing protocol (e.g. MPI). A domain-decomposition parallelisation approach is a lot trickier, and it will only be as fast as the slowest part of the network that connects the computing units and is used to exchange messages and data. MPI is really intended for that kind of parallelisation, which would allow you to run massive models on clusters etc. The current MPI implementation in gprMax is just simple task farming of independent models that do not require any communication while they are executing. In essence, we use MPI to efficiently create a job array.
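
Just for reference, the task-farm mode is driven from the command line. A sketch of what a B-scan run looks like is below; the exact flags (and whether -mpi takes an argument) have changed between gprMax versions, so treat this as indicative rather than definitive and check the documentation for the release you are using:

    # Run 60 independent models (e.g. the traces of a B-scan) and task-farm
    # them over MPI workers. Flags are illustrative, not guaranteed for
    # every gprMax version.
    python -m gprMax user_models/cylinder_Bscan_2D.in -n 60 -mpi 61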

You can look at the multi-GPU problem in a similar way. You can use many cards in parallel to run independent jobs (i.e. models), as in the task-farming approach currently used in gprMax, but you cannot split a big model amongst a number of GPUs without the communication of parts of each GPU's memory at every iteration step becoming the bottleneck. In principle, you can build the algorithm using similar tools (maybe MPI), but it will not be as efficient. This is obviously not available in gprMax.

gprMax does not do any of the domain-decomposition stuff, but we would like to have a go at a multi-node, big-cluster implementation, which we know works. Actually, an MSc student tried this year as part of his thesis project, but we kind of ran out of time.

Best

Antonis

PS It is not a trivial coding effort to robustly do proper MPI domain-decomposition parallelisation, including all the extra stuff that gprMax does (i.e. PMLs, dispersive media, etc.).

Mark Dunscomb

Aug 25, 2017, 1:24:16 PM
to gprMax-users
Thanks for your quick answer, Antonis!! I'm jumping into the deep end of HPC and wasn't quite sure how the bus between multiple GPUs worked regarding RAM and the code. Your response makes perfect sense (unfortunately :)). It seems, then, that the only effective way to substantially improve overall processing speed for a model requiring more RAM than a single GPU card has is to use MPI with multiple nodes, Xeon Phi coprocessors, or some other similar scenario. Does that sound correct to you, and have you tried using coprocessors?

Regards,
Mark

Antonis Giannopoulos

Aug 25, 2017, 1:42:07 PM
to gprMax-users
It is certainly possible to use multiple GPUs in the same paradigm that one would use for multiple CPU nodes in a cluster. The coding will be a bit trickier, but the speed will depend on the slowest part of the network communicating parts of the memory amongst the GPUs' RAM at every iteration. In essence, you need to communicate field information at every step. This is obviously not the complete model, but everything else waits until this exchange completes.
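
To illustrate the point (this is not gprMax code, just a toy sketch in Python with mpi4py, and every name in it is made up): each GPU or node would own a slab of the grid and would have to swap "halo" cells with its neighbours before every update, so the whole simulation is paced by that exchange.

    # Toy 1D halo exchange: each MPI rank owns a slab of cells plus two
    # halo cells shared with its neighbours.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    n_local = 1000                     # cells owned by this rank
    field = np.zeros(n_local + 2)      # +2 halo cells

    left = rank - 1 if rank > 0 else MPI.PROC_NULL
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    for step in range(100):
        # Swap halo cells with neighbours; every rank waits here each step,
        # so the run is only as fast as the slowest link between units.
        comm.Sendrecv(field[1:2], dest=left, recvbuf=field[-1:], source=right)
        comm.Sendrecv(field[-2:-1], dest=right, recvbuf=field[0:1], source=left)
        # Local update (a placeholder for the real field updates)
        field[1:-1] += 0.5 * (field[2:] + field[:-2] - 2 * field[1:-1])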

It is an interesting idea if you want or need to model big challenge problems that require more than 1 TB of memory. For problems that are just out of the grasp of current GPUs, I think in a year or so we will have GPUs with 32 GB or 64 GB of RAM. These will not solve the big challenge problems, but they will allow pretty big models to run easily...

Best

Antonis

Mark Dunscomb

Aug 25, 2017, 2:05:10 PM
to gprMax-users
Thanks again for your help. I need to keep working on this issue, since running a B-scan based on a model that needs 24 hours per A-scan isn't reasonable.

I also have a separate question about using a Peplinski soil model in a fractal box, B-scans, and an HDF5-format geometry file. I'd hoped to save the model-space geometry so that the variation in the Peplinski model isn't generated anew between each A-scan (and to save time). However, I'm hitting an error that seems to indicate the HDF5 model doesn't contain the Peplinski information. I need to investigate further and will post on the issues page on GitHub.

Cheers,
Mark

Craig Warren

Sep 6, 2017, 3:16:25 AM
to gprMax-users
Hi Mark,

Saving the geometry and reloading it should certainly save time when creating a B-scan. We have done this with the soil model/fractal box. Let me know if you have any specific problems.
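
In case it helps, a rough sketch of the two-step workflow is below. The command names (#soil_peplinski, #fractal_box, #geometry_objects_write, #geometry_objects_read) are the real gprMax ones, but the argument values and file names are placeholders, and the exact argument order may differ between versions, so please check the input-command reference:

    Model used once to build and save the stochastic soil:
    #soil_peplinski: 0.5 0.5 2.0 2.66 0.001 0.25 my_soil
    #fractal_box: 0 0 0 0.1 0.1 0.1 1.5 1 1 1 50 my_soil my_fractal_box
    #geometry_objects_write: 0 0 0 0.1 0.1 0.1 my_soil_geometry

    B-scan models that reload the same geometry for every A-scan:
    #geometry_objects_read: 0 0 0 my_soil_geometry.h5 my_soil_geometry_materials.txt

(I'm assuming #geometry_objects_write produces an HDF5 file plus a materials text file named along those lines; adjust to whatever files it actually writes out.)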

Kind regards,

Craig

Mark Dunscomb

Sep 6, 2017, 10:24:01 AM
to gprMax-users
Thanks Craig,

That certainly helps.  

FYI, I'm about to run benchmarking on a cluster of Xeon Platinum 8180 processor nodes and will let you know the results. In the meantime...

The #geometry_objects_read: command throws an error if you use the last argument for incorporating dielectric smoothing. The error states that the command is configured to accept exactly five arguments, and dielectric smoothing would be the sixth.

Cheers and thanks,
Mark


Craig Warren

Sep 6, 2017, 12:29:38 PM
to gprMax-users
Hi Mark,

Yes, definitely interested to see results from the Intel® Xeon® Platinum 8180. RRP $10,000!

Are you looking at the most up-to-date docs? There was a mistake in the docs for the #geometry_objects_read command, which I recently corrected: it shouldn't have the dielectric smoothing option. This is explained at https://github.com/gprMax/gprMax/issues/122
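
For anyone who hits the same error, the five-argument form is the insertion point (x y z), the HDF5 geometry file, and the materials file, i.e. something like (file names are just placeholders):

    #geometry_objects_read: 0 0 0 my_soil_geometry.h5 my_soil_geometry_materials.txt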

Kind regards,

Craig

Mark Dunscomb

Sep 6, 2017, 1:56:37 PM
to gprMax-users
Thanks, that's exactly what happened; I had an obsolete doc.