MPI with multi-model-spectrum-channel

JTL

Jan 24, 2017, 1:33:41 PM

to ns-3-users

Hi, NS3 experts,

I am trying to find ways to speed up Wi-Fi simulations. MPI was the first thing that came to mind. Years ago I had some success using MPI on clusters to build parallel cellular network simulations.

I read this other thread about using MPI with LTE: https://groups.google.com/forum/#!searchin/ns-3-users/mpi$20lte%7Csort:relevance/ns-3-users/0ObxHnrIT8M/uk-UqVxqP7QJ

I have a few questions on which I would appreciate advice from the experts here:

1. Is it sufficient to modify multi-model-spectrum-channel so that packets can be sent to other LPs, with each LP serving a few complete Wi-Fi nodes?
2. Since Wi-Fi nodes don't interact with each other nearly as much as LTE eNBs do, I expect most of the cross-LP communication to happen in SpectrumChannel. Am I missing anything with this assumption (especially if we were to run on-off applications across our Wi-Fi nodes)?
3. My past experience with MPI was great: we were able to cut a simulation that usually took more than a week down to under an hour. But things may have changed. Has anyone run wireless network simulations involving hundreds of nodes, and can you comment on the performance improvement?
4. Can anyone provide pointers to past projects that adapted NS3 for wireless networks using MPI, so we can learn from them?
5. Any advice for or against using MPI to speed up large-scale simulations?

Any advice is highly appreciated.

Thank you very much.

JT

Tommaso Pecorella

Jan 29, 2017, 6:56:53 PM

to ns-3-users

Hi JT,

the problem with using MPI and wireless channels is that you cannot really partition the simulation into sub-instances working in parallel.

The ns-3 MPI module works with a special wired channel (a point-to-point channel). This way we know exactly which nodes will "connect" the different processes. In a wireless simulation, all the nodes are (potentially) interacting.

What you could do (but it's not for the faint of heart) is modify the SpectrumModel channels to determine whether the system can be partitioned in some way.

As an example, you could safely assume that two Wi-Fi systems, one on channel 1 and one on channel 11, are not interacting.

Nevertheless, if a node changes its channel number, you'll need to re-partition the system.

Moreover, if there's a 3rd system that interacts with *both* channels, you'll not be able to partition the system.

The same goes for wireless systems that are "too far away": you never know whether a node will start moving (or transmitting with greater power).
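
The partitioning idea above can be sketched in a few lines of plain Python (this is not ns-3 code). It groups nodes into independent partitions with a union-find over "may interact" pairs. The function names and the rule that two 2.4 GHz channels overlap when their numbers differ by less than 5 are simplifying assumptions made for illustration; distance, transmit power, and mobility are ignored.

```python
def channels_overlap(ch_a, ch_b):
    # Assumption: 2.4 GHz channels closer than 5 channel numbers overlap.
    return abs(ch_a - ch_b) < 5


def partition_nodes(node_channels, interferes=channels_overlap):
    """Group nodes into sets that never interact.

    node_channels: dict mapping node id -> set of channel numbers it uses.
    Returns a sorted list of sorted node-id lists, one per partition.
    """
    parent = {n: n for n in node_channels}

    def find(n):
        # Walk up to the root, compressing the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    nodes = sorted(node_channels)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if any(interferes(ca, cb)
                   for ca in node_channels[a]
                   for cb in node_channels[b]):
                union(a, b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Two Wi-Fi systems on channels 1 and 11: two independent partitions.
two_systems = {0: {1}, 1: {1}, 2: {11}, 3: {11}}
print(partition_nodes(two_systems))

# A third system that touches both channels merges everything.
with_bridge = dict(two_systems)
with_bridge[4] = {1, 11}
print(partition_nodes(with_bridge))
```

The second call shows the caveat: one node that interacts with both channels collapses the two partitions into one, and any channel change at run time would require recomputing the grouping.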

A better (maybe, MAYBE) approach would be to use CUDA. Since most operations are performed on multiple targets (same operation, multiple data), this could be a promising idea.

In any case, I strongly suggest starting with in-depth code profiling and focusing on the most time-consuming operations.

Cheers,

T.

JTL

Feb 3, 2017, 4:28:02 AM

to ns-3-users

Hi,

First, thanks for your reply and the interesting suggestion of using CUDA. I am looking into that now though I have zero experience with it.

For MPI, my assumption is that calculations involved in the transmit/receive of PHY packets through the propagation channel are the most time-consuming piece in the simulation, especially if we were to consider fast fading. Even if that assumption is wrong, the channel is really the only common layer that all packets from the nodes pass through and where they interact with each other (whether they can see each other should probably be left to the channel to decide). I am thinking of modifying the SpectrumModel channels so that the channel is "common and synchronous" across all LPs. This means that:

- All LPs will maintain a complete list of RxSpectrumPhy. The Ptr<RxSpectrumPhy> probably needs to be replaced by a map, iterating over the keys instead of the pointers, unless NULL is allowed in the Ptr.
- All LPs will maintain RxSpectrumModel, the info map, etc.
- The scheduler invoked inside StartTx() should be able to address nodes that reside in remote LPs. It seems the distributed scheduler is currently hard-wired to point-to-point only.
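
A minimal Python sketch of the replicated-channel idea in the list above. This is not ns-3 code: the class `ChannelReplica`, its methods, and the `outbox` list standing in for MPI messages are all hypothetical names introduced only for illustration. Each LP keeps the full map of receivers keyed by id, delivers locally when it owns the receiver, and queues a message for the owning LP otherwise.

```python
class ChannelReplica:
    """One LP's copy of the shared channel (hypothetical design sketch)."""

    def __init__(self, my_rank):
        self.my_rank = my_rank
        # phy_id -> (owner_rank, receive callback or None if remote)
        self.phys = {}
        # Messages that would be sent to other LPs: (dest_rank, phy_id, packet)
        self.outbox = []

    def add_rx(self, phy_id, owner_rank, on_receive=None):
        # Every LP registers every receiver; only the owner has a callback.
        self.phys[phy_id] = (owner_rank, on_receive)

    def start_tx(self, sender_id, packet):
        for phy_id, (owner, on_receive) in self.phys.items():
            if phy_id == sender_id:
                continue  # a transmitter does not receive its own packet
            if owner == self.my_rank:
                on_receive(packet)  # local receiver: deliver directly
            else:
                self.outbox.append((owner, phy_id, packet))  # remote receiver


received = []
replica = ChannelReplica(my_rank=0)
replica.add_rx(0, owner_rank=0, on_receive=lambda p: received.append((0, p)))
replica.add_rx(1, owner_rank=0, on_receive=lambda p: received.append((1, p)))
replica.add_rx(2, owner_rank=1)  # lives on another LP
replica.start_tx(sender_id=0, packet="pkt")
print(received)        # local deliveries
print(replica.outbox)  # messages for remote LPs
```

The sketch leaves out timing entirely. In a real distributed run, each queued message would also have to respect the synchronization window between LPs.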

T, do you know of any expert who knows NS3 and MPI as well as you do and whom we might engage as a consultant or contractor? (Of course, if you are willing to work with us, we would be thrilled.) If you have further information about a potential candidate, let me know if and how I could start a private email conversation with you, as there are things I cannot discuss on a public forum.

Looking forward to hearing your advice. Thanks

JT

Tom Henderson

Feb 3, 2017, 9:51:23 AM

to ns-3-...@googlegroups.com

Hello JT, some further comments inline below.

On 02/03/2017 01:28 AM, JTL wrote:
> Hi,
>
> First, thanks for your reply and the interesting suggestion of using
> CUDA. I am looking into that now though I have zero experience with it.
>
> For MPI, my assumption is that calculations involved in the
> transmit/receive of PHY packets through the propagation channel are the
> most time-consuming piece in the simulation, especially if we were to
> consider fast fading.

In my profiling experience, the MAC layer methods involving channel
access and queue management, as well as packet handling in the
InterferenceHelper, rise to the top of the list. I recommend that you
try to profile your scenario using instructions in the wiki:
https://www.nsnam.org/wiki/HOWTO_use_oprofile

There isn't presently a fast fading error model for Wi-Fi.

Please keep in mind that a key to obtaining faster parallel simulation
is to find spots in your scenario where there is a good amount of
lookahead (i.e. the different LPs can execute on their own for a while
without having to synchronize). My understanding is that the local
wireless channel is a poor place to look for this. A better strategy
would be to put each wireless subnet (channel) in its own LP, and have
some propagation delay between these clusters (in point-to-point links)
if your scenario permits it. That is, if there isn't much interaction
in the scenario between clusters of nodes, those are good candidates for
parallelization, and maybe you are able to tolerate the injection of
some additional propagation delay between them in order to get some more
lookahead.
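
To make the lookahead argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a simplified conservative scheme in which LPs synchronize once per lookahead window and the lookahead equals the smallest delay on any link crossing an LP boundary. The function names and the example delays (2 ms and 5 ms point-to-point links, roughly 100 ns for a 30 m radio path) are illustrative assumptions, not measurements.

```python
def lookahead_ns(cross_lp_delays_ns):
    """Smallest delay (in ns) on any link that crosses an LP boundary."""
    return min(cross_lp_delays_ns)


def sync_rounds(sim_time_ns, la_ns):
    """Synchronization windows needed to cover the simulated time."""
    return -(-sim_time_ns // la_ns)  # ceiling division on integers


SIM_TIME_NS = 10 * 10**9  # 10 s of simulated time

# Clusters joined by point-to-point links with 2 ms and 5 ms delay.
p2p_la = lookahead_ns([2_000_000, 5_000_000])
print(sync_rounds(SIM_TIME_NS, p2p_la))       # 5000

# LPs split across one wireless channel: about 100 ns over 30 m.
wireless_la = lookahead_ns([100])
print(sync_rounds(SIM_TIME_NS, wireless_la))  # 100000000
```

Under these assumptions, splitting across the radio channel needs 20,000 times more synchronization rounds than splitting across the 2 ms links, which is why the point-to-point boundary is the more promising place to cut.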

I think a better immediate strategy, if it pertains to you, is to
eliminate the generation of events from the channel due to weak
signals. The channels, by default, send each packet to all attached
nodes, but perhaps they could filter based on presumed receiver signal
strength to suppress receive events on receivers who will obtain a
packet whose signal is buried in the noise. However, if your simulation
involves every node within carrier sense range of every other node, this
strategy is not going to be helpful.
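
A minimal sketch of this filtering idea in plain Python (not ns-3 code). It assumes a log-distance path loss model, and every number in it (path loss exponent 3, 46.7 dB loss at 1 m, a -101 dBm cutoff) is an illustrative assumption. The function names are hypothetical as well.

```python
import math


def rx_power_dbm(tx_dbm, dist_m, exponent=3.0, ref_loss_db=46.7, ref_dist_m=1.0):
    """Received power under a log-distance path loss model (assumed values)."""
    d = max(dist_m, ref_dist_m)  # clamp to the reference distance
    return tx_dbm - ref_loss_db - 10.0 * exponent * math.log10(d / ref_dist_m)


def receivers_to_schedule(tx_pos, tx_dbm, receivers, threshold_dbm=-101.0):
    """Return only the receivers whose presumed signal clears the threshold.

    receivers: dict mapping receiver name -> (x, y) position in meters.
    """
    kept = []
    for name, pos in receivers.items():
        dist = math.dist(tx_pos, pos)
        if rx_power_dbm(tx_dbm, dist) >= threshold_dbm:
            kept.append(name)  # a receive event would be scheduled
    return kept


receivers = {"near": (10.0, 0.0), "mid": (100.0, 0.0), "far": (1000.0, 0.0)}
print(receivers_to_schedule((0.0, 0.0), 20.0, receivers))
```

With these assumed numbers the receiver at 1000 m gets about -116.7 dBm and is skipped, while the two closer ones are kept. Dropping weak signals also removes their contribution to aggregate interference, so the cutoff trades some fidelity for fewer events.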

Another strategy is to make simplifications at the PHY such as the
simple wireless model that was presented at WNS3 last year, if you do
not care so much about PHY fidelity.

In summary, to speed up local wireless networks, one is likely going to
have to resort to abstractions (simplifications) of the pieces that are
not central to your particular study, and look towards reducing the
number of events, and try to parallelize across clusters of nodes
(channels) that do not interact much with one another.

- Tom
