rateId out of range causes NS_ASSERT failure in MinstrelHtWifiManager::GetLowestIndex with ns-3-dev

william....@intel.com

Jan 13, 2017, 7:14:54 PM
to ns-3-users
The easiest way to reproduce the problem is to modify examples/wireless/80211e-txop.cc in ns-3-dev (changeset 12526:47ec67af9904) to use Minstrel-HT and 802.11ac.  I have attached a patch and the complete file.  The patch file makes it a little easier to see what has changed.

When I run my modified 80211e-txop with `waf --run 80211e-txop`, I get an assertion failure on Line 1857 of minstrel-ht-wifi-manager.cc.  The assert happens pretty quickly -- I do not perceive a delay between waf saying that the build is complete and the assertion failure.  The NS_ASSERT is testing the condition

    station->m_groupsTable[groupId].m_supported && station->m_groupsTable[groupId].m_ratesTable[rateId].supported

When I stop on the assert failure in gdb, rateId is 10, which is also the size of m_ratesTable.  That is, rateId is out of range.  

The while loop on Lines 1853 through 1856 iterated through the whole table without finding a supported rate, so rateId now indexes past the end of the table.  There should probably be two checks: groupId < station->m_groupsTable.size() before indexing m_groupsTable, and rateId < station->m_groupsTable[groupId].m_ratesTable.size() before indexing m_ratesTable.  A similar check may also belong in GetIndex, or before the call to GetIndex, so that non-debug builds are protected.  Line 1871 has similar logic, but there only rateId can go out of range, assuming a valid groupId is passed in.  (I actually first ran into this bug on Line 1871, but my minimal test case to reproduce it hits Line 1857 instead.)
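
As a rough illustration of the guard suggested above, here is a minimal, self-contained sketch.  The `Rate`, `Group`, and `RateIndex` structs and the `FindLowestSupported` helper are hypothetical stand-ins that only mirror the field names quoted above; they are not the ns-3 types or the code in minstrel-ht-wifi-manager.cc.  Both loops stop at the end of their table, and the caller gets an explicit "not found" result instead of an index equal to the table size.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the group and rate tables (not the ns-3 types).
struct Rate
{
  bool supported = false;
};

struct Group
{
  bool m_supported = false;
  std::vector<Rate> m_ratesTable;
};

// Search result; groupId and rateId are meaningful only when found is true.
struct RateIndex
{
  bool found;
  std::size_t groupId;
  std::size_t rateId;
};

// Scans groups in order and returns the first supported rate of the first
// supported group that has one.  Never indexes past the end of either table.
RateIndex
FindLowestSupported (const std::vector<Group> &groups)
{
  for (std::size_t groupId = 0; groupId < groups.size (); ++groupId)
    {
      const Group &group = groups[groupId];
      if (!group.m_supported)
        {
          continue;
        }
      for (std::size_t rateId = 0; rateId < group.m_ratesTable.size (); ++rateId)
        {
          if (group.m_ratesTable[rateId].supported)
            {
              return RateIndex{true, groupId, rateId};
            }
        }
    }
  // Nothing supported anywhere: report it instead of returning a bad index.
  return RateIndex{false, 0, 0};
}
```

With a table shaped like the one in the debug session (48 groups, only group 16 marked supported, 10 rates all unsupported), this sketch reports "not found" rather than leaving rateId at 10.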

When I print station->m_groupsTable[groupId].m_ratesTable, all the numeric entries are 0 and boolean entries are false.  groupId is 16.  When I print station->m_groupsTable, it has 48 entries, and all except groupId 16 are zeroed.  I have attached a typescript with the debug session in case the call stack or contents of variables are helpful.

Without knowing how it is supposed to work, it looks like maybe m_groupsTable and m_ratesTable are not being initialized?  If you would like, I can log this as a bug in bugzilla.

When I run my modified 80211e-txop script on ns-3.26, it runs through to the end of simulation time, prints the BE throughput for station A, then exits with code 1 (and does not print a log error message).  The same happens on ns-3-dev if I change the RemoteStationManager from ns3::MinstrelHtWifiManager back to ns3::IdealWifiManager.

Thanks,
Bill.
80211e-txop.cc
80211e-txop-w-ac-minstrel
80211e-txop-typescript2

Sebastien Deronne

Jan 14, 2017, 2:44:05 AM
to ns-3-users
There are still some open bugs related to TXOP in our tracker.  Make sure you apply the proposed patches for those bugs on top of the latest ns-3-dev.
If the problem persists and points to Minstrel HT or Ideal, I advise you to contact Matias Richart.

william....@intel.com

Jan 18, 2017, 12:54:02 PM
to ns-3-users
Hi,

I applied the patches for 2604 "QosData frames with Block Ack policy should be separated by SIFS...", 2615 "TXOP Fragmentation", and 2367 "BlockAckManager does not remove iterator to freed items".  With these patches, I get a crash in ns3::MacLow::BlockAckTimeout when it tries to index into m_aggregateQueue.  Basically, the tid is 8, which is also the length of the m_aggregateQueue array, so it is indexing off the end of the array.
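
To illustrate the failure mode, here is a minimal sketch that assumes a fixed array of eight per-TID queues.  `kNumTids`, `Queue`, `AggregateQueues`, and `GetQueueForTid` are hypothetical names for this example, not the MacLow API.  Plain `operator[]` with an index of 8 would read past the end of such an array, which is undefined behavior; checking the index first turns that into an error the caller can detect.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <deque>

// Assumption for this sketch: one queue per TID, TIDs 0 through 7.
constexpr std::size_t kNumTids = 8;
using Queue = std::deque<int>;
using AggregateQueues = std::array<Queue, kNumTids>;

// Returns a pointer to the queue for tid, or nullptr when tid is out of range.
Queue *
GetQueueForTid (AggregateQueues &queues, std::uint8_t tid)
{
  if (tid >= queues.size ())
    {
      return nullptr;
    }
  return &queues[tid];
}
```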

8 is a valid tid if you are using HCCA.  I am guessing this is a bug, because I did not do anything to turn on HCCA.  GetTid is getting the tid value of 8 from the header.  I'll keep debugging it to see where the tid of 8 is getting put into the packet.

I did not see any other patches that looked like they would impact this issue.  If there are more that I should apply, please let me know which ones.

Thanks,
Bill.

sebastie...@gmail.com

Jan 18, 2017, 12:57:57 PM
to ns-3-...@googlegroups.com
I do not understand, HCCA is not supported, so how can you turn it on??


william....@intel.com

Jan 18, 2017, 1:02:58 PM
to ns-3-users
Sorry, I did not know whether it was supported or not.  I just meant that, if it is supported, I have not done anything to enable it.  8 is a valid tid if HCCA is in use.

Knowing now that HCCA is definitely not supported, I am guessing that whatever set the tid to 8 walked through its array of aggregate queues, didn't find what it was looking for, and set the tid to 8 because it went off the end.  I am going to see if I can find where that happens and try to catch it in the act...

Bill.

william....@intel.com

Jan 18, 2017, 3:07:50 PM
to ns-3-users
FWIW, I cannot find any place where a TID of 8 gets put into a packet header.  I put NS_ASSERTs in CtrlBAckRequestHeader::Serialize, CtrlBAckRequestHeader::Deserialize, CtrlBAckResponseHeader::Serialize, and CtrlBAckResponseHeader::Deserialize to trigger when tid >= 8.

An assert is triggering on CtrlBAckResponseHeader::Deserialize, but not for any of the Serialize methods.  Not sure how the TID could be getting deserialized out of a packet without first being serialized into one.  Maybe it is somehow being corrupted?  I'll keep looking.
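
One possible way for an out-of-range value to show up only on the receive side: a TID field that is 4 bits wide can hold 0 through 15, so decoding bytes that did not come from the matching Serialize call can produce 8 or more.  The sketch below assumes, for illustration only, that the TID sits in the top 4 bits of a 16-bit control field.  `PackTid`, `UnpackTid`, `IsValidEdcaTid`, and `kMaxEdcaTid` are hypothetical names, and this is not the ns-3 CtrlBAckResponseHeader code.

```cpp
#include <cstdint>

// Assumption for this sketch: without HCCA, only TIDs 0 through 7 are expected.
constexpr std::uint8_t kMaxEdcaTid = 7;

// Stores tid in bits 12-15 of control, leaving bits 0-11 unchanged.
std::uint16_t
PackTid (std::uint16_t control, std::uint8_t tid)
{
  return static_cast<std::uint16_t> ((control & 0x0fff) | ((tid & 0x0f) << 12));
}

// Reads bits 12-15 back out; any value from 0 to 15 can come back.
std::uint8_t
UnpackTid (std::uint16_t control)
{
  return static_cast<std::uint8_t> ((control >> 12) & 0x0f);
}

// Range check that could run right after decoding.
bool
IsValidEdcaTid (std::uint8_t tid)
{
  return tid <= kMaxEdcaTid;
}
```

In this sketch a value packed with a TID in range always round-trips cleanly, while an arbitrary 16-bit word such as 0x8004 decodes to a TID of 8 and fails the range check.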

Bill.

sebastie...@gmail.com

Jan 18, 2017, 3:12:43 PM
to ns-3-...@googlegroups.com
I once hit the same issue and fixed it, but I do not remember exactly how.  I guess it was reading a nonexistent header, or reading the wrong header.
Are you sure you are using the latest ns-3-dev?

william....@intel.com

Jan 18, 2017, 3:24:29 PM
to ns-3-users
I am running with ns-3-dev, pulled yesterday.  The latest commit from ns-3-dev is dbc1b3b501b1.  On top of that I have the patches for 2604, 2615, and 2367.  I started without the patch for 2367, then added it later and got the same error either way.

The header it is trying to use is MacLow::m_currentHdr in MacLow::BlockAckTimeout.

Bill.

sebastie...@gmail.com

Jan 18, 2017, 3:27:03 PM
to ns-3-...@googlegroups.com
How exactly can I reproduce the issue on my side? (I see 3 attached scripts)

william....@intel.com

Jan 18, 2017, 3:39:08 PM
to ns-3-users
I've done a few things to the original script.  Let me clean it up and I'll attach an updated version.

Bill.

william....@intel.com

Jan 18, 2017, 6:04:47 PM
to ns-3-users
I am building with patches from Bugs 2367, 2569, 2586, 2615, and 2604.  For these tests, 2569 never gets used (if I set breakpoints in Address::Serialize and Address::Deserialize, they never get hit); it is just there for other work I am doing.  The attached local-changes.patch contains my current working set, and wmm-tcp.cc is the script I am primarily working with.  You can reproduce what I am seeing with

    waf --run "wmm-tcp --nBeStreams=0 --nBkStreams=0 --nViStreams=1 --nVoStreams=0"

wmm-tcp will run for a while before you hit the NS_ASSERT failure.  It is maybe 30 seconds or so for me, but probably depends how fast your CPU is.

Basically, local-changes.patch includes changes to examples/80211e-txop.cc to show the original bug with MinstrelHT.  That example fails due to the rateId problem described in the original post, both on the original ns-3-dev, and with all of the patches listed above.  I have commented out the exit(1) calls at the end of the script.  They were bailing out when throughput is out of range, which it always is, because I changed from 802.11a to 802.11ac.  The changes to minstrel-ht-wifi-manager.cc are just added asserts to show that rateId is out of range when it fails.

You can see that problem with:

    waf --run 80211e-txop

I have stopped using 80211e-txop.cc as a test case, because I don't need to have MinstrelHT for what I am doing right now, and with IdealWifiManager it works fine.  In wmm-tcp.cc I have switched to ns3::IdealWifiManager to work around the MinstrelHT problem.

My remaining changes in local-changes.patch are NS_ASSERTs added to ctrl-headers.cc, mac-low.cc, and qos-utils.cc to catch out of range tid values.  I have only ever seen an out of range tid in ns3::CtrlBAckResponseHeader::SetBaControl, when it is called by ns3::CtrlBAckResponseHeader::Deserialize after the test runs for 30 seconds or so (i.e., it works correctly for quite a while).

Thanks,
Bill.
wmm-tcp.cc
local-changes.patch

william....@intel.com

Jan 18, 2017, 9:01:47 PM
to ns-3-users
FWIW, if you run

    waf --run wmm-tcp --command-template="gdb --args %s --nBeStreams=0 --nBkStreams=0 --nViStreams=1 --nVoStreams=0"

and then set a breakpoint in ns3::MacLow::BlockAckTimeout conditioned on `m_currentHdr.m_ctrlType == 1 && m_currentHdr.m_ctrlSubtype == 9`, gdb will break in ns3::MacLow::BlockAckTimeout only on the call that fails.  Without the condition, m_currentHdr.m_ctrlType is always 1 or 2, and m_currentHdr.m_ctrlSubtype is always 8.

That is, in the failing case, m_currentHdr thinks the last packet we sent was a BlockAckResponse and we are getting a BlockAckTimeout.

I suppose something odd might be going on if we get a BlockAckTimeout after responding to a BlockAck.

Bill.

Sebastien Deronne

Jan 20, 2017, 1:20:32 PM
to ns-3-users
Are those two different issues? Please open one thread per issue, it is easier to follow.

Also, did you first try without applying any patches?

I get this assert when I run it with the latest ns-3-dev:
assert failed. cond="uid <= m_information.size () && uid != 0", file=../src/core/model/type-id.cc, line=447

william....@intel.com

Jan 20, 2017, 2:27:36 PM
to ns-3-users
On Friday, January 20, 2017 at 10:20:32 AM UTC-8, Sebastien Deronne wrote:
> Are those two different issues? Please open one thread per issue, it is easier to follow.

Yes.  They are two separate issues.  The original one with 80211e-txop and MinstrelHT exists both with and without patches.  I will open a second thread for the wmm-tcp.cc issue (without MinstrelHT).

> Also, did you first try without applying any patches?
>
> I get this assert when I run it with the latest ns-3-dev:
> assert failed. cond="uid <= m_information.size () && uid != 0", file=../src/core/model/type-id.cc, line=447

If you got that with wmm-tcp.cc, it is because you need the patch for 2586.  The first call to MobilityHelper::SetPositionAllocator passes in a SequentialRandomVariableStream with an Increment parameter set to a ConstantRandomVariable stream.  When I comment out the code that sets the "Theta" attribute on the position allocator, I get the same tid problem regardless of whether it is ns-3-dev or with my patches.  On plain ns-3-dev, the problem will manifest itself as a SIGSEGV in ns3::WifiMacQueue::GetSize because of an invalid `this` pointer (this == 0).  If you stop there in gdb and go up one call frame, you will see that tid == 8.

I will start a new thread for this issue.  If you want either issue logged in Bugzilla, I will create bug reports there, too.

Thanks,
Bill.

Sebastien Deronne

Jan 21, 2017, 4:36:40 AM
to ns-3-users
New bug opened in the tracker as well:

I assigned Matias Richart to this bug since he is much more experienced with wifi rate managers.