Hi all,
I'm running a wifi mesh setup with OLSR routing and the 802.11e aggregation feature enabled. After the setup has been running for a while, the simulation crashes with:
"assert failed. cond="IsStateOk ()", file=../src/common/packet-metadata.cc, line=888"
I've looked into it, and got this gdb trace:
Program received signal SIGABRT, Aborted.
0x00007ffff4ac4425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ffff4ac4425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff4ac7b8b in __GI_abort () at abort.c:91
#2 0x00007ffff53bfe2d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff53bdf26 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff53bdf53 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff6cfcf5b in ns3::PacketMetadata::AddAtEnd (this=0x9e4820, o=...) at ../src/common/packet-metadata.cc:888
#6 0x00007ffff6d20d22 in ns3::Packet::AddAtEnd (this=0x9e47e0, packet=...) at ../src/common/packet.cc:329
#7 0x00007ffff72b4936 in ns3::MsduStandardAggregator::Aggregate (this=0x9ef3e0, packet=..., aggregatedPacket=..., src=..., dest=...) at ../src/devices/wifi/msdu-standard-aggregator.cc:78
#8 0x00007ffff72a3f39 in ns3::EdcaTxopN::NotifyAccessGranted (this=0x7427b0) at ../src/devices/wifi/edca-txop-n.cc:590
#9 0x00007ffff72ae83a in ns3::EdcaTxopN::Dcf::DoNotifyAccessGranted (this=0x742950) at ../src/devices/wifi/edca-txop-n.cc:111
#10 0x00007ffff722e783 in ns3::DcfState::NotifyAccessGranted (this=0x742950) at ../src/devices/wifi/dcf-manager.cc:145
#11 0x00007ffff722fb4c in ns3::DcfManager::DoGrantAccess (this=0x7422f0) at ../src/devices/wifi/dcf-manager.cc:434
#12 0x00007ffff722fc7a in ns3::DcfManager::AccessTimeout (this=0x7422f0) at ../src/devices/wifi/dcf-manager.cc:450
#13 0x00007ffff72325ee in ns3::EventMemberImpl0::Notify (this=0x949860) at debug/ns3/make-event.h:94
#14 0x00007ffff6c7b18c in ns3::EventImpl::Invoke (this=0x949860) at ../src/simulator/event-impl.cc:37
#15 0x00007ffff6c982d6 in ns3::DefaultSimulatorImpl::ProcessOneEvent (this=0x6dd4c0) at ../src/simulator/default-simulator-impl.cc:128
#16 0x00007ffff6c98486 in ns3::DefaultSimulatorImpl::Run (this=0x6dd4c0) at ../src/simulator/default-simulator-impl.cc:158
#17 0x00007ffff6c7bc6d in ns3::Simulator::Run () at ../src/simulator/simulator.cc:173
#18 0x000000000042d3dd in main (argc=9, argv=0x7fffffffe3e8) at ../scratch/fom.cc:1680
So the simulation crashes inside the aggregator, but only after many successful aggregations.
In the packet-metadata.h file there is this statement:
"
* This linked list is flattened in a byte buffer stored in
* struct PacketMetadata::Data. Each entry of the linked list is
* identified by an offset which identifies the first byte of the
* entry from the start of the data buffer. The size of this data
* buffer is 2^16-1 bytes maximum which somewhat limits the number
* of entries which can be stored in this linked list but it is
* quite unlikely to hit this limit in practice. "
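To make the quoted limit concrete, here is a minimal toy model of a list flattened into one byte buffer and addressed with 16-bit offsets. It is only a sketch: the class FlatList, its members and the 89-byte entry size used in the comments are hypothetical and are not the ns-3 PacketMetadata code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only: NOT the ns-3 PacketMetadata implementation.
// Entries are stored back to back in one byte buffer, and each entry is
// identified by a 16-bit offset from the start of that buffer.
class FlatList
{
public:
  static constexpr uint32_t kMaxBytes = 0xFFFF; // 2^16-1, the limit quoted above

  // Appends an entry of entrySize bytes. Returns false and leaves the
  // buffer untouched if the entry would end past the 16-bit limit.
  bool Append (uint16_t entrySize)
  {
    // Do the sum in 32 bits so that the check itself cannot wrap around.
    uint32_t needed = static_cast<uint32_t> (m_buffer.size ()) + entrySize;
    if (needed > kMaxBytes)
      {
        return false;
      }
    m_offsets.push_back (static_cast<uint16_t> (m_buffer.size ()));
    m_buffer.resize (needed, 0);
    return true;
  }

  uint32_t Used () const { return static_cast<uint32_t> (m_buffer.size ()); }
  uint32_t Count () const { return static_cast<uint32_t> (m_offsets.size ()); }

private:
  std::vector<uint8_t> m_buffer;   // flattened entries
  std::vector<uint16_t> m_offsets; // 16-bit offset of each entry
};
```

With a hypothetical entry size of 89 bytes, this toy buffer accepts 736 entries (65504 bytes) and rejects the next one, so the limit is reachable if entries keep being appended to the same buffer and are never released.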
Question: under what circumstances do you think the 2^16-1 limit can be exceeded?
I have logged the values of the m_used and m_size variables of the PacketMetadata class. It seems that at some point something goes wrong and their values keep increasing until they exceed the maximum.
Here are the last lines of log before the crash happened:
68.8601 OK: Line136 - m_used: 65042 m_size: 65042
68.8601 OK: Line136 - m_used: 65131 m_size: 65131
68.86:n:17:napi:3
68.8601 OK: Line136 - m_used: 65033 m_size: 65033
68.8601 OK: Line136 - m_used: 65042 m_size: 65131
68.8601 OK: Line136 - m_used: 65131 m_size: 65131
68.8601 OK: Line136 - m_used: 65220 m_size: 65220
68.86:n:17:napi:4
68.8601 OK: Line136 - m_used: 65033 m_size: 65033
68.8601 OK: Line136 - m_used: 65042 m_size: 65220
68.8601 OK: Line136 - m_used: 65220 m_size: 65220
68.8601 OK: Line136 - m_used: 65309 m_size: 65309
68.86:n:17:napi:5
68.8601 OK: Line136 - m_used: 65033 m_size: 65033
68.8601 OK: Line136 - m_used: 65042 m_size: 65309
68.8601 OK: Line136 - m_used: 65309 m_size: 65309
68.8601 OK: Line136 - m_used: 65398 m_size: 65398
68.86:n:17:napi:6
68.8601 OK: Line136 - m_used: 65033 m_size: 65033
68.8601 OK: Line136 - m_used: 65042 m_size: 65398
68.8601 OK: Line136 - m_used: 65398 m_size: 65398
68.8601 OK: Line136 - m_used: 65487 m_size: 65487
68.86:n:17:napi:7
68.8601 OK: Line136 - m_used: 65033 m_size: 65033
68.8601 OK: Line136 - m_used: 65042 m_size: 65487
68.8601 OK: Line136 - m_used: 65487 m_size: 65487
68.8601 NotOK: Line136 - m_used: 40 m_size: 11
assert failed. cond="IsStateOk ()", file=../src/common/packet-metadata.cc, line=897
terminate called without an active exception
Command ['/home/cristi/workspace/cristi_fom/build/debug/scratch/fom', '--codec=AMR', '--gridstep=125m', '--olsrType=auto', '--enable_dm=1', '--enable_aggr=1', '--enable_cac=0', '--cacDecisionBase=NA', '--seed=2'] terminated with signal SIGIOT. Run it under a debugger to get more information (./waf --run <program> --command-template="gdb --args %s <args>").
You can see that both variables keep increasing until the overflow happens. "napi" stands for number of intermediary aggregated packets, i.e. how many packets have been aggregated into the current aggregatedPacket so far. At that point the aggregation is not over yet, because there are more packets in the queue waiting to be aggregated (the aggregation feature is implemented in edca-txop-n.cc).
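The numbers in the log are consistent with a 16-bit wrap-around. In the last steps m_used grows by 89 each time (65398 to 65487, for example), and 65487 + 89 = 65576 = 65536 + 40, which matches the "m_used: 40" on the NotOK line. The sketch below checks that arithmetic, assuming m_used is held in an unsigned 16-bit field (which would fit the 2^16-1 limit quoted from packet-metadata.h). The helper names AddWrapping and FitsIn16Bits are hypothetical and not part of ns-3, and this does not explain the "m_size: 11" value.

```cpp
#include <cassert>
#include <cstdint>

// Adds 'extra' bytes to a 16-bit counter the way unsigned 16-bit
// storage does: the result is reduced modulo 2^16.
uint16_t AddWrapping (uint16_t used, uint16_t extra)
{
  return static_cast<uint16_t> (used + extra);
}

// Returns true if used + extra still fits in 16 bits.
// The sum is computed in 32 bits so the check itself cannot wrap.
bool FitsIn16Bits (uint16_t used, uint16_t extra)
{
  return static_cast<uint32_t> (used) + extra <= 0xFFFFu;
}
```

For the values in the log, AddWrapping (65487, 89) gives 40 and FitsIn16Bits (65487, 89) is false, while the previous step (65398 + 89) still fits.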
Attached are some plots of the values of the two variables over time, which may give you a hint about why the overflow happens.
However, when the aggregation feature is not enabled, the crash does not occur.
I'm using ns3.10. The crash happens both with and without the patch for bug 1072. It also still happens with the packet-metadata.{cc,h} files from ns3.16.
Any clue why this is happening?
br,
Cristian
On Tuesday, March 15, 2011 8:28:27 AM UTC, Mathieu Lacage wrote: