SIGSEGV in TCP multiple flow with DCE


Esteban . Coma

May 18, 2015, 12:46:04 PM
to ns-3-...@googlegroups.com
Hello everyone, 

I'm trying to create a simulation with 3 different networks (left, central, right) and a variable number of nodes on the left side of the network, like this:

C1  ----|
C2  ----|
        |-- R1 ----- R2 ---- Server
... ----|
Cn  ----|


The left network (all clients <---> R1) is a CSMA network with DataRate 1 Gbps and Delay 1 ms.
The right network (R2 <---> Server) has the same configuration as the left network.
The central network (R1 <---> R2) acts as a slow link with DataRate 200 Mbps and Delay 98 ms.
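As a rough sanity check on this topology, the bandwidth-delay product (BDP) of the path can be worked out from the numbers above. This is a minimal C++ sketch, assuming only propagation delay matters (serialization, queuing and processing delays are ignored) and that each client-to-server path crosses one 1 ms access link on each side of the 98 ms central link; `rtt_seconds` and `bdp_bytes` are hypothetical helper names, not ns-3 or DCE APIs.

```cpp
#include <cassert>

// Round-trip propagation time of the client -> R1 -> R2 -> Server path.
// Assumption: serialization, queuing and processing delays are ignored.
double rtt_seconds(double access_delay_s, double core_delay_s) {
    // One access link on each side of the central link.
    const double one_way_s = 2.0 * access_delay_s + core_delay_s;
    return 2.0 * one_way_s;
}

// Bandwidth-delay product in bytes: the amount of data that must be in
// flight to keep a link of rate_bps busy for one round trip.
double bdp_bytes(double rate_bps, double rtt_s) {
    assert(rate_bps > 0.0 && rtt_s > 0.0);
    return rate_bps * rtt_s / 8.0;  // bits -> bytes
}
```

Under these assumptions, `rtt_seconds(0.001, 0.098)` is about 0.2 s and `bdp_bytes(200e6, 0.2)` is about 5,000,000 bytes. That is larger than either the 512K or the 2048K transfer, so a single flow of either size cannot keep a full BDP in flight on its own.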

An iperf server runs on the Server, and all clients C1 to Cn send traffic to it to create congestion.
C1 sends a 512K file, and each of the remaining clients sends 2048K of background traffic.

The idea is to perform sets of simulations varying the number of left-side nodes that send traffic: 2, 4, 6 and 8 nodes.
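The total offered load in each scenario follows directly from the transfer sizes. A small sketch, assuming "K" means 1024 bytes and ignoring TCP/IP header overhead, slow start and retransmissions; `total_offered_bytes` and `min_drain_seconds` are hypothetical helpers for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: "K" in the transfer sizes means 1024 bytes.
constexpr std::uint64_t kKiB = 1024;

// Total bytes offered by n clients: C1 sends 512K, the other n - 1 send 2048K each.
std::uint64_t total_offered_bytes(unsigned n_clients) {
    assert(n_clients >= 1);
    return 512 * kKiB + static_cast<std::uint64_t>(n_clients - 1) * 2048 * kKiB;
}

// Lower bound on the time needed to push that many bytes through the
// bottleneck link, ignoring headers, slow start and retransmissions.
double min_drain_seconds(std::uint64_t bytes, double bottleneck_bps) {
    assert(bottleneck_bps > 0.0);
    return static_cast<double>(bytes) * 8.0 / bottleneck_bps;
}
```

For 8 clients this gives 15,204,352 bytes, which needs at least about 0.61 s to cross the 200 Mbps central link.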

I am currently using DCE 1.5 and ns-3.22, with the Linux TCP stack installed on all nodes. The problem occurs regardless of the Linux kernel version I use: I am running simulations with every stable kernel branch supported by DCE (2.6, 3.7, 3.10 and 3.12).

When I run simulations with a fixed kernel version, say 2.6.36, I get a SIGSEGV at seemingly random points, and sometimes a SIGBUS.

I've traced and modified every parameter I could think of in the simulation and debugged the script, but the segmentation fault only happens once the simulation has already started, after Simulator::Run(). It never happens with 2 nodes, but sometimes does with 4, 6, 8, etc., with no apparent pattern.

GDB shows that the segmentation fault happens while accessing a member of a list inside a Linux kernel function. The only cause I can think of is some issue related to too much memory pressure, but so far it completely escapes my understanding.

I also attach my script in case someone spots something I may have missed.
Please, anything would help. Thank you.


sim3_multi_flow.cc

Esteban . Coma

May 18, 2015, 1:47:01 PM
to ns-3-...@googlegroups.com
[EDIT]: I have been able to reproduce the SIGSEGV again; it happens once the simulator has already been running for some time.


Program received signal SIGSEGV, Segmentation fault.
[Changing  to Thread 0x7fffef87e700 (LWP 14187)]
__list_del (next=0x7fffee7002c0, prev=0x200200) at /home/n3w/gits/project_3.7.0/source/net-next-sim-3.7.0/include/linux/list.h:89
89 prev->next = next;


I have no idea why this is happening. Thank you.
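One detail in this trace may help narrow things down: `prev=0x200200` matches `LIST_POISON2` from the kernel's include/linux/poison.h (0x00200200 when `POISON_POINTER_DELTA` is 0), which is the value `list_del()` writes into `entry->prev` after unlinking an entry. That suggests, though it does not prove, that the entry was being removed from a list again after an earlier `list_del()`. The sketch below is a simplified C++ re-implementation modeled on include/linux/list.h, not the kernel's own code; `looks_deleted` is a hypothetical helper. It shows the poisoning, and how `list_del_init()` makes a repeated removal harmless.

```cpp
#include <cassert>
#include <cstdint>

// Simplified intrusive doubly linked list, modeled on include/linux/list.h.
struct list_head {
    list_head* next;
    list_head* prev;
};

// Poison values from include/linux/poison.h, assuming POISON_POINTER_DELTA == 0.
const std::uintptr_t kListPoison1 = 0x00100100;
const std::uintptr_t kListPoison2 = 0x00200200;

void init_list_head(list_head* h) { h->next = h; h->prev = h; }

// Insert entry right after head.
void list_add(list_head* entry, list_head* head) {
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

// Like list_del(): unlink, then poison the entry's pointers. A second
// list_del() on the same entry would write through prev == 0x200200.
void list_del(list_head* entry) {
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = reinterpret_cast<list_head*>(kListPoison1);
    entry->prev = reinterpret_cast<list_head*>(kListPoison2);
}

// Like list_del_init(): unlink and make the entry point at itself, so
// removing it again only touches the entry itself and is a no-op.
void list_del_init(list_head* entry) {
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    init_list_head(entry);
}

bool list_empty(const list_head* h) { return h->next == h; }

// Hypothetical check: true if the entry still carries list_del() poison.
bool looks_deleted(const list_head* entry) {
    return reinterpret_cast<std::uintptr_t>(entry->prev) == kListPoison2;
}
```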

Eneko Atxutegi Narbona

Jun 25, 2015, 2:37:37 AM
to ns-3-...@googlegroups.com
Hi Esteban!

You posted this problem a while ago and I wonder whether you resolved the issue. I'm getting the same segmentation fault. My topology is roughly the same and, as you explained, the segmentation fault occurs once the simulator has been running for some time. It would be very helpful if someone could give me a hint about this.

Thanks in advance.

Eneko


On Monday, May 18, 2015, 18:46:04 (UTC+2), Esteban . Coma wrote:

Hajime Tazaki

Jun 25, 2015, 3:51:19 AM
to ns-3-...@googlegroups.com

Hi Esteban,

I guess this is the issue I fixed in the development branch (sim-ns3-dev-branch).

https://github.com/direct-code-execution/net-next-sim/commit/793e2d6a3069aad93e40f362038fca0d67515fd7

You cannot apply the above patch to any of the other branches (2.6.36 through 3.14) because the internal API and directory structure have changed completely.

So I prepared a hotfix for all of those branches. Please try a 'git pull' in net-next-sim(-2.6.36), build it, and replace liblinux.so; then see whether this fixes the issue.

# I can't run your script because I don't have ../include/util.h.

-- Hajime


Eneko Atxutegi Narbona

Jun 25, 2015, 9:17:06 AM
to ns-3-...@googlegroups.com
Hi Hajime!

Your updated 3.14 version works perfectly. I appreciate your quick reply, work and dedication. Thank you very much.

Cheers,

Eneko

Eneko Atxutegi Narbona

Oct 30, 2015, 4:51:38 AM
to ns-3-users
Hi Hajime!

With the changes you made to support multiple TCP flows in DCE, 4 or 5 flows run fine for a long time. However, when adding more nodes and flows (e.g. 20 nodes, each handling one flow), we hit the same SIGSEGV. I don't know whether this is solvable, but you are obviously the right person to ask. If there is a solution, that would be great; otherwise I will adapt my model to DCE's constraints. I attach the backtrace of the fault in case it is useful:


Program received signal SIGSEGV, Segmentation fault.
__list_del (next=0x7ffff1108f30 <g_pending_events>, prev=0x74737461)
    at /home/eneko/workspaceUpdated/bake/source/net-next-sim-3.14.0/include/linux/list.h:89

89        prev->next = next;
(gdb) bt
#0  __list_del (next=0x7ffff1108f30 <g_pending_events>, prev=0x74737461)
    at /home/eneko/workspaceUpdated/bake/source/net-next-sim-3.14.0/include/linux/list.h:89
#1  list_del (entry=0x7ffff1114ab8 <tcp_death_row+376>)
    at /home/eneko/workspaceUpdated/bake/source/net-next-sim-3.14.0/include/linux/list.h:106
#2  del_timer (timer=timer@entry=0x7ffff1114ab8 <tcp_death_row+376>) at arch/sim/timer.c:139
#3  0x00007ffff0b81f9e in mod_timer (timer=timer@entry=0x7ffff1114ab8 <tcp_death_row+376>, expires=6455) at arch/sim/timer.c:204
#4  0x00007ffff0bf386c in inet_twsk_schedule (tw=tw@entry=0x7fffe118e810, twdr=twdr@entry=0x7ffff1114940 <tcp_death_row>,
    timeo=<optimized out>, timewait_len=timewait_len@entry=15000)
    at /home/eneko/workspaceUpdated/bake/source/net-next-sim-3.14.0/net/ipv4/inet_timewait_sock.c:414
#5  0x00007ffff0c0da7a in tcp_time_wait (sk=0x7fffe11b3a70, state=<optimized out>, timeo=<optimized out>)
    at /home/eneko/workspaceUpdated/bake/source/net-next-sim-3.14.0/net/ipv4/tcp_minisocks.c:339
#6  0x00007ffff0c02a69 in tcp_rcv_state_process (sk=sk@entry=0x7fffe11b3a70, skb=skb@entry=0x7fffe1122d10, th=0x7fffe12dfa52,
    len=<optimized out>) at /home/eneko/workspaceUpdated/bake/source/net-next-sim-3.14.0/net/ipv4/tcp_input.c:5848
#7  0x00007ffff0c09f8a in tcp_v4_do_rcv (sk=sk@entry=0x7fffe11b3a70, skb=skb@entry=0x7fffe1122d10)
    at /home/eneko/workspaceUpdated/bake/source/net-next-sim-3.14.0/net/ipv4/tcp_ipv4.c:1825
#8  0x00007ffff0c0ca51 in tcp_v4_rcv (skb=0x7fffe1122d10)
    at /home/eneko/workspaceUpdated/bake/source/net-next-sim-3.14.0/net/ipv4/tcp_ipv4.c:2012
#9  0x00007ffff0be9d76 in ip_local_deliver_finish (skb=0x7fffe1122d10)
    at /home/eneko/workspaceUpdated/bake/source/net-next-sim-3.14.0/net/ipv4/ip_input.c:216
#10 0x00007ffff0bc499e in __netif_receive_skb_core (skb=0x7fffe1122d10, pfmemalloc=<optimized out>)
    at /home/eneko/workspaceUpdated/bake/source/net-next-sim-3.14.0/net/core/dev.c:3645
#11 0x00007ffff0bc4ee4 in process_backlog (napi=0x7ffff1111d30 <softnet_data+112>, quota=2)
    at /home/eneko/workspaceUpdated/bake/source/net-next-sim-3.14.0/net/core/dev.c:4169
#12 0x00007ffff0bc517a in net_rx_action (h=<optimized out>)
    at /home/eneko/workspaceUpdated/bake/source/net-next-sim-3.14.0/net/core/dev.c:4375
#13 0x00007ffff0b80ef3 in do_softirq () at arch/sim/softirq.c:65
#14 0x00007ffff0b80f58 in softirq_task_function (context=<optimized out>) at arch/sim/softirq.c:21
#15 0x00007ffff7acdc08 in ns3::TaskManager::Trampoline (context=0x9f5ba0) at ../model/task-manager.cc:274
#16 0x00007ffff7ac7559 in ns3::UcontextFiberManager::Trampoline (a0=32767, a1=-139666492, a2=0, a3=10443680)
    at ../model/ucontext-fiber-manager.cc:199
#17 0x00007ffff13ea8b0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x0000000000000000 in ?? ()
(gdb)
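Unlike the earlier trace, `prev=0x74737461` here is not a list poison value: read as little-endian bytes it is the printable ASCII text "atst". That may hint that the timer's list pointers inside `tcp_death_row` were overwritten by unrelated data rather than left behind by a previous `list_del()`. A hypothetical triage helper for such values, assuming x86_64 (little-endian) and a `POISON_POINTER_DELTA` of 0:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: classify a suspicious pointer value from a backtrace.
// Assumptions: little-endian byte order, POISON_POINTER_DELTA == 0.
std::string classify_pointer(std::uint64_t p) {
    if (p == 0) return "null";
    if (p == 0x00100100) return "LIST_POISON1";
    if (p == 0x00200200) return "LIST_POISON2";
    // Decode bytes in memory order (least significant byte first) and check
    // whether every non-zero byte is printable ASCII.
    std::string text;
    for (std::uint64_t v = p; v != 0; v >>= 8) {
        const unsigned char c = static_cast<unsigned char>(v & 0xff);
        if (c < 0x20 || c > 0x7e) return "other";
        text.push_back(static_cast<char>(c));
    }
    return "ascii:" + text;
}
```

For the two traces in this thread, `classify_pointer(0x200200)` returns "LIST_POISON2" and `classify_pointer(0x74737461)` returns "ascii:atst".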

Thanks in advance,

Eneko