SCTP COMM LOST - sctp channel terminates during extended sessions

723 views
Skip to first unread message

Faraz Khan

unread,
Jul 27, 2014, 3:18:19 AM7/27/14
to discuss...@googlegroups.com
Using C++ Native webrtc libs, some of our users have complained about their SCTP connections randomly terminating (while the RTP data channels stay okay). The errors obtained point to usrsctplib terminating the connection on the sender side. The webrtc logs state the following.Anything prefix with webrtc: is coming through webrtc directly. Note that SCTP channels work perfectly for a long time before getting into this state. Is there any debugging I could do?

We are using webrtc rev 6633 (chrome stable) and usrsctplib 8192 (also chrome stable). The data being carried over by SCTP is usually a bunch o small messages (< 1-2Kb). Ofcourse I am unable to reproduce this on my own. Is there anything we can do to figure out what the problem might be?


Jul 26, 2014, 8:34:03 PM: webrtc:Destroying NSS identity
Jul 26, 2014, 8:34:05 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:05 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:05 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:05 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:05 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:05 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:05 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:05 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:06 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:06 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:06 PM: Report. outbitavg:4492.327586 toutbitrateavg: nan, kf: 0, sli: 0, fravg: 4.913793, qpavg: 4.048624, pktlossavg: 0.000000, rttavg: 7.250000
Jul 26, 2014, 8:34:07 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:07 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:07 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:07 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:07 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:08 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:10 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:13 PM: webrtc:Destroying NSS identity
Jul 26, 2014, 8:34:13 PM: webrtc:Destroying NSS identity
Jul 26, 2014, 8:34:14 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:20 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:20 PM: Outbound Keyframe!
Jul 26, 2014, 8:34:21 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:22 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:22 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:22 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:22 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:22 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:23 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:23 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:23 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:23 PM: webrtc:Destroying NSS identity
Jul 26, 2014, 8:34:23 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:24 PM: webrtc:Destroying NSS identity
Jul 26, 2014, 8:34:24 PM: webrtc:Error(sctpdataengine.cc:883): SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP: Gak, chk->snd_count:30 >= max:30 - send abort
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:SCTP_SEND_FAILED_EVENT
Jul 26, 2014, 8:34:25 PM: webrtc:Association change SCTP_COMM_LOST
Jul 26, 2014, 8:34:25 PM: webrtc:Error(sctpdataengine.cc:561): ERROR:SctpDataMediaChannel->SendData(...):  usrsctp_sendv:  : [0x00000002] No such file or directory
Jul 26, 2014, 8:34:25 PM: webrtc:Error(datachannel.cc:491): Closing the DataChannel due to a failure to send data, send_result = 1
Jul 26, 2014, 8:34:25 PM: sctp:Datachannel state changed : 2
Jul 26, 2014, 8:34:25 PM: webrtc:Error(sctpdataengine.cc:919): SctpDataMediaChannelFailed to send a stream reset for 1 streams : [0x00000002] No such file or directory
Jul 26, 2014, 8:34:25 PM: sctp:Datachannel state changed : 3
Jul 26, 2014, 8:34:25 PM: Got connection closed on channel: 2, timeout :0
Jul 26, 2014, 8:34:25 PM: Error Writing
Jul 26, 2014, 8:34:25 PM: Error Writing
Jul 26, 2014, 8:34:25 PM: Error Writing



Jiayang Liu

unread,
Jul 28, 2014, 1:42:13 PM7/28/14
to discuss...@googlegroups.com, ldi...@webrtc.org, Michael Tuexen
Adding Lucas and Michael for help.

Faraz Khan

unread,
Jul 28, 2014, 1:52:48 PM7/28/14
to discuss...@googlegroups.com, ldi...@webrtc.org, tue...@fh-muenster.de
Thanks Jiayang! Also - on the remote party's side the error is even weirder, seems like the SCTP socket just died without any debug logs:

webrtc:Association change SCTP_COMM_LOST
Jul 28, 2014, 3:27:55 AM: webrtc:Destroying NSS identity
Jul 28, 2014, 3:30:01 AM: webrtc:Destroying NSS identity
Jul 28, 2014, 3:30:05 AM: webrtc:Destroying NSS identity
Jul 28, 2014, 3:30:11 AM: webrtc:Destroying NSS identity
Jul 28, 2014, 3:30:15 AM: webrtc:Destroying NSS identity
Jul 28, 2014, 3:30:21 AM: webrtc:Destroying NSS identity
Jul 28, 2014, 3:30:22 AM: AppSharingApplication[setKeyLogActive:]
Jul 28, 2014, 3:30:22 AM: AppSharingApplication[startupKeyListener]
Jul 28, 2014, 3:30:22 AM: webrtc:Error(sctpdataengine.cc:561): ERROR:SctpDataMediaChannel->SendData(...):  usrsctp_sendv:  : [0x00000002] No such file or directory
Jul 28, 2014, 3:30:22 AM: webrtc:Error(datachannel.cc:491): Closing the DataChannel due to a failure to send data, send_result = 1
Jul 28, 2014, 3:30:22 AM: sctp:Datachannel state changed : 2
Jul 28, 2014, 3:30:22 AM: webrtc:Error(sctpdataengine.cc:919): SctpDataMediaChannelFailed to send a stream reset for 1 streams : [0x00000002] No such file or directory
Jul 28, 2014, 3:30:22 AM: sctp:Datachannel state changed : 3
Jul 28, 2014, 3:30:22 AM: sctp:Data channel closing, buffered amount is :0
Jul 28, 2014, 3:30:22 AM: Got connection closed on channel: 2, timeout :0
Jul 28, 2014, 3:30:22 AM: Error Writing
Jul 28, 2014, 3:30:22 AM: Error Writing

Lucas Dixon

unread,
Jul 28, 2014, 4:37:37 PM7/28/14
to Faraz Khan, discuss...@googlegroups.com, Michael Tuexen, Lally Singh
+Lally 

Might be worth checking if the problem happens on Canary as that has a much newer version of SCTP... without a repeatable test it's tricky to figure out... I'll try and reproduce it. 
--
Lucas Dixon | Google Ideas

Lally Singh

unread,
Jul 28, 2014, 4:53:50 PM7/28/14
to Lucas Dixon, Faraz Khan, discuss...@googlegroups.com, Michael Tuexen
The 'Gak, chk->snd_count:30 >= max:30 - send abort' message is from sctp_output.c, when it sees that it tried for 30 times to send the message, and it failed every time.  It shuts the connection down then.  The repeated 'SCTP seems to have made a packet that is bigger than its official MTU' message seems to indicate an attempt to send more than kSctpMtu, 1280 bytes.  We may be configured to disallow fragmentation.

Together, that'd explain what's going on.  To verify, Faraz: can you in any way limit your app's sends to < 1200 bytes, just to test if that clears up your problem?  If that's it, then we know what's wrong and can work on fixing it.



Cheers,
$ ls

Lally Singh |  Google Ideas | Engineering |  la...@google.com
 

Faraz Khan

unread,
Jul 28, 2014, 5:04:15 PM7/28/14
to discuss...@googlegroups.com, ldi...@google.com, fa...@screenhero.com, tue...@fh-muenster.de
Lally,
I guessed the same so assuming that sctp fragmentation was sometimes broken sent out a version to the affected users in which our app limits any message over 1150 bytes. They havent reported back with anything yet so I dont know if thats the fix, if it is I'll let you guys know.

Lucas - In case they report back saying that the above did not fix their issue, I'll get the usrsctplib revision that canary is using and try sending them a build with that to check. I'll report back with both peices of information.

Really appreciate the response on this guys! 

Michael Tüxen

unread,
Jul 29, 2014, 4:55:31 AM7/29/14
to discuss...@googlegroups.com
I guess that SCTP want to send a packet which is larger than the PMTU. If the lower layer when reporting
SCTP seems to have made a packet that is bigger than its official MTU
drops the packet, SCTP will retry it several times and finally will figure out that the DATA chunk
contained in that packet hasn't made it to the peer after 30 retries. It then sends an ABORT since
it considers the association broken. All buffered user data is given back to the upper layer.

Has SCTP be configured with a save MTU? This needs to be done by the upper layer.
Did the lower layer experienced a drop in the PMTU? Did this change get propagated to
SCTP? I don't think we have a clear API for it, since I'm not sure if the lower layer needs it.
And does the lower layer allow sending messages larger than the PMTU just falling back to
IP fragmentation? This is required. If the PMTU drops and you notify SCTP, it will fragment
new massages based on the lower limit, but it can't re-fragment messages which have already
been fragmented. See third paragraph of
http://tools.ietf.org/html/draft-ietf-tsvwg-sctp-dtls-encaps-05#section-5

Best regards
Michael

Michael Tüxen

unread,
Jul 29, 2014, 4:57:54 AM7/29/14
to discuss...@googlegroups.com, ldi...@webrtc.org, tue...@fh-muenster.de


On Monday, July 28, 2014 7:52:48 PM UTC+2, Faraz Khan wrote:
Thanks Jiayang! Also - on the remote party's side the error is even weirder, seems like the SCTP socket just died without any debug logs:

webrtc:Association change SCTP_COMM_LOST
Jul 28, 2014, 3:27:55 AM: webrtc:Destroying NSS identity
Jul 28, 2014, 3:30:01 AM: webrtc:Destroying NSS identity
Jul 28, 2014, 3:30:05 AM: webrtc:Destroying NSS identity
Jul 28, 2014, 3:30:11 AM: webrtc:Destroying NSS identity
Jul 28, 2014, 3:30:15 AM: webrtc:Destroying NSS identity
Jul 28, 2014, 3:30:21 AM: webrtc:Destroying NSS identity
Jul 28, 2014, 3:30:22 AM: AppSharingApplication[setKeyLogActive:]
Jul 28, 2014, 3:30:22 AM: AppSharingApplication[startupKeyListener]
Jul 28, 2014, 3:30:22 AM: webrtc:Error(sctpdataengine.cc:561): ERROR:SctpDataMediaChannel->SendData(...):  usrsctp_sendv:  : [0x00000002] No such file or directory
Jul 28, 2014, 3:30:22 AM: webrtc:Error(datachannel.cc:491): Closing the DataChannel due to a failure to send data, send_result = 1
Jul 28, 2014, 3:30:22 AM: sctp:Datachannel state changed : 2
Jul 28, 2014, 3:30:22 AM: webrtc:Error(sctpdataengine.cc:919): SctpDataMediaChannelFailed to send a stream reset for 1 streams : [0x00000002] No such file or directory
Jul 28, 2014, 3:30:22 AM: sctp:Datachannel state changed : 3
Jul 28, 2014, 3:30:22 AM: sctp:Data channel closing, buffered amount is :0
Jul 28, 2014, 3:30:22 AM: Got connection closed on channel: 2, timeout :0
Jul 28, 2014, 3:30:22 AM: Error Writing
Jul 28, 2014, 3:30:22 AM: Error Writing
This makes perfectly makes sense... The peer has detected that the association is broken
(since the lower layer drops packets larger than the PMTU, I guess) and has sent an
ABORT chunk. This results in the COMM_LOST notification. After that you can't send anymore.

Best regards
Michael
 

Michael Tüxen

unread,
Jul 29, 2014, 5:01:14 AM7/29/14
to discuss...@googlegroups.com, ldi...@google.com, fa...@screenhero.com, tue...@fh-muenster.de


On Monday, July 28, 2014 10:53:50 PM UTC+2, Lally Singh wrote:
The 'Gak, chk->snd_count:30 >= max:30 - send abort' message is from sctp_output.c, when it sees that it tried for 30 times to send the message, and it failed every time.  It shuts the connection down then.  The repeated 'SCTP seems to have made a packet that is bigger than its official MTU' message seems to indicate an attempt to send more than kSctpMtu, 1280 bytes.  We may be configured to disallow fragmentation.

Together, that'd explain what's going on.  To verify, Faraz: can you in any way limit your app's sends to < 1200 bytes, just to test if that clears up your problem?  If that's it, then we know what's wrong and can work on fixing it.
The app should be allowed to send whatever it wants. The layer on top of SCTP needs to specify a save value of the PMTU.
According to the IDs, this should be 1200 byte minus IP, UDP and DTLS header size. Does this code exist?

Best regards
Michael

Michael Tüxen

unread,
Jul 29, 2014, 5:05:06 AM7/29/14
to discuss...@googlegroups.com, ldi...@google.com, fa...@screenhero.com, tue...@fh-muenster.de


On Monday, July 28, 2014 11:04:15 PM UTC+2, Faraz Khan wrote:
Lally,
I guessed the same so assuming that sctp fragmentation was sometimes broken sent out a version to the affected users in which our app limits any message over 1150 bytes. They havent reported back with anything yet so I dont know if thats the fix, if it is I'll let you guys know.
This indicates that the JS layer (or whatever is on top of SCTP) doesn't configure the PMTU correctly. Which value is used there?
You set this via the socket option SCTP_PEER_ADDR_PARAMS. See
http://tools.ietf.org/html/rfc6458#section-8.1.12
Please note that you can use a wildcard address...

Best regards
Michael

Jiayang Liu

unread,
Jul 29, 2014, 12:25:37 PM7/29/14
to discuss-webrtc, Lally Singh, Lucas Dixon, fa...@screenhero.com, Michael Tuexen
Lally/Lucas,

Why does SctpDataMediaChannel rejects packet larger than kSctpMtu? Is that necessary? I don't think the webrtc transport layer requires that.


--

---
You received this message because you are subscribed to a topic in the Google Groups "discuss-webrtc" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/discuss-webrtc/XIdVkoS4-Cs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to discuss-webrt...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Lally Singh

unread,
Jul 29, 2014, 12:52:48 PM7/29/14
to Jiayang Liu, discuss-webrtc, Lucas Dixon, fa...@screenhero.com, Michael Tuexen
AFAICT, we only log it in sctpdataengine.  But (if the theory's right) we've set SCTP to not fragment, and the packet's bigger than the actual MTU (which may be close to kSctpMtu, or not).  So it gets dropped in usrsctp.

Michael Tüxen

unread,
Jul 29, 2014, 1:01:21 PM7/29/14
to discuss...@googlegroups.com, ji...@google.com, ldi...@google.com, fa...@screenhero.com, tue...@fh-muenster.de


On Tuesday, July 29, 2014 6:52:48 PM UTC+2, Lally Singh wrote:
AFAICT, we only log it in sctpdataengine.  But (if the theory's right) we've set SCTP to not fragment, and the packet's bigger than the actual MTU (which may be close to kSctpMtu, or not).  So it gets dropped in usrsctp.
Can you elaborate on that? What did you set and what do you expect?
If you set SCTP_DISABLE_FRAGMENTS, then a send call will fail for messages requiring
fragmentation. So this would no result in the messages being sent out and you wouldn't
see the error messages you saw.

Best regards
Michael

Faraz Khan

unread,
Jul 29, 2014, 1:09:38 PM7/29/14
to Michael Tüxen, discuss...@googlegroups.com, ji...@google.com, ldi...@google.com, tue...@fh-muenster.de
Michael / Lally,
Just to clarify - the problem only manifests itself after 20-30 minutes of successful message passing. The connection always works perfectly in the beginning, but after a while it drops all of a sudden with the error messages above. I've tested SCTP in webrtc with messages of upto 256k and they get sent (and fragmented) correctly. So its something that happens at runtime.

Just to add: Our app sends a lot of very small messages (120-200 bytes) and some largish ones (2k and upto 10k). With the new version I've restricted anything over 1150 bytes just to test this theory.

That said I do have a dev version out there in the wild which restricts message sizes to < 1150 bytes. I'll be reporting back on that shortly.


--

--
Faraz Khan
Simple Collaboration Screensharing

Michael Tüxen

unread,
Jul 29, 2014, 1:27:08 PM7/29/14
to discuss...@googlegroups.com, t00f...@googlemail.com, ji...@google.com, ldi...@google.com, tue...@fh-muenster.de


On Tuesday, July 29, 2014 7:09:38 PM UTC+2, Faraz Khan wrote:
Michael / Lally,
Just to clarify - the problem only manifests itself after 20-30 minutes of successful message passing. The connection always works perfectly in the beginning, but after a while it drops all of a sudden with the error messages above. I've tested SCTP in webrtc with messages of upto 256k and they get sent (and fragmented) correctly. So its something that happens at runtime.
It can be that the PMTU Timer fires (10 Minute Timer). This will increment the PMTU, so SCTP generate larger fragments which will
be dropped. Retransmissions will allow IP level fragmentation, but if the lower layer drops them, you end up in what you see.

So right now, you might want to disable PMTUD. At some time, I'll implement PMTUD which does not affect data transmissions,
but that is on the ToDo list but not at the top.
You also want that the lower layer does not discard packets which are larger than the PMTU.

Best regards
Michael

Faraz Khan

unread,
Jul 29, 2014, 1:52:49 PM7/29/14
to discuss...@googlegroups.com, t00f...@googlemail.com, ji...@google.com, ldi...@google.com, tue...@fh-muenster.de
Michael / Lally,
I'm happy to make the change / compile / test for you guys - however I'm not too familiar with the sctp codebase.

From a cursory look - seems like I need to do something like this in sctpdataengine.cc in OpenSctpSocket:

struct sctp_paddrparams params;
params.spp_flags = SPP_PMTUD_DISABLE;

usrsctp_setsockopt(sock_, IPPROTO_SCTP, SCTP_PEER_ADDR_PARAMS , &params,
                           sizeof(params)

Something like that? Or should I get the sock opts first and then AND the MTUD_DISABLE bit?

Faraz Khan

unread,
Jul 29, 2014, 4:17:46 PM7/29/14
to discuss...@googlegroups.com, t00f...@googlemail.com, ji...@google.com, ldi...@google.com, tue...@fh-muenster.de
So I added the following in sctpdataengine.cc:

    struct sctp_paddrparams params = { 0 };
    socklen_t params_length = 0;

    params.spp_flags = SPP_PMTUD_DISABLE;
    
    if (usrsctp_setsockopt(sock_, IPPROTO_SCTP, SCTP_PEER_ADDR_PARAMS, &params, sizeof(params)))
    {
        LOG_ERRNO(LS_ERROR) << debug_name_ << "Failed to set SCTP_PEER_ADDR_PARAMS:";
        return false;
    }

I have verified that with this set, time type 8 (SCTP_TIMER_TYPE_PATHMTURAISE) does not get scheduled and does not fire.

I have also verified that without it set, the timer DOES get fired (as you mentioned). To test the effect further, I reduced SCTP_DEF_PMTU_RAISE_SEC from 600 to 60 to make the timer get fired every minute. Sure enough, the next iteration of the timer takes MTU to 1492, and the third one to 1500. After this, trying to send a larger message (say 5-8kb) will result in logs being filled with webrtc's 'SCTP seems to have made a packet that is bigger' errors which makes sense.

However, this works just fine on my local lan (even with the above webrtc errors) - I'm no network expert but is it possible that the users network doesnt like a MTU of 1500?

Michael - do you recommend I just turn this off and test with it off for now?

Michael Tüxen

unread,
Jul 29, 2014, 5:02:16 PM7/29/14
to discuss...@googlegroups.com, t00f...@googlemail.com, ji...@google.com, ldi...@google.com, tue...@fh-muenster.de


On Tuesday, July 29, 2014 7:52:49 PM UTC+2, Faraz Khan wrote:
Michael / Lally,
I'm happy to make the change / compile / test for you guys - however I'm not too familiar with the sctp codebase.

From a cursory look - seems like I need to do something like this in sctpdataengine.cc in OpenSctpSocket:

struct sctp_paddrparams params;
params.spp_flags = SPP_PMTUD_DISABLE;

usrsctp_setsockopt(sock_, IPPROTO_SCTP, SCTP_PEER_ADDR_PARAMS , &params,
                           sizeof(params)

Something like that? Or should I get the sock opts first and then AND the MTUD_DISABLE bit?
I would also suggest to  set the initial PMTU so something like 1200 minus the IP/UDP/DTLS overhead.
So add
params.spp_spp_pathmtu = 1100; /* save MTU */
That should give you what you want. You should be able to transfer even large messages (larger than the MTU)
for a long time.

Best regards
Michael

Michael Tüxen

unread,
Jul 29, 2014, 5:09:32 PM7/29/14
to discuss...@googlegroups.com, t00f...@googlemail.com, ji...@google.com, ldi...@google.com, tue...@fh-muenster.de


On Tuesday, July 29, 2014 10:17:46 PM UTC+2, Faraz Khan wrote:
So I added the following in sctpdataengine.cc:

    struct sctp_paddrparams params = { 0 };
    socklen_t params_length = 0;

    params.spp_flags = SPP_PMTUD_DISABLE;
    
    if (usrsctp_setsockopt(sock_, IPPROTO_SCTP, SCTP_PEER_ADDR_PARAMS, &params, sizeof(params)))
    {
        LOG_ERRNO(LS_ERROR) << debug_name_ << "Failed to set SCTP_PEER_ADDR_PARAMS:";
        return false;
    }

I have verified that with this set, time type 8 (SCTP_TIMER_TYPE_PATHMTURAISE) does not get scheduled and does not fire.

I have also verified that without it set, the timer DOES get fired (as you mentioned). To test the effect further, I reduced SCTP_DEF_PMTU_RAISE_SEC from 600 to 60 to make the timer get fired every minute. Sure enough, the next iteration of the timer takes MTU to 1492, and the third one to 1500. After this, trying to send a larger message (say 5-8kb) will result in logs being filled with webrtc's 'SCTP seems to have made a packet that is bigger' errors which makes sense.
The lower layer of SCTP should accept packets larger than the MTU and allow IP level fragmentation to happen...

However, this works just fine on my local lan (even with the above webrtc errors) - I'm no network expert but is it possible that the users network doesnt like a MTU of 1500?
Well, 1500 bytes is OK. But you are setting it at the SCTP layer. The SCTP stack currently only takes the IP header into account.
So you need to take care of the UDP and DTLS header. SCTP also reserves some space for a SACK. Therefore it might be working
when using 1500, but it is by accident, because SCTP is conservative and the DTLS overhead was within that limit. So there are two issues:
1. Explicitly set a save MTU value.
2. The lower layer should allow for sending messages larger than the PMTU.
These two issues are related to the usage of the SCTP stack, not with the SCTP stack itself.
For safety and not affecting performance, PMTUD should be disabled until a version of it is implemented
which does not rely on ICMP feedback and does not affect the user message transfer.
This is something which needs to be improved within the SCTP stack.

Best regards
Michael

Justin Uberti

unread,
Jul 29, 2014, 5:13:18 PM7/29/14
to discuss-webrtc, Lally Singh, Lucas Dixon, fa...@screenhero.com, Michael Tuexen
We want to make sure that at the transport layer, we don't try to send a packet larger than a MTU (in our case, we assume a conservative value of 1280 bytes, since we still need to add DTLS/TURN/UDP/IP framing) - because IP fragmentation is not well supported on the Internet.

So we are surprised when a packet comes out bigger than the MTU. It sounds like we're not telling the stack about MTUs properly though.


--

---
You received this message because you are subscribed to the Google Groups "discuss-webrtc" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss-webrt...@googlegroups.com.

Michael Tüxen

unread,
Jul 29, 2014, 5:19:50 PM7/29/14
to discuss...@googlegroups.com, la...@google.com, ldi...@google.com, fa...@screenhero.com, tue...@fh-muenster.de


On Tuesday, July 29, 2014 11:13:18 PM UTC+2, Justin Uberti wrote:
We want to make sure that at the transport layer, we don't try to send a packet larger than a MTU (in our case, we assume a conservative value of 1280 bytes, since we still need to add DTLS/TURN/UDP/IP framing) - because IP fragmentation is not well supported on the Internet.
Right. But SCTP will perform PMTUD and therefore must be able to send packets (the probe packets) which are larger than
the PMTU. BTW, the ID considers 1200 bytes save for IPv4, not 1280 as for IPv6.

So we are surprised when a packet comes out bigger than the MTU. It sounds like we're not telling the stack about MTUs properly though.
You might be surprised, but you should not discards the packets. In case of the current PMTUD, the probe packets are based on user data.
Once they are fragmented, they can't be refragmented. Therefore I suggest to disable PMTUD until an improved version has been
implemented. It will use a combination of HEARTBEAT chunks and PADDING chunks.

Best regards
Michael

Faraz Khan

unread,
Jul 29, 2014, 5:26:40 PM7/29/14
to discuss...@googlegroups.com, t00f...@googlemail.com, ji...@google.com, ldi...@google.com, tue...@fh-muenster.de
Michael,
Thanks for the heads up. I'm testing the attached patch - I'll let you know if this solves the issue for the users who were experiencing the problem.

Thank you for all the help!
sctpnodpmtu.patch

Justin Uberti

unread,
Jul 29, 2014, 9:17:09 PM7/29/14
to discuss-webrtc, Lally Singh, Lucas Dixon, fa...@screenhero.com, Michael Tuexen
On Tue, Jul 29, 2014 at 2:19 PM, Michael Tüxen <t00f...@googlemail.com> wrote:


On Tuesday, July 29, 2014 11:13:18 PM UTC+2, Justin Uberti wrote:
We want to make sure that at the transport layer, we don't try to send a packet larger than a MTU (in our case, we assume a conservative value of 1280 bytes, since we still need to add DTLS/TURN/UDP/IP framing) - because IP fragmentation is not well supported on the Internet.
Right. But SCTP will perform PMTUD and therefore must be able to send packets (the probe packets) which are larger than
the PMTU. BTW, the ID considers 1200 bytes save for IPv4, not 1280 as for IPv6.

So we are surprised when a packet comes out bigger than the MTU. It sounds like we're not telling the stack about MTUs properly though.
You might be surprised, but you should not discards the packets. In case of the current PMTUD, the probe packets are based on user data.

Well, we wanted to make sure we knew if we hit this case...

Once they are fragmented, they can't be refragmented. Therefore I suggest to disable PMTUD until an improved version has been
implemented. It will use a combination of HEARTBEAT chunks and PADDING chunks.

Agree we should disable PTMUD. Faraz, please advise if this works.

Faraz Khan

unread,
Jul 30, 2014, 1:36:58 AM7/30/14
to Justin Uberti, Michael Tuexen, discuss...@googlegroups.com, Lucas Dixon, Lally Singh

Justin,
Ive sent a Dev version (using the attached patch) on our Dev channel to a number of beta testers. So far so good ( generally the problem would happen within 30 minutes). I'll double check with all of them personally and let you know for sure tomorrow.

Michael - Thanks again for the help !

Faraz Khan

unread,
Jul 30, 2014, 5:25:24 PM7/30/14
to discuss...@googlegroups.com, jub...@google.com, tue...@fh-muenster.de, ldi...@google.com, la...@google.com
I can confirm this definitely turns of DPMTU and fixes the problem some of our users were having! Should I make a issue out of this to track it?
To unsubscribe from this group and stop receiving emails from it, send an email to discuss-webrtc+unsubscribe@googlegroups.com.

Justin Uberti

unread,
Aug 1, 2014, 8:09:32 PM8/1/14
to Faraz Khan, discuss-webrtc, Michael Tuexen, Lucas Dixon, Lally Singh
Please do. It will help ensure it gets into Chrome 38.

Faraz Khan

unread,
Aug 4, 2014, 5:01:00 PM8/4/14
to discuss...@googlegroups.com, fa...@screenhero.com, tue...@fh-muenster.de, ldi...@google.com, la...@google.com
Justin,
Done:

To unsubscribe from this group and stop receiving emails from it, send an email to discuss-webrt...@googlegroups.com.

Xiaoming Ding

unread,
Aug 11, 2015, 12:55:29 PM8/11/15
to discuss-webrtc

I  thought this issue had already been settled down. But I still saw a lot of similar errors at the console log from my android mobile with chrome 44.0.2403.128 : 

...

08-11 23:32:30.315 E/chromium( 3438): [ERROR:sctpdataengine.cc(908)] SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
08-11 23:32:30.316 E/chromium( 3438): [ERROR:sctpdataengine.cc(908)] SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
08-11 23:32:30.319 E/chromium( 3438): [ERROR:sctpdataengine.cc(908)] SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
08-11 23:32:30.321 E/chromium( 3438): [ERROR:sctpdataengine.cc(908)] SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
08-11 23:32:30.323 E/chromium( 3438): [ERROR:sctpdataengine.cc(908)] SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
08-11 23:32:30.325 E/chromium( 3438): [ERROR:sctpdataengine.cc(908)] SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
08-11 23:32:30.326 E/chromium( 3438): [ERROR:sctpdataengine.cc(908)] SctpDataMediaChannel->OnPacketFromSctpToNetwork(...): SCTP seems to have made a packet that is bigger than its official MTU.
08-11 23:32:30.574 I/chromium( 3404): [INFO:CONSOLE(446)] "Data channel state is: closed"


Finally , the data channel was terminated somehow.  How come ?



Xiaoming Ding
  
在 2014年8月5日星期二 UTC+8上午5:01:00,Faraz Khan写道:
Justin,
Done:

Haseeb Abdul Qadir

unread,
Aug 11, 2015, 1:32:55 PM8/11/15
to discuss-webrtc
I believe there was a bug in the prior fix to set the MTU. Here's some code from our modifications that describes the problem and tries to fix it: 

// Actually sets the MTU - there is a bug in sctpdataengine's mtu setting
// code which applies SPP_PMTUD_DISABLE but doesn't set spp_pathmtu properly.
// There are two issues that cause the bug that the following code fixes:
// 1. The mtu setting code is called very early to set the MTU before SCTP sets the associations
// 2. The mtu settings code doesn't set params.spp_address to match the address of the connection 
// To fix these FixSetMtu() should be called _after_ sctp notifies us that the
// association it set.
void SctpDataMediaChannel::FixSetMtu()
{
  struct sctp_paddrparams params = {0};
  sockaddr_conn local_sconn = GetSctpSockAddr(local_port_);
  memcpy(&params.spp_address, &local_sconn, sizeof(local_sconn));
  params.spp_assoc_id = 0;
  params.spp_flags = SPP_PMTUD_DISABLE;
  params.spp_pathmtu = kSctpMtu - 48; // 48 is SCTP's packet overhead
  if (usrsctp_setsockopt(sock_, IPPROTO_SCTP, SCTP_PEER_ADDR_PARAMS, &params,
      sizeof(params))) {
    LOG_ERRNO(LS_ERROR) << debug_name_ << "Failed to set MTU.";
  }
}

and another patch here:

void SctpDataMediaChannel::OnNotificationAssocChange(
    const sctp_assoc_change& change) {
  switch (change.sac_state) {
    case SCTP_COMM_UP:
      LOG(LS_VERBOSE) << "Association change SCTP_COMM_UP";
      // we can only set the MTU once the association has been established
      FixSetMtu();
      break;
Reply all
Reply to author
Forward
0 new messages