On Sat, Apr 5, 2014 at 11:01 AM, Aaditeshwar Seth <as...@cse.iitd.ernet.in> wrote:
What I recall about these connection stalls is that:
1. The server (sender) would keep sending data and run into timeouts, while the receiver did not get the data within the expected 0.5 RTT but only much, much later
2. These stalls were triggered by a burst of data being released, likely caused by a combination of CUBIC stepping up the congestion window, delayed acks, and possibly TSO.
So a number of things need to be investigated:
- For 1, what is the RTO calculated by TCP at the sender when these stalls occur, to confirm that the RTO is similar to the RTTs and much smaller than the delay at which the data is actually received
We have gone through the tcpdumps and Web10g logs, and the observations agree with your hypothesis: the RTO is similar to the RTTs, though slightly higher, and the RTO observed in the plots is less than half of the delay after which the data is received.
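For reference when reading the RTO plots, here is a minimal sketch of the standard estimator from RFC 6298 (RTO = SRTT + 4 * RTTVAR). It shows why the RTO normally sits a little above the RTT when the RTT is steady. The function name and the 200 ms floor are assumptions for illustration (200 ms is the usual Linux minimum; the RFC itself recommends 1 s), the clock-granularity term is omitted, and the Linux kernel's estimator differs in detail.

```python
def rto_series(rtt_samples_ms, min_rto_ms=200.0, alpha=1/8, beta=1/4, k=4):
    """Return the RTO (ms) after each RTT sample, following RFC 6298.

    Hypothetical helper: the name and min_rto_ms default are assumptions.
    """
    srtt = None    # smoothed RTT
    rttvar = None  # RTT variation
    rtos = []
    for r in rtt_samples_ms:
        if srtt is None:
            # First measurement: SRTT = R, RTTVAR = R / 2
            srtt = r
            rttvar = r / 2
        else:
            # RTTVAR must be updated using the old SRTT
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
        rtos.append(max(min_rto_ms, srtt + k * rttvar))
    return rtos


if __name__ == "__main__":
    # Steady 100 ms RTT: the RTO decays towards the RTT (or the floor)
    print(rto_series([100, 100, 100, 100, 100]))
```

Feeding the RTT samples from the Web10g log into this and overlaying the result on the logged RTO would be one way to check that the sender's timer behaves as expected during a stall.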
- For 2, can we precisely identify the trigger point, which could be based on the amount of congestion window increase and the rate of increase
The dumps around the trigger point of the stalls do not reveal any stepping up of the congestion window. In fact, in many cases the congestion window shows a gradual decrease before the connection stall.
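To make the congestion window claim checkable for every stall rather than by eye, something like the sketch below could be run over the Web10g samples. The input format (a time-sorted list of `(time_seconds, cwnd_bytes)` pairs), the function name, and the 2-second lookback are all assumptions, not what our logs or scripts actually use.

```python
def cwnd_change_before(samples, stall_start, lookback=2.0):
    """Summarise how cwnd moved in the `lookback` seconds before a stall.

    samples: list of (time_seconds, cwnd_bytes), sorted by time (assumed format).
    Returns None if fewer than two samples fall inside the window.
    """
    window = [(t, c) for t, c in samples
              if stall_start - lookback <= t < stall_start]
    if len(window) < 2:
        return None
    values = [c for _, c in window]
    steps = [b - a for a, b in zip(values, values[1:])]
    return {
        "delta": values[-1] - values[0],  # net change over the window
        "max_step": max(steps),           # largest single increase (negative if none)
    }


if __name__ == "__main__":
    log = [(0.0, 20000), (0.5, 18000), (1.0, 16000), (1.5, 15000)]
    print(cwnd_change_before(log, stall_start=2.0))
```

A negative `delta` together with a non-positive `max_step` would correspond to the gradual decrease described above; a large positive `max_step` would point to a window step-up just before the stall.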
Specifically to understand the cause:
- Check that no middlebox in the way is buffering out-of-order packets, something that was noticed in the 'middleboxes in cellular networks' paper where firewalls were seen to be buffering out of order packets for deep packet inspection to check for virus signatures. In our case, we should check if out of order packets sent artificially are being buffered
The tests were performed and out-of-order packet buffering was not observed.
- Whether signal strength at the receiver is a problem. Do we have signal strength measurements during this time? Otherwise having multiple transfers going is a possibility, but why do you say upload? Here it is a download scenario. I don't think we investigated stalls in upload
Signal strength measurements are available, but they are sampled only once every three seconds and their reliability has not been verified.
Tests will be performed for competing flows: TCP vs. TCP to different destinations, and TCP vs. UDP to the same destination. These tests will be performed in the current week. Stalls are present in both uploads and downloads, and we are investigating in both directions.
- How about the modem, could this be an issue?
This will be investigated once the above mentioned tests are completed.
Please also give timelines for these things.
Asheesh and I worked on these issues. I am leaving for Hyderabad today on official duty and will be back on 19th Apr. I have explained the remaining tests to Asheesh and asked him to complete them by the end of this week.
Arvind
Aaditeshwar
On 03-04-2014 20:44, Arvind Mahla wrote:
Sir,
Together with Zahir, we analysed the Web10g logs for connection stalls. Web10g variables such as congestion window size, RTT, and RTO were plotted alongside the tcpdumps, but the reason for the connection stalls could not be established.
We need to explore further by looking for answers to the following questions:
1. Are the connection stalls due to poor signal quality or fading? To get insight into this we need to run parallel UDP/TCP uploads along with ping. Signal fading will definitely affect the parallel flows. It will also reveal whether the stalling is protocol specific. Parallel uploads should be done using different servers simultaneously. It will help in isolating the reason for connection stalls.
2. Are the connection stalls OS specific? We need to run the tests on a Windows machine.
Arvind
On Tue, Apr 1, 2014 at 9:27 AM, Aaditeshwar Seth <as...@cse.iitd.ernet.in> wrote:
Folks, Vinay is not available, so please catch up among yourselves and with Zahir, and send me a good report on the Web10g analysis, especially if it can explain the connection stalling.
On 31-03-2014 08:31, Aaditeshwar Seth wrote:
Folks, are you free tomorrow (Tuesday) at 5pm to discuss these and the Web10g results? Arvind/Asheesh, please catch up before that and send us an updated report on the Web10g results that Asheesh has sent.
On 31-03-2014 08:16, Arvind Mahla wrote:
Sir,
In the report I had sent, the Conclusion section was modified as per your comments.
I will further refine it for better understanding.
Arvind
Aaditeshwar Seth
Co-founder, Gram Vaani Community Media
http://gramvaani.org
http://www.linkedin.com/company/gram-vaani-community-media
-- Aaditeshwar Seth Assistant Professor Computer Science and Engineering IIT Delhi http://www.cse.iitd.ernet.in/~aseth
Comments inline: Please give evidence, i.e. show the reported RTO over time, RTT over time, etc., so that we can make this claim.
On 13-04-2014 15:45, Arvind Mahla wrote:
On Sat, Apr 5, 2014 at 11:01 AM, Aaditeshwar Seth <as...@cse.iitd.ernet.in> wrote:
What I recall about these connection stalls is that:
1. The server (sender) would keep sending data and run into timeouts, while the receiver did not get the data within the expected 0.5 RTT but only much, much later
2. These stalls were triggered by a burst of data being released, likely caused by a combination of CUBIC stepping up the congestion window, delayed acks, and possibly TSO.
So a number of things need to be investigated:
- For 1, what is the RTO calculated by TCP at the sender when these stalls occur, to confirm that the RTO is similar to the RTTs and much smaller than the delay at which the data is actually received
We have gone through the tcpdumps and Web10g logs, and the observations agree with your hypothesis: the RTO is similar to the RTTs, though slightly higher, and the RTO observed in the plots is less than half of the delay after which the data is received.
Same comment as earlier: show the congwin variable over time.
- For 2, can we precisely identify the trigger point, which could be based on the amount of congestion window increase and the rate of increase
The dumps around the trigger point of the stalls do not reveal any stepping up of the congestion window. In fact, in many cases the congestion window shows a gradual decrease before the connection stall.
Ok
Specifically to understand the cause:
- Check that no middlebox in the way is buffering out-of-order packets, something that was noticed in the 'middleboxes in cellular networks' paper where firewalls were seen to be buffering out of order packets for deep packet inspection to check for virus signatures. In our case, we should check if out of order packets sent artificially are being buffered
The tests were performed and out-of-order packet buffering was not observed.
- Whether signal strength at the receiver is a problem. Do we have signal strength measurements during this time? Otherwise having multiple transfers going is a possibility, but why do you say upload? Here it is a download scenario. I don't think we investigated stalls in upload
Signal strength measurements are available, but they are sampled only once every three seconds and their reliability has not been verified.
Tests will be performed for competing flows: TCP vs. TCP to different destinations, and TCP vs. UDP to the same destination. These tests will be performed in the current week. Stalls are present in both uploads and downloads, and we are investigating in both directions.
Make sure you order other modems in advance if you don't already have different models.
- How about the modem, could this be an issue?
This will be investigated once the above mentioned tests are completed.
In these graphs, you also need to show the acks received and packets sent at the sender and, separately, the packets received and acks dispatched at the receiver. A connection stall is when the sender has sent packets that have not been received. What you've marked as a connection stall here seems to be a pause in dispatching data at the sender, which is standard behaviour since no new acks came in.
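To make that definition concrete, a rough sketch of how the sender-side and receiver-side traces could be matched is below. It assumes both traces have already been reduced to a mapping from sequence number to the time of first transmission (sender) or first arrival (receiver), and that the two clocks are synchronised or the offset has been corrected. The function name, the input format, and the `max_delay` threshold are hypothetical.

```python
def late_segments(sent, received, max_delay):
    """List segments that were sent but arrived late or not at all.

    sent:      dict seq -> time of first transmission at the sender (seconds)
    received:  dict seq -> time of first arrival at the receiver (seconds)
    max_delay: one-way delay (seconds) above which a segment counts as stalled
    Both dicts are assumed to use the same, synchronised time base.
    """
    late = []
    for seq, t_sent in sorted(sent.items()):
        t_recv = received.get(seq)  # None if the segment never showed up
        if t_recv is None or t_recv - t_sent > max_delay:
            late.append((seq, t_sent, t_recv))
    return late


if __name__ == "__main__":
    sent = {1: 0.0, 2: 0.1, 3: 0.2}
    received = {1: 0.15, 2: 1.9}
    for seq, t_sent, t_recv in late_segments(sent, received, max_delay=0.5):
        print(seq, t_sent, t_recv)
```

An interval where the sender is simply quiet and this list is empty would be a sender-side pause, not a stall; an interval with entries here is one where data left the sender but did not reach the receiver in time.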