Reviews for Wilson11D3


Rodrigo Fonseca

Apr 11, 2013, 1:39:45 AM
to csci2950u-...@googlegroups.com
Please post your reviews of D3 as a group reply to this message. Since this message is late, if you already sent me a message I'll repost it in the morning. Don't pay attention to the title of the paper; in the case of the reviews, never is not better than late.

Rodrigo

Tan "Charles" Zhang

Apr 11, 2013, 1:49:48 AM
to csci2950u-...@googlegroups.com

Paper Title: Better never than late: meeting deadlines in datacenter networks


Authors: Christo Wilson, Hitesh Ballani, Thomas Karagiannis, Ant Rowstron


Date: SIGCOMM 2011 August


Novel Idea:

This work proposed a control protocol called D3 that uses application deadline information to achieve informed allocation of network bandwidth.


Main Results:

The authors designed a modified transport protocol that relies on the end hosts to keep a record of, and convey, the rate allocation for each flow. They consider three idealized bandwidth allocation schemes: (1) fair share, (2) earliest deadline first (EDF), and (3) rate reservation. They then identify three goals for data center congestion control design: (1) maximize application throughput, (2) burst tolerance, (3) high utilization.


For rate control, the source host first requests a desired rate computed from its deadline information; the intermediate routers see this request and return an allocation vector to the source host, which then adjusts its sending rate accordingly.


For rate allocation, they adopt a greedy approach to ensure that flows with a deadline get priority when claiming their share of the bandwidth. In addition, every flow is given a base rate so that it can still send probe packets when the bottleneck router does not allocate it enough bandwidth; otherwise, a non-deadline flow that received zero bandwidth would never be able to recover.

For router operation, each router needs to know its existing allocations, but to avoid heavy per-flow processing at routers, the end hosts are responsible for conveying the rate allocation of each flow, with the information carried in the packet headers.
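
As a rough illustration of the rate-control step described above (a sketch only, with invented names and parameters, not the paper's actual implementation), the sender-side request could be computed roughly as follows:

    # Hypothetical sketch of a D3-style sender rate request: at the start of each
    # RTT the source recomputes the rate it needs to finish its remaining bytes
    # before the deadline. All names and values are illustrative.

    def desired_rate(remaining_bytes: float, time_to_deadline_s: float) -> float:
        """Rate (bytes/s) needed to finish the flow exactly at its deadline."""
        if time_to_deadline_s <= 0:
            return float("inf")  # deadline missed or imminent: ask for as much as possible
        return remaining_bytes / time_to_deadline_s

    def next_request(remaining_bytes: float, time_to_deadline_s: float, has_deadline: bool) -> float:
        # Non-deadline flows request no specific rate; they live off the base rate
        # plus whatever fair share the routers hand back.
        if not has_deadline:
            return 0.0
        return desired_rate(remaining_bytes, time_to_deadline_s)

    # Example: 2 MB left and 20 ms to the deadline -> the sender asks for 100 MB/s.
    print(next_request(2e6, 0.02, True))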


Impact:

In datacenters, each traffic flow typically corresponds to a job, and each job usually has a deadline; ensuring that jobs finish and data is delivered on time is therefore crucial to maintaining high application performance. This work does exactly that for data centers.


Evidence:

They implemented the modified transport protocol and deployed D3 across a small testbed that includes twelve end hosts arranged across four racks. The goals of the evaluation are, first, to determine the value of using flow deadline information to apportion network bandwidth and, second, to evaluate the performance of D3 purely as a congestion control protocol, without deadline information. They compared D3 with RCPdc and regular TCP in three scenarios: (1) flow burst microbenchmarks, (2) benchmark traffic, and (3) flow quenching, and D3 performs extraordinarily well in all of them.


Prior work:

They borrow the solution for problems caused by misinformed routers from [11, 19].


Criticism:

As the authors themselves mention, deploying D3 in a real-world data center may be hard in practice.


Shu Zhang

Apr 11, 2013, 1:52:57 AM
to csci2950u-...@googlegroups.com
1. Paper Title:
Better Never than Late: Meeting Deadlines in Datacenter Networks
2. Authors:
Christo Wilson, Hitesh Ballani, Thomas Karagiannis, Ant Rowstron
3. Date:
SIGCOMM '11, August 15-19, 2011, Toronto.
4. Novel Idea:
The paper starts from the observation that, for large-scale web applications, some requests are highly interactive and must meet their deadlines; otherwise users lose patience and more serious problems can follow. Behind these web applications are back-end data centers that should respond as quickly as possible. The paper first investigates existing policies for meeting application deadlines and finds their shortcomings for flows, as well as their potential to interfere with other “innocent” traffic. As its solution, the paper proposes D3, a new congestion control policy for meeting flow deadlines, which requires augmenting both the packet header format and the router behavior. Experiments show that D3 outperforms traditional TCP and related protocols (RCP, DCTCP) under various scenarios.
5. Main Results
D3 comes from observations of data center traffic patterns and an investigation of application-level traffic. Section 4 states the design considerations; topics include rate limitation, rate control, rate allocation, router operation, and algorithm details, and the paper does a good job of laying out these details. Starting from Section 5, the paper describes how D3 is implemented and tested. The major modification is to add a D3 congestion header to packets of flows that have delivery deadlines. The evaluation compares D3 with traditional protocols such as TCP; scenarios include ones where many flows arrive at the same time (bursty flows), ones with background flows, ones with short response flows, flow quenching, etc. The results show that D3 outperforms the other protocols. However, in the final discussion the authors admit that it is not realistic to deploy D3 everywhere in all datacenters, yet they do not go deep into evaluating how D3 performs when it coexists with other protocols. The paper also proposes a possible extension in which D3's hard deadlines are relaxed into soft deadlines.
6. Impact
As stated in the “Novel Idea” part, the paper proposes a protocol that performs very well in the deadline scenario. Previous methods for dealing with the deadline problem were fairly naive, such as shortest deadline first or simply reserving bandwidth, and these do not perform well in certain circumstances, such as concurrent bursty flows or fluctuating bandwidth demands. D3 therefore recognizes three goals: maximize application throughput, tolerate bursts, and keep utilization high (for flows without deadlines). A big assumption in D3 is that the sender knows the length of a flow and hence can calculate the rate it requires, so the deadline question is converted into a rate-control problem. Along this path, the algorithm reconciles the routers' capacity with the flows' required rates. Also, to relieve the processing load on routers, the sender records in each packet the information the routers need to compute the resource allocation. This is a good innovation, and it is a tradeoff between space and time efficiency. As for other, smaller problems, such as when flows are bursty and the router information available to senders has become stale, the router can allocate a minimum baseline bandwidth that effectively pauses a flow.
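
To make the space-for-time tradeoff above concrete, here is a very loose sketch of the kind of per-packet state the text describes: the sender carries its demand and its previously granted rate so that routers need not keep per-flow state. The field names are invented for illustration and do not match the actual D3 header layout.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class CongestionHeader:
        desired_rate: float      # rate the sender wants for the next RTT (bytes/s)
        prev_allocation: float   # rate granted in the previous RTT, returned to the routers
        allocations: List[float] = field(default_factory=list)  # filled in hop by hop

    def router_process(hdr: CongestionHeader, grant: float) -> None:
        # Each router along the path appends its grant; the receiver echoes the
        # vector back in the ACK and the sender paces at the minimum across hops.
        hdr.allocations.append(grant)

    hdr = CongestionHeader(desired_rate=1.0e8, prev_allocation=5.0e7)
    router_process(hdr, grant=8.0e7)
    router_process(hdr, grant=6.0e7)
    print(min(hdr.allocations))  # the sender would pace at 6.0e7 bytes/s
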
7. Prior Work
The two typical solutions for the deadline issue are EDF and rate reservation. EDF is straightforward but packet based: it works on per-hop packet deadlines, while datacenter applications have end-to-end flow deadlines. As for reservation schemes, many flows do not have a constant rate, so reserving a fixed amount of bandwidth is either inefficient or fails to satisfy requirements as the flow's demand changes. These limitations motivate the need for a practical datacenter congestion control protocol that handles deadlines while avoiding packet scheduling and explicit reservation. As we can see, D3 makes instantaneous allocations with no reservations for the future.
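
For reference, a minimal sketch of the per-hop EDF packet scheduling mentioned above (the prior-work baseline, not part of D3); the point is that the scheduling decision is made per packet at each hop, not per flow end to end:

    import heapq

    class EdfQueue:
        """Switch output queue that always sends the packet whose deadline is soonest."""
        def __init__(self):
            self._heap = []  # entries are (deadline, seq, packet)
            self._seq = 0    # tie-breaker for equal deadlines

        def enqueue(self, packet, deadline):
            heapq.heappush(self._heap, (deadline, self._seq, packet))
            self._seq += 1

        def dequeue(self):
            _, _, packet = heapq.heappop(self._heap)
            return packet

    q = EdfQueue()
    q.enqueue("pkt-A", deadline=0.030)
    q.enqueue("pkt-B", deadline=0.010)
    print(q.dequeue())  # "pkt-B": the earliest per-hop deadline goes first
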
8. Reproducibility
One way is to modify the kernel and the switch code to achieve the goal. Alternatively, we could simulate the behavior using SDN approaches: flows can be identified at OpenFlow switches, and if we let the controller instruct the behavior of each flow, that would be an easier way to reproduce the behavior. The vendor messages in OpenFlow provide an extension mechanism that could be utilized.
9. Ideas for further work
Admittedly, this is a novel way to deal with the deadline problem at flow granularity. The sender computes its demanded rate at the beginning of every RTT, so the deadline problem is converted into a rate problem, since the rate can be calculated by dividing the remaining flow size by the time to the deadline. The paper leaves this end-point computation to the applications. I am thinking about integrating this method with combinatorial bidding and seeing whether it could be reproduced in SDN.

Shao, Tuo

Apr 11, 2013, 2:23:40 AM
to csci2950u-...@googlegroups.com
Paper Title
Better Never than Late: Meeting Deadlines in Datacenter Networks

Authors
Christo Wilson, Hitesh Ballani, Thomas Karagiannis, Ant Rowstron

Date
SIGCOMM'11, August 15-19, 2011

Novel Idea
This paper mainly focuses on the requirement of meeting service deadlines in data center networks from the perspective of transmission delay, and it presents a design in which routers perform rate control based on explicit demands from end hosts.

Main Results
The paper outlines three contributions: first, it makes the case for utilizing deadline information to apportion bandwidth; second, it presents the design and implementation of D3; third, it evaluates the performance of D3 on both microbenchmarks and realistic traffic patterns.

Impact
I think this paper presents a fairly efficient way of scheduling resources for urgent tasks. It should also be better than a centralized implementation with a controller, because it reacts in closer to real time and adapts to dynamically changing short flows.

Evidence
The paper starts with cases in which TCP neglects flow deadlines and causes tasks to fail. It then points out the challenges of building a deadline-aware data center network and the characteristics of data center traffic. It also describes existing solutions such as EDF and rate reservation; however, they are either too heavyweight or a poor fit for the characteristics of data center networks. The paper therefore proposes a new design and implementation, D3. Finally, to evaluate the primary goals of the design, the paper conducts several evaluations in comparison with TCP and RCP.

Reproducibility
I think the paper covers enough details of the D3 design to reproduce the work.

Prior Work and Competitive Work
EDF, RCP, DCTCP, XCP





Place, Jordan

Apr 11, 2013, 2:36:04 AM
to csci2950u-...@googlegroups.com
Better Never than Late: Meeting Deadlines in Datacenter Networks
Christo Wilson, Hitesh Ballani, Thomas Karagiannis, Ant Rowstron
SIGCOMM '11
Today's datacenters often contain network traffic which must meet
a certain deadline. Unfortunately, the protocols used in these
networks are deadline agnostic and value flow-rate fairness over
everything else. While this policy leads to high network utilization
in terms of throughput at the transport layer, it in fact leads
to low network utilization in terms of throughput at the application
layer, because flows that have missed their deadline are useless to
the application expecting them.
The authors of this paper present D3, a deadline-aware control
protocol for use in data center environments. D3 works by allowing
applications to embed flow rate requests in their TCP-like packets.
Routers can read these requests and respond with rate allocations in
ACKs. Hosts calculate their requests according to deadlines such that
routers can prioritize flows which have closer deadlines. The protocol
also supports flows without deadlines and divides extra bandwidth
amongst flows when a path is underutilized.
Particularly interesting is the idea that routers which sense they
are oversubscribed with deadlines they cannot meet will allow as many
flows as possible to meet their deadlines while effectively pausing
other flows. This action contrasts with modern TCP, which, in striving
for fairness will degrade the rates of all flows such that none of
them can meet their deadlines.
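
A tiny numeric sketch (with made-up numbers) of why pausing some flows can beat fairness when the link is oversubscribed:

    # 10 flows each need 200 Mbps to meet their deadline on a 1 Gbps link.
    capacity, need, n = 1000, 200, 10        # Mbps, Mbps per flow, number of flows

    fair_share = capacity / n                # 100 Mbps each: every flow misses its deadline
    met_with_fairness = n if fair_share >= need else 0

    met_with_quenching = capacity // need    # fully satisfy 5 flows, pause the other 5
    print(met_with_fairness, int(met_with_quenching))  # 0 vs 5
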
The authors present significant performance gains in terms of
application throughput from tests on a small testbed. This protocol
is not very centralized or state dependent so I would imagine these
results will scale nicely to a real data center.
The weaknesses of this paper lie in its limited exploration of the
deployment and backwards compatibility of D3 (though the authors present these as
"non-goals" from the start). Additionally, I would be interested in
expanding this idea to work with dynamic scheduling efforts such as
Hedera or MicroTE.



Rodrigo Fonseca

Apr 11, 2013, 7:41:07 AM
to csci2950u-...@googlegroups.com
On Behalf of Rui Zhou

Title:
Better Never than Late: Meeting Deadlines in Datacenter Networks

Authors:
Christo Wilson, Hitesh Ballani, Thomas Karagiannis, Ant Rowstron

Novel Idea:
With additional consideration of flow deadlines, data-center network routers can schedule flows in a smarter way that meets more deadlines than traditional schemes.

Summary:
Data centers are emerging as the mainstream platform for providing network services, and the partition-aggregate style is widely used in them. A query (such as a search) can be divided into sub-problems and partitioned across many workers, who are expected to provide answers in time for aggregation. User experience is largely determined by the percentage of workers that finish their computation and provide answers within the deadline. However, this style of computation can produce huge bursts of aggregation traffic, and traditional network schemes take no account of deadlines, so smarter decisions cannot be made.
There are two main classes of solutions that try to alleviate the aforementioned drawbacks: earliest deadline first (EDF) and rate reservation. However, the first is hard to implement, and the second is not agile enough to cope with the enormous number of short flows.
To address this problem better, the paper proposes D3, with the goals of maximizing application throughput, improving burst tolerance, and providing high utilization, targeting the homogeneous data-center environment. The key insight of D3 is to determine a flow's size and deadline at initiation time and to reason about the rate needed for that flow. Through cooperation between the network routers and the end hosts, D3 strives to maximize the number of flows that meet their deadlines while still providing progress to flows without deadlines and optimizing bandwidth usage. Very different from traditional, cumbersome bandwidth reservation or leasing, the allocation of each flow is calculated and adjusted on the fly according to changing requirements. This timely adjustment to host requests also lets D3 tolerate bursts very well. As a congestion control protocol, D3 also provides good network utilization and low queuing, as shown in the evaluations.
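
A simplified sketch of the greedy, deadline-first allocation idea described above; this is not the paper's exact router algorithm (which processes requests online, one packet at a time), and all names and parameters are illustrative:

    def allocate(requests, capacity, base_rate=1e5):
        """requests: list of (flow_id, desired_rate) pairs, where desired_rate == 0
        means the flow has no deadline. Returns {flow_id: granted_rate} in bytes/s."""
        grants = {}
        remaining = capacity
        # 1. Greedily satisfy the demands of deadline flows while capacity lasts.
        for flow_id, want in requests:
            if want > 0:
                give = min(want, max(remaining, 0.0))
                grants[flow_id] = give
                remaining -= give
        # 2. Every flow (deadline or not) also gets a small base rate so it can keep
        #    probing, and any spare capacity is split evenly as a fair share.
        spare = max(remaining - base_rate * len(requests), 0.0)
        share = spare / len(requests) if requests else 0.0
        for flow_id, _ in requests:
            grants[flow_id] = grants.get(flow_id, 0.0) + base_rate + share
        return grants

    # Two deadline flows and one background flow on a 1e9 bytes/s link.
    print(allocate([("d1", 6e8), ("d2", 5e8), ("bg", 0.0)], capacity=1e9))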


Questions and Thoughts:
1. People often argue that abstraction and layering are the way to get things done, and the end-to-end argument suggests that each layer should encapsulate its own functionality. But D3 is a great example of a layering violation, exposing information from the application layer to the transport layer, that actually brings real benefits. It seems principles always come with exceptions.
2. Why does this paper talk about routers while other papers often talk about switches? Does D3 have to work with IP? Are the meanings of switch and router more blended together these days?
3. If you cannot predict the size or length of a flow, for example an ongoing Skype call, what should we do? In fact, the prediction assumption is probably the most questionable point. Maybe a traditional network reservation scheme could cooperate with D3 in this situation.

Rodrigo Fonseca

Apr 11, 2013, 7:42:12 AM
to csci2950u-...@googlegroups.com
On behalf of Jeff Rasley:

Title: Better Never than Late: Meeting Deadlines in Datacenter Networks
Authors: Christopher Wilson (UCSB), Hitesh Ballani, Thomas Karagiannis and Ant Rowstron (MSR)
Date: Sigcomm '11

Novel & Main Ideas: The authors provide an alternative to TCP, called D3, that carries additional rate information in a flow's packets. Routers use this information to allocate resources to each flow based on its request. The authors also detail an online algorithm by which routers allocate these rates.

Related/Competitive Work: Builds on work done on XCP and RCP. They evaluate D3 against TCP and RCP_DC (i.e., D3 without deadlines), which they say is effectively RCP optimized for the data center.

Reproducibility seems potentially straightforward: the hard part is the router implementation, and the algorithm is described in pretty good detail in Section 4.3.

Question/Comment: I think it's great that RCP_DC (D3 without deadlines) works so well; however, we already knew RCP performs well. Their evaluation of D3 with deadlines is interesting. Since they were not able to get any actual deadline data, they modeled deadlines with an exponential distribution, but they don't really explain why they think this is a proper model. Is this a known result or just an assumption?
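
For concreteness, modeling deadlines this way takes only a couple of lines; the mean below is an arbitrary example value, not a number taken from the paper:

    import random

    def sample_deadlines(n, mean_deadline_s=0.020, seed=0):
        # Exponentially distributed deadlines, as the review above describes the
        # authors doing when no real deadline data was available.
        rng = random.Random(seed)
        return [rng.expovariate(1.0 / mean_deadline_s) for _ in range(n)]

    print([f"{d * 1000:.1f} ms" for d in sample_deadlines(5)])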

Future work: D3 assumes a single path per flow; the authors state that dealing with multiple paths for a single flow is beyond the scope of the work and would require additional mechanisms. Realistically, it would probably require a completely different protocol, since D3 depends so much on the path being consistent.



Rodrigo Fonseca

Apr 11, 2013, 7:43:00 AM
to csci2950u-...@googlegroups.com
On behalf of Dimitra

Title: Better Never than Late: Meeting Deadlines in Datacenter Networks

 

Authors: Christo Wilson, Hitesh Ballani, Thomas Karagiannis, Ant Rowstron

 

SIGCOMM'11

 

 

Novel Idea: Motivated by the fact that today's transport protocols, such as TCP, aim at fair network sharing, and that their obliviousness to flow deadlines can compromise application performance in data center environments, the paper designs, implements, and evaluates D3, a congestion control protocol that makes data centers deadline aware. The D3 protocol (the name stands for Deadline-Driven Delivery) addresses the challenges of datacenter environments (e.g., flows are mostly short and have deadlines that vary significantly; traffic is bursty and diverse, with small RTTs) and uses flow deadline information to allocate network bandwidth appropriately.

 

Main Result: Through evaluation on a 19-node, two-tier datacenter testbed, D3 was found to easily outperform TCP in terms of short flow latency and burst tolerance, even when no deadline information was used. When utilizing deadline information, D3 doubled the peak load supported by the datacenter. D3 can provide significant benefits compared to other existing solutions.

 

Impact: The latency targets and workflow distribution of today's datacenter applications have implications for datacenter traffic: flows often need to complete within a deadline, otherwise they become useless. The congestion control and flow scheduling mechanisms in today's datacenters aim at maximum throughput and fairness but are oblivious to flow deadlines, which can hurt application performance. This paper proposes an efficient protocol that is aware of flow deadlines and performs better than TCP, thus promising to solve this problem.

 

Evidence: The authors begin with a characterization of today's datacenters and discuss the particularities of datacenter applications, such as the partition-aggregate structure, application deadlines and their variability, the variety of flow sizes, and the frequency of missed deadlines. Then, through Monte Carlo simulations, they evaluate three bandwidth allocation schemes: fair share, EDF, and rate reservation (the last two are deadline aware). The results on application throughput for a varying number of flows demonstrate that, as the number of flows increases, the deadline-aware approaches outperform fair sharing significantly. The authors then present the design and implementation of D3, and specifically discuss rate control, rate allocation, router operation, utilization and queuing, and burst tolerance. They deploy D3 on a small testbed structured like the multi-tier tree topologies used in today's datacenters and compare it against TCP, RCPdc (that is, D3 in fair-share mode only, without any deadline awareness), and TCPpr (that is, TCP with priority queuing), with the main comparison metric being application throughput (the number of flows finishing before their deadline). For flow burst microbenchmarks, D3 was found to support almost twice as many concurrent senders as RCPdc while satisfying flow deadlines (3-4 times as many as TCP and TCPpr). For benchmark traffic (using typical datacenter traffic patterns), D3 offered an order of magnitude improvement over TCP and two over TCPpr and RCPdc. For flow quenching, D3 performance degrades smoothly under extreme load.
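
A toy, single-bottleneck approximation of the kind of Monte Carlo comparison described above (fair share versus an EDF-like, deadline-aware policy); the link speed, flow sizes, and deadline distribution are invented for illustration and do not reproduce the paper's exact setup:

    import random

    def trial(n_flows, capacity=1.25e8, seed=0):
        """Count how many flows meet their deadline on one bottleneck link (bytes/s)."""
        rng = random.Random(seed)
        flows = [(rng.uniform(2e3, 50e3),          # flow size in bytes
                  rng.expovariate(1 / 0.020))      # deadline in seconds
                 for _ in range(n_flows)]

        # Fair share: every flow gets capacity / n_flows for its whole lifetime.
        rate = capacity / n_flows
        fair = sum(size / rate <= dl for size, dl in flows)

        # EDF-like: serve flows one at a time, in deadline order, at the full link rate.
        edf, clock = 0, 0.0
        for size, dl in sorted(flows, key=lambda f: f[1]):
            clock += size / capacity
            edf += clock <= dl
        return fair, edf

    for n in (5, 20, 40):
        f, e = trial(n)
        print(f"{n:3d} flows: fair share met {f}, EDF met {e}")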

 

Prior Work: D3 was inspired by proposals to manage network congestion through explicit rate control [11, 19]. Many of this paper's assumptions were based on previous work, such as [4] for the traffic characterization of datacenters and [8, 23] for the problems of using TCP in datacenters. For topologies with multiple paths between endhosts [13, 1, 2, 15], D3 relies on ECMP, VLB, and other existing mechanisms used with TCP to ensure that a flow follows a single path.

 

Competitive Work: Motivated by the issues with using TCP in datacenters (network throughput collapse caused by bursts of concurrent flows, delays to query-response flows caused by long flows), other works have proposed new congestion control protocols or used UDP [4, 22]. There are also application-level solutions, such as SEDA [25], which deal with variable application load but are not appropriate for network/transport problems, since the datacenter network is shared among multiple applications. Finally, EDF [21] is a deadline-aware solution in which the flow with the earliest deadline receives all the bandwidth until it finishes. The problem with EDF is that it is packet based, so it works on per-hop packet deadlines, while datacenter applications have end-to-end flow deadlines. Thus, even though it is optimal when deadlines can be met, it can drive the network into congestion collapse when there is congestion. Furthermore, it still needs an endhost rate control design.

 

Reproducibility: The results of this paper are reproducible.

 

Criticism: This is a great paper that offers a valuable solution to the problems posed by existing control protocols in datacenters, which are unaware of flow deadlines. D3's design tries to maximize the number of flows that satisfy their deadlines, thus increasing application throughput. It is tolerant of bursty and diverse traffic with widely varying deadlines and manages to outperform other existing solutions. The paper offers a variety of results to support its claims, and the evaluation process is thorough: D3 was evaluated under various scenarios (bursty traffic, benchmark traffic, flow quenching) and as a congestion control protocol operating without any deadline information. Its deployability and its performance under hard deadlines were also discussed. Overall, it is a very good paper, but it leaves room for improvement, as it is based on assumptions such as static per-flow paths and a priori knowledge of flow size and deadline information.

 

Question/Class Discussion: D3 relies on assumptions such as: 1) there is a sufficient level of trust in the datacenter environment to delegate both the state regarding flow sending rates and rate policing to the endhosts; 2) flow sizes and deadline information are available at flow initiation time; and 3) per-flow paths are static. To what extent are those assumptions reasonable and realistic?

 



Rodrigo Fonseca

Apr 11, 2013, 7:44:35 AM
to csci2950u-...@googlegroups.com
On behalf of Chris Picardo:

Paper Title:

Better Never than Late: Meeting Deadlines in Datacenter Networks


Author(s):

Christo Wilson, Hitesh Ballani, Thomas Karagiannis, Ant Rowstron


Date:

SIGCOMM’11, August 15-19, 2011, Toronto, Ontario, Canada.

 

Novel Idea:

In this paper, the authors present D3, a deadline-aware control protocol that is customized for the data center environment. D3 uses application deadline information to achieve informed allocation of network bandwidth.


Main Results:

D3 maximizes application throughput, accommodates bursty flows, and maximizes network throughput (high utilization and low queuing).


Impact:

D3 is practical. It handles small RTTs and a diverse traffic mix with different deadlines. Datacenters can therefore meet deadlines while keeping the highly coveted high utilization, and also avoid empty responses.


Evidence:

Used purely as a congestion control protocol, D3 performs well in experiments with multiple hops as well as multiple bottlenecks.

To meet deadlines, D3 dynamically prioritizes flows based on deadlines and network conditions; as a result, it is better at satisfying flow deadlines in the presence of background traffic.

The authors demonstrate the flow arrival rate that can be supported while maintaining more than 99% application throughput, and show that D3 achieves its gains through smarter allocation of resources among the deadline flows.

Flow quenching leads to a smoother decline in performance at extreme loads. From the application perspective, fewer end users get empty responses.


Prior work:

D3 builds on deadline-based packet scheduling in networks, for example Earliest Deadline First (EDF), where routers prioritize packets based on their per-hop deadlines.

Also relevant are rate reservation mechanisms: ATM, for example, supported Constant Bit Rate (CBR) traffic. The idea is to reserve bandwidth, or at least guarantee performance. However, reservation schemes are too heavyweight for the datacenter environment, where most flows are short and do not have a constant rate.

Given a flow's size and deadline, one can determine the rate needed to satisfy the deadline; end hosts can then ask the network for that rate. With D3, the authors extend existing explicit rate control protocols to assign flows rates based on their deadlines instead of their fair share.


Question & Criticism:

Deadlines are associated with flows, not packets. This is difficult if a flow is very short, because then its deadline is hard to meet. Furthermore, deadlines can vary a lot. Can D3 handle a large number of short deadlines and still meet them?



Rodrigo Fonseca

Apr 11, 2013, 7:50:03 AM
to csci2950u-...@googlegroups.com
On behalf of David Trejo


