Rodrigo
Paper Title: Better Never than Late: Meeting Deadlines in Datacenter Networks
Authors: Christo Wilson, Hitesh Ballani, Thomas Karagiannis, Ant Rowstron
Date: SIGCOMM, August 2011
Novel Idea:
This work proposes a control protocol, D3, that uses application deadline information to make informed allocations of network bandwidth.
Main Results:
The authors designed a modified transport protocol in which the end hosts keep track of and convey the rate allocation for each flow. They consider three idealized bandwidth allocation schemes: (1) fair share, (2) earliest deadline first (EDF), and (3) rate reservation. From this analysis they derive three goals for datacenter congestion control design: (1) maximize application throughput, (2) tolerate bursts, and (3) maintain high utilization.
For rate control, a source host first requests a desired rate computed from its deadline information; the routers along the path see this request and return an allocation vector to the source, which then adjusts its sending rate accordingly.
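A minimal sketch of this request/allocation loop, assuming a simplified model (the class names, the per-router `grant` accounting, and the vector format below are illustrative assumptions, not the paper's wire format):

```python
# Hypothetical sketch of the D3-style rate-control loop described above.
# A source computes its desired rate from the flow's remaining size and
# time to deadline, carries it in the packet header, and each router on
# the path reports the rate it can grant; the source sends at the minimum.

def desired_rate(remaining_bytes: float, time_to_deadline: float) -> float:
    """Rate needed to finish the flow exactly at its deadline."""
    if time_to_deadline <= 0:
        return float("inf")  # deadline already missed or imminent
    return remaining_bytes / time_to_deadline

class Router:
    def __init__(self, capacity: float, allocated: float = 0.0):
        self.capacity = capacity
        self.allocated = allocated

    def grant(self, request: float) -> float:
        """Grant as much of the request as spare capacity allows."""
        spare = max(self.capacity - self.allocated, 0.0)
        granted = min(request, spare)
        self.allocated += granted
        return granted

def path_allocation(request: float, routers):
    """Build the allocation vector by asking each router on the path."""
    return [r.grant(request) for r in routers]

# A 2 MB flow with 1 s left to its deadline needs 2 MB/s.
want = desired_rate(2e6, 1.0)
vector = path_allocation(want, [Router(10e6), Router(1.5e6)])
send_rate = min(vector)  # the bottleneck router limits the source
```

In this toy run the second router is the bottleneck, so the source ends up sending at 1.5 MB/s rather than the 2 MB/s it asked for.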
For rate allocation, they adopt a greedy approach that gives flows with deadlines priority in claiming their share of the bandwidth. In addition, every flow receives a base rate, ensuring it can still send probe packets even when the bottleneck router does not allocate it enough bandwidth; otherwise, a non-deadline flow that received zero bandwidth could never recover.
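The greedy allocation with a base rate might look roughly like the following sketch; the earliest-deadline ordering, the base-rate constant, and the equal split of the leftover are simplifying assumptions on my part, not the paper's exact router accounting:

```python
# Illustrative sketch of greedy deadline-aware allocation at one router.
# Deadline flows are satisfied first (earliest deadline first here, as one
# plausible greedy order); every flow keeps a small base rate so it can
# still send probe packets and later reclaim bandwidth.

BASE_RATE = 0.01  # rate units are arbitrary in this sketch

def allocate(capacity, deadline_flows, n_background):
    """deadline_flows: list of (flow_id, desired_rate, deadline)."""
    alloc = {}
    # Reserve the base rate for every flow up front.
    spare = capacity - BASE_RATE * (len(deadline_flows) + n_background)
    # Greedily satisfy deadline flows, earliest deadline first.
    for fid, want, _ in sorted(deadline_flows, key=lambda f: f[2]):
        granted = min(want, max(spare, 0.0))
        alloc[fid] = BASE_RATE + granted
        spare -= granted
    # Leftover capacity is shared equally by the non-deadline flows.
    fair = max(spare, 0.0) / n_background if n_background else 0.0
    background_rate = BASE_RATE + fair
    return alloc, background_rate

alloc, bg = allocate(10.0, [("a", 4.0, 1.0), ("b", 5.0, 2.0)], n_background=2)
```

Note that the background flows never drop to zero: even when deadline flows consume all spare capacity, they keep the base rate and can probe for bandwidth later.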
For router operation, each router needs to know its existing allocations, but to avoid heavy per-router processing and state, the end hosts are responsible for conveying the rate allocation for each flow, with the information carried in packet headers.
Impact:
In datacenters, each traffic flow typically corresponds to a job, and each job usually has a deadline, so finishing jobs and delivering data on time is crucial to maintaining high performance. This work does just that for datacenters.
Evidence:
They implemented the modified transport protocol and deployed D3 on a small testbed of twelve end hosts arranged across four racks. The evaluation has two goals: first, to determine the value of using flow deadline information to apportion network bandwidth, and second, to evaluate the performance of D3 purely as a congestion control protocol, without deadline information. They compared D3 with RCPdc and standard TCP in three scenarios: (1) flow burst microbenchmarks, (2) benchmark traffic, and (3) flow quenching; D3 performed extraordinarily well in all three.
Prior work:
They borrow their solution to the problems caused by misinformed routers from [11, 19].
Criticism:
As the authors themselves mention, deploying D3 in a real-world datacenter may be difficult in practice.
Please post your reviews of D3 as a group reply to this message. Since this message is late, if you already sent me a review I'll repost it in the morning. Don't pay attention to the title of the paper: in the case of these reviews, never is not better than late.
Rodrigo
--
You received this message because you are subscribed to the Google Groups "CSCI2950-u Spring 13 - Brown" group.
Title: Better Never than Late: Meeting Deadlines in Datacenter Networks
Authors: Christo Wilson, Hitesh Ballani, Thomas Karagiannis, Ant Rowstron
SIGCOMM'11
Novel Idea: Today's transport protocols, such as TCP, aim for fair network sharing, but their obliviousness to flow deadlines can compromise application performance in datacenter environments. Motivated by this, the paper designs, implements, and evaluates D3 (Deadline-Driven Delivery), a congestion control protocol that makes datacenters deadline-aware. D3 addresses the challenges particular to datacenter environments (e.g., flows are mostly short and have deadlines that vary significantly; traffic is bursty and diverse, with small RTTs) and uses flow deadline information to allocate network bandwidth appropriately.
Main Result: Through evaluation on a 19-node, two-tier datacenter testbed, D3 was found to easily outperform TCP in terms of short-flow latency and burst tolerance, even when no deadline information was used. When utilizing deadline information, D3 doubled the peak load supported by the datacenter. D3 can thus provide significant benefits compared to existing solutions.
Impact: The latency targets and workflow distribution of today's datacenter applications have implications for datacenter traffic: flows often need to complete within a deadline, otherwise their results become useless. The congestion control and flow scheduling mechanisms in today's datacenters aim for maximum throughput and fairness but are oblivious to flow deadlines, which can hurt application performance. This paper proposes an efficient protocol that is aware of flow deadlines and performs better than TCP, promising to solve this problem.
Evidence: The authors begin with a characterization of today's datacenters, discussing the particularities of datacenter applications, such as the partition-aggregate structure, application deadlines and their variability, the variety of flow sizes, and the frequency of missed deadlines. Then, through Monte Carlo simulations, they evaluate three bandwidth allocation schemes: fair share, EDF, and rate reservation (the last two are deadline-aware). The results on application throughput for a varying number of flows demonstrate that as the number of flows increases, deadline-aware approaches significantly outperform fair sharing. The authors then present the design and implementation of D3, specifically covering rate control, rate allocation, router operation, utilization and queuing, and burst tolerance. They deploy D3 on a small testbed structured like the multi-tier tree topologies used in today's datacenters and compare it against TCP, RCPdc (D3 in fair-share mode only, without any deadline awareness), and TCPpr (TCP with priority queuing), with application throughput (the number of flows finishing before their deadline) as the main comparison metric. For flow burst microbenchmarks, D3 was found to support almost twice as many concurrent senders as RCPdc while satisfying flow deadlines (3-4 times as many compared to TCP and TCPpr). For benchmark traffic (typical datacenter traffic patterns), D3 offered an order of magnitude improvement over TCP, and two orders of magnitude over TCPpr and RCPdc. With flow quenching, D3's performance degrades smoothly under extreme load.
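A toy version of such a simulation can illustrate why deadline-aware allocation wins as flows multiply. The flow sizes, deadlines, link capacity, and the pessimistic fair-share approximation below are all made-up parameters for this sketch, not the paper's experimental setup:

```python
import random

# Toy Monte Carlo: n flows share a bottleneck link of capacity 1.
# Fair share (pessimistic approximation): each flow gets 1/n for its
# whole lifetime, so a flow of the given size finishes at time size*n.
# EDF: flows run one at a time at full link rate, in deadline order.
# Application throughput = fraction of flows finishing before deadline.

def app_throughput(sizes, deadlines, use_edf: bool) -> float:
    n = len(sizes)
    met = 0
    if use_edf:
        t = 0.0
        for size, dl in sorted(zip(sizes, deadlines), key=lambda f: f[1]):
            t += size            # serialize at full link rate
            met += t <= dl
    else:
        for size, dl in zip(sizes, deadlines):
            met += size * n <= dl  # each flow gets 1/n throughout
    return met / n

random.seed(0)
n = 20
sizes = [random.uniform(0.5, 1.5) for _ in range(n)]
deadlines = [random.uniform(5, 15) for _ in range(n)]
fair = app_throughput(sizes, deadlines, use_edf=False)
edf = app_throughput(sizes, deadlines, use_edf=True)
```

With many contending flows, fair sharing stretches every flow's completion time past most deadlines, while the deadline-ordered schedule lets the urgent flows finish early, matching the trend the paper's simulations report.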
Prior Work: D3 was inspired by proposals to manage network congestion through explicit rate control [11,19]. Many of this paper's assumptions were based on previous work such as: [4] for the traffic characterization of datacenters and [8,23] for the problems of using TCP in datacenters. For topologies with multiple paths between endhosts [13,1,2,15], D3 relied on ECMP, VLB and other existing mechanisms used with TCP to ensure that a flow follows a single path.
Competitive Work: Motivated by the issues caused by using TCP in datacenters (network throughput drops caused by bursts of concurrent flows, and delays in query-response flows due to long flows), other works have proposed congestion control protocols or used UDP [4,22]. There are also application-level solutions, such as SEDA [25], which deal with variable application load but are not appropriate for network/transport problems, since the datacenter network is shared among multiple applications. Finally, EDF [21] is a deadline-aware solution in which the flow with the earliest deadline receives all the bandwidth until it finishes. The problem with EDF is that it is packet-based, so it works on per-hop packet deadlines, while datacenter applications have end-to-end flow deadlines. Thus, even though it is optimal when all deadlines can be met, it can drive the network into congestion collapse under heavy load. Furthermore, it still needs an endhost rate control design.
Reproducibility: The results of this paper are reproducible.
Criticism: This is a great paper that offers a valuable solution to the problems imposed by existing control protocols when used in datacenters, as they are unaware of flow deadlines. D3's design tries to maximize the number of flows that satisfy their deadlines, thus increasing application throughput. It is tolerant of bursty and diverse traffic with widely varying deadlines and manages to outperform other existing solutions. The paper offers a variety of results to support its claims, and the evaluation process is thorough: D3 was evaluated under various scenarios (burst traffic, benchmark traffic, flow quenching) and as a congestion control protocol operating without any deadline information. Its deployability and its performance under hard deadlines were also discussed. Overall, it is a very good paper, but it leaves room for improvement, as it rests on assumptions such as static per-flow paths and a priori knowledge of flow sizes and deadlines.
Question/Class Discussion: D3 relies on several assumptions: 1) there is a sufficient level of trust in the datacenter environment to delegate both the state regarding flow sending rates and rate policing to the endhosts; 2) flow sizes and deadline information are available at flow initiation time; and 3) per-flow paths are static. To what extent are these assumptions reasonable and realistic?
Paper Title:
Better Never than Late: Meeting Deadlines in Datacenter Networks
Author(s):
Christo Wilson, Hitesh Ballani, Thomas Karagiannis, Ant Rowstron
Date:
SIGCOMM’11, August 15-19, 2011, Toronto, Ontario, Canada.
Novel Idea:
In this paper, the authors present D3, a deadline-aware control protocol that is customized for the data center environment. D3 uses application deadline information to achieve informed allocation of network bandwidth.
Main Results:
D3 maximizes application throughput, accommodates flow bursts, and maximizes network throughput (high utilization and low queuing).
Impact:
D3 is practical. It handles small RTTs and a diverse traffic mix with differing deadlines. Datacenters can therefore meet deadlines while maintaining the highly coveted high utilization, and also avoid serving empty responses to users.
Evidence:
For D3 as a congestion control protocol, experiments show that D3 performs well in scenarios with both multiple hops as well as multiple bottlenecks.
To meet deadlines, D3 dynamically prioritizes flows based on deadlines and network conditions; as a result, it is better at satisfying flow deadlines in the presence of background traffic.
The authors demonstrate the flow arrival rate that can be supported while maintaining more than 99% application throughput, and show that D3 achieves its gains through smarter allocation of resources among the deadline flows.
Flow quenching leads to a smoother decline in performance at extreme loads. From the application perspective, fewer end users get empty responses.
Prior work:
D3 builds on deadline-based packet scheduling in networks, for example Earliest Deadline First (EDF), where routers prioritize packets based on their per-hop deadlines.
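A per-hop EDF queue of the kind described can be sketched with a priority queue; this is purely illustrative of the scheduling discipline, not router code from the paper:

```python
import heapq

# Sketch of per-hop EDF: the router dequeues whichever queued packet has
# the earliest per-hop deadline, regardless of arrival order.

class EdfQueue:
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker keeps heap comparisons on numbers

    def enqueue(self, packet, deadline: float):
        heapq.heappush(self._heap, (deadline, self._seq, packet))
        self._seq += 1

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

q = EdfQueue()
q.enqueue("pkt-late", deadline=9.0)
q.enqueue("pkt-urgent", deadline=1.0)
first = q.dequeue()  # the urgent packet jumps the queue
```

This also makes the review's later criticism concrete: the deadline here is attached to a single packet at a single hop, whereas datacenter applications care about end-to-end flow deadlines.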
Also relevant are rate reservation mechanisms, such as ATM's support for Constant Bit Rate (CBR) traffic, whose idea is to reserve bandwidth, or at least guarantee performance. However, reservation schemes are too heavyweight for the datacenter environment, where most flows are short and do not have a constant rate.
Given a flow's size and deadline, one can determine the rate needed to satisfy the deadline, and end hosts can then ask the network for that rate. With D3, the authors extend existing explicit rate control protocols to assign flows rates based on their deadlines instead of the fair share.
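A quick numeric illustration of that idea (the flow sizes, deadlines, and link capacity below are invented for the example):

```python
# Two flows sharing a 10 Mbps link: fair share would give each 5 Mbps,
# while deadline-based allocation gives each flow exactly the rate its
# deadline requires, leaving slack for other traffic.

link = 10.0  # Mbps

flows = {                 # flow -> (remaining Mbits, seconds to deadline)
    "query": (2.0, 0.5),  # needs 2.0 / 0.5 = 4 Mbps
    "update": (3.0, 1.0), # needs 3.0 / 1.0 = 3 Mbps
}

needed = {f: size / t for f, (size, t) in flows.items()}
fair = link / len(flows)             # 5 Mbps each under fair sharing
slack = link - sum(needed.values())  # capacity left over for new flows
```

Under fair sharing the "query" flow would get 5 Mbps when it only needs 4, while deadline-based rates satisfy both flows and free 3 Mbps for other traffic.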
Question & Criticism:
Deadlines are associated with flows, not packets. This is difficult when a flow is very short, because its deadline is then hard to meet; moreover, deadlines can vary widely. Can D3 handle a large number of flows with short deadlines and still meet them?