VL2: A Scalable and Flexible Data Center Network
Authors
Albert Greenberg, Srikanth Kandula, David A. Maltz, James R. Hamilton,
Changhoon Kim, Parveen Patel, Navendu Jain, Parantap Lahiri, Sudipta
Sengupta
Date
SIGCOMM'09 - August 2009
Novel Idea
The paper presents a scalable network architecture that delivers high
end-to-end bandwidth through a virtual layer that, besides routing
traffic between servers, can provide network isolation.
Main Results
The VL2 architecture spreads traffic flows across multiple paths to
exploit the network's high end-to-end bandwidth, resolves addresses via
a directory service that plays a role similar to a local ARP table, and
provides isolation between services through that same directory, all
using only commodity equipment and unmodified protocols.
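To make the two mechanisms above concrete, here is a minimal sketch,
not the authors' implementation. It assumes a directory that maps a
server's application address to the locator of its switch plus a
service name, and it picks an intermediate switch by hashing the flow
5-tuple. The addresses, switch names, service names, and function names
are all hypothetical.

```python
import hashlib

# Hypothetical directory: application address -> (locator address, service).
DIRECTORY = {
    "10.0.0.1": ("20.0.1.1", "search"),
    "10.0.0.2": ("20.0.2.1", "search"),
    "10.0.0.3": ("20.0.3.1", "ads"),
}

# Hypothetical intermediate switches that flows are bounced through.
INTERMEDIATE_SWITCHES = ["int-0", "int-1", "int-2", "int-3"]


def resolve(src_aa, dst_aa):
    """Return the destination's locator, or None when either address is
    unknown or the two servers belong to different services (isolation)."""
    src = DIRECTORY.get(src_aa)
    dst = DIRECTORY.get(dst_aa)
    if src is None or dst is None:
        return None
    if src[1] != dst[1]:
        return None
    return dst[0]


def pick_intermediate(flow):
    """Hash the flow tuple: packets of one flow share a path, while
    different flows are spread over all intermediate switches."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(INTERMEDIATE_SWITCHES)
    return INTERMEDIATE_SWITCHES[index]


def route(src_aa, dst_aa, src_port, dst_port, proto="tcp"):
    """Return (intermediate switch, destination locator), or None if the
    directory refuses to resolve the destination."""
    dst_la = resolve(src_aa, dst_aa)
    if dst_la is None:
        return None
    flow = (src_aa, dst_aa, src_port, dst_port, proto)
    return (pick_intermediate(flow), dst_la)
```

Hashing per flow rather than per packet avoids reordering packets
within a TCP connection. The isolation check lives in the lookup, so a
server that cannot resolve a destination cannot address it at all.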
Impact
As with fat-trees, the architecture affects the cost of building
data-centric clusters where efficient end-to-end communication is vital
(such as those that shuffle data, as MapReduce does).
Evidence
The paper has an interesting "Measurements and Implications" section
that justifies some of the assumptions and design choices behind the
architecture. The authors verify that most flows are small and that
the larger ones are around 100 MB (which relates to distributed file
system chunks); they show that there are many distinct traffic
patterns, so a technique specialized for one of them would not cover
all of them; they find that upcoming traffic cannot be predicted from
the traffic currently observed; and they argue that failures are
concentrated in a small fraction of the equipment.
For the performance evaluation, they show that VL2 comes close to full
end-to-end bandwidth, provides performance isolation (in the sense that
spikes in one flow do not affect the others), and degrades gracefully
under failures.
Prior Work
They build upon the Clos network design, the Valiant Load Balancing
scheme, and ECMP; they also mention using a Paxos implementation in the
design of the directory service.
Competitive Work
They mention Monsoon and Fat-Tree as alternatives that also use the
Clos topology, singling out the custom routing tables that fat-tree
relies on. They claim to achieve close-to-optimal performance with a
simpler architecture than fat-tree.
Other systems such as DCell and BCube are also mentioned.
Reproducibility
There is no detailed description of the evaluation environment, and
the description of the system, although technical, is meant to give an
overview of the network architecture and to justify the assumptions and
design choices behind it. The performance analysis in particular
therefore appears difficult to reproduce, but I do not think this
detracts from the paper's technical merit.
Questions + Criticism
[Criticism] I think that the VL2 architecture does achieve excellent
results and that it is architecturally simpler than the fat-tree
approach. However, it would be interesting to see how communication
patterns other than total shuffling behave on a VL2 network. The
fat-tree paper appears to test this more thoroughly (note that on the
random test, they get 93.5% of the peak performance, while VL2 gets
94%, which is the same in practice). [Question] How does VL2 measure
up on the other tests mentioned in the fat-tree paper?
Ideas for Further Work
The first idea that came to me was to compare VL2 and Fat-Tree networks
that are similar *in cost* on different communication patterns (as
discussed in the fat-tree paper).
On Mon, Oct 25, 2010 at 8:11 PM, Rodrigo Fonseca
<rodrigo...@gmail.com> wrote:
Authors: Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta
Date: SIGCOMM 2009
Novel idea: The authors rethink conventional network architecture and develop a novel system in which network topology plays a negligible role in data transfer and performance. The authors claim their system has the property of agility: "the capacity to assign any server to any service."
Main results: The authors implement an alternative network infrastructure, VL2, which possesses the abovementioned qualities of "agility." They also implement Valiant Load Balancing to spread traffic across all available paths without any centralized coordination.
Impact: VL2 does not require developers to make any major changes to their programs in order for them to run under VL2; therefore, it has the potential to add efficiency to a wide array of existing applications at low upfront cost.
Evidence: The authors implemented VL2 on an 80-server testbed using 10 commodity switches. They conducted an all-to-all data shuffle stress test, in which their prototype sustained an efficiency of 94% with a TCP fairness index of 0.995.
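For readers unfamiliar with the metric, the fairness index quoted above is presumably Jain's fairness index, computed over per-flow throughputs. A minimal sketch under that assumption follows; the function name and the sample rates are illustrative and do not come from the paper.

```python
def jain_fairness(rates):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Returns 1.0 when all flows get the same rate and 1/n when a single
    flow takes everything.
    """
    if not rates:
        raise ValueError("need at least one flow rate")
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if squares == 0:
        # All flows are at zero; treat that as (trivially) fair.
        return 1.0
    return (total * total) / (len(rates) * squares)


# Illustrative rates only: four equal flows, then one flow hogging the link.
equal_share = jain_fairness([5, 5, 5, 5])
one_hog = jain_fairness([10, 0, 0, 0])
```

An index of 0.995 therefore means the flows in the shuffle received nearly identical shares of the bandwidth.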
Prior work: The authors employ Valiant Load Balancing (a technique presented at HotNets in 2004).
Competitive work: The authors cite inspiration from the earlier Fat-tree paper by Al-Fares et al. as well as from Monsoon; however, their changes to existing systems are less invasive.
Reproducibility: Few details are given about the implementation of the directory server, so it appears that the system would be difficult to reproduce.
Criticism: If I were a system administrator, I would be hesitant to deploy such experimental changes to well-tested systems such as routing.