Reviews: VL2

Rodrigo Fonseca

Oct 25, 2010, 8:11:23 PM

to CSCI2950-u Fall 10 - Brown

Please post your reviews as a group reply to this message.
Rodrigo

Abhiram Natarajan

Oct 25, 2010, 10:19:14 PM

to CSCI2950-u Fall 10 - Brown

Paper Title: VL2: A Scalable and Flexible Data Center Network

Author(s): Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth
Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel,
Sudipta Sengupta

Date: 2009, SIGCOMM

Novel Idea: Usage of (1) Flat addressing to allow service instances to
be placed anywhere in the network (2) Valiant Load Balancing to spread
traffic uniformly across network paths (3) End-System based address
resolution to scale to large server pools, without introducing
complexity to the network control plane.
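Point (2) can be illustrated with a small sketch. This is not the paper's implementation: it assumes each flow is bounced off one intermediate switch chosen by hashing the flow's identifier, and the switch names and flow tuples are made up for illustration.

```python
import hashlib


def pick_intermediate(flow, intermediates):
    """Choose one intermediate switch for a flow by hashing its identifier.

    Hashing the flow (an assumption of this sketch) keeps every packet of
    that flow on the same path, while different flows spread out.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(intermediates)
    return intermediates[index]


# Hypothetical switch names and flows, for illustration only.
intermediates = ["int-1", "int-2", "int-3"]
flows = [("10.0.0.1", "10.0.1.7", 6, 40000 + i, 80) for i in range(3000)]

counts = {name: 0 for name in intermediates}
for flow in flows:
    counts[pick_intermediate(flow, intermediates)] += 1
print(counts)
```

With many flows, each intermediate switch should end up carrying roughly a third of them, and no central controller is involved in the choice.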

Main Result(s): VL2 is a practical network architecture that scales to
support huge data centers with uniform high capacity between servers,
performance isolation between services, and Ethernet layer-2
semantics.

Impact: VL2’s implementation leverages proven network technologies,
already available at low cost in high-speed hardware implementations,
to build a scalable and reliable network architecture.

Evidence: The authors claim that VL2 networks are readily deployable
today, and that they have built a working prototype.

Prior Work: There seems to be a lot of related work in many areas.
(1) Data-center network designs: Monsoon, Fat-tree, DCell, BCube
(2) Valiant Load Balancing
(3) Scalable Routing - Locator/ID Separation Protocol, SEATTLE
(4) Commercial Networks - Data Center Ethernet (DCE)

Competitive Work: The authors thoroughly analyse the pros and cons of
the VL2 design using measurement, analysis, and experiments. They show
that their prototype shuffles 2.7 TB of data among 75 servers in 395
seconds, which amounts to 94% of the maximum possible.

Reproducibility: Would be pretty hard given that a lot of
architectural elements seem pretty complicated, for example the
directory structure itself.

Criticism: The directory service seems difficult to implement! Also, I
am of the opinion that the authors should have provided performance
details about VL2.

Question: Why is it really better than Fat-Trees? Isn't centralised
control of switches (not there in VL2 and there in Fat-trees) a good
thing???

Visawee

Oct 25, 2010, 11:34:46 PM

to CSCI2950-u Fall 10 - Brown

Paper Title :
VL2: A Scalable and Flexible Data Center Network

Author(s) :
Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula,
Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, Sudipta
Sengupta

Date :
SIGCOMM’09, August 17-21, 2009, Barcelona, Spain

Novel Idea :
A new network architecture that scales to support huge data centers
with uniform high capacity between servers, performance isolation
between services, and Ethernet layer-2 semantics.

Main Result(s) :
VL2 provides an effective substrate for scalable data center
networks. It achieves
(1) 94% optimal network capacity
(2) a TCP fairness index of 0.995
(3) graceful degradation under failures with fast reconvergence
(4) 50K lookups/sec under 10ms for fast address resolution

Impact :
Cloud service programmers do not have to worry much about network
bandwidth constraints.
VL2 also enables agility- any service can be assigned to any server,
while the network maintains uniform high bandwidth and performance
isolation between services.

Evidence :
The authors set up several experiments running on an 80 server testbed
and 10 commodity switches to support their claim about VL2. The
results from the experiments show that
1. VL2 provides uniform high capacity (94% efficiency and the fairness
index of 0.995)
2. VL2 provides VLB Fairness (the VLB split ratio fairness index
averages more than 0.98 for all Aggregation switches over the duration
of the experiment)
3. VL2 provides performance isolation (one service’s goodput is
unaffected as another service ramps traffic up and down)
4. VL2 converges after link failures
5. Directory-system performs very well (it provides high throughput
and fast response time for lookups)
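The fairness numbers above are easier to read with the formula in hand. The sketch below assumes the index in question is Jain's fairness index, (sum x)^2 / (n * sum x^2), which the review does not name; the sample rates are made up.

```python
def jain_fairness_index(rates):
    """Return (sum x)^2 / (n * sum x^2) for a list of non-negative rates."""
    if not rates:
        raise ValueError("rates must be non-empty")
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if squares == 0:
        raise ValueError("at least one rate must be positive")
    return total * total / (len(rates) * squares)


equal = jain_fairness_index([1.0, 1.0, 1.0, 1.0])    # equal shares
hogged = jain_fairness_index([4.0, 0.0, 0.0, 0.0])   # one flow takes everything
print(equal, hogged)
```

Under this definition the index is 1 when all flows get the same rate and drops to 1/n when a single flow takes everything, so 0.995 would indicate nearly equal shares.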

Prior Work :
Valiant Load Balancing: it has been used in this work to provide
uniform capacity and performance isolation

Reproducibility :
The results are reproducible. The authors explain the
architecture of VL2 very clearly. The experiments are also explained
in detail.

Criticism :
The authors should conduct more experiments on a larger data center to
show the scalability of the architecture.

Shah

Oct 25, 2010, 8:22:12 PM

to CSCI2950-u Fall 10 - Brown

Title:

VL2: A Scalable and Flexible Data Center Network

Authors:

[1] Albert Greenberg
[2] James R. Hamilton
[3] Navendu Jain
[4] Srikanth Kandula
[5] Changhoon Kim
[6] Parantap Lahiri
[7] David A. Maltz
[8] Parveen Patel
[9] Sudipta Sengupta

Source and Date:

Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication,
Barcelona, Spain.

Novel Idea:

The novel idea in this paper is the introduction of VL2, a network
architecture that scales to support huge data centers with certain
specifications.

Main Result:

The authors strive to show that their working model of VL2 eliminates
the need for oversubscription in the data center network. They also
state VL2 can be realized with existing technology and infrastructure.

Impact:

Although this is not mentioned in the paper, the impact of this
architecture looks promising, because little infrastructure needs to
change in order to implement VL2.

Evidence:

The authors list in detail the hardware specs - but don't mention
their methodology with as much rigor.

Prior Work:

Though there is not a clear section on any prior work, the researchers
mention that work along this line was conducted by the authors of
Monsoon and Fat-tree.

Competitive Work:

The authors mention other work such as DCell and its successor,
BCube. They also talk about other work centered around Valiant
Load Balancing, Scalable Routing and Commercial Networks.

Reproducibility

The authors don't give enough details to make this paper reproducible.

Question:

Has this idea, since it requires no inherent changes, caught on?

Criticism:

The evaluation section in the paper is rather short. The authors don't
clearly detail their testing methodology.

Ideas for Further Work:

Have the authors been able to make a commercial or larger-scale
version of VL2? They mention that they have a working prototype for a
proof-of-concept but does this suffice?

Duy Nguyen

Oct 25, 2010, 9:48:08 PM

to brown-csci...@googlegroups.com

Paper Title:

VL2: A Scalable and Flexible Data Center Network

Authors:

Albert Greenberg, James R. Hamilton, Navendu Jain

Date:
2009/SIGCOMM

Novel Idea:
Another network architecture designed to support large-scale data centers.
To achieve server agility, the authors use flat addressing, Valiant
Load Balancing and end-system based address resolution.

Main Result
The authors first describe current data center architectures, which have many
drawbacks: limited server-to-server capacity, fragmentation of resources,
and poor reliability and utilization. Then they describe their solution and
evaluate it in detail.

Impact:
I think the strong point of this work is that servers can be placed anywhere in
the network, which is very important for fast deployment and reduces maintenance
overhead.

Evidence:
The testbed is described in section 5. Major goals the authors evaluated are:
uniform capacity (by doing a data shuffle stress test), fairness (data is
distributed fairly across the network), performance isolation (agility),...

Prior Work:
Based on Valiant Load Balancing and many techniques in data center networking:
Monsoon, Fat-tree,...

Reproducibility:
Yes

Question/Criticism:
N/A

Hammurabi Mendes

Oct 25, 2010, 10:56:53 PM

to brown-csci...@googlegroups.com

Paper Title

VL2: A Scalable and Flexible Data Center Network

Authors

Albert Greenberg, Srikanth Kandula, David A. Maltz, James R. Hamilton,
Changhoon Kim, Parveen Patel, Navendu Jain, Parantap Lahiri, Sudipta
Sengupta

Date

SIGCOMM'09 - August 2009

Novel Idea

The paper presents a scalable network architecture that leverages high
end-to-end communication using a virtual layer that, besides routing
traffic between servers, can provide network isolation.

Main Results

The VL2 architecture spreads network traffic flows across multiple
paths, leveraging high end-to-end bandwidth, resolves addresses via a
directory service similar to a local ARP table, and still provides
isolation between services through that same table, using only
commodity equipment and unmodified protocols.

Impact

As in the case of the fat-trees, the architecture has impacts on the
cost of building data-centric clusters where efficient end-to-end
communication is vital (such as the ones that shuffle data like
MapReduce).

Evidence

The paper has an interesting "Measurements and Implications" section
which provides some justifications for assumptions and design choices
taken in consideration for the architecture definition. They verify
that most flows are small, and bigger ones are in the range of ~100MB
(which has to do with distributed file system chunks); they also show
that there are innumerable kinds of traffic patterns, so a specialized
technique would not cover all of them; they verify that upcoming traffic
is also hard to predict from the ongoing traffic; they also argue that
failures are concentrated on a small fraction of the equipment. Again,
this section gives us insight on some of their arguments and design
choices.

For the performance evaluation part, they show that they come close to
full end-to-end communication performance, performance isolation (in
the sense that spikes in one flow do not affect the other), and
graceful degradation of performance under failures.

Prior Work

They build upon the Clos network design, the Valiant Load Balancing
scheme, ECMP, and they also mention that they use a Paxos
implementation for the design of the directory service.

Competitive Work

They mention Monsoon and Fat-Tree as alternatives that also use the
Clos topology, particularly the fat-tree custom routing tables. They
claim that they achieve a close-to-optimum performance with a simpler
architecture than fat-tree.

Other systems such as DCell and BCube are also mentioned.

Reproducibility

There is no detailed description of the evaluation environment, and
the description of the system, although technical, is meant to
overview the network architecture and to justify the assumptions and
design choices taken into consideration. Therefore, it appears
difficult to reproduce the performance analysis, particularly, but I
think this does not demerit the paper in the technical sense.

Questions + Criticism

[Criticism] I think that the VL2 architecture has indeed excellent
results and it is architecturally simpler than the fat-tree approach.
However, it would be interesting to see how other communication
patterns, besides total shuffling, behave on a VL2 network. The
fat-tree paper appears to make more adequate tests on this matter
(note that on the random test, they get 93.5% of the peak performance,
while VL2 gets 94% - the same in practice). [Question] How does VL2
measure up against the other tests mentioned in the fat-tree paper?

Ideas for Further Work

The first idea that came to me was comparing a similar (*in cost*) VL2
and Fat-Tree network on different communication patterns (as discussed
in the fat-tree paper).

Siddhartha Jain

Oct 25, 2010, 11:50:13 PM

to brown-csci...@googlegroups.com

Title: VL2

Novel Idea:

A new architecture to make big datacenters more agile - i.e. more able to support
a variety of services and spikes in the usage of those services without performance
degradation in other services.

Main Results:

The architecture is described and some experimental results on a working prototype
are presented.

Impact:

Could be useful in the future as more services move on to the cloud

Evidence:

Good results. Performance closely matches theoretical expectations. TCP fairness
is provided as traffic is evenly split across the network.

Prior Work:

Monsoon, Fat-tree, Dcell and Bcube

Reproducibility:

Not reproducible, as it would require a cluster, and the source code doesn't seem
to be available.

Question:

We know that a lot of current applications can, for instance, be deployed on MapReduce.
How would a network configured for a distributed computation framework like MapReduce
perform against VL2, both in raw performance and in the variety of applications that
could run well on each network?

How much more would VL2 cost to maintain vs. traditional configurations?

Basil Crow

Oct 25, 2010, 11:37:08 PM

to brown-csci...@googlegroups.com

Title: VL2: A Scalable and Flexible Data Center Network

Authors: Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta

Date: SIGCOMM 2009

Novel idea: The authors rethink conventional network architecture and develop a novel system in which network topology plays a negligible role in data transfer and performance. The authors claim their system has the property of agility: "the capacity to assign any server to any service."

Main results: The authors implement an alternative network infrastructure, VL2, which possesses the abovementioned qualities of "agility." They also implement Valiant Load Balancing to spread traffic across all available paths without any centralized coordination.

Impact: VL2 does not require developers to make any major changes to their programs in order for them to run under VL2; therefore, it has the potential to add efficiency to a wide array of existing applications at low upfront cost.

Evidence: The authors implemented VL2 on an 80 server testbed using 10 commodity switches. They conducted an all-to-all data shuffle stress test, in which their prototype sustained an efficiency of 94% with a TCP fairness index of 0.995.

Prior work: The authors employ Valiant Load Balancing (a technique presented at HotNets in 2004).

Competitive work: The authors cite inspiration from the early Fat-tree paper by Al-Fares et al as well as Monsoon; however, their changes to existing systems are less invasive.

Reproducibility: Few details are given about the implementation of the directory server, so it appears that the system would be difficult to reproduce.

Criticism: If I were a system administrator, I would be hesitant to deploy such experimental changes to well tested systems such as routing.

James Chin

Oct 25, 2010, 8:45:25 PM

to CSCI2950-u Fall 10 - Brown

Paper Title: “VL2: A Scalable and Flexible Data Center Network”

Author(s): Albert Greenberg, James R. Hamilton, Navendu Jain,
Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz,
Parveen Patel, and Sudipta Sengupta

Date: 2009 (SIGCOMM ‘09)

Novel Idea: This paper presents VL2, a practical network architecture
that scales to support huge data centers with uniform high capacity
between servers, performance isolation between services, and Ethernet
layer-2 semantics. VL2 uses (1) flat addressing to allow service
instances to be placed anywhere in the network, (2) Valiant Load
Balancing to spread traffic uniformly across network paths, and (3)
end-system based address resolution to scale to large server pools,
without introducing complexity to the network control plane.

Main Result(s): The authors’ VL2 prototype shuffles 2.7 TB of data
among 75 servers in 395 seconds -- sustaining a rate that is 94% of
the maximum possible with a TCP fairness index of 0.995.

Impact: To be profitable, data centers for cloud services must achieve
high utilization, and the key to this is the property of agility --
the capacity to assign any server to any service. Unfortunately, the
designs for today’s data center networks prevent agility, but the
authors overcome these limitations by building a network that meets
the following three objectives: uniform high capacity, performance
isolation, and Layer-2 semantics.

Evidence: The authors evaluated VL2 by using a prototype running on an
80-server testbed and 10 commodity switches. Their goals were to show
that VL2 could be built from components that are available today, and
second, that their implementation met the objectives described in
Section 1 of the paper.

Prior Work: Valiant introduced VLB as a randomized scheme for
communication among parallel processors interconnected in a hypercube
topology. VLB has also been proposed, with modifications and
generalizations, for oblivious routing of variable traffic on the
Internet under the hose traffic model.

Competitive Work: Related work includes proposals in datacenter
network designs (Monsoon and Fat-tree), scalable routing (Locator/ID
Separation Protocol and SEATTLE), and commercial networks (Data Center
Ethernet).

Reproducibility: This paper is vague on how VL2 was evaluated exactly,
so the results are not readily reproducible.

Question: Would VL2 vastly improve the performance of many of today’s
cloud services?

Criticism: The paper doesn’t describe exactly how VL2 was evaluated.

Ideas for further work: Perform the evaluation of VL2 on an even
larger scale.

Tom Wall

Oct 25, 2010, 8:43:14 PM

to CSCI2950-u Fall 10 - Brown

VL2: A Scalable and Flexible Data Center Network

Albert Greenberg James R. Hamilton Navendu Jain
Srikanth Kandula Changhoon Kim Parantap Lahiri
David A. Maltz Parveen Patel Sudipta Sengupta
SIGCOMM 2009

Novel Ideas:
Traditional data centers have problems with load balancing,
application isolation and bandwidth utilization. The authors aim to fix
these problems with a virtualized routing solution. With VL2, each
application running in a data center thinks it is running alone, and
has full, unrestricted access to the network. Behind the scenes, VL2
acts as a layer two routing mechanism and does valiant load balancing
to maximize available bandwidth and allocate resources. A new
distributed directory system is used to separate IP addresses (called
AA's) from physical locations (LA's). This allows for portability of
applications, facilitating migration while also enhancing scalability
by avoiding the traffic from ARP and DHCP multicasts that tends to
hinder large networks.
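The AA/LA split can be pictured with a toy lookup. The sketch below is only an illustration under stated assumptions: the class, the names and the dictionary-shaped packet are hypothetical, and it assumes the sender wraps each packet with the destination's current LA after asking the directory.

```python
class Directory:
    """Toy stand-in for the directory system: maps an AA to an LA."""

    def __init__(self):
        self._aa_to_la = {}

    def register(self, aa, la):
        # Called when a service instance is placed or moved.
        self._aa_to_la[aa] = la

    def lookup(self, aa):
        return self._aa_to_la[aa]


def encapsulate(directory, src_aa, dst_aa, payload):
    """Wrap a packet so the network only needs to route on the LA."""
    return {
        "outer_dst": directory.lookup(dst_aa),
        "inner": {"src": src_aa, "dst": dst_aa, "payload": payload},
    }


directory = Directory()
directory.register("aa-app-1", "la-rack-3")   # hypothetical names
before = encapsulate(directory, "aa-app-2", "aa-app-1", b"hello")
directory.register("aa-app-1", "la-rack-9")   # the application migrates
after = encapsulate(directory, "aa-app-2", "aa-app-1", b"hello")
print(before["outer_dst"], after["outer_dst"])
```

The application keeps addressing "aa-app-1" throughout; only the directory entry changes when the instance moves, which is what makes migration cheap in this picture.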

Main Result:
They implement a smallish VL2 prototype containing 80 servers. VL2
addresses the concerns that they outlined and delivers better fairness
and performance than traditional data center architectures.

Evidence:
They run a number of tests on their prototype to try to address the
questions and concerns put forth in their design. Their first test has
each server sending 500 MB of data to each of the other servers for a
total of 2.7 TB of data. It does so relatively quickly and with
a high degree of network utilization. Their next test demonstrates
that VL2 is mostly fair, coming within 96% to 100% of optimal
fairness depending on the workload. Finally they test the system's
agility and isolation characteristics with promising results. They
also demonstrate that their directory system scales well and doesn't
really hinder performance.
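As a rough sanity check on the shuffle numbers, the arithmetic can be written out. The sketch assumes the 75 servers and 395 seconds quoted in other reviews in this thread, and it assumes decimal megabytes; the function name is made up.

```python
def shuffle_stats(servers, per_pair_bytes, duration_s):
    """Return (ordered pairs, total bytes, aggregate Gb/s) for an all-to-all shuffle."""
    pairs = servers * (servers - 1)          # every server sends to every other
    total_bytes = pairs * per_pair_bytes
    aggregate_gbps = total_bytes * 8 / duration_s / 1e9
    return pairs, total_bytes, aggregate_gbps


# 75 servers and 395 s come from other reviews; 500 MB is read as 500 * 10**6 bytes.
pairs, total_bytes, aggregate_gbps = shuffle_stats(75, 500 * 10**6, 395)
print(pairs, total_bytes / 1e12, round(aggregate_gbps, 1))
```

Under these assumptions the total lands close to the quoted 2.7 TB. The 94% efficiency figure depends on how the paper defines the achievable maximum, which the reviews do not spell out, so it is not recomputed here.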

Impact:
This seems like a great way to run your data center, if Microsoft
shares it!

Reproducibility:
While the paper provides a good overview, it is a complex system and
they are pretty light on some of the details, so it might be tough to
rebuild this system. The tests were for the most part very well done
though, and should be easy to verify if you had access to VL2.

Similar Work:
Work with Fat Trees such as today's other paper and Monsoon address
similar problems of scalability and bandwidth utilization. VL2 is
similar in those aspects, but it also does a lot more with regards to
isolation and fault tolerance. DCell and BCube are similar projects
which similarly rely on the server rather than the switch for more of
the routing than normal, however, they aren't as scalable as VL2.

Questions/Criticisms:
How many servers are typically used per application, and how many
applications run in a data center? They have 80 servers, but their
tests in 5.3 only use two applications on 37 servers. This test
reports good results, but is that really fair when they are only using
half of the network?

Future Work:
Perhaps try implementing VL2 using the fat trees from the other
paper. They mention in 4.1 why they don't do that (reduced wiring and
simplified load balancing) but dealing with these complications might
help reduce costs by dropping the need for the higher bandwidth, more
expensive 40GbE switches.

Zikai

Oct 26, 2010, 10:14:45 AM

to CSCI2950-u Fall 10 - Brown

Paper Title: VL2: A Scalable and Flexible Data Center Network
Author(s): Albert Greenberg (Microsoft Research) et al.
Date/Conference: ACM SIGCOMM 09

Novel Idea: (1) VL2’s design explores a new split in responsibilities
between host and network. Specifically, it creates a network
architecture by making minimal changes to hardware of commodity
switches or servers and keeping compatibility with legacy
applications. Extensive modifications are on software and OS (layer
2.5 shim in servers’ network stack).
(2) Apply VLB (Valiant Load Balancing) in a new context: the inter-
switch fabric of a data center. VLB helps to spread traffic across all
available paths without any centralized coordination or traffic
engineering.

Main Results: (1) Perform a thorough study into traffic patterns in a
production data center. The results show that traffic patterns are
highly divergent and the hierarchical topology is intrinsically
unreliable.
(2) Design, build and deploy VL2 in a small server cluster. Run a
series of experiments to validate whether VL2 has achieved uniform high
capacity, performance isolation and layer-2 semantics.

Evidence: In Part 5, the authors evaluate VL2 using a prototype running on
a small cluster. They test whether VL2 provides uniform high capacity,
VLB fairness and performance isolation. They also measure how fast VL2
converges after link failures and performance of the directory system.

Prior Work: Clos topology, Valiant Load Balancing, separation of
server names (application-specific addresses) from their locations
(location-specific addresses), Paxos, hose model

Reproducibility: The paper’s description of the VL2 design in Part 4
covers the techniques used and the system architecture in general terms.
However, a real implementation would have to settle many vague points
by experiment. Though the design is hard to reproduce, if it were
available from Microsoft, the experiments would be relatively easy to
reproduce.

Question: Is it possible that some traffic other than unlimited-rate
UDP and large numbers of short TCP connections violates the hose model
but is still common in data centers? (If so, VL2 may lose performance
isolation in some situations.)
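To make the question concrete, here is a minimal check of the hose model as it is usually stated: no server sends or receives more in total than its interface rate. The function name, the matrices and the 1.0 capacity are illustrative assumptions, not taken from the paper.

```python
def satisfies_hose_model(traffic, capacity):
    """traffic[i][j] is the offered rate from server i to server j."""
    n = len(traffic)
    for i in range(n):
        sent = sum(traffic[i])
        received = sum(traffic[j][i] for j in range(n))
        if sent > capacity or received > capacity:
            return False
    return True


# Each server sends and receives exactly its capacity: allowed.
balanced = [[0.0, 0.5, 0.5],
            [0.5, 0.0, 0.5],
            [0.5, 0.5, 0.0]]
# Two senders converge on server 2 and exceed its receive capacity.
incast = [[0.0, 0.0, 0.9],
          [0.0, 0.0, 0.9],
          [0.0, 0.0, 0.0]]
print(satisfies_hose_model(balanced, 1.0), satisfies_hose_model(incast, 1.0))
```

Traffic that does not back off, such as the unlimited-rate UDP mentioned in the question, could keep offering a matrix like the second one, which is the case where isolation would be at risk.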

Criticism:
When the authors do their evaluations in Part 5, they test individual
objectives separately. Each experiment is set up to validate one
property. None is able to prove that all these properties can be
achieved together. For example, in the performance isolation evaluation
(Part 5.3), only two services exist and the traffic they generate is
far below that generated in Part 5.1. Therefore, when
multiple services are running at near-maximum capacity, performance
isolation may be lost.

Another potential problem is that even if VL2 works perfectly in a
small cluster (80 machines and 10 switches), whether it can scale to a
data center with thousands of machines and hundreds of switches is
doubtful.

Matt Mallozzi

Oct 26, 2010, 11:50:13 AM

to brown-csci...@googlegroups.com

Matt Mallozzi

10/26/10

Title:

VL2: A Scalable and Flexible Data Center Network

Authors:

Greenberg, Hamilton, Jain, Kandula, Kim, Lahiri, Maltz, Patel, Sengupta

Date:

2009

Novel Idea:

Make a large cluster appear to be interconnected by one gigantic switch
with regards to performance and addressing. Also, try to isolate unrelated
services from one another, making each service appear as if it has its own
switched network. And do this all using cheap commodity hardware.

Main Results:

A working system that meets the goals. This system removes the need for
oversubscription in network design - each node in the cluster can
communicate equally quickly with any other node, no matter if they are in
the same rack or not. This inter-node bandwidth approaches the maximum
possible from each node's network interface card.

Impact:

This could have a huge impact on cloud providers, especially software as a
service - this allows virtual machines to be migrated to any system in the
cluster rather than just a "close" one, as the virtual layer 2 semantics
allow a virtual machine to keep its IP address rather than needing to
venture outside of a small VLAN.

Evidence:

Experiments on another small cluster, but more convincing analysis than that
of FatTrees.

Prior Work:

This system draws inspiration from Valiant Load Balancing and Scalable
Routing systems.

Competitive Work:

This builds upon the work of FatTrees, mainly adding the isolation property
between services and the layer 2 semantics while modifying operating systems
rather than routers/switches.

Reproducibility:

Not as reproducible as FatTrees - even though I've given up on seeing code
open-sourced, some explicit pseudocode is always nice.

Criticism:

The cluster from which they gather data about network traffic is much
larger than their experimental cluster - it seems like one of these clusters
would be outside the intended use of their system.

Ideas For Further Work:

Test common virtual machine migration patterns over VL2.
