Paper Title | A Scalable, Commodity Data Center Network Architecture |
Author(s) | Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat |
Date | 2008 |
Novel Idea | As clusters built from commodity hardware become more popular, it has become apparent that inter-node bandwidth is often a major bottleneck. The authors identify inefficiencies in the traditional networking topology used in these clusters as a primary cause, and propose an alternative 'Fat Tree' topology which will allow any node to communicate with any other at full capacity. |
Main Result(s) |
They identify oversubscription as the core inefficiency present in traditional topologies, defining it as the ratio of the aggregate bandwidth of the hosts to the bisection bandwidth of the network; a larger ratio means less of the hosts' bandwidth can actually cross the network at once. They propose a topology, the 'Fat Tree', which can in theory provide a 1:1 oversubscription ratio. It does this by providing a more densely connected network, such that traffic out of a given pod of nearby nodes can be routed through the network in a multitude of ways. Another primary issue with traditional network topologies is that the top-level switches must be expensive, high-end 10 GigE machines. In the Fat Tree layout, a large number of commodity switches, each responsible for many fewer nodes, defrays the cost considerably: the price per port per bit/sec does not grow linearly with switch class, so using more, cheaper machines yields substantial savings. Additionally, power usage and heat generation per port per bit/sec are not linear either, so Fat Tree deployments waste far less energy for the same performance than those using fewer, higher-end switches. The Fat Tree topology comes with some challenges. For one, standard IP routing protocols tend to choose one path between two points and stick to it, which would immediately negate the advantage conferred by the Fat Tree layout. To offset this, the authors introduce their own routing scheme, which uses the ID byte of each destination host as a source of deterministic entropy for spreading connections evenly over the available paths. Another challenge is the extremely large number of physical connections required, especially as clusters become large. They present a fairly reasonable-looking model of how the cables can be bundled and arranged to minimize the awkwardness of this, but it is ultimately an unavoidable downside of having each switch at each level connect to k/2 switches at the adjacent level (where k is the number of ports per switch, which is also the number of pods). |
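To make the scale argument concrete, here is a minimal back-of-the-envelope sketch in Python (my own; the function and variable names are illustrative, not from the paper) showing how the host, switch, and oversubscription numbers fall out of the switch port count k:

# Sizing a k-ary fat tree built from k-port switches (illustrative sketch).
# k pods, each with k/2 edge and k/2 aggregation switches; (k/2)^2 core
# switches; and k^3/4 hosts in total.
def fat_tree_stats(k, link_gbps=1.0):
    assert k % 2 == 0, "k must be even"
    hosts = k ** 3 // 4
    edge = aggregation = k * (k // 2)
    core = (k // 2) ** 2
    # Total core-layer capacity is (k/2)^2 switches * k ports = k^3/4 links,
    # exactly one per host, which is why the oversubscription ratio is 1:1.
    oversubscription = (hosts * link_gbps) / (core * k * link_gbps)
    return hosts, edge + aggregation + core, oversubscription

print(fat_tree_stats(48))  # (27648, 2880, 1.0) with 48-port switches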
Impact | I never have anything interesting to write in this section. |
Evidence |
They evaluate their work on a small virtual cluster running on an even smaller set of actual machines. While their data seems to solidly back up their claims about the theoretical superiority of the Fat Tree topology, it leaves many unanswered questions about the real-world feasibility of implementing it. For instance, they do not address how hard it will be for cluster operators to obtain or create the router-level changes that Fat Trees require (I presume using Click routers at large scale is not feasible in the real world). |
Prior Work | They use the Click modular router, described in Kohler, Morris, Chen, Jannotti (hi JJ!), and Kaashoek (2000), to test the viability of the routing changes they need to make. |
Reproducibility | They provide detailed algorithms for just about everything; reproducing their virtual experiment should be very possible. |
Question | It seems like a big part of what makes this awkward is the excessive cabling everywhere - maybe I'm wrong and this is really not a big issue, but I kind of expect it would be. If so, I wonder if it's possible to do some reasoning about the kinds of flows that happen, and strategically remove top layer connections between pods and Core routers that service only pods this pod will talk to infrequently, or second-layer connections between edge routers and aggregation routers they will rarely use. This does require a lot of prior knowledge about the nature of the computation the cluster will be used for though... |
Criticism |
A mild criticism of the organization of the paper: it frequently references later results in the opening sections without enough context for the reader to understand what's going on without having read the later sections. |
Ideas for further work | Like, actually building one, and stuff. |
Authors: Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat
Date: SIGCOMM 2008
Novel idea: The authors rethink conventional data center network architecture and develop a novel system that improves inter-node communication bandwidth within a data center.
Main result(s): The authors modify existing router software to take advantage of a Clos topology known as a fat-tree. This modified system achieves significant cost savings for the bandwidth delivered, in comparison to previous systems.
Impact: This system does not require developers to make any major changes to their programs; therefore, it has the potential to add efficiency to a wide array of existing applications at low upfront cost.
Evidence: The authors evaluate their system on a cluster of 20 switches and 16 end hosts and run a variety of benchmarks.
Prior work: The authors base their work on ideas introduced by Charles Clos in 1953 in the field of telephone switching.
Reproducibility: Although no source code is available, the authors describe their modifications to core algorithms extensively and provide several examples of pseudocode; therefore, their work should be fairly reproducible.
Criticism: If I were a system administrator, I would be hesitant to deploy such experimental changes to well tested systems such as routing.
A Scalable, Commodity Data Center Network Architecture
Authors
Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat
Date
SIGCOMM '08, August 2008
Novel Idea
The paper proposes an architecture based on commodity Ethernet
switches that provides full bisection bandwidth in a scalable manner.
Main Results
The fat-tree architecture allows up to ~27,000 hosts (27,648 with
48-port switches) to communicate at the full bandwidth of their
network interfaces. It requires no changes to the hosts, but does
require some changes to the switches.
Impact
The impact is on the cost of building data centers in which nodes can
potentially communicate at the full speed of their network
interfaces. Part of this comes from the solution being based on
commodity equipment (with no or few modifications).
Evidence
The authors present cost estimates, and describe the architecture and
the mechanisms that should be incorporated into the switches to
improve performance in some cases (fairly distributing routes among
ports).
To that end, they describe a two-level routing table (which they note
can be implemented in hardware with CAMs), flow classification and
reassignment, and finally the scheduling of large flows (to minimize
route conflicts).
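To make the two-level lookup concrete, here is a minimal Python sketch (my own; the table contents and addresses are illustrative, not taken from the paper). Intra-pod destinations terminate on a prefix match, while inter-pod traffic falls through to a suffix match on the low-order host byte, spreading flows across the upward ports:

import ipaddress

# Illustrative two-level table for one switch: entries with a port
# terminate the lookup; the /0 entry falls through to the suffix table.
PREFIXES = [
    ("10.2.0.0/24", 0),   # same-pod subnet -> downward port 0
    ("10.2.1.0/24", 1),   # same-pod subnet -> downward port 1
    ("0.0.0.0/0", None),  # everything else -> consult the suffix table
]
SUFFIXES = {2: 2, 3: 3}   # low-order host byte -> upward port

def lookup(dst_ip):
    dst = int(ipaddress.IPv4Address(dst_ip))
    for prefix, port in PREFIXES:              # first level: longest prefix
        net = ipaddress.ip_network(prefix)
        shift = 32 - net.prefixlen
        if dst >> shift == int(net.network_address) >> shift:
            if port is not None:
                return port
            break                              # fall through to second level
    return SUFFIXES.get(dst & 0xFF)            # second level: suffix match

print(lookup("10.2.1.7"))   # intra-pod destination -> port 1
print(lookup("10.4.1.2"))   # inter-pod: host byte 2 -> upward port 2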
The paper also proposes a way to pack the equipment in order to tackle
the high demand for wiring.
Prior Work
They base their work on techniques for building scalable
interconnection networks for MPPs. They mention some vendors that
used fat-trees in such systems (Thinking Machines, SGI), and also
note that Myrinet/InfiniBand switches use fat-trees (though they
deliberately avoid Myrinet/InfiniBand, giving preference to commodity
Ethernet equipment).
Competitive Work
The authors mention toroidal interconnect networks for MPPs, which
could potentially be useful for cluster environments, but they argue
that the associated wiring cost is prohibitive, as is scattering
routing decisions across the nodes.
Finally, they cite OSPF2 and ECMP, which they rule out because of
packet reordering (caused by round-robin or random port choosing),
growth in routing table size (when prefixes are split; see
"Questions + Criticism" below), and forwarding decisions that are
oblivious to bandwidth (the hash-based scheme depends only on
source/destination addresses).
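As a toy illustration of the bandwidth-obliviousness point (my own sketch, not code from the paper): a static hash over a flow's endpoints picks the same port regardless of load, so two large flows can collide on one uplink while others sit idle.

import hashlib

def ecmp_port(src, dst, num_ports=4):
    # Static, load-unaware choice: depends only on the flow's endpoints.
    digest = hashlib.md5(f"{src}->{dst}".encode()).digest()
    return digest[0] % num_ports

# Two large flows may hash to the same uplink no matter how busy it is:
print(ecmp_port("10.0.1.2", "10.4.1.2"))
print(ecmp_port("10.0.1.3", "10.2.0.9"))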
Reproducibility
The difficulty lies in the technical details involved in the general
architecture and in the particular implementation the authors have
chosen. However, the essential information, such as the flow
classification algorithm, the flow scheduler methodology, the
two-level routing, and the algorithm for producing the switch routing
tables, is provided, except for a few small parameter details
(phrases such as "[...] every few seconds [...]" appear in the text).
Overall, I would say that the experiments are reproducible.
Questions + Criticism
The paper presents heat dissipation and power demands normalized to
the aggregate bandwidth of the switches. I think showing those
numbers on an absolute scale is important, because the fat-tree
architecture uses more switches in total.
Along the same lines, how much would the system consume and dissipate
when we consider the TCO over time?
Still on the subject of heat: how much does the packaging affect this
metric? It appears to be mostly about reducing cable length, but
cables look pretty cheap compared to the electrical power spent over
the months to cool the system down.
A small technical question: how exactly does the technique for
splitting prefixes work in ECMP?
Ideas for Further Work
The first idea is measuring power consumption and heat dissipation
more carefully, but the authors mention that this is already ongoing
work.
It would also be interesting to measure the impact of the fat-tree
architecture on a MapReduce system (in the shuffle between the map
and reduce phases).