Reviews for Alfares08FatTree


Rodrigo Fonseca

Mar 13, 2013, 9:26:10 PM
to csci2950u-...@googlegroups.com
Hi,

Please post the reviews for the FatTree paper as a group reply to this message.

Thanks!
Rodrigo

Shu Zhang

Mar 13, 2013, 10:10:13 PM
to csci2950u-...@googlegroups.com
This paper is about data center topology. Unlike previous papers we read in class, it is concerned with practical problems, such as the expense of constructing a data center, energy consumption, and wire length and its related costs.
The paper first argues that the principal bottleneck in large-scale clusters is often inter-node communication bandwidth, so it proposes a new topology that maximizes communication bandwidth. Beyond this goal, the design also considers economies of scale and backward compatibility. The core of the paper is the fat-tree architecture.
The paper reviews current data center network topologies before discussing the fat-tree. One problem is oversubscription: splitting the bandwidth of powerful switches among many hosts saves cost, but it also means hosts cannot use their full bandwidth. Another problem is that multi-path routing is a static algorithm. Finally, current architectures are expensive because they rely on high-end switches.
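The oversubscription ratio mentioned here is just the ratio of aggregate downstream host bandwidth to uplink bandwidth. A minimal sketch (the port counts and speeds below are made-up illustration values, not from the paper):

```python
# Hypothetical illustration: oversubscription ratio of one switch layer.
# Ratio = total downstream host bandwidth / total uplink bandwidth.
def oversubscription(num_hosts, host_gbps, num_uplinks, uplink_gbps):
    return (num_hosts * host_gbps) / (num_uplinks * uplink_gbps)

# e.g. 48 hosts at 1 Gbps behind 4 x 10 Gbps uplinks -> 48/40 = 1.2:1
print(f"{oversubscription(48, 1, 4, 10)}:1")
# a 1:1 ratio means hosts can use their full bandwidth
print(f"{oversubscription(48, 1, 48, 1)}:1")
```

A ratio above 1:1 is exactly the under-utilization the review describes: hosts nominally have full-speed NICs but cannot all use them at once.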
The fat-tree deals with this in the opposite way. Inspired by the fat-trees used in telephone networks, it builds a multi-rooted tree out of low-cost switches. The basic element is the pod, which contains a group of switches arranged in two levels; the lower-level switches, together with the hosts connected to them, form subnets.
The motivation is to spread outgoing traffic from any given pod as evenly as possible among the core switches. The paper assumes the administrator knows everything about the network and that switches and hosts are wired up in advance. It then advocates installing a two-level routing table in the pod switches. The routing tables in the core switches remain one level, because a core switch only needs to know which pod to deliver a packet to.
The routing algorithm is straightforward once the routing tables are installed correctly, but there are things to optimize. First, switches can be implemented to identify packets belonging to the same flow. Another is load balancing of flows. A further concern is fault tolerance: the topology guarantees there is always a backup link, and one technique worth noticing is that when a link fails, a broadcast notifies the other switches, which is a little different from traditional routing algorithms.
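As a rough sketch of the two-level lookup described above (the addressing scheme and table layout here are simplified assumptions, not the paper's exact data structures): a pod switch first tries prefix entries for destinations inside its own pod, and only on a miss falls back to a suffix match on the host byte, which spreads outgoing flows across the uplinks toward the core.

```python
# Sketch of a two-level routing lookup in a pod switch, assuming the
# 10.pod.switch.host addressing style. Prefix entries terminate intra-pod
# traffic; suffix entries (keyed by host ID) pick an uplink to the core.
def two_level_lookup(dst, prefix_table, suffix_table):
    pod, switch, host = dst[1], dst[2], dst[3]
    # Level 1: terminating prefix entries (destinations in this pod).
    for (p, s), port in prefix_table.items():
        if (p, s) == (pod, switch):
            return port
    # Level 2: suffix match on the host byte chooses an uplink port,
    # spreading traffic for remote pods evenly across the core.
    return suffix_table[host % len(suffix_table)]

prefix_table = {(1, 0): 0, (1, 1): 1}   # subnets in this pod -> down ports
suffix_table = {0: 2, 1: 3}             # host-ID suffix -> uplink port
print(two_level_lookup((10, 1, 1, 3), prefix_table, suffix_table))  # intra-pod -> port 1
print(two_level_lookup((10, 2, 0, 3), prefix_table, suffix_table))  # remote pod -> uplink 3
```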
An interesting part is Section 3.9, where the paper compares the energy consumption of the fat-tree to show that the approach is more energy-efficient, so the user pays less. In Section 4, the paper builds the prototype on Click, with each piece of functionality implemented as a Click module.
Tables 2 and 3 are the main experimental results. The best performance belongs to the fat-tree with the two-level table plus flow classification plus flow scheduling; the most notable result is that utilization is almost 100% in many test cases.
Section 6 discusses an interesting issue: the cost of the wires connecting so many switches. The paper proposes a rack-based packaging solution and also considers how to shorten the distances between pods to reduce cost.


Zhiyuan "Eric" Zhang

Mar 13, 2013, 10:14:04 PM
to csci2950u-...@googlegroups.com

Paper Title

A Scalable, Commodity Data Center Network Architecture

 

Authors

Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat

 

Date

SIGCOMM’08, August 17–22, 2008, Seattle, Washington, USA.

 

Novel Idea

Large-scale data centers usually have significant bandwidth requirements, but in the traditional tree-like architecture, inter-node communication bandwidth has become the principal bottleneck. This paper presents an innovative data center network architecture addressing this problem. The novel idea is a fat-tree topology built from commodity switches: the high fan-out of this architecture allows the outgoing traffic of a single pod to be spread as evenly as possible among the core switches. Specifically, each pair of hosts is connected by multiple paths, so the demand on any individual core switch can be significantly reduced.

 

Impact

The architecture described in this paper is very attractive for data center design. It brings a significant performance improvement with commodity switches while imposing no other difficult requirements.

 

Evidence

The authors describe their experiments with a 4-port fat-tree network built on Click. The results show that the fat-tree architecture delivers better bandwidth and efficiency, which supports their initial claims.

 

Prior Work

The fat-tree and similar ideas were previously proposed in the supercomputer and massively parallel processing communities.

 

Reproducibility

The experiment requires multiple machines, which makes the results a little harder to reproduce. However, given the scale of the problem they are trying to solve, reproduction is easier than one might expect.

 

Comments

This paper provides insight into the technologies of modern data centers, their pros and cons, and the problems in data center design. It's fun to read and really helpful for understanding the motivation of this paper (and many others!). The paper has nice figures and the model is well described.

The only complaint I have is that the experiments don't seem very convincing. The results would be much stronger if they could test in a larger-scale environment.

Jeff Rasley

Mar 13, 2013, 11:04:30 PM
to csci2950u-...@googlegroups.com
Authors: Mohammad Al-Fares, Alex Loukissas, & Amin Vahdat (UCSD)
Context: Sigcomm 2008

I enjoyed the detailed explanation about the problems with current topologies, this helped put a lot of data center discussion into context (even if it is ~5 years old).

Novel Idea: Extended the Clos topology/fat-tree to overcome IP routing issues in order to support large clusters of commodity hosts/switches.

Main Results: This paper discusses a data center topology that supports k^3 / 4 hosts, where k is the number of ports on each switch used. This leads to much cheaper and larger data centers than previously seen.
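The k^3/4 capacity formula can be checked directly. The switch count below assumes the standard fat-tree layout of k pods with k switches each plus (k/2)^2 core switches:

```python
# Capacity of a fat-tree built from k-port switches:
# k pods, each with k/2 edge and k/2 aggregation switches,
# (k/2)^2 core switches, and (k/2)^2 hosts per pod.
def fat_tree_hosts(k):
    return k ** 3 // 4

def fat_tree_switches(k):
    # k*k pod switches plus (k/2)^2 core switches
    return k * k + (k // 2) ** 2

print(fat_tree_hosts(4), fat_tree_switches(4))    # the paper's example: 16 hosts, 20 switches
print(fat_tree_hosts(48), fat_tree_switches(48))  # 27648 hosts, 2880 switches
```

The k=48 case matches the 27,648-host figure the paper quotes for 48-port commodity switches.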

Prior Work: The Clos topology (1953) and the fat-tree (1985) seem to be the closest work. They bring these up near the start of the paper and then discuss their changes. In the related-work section, however, they only really cover supercomputer and massively parallel processing systems.

Criticism/Reproducibility: This seems very difficult to evaluate properly without a large number of machines/switches. Their method should give a rough approximation of its success. I would love to see numbers from an organization that actually uses this. I imagine, based on this paper's popularity, that the technique has actually been used...?



Christopher Picardo

Mar 13, 2013, 11:12:19 PM
to csci2950u-...@googlegroups.com

Paper Title:

A Scalable, Commodity Data Center Network Architecture

Authors:

Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat

Date:

August 17-22, 2008.

Novel Idea:

The paper presents a data center communication architecture that leverages commodity Ethernet switches to deliver scalable bandwidth for large-scale clusters.

Basic requirements: (1) uses scalable interconnect bandwidth, (2) considers economy of scale, and (3) is backward compatible.

Main results:

The new architecture makes it possible to deliver scalable bandwidth at significantly lower cost than existing techniques (it extends the work on massively parallel interconnects organized as fat-trees, such as the systems from Thinking Machines and SGI, Myrinet switches, and the InfiniBand interconnect).

Experiment Results:

- Two-level table switches achieve approximately 75% of the ideal bisection bandwidth for random communication patterns.

- The flow classifier outperforms both the traditional tree and the two-level table in all cases, with a worst-case bisection bandwidth of approximately 75%. The flow classifier performs dynamic flow assignment and re-allocation.

- The FlowScheduler acts on global knowledge and tries to assign large flows to disjoint paths, achieving 93% of the ideal bisection bandwidth for random communication mappings and outperforming all other methods in all the benchmark tests.

- Effective packaging reduces wiring overhead: it eliminates most of the required external wiring and reduces the overall length of required cabling, which in turn simplifies cluster management and reduces costs.
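The flow classifier's dynamic assignment and re-allocation can be sketched roughly as follows. This is a toy model, not the paper's implementation: an edge switch places each new flow on its least-loaded uplink and periodically moves a flow off the hottest uplink when that would reduce the imbalance.

```python
# Toy sketch of flow classification: per-flow uplink assignment with
# periodic re-allocation. All names and thresholds are illustrative.
from collections import defaultdict

class FlowClassifier:
    def __init__(self, uplinks):
        self.assign = {}                      # flow -> uplink port
        self.flow_bytes = defaultdict(int)    # flow -> bytes seen
        self.load = {u: 0 for u in uplinks}   # uplink -> bytes carried

    def route(self, flow, nbytes):
        if flow not in self.assign:
            # initial placement: least-loaded uplink
            self.assign[flow] = min(self.load, key=self.load.get)
        port = self.assign[flow]
        self.flow_bytes[flow] += nbytes
        self.load[port] += nbytes
        return port

    def rebalance(self):
        # move the smallest flow off the hottest uplink, if it helps
        hot = max(self.load, key=self.load.get)
        cool = min(self.load, key=self.load.get)
        candidates = [f for f, p in self.assign.items() if p == hot]
        if not candidates:
            return
        f = min(candidates, key=self.flow_bytes.get)
        if self.load[hot] - self.flow_bytes[f] >= self.load[cool]:
            self.assign[f] = cool
            self.load[hot] -= self.flow_bytes[f]
            self.load[cool] += self.flow_bytes[f]

fc = FlowClassifier(uplinks=[2, 3])
for flow, size in [("a", 400), ("b", 50), ("c", 300), ("d", 400)]:
    fc.route(flow, size)
fc.rebalance()  # shifts the small flow "b" to the less-loaded uplink
```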

Impact:

Less cabling and infrastructure in general entails a lean packaging solution that consumes less power and requires less heat dissipation.

The fat-tree topology is fault-tolerant: a simple failure broadcast protocol between neighboring switches allows them to route around link or switch failures in both directions.

Furthermore, the scheduler marks any link reported to be down as busy or unavailable, thus disqualifying any path that includes it from consideration, in effect routing large flows around the fault.

Routing large flows plays the most important role in determining the achievable bisection bandwidth of a network. The researchers schedule large flows so as to minimize overlap with one another; a central scheduler makes this choice with global knowledge of all active large flows in the network (reported by the edge switches).
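A toy version of the central scheduler's greedy idea (the names, link model, and structure here are illustrative assumptions, not the paper's code): for each large flow, pick a core switch whose up- and down-links are not already reserved by another large flow, so the flows take disjoint paths.

```python
# Toy greedy placement of large flows onto core switches. A "link" is
# modeled as a tagged tuple so up- and down-links don't collide.
def schedule(flows, cores, reserved=None):
    reserved = set() if reserved is None else reserved
    placement = {}
    for src_pod, dst_pod in flows:
        for core in cores:
            up = ("up", src_pod, core)      # src pod -> core link
            down = ("down", core, dst_pod)  # core -> dst pod link
            if up not in reserved and down not in reserved:
                reserved |= {up, down}
                placement[(src_pod, dst_pod)] = core
                break  # flow placed; unplaceable flows are simply skipped
    return placement

# two large flows leaving pod 0 get disjoint core switches
print(schedule([(0, 1), (0, 2)], cores=[0, 1, 2, 3]))
```

Because the scheduler has global knowledge, it can also mark any link reported down as reserved, which is exactly the "disqualify faulty paths" behavior described above.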

Evidence:

The paper shows that by interconnecting commodity switches in a fat-tree architecture it is possible to obtain the full bisection bandwidth of clusters consisting of tens of thousands of nodes. One instance of the prototyped architecture uses 48-port Ethernet switches, which can provide full bandwidth to up to 27648 hosts at lower cost than existing solutions.

Prior work:

Work inspired by the torus topology of IBM's BlueGene/L supercomputer and the Cray XT3, and existing routing techniques such as OSPF2 and ECMP (Equal-Cost Multipath).

Question:

How do we program this new data center network architecture at a higher level of abstraction? It seems we are limited to programming it at the switch and/or router level within a fat-tree topology; will it work in and/or interact with a non-universal topology, which according to [1] seems to be more prevalent?

Furthermore, a centralized fat-tree topology, although scalable, might not be convenient for other applications that require a wider/distributed configuration. 

[1] Charles E. Leiserson, "Fat-trees: universal networks for hardware-efficient supercomputing," IEEE Transactions on Computers, vol. 34, no. 10, Oct. 1985, pp. 892-901.

DTrejo

Mar 14, 2013, 12:09:26 AM
to csci2950u-...@googlegroups.com
Paper: A Scalable, Commodity Data Center Network Architecture by Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat
Novel Idea: Use commodity network hardware in a fat-tree configuration.
Results: Their fat-tree configuration dramatically reduces oversubscription, power costs, and cooling costs.
Impact: Viable and preferable network topology that is more efficient than the status quo, with the network elements on a whole costing less.
Evidence: Their test network that used Click, and their simulations.
Prior work: MPP systems such as InfiniBand and torus interconnects.
Reproducibility: Relatively high: the code written was on the order of hundreds of lines; a thorough understanding of wiring and routers is also necessary.
Criticism: Did not try their fat-tree approach in a real datacenter (cost-prohibitive, yes).


kmdent

Mar 13, 2013, 11:20:47 PM
to csci2950u-...@googlegroups.com
A Scalable, Commodity Data Center Network Architecture by M. Al-Fares, A. Loukissas, A. Vahdat

Novel Idea: The cost of keeping nodes connected at a 1:1 oversubscription ratio is extremely high as the data center expands. They propose using a fat-tree architecture in data centers, which allows 100% bandwidth for connected hosts on commodity switches at a fraction of the cost. This approach requires no modification to the end-host network interface or the operating system.

Main Results: Using the fat tree method, they are able to deliver scalable bandwidth at much lower cost than other architectures. The fat tree uses an altered routing algorithm which allows for load balancing as well as the added benefit of fault tolerance.

Evidence: They test the fat tree against the generic hierarchical tree structure. They ran many tests, including random sends to another host, sends to a random host on the same subnet with some probability, hosts in the same subnet sending to hosts elsewhere, etc. In each of the tests, the fat tree did either better than or the same as the traditional tree on bandwidth.

Impact: This topology has been used in Thinking Machines and SGI machines. Myrinet switches, which are commonly used when constructing supercomputers, use the fat-tree architecture.

Prior Work: ECMP, OSPF2 for forwarding techniques.

Competitive Work: Torus is another architecture to connect to other nodes, but in large clusters, the wiring becomes impractical.

Question: Are 10GigE switches considered commodity parts today?

-- 
kmdent


Charles Zhang

Mar 13, 2013, 10:59:13 PM
to Rodrigo Fonseca, csci2950u-...@googlegroups.com
Paper Title: A Scalable, Commodity Data Center Network Architecture

Authors: Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat

Date: SIGCOMM, August 2008

Novel Idea:
This paper proposes a new network topology with a modified routing algorithm for large data centers that offers lower monetary cost, power consumption, and heat production.

Main Results:
The paper first describes the goals for the design of a data center communication architecture: high bandwidth, low equipment cost, and backward compatibility. It then describes the fat-tree topology, with a top layer of core switches, a middle aggregation layer of lower-performance switches, and edge switches with their hosts at the bottom. This topology provides multiple alternative paths between any two hosts whose communication traverses multiple switches, so different hosts in one subnet talking to hosts in another subnet take different paths and fully utilize the available bandwidth. The authors then introduce flow classification and flow scheduling to assign flows more cleverly to under-utilized links with the help of a controller.

Impact:
By employing this topology, large data centers can use commodity hardware and cut down on equipment, power, and cooling costs.

Evidence:
It is easy to see that the monetary cost is reduced by using commodity switches. For power consumption and heat production, they didn't actually run experiments to measure how much electricity the topology consumes or the amount of heat it produces; instead, they based their analysis on numbers reported in the data sheets provided by the switch vendors.

Prior work:
Much of this work builds on work done in the supercomputer and massively parallel processing (MPP) communities, one example being Thinking Machines.

Reproducibility:
Since they only modified and added around 100 lines of code to the existing algorithm, and they didn't use a very complicated test setup, the reproducibility is fairly high for the results in Table 2. Following the same reasoning, one can easily conclude that the fat-tree topology will cut down significantly on equipment, power, and cooling costs.

Shao, Tuo

Mar 14, 2013, 1:07:46 AM
to csci2950u-...@googlegroups.com
Paper Title
A Scalable, Commodity Data Center Network Architecture

Authors
Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat

Date
SIGCOMM’08, August 17–22, 2008

Novel Idea
To achieve better utilization of data center bandwidth at a moderate cost, the paper presents a scalable network architecture using commodity switches.

Main Results
The paper proposes a fat-tree topology, upon which it implements a two-level routing table in every switch. For load balancing, it uses a central scheduler to dynamically choose a path for each new flow. The scheduler also helps the network reconverge after link failures between switches.

Impact
The fat-tree architecture is specifically designed for commodity-switch-based data centers. Its implementation provides the same per-host bandwidth as networks based on non-commodity switches, and its evaluation shows lower energy cost and nearly full utilization of the network bandwidth. It could be widely used in production data centers.

Evidence
The paper first identifies the principal bottleneck in large-scale data center networks. Then it compares the infrastructure cost of two different network architectures, the hierarchical topology and the fat-tree topology, both designed to guarantee bandwidth for host-to-host connections. The result shows that the cost of the fat-tree is much lower. To achieve the same QoS as the hierarchical architecture, the paper proposes a two-level routing table design as well as a central scheduler. The evaluation results indicate that it did achieve its primary goal, and it also provides a solution to reduce cabling cost.

Competitive Work
The most notable competitor is the hierarchical architecture. But the fat-tree design is much more economical, and after optimization it is capable of providing the same service as the hierarchical one.

Criticism and Question
The topology seems very complicated: each core switch has to connect to all pods, no matter how far away those pods are. Is it really scalable?
It also seems inflexible: as in the example network in the paper, if each switch actually has more than 4 ports, the other ports are wasted.



Zhou, Rui

Mar 13, 2013, 11:44:09 PM
to csci2950u-...@googlegroups.com
Title:
A Scalable, Commodity Data Center Network Architecture

Authors:
Mohammad Al-Fares 
Alexander Loukissas 
Amin Vahdat

Review:

Bandwidth is increasingly the scalability bottleneck in large scale clusters. Existing solutions for addressing this bottleneck center around hierarchies of switches, with expensive, non-commodity
switches at the top of the hierarchy. At any given point in time, the port density of high-end switches limits overall cluster size while at the same time incurring high cost. 

Led by the goals of scalable interconnection bandwidth, economies of scale, and backward compatibility, this paper presents a data center communication architecture that leverages commodity Ethernet switches to deliver scalable bandwidth for large-scale clusters. The topology is based on a fat-tree structure in which all switching elements are identical, enabling people to leverage cheap commodity parts and perform scalable routing while remaining backward compatible with Ethernet, IP, and TCP. To cope with the drawback of the huge number of cables needed to interconnect all the machines, packaging solutions are also proposed.

This paper claims that the architecture is able to deliver scalable bandwidth at significantly lower cost than existing techniques. The authors also believe that large numbers of commodity switches have the potential to displace high-end switches in data centers in the same way that clusters of commodity PCs have displaced supercomputers for high-end computing environments.


Questions:
1. Would the packaging solution still work when the scale grows by orders of magnitude?
2. Could we borrow some structures from large-scale integrated circuits? The ECE community has a whole set of great designs available for those.





