Reviews for Vahdat11Emerging, Farrington12Mice

Rodrigo Fonseca

Mar 20, 2013, 9:09:47 PM

to csci2950u-...@googlegroups.com

Hi,

Please post your reviews to both papers (which are short) as a group reply to this message.

You may combine the reviews if you want, or just have one after the other.

Thanks,

Rodrigo

Shu Zhang

Mar 20, 2013, 10:28:35 PM

to csci2950u-...@googlegroups.com

Review for [Mice]:

Nowadays the uplinks of pods in a data center are mainly a mix of packet switch ports and circuit switch ports. The reason is that using some circuit switch ports can lower the cost (cost is a big issue in data center construction, as seen in recently reviewed papers). The traditional scheduling method is HSS (hotspot scheduling), which discovers hotspots and then offloads their traffic.

The paper observes that the capability of optical switches has increased greatly and, in addition, that traditional packet-switch-based topologies rely mainly on multiplexing outgoing ports, which can lower the utilization of physical links, whereas optical switches can use TDMA to multiplex flows. Following this line, the paper proposes the TMS algorithm.

The TMS algorithm has two phases. First the TDM (traffic demand matrix) is scaled into a BAM (bandwidth allocation matrix). Then the BAM is decomposed into a circuit switch schedule, which is a convex combination of permutation matrices that sums to the original BAM. The matrix decomposition algorithm can be BvN (Birkhoff-von Neumann). All traffic is decomposed by the algorithm, whether large or small, so the paper does not only catch the elephants, it also hunts mice (a pretty interesting title).
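The two phases above can be sketched in a few lines. This is only an illustration under assumptions, not the authors' code: the scaling step uses Sinkhorn-style alternating row and column normalization (the demand matrix is assumed strictly positive), and the decomposition is a greedy Birkhoff-von Neumann loop with a simple augmenting-path matching. The function names, the tolerance `eps`, and the demand values are all hypothetical.

```python
def sinkhorn(demand, iterations=200):
    """Scale a strictly positive square matrix so rows and columns sum to ~1."""
    n = len(demand)
    m = [row[:] for row in demand]
    for _ in range(iterations):
        for i in range(n):                        # normalize each row
            s = sum(m[i])
            m[i] = [x / s for x in m[i]]
        for j in range(n):                        # normalize each column
            s = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= s
    return m


def perfect_matching(m, eps):
    """Return perm with perm[src] = dst using only entries > eps, or None."""
    n = len(m)
    col_owner = [-1] * n                          # col_owner[dst] = matched src

    def try_row(i, seen):
        for j in range(n):
            if m[i][j] > eps and not seen[j]:
                seen[j] = True
                if col_owner[j] == -1 or try_row(col_owner[j], seen):
                    col_owner[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, [False] * n):
            return None
    perm = [0] * n
    for j, i in enumerate(col_owner):
        perm[i] = j
    return perm


def bvn_decompose(bam, eps=1e-9):
    """Greedy BvN: list of (weight, perm) whose weighted sum approximates bam."""
    m = [row[:] for row in bam]
    schedule = []
    while True:
        perm = perfect_matching(m, eps)
        if perm is None:                          # nothing left above the tolerance
            return schedule
        w = min(m[i][perm[i]] for i in range(len(m)))
        for i, j in enumerate(perm):
            m[i][j] -= w                          # zeroes at least one entry per round
        schedule.append((w, perm))


demand = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]  # made-up values
bam = sinkhorn(demand)
for weight, perm in bvn_decompose(bam):
    print(round(weight, 3), perm)                 # slot share and src -> dst circuits
```

Each `(weight, perm)` pair corresponds to one circuit configuration held for a fraction `weight` of the schedule period. Small and large entries of the demand matrix are treated the same way, which is the "mice and elephants" point of the review.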

One optimization is longest-time-slot-first scheduling. BvN may generate some very short time slots, and these can be dropped because the time needed to set up the circuit may be larger than the slot itself. So the greatest benefit comes from scheduling the first n time slots (n is determined by the minimum duty cycle and the maximum allowed schedule length).

Table 1 shows the tradeoff between the number of decomposition matrices, the duty cycle, and the fraction of traffic carried by the CSN and the PSN. The major trend is that if n is small, the circuit schedule cannot satisfy all flows, so the remaining traffic must go to the traditional PSN. When n reaches the total number of decomposition matrices, all traffic is scheduled on circuits, but the duty cycle falls to its lowest.
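The trend described for Table 1 can be reproduced with a toy calculation. This is a sketch under stated assumptions, not the paper's numbers: the slot durations are invented, each kept slot is assumed to pay one fixed setup time, and the duty cycle is taken as stable time divided by stable time plus total setup time. The function name `longest_first` is hypothetical.

```python
def longest_first(slot_us, n, setup_us):
    """Keep the n longest slots; return (circuit_fraction, duty_cycle)."""
    kept = sorted(slot_us, reverse=True)[:n]
    stable = sum(kept)                          # time circuits actually carry traffic
    circuit_fraction = stable / sum(slot_us)    # share of demand served by circuits
    duty_cycle = stable / (stable + len(kept) * setup_us)
    return circuit_fraction, duty_cycle


slots = [400.0, 50.0, 300.0, 250.0, 5.0]        # made-up slot lengths in microseconds
for n in range(1, len(slots) + 1):
    csn, duty = longest_first(slots, n, setup_us=11.5)
    # columns: n, fraction on circuits, fraction left to packets, duty cycle
    print(n, round(csn, 3), round(1 - csn, 3), round(duty, 3))
```

As n grows, the circuit fraction rises toward 1 while the duty cycle falls, because every additional (shorter) slot adds a full setup penalty for less and less traffic.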

The paper has a prototype, Mordia, to test the TMS algorithm. The optical ring in Mordia is interesting, but from a topology point of view it is equivalent to all pods connecting to a single switch that does TDMA.

Questions:

1. The matrix decomposition algorithm was proposed 60 years ago. Does that mean the paper is saying, "Oh, there are much quicker switches, now we could pick up the 'forgotten' algorithm again and apply it directly"?

2. The matrix decomposition / time slice calculation needs a large amount of time. How does it cope with dynamic flow changes, in which case the BAM changes?

Jeff Rasley

Mar 20, 2013, 10:36:39 PM

to csci2950u-...@googlegroups.com

The main takeaway I have from the OSA paper is that in order for data centers (DC) to take advantage of optical circuit switching (OCS), 4 major requirements must be met:

1) Lower cost of integrated MEMS-based OCS

2) The scalability of OCS is currently only at a few hundred duplex ports; this must be higher for DC adoption.

3) Faster switching times: currently 10-20 ms, which is not acceptable. Per-packet switching requires nanoseconds; however, OCS could support switching times under 100 microseconds.

4) Reduce insertion loss from 5 dB to below 2 dB.

Hunting Mice with Microsecond Circuit Switches

This paper primarily discusses the concept and use of traffic-matrix scheduling (TMS) in data center inter-rack communication, where there is a hybrid of circuit and packet switching networks.

In workloads like Hadoop, data centers experience an all-to-all traffic pattern, which can significantly stress the network. By using time-division multiple access (TDMA) they are able to schedule slots of time for traffic between racks, and create virtual output queues (VOQ) to buffer traffic between scheduled slots.
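The queue-then-drain behaviour can be pictured with a small sketch. It is an illustration only: the class `VOQSource`, its methods, and the packet sizes are hypothetical, and a real implementation would live in the host or switch data path rather than in Python.

```python
from collections import deque


class VOQSource:
    """One sender with a separate queue per destination (virtual output queues)."""

    def __init__(self, n_dests):
        self.queues = [deque() for _ in range(n_dests)]

    def enqueue(self, dest, size_bytes):
        # Traffic waits here until a circuit to `dest` is scheduled.
        self.queues[dest].append(size_bytes)

    def drain(self, dest, budget_bytes):
        """Send whole packets to `dest` until the slot's byte budget runs out."""
        sent = 0
        q = self.queues[dest]
        while q and q[0] <= budget_bytes - sent:
            sent += q.popleft()
        return sent


src = VOQSource(n_dests=4)
for _ in range(3):
    src.enqueue(1, 1500)                 # three packets queued for destination 1
print(src.drain(1, budget_bytes=3200))   # slot for destination 1: two packets fit
print(src.drain(0, budget_bytes=3200))   # slot for destination 0: nothing queued
```

Keeping one queue per destination means a slot for one destination never has to wait behind packets for another destination.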

Both papers seem to build heavily on the authors' previous work, Helios, and partially on c-Through.

Reproducibility may be hard (financially) in this case. The authors have built Mordia which seems to be a unique optical testbed.

Question: I am curious how well this works with workloads that have high variability in flow sizes and timing. Since the authors also leave many questions unanswered, I would be eager to see more raw performance numbers.

Zhiyuan "Eric" Zhang

Mar 20, 2013, 11:33:45 PM

to csci2950u-...@googlegroups.com

Paper:

Hunting Mice with Microsecond Circuit Switches

Nathan Farrington, George Porter, Yeshaiahu Fainman, George Papen, Amin Vahdat

Hotnets ’12, October 29–30, 2012, Seattle, WA, USA.

Review:

This paper presents traffic matrix scheduling (TMS), a generalized version of hotspot scheduling which basically uses circuit switching only. One key disadvantage of optical technology is that it's hard to achieve packet switching. Before this paper was published, there had been proposals for hybrid data center networks with both traditional packet switches and optical circuit switches. However, the authors argue that it is possible to build data center networks with only circuit switching.

The idea is to time-share circuits at microsecond scale (as in the title), with scheduling algorithms that set up and tear down circuits between different pods. One issue they discuss is the ratio of setup time to transmission time, which determines the link efficiency for different flows.

In their evaluation, they test two scheduling algorithms on a single PC and evaluate their growth rates. Both algorithms are computationally expensive, so their scalability in real data centers might be a problem.

One last comment: this time-sharing circuit switch idea seems a little similar to packet switching, only it's not switching fast enough. I can see it's good at handling large flows, but I doubt it's a better idea than hybrid data centers.

Paper:

The Emerging Optical Data Center

Amin Vahdat, Hong Liu, Xiaoxue Zhao and Chris Johnson

OFC 2011, OTuH2

Review:

This paper presents an overview of how to utilize optical technologies in the data center network. Specifically, they discuss optical circuit switching (OCS) and wavelength division multiplexing technology and the requirements for them in data center environments.

For OCS, one requirement worth mentioning is faster switching time. There are still many fundamental challenges in optical packet switching, but with a faster switching time, OCS can still have a big impact on hybrid data center networks or the network in the Mice paper.


Christopher Picardo

Mar 21, 2013, 12:01:03 AM

to csci2950u-...@googlegroups.com

Paper Titles:

Hunting Mice with Microsecond Circuit Switches (HM)

Authors: Nathan Farrington, George Porter, Yeshaiahu Fainman, George Papen, Amin Vahdat

The Emerging Optical Data Center (EODC)

Authors: Amin Vahdat, Hong Liu, Xiaoxue Zhao, Chris Johnson

Dates:

October 29-30 2012, and 2011

Novel Idea:

HM: A generalization of hotspot scheduling, called traffic matrix scheduling, where most or even all bulk traffic is routed over circuits. Traffic matrix scheduling runs in polynomial time and rapidly time-shares circuits across many destinations at microsecond time scales.

EODC: The paper presents an overview of current data center network deployments, the role played by optics in this environment, and opportunities for developing variants of existing technologies specifically targeting large-scale deployment in the data center, i.e., wavelength division multiplexing (WDM) technology and optical circuit switching alongside electrical packet switches (EPS).

Main results:

HM:

Designed and built a microsecond-scale circuit switch called Mordia (Microsecond Optical Research Datacenter Interconnect Architecture). It is an optical ring of wavelength-selective switches called stations, approximately 2300x faster than the optical space switches used in Helios and c-Through (11.5 us vs. 27 ms).

This prototype shows it is possible to support microsecond-scale circuit switching over commodity Ethernet technology.

EODC:

Optical Circuit Switching – Ideally we want native optical packet switching (OPS), but it is not possible at this time, so we use OCS because it is data-rate agnostic and energy efficient. For data center use, OCS needs lower cost, good scaling, fast switching, and low insertion loss.

WDM Optical transceivers – To reduce cable overhead, to scale with increasing link bandwidth, and to leverage optical circuit switching, WDM must perform well, overcoming the following constraints:

- Transceivers with large power consumption present thermal challenges and limit EPS chassis density.

- Data center transceivers must account for multi-building span reaching 1km and optical loss from OCS and patch panels.

- Photonics highway must align seamlessly with electrical switch fabric in bandwidth and speed.

- For intra-building network, a rich-mesh topology is desirable

Impact:

HM:

Traffic matrix scheduling achieves the following advantages over packet switching:

- CAPEX and OPEX reduction

- Low latency

- No jitter

- Scales the nominal link rate to hundreds of Gb/s per link.

Hence, all these characteristics make circuit switching a viable contender for future data center network architectures.

EODC:

Paper points out:

- The per-port cost of an OCS is competitive with, if not inherently cheaper than, that of the comparable EPS. It also has more capacity, through wavelength bundling, and lower power consumption.

- WDM reduces cabling complexity, which is very challenging to manage in a data center.

- OCS eliminates some fraction of the optical transceivers and EPS ports by eliminating a subset of the required OEO conversions.

- The paper concludes that data center network architectures are about to change because of emerging optical technologies and components like optical circuit switching and wavelength division multiplexing (WDM) transceivers.

Prior work:

There is some prior work like c-Through, Helios, Flyways, and HSS, but in general these are all experimental and prototyping work attempting to leverage a transition from current to new technology with hybrid approaches. We are getting there, they say.

Question:

I wonder if the switching speed of the optical circuit switch in the emerging architecture incorporating OCS and WDM should be a source of concern? I also see it as a single point of failure, but assume it is reliable enough to work just fine.

Using optical circuit switches results in less wiring, a cleaner interface, and easier maintenance.

Also, why can't OCS perform per-packet switching? Thanks.

Shao, Tuo

Mar 21, 2013, 12:17:56 AM

to csci2950u-...@googlegroups.com

Paper Title

Hunting Mice with Microsecond Circuit Switches

Authors

Nathan Farrington, George Porter, Yeshaiahu Fainman, George Papen, Amin Vahdat

Date

Hotnets ’12, October 29–30, 2012

Novel Idea

This paper presents traffic matrix scheduling for optical circuit switching, which helps offload bulk traffic from traditional packet switching in the data center.

Main Results

The paper describes a scheduling algorithm that converts the host-to-host traffic demand into a schedule for circuit assignment.

Impact

The paper follows the trend of constructing hybrid data centers and provides a solution for scheduling circuit assignments on current one-channel physical links.

Evidence

The paper first describes an effective link rate model and points out three ways to improve efficiency: a faster switch, a longer stable time, and a higher nominal link rate. The second way would increase the host buffer size, and the buffer size would ultimately be limited by the link rate because the buffer has to be drained when it is scheduled. The third way is a multiplexing approach opposite to TMS. So to make TMS practical, the first method must be feasible. The paper then describes the algorithm and provides an optimization to reduce the penalty it suffers from short time-slot scheduling.
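A back-of-the-envelope version of that model is sketched below. It assumes the effective rate is the nominal rate times the fraction of each slot that is stable, i.e. stable time over setup plus stable time; this is my reading of the model and the numbers are illustrative, not taken from the paper. The function name `effective_rate_gbps` is hypothetical.

```python
def effective_rate_gbps(link_gbps, setup_us, stable_us):
    """Nominal rate scaled by the fraction of each slot that carries traffic."""
    return link_gbps * stable_us / (setup_us + stable_us)


base = effective_rate_gbps(10.0, setup_us=11.5, stable_us=100.0)
faster_switch = effective_rate_gbps(10.0, setup_us=1.0, stable_us=100.0)
longer_stable = effective_rate_gbps(10.0, setup_us=11.5, stable_us=1000.0)
higher_rate = effective_rate_gbps(40.0, setup_us=11.5, stable_us=100.0)

for name, rate in [("base", base), ("faster switch", faster_switch),
                   ("longer stable time", longer_stable),
                   ("higher link rate", higher_rate)]:
    print(name, round(rate, 2))
```

All three levers raise the effective rate, but a longer stable time also means each host has to buffer more traffic while it waits for its slot, which is the buffer-size concern raised above.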

Criticism and Question

The premise of practical TMS is microsecond-scale switching. Is it feasible now? Given WDM technology, we can utilize multiple channels in one link rather than using time-slot multiplexing. Is that a better way of scheduling? The paper doesn't seem to compare its method with other possible scheduling approaches.

Paper Title

The Emerging Optical Data Center

Authors

Amin Vahdat, Hong Liu, Xiaoxue Zhao and Chris Johnson

Novel idea and Main Results

By addressing the role optics play in the modern data center, the paper describes the components and requirements of data center networking.

Impact

Although many technologies are still under development, OCS and WDM seem to be a more economical way to scale the data center network while reducing cabling complexity.

Evidence

The paper first reviews the current data center network architecture and points out its limitations: the large overhead of internal network connectivity needed to scale the network, and uneconomical high-speed copper cables. It then demonstrates the advantages of optical circuit switches and VCSELs in these respects. Finally, it outlines the requirements for OCS and WDM.

Criticism and Question

One benefit the paper mentions is that OCS would reduce cabling complexity. However, in some current network architectures like VL2, cabling complexity is intentionally increased to better handle link failures. Are optics more reliable than copper cables? Would an optical link become a single point of failure?


Papagiannopoulou, Dimitra

Mar 21, 2013, 1:15:24 AM

to Rodrigo Fonseca, csci2950u-...@googlegroups.com

Paper Title: The Emerging Optical Data Center

Authors: Amin Vahdat, Hong Liu, Xiaoxue Zhao, Chris Johnson

OSA/OFC/NFOEC 2011

Novel Idea: In this paper the authors present an outline of existing data center network deployments and how optics could play an important role in overcoming the limitations imposed by existing technology in large-scale data centers. Specifically they examine how wavelength division multiplexing (WDM) technology could be optimized to be used in data centers and how a combination of Optical Circuit Switching (OCS) and EPS would be advantageous for data centers.

Main Results: The authors showed how emerging optical technology and its components could be incorporated in large-scale data centers. They especially focused on the role of Optical Circuit Switching and WDM transceivers in the data center.

Impact: The data center switching and interconnect technology currently available imposes several limitations and obstacles in scaling up to large data centers while maintaining good performance levels. For example, the number of Electrical Packet Switches could complicate management and OpEx, while EPS ports and optical transceivers could introduce a high cost in the network equipment and large amounts of multimode fiber would be required. Optimizing the optical technology components and incorporating them into data center networks could play a critical role in overcoming those challenges.

Evidence: The authors first explore the communication and network requirements of large-scale data centers. They present an example of a current data center architecture and then show what emerging data center architectures that employ OCS would look like. Then they analyze the main challenges of incorporating OCS into data centers and the main requirements for OCS hardware in that case. Finally, they talk about how WDM performance could be achieved without increasing power and cost significantly, in order to meet the emerging data center economies and scale.

Prior Work: The authors present an emerging data center architecture, based on the works of [7] and [9] that employ OCS. They further refer to existing data center technology [3,4,5,6] and [7] that incorporates OCS along with EPS in the data center.

Criticism: This paper is mainly an overview of existing technologies and parameters that should be taken into account when incorporating optics to build large and scalable data centers. It serves its purpose of motivating the networking community to consider how evolving optical technology could lift the limitations that exist in current data centers and help build scalable and efficient data centers in the future.


kmdent

Mar 21, 2013, 12:29:17 AM

to csci2950u-...@googlegroups.com

The Emerging Optical Data Center

A. Vahdat, H. Liu, X. Zhao, C. Johnson

Novel Idea: Increasing the amount of optics in the data center, namely WDM and OCS.

Main Results: OCS has an increased data transfer rate as well as extremely low power usage. OCS can’t currently do packet switching, though there is much research going on to fix this. OCS should replace some core switches and handle mainly longer flows. Copper cables are prone to high error rates and use a tremendous amount of power. VCSELs currently can’t cross a single datacenter, but they are low power and widely used in data centers. WDM transceivers need to be used to scale these data centers without incurring absurd costs.

Impact: Possibly large amounts in the future.

Prior Work: OCS

Evidence: Results from other papers.

Crit: A fair amount of background information was required to understand this paper. For example, they just start talking about VCSELs without introducing them.

Hunting Mice with Microsecond Circuit Switches 2012
N. Farrington, G. Porter, Y. Fainman, G. Papen, A.Vahdat

Novel Idea: Hotspot scheduling is inflexible and static. Data centers with hybrid switches should instead use traffic matrix scheduling. TMS decouples scheduling and switching. Unlike HSS, once TMS has constructed a schedule, it will be implemented in hardware in a matter of microseconds. Separating scheduling and switching allows TMS to schedule a larger amount of traffic while using the same expensive algorithm to construct the schedule.

Main Result: The TMS algorithm is O(N^2). TMS routes all bulk data transfers through circuit switches instead of only hotspots. It allows for lower jitter, lower latency, and lower CAPEX and OPEX.

Evidence: They created Mordia to test algorithms such as TMS.

Previous work: Sinkhorn 1964. Bazazz et al. Vattica et al.

--

kmdent

Papagiannopoulou, Dimitra

Mar 21, 2013, 5:32:38 AM

to Rodrigo Fonseca, csci2950u-...@googlegroups.com

Paper Title: Hunting Mice with Microsecond Circuit Switches

Authors: Nathan Farrington, George Porter, Yeshaiahu Fainman, George Papen, Amin Vahdat

Hotnets ’12, October 29–30, 2012

Novel Idea: In this paper, the authors present Traffic Matrix Scheduling (TMS), a generalization of hotspot scheduling (HSS), in which most or all bulk traffic is routed over circuits. The proposed scheduling algorithm runs in polynomial time and time-shares circuits across many destinations at microsecond time scales.

Main Results/Impact: The main result of the paper is TMS, which can be used to schedule circuit switches in data center networks to route all bulk traffic, not just hotspot traffic, which makes circuit switches much more useful for future data center designs.

Evidence: The authors compare TMS and HSS to show the main differences between the two scheduling policies. They use an example of TMS in which they show how 8 pods run Hadoop with an all-to-all communication pattern, providing the physical and logical topology, the inter-pod traffic demand matrix, and a Gantt chart. They report useful equations to compute the duty cycle D that determines the effective link rate of the circuit switch, as well as an equation to compute the amount of buffering required by each host in bits, and discuss the ways in which the effective link rate could be increased. They present the TMS algorithm in detail and provide measurements of its execution obtained by testing TMS with dense uniform random input patterns. Moreover, they show examples of the tradeoff between the number of schedule time slots, the amount of traffic sent over the circuit-switched network compared to the packet-switched network, and the duty cycle D.

Prior Work: As mentioned before, TMS generalizes HSS that has been used for circuit scheduling in hybrid data center networks, in previous works such as [8,18, 5, 7, 17] that combined electronic packet switching with either wireless or optical circuit switching.

Competitive Work: There are other hybrid data center network prototypes, such as the work on c-Through [16,17] and Helios [7], that use HSS. Both of those, though, require some time-consuming operations that lie on the critical path for circuit reconfiguration. Also, there are other works such as [3,4] that use the Birkhoff-von Neumann decomposition algorithm to compute schedules for input-queued switches, but those works focus on packet switching, while this paper focuses on circuit switch scheduling for data center networks with packet buffers distributed among the hosts.

Reproducibility: Yes.

Criticism: Earlier works have proposed hybrid data center networks that combine electronic packet switching with wireless or optical circuit switching to support bulk traffic. Most of those, though, focus on hotspot scheduling, and while they have managed to offer some improvements, they rely on packet switching for the remaining traffic. This work uses circuit switching to route all bulk data center traffic, not just the traffic from the hotspots. Since circuit switching has some significant advantages over packet switching, this work is particularly valuable in that it makes circuit switching a viable alternative for future data center architectures. Furthermore, it proposes a generalization of HSS that overcomes the limitations imposed by HSS (certain features that make HSS static and inflexible) while at the same time further exploring the capabilities of circuit switching in data centers.

Place, Jordan

Mar 20, 2013, 11:00:09 PM

to csci2950u-...@googlegroups.com

The Emerging Optical Data Center

Amin Vahdat, Hong Liu, Xiaoxue Zhao, Chris Johnson
OSA/OFC/NFOEC '11
The authors describe OCS as a means of increasing throughput across
the backbone of data center network architecture. They outline what
properties OCS technology must have in order to be practical and
discuss the current state of data center network architecture. This
paper serves as a good introduction to optical switching.

Hunting Mice with Microsecond Circuit Switches

Nathan Farrington, George Porter, Yeshaiahu Fainman, George Papen, Amin Vahdat
Hotnets '12
In today's data centers, high-speed circuit switching technology
is often used as a backbone of a network architecture in order to
handle the large amount of intra-data center communication. These
circuit switches can transmit data quickly but circuit setup requires
overhead. To reduce the frequency of circuit setup and
reconfiguration, many hybrid networks use hotspot scheduling (HSS) as
a means of analyzing traffic patterns and setting up optimal circuits
for offloading traffic from the packet-switched network. However, HSS
requires a lot of overhead to compute optimal setups and often leaves
the circuits with excess capacity.
This paper proposes an alternative to HSS called traffic matrix
scheduling (TMS). TMS measures traffic patterns in the network and
builds a traffic demand matrix outlining the desired transmission rate
from host to host. Using this matrix, a bandwidth allocation matrix
can be constructed and used to determine a circuit switching schedule.
A circuit switch then time shares circuits according to this schedule.
During a circuit's time share, a queue that has been building is
emptied as data is transferred quickly across the circuit. The authors
note that in order for this time sharing to function correctly, the
time to change between circuits must be on the scale of microseconds.
The authors have deployed and tested TMS in a miniature data
center test bed. They briefly discuss the results which are generally
positive. As further work, I wonder if it might be possible to predict
traffic patterns or optimally order the configurations as to minimize
the overhead of changing between circuits. Additionally, how well does
TCP congestion control work with this "queue-and-empty" flow? Do packets get
acked when being queued or delivered?

