Reviews for Wang12Bigdata


Rodrigo Fonseca

Apr 24, 2013, 10:48:08 PM
to csci2950u-...@googlegroups.com
Hi,

Please post your review here as a group reply to this message. 
Charles will lead the discussion.

Thanks,
Rodrigo
 

Shu Zhang

Apr 24, 2013, 11:04:29 PM
to csci2950u-...@googlegroups.com

There is a lot of research on how to distribute flows to ease congestion at bottlenecks in data center networks, especially in the presence of big data computation. Many companies use Hadoop for MapReduce work, and related research topics have long been studied.


But previous work is not application-aware: it analyzes traffic patterns only from the network's point of view, without knowing the real demands of applications, so predictions based purely on network-level knowledge can be wrong. For example, without knowing the destinations of aggregation jobs, the topologies and routes established by network-view algorithms may not achieve the lowest latency, because the algorithms do not know where traffic will eventually flow. Using application-level knowledge to help establish communication routes is therefore a natural next step. Indeed, earlier papers in this class have made us aware of this problem: D3 may work fine for deadline-sensitive jobs, but it is not effective for other kinds of traffic, such as Skype, where even the user does not know how long a task will last. If we could take real-time, real-world application demands into account, the network would operate more efficiently.


The paper argues that it is time to consider merging applications with the SDN. The motivation comes from three developing trends: SDN itself, dynamically reconfigurable optical circuits, and structured big data applications. SDN lets us program the network from a central controller, and conveniently, many big data frameworks such as Hadoop, Spark and HBase are also centrally controlled, so from an implementation point of view it is possible to merge the two central components.

Secondly, optical switches have higher data-transmission capacity, will be widely deployed as ToR switches, and are reconfigurable, which makes run-time network configuration possible. Lastly, taking Hadoop as an example, at the application level the traffic pattern is not random but relatively tractable, so the configuration in certain phases is quite stable, and optimization is genuinely possible if we configure the network based on application traffic patterns. Another advantage of these relatively stable patterns is that they make prediction possible by looking back at historical traffic records with similar patterns and scales.


The general idea of the system is to add an interface through which applications report their traffic demands to the SDN controller, which can then configure the network according to the demand matrices. One example the paper uses is an 8-to-1 aggregation application: if the application tells the SDN controller that it is about to do the aggregation, the controller can set up the topology so that the transmission completes in one round.
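To make the one-round idea concrete, here is a toy sketch of how a controller might batch an N-to-1 aggregation into rounds of optical circuits. The function name and the free-port model are my illustrative assumptions, not details from the paper:

```python
def plan_aggregation(sources, dest, free_ports):
    """Batch an N-to-1 aggregation into rounds of optical circuits.

    Each round connects up to `free_ports` source racks directly to
    the destination rack, so with 8 free ports the paper's 8-to-1
    example finishes in a single round. Hypothetical sketch only.
    """
    rounds = []
    for i in range(0, len(sources), free_ports):
        batch = sources[i:i + free_ports]
        rounds.append([(src, dest) for src in batch])
    return rounds

# 8 source racks aggregating into rack "r0", 8 free optical ports
rounds = plan_aggregation(["r%d" % i for i in range(1, 9)], "r0", 8)
print(len(rounds))  # 1 round
```

With fewer free ports the same plan simply takes more rounds, which is where the application-level hint (knowing the aggregation is coming) pays off.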


There is a Hadoop example. One issue is traffic demand estimation. Although the JobTracker and the NameNode know the locations of mappers and reducers, the traffic produced by the shuffle phase is not known until the map phase ends, because only then is the amount of intermediate key-value pairs known. To prevent configuration from delaying the whole job, the paper introduces a demand estimation engine that predicts the traffic of the upcoming phase. The authors argue that shuffle traffic can be deduced from previous similar map tasks with similar input split sizes. The output format is a demand matrix, a point-to-point representation of traffic demands. Using the matrix together with knowledge of mapper and reducer placement, routes can be optimized with location and neighbor information. Traditionally, an N-to-M shuffle would require establishing a full mesh of communication between all nodes, but the paper uses the Torus topology to optimize the allocation using location information plus the demand matrices. Another optimization targets partially overlapping aggregations: two shuffle jobs may have overlapping parts, while the non-intersecting parts share little traffic. Starting from this idea, by intersecting the jobs, traffic allocation can be established in a way that improves link utilization.
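A minimal sketch of the kind of estimation described here, assuming a simple selectivity model (intermediate bytes per input byte, averaged over similar historical map tasks) and an even spread over reducers. Both assumptions are my simplifications; the paper does not specify the engine in this detail:

```python
def estimate_shuffle_demand(history, input_splits, reducers):
    """Predict a point-to-point shuffle demand matrix before the
    map phase finishes.

    history: (input_bytes, intermediate_bytes) pairs from earlier,
    similar map tasks; their average selectivity scales each mapper's
    input split size into an estimated intermediate output, which is
    assumed to spread evenly across reducers.
    """
    selectivity = sum(out / inp for inp, out in history) / len(history)
    demand = {}
    for mapper, split_bytes in input_splits.items():
        per_reducer = split_bytes * selectivity / len(reducers)
        for reducer in reducers:
            demand[(mapper, reducer)] = per_reducer
    return demand

matrix = estimate_shuffle_demand(
    history=[(100, 50), (200, 100)],           # selectivity 0.5
    input_splits={"map-1": 400, "map-2": 600},
    reducers=["red-1", "red-2"],
)
print(matrix[("map-1", "red-1")])  # 100.0
```

The even-spread assumption is exactly what Jeff questions below: skewed mapper outputs would make such an estimate inaccurate.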


The short paper says something about implementation and overhead. The biggest challenge is scalability: rules must be distributed across the whole data center in an efficient way. The idea is to coarsen the granularity to reduce the number of rules installed on switches. The paper uses racks as the matching units, and each rack is allocated a spanning tree, so the number of rules is proportional to the number of racks rather than the number of flows. Another problem is consistency: the system should ensure that rule installation is synchronized so that traffic can be redirected gracefully, but the paper does not dive deeply into this.
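To see why rack granularity helps, compare rule counts under a hypothetical flow-level scheme versus the rack-as-matching-unit idea. This is a back-of-the-envelope sketch of mine, not the paper's exact accounting:

```python
def flow_level_rules(servers_per_rack, racks):
    # one rule per server-to-server flow at the source ToR:
    # grows quadratically with the number of servers
    n = servers_per_rack * racks
    return n * (n - 1)

def rack_level_rules(racks):
    # one spanning tree per destination rack, matched on a
    # rack prefix: grows only with the number of racks
    return racks * (racks - 1)

print(rack_level_rules(40), flow_level_rules(20, 40))
```

Even for a modest 40-rack cluster with 20 servers per rack, the flow-level count is several hundred times larger, which is the scalability point the paper is making.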


To conclude, although I doubt whether this is the first paper to advocate using application-level knowledge to improve network performance, it does make a good case for the feasibility of doing so.


Question

In the discussion section, the paper says that failure handling (such as for link failures) is implemented both in the master node and in the controller (e.g., Floodlight), so there is no need to deal with the problem in the cross-layer work. But I am curious whether these two versions of the network state will be fully synchronized. If the SDN learns of a failure before the master node does, their views of the network will differ: Hadoop might produce a demand matrix based on the old information while the SDN configures for the new topology, or vice versa. So is a synchronization mechanism for failure handling necessary in the integrated system?


 

Jeff Rasley

Apr 24, 2013, 11:26:05 PM
to csci2950u-...@googlegroups.com
Title: Programming your network at run-time for big data applications
Authors: Guohui Wang (IBM), T.S. Eugene Ng (Rice), and Anees Shaikh (IBM)
Context: HotSDN '12

Novel Idea: This work uses optical switches to re-configure the data center topology by tying an SDN controller to the centralized components of big data applications like Hadoop.

General Ideas: The authors describe an idea for making the data center network more application-aware. In this scenario the data center is equipped with reconfigurable optical circuit switching and an OpenFlow controller. The application they chose to look at initially is Hadoop; they leverage the fact that the network and Hadoop both have centralized points of control (i.e., the JobTracker and the SDN controller). By modifying the Hadoop controller to perform traffic demand estimation and report these stats to the SDN controller, they can more easily reconfigure the network to optimize certain high-load transfers (e.g., the shuffle phase) of Hadoop.

Impact: Potentially this could be a really cool way to connect Hadoop, optical switches and SDNs, which seem like a great combination to improve performance.

Future Work: They are looking into non-Hadoop big data applications and parallel database systems as well. 

Questions/comment: It would be interesting to see how well this works; they don't seem to have much evaluation of it. Specifically, I am skeptical that you can accurately predict shuffle transfer sizes based on previous ones in a job. Doesn't this assume that the mapper outputs are uniform in size? Maybe I misunderstood something, but I can think of certain MR jobs that would not have uniform mapper outputs. Even more specifically, is the last sentence in Section 3.1 true?

Christopher Picardo

Apr 25, 2013, 1:24:06 AM
to csci2950u-...@googlegroups.com

Last Hurrah!

Title: Programming Your Network at Run-Time for Big Data Applications

Authors: Guohui Wang, T.S. Eugene Ng, Anees Shaikh

Date: In Proc. of ACM Workshop on Hot Topics in Software-defined Networks (Hot-SDN 2012), August 2012

Novel Idea: The authors integrate as part of their design a SDN controller and optical switching to create a run-time network configuration optimized for big data application performance and network utilization. 

“The authors focus on using Hadoop as an example to explore the design of an integrated network plane, and describe Hadoop job scheduling strategies to accommodate dynamic network configurations”.

Main Results:

For big data applications, an application-aware network controller provides improved performance.

By correlating traffic volume observed on different tasks, it is feasible to predict the shuffling traffic demand of map tasks before they are finished. 

Given the duration and size of typical Map reduce jobs in production data centers, the authors believe the latency to install rules is relatively small.

Implementing flow level traffic engineering requires installing a rule for each selected flow on ToR switches, which imposes additional overhead to the network configuration.

 

Impact:

The authors demonstrate that it is possible to perform a real time or run time configuration of a SDN for big data applications. This integrated network control architecture programs the network at runtime using optical circuits with a SDN controller and with low configuration overhead. 

Hence, this paper sets out to achieve very ambitious goals. Among the obstacles encountered or still to be investigated are: finding the ideal topology for certain types of big data workload tasks, quick reconfiguration, balancing and/or scheduling of partially overlapping aggregations, achieving state consistency during topology and routing updates, and performance and energy consumption.

Question:

The use of an SDN controller constitutes a single point of failure for the system. If instead we had two or more controllers collaborating with each other, we could free up resources and be more flexible about hard deadlines when reconfiguration becomes a priority. Would maintaining consistency across the network then be much easier or harder? Is having redundant components a good or a bad thing for this kind of SDN?

Zhou, Rui

Apr 25, 2013, 1:21:13 AM
to csci2950u-...@googlegroups.com
Paper:
Programming Your Network at Run-time for Big Data Applications

Authors:
Guohui Wang T. S. Eugene Ng Anees Shaikh

Novel Idea:
Particular big data applications in data-center networks tend to have particular traffic patterns; hybrid optical switches can achieve very high throughput over optical links that are reconfigurable within milliseconds; and we now have SDNs that can shape flows. Combining the three, we can arrange network traffic in a way that suits the applications (Hadoop in this paper) running in the data center.

Summary:
People have long tried to shape the network to better suit the needs of applications. By predicting the giant flows, we can try to arrange the network so that the "elephants" are sent through the dramatically faster optical links. However, several previous attempts have shown that prediction of general network traffic is very hard, and current approaches are limited to very short time intervals, so reshaping the network for only a short prediction window can mean significant overhead.

However, if we look at today's big data applications, many of them involve large flows with pronounced patterns. Hadoop runs MapReduce, which has fixed phases of mapping, (combining, partitioning, shuffling) and reducing. This paper therefore tries to make the network aware of the shape of Hadoop traffic and adapt to it.

A possible system architecture for this job is a hybrid multi-root network with some of the roots using optical switches, with flows controlled by an SDN controller such as Floodlight speaking OpenFlow. The controller exposes a general interface for device and network configuration, as well as a query interface for the application to inspect the network settings.

We can take advantage of application awareness to do better in both job scheduling and network topology shaping. In this paper's design, rack-based bin-packing placement is used for reduce tasks to aggregate them onto a minimum number of racks, and batch processing is used to aggregate traffic from multiple jobs, creating long-duration traffic that is suitable for circuit-switched paths and ensuring that an earlier task does not become starved by later tasks. The authors also provide algorithms to shape the network topology based on prediction of the single aggregation pattern, the data shuffling pattern, and partially overlapping aggregations.
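The rack-based bin-packing placement can be sketched as follows. This is a first-fit illustration under an assumed uniform number of reduce slots per rack; the paper does not spell out its exact algorithm:

```python
def pack_reducers(num_reducers, slots_per_rack):
    """Place reduce tasks on the minimum number of racks by filling
    each rack completely before opening the next one."""
    placement = {}
    for r in range(num_reducers):
        placement["reduce-%d" % r] = "rack-%d" % (r // slots_per_rack)
    return placement

placement = pack_reducers(10, slots_per_rack=4)
racks_used = sorted(set(placement.values()))
print(racks_used)  # ['rack-0', 'rack-1', 'rack-2']
```

Concentrating reducers on few racks is what makes the subsequent aggregation traffic amenable to a small number of optical circuits.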

Oddly enough, no tests were provided based on their implementation.

Reproducibility:
For map-and-reduce-style and similar big data applications, an application-aware network can certainly be very useful. The key to its effectiveness will be how well we can predict the traffic.

Questions, Criticism and Further Ideas
1. 
The reshaped topology is good only if the trade-off between hop count and traffic congestion is well balanced. The paper needs further discussion of how indirect hop counts are controlled, or a proof that the algorithm will not lead to long chains of indirect hops.
2.
I think market-based resource allocation would nicely fit the needs for fairness and priority.
3.
The time period T is a very important factor; experiments are needed to find its optimal value.




 

--
You received this message because you are subscribed to the Google Groups "CSCI2950-u Spring 13 - Brown" group.
To unsubscribe from this group and stop receiving emails from it, send an email to csci2950u-sp13-b...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Shao, Tuo

Apr 25, 2013, 1:23:55 AM
to csci2950u-...@googlegroups.com
Paper Title
Programming Your Network at Run-time for Big Data Applications

Authors
Guohui Wang, T.S.Eugene Ng, Anees Shaikh

Date
HotSDN'12, August 13, 2012

Novel Idea
The paper presents the design of a run-time SDN system for big data applications to optimize their performance.

Main Results
The design of the run-time system proposed in this paper includes three parts: traffic demand estimation, job scheduling, and routing for different patterns.

Impact
The system proposed in this paper is designed specifically for one application (Hadoop). It should work well for completing Hadoop tasks and improving their performance, but the paper doesn't give any evaluation, so the effectiveness remains to be seen.

Evidence
The paper first discusses the architecture and traffic patterns of data center networks, and points out the advantages of application awareness using an example of a Hadoop task. Then it describes strategies that could be applied to traffic demand estimation and job scheduling. Most importantly, for each aggregation pattern in Hadoop, it proposes a different routing mechanism. It also discusses some implementation issues and possible solutions to minimize the control overhead.

Prior Work
BCube and CamCube utilize the Hypercube and Torus topologies for routing.

Competitive Work
OSA

Criticism and Questions
The paper doesn't give any evaluation of the design. The delay of control is critical in delay-sensitive applications like Hadoop; although the authors claim the design minimizes the control overhead, the paper lacks the statistics to convince readers.



kmdent

Apr 25, 2013, 1:16:17 AM
to csci2950u-...@googlegroups.com

Programming Your Network at Run-time for Big Data Applications by Guohui Wang, T. S. Eugene Ng, and Anees Shaikh


Novel Idea: Using Software Defined Networks along with optical switches on jobs involving big data to improve application performance and network utilization.


Main results: Flow installation can be achieved with relatively low overhead compared to the long duration of MapReduce flows. There are some concerns about the scalability of the SDN controller with regard to how fast it can update state across the network and how consistent it can keep it. They have a demand estimation engine that computes the sources and destinations associated with a task; then, by correlating the traffic volume of different tasks, the engine can predict the shuffle traffic demand of the map phase before the tasks finish. As for data aggregation, they use the single aggregation pattern when reducers need to collect from multiple mappers on different racks, the data shuffling pattern when there are multiple reducers across multiple racks, and the partially overlapping pattern when jobs share resources.


Evidence: They don’t really provide any evidence other than a couple of aggregation examples.


Prior Work: OSA. BCube and Cam Cube network architectures. Das et al.


Reproducibility: Not really applicable. They don’t produce anything.


Criticism: They don’t give any results on how this affects hadoop jobs.





Papagiannopoulou, Dimitra

Apr 24, 2013, 11:59:43 PM
to Rodrigo Fonseca, csci2950u-...@googlegroups.com

Title: Programming Your Network at Run-time for Big Data Applications

Authors: Guohui Wang, T. S. Eugene Ng, Anees Shaikh

 

Novel Idea: Motivated by emerging network architecture trends (the growth of big data applications, the growing research interest in dynamically reconfigurable optical circuits, and the potential of SDNs to configure and improve network performance), the authors propose an integrated network control architecture to program the network at run-time for big data applications, using optical circuits with an SDN controller. They focus on Hadoop to explore the design of the integrated network control architecture.

 

 

Results: Through their analysis, the authors conclude that the proposed integrated network control architecture shows great potential to improve application performance, at small configuration overhead, not only for Hadoop but also for other big data applications with a centralized master.

 

 

Prior Work: The work in this paper is motivated by the network architecture trends mentioned above. More specifically, [1], [2], [20] and [29] show the growing interest in big data applications, most of which have a centralized master that enables the use of application-level information for network optimizations. Also, [26], [18], [15] and [25] show the great research interest in data center network architectures that use optical switches to provide increased bandwidth at low complexity and energy consumption. Moreover, [14] demonstrates the need for application-level information to ensure good circuit utilization and application performance.

 

 

Related Work: The paper refers to other work on allocating optical circuits in data centers (c-Through [26], Helios [18], OSA [15]); those, though, rely on network-level traffic data to improve application performance. Schares et al. [16] discussed the use of optical switches for stream processing systems. [12] used OpenFlow to aggregate traffic in a converged packet-switched network, focusing on WAN services. Finally, [27] used topology switching in fat-tree based data center networks.

 

 

Evidence: The paper doesn't include any experiments or supporting data. The authors describe the overall architecture of cross-layer network control for big data applications. They then describe the traffic patterns of big data applications (bulk transfer, data aggregation, control messages with certain latency limits) and explain that in oversubscribed data center networks the data aggregation and shuffling patterns can hurt performance, so their design targets handling a mix of such traffic patterns. Through illustrated examples, they motivate their approach of using application-level information to perform network optimizations. Moreover, they use Hadoop as an example to discuss the design of the integrated network control plane (traffic demand estimation, job scheduling, handling of aggregation traffic). Finally, they describe how they would implement the presented design.

 

 

Criticism: The ideas presented in this paper look promising, but it is still work in progress and needs a lot of refinement as well as evaluation. The paper doesn't include any specific results, just discussion of what could be done. It is positive, though, that it also includes a discussion of potential future work. Another comment regarding the wording: this is not a very comprehensive paper, in the sense that many things could be explained and analyzed better. Nevertheless, the idea of feeding application-level information to SDNs to improve application performance seems promising. It would be nice to implement and evaluate it, though, using not only Hadoop but also other big data applications.



 

--