Reviews for Popa12Faircloud

Rodrigo Fonseca

Apr 8, 2013, 8:42:02 PM

to csci2950u-...@googlegroups.com

Hi all,

Please post your review of the FairCloud paper as a group reply to this message.
Christopher is going to lead the discussion tomorrow.

Thanks,
Rodrigo

Zhiyuan "Eric" Zhang

Apr 8, 2013, 8:53:29 PM

to csci2950u-...@googlegroups.com

Paper Title

FairCloud: Sharing the Network in Cloud Computing

Authors

Lucian Popa, Gautam Kumar, Mosharaf Chowdhury, Arvind Krishnamurthy, Sylvia Ratnasamy, Ion Stoica

Date

SIGCOMM’12, August 13–17, 2012, Helsinki, Finland

Novel Idea

The basic idea of this paper is that traditional network sharing policies, like the ones in TCP, are not suitable for a cloud computing environment. The authors argue that sharing network resources in cloud networks should meet three requirements: a minimum bandwidth guarantee, high utilization, and payment proportionality. They then argue that there are fundamental tradeoffs between these requirements. They also develop three resource allocation policies (PS-L, PS-N and PS-P) that address the network sharing problem with different tradeoff decisions.

Main Results

PS-L and PS-N achieve network proportionality at different levels: PS-L provides per-link proportionality and PS-N provides proportionality at the network level. The tradeoff between the two is that PS-L doesn't provide a minimum bandwidth guarantee and its proportionality is very limited, while PS-N provides neither a minimum bandwidth guarantee nor full incentives for high utilization, but its proportionality is somewhat more general. PS-P, on the other hand, is only suitable for tree-based topologies, and it provides everything except proportionality and strategy-proofness. The results of their simulations and experiments support this tradeoff analysis.

Evidence

The authors propose a set of desirable properties to examine the tradeoff space and analyze different sharing policies. Those policies are evaluated with simulations of a single congested link and of small-scale networks. They also ran trace-driven simulations based on a production cluster at Facebook to validate their observations at a large scale.

Comment

One thing I like about this paper is the examples it provides when discussing the network sharing properties and tradeoffs. I also find interesting a point the authors leave to future work: we could select suitable values for the coefficients α and β rather than just the binary values used in PS-P. I think it would be interesting to find out what can be achieved by using continuous values for those coefficients.
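To make the α/β idea concrete, here is a minimal Python sketch. It assumes the pairwise weight on a link has the form W_X-Y = α·W_X/N_X + β·W_Y/N_Y, and that PS-P corresponds to the binary settings (α=1, β=0 on links closer to X; α=0, β=1 on links closer to Y). The function name `pair_weight` and the fractional setting are hypothetical, for illustration only.

```python
from fractions import Fraction

def pair_weight(w_x, n_x, w_y, n_y, alpha, beta):
    """Weight of the X-Y flow on one link: alpha*W_X/N_X + beta*W_Y/N_Y.

    w_x, w_y: weights of VMs X and Y (e.g., proportional to payment).
    n_x, n_y: number of VMs that X and Y each communicate with.
    alpha, beta: coefficients; PS-P is assumed to use only 0 or 1,
    depending on which endpoint the link is closer to.
    """
    return alpha * Fraction(w_x, n_x) + beta * Fraction(w_y, n_y)

# X talks to 4 VMs, Y talks only to X; both VMs have unit weight.
print(pair_weight(1, 4, 1, 1, alpha=1, beta=0))  # 1/4: link closer to X
print(pair_weight(1, 4, 1, 1, alpha=0, beta=1))  # 1: link closer to Y
print(pair_weight(1, 4, 1, 1, alpha=1, beta=1))  # 5/4: both endpoints count
# A hypothetical continuous setting of the kind suggested above:
print(pair_weight(1, 4, 1, 1, alpha=Fraction(3, 4), beta=Fraction(1, 4)))  # 7/16
```

The α = β = 1 case has the same 1/4 + 1 shape as the PS-N weight quoted in the question below; whether intermediate values behave better is the open question.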

Question

One small point of confusion: Section 4.2 says PS-N gives path A1-A3 a weight of 1/4 + 1, where Nx' is the number of VMs that X communicates with across the network. Does A1 communicate with 3 VMs? Then why is Nx' for A1 equal to 4?

Jeff Rasley

Apr 8, 2013, 9:26:04 PM

to csci2950u-...@googlegroups.com

Title: FairCloud: Sharing the Network in Cloud Computing

Authors: Lucian Popa (HP), Gautam Kumar, Mosharaf Chowdhury (UCB), Arvind Krishnamurthy (UW), Sylvia Ratnasamy, & Ion Stoica (UCB).

Context: SIGCOMM 2012 with a related previous HotNets paper from 2011.

Novel Idea: A technique (PS-P) for the sharing of network resources in data center networks that preserves min-guarantee, work conservation and utilization incentives.

Main Results: The authors describe and evaluate 3 techniques for sharing network resources in a cloud environment like Amazon's EC2, along with the trade-offs associated with each one and with related existing techniques.

Impact: This could potentially be the basis for how companies like Amazon could start charging and configuring their EC2 services.

Evidence: The authors evaluated the 3 outlined techniques (PS-L, PS-N, & PS-P) through simulations and a physical testbed.

Prior Work: Seawall also proposed network sharing ideas, but at the hypervisor level; the authors point out that they could use Seawall to implement PS-N & PS-P.

Reproducibility: The algorithms and some accompanying deployment strategies are listed in the paper. In theory this shouldn't be too difficult to reproduce.

Future Work: In terms of PS-N the authors state that if you only use the weights on congested paths and ignore them on uncongested paths then you could potentially get around the issues PS-N has in terms of utilization incentives. They state, however, that this technique would be difficult (but potentially interesting) to deploy.

Shu Zhang

Apr 9, 2013, 12:03:43 AM

to csci2950u-...@googlegroups.com

The paper is built around three requirements that must be met to solve the resource sharing problem in cloud computing: minimum network bandwidth guarantee, high utilization, and payment proportionality. One point worth noting is that the paper discusses these issues at VM granularity, which is the common view in today's real-world IaaS infrastructure, such as EC2.
The three requirements are important for the following reasons:
Min-guarantee: This requirement is key for achieving predictable application performance and is usually enforced through admission control.
High Utilization: High utilization is particularly important for throughput-sensitive applications (which might have bursty traffic).
Network Proportionality: Sharing bandwidth between tenants in proportion to their payments makes the allocation fair.

However, these three requirements conflict and cannot be achieved simultaneously:
(1) There is a hard tradeoff between min-guarantee and network proportionality. The example in the paper shows that one tenant's bandwidth can be reduced arbitrarily by another tenant increasing its number of VMs. So if network proportionality is guaranteed, the min-guarantee eventually cannot be.
(2) There is a tradeoff between network proportionality and high utilization. It can be discussed from two viewpoints: congestion proportionality and link proportionality. Because network proportionality only looks at the number of VMs, a tenant that adds traffic on paths that do not consume congested bandwidth sees its allocation on the congested paths fall, which discourages high utilization. Congestion proportionality therefore only counts bandwidth on congested paths, but it can still be problematic when strategy-proofness is not guaranteed. Link proportionality instead examines single links; since links are treated independently, high utilization is achieved. However, different communication patterns can still break proportionality: per-source, per-destination, and per-SD allocation all fail to achieve link proportionality.
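Both tradeoffs can be reproduced with small arithmetic. The sketch below is a simplified model, not the paper's: it assumes equal per-VM payments and a single shared bottleneck of unit capacity for tradeoff (1) and, for tradeoff (2), a hypothetical pair of tenants with equal VM counts where A also sends over a second link that nobody else uses. The guarantee value `MIN_GUARANTEE` is an assumed number.

```python
def a_share_vm_proportional(vms_a, vms_b, capacity=1.0):
    """Tradeoff (1): A's share of a shared bottleneck when the link is split
    in proportion to each tenant's VM count (equal per-VM payments assumed)."""
    return capacity * vms_a / (vms_a + vms_b)

def a_share_with_free_link(free_link_usage, capacity=1.0):
    """Tradeoff (2): hypothetical tenants A and B with equal VM counts, so
    network proportionality demands equal totals. A gets x on the congested
    link plus y on a link nobody else uses; B gets capacity - x.
    Solving x + y = capacity - x gives x = (capacity - y) / 2."""
    return max(0.0, (capacity - free_link_usage) / 2)

MIN_GUARANTEE = 0.1  # hypothetical guarantee promised to A (fraction of the link)

for vms_b in (2, 8, 38):
    share = a_share_vm_proportional(2, vms_b)
    print(f"B has {vms_b} VMs: A gets {share:.2f}, guarantee met: {share >= MIN_GUARANTEE}")

for y in (0.0, 0.5, 1.0):
    print(f"A uses {y} of the free link: A gets {a_share_with_free_link(y):.2f} on the congested link")
```

The second function shows the disincentive described above: the more A uses the free link, the less it gets on the congested one. Congestion proportionality is meant to avoid this particular case by counting only traffic on congested links.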

The paper then proposes five network sharing properties. These properties are related to the three requirements; they serve mainly to examine the above tradeoffs more explicitly. They are work conservation, strategy-proofness, utilization incentives, communication-pattern independence and symmetry.

Since the three requirements cannot be satisfied at the same time, the best we can do is balance these tradeoffs and meet as many of the properties above as possible.
(1) PS-L: The weight of a tenant's queue on a link is the total weight of its VMs that use this link. A naive version of this scheme has some problems. One fix is to use a global weight on each link; another is to use the number of VMs on the other end of the communication as the denominator of the weight. PS-L only offers a link-level view of proportionality.
(2) PS-N: The weight of the communication between X and Y is modified so that the denominator is the number of VMs X communicates with across the entire network, instead of on a particular link. Some details are not clear to me; I hope they will be explained in the discussion.
(3) PS-P: PS-P achieves the min-guarantee by adding an importance factor to the weighting scheme, for example giving more weight to the endpoint that is closer to a link than to the remote one. As a result, it achieves per-source fair sharing for traffic toward the root and per-destination fair sharing for traffic from the root. The paper says the min-guarantee holds because, on a given link toward the root, a VM competes only with the other VMs in the same subtree. PS-P can also be implemented per tenant; the disadvantage is that the resulting guarantees are then not per VM.
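The difference between PS-L and PS-N in (1) and (2) comes down to where the peer count N_X is taken. This Python sketch assumes the pairwise weight W_X-Y = W_X/N_X + W_Y/N_Y and computes each tenant's total weight on one link; the helper name `tenant_weights_on_link` and the example network are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def tenant_weights_on_link(link_flows, peer_scope, vm_weight=None):
    """Total weight per tenant on one link, with W_X-Y = W_X/N_X + W_Y/N_Y.

    link_flows: (tenant, src_vm, dst_vm) flows that cross this link.
    peer_scope: flows used to count N_X, the number of VMs X talks to
      (must include link_flows).
      PS-L-style: pass link_flows (peers counted on this link only).
      PS-N-style: pass every flow in the network (peers counted globally).
    vm_weight: optional dict VM -> weight; every VM defaults to weight 1.
    """
    vm_weight = vm_weight or {}
    peers = defaultdict(set)
    for _, src, dst in peer_scope:
        peers[src].add(dst)
        peers[dst].add(src)

    totals = defaultdict(Fraction)
    for tenant, src, dst in link_flows:
        totals[tenant] += (Fraction(vm_weight.get(src, 1), len(peers[src]))
                           + Fraction(vm_weight.get(dst, 1), len(peers[dst])))
    return dict(totals)

# Hypothetical network: A1 talks to A2, A3 and A4; B1 talks to B2, B3 and B4.
all_flows = [("A", "A1", "A2"), ("A", "A1", "A3"), ("A", "A1", "A4"),
             ("B", "B1", "B2"), ("B", "B1", "B3"), ("B", "B1", "B4")]
# Only A1->A2 and B's three flows cross link L.
on_link = [all_flows[0]] + all_flows[3:]

print(tenant_weights_on_link(on_link, on_link))    # {'A': Fraction(2, 1), 'B': Fraction(4, 1)}
print(tenant_weights_on_link(on_link, all_flows))  # {'A': Fraction(4, 3), 'B': Fraction(4, 1)}
```

Under the PS-L-style count, A1 contributes its full weight on link L even though it also talks over other links; under the PS-N-style count, A1's weight is spread over its three peers, so tenant A's weight summed over the whole network equals its number of VMs (four).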
All three policies are work-conserving and symmetric and offer communication-pattern independence, but each satisfies a different subset of the remaining properties. Because these policies are based on weighted queuing, they require switches that support it; where switches cannot support weighted queuing, a central controller should be adopted to provide the QoS service.
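All three policies ultimately hand per-flow weights to weighted queues. As a rough model of what an idealized weighted fair queue converges to on one link, here is a progressive-filling sketch of weighted max-min fairness in Python. It ignores packet-level effects and multi-link interactions, and the function name is hypothetical.

```python
def weighted_max_min(capacity, weights, demands):
    """Weighted max-min fair rates on one link (progressive filling).

    weights, demands: dicts flow -> weight / demand (same units as capacity).
    Flows whose demand fits within their weighted share are capped at their
    demand; the leftover capacity is redistributed to the remaining flows
    in proportion to their weights.
    """
    rates = {}
    active = set(weights)
    remaining = capacity
    while active:
        total_w = sum(weights[f] for f in active)
        share = {f: remaining * weights[f] / total_w for f in active}
        satisfied = {f for f in active if demands[f] <= share[f]}
        if not satisfied:
            rates.update(share)  # every remaining flow is bottlenecked here
            break
        for f in satisfied:
            rates[f] = demands[f]
            remaining -= demands[f]
        active -= satisfied
    return rates

# A (weight 2) only needs 1 unit; B (weight 4) is greedy.
print(weighted_max_min(10, {"A": 2, "B": 4}, {"A": 1, "B": 100}))  # {'A': 1, 'B': 9.0}
```

Capping satisfied flows and redistributing their leftover capacity is what makes the allocation work-conserving: B absorbs the bandwidth A does not need.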

The evaluation is divided into three parts: link level, network level (small scale), and trace-driven simulation of a real cluster (large scale).
For link-level scenarios, a single congested link is examined. The experiment focuses on the bandwidth allocated to tenant A as the other tenant's VM increases its number of receivers. The bandwidth under PS-P and Per-Source (out) remains unchanged, because in this setting both only look at the source. PS-L gives tenant A a proportional share of 2/(N+3), since A has two VMs and there are N+3 VMs in total. However, per-flow and per-source do not provide proportionality. They then tested MapReduce jobs on the link, treating mappers as senders and reducers as receivers. With M + R = 10, PS-L's allocation stays unchanged as M and R vary, while PS-P's allocation grows as M/(M+5).
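The closed-form shares in this paragraph can be checked with exact fractions. The sketch assumes equal VM weights and, for the MapReduce case, that tenant B runs 5 mappers and 5 reducers; that job size is an assumption chosen to be consistent with the M/(M+5) expression, not a detail taken from the paper.

```python
from fractions import Fraction

def single_link_shares(n):
    """Tenant A's share of one congested link: A has one flow A1->A2,
    while B has one sender talking to n receivers. Equal VM weights."""
    return {
        "per-flow": Fraction(1, 1 + n),        # 1 flow vs n flows
        "per-source / PS-P": Fraction(1, 2),   # 1 sender vs 1 sender
        "PS-L": Fraction(2, 2 + (1 + n)),      # 2 VMs vs n+1 VMs = 2/(n+3)
    }

def mapreduce_shares(m, b_mappers=5, b_reducers=5):
    """A runs m mappers and 10-m reducers across the link; B's job size
    (5 mappers, 5 reducers) is an assumption, not taken from the paper."""
    a_vms, b_vms = 10, b_mappers + b_reducers
    return {
        "PS-L": Fraction(a_vms, a_vms + b_vms),  # independent of m
        "PS-P": Fraction(m, m + b_mappers),      # senders compete per source
    }

for n in (1, 3, 7):
    print(n, {k: str(v) for k, v in single_link_shares(n).items()})
for m in (2, 5, 8):
    print(m, {k: str(v) for k, v in mapreduce_shares(m).items()})
```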
In the network scenario, one tenant uses a pairwise one-to-one communication pattern and another uses an all-to-all communication pattern. Figure 10(b) shows that per-flow and PS-L favor dense communication patterns over sparse ones. If the core links are under-provisioned, PS-N provides the best proportionality at the network level. Two further experiments in a fully provisioned network test the tradeoff between proportionality and bandwidth guarantees: PS-P provides a better min-guarantee and PS-N provides better proportionality. Figure 12 clearly shows that PS-P maintains the min-guarantee, while under the other policies the guaranteed bandwidth decreases as the number of hosts increases.
Finally, they ran trace-driven simulations using traces from a 3200-node Facebook production data center. Figure 13 shows that PS-P's allocation differs the least between large and small jobs, providing a relatively fair allocation of bandwidth. Per-flow performs the worst; it even yields a quadratic allocation. This suggests that for clusters running many jobs of different sizes, PS-P provides the best fairness.
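A back-of-the-envelope view of why per-flow sharing is quadratic: an all-to-all shuffle with k mappers and k reducers has k·k flows but only 2k VMs. The sketch assumes, hypothetically, that every flow of two competing jobs crosses the same bottleneck, and compares per-flow sharing with sharing in proportion to VM count.

```python
from fractions import Fraction

def shares(job_sizes, policy):
    """Share of a common bottleneck for each all-to-all job.

    job_sizes: k for each job (k mappers shuffling to k reducers, i.e.
    k*k flows and 2k VMs). Assumes every flow crosses the same bottleneck.
    """
    if policy == "per-flow":
        units = [k * k for k in job_sizes]  # one unit of weight per flow
    elif policy == "per-VM":
        units = [2 * k for k in job_sizes]  # one unit of weight per VM
    else:
        raise ValueError(f"unknown policy: {policy}")
    total = sum(units)
    return [Fraction(u, total) for u in units]

# A small job (k=2) competing with a large job (k=10).
print([str(s) for s in shares([2, 10], "per-flow")])  # ['1/26', '25/26']
print([str(s) for s in shares([2, 10], "per-VM")])    # ['1/6', '5/6']
```

With a 5x difference in job size, per-flow sharing gives the large job a 25x larger share, while VM-proportional sharing keeps the ratio at 5x.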

Two important related papers are Oktopus and SecondNet, which provide static reservations throughout the network to implement bandwidth guarantees for the hose model and the pipe model.

Inspiration:

Since I am working on a project on data center markets, which achieves network bandwidth guarantees through a market mechanism, I note that FairCloud solves the problem from a different angle. I am considering whether it is possible to combine the FairCloud policies with a market economy. A combination seems possible because the bidding system also starts from the tenants' perspective, and the concept of VMs could easily be imported into that scenario.
So FairCloud might serve as a backup or optimization plan for bidders who do not win the auction. A bidder who loses an auction might still be able to establish a connection, but his expected minimum bandwidth guarantee could not be met. However, a min-guarantee policy like PS-P could provide best-effort complementary delivery.

Place, Jordan

Apr 8, 2013, 9:06:36 PM

to csci2950u-...@googlegroups.com

FairCloud: Sharing the Network in Cloud Computing

Various authors from HP Lab, UC Berkeley, U Washington
SIGCOMM '12
Companies such as Amazon have begun selling virtual server
instances to the public with a "pay for what you use" model. These VMs
are hosted in data centers and advertised with certain memory and CPU
power guarantees. Unfortunately, it is difficult to provide inter-VM
bandwidth guarantees to customers utilizing multiple VMs, even though
this feature is desirable to customers and could be advertised to
boost VM providers' revenue. The authors of FairCloud aim to address
the problem of fairness in today's data centers.
They begin by defining desirable properties that would ideally
characterize a fair network. These properties include work
conservation, strategy-proofness, utilization incentives,
communication pattern independence and symmetry. They then explain why
it is especially difficult (if not impossible) to have all these
characteristics and evaluate the tradeoffs in favoring one property
over another.
The authors propose three means of determining how much bandwidth
flows in the data center should be allotted and explain the pros and
cons of each in terms of deployability, minimum bandwidth guarantees
and the properties mentioned above. The first of these means is
"Proportional Sharing at Link-level" (PS-L) which simply shares the
bandwidth of a link according to how many of a customer's VMs are
communicating through that link. "Proportional Sharing at
Network-level" (PS-N) is similar to PS-L, but counts VMs at the
network-level for weighting instead of the link-level. Lastly,
"Proportional Sharing on Proximate Links" (PS-P) enforces per-source fair
sharing towards the data center's network root and per-destination
fair sharing from the data center's network root.
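The per-source-toward-the-root rule is what gives PS-P its guarantee: on an uplink, a VM competes only with the VMs below that link. Here is a simplified Python model of that bound; it assumes equal VM weights, looks only at upward traffic, and uses a hypothetical tree, link capacities and function name.

```python
from fractions import Fraction

def ps_p_uplink_guarantee(vm, path_to_root, vm_weight=None):
    """Lower bound on a VM's aggregate bandwidth toward the root under
    per-source weighted sharing on each uplink (simplified PS-P-style model).

    path_to_root: (capacity, vms_below) for each link from the VM's host up
      to the root; vms_below lists every VM whose upward traffic can use it.
    On each link the VM is guaranteed capacity * W_vm / sum(W of vms_below);
    the end-to-end bound is the smallest of these per-link guarantees.
    """
    vm_weight = vm_weight or {}

    def weight(v):
        return vm_weight.get(v, 1)

    return min(
        Fraction(capacity) * weight(vm) / sum(weight(v) for v in vms_below)
        for capacity, vms_below in path_to_root
    )

# Hypothetical tree: 2 VMs per host on 10 Gbps host links; 4 hosts (8 VMs)
# behind a 20 Gbps rack uplink.
host_link = (10, ["X", "Y"])
rack_uplink = (20, ["X", "Y", "V0", "V1", "V2", "V3", "V4", "V5"])
print(ps_p_uplink_guarantee("X", [host_link, rack_uplink]))  # 5/2 (Gbps)
```

Adding VMs in other racks, by this or any other tenant, does not change the bound for X; only VMs sharing X's host link or rack uplink can lower it.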
The authors test the fairness of their proposed algorithms using
Facebook datacenter traces run through a simulator. These results are
reproducible and support the fairness properties each method mentioned
above was hypothesized to have. The authors mention how these fairness
mechanisms may be deployed (though they never evaluate whether
deployment will work well in practice).
Data center fairness is definitely a concern for the future of the
cloud and this paper is a great starting point for exploring how
fairness may be enforced. Unfortunately the deployability of some of
these mechanisms seems fairly unscalable in practice (mostly due to
the queuing requirements of switches). I wonder if a centralized
controller like in an SDN may be used to offer more practical
deployability with similar results.

Papagiannopoulou, Dimitra

Apr 9, 2013, 1:11:43 AM

to Rodrigo Fonseca, csci2950u-...@googlegroups.com

Title: FairCloud: Sharing the Network in Cloud Computing

Authors: Lucian Popa, Gautam Kumar, Mosharaf Chowdhury, Arvind Krishnamurthy, Sylvia Ratnasamy, Ion Stoica

SIGCOMM'12

Novel Idea: The authors identify three main requirements that sharing in cloud networks should satisfy: min-guarantee, high utilization and network proportionality. Based on those requirements, they expose some fundamental tradeoffs for network resource allocation in cloud and data center environments. They develop a set of properties to express those tradeoffs explicitly and use them as a guide for the design of three resource allocation policies.

Main Result: The main results of the paper are the following. First, the definition of the critical tradeoffs for network resource allocation in cloud and data center environments. Second, the development of three resource allocation policies: PS-L, PS-N and PS-P. Third, the evaluation of these policies through simulations and a software switch implementation, which shows that they achieve their intended properties.

Evidence: The authors first analyze the desirable requirements for bandwidth allocation across multiple tenants in cloud networks and use specific examples to present those requirements and show that they cannot be satisfied simultaneously. They describe the tradeoffs between network proportionality and min-guarantee, as well as high utilization. After presenting traditional allocation policies, they conclude that both Per-Source and Per-Destination allocation fail to provide link proportionality and do not satisfy the min-guarantee. They present five desirable properties for network sharing that help them examine the described tradeoffs (work conservation, strategy-proofness, utilization incentives, communication-pattern independence and symmetry). Using detailed illustrated examples, they introduce PS-L, PS-N and PS-P. The allocation policies are evaluated through link-level and network-level scenarios as well as trace-driven simulations. The evidence shows that PS-L achieves link proportionality but does not provide utilization incentives, PS-N achieves better proportionality at the network level but not full incentives for high utilization, and PS-P provides guarantees and incentives for high utilization but not proportionality. These results are supported by reporting the aggregate bandwidth for different numbers of mappers when running MapReduce jobs. The simulation results are also validated through experiments on a testbed, for which the measured aggregate bandwidth closely matches the simulation results (within a 4% error margin).

Prior Work: The simulation results of this paper were validated by experiments on the DETERlab testbed [4]. For this purpose, the authors implemented switch support for PS-L, PS-P and PS-N using per-flow queues in a software router built with Click [20]. The experiments were carried out using MapReduce traces from a Facebook production data center [12]. Throughout the paper, the authors assume an Infrastructure-as-a-Service (IaaS) cloud model such as Amazon EC2 [2], where tenants pay a fixed flat rate per Virtual Machine. Finally, the problems of traditional allocation policies exposed by previous work [11, 25], such as per-flow fairness and per-SD and Per-Source allocation, motivated the design of the three new allocation policies that satisfy the discussed properties.

Competitive Work: Other approaches to sharing cloud networks have been proposed in the past. Among them, Seawall [25] differs from this work, but it could be used to implement PS-N or PS-P in the future. Oktopus [10] and SecondNet [17] propose static reservations in the network to provide bandwidth guarantees for the hose and the pipe model. Since these approaches have both advantages and drawbacks, they would be best used in combination with the allocation policies of this paper. Gatekeeper [24] proposes a per-VM hose model with work conservation, using a hypervisor-based mechanism. For network sharing, NetShare [9] uses per-tenant weights that are constant throughout the network.

Reproducibility: The results of this paper are reproducible.

Criticism: Overall, this is great work on many levels. First, the authors clearly state their motivation, the objectives of their work and their contributions. The way they present their ideas justifies their choices in the resulting resource allocation policies. They use detailed illustrated examples and theory to support their statements. The evaluation process is thorough: they consider link-level and network-level scenarios and perform trace-driven simulations. They present solid evidence of how and when their allocation policies work better than others. Finally, they not only propose their resource allocation policies but also examine the practical challenges of deploying them, which gives a more complete and concrete picture of how their work could be adapted to the real world.

Charles Zhang

Apr 8, 2013, 11:15:22 PM

to Rodrigo Fonseca, csci2950u-...@googlegroups.com

Paper Title: FairCloud: Sharing the Network in Cloud Computing

Authors: Lucian Popa, Gautam Kumar, Mosharaf Chowdhury, Arvind Krishnamurthy, Sylvia Ratnasamy, Ion Stoica

Date: August 2012 SIGCOMM

Novel Idea:

This paper discussed the fairness of sharing the cloud resource among network tenants, which resulted in the proposal of three requirements, two fundamental tradeoffs, and three allocation policies.

Main Results: As mentioned in the novel idea section, they exposed the fundamental tradeoffs in network resource allocation in cloud and data center environments and proposed a set of requirements and properties to explicitly express those tradeoffs. And then they developed a set of resource allocation policies dealing with the tradeoffs and used simulation and testbed experiments to evaluate the policies.

Impact:

The allocation policies proposed in this paper are a good starting point for further exploring the tradeoff space.

Evidence:

To evaluate the three policies, they ran experiments in three different test scenarios: first, a single congested link (the link-level scenario); second, a small hand-crafted network (the network-level scenario); and finally, large-scale trace-driven simulations using traces obtained from a 3200-node production cluster at Facebook. The results were generated with a flow-level simulator written in Java, and the simulation results were validated by experiments on the DETERlab testbed, for which they implemented switch support using per-flow queues in a software router built with Click.

Prior work: Oktopus and SecondNet use static reservations to provide bandwidth guarantees under the hose model and the pipe model. Gatekeeper proposed a per-VM hose model with work conservation.

Reproducibility:

Yes, the work can be reproduced. The authors say they wrote a total of 900 lines of C++ code and 2000 lines of Python code.

Comment:

Overall, great work that discusses the actual fairness of network resource allocation while also considering performance and utilization.

kmdent

Apr 8, 2013, 10:33:59 PM

to csci2950u-...@googlegroups.com

FairCloud: Sharing the Network in Cloud Computing

by Lucian Popa, Gautam Kumar, Mosharaf Chowdhury, Arvind Krishnamurthy, Sylvia Ratnasamy, and Ion Stoica

Novel Idea: Outlining three requirements of networks, identifying the fundamental tradeoffs in a network, and then creating three allocation policies to better achieve these requirements. The first requirement is a minimum bandwidth guarantee regardless of others' use of the network. The second is maximum network utilization at all times. The final one is dividing up network resources based on how much each tenant paid, known as network proportionality.

Main Result: Minimum guarantees directly conflict with network proportionality: one tenant can increase its number of VMs and directly reduce the minimum guarantee on the link for others. Network proportionality also directly conflicts with maximum utilization. Network and congestion proportionality are not link-independent, which means that they don't provide incentives to use free resources; instead, they penalize a tenant that does so by reducing its allocation on other, congested links. Under maximum utilization, using free resources would be incentivized. The traditional allocation mechanisms all share the problem of not adequately addressing the desired properties. The proposed properties are full utilization of bottlenecked links, cheat-proofness, no disincentive for using free links, independence from the communication pattern, and the requirement that reversing the demands of flows does not change the network allocation. In addition to flat-rate per-VM pricing, PS-P can support per-byte pricing models. PS-L, PS-P, and PS-N can all be deployed today. Besides weighting, PS-P needs one bit for each interface, identifying whether the interface faces the hosts or the network core. PS-N is a little more complicated, because the source and destination hypervisors need to coordinate to set the weight.

Evidence: For each of the methods discussed, they give an example showing how it falls short of their proposed network sharing properties. They also test the different policies in a network simulator: first on a single congested network link, then on a larger network, and finally on traces from a Facebook cluster.

Reproducibility: The results are fairly reproducible.
Prior Work: Seawall
Competitive Work: Oktopus, Gatekeeper, NetShare
Question: Why would Facebook release traces? Wouldn't that make it easier for people to find security vulnerabilities?

Shao, Tuo

Apr 9, 2013, 1:32:15 AM

to csci2950u-...@googlegroups.com

Paper Title

FairCloud: Sharing the Network in Cloud Computing

Authors

Lucian Popa, Gautam Kumar, Mosharaf Chowdhury, Arvind Krishnamurthy, Sylvia Ratnasamy, Ion Stoica

Date

SIGCOMM'12, August 13-17, 2012

Novel Ideas

The paper investigates the tradeoffs among different network allocation requirements when sharing network resources, and presents policies that balance these tradeoffs.

Main Results

The paper makes two main contributions: first, it exposes the fundamental tradeoffs in network resource allocation; second, it develops a set of policies to best navigate these tradeoffs.

Impact

By investigating the tradeoffs and developing policies, it enables commercial cloud networks to provide customers with network resources according to their payments.

Evidence

The paper first categorizes the network allocation requirements and discusses the tradeoffs among them in different scenarios. Then, by investigating the traditional allocation policies, it points out the properties of network allocation behind those requirements. Based on this discussion, the paper proposes three allocation policies at different levels. Finally, it evaluates these policies in scenarios at different levels and tests whether they meet the required properties.

Jeff Rasley

Apr 9, 2013, 10:26:26 AM

to csci2950u-...@googlegroups.com

Video associated with FairCloud at SIGCOMM '12: http://conferences.sigcomm.org/sigcomm/2012/video/SIGCOMM-V-01-FairCloudSharingtheNetworkInCloudComputing.mp4
