Project Details


Rodrigo Fonseca

unread,
Feb 21, 2013, 10:12:47 AM
to csci2950u-...@googlegroups.com
The project for this course will be done in groups, preferably of 2 people. You will need to submit a project proposal draft, describing one or more ideas you have for a project, by next Friday, March 1st. In the following class, on March 5th, we will discuss these ideas in class.

There are two types of possible projects:

1. Open research problem: you will investigate some new idea, either solving a problem or extending an existing solution in a meaningful way. This is a mini research paper, and ideally it could lead to a publication!

2. Reproducing a research result from the literature, very much in the spirit of http://reproducingnetworkresearch.wordpress.com/, using network emulation, simulation, or a real deployment.

Don't rely solely on the reading list for the course, but use it as a baseline.
Also consult recent publications in Sigcomm, CoNEXT, Infocom, HotNets, HotSDN, HotICE, NSDI, SOSP, OSDI, USENIX ATC, CCR.

This page has a good selection of potential results/experiments to reproduce for flavor 2 of the project: http://www.stanford.edu/class/cs244/2013/pa3.html

If you have particular interests, please come talk to me!

Rodrigo

Rodrigo Fonseca

unread,
Feb 28, 2013, 11:17:16 AM
to csci2950u-...@googlegroups.com
Also, here's a nice list of current and recent research papers on SDNs:

Zhiyuan(Eric) Zhang

unread,
Mar 1, 2013, 11:58:08 PM
to csci2950u-...@googlegroups.com
One idea I have for a project is designing a domain-specific language in Haskell for OpenFlow networks (perhaps for a narrower area such as routing policy). Haskell is well suited to domain-specific language design, and it's fun. A lot of work has been done on programming-language design for SDN [1], and this is a very well-defined domain, which saves us a lot of work on formalizing the domain knowledge and lets us concentrate on the implementation (which is the fun part).

Some similar work has already been done at Yale: Nettle [2] is an embedded domain-specific language in Haskell for OpenFlow networks. It has a low-level layer for the interaction between the Haskell library and the OpenFlow protocol, and it provides APIs based on functional reactive programming (FRP). One thing we could do is implement a lightweight version of it on NOX: FRP is a clever idea, but it seems like overkill to me. I think for a language targeting some subfield (like routing policy; I'm still thinking about this part), a pure functional language should be enough, and with the help of tools like monads we can hopefully build a simpler language. One potential difficulty is that Haskell's Foreign Function Interface (FFI) is designed for C and has poor support for C++. This could be a problem if we want the Haskell library to interact directly with the NOX APIs.
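To make the "composable policy language" idea concrete (the actual proposal is a Haskell EDSL; this is just a rough, language-agnostic Python sketch in the spirit of Nettle/Frenetic-style combinators, with all names invented), a routing policy can be modeled as a function from a packet to a set of actions, with parallel and sequential composition:

```python
# Hypothetical sketch of a tiny routing-policy combinator language.
# Real systems (Nettle, Frenetic) define much richer semantics.

class Policy:
    """A policy maps a packet (a dict of header fields) to a set of actions."""
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, pkt):
        return self.fn(pkt)

    def __or__(self, other):
        # Parallel composition: union of both policies' actions.
        return Policy(lambda pkt: self(pkt) | other(pkt))

    def __rshift__(self, other):
        # Sequential composition: apply `other` only if `self` accepted the packet.
        return Policy(lambda pkt: other(pkt) if self(pkt) else frozenset())

def match(**fields):
    """Accept the packet if all given header fields match, else drop it."""
    return Policy(lambda pkt: frozenset({"pass"})
                  if all(pkt.get(k) == v for k, v in fields.items())
                  else frozenset())

def forward(port):
    return Policy(lambda pkt: frozenset({f"fwd:{port}"}))

# Example policy: HTTP traffic also goes to port 1; everything goes to monitor port 9.
policy = (match(tp_dst=80) >> forward(1)) | forward(9)
```

In a real Haskell EDSL the composition operators would come for free from type classes, and the monadic structure mentioned above could sequence stateful policy updates.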

Another topic that interests me is using OpenFlow for load balancing. This paper [3] proposes an interesting algorithm for distributing traffic and adjusting policies for load balancing. This is still a vague idea, but I think it would be interesting to reproduce the results in this paper and perhaps extend the algorithm to some specific setting. One interesting case is load balancing for memcached: traffic becomes unbalanced if one server holds many "hot keys" while the others don't. This could be implemented as an application running on an OpenFlow controller.
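As a rough illustration of the hot-key case (not the paper's algorithm; thresholds, data shapes, and the replicate-everywhere strategy are all invented for this sketch), the controller-side logic might watch per-key request counts and plan override rules for keys hot enough to overload their default server:

```python
# Hypothetical controller-side sketch: detect "hot keys" and spread them
# across replicas instead of letting the default hash placement send all
# of their traffic to one memcached server.

from collections import Counter

def plan_overrides(request_counts, n_servers, hot_threshold):
    """Return {key: [candidate servers]} for keys whose request count
    exceeds hot_threshold; hot keys are made available on every server so
    the switch rules can split their flows."""
    overrides = {}
    for key, count in request_counts.items():
        if count >= hot_threshold:
            overrides[key] = list(range(n_servers))
    return overrides

counts = Counter({"user:42": 9000, "user:7": 12, "session:a": 30})
overrides = plan_overrides(counts, n_servers=4, hot_threshold=1000)
```

The interesting parts left open are exactly what the paper addresses for general traffic: how to pick thresholds, and how to adjust rules without disrupting in-flight connections.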


Charles Zhang

unread,
Mar 3, 2013, 10:26:11 PM
to Rodrigo Fonseca, csci2950u-...@googlegroups.com
Project idea:
I'd be interested in learning more about MPTCP in SDN. I have read the first few chapters of the MPTCP paper that we are going to read on April 4th. A controller is mentioned in it; I haven't gone through the details of how that controller is designed and implemented, but I assume it follows the same concept as the global controller of an SDN. That paper focused on data centers, so I'd be interested to see how MPTCP fits into a more general SDN.

Some work has been done on ECMP (equal-cost multi-path routing) in the broader Internet, and there are several severe limitations due to the architecture of today's Internet (according to Wikipedia). One is that if the subflows later reconverge onto a single low-bandwidth link, which is a common scenario, performance drops, possibly even below single-path TCP, due to the congestion this convergence may cause. Another is that the logical topology can differ from the physical topology of the network, so the same physical path may be treated as multiple alternative paths, causing those paths to be overloaded with traffic.

With an SDN's central controller, however, the network has knowledge of its own topology: the controller can avoid assigning multiple subflows to connections that will converge onto fewer low-bandwidth paths, and assign only as many subflows as there are downstream paths at a narrow "bottleneck". The central controller also knows both the physical and the logical topology of the network, so it can assign the MPTCP routing entries according to the physical topology.
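The topology-aware check described above can be sketched in a few lines (this is my illustration, not anything from the MPTCP paper; the data structures and the one-to-one logical-to-physical link mapping are simplifying assumptions): collapse "alternative" logical paths that actually share the same physical links, then cap the subflow count at the number of genuinely distinct physical paths.

```python
# Hypothetical controller-side sketch: count physically distinct paths and
# cap the number of MPTCP subflows accordingly.

def distinct_physical_paths(logical_paths, logical_to_physical):
    """logical_paths: list of tuples of logical link ids.
    logical_to_physical: maps each logical link id to a physical link id.
    Two logical paths that map to the same physical links count once."""
    seen = set()
    for path in logical_paths:
        seen.add(tuple(logical_to_physical[link] for link in path))
    return len(seen)

def subflow_budget(logical_paths, logical_to_physical, requested):
    # Never open more subflows than there are truly distinct physical paths.
    return min(requested, distinct_physical_paths(logical_paths, logical_to_physical))

# Two logical links "a" and "b" that share physical link p1, plus a distinct one.
mapping = {"a": "p1", "b": "p1", "c": "p2"}
paths = [("a",), ("b",), ("c",)]
```

A real controller would also weigh link capacities, not just path counts, to avoid the low-bandwidth reconvergence problem.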

I do not have a partner yet, so if you have a similar idea I'd be happy to work with you! I'm not limited to the idea above, though, and am flexible about joining other interesting projects.


--
You received this message because you are subscribed to the Google Groups "CSCI2950-u Spring 13 - Brown" group.
To unsubscribe from this group and stop receiving emails from it, send an email to csci2950u-sp13-b...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Jeff Rasley

unread,
Mar 4, 2013, 7:54:50 PM
to csci2950u-...@googlegroups.com
Project Idea:

This project extends ongoing research at Brown on the network impact of “Big Data” analysis frameworks such as Hadoop. There are two major components I would like to complete this semester.

There has been recent work by a student in the Brown systems group to instrument current versions of Hadoop with X-Trace [1]. This instrumentation is now at a state where we can start using it to answer questions about Hadoop’s execution. In addition to the application-level traces produced by X-Trace, my work instrumenting Hadoop to expose low-level TCP flows is complete. I would like to correlate the high-level X-Trace traces with my lower-level flow-based ones to get a more detailed end-to-end picture of job execution. Once the high- and low-level traces are connected, I am interested in performing critical path analysis on the events, in order to gain a better understanding of the bottlenecks in various Hadoop jobs. Depending on what bottlenecks are found, I would also like to experiment with reducing them in order to decrease overall job completion time.
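Critical path analysis over trace events boils down to finding the longest dependency chain in a DAG of events weighted by duration. Here is a minimal sketch (the event names, data shapes, and the assumption that traces yield a clean dependency DAG are all mine, not part of the X-Trace instrumentation):

```python
# Minimal critical-path sketch: given per-event durations and a DAG of
# dependencies, find the longest (slowest) chain of events.

from functools import lru_cache

def critical_path(durations, deps):
    """durations: {event: seconds}; deps: {event: [prerequisite events]}.
    Returns (total_time, path) for the longest dependency chain."""
    @lru_cache(maxsize=None)
    def longest(ev):
        best_t, best_p = 0.0, []
        for d in deps.get(ev, []):
            t, p = longest(d)
            if t > best_t:
                best_t, best_p = t, p
        return best_t + durations[ev], best_p + [ev]
    # The critical path ends at whichever event has the largest finish time.
    end = max(durations, key=lambda e: longest(e)[0])
    return longest(end)

# Toy example loosely shaped like a MapReduce job's phases.
durations = {"setup": 1.0, "map": 5.0, "shuffle": 3.0, "reduce": 4.0}
deps = {"map": ["setup"], "shuffle": ["map"], "reduce": ["shuffle"]}
```

With real trace data the hard part is building `deps` correctly from the correlated X-Trace and flow events, not the graph algorithm itself.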

To support correlating the high- and low-level traces and the critical path analysis, I would also like to better categorize TCP flows. Currently I break flows into categories based on Job ID and use port information to get coarse-grained insight into what the flows are doing, such as traffic related to the Shuffle, or DataNode or NameNode traffic. These three port-based categories are too coarse; I would like to understand more specifically what is going on. This shouldn’t be too difficult, since we are already logging stack traces for each TCP flow, and once we correlate the X-Trace data with the flows we will have that additional information.
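The current coarse port-based categorization might look roughly like this sketch (the specific port numbers are common Hadoop 1.x-era defaults, but actual assignments vary by deployment and are configurable, so treat them as assumptions):

```python
# Hypothetical sketch of coarse port-based flow categorization.
# Ports are Hadoop 1.x-style defaults and deployment-dependent.

CATEGORY_BY_PORT = {
    50010: "datanode",  # HDFS block transfer
    8020:  "namenode",  # HDFS metadata RPC
    50060: "shuffle",   # TaskTracker HTTP (map output served to reducers)
}

def categorize_flow(flow):
    """flow: dict with 'src_port' and 'dst_port'. Returns a coarse label."""
    for port in (flow["dst_port"], flow["src_port"]):
        if port in CATEGORY_BY_PORT:
            return CATEGORY_BY_PORT[port]
    return "other"
```

Correlating with X-Trace events would replace these static labels with the actual operation each flow belongs to.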

So far I have described tasks that I will be doing myself this semester. I am also looking for another student with experience in visualization techniques to help visualize some of the data we currently have. Ideally the visualization would show all of the machines in our cluster and all the communication between them, as an animation that can be replayed at various speeds with detailed labels of what happened. Since we already have the tools to collect all the required data for this kind of visualization, I don’t imagine it will be too difficult; the hardest part will be presenting it nicely, so ideally the student would have previous experience with D3. A student in the systems group is currently using D3 [2] to present the X-Trace data, but that visualization does not yet account for events over time, in the animation sense. What I really like about this idea is that it would give us a more detailed understanding of how Hadoop works and what happens throughout a job's entire lifespan.

This is one static example from D3 that could be a jumping off point for visualizing this kind of data: http://bl.ocks.org/mbostock/4062006

[1] http://cs.brown.edu/~rfonseca/pubs/xtr-nsdi07.pdf
[2] http://d3js.org/