How to visualize ~9,000 proteins with Cytoscape

happyhappy

Oct 5, 2009, 8:10:28 AM
to cytoscape-discuss, happyq...@163.com
Hi,

I have become interested in protein-protein interactions. Since
Cytoscape is powerful for visualization, I downloaded human
protein-protein interaction data and imported it. However, when I ran
Layout -> Spring Embedded, the software failed. I thought it might be
related to Java heap space, so I edited cytoscape.sh to increase the
heap size, but the layout still failed and no visualization appeared.
As a result, I cannot see the layout of those 9,000 proteins and
their relationships.
Can you lend me a hand? Hope to hear from you soon! Thanks a lot!

Alexander Pico

Oct 5, 2009, 1:09:27 PM
to Cytoscape-Discuss, happyq...@163.com
Dear Happy,

I was able to reproduce the error running Spring Embedded on BINDhuman.sif
(see attached screenshot). This is a bug.

In general, you will want to be cautious running layouts on huge networks.
The performance depends on your hardware. If you want to share your
specifications (CPU and RAM), we could probably give a rough estimate of
reasonable network sizes that can be laid out.

As an example, I have a 2.53 GHz CPU and 4 GB of RAM, and I would *not* try
to lay out BINDhuman.sif with 19k nodes and 31k edges.

One approach is to select a subset of the interactome (e.g., nodes connected
to proteins of interest), create a new subnetwork with just these nodes and
edges, and then lay those out.
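
The subset approach can also be applied before import. Below is a minimal sketch, assuming a whitespace-delimited SIF file ("source interaction target [target ...]") and made-up protein names; `read_sif`, `first_neighbor_subnetwork`, and `write_sif` are hypothetical helpers, not part of Cytoscape.

```python
def read_sif(lines):
    """Parse SIF lines ("source interaction target [target ...]") into edge tuples."""
    edges = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and isolated nodes
        source, interaction, targets = fields[0], fields[1], fields[2:]
        for target in targets:
            edges.append((source, interaction, target))
    return edges


def first_neighbor_subnetwork(edges, seeds):
    """Keep the seed nodes, their direct neighbors, and every edge among them."""
    seeds = set(seeds)
    keep = set(seeds)
    for source, _, target in edges:
        if source in seeds:
            keep.add(target)
        if target in seeds:
            keep.add(source)
    return [e for e in edges if e[0] in keep and e[2] in keep]


def write_sif(edges):
    """Serialize edge tuples back to tab-delimited SIF text."""
    return "\n".join(f"{s}\t{i}\t{t}" for s, i, t in edges)


# Made-up example network; "P1" stands in for a protein of interest.
example = [
    "P1 pp P2 P3",
    "P2 pp P4",
    "P4 pp P5",
    "P5 pp P6",
]
sub = first_neighbor_subnetwork(read_sif(example), seeds=["P1"])
print(write_sif(sub))
```

The resulting text can be saved as a smaller .sif file and imported, so the layout only has to handle the neighborhood of the proteins of interest.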

- Alex
Picture 1.png

Gang Su

Oct 6, 2009, 2:46:53 PM
to cytoscap...@googlegroups.com
Have you tried the Force-Directed layout? It might help in some cases
where Spring Embedded fails.
Actually, I don't think putting such a large interaction network into
one display will help anyway. As Alex said, you may want to
select a subset of the network and lay out just that.
That will give you a clearer idea of the underlying structure too.

Gang

happyhappy

Oct 8, 2009, 11:13:39 PM
to cytoscape-discuss
Thanks a lot for your advice. However, my aim with Cytoscape is
to see where my proteins of interest sit within the whole human
protein-protein interaction network, so displaying such a large
network seems necessary. Could you help me figure out a way to solve
this problem? Thanks, all!

happyhappy


happyhappy

Oct 8, 2009, 11:07:58 PM
to cytoscape-discuss
Dear Alex,

Thank you for your suggestion and help. My goal with Cytoscape
is to see where the proteins I focus on sit within the whole human
protein-protein interaction network. Therefore, I need a view of
the entire network with those 9,000 nodes. What's more, I have
connected my PC to a server with about 32 GB of RAM, but the
result was strange: it was as if nothing had happened.
In addition, I have read a paper in which the authors visualize
thousands of nodes and simultaneously highlight the proteins of
interest. So I am hopeful that you specialists can give me some
advice. I look forward to hearing from you. Thanks, all!


Gang Su

Oct 8, 2009, 11:29:26 PM
to cytoscap...@googlegroups.com
Hi Happy,

How many edges are there? Sometimes the problem is that if your network
is too dense (far more edges than nodes), the force simulation cannot
push the nodes far enough apart. I don't know exactly what your goal
is. For a dense network it is very difficult to visualize the
topological structure on a 2D plane. If you are looking for clusters of
nodes, you can use the clusterMaker plugin. If you want to inspect the
relationship of your protein to the entire human proteome, I think you
will only get a readable graph when the number of edges is fairly low.
Look at the galFiltered data: there are only around 300 nodes and 300
edges, so the average degree is around 2. If you have 300 nodes and
3000 edges, then every node connects to 20 other nodes on average. In
that case, unless the graph has a strong small-world property, where
you can have several large clusters, the force-directed layout will
only generate a messy hairball. IMHO the reason is that you are trying
to flatten a high-dimensional structure onto a 2D plane, and this fails
when that structure is very complex. This is equivalent to the problem
of multidimensional scaling.
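
The arithmetic behind those numbers can be checked with a short sketch. It assumes an undirected network without duplicate edges, so each edge contributes two endpoints; the function names are illustrative only. The third row uses the BINDhuman.sif counts quoted earlier in the thread.

```python
def average_degree(n_nodes, n_edges):
    """Mean number of edge endpoints per node in an undirected network: 2E / N."""
    return 2 * n_edges / n_nodes


def density(n_nodes, n_edges):
    """Fraction of all possible undirected node pairs that are connected."""
    return n_edges / (n_nodes * (n_nodes - 1) / 2)


# (nodes, edges) pairs taken from the thread.
for nodes, edges in [(300, 300), (300, 3000), (19000, 31000)]:
    print(nodes, edges,
          round(average_degree(nodes, edges), 2),
          round(density(nodes, edges), 5))
```

A quick check like this before running a layout gives a rough idea of whether the result is likely to be readable or a hairball.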

If you provide more specific information, maybe I can help with some
other suggestions.

Gang

Scooter Morris

Oct 9, 2009, 9:02:33 AM
to cytoscap...@googlegroups.com
Hi Happy,
    We regularly visualize networks with lots of nodes and edges.  Here are my recommendations:
  1. Definitely increase your memory settings.  I use 16GB for heap space and 50MB for my stack space.
  2. Use Force-Directed by bringing up the Layout-->Settings panel and selecting Force-Directed.
  3. Adjust the Force-Directed settings to get the desired result.  Since you're looking at protein-protein interaction networks, remember that they are (or should be) scale-free, which means you will have hubs and outliers.  Depending on what you are interested in, you may wind up increasing or decreasing the default spring length to spread things out or bunch them together.
Iterate on step 3 -- adjusting the spring length, mass, and spring strength coefficients until you get things to look reasonable.   This may take some patience, but I've almost always been able to get a network that I've been able to use to explore.  If you want to take it the next step to find complexes and hubs in the network, there are lots of plugins available, including clusterMaker, which provides an implementation of MCL clustering (and Community Clustering in the new version -- thanks to Gang Su!).
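
To see why the spring length matters, here is a toy spring layout in plain Python. It is a simplified sketch and not the algorithm Cytoscape uses: the parameter names `spring_length`, `spring_coeff`, and `mass` only mirror the settings mentioned above, and the force model (Hooke springs on edges, inverse-square repulsion between all pairs) is an assumption chosen for illustration.

```python
import math


def force_directed(nodes, edges, spring_length=50.0, spring_coeff=0.01,
                   mass=3.0, iterations=500, step=1.0):
    """Toy spring layout: edges pull toward spring_length, all node pairs repel."""
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on a unit circle.
    pos = {v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        # Pairwise repulsion, proportional to mass^2 / distance^2.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = mass * mass / (d * d)
                force[u][0] += f * dx / d
                force[u][1] += f * dy / d
                force[v][0] -= f * dx / d
                force[v][1] -= f * dy / d
        # Hooke springs along edges with rest length spring_length.
        for u, v in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = spring_coeff * (d - spring_length)
            force[u][0] += f * dx / d
            force[u][1] += f * dy / d
            force[v][0] -= f * dx / d
            force[v][1] -= f * dy / d
        # Move every node a small step along its net force.
        for v in nodes:
            pos[v][0] += step * force[v][0]
            pos[v][1] += step * force[v][1]
    return pos


def edge_length(pos, u, v):
    """Euclidean distance between two laid-out nodes."""
    return math.hypot(pos[u][0] - pos[v][0], pos[u][1] - pos[v][1])


# A small hub with four neighbors, laid out with two different spring lengths.
hub_nodes = ["hub", "a", "b", "c", "d"]
hub_edges = [("hub", x) for x in ["a", "b", "c", "d"]]
for length in (50.0, 100.0):
    layout = force_directed(hub_nodes, hub_edges, spring_length=length)
    mean_len = sum(edge_length(layout, u, v) for u, v in hub_edges) / len(hub_edges)
    print(length, round(mean_len, 1))
```

The all-pairs repulsion loop is quadratic in the number of nodes, which is one reason layouts of networks with thousands of nodes take patience.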

-- scooter

happyhappy

Oct 10, 2009, 8:01:33 AM
to cytoscape-discuss
Hi Scooter,

Thanks a lot! With the force-directed layout, the display of those
proteins looks reasonable. However, as you all said, finding
complexes and hubs will take more work, so I became very interested
in the clusterMaker plugin. Could you give me an introduction to this
plugin and its functions? Also, I tried to use it to cluster my
proteins of interest with MCL clustering, but nothing happened when I
pressed the "create clusters" button. Could you show me how to make
full use of the plugin? Thanks very much.

happyhappy


happyhappy

Oct 10, 2009, 8:11:32 AM
to cytoscape-discuss
Hi Gang,

Yes, you are right. Although I got a display of those 9,000 proteins
from the force simulation, it looks just like a messy hairball. My
goal is to determine whether there are significant differences
between a subnetwork and the whole global network. Therefore, I'd
like to use topological coefficients or degree centrality to assess
significance. Further, I'd like to cluster proteins extracted from
the human interaction network as a subnetwork. So I think
clusterMaker may be very helpful. Could you give me an introduction
or some materials to learn it? Thanks a lot!
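
One way to frame the degree-centrality comparison is sketched below, assuming an undirected edge list and a made-up set of proteins of interest. The function names and the toy network are hypothetical. Comparing two means is only a first look and is not a significance test; one option would be to compare against many random subsets of the same size.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree of each node divided by (N - 1), for an undirected edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Toy network; "A" and "D" stand in for the proteins of interest.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E"), ("E", "F")]
centrality = degree_centrality(edges)
subset = {"A", "D"}
subset_mean = mean(centrality[p] for p in subset)
global_mean = mean(centrality.values())
print(round(subset_mean, 3), round(global_mean, 3))
```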

happyhappy

Joonas Jämsen

Oct 11, 2009, 1:21:36 PM
to cytoscap...@googlegroups.com
Hi,

I would first divide my dataset according to some other variable and
then analyze the subnetworks from that dataset. At present, it is
somewhat challenging for a beginner (been there, done that) to analyze a
large set of proteins like you say you have.

Best regards,
Joonas.

Alexander Pico

Oct 11, 2009, 1:44:04 PM
to cytoscap...@googlegroups.com
If you are interested in clusters/motifs based on topology and in
network properties in general, I'd recommend these plugins:

MCODE
NetworkAnalyzer

- Alex

Gang Su

Oct 12, 2009, 3:09:39 PM
to cytoscap...@googlegroups.com
I am porting lots of community analysis functions to the GLay 2.0 plugin;
the first prototype will probably be released in a month.

Gang
