implementing add_edge function on input data from Gene Ontology

ayesha

Nov 18, 2009, 2:10:42 AM
to networkx-discuss
Hi,

I am new to Python programming and really need help.

I have a text file that has numerous lines of input and each line
looks something like the following:

ln1 = Y54E10BR.6 GO:0006350|GO:0006979|
ln2= C08F11.14 GO:0006097|GO:0006099|GO:0008152|
and so on...

The first string is a protein and the remaining ones are genes. I have
written a piece of code that separates the proteins and genes based on
the white space and the '|' characters.
I now need to create a network in which there is an edge between
each protein and the genes on the same line. A gene on one line may also
have an edge with a protein on another line. Can I do this with the
current file format? I have been reading about the
add_edge function and I get the impression that it needs
numerical input.
If that's the case, how can I convert my more than 5000 proteins and
genes into numbers and then add an edge between all of them?
Any ideas, please?

Dan Schult

Nov 18, 2009, 9:06:57 AM
to networkx...@googlegroups.com
Here are some code snippets that might help...

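import networkx as nx

G = nx.Graph()
# yield_one_line_at_a_time() stands for your own parser: it should
# yield one (protein, list_of_genes) pair per line of the file
for p, glist in yield_one_line_at_a_time(filename):
    for gene in glist:
        G.add_edge(p, gene)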

Or using a "list comprehension" style

G = nx.Graph()
G.add_edges_from([(p, gene) for p, glist in yield_one_line_at_a_time(filename)
                  for gene in glist])
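Both snippets assume a helper that yields one (protein, gene list) pair per line; `yield_one_line_at_a_time` is a placeholder name for your own parser, not a NetworkX function. A minimal sketch of such a helper, assuming the line format from the first message (protein, whitespace, then genes separated and terminated by '|') and a recent NetworkX release:

```python
import networkx as nx


def yield_one_line_at_a_time(lines):
    """Hypothetical parser: yield (protein, [genes]) for each input line.

    Assumes each line looks like 'PROTEIN  GO:1|GO:2|', with the genes
    separated (and terminated) by '|'.
    """
    for line in lines:
        parts = line.split(None, 1)  # split off the protein at the first whitespace
        if len(parts) < 2:
            continue  # skip blank lines and lines without any genes
        protein, rest = parts
        # drop the empty strings left by the trailing '|' and strip stray whitespace
        genes = [g.strip() for g in rest.split("|") if g.strip()]
        yield protein, genes


# Two sample lines from the original message; with a real file you would
# pass an open file object instead, e.g. open(filename).
sample = [
    "Y54E10BR.6 GO:0006350|GO:0006979|",
    "C08F11.14 GO:0006097|GO:0006099|GO:0008152|",
]

G = nx.Graph()
G.add_edges_from((p, gene) for p, glist in yield_one_line_at_a_time(sample)
                 for gene in glist)
print(G.number_of_nodes(), G.number_of_edges())  # 7 5
```

Because the nodes are the strings themselves, a gene that appears on two different lines becomes a single node connected to both proteins, which takes care of the cross-line edges without any conversion to numbers.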

Strings are just fine for nodes. You don't have to convert to numbers.
Dan

Fidel

Nov 19, 2009, 9:35:00 AM
to networkx-discuss
Hello Ayesha,

If you are new to Python, you might want to take a look at

http://wiki.python.org/moin/BeginnersGuide


> I have a text file that has numerous lines of input and each line
> looks something like the following:
>
> ln1 =  Y54E10BR.6       GO:0006350|GO:0006979|
> ln2=   C08F11.14        GO:0006097|GO:0006099|GO:0008152|
> and so on...
>
> the first string is protein , the second and third being genes. I have
> written a piece of code that separates the proteins and genes based on
> the white spaces and the ' | '.
> I now need to create a network such that, there is an edge between
> each protein and genes in the same line. A gene in one line may also
> have an edge with a protein in another line...Can I do it with the
> current file format that I have?

I believe you can, since it looks a lot like an adjacency list.
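One way to exploit that resemblance, sketched under the assumption of a recent NetworkX release: turn every '|' into a space so each line becomes "node neighbor neighbor ...", then hand the lines to `nx.parse_adjlist`, which treats the first token on each line as the node and the remaining tokens as its neighbors. The sample lines are taken from the first message; blank lines are dropped first because a whitespace-only line would have no first token.

```python
import networkx as nx

# Sample lines from the original message; a real file would be read with
# open(filename) and processed the same way.
raw_lines = [
    "Y54E10BR.6       GO:0006350|GO:0006979|",
    "C08F11.14        GO:0006097|GO:0006099|GO:0008152|",
]

# Replace the '|' separators with spaces and drop blank lines, so every
# line has the adjacency-list form "node neighbor neighbor ...".
adj_lines = [line.replace("|", " ").strip() for line in raw_lines]
adj_lines = [line for line in adj_lines if line]

# With the default delimiter (None), parse_adjlist splits on any whitespace.
G = nx.parse_adjlist(adj_lines)

print(sorted(G.neighbors("Y54E10BR.6")))  # ['GO:0006350', 'GO:0006979']
```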

>...I have been reading about the
> add_edge function and I get the impression that it needs some
> numerical input?

You can use any hashable object as a node, and strings are hashable.
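A quick standard-library check of what "hashable" means in practice: strings and tuples of hashable items can be hashed, so they can serve as dictionary keys and therefore as graph nodes, while lists cannot. The helper name `is_hashable` is just for illustration.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key (and hence as a graph node)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable("Y54E10BR.6"))       # True: strings are hashable
print(is_hashable(("GO:0006350", 1)))  # True: tuples of hashables are hashable
print(is_hashable(["GO:0006350"]))     # False: lists are mutable, so unhashable
```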

http://networkx.lanl.gov/reference/generated/networkx.Graph.add_edge.html#networkx.Graph.add_edge

I hope this helped.

Greetings,
Fidel

ayesha

Nov 23, 2009, 2:13:51 PM
to networkx-discuss
Thanks a lot Dan and Fidel :)
I was able to add edges by using a simple for loop with
myGraph.add_edge(protein, lGene). I now need to calculate the
clustering coefficient for all nodes in my network. I found that
by following the example below I could compute the number of triangles
in the network, but could anyone please clarify and elaborate a bit
more on it?

>>> G=nx.complete_graph(5)
>>> print nx.triangles(G,0)
6
>>> print nx.triangles(G,with_labels=True)
{0: 6, 1: 6, 2: 6, 3: 6, 4: 6}
>>> print nx.triangles(G,(0,1))
[6, 6]

In the first line, do I need to pass the total number of nodes in my
network? What is the significance of the '0' in the second line? Is it
so that counting starts from the very beginning?
And I assume the fourth line returns True whenever it finds a triangle
in the graph, right?

Moreover, I need to calculate the Pearson Correlation Coefficient in
order to get the relatedness of the biological network. Is there a way
to do that in networkx?

Thanks in advance!

alex

Nov 23, 2009, 2:21:58 PM
to networkx...@googlegroups.com
ayesha wrote:
> [...]
> Moreover, I need to calculate the Pearson Correlation Coefficient in
> order to get the relatedness of the biological network. Is there a way
> to do that in networkx?
> [...]

If you can create a matrix of observations of your variables (one row
per variable), numpy's corrcoef might help with this.
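A minimal sketch of that suggestion, assuming NumPy is installed: with the default rowvar=True, np.corrcoef treats each row as one variable and each column as one observation, and returns the matrix of pairwise Pearson coefficients. The two rows below are made-up numbers, not biological data; which per-node quantities to correlate is a choice for your analysis.

```python
import numpy as np

# Hypothetical observations: each row is one variable, each column one
# observation (for example, one value per node). These numbers are made up.
data = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.1, 5.9, 8.2, 9.8],
])

r_matrix = np.corrcoef(data)  # 2x2 matrix of Pearson coefficients
r = r_matrix[0, 1]            # correlation between row 0 and row 1
print(r_matrix.shape, round(r, 3))
```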

Alex

Fidel

Nov 24, 2009, 8:41:53 AM
to networkx-discuss
> I was able to add edges by using a simple for loop with
> myGraph.add_edge(protein, lGene)...I now need to calculate the
> clustering coefficient for all nodes in my network. I found out that
> by following the example below I could compute the number of triangles
> in the network. But if anyone could please clarify and elaborate a bit
> more on it:

For the clustering coefficient itself, you can probably use nx.clustering:

http://networkx.lanl.gov/reference/generated/networkx.clustering.html#networkx.clustering
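For reference, the local clustering coefficient that nx.clustering reports for an unweighted, undirected graph is c(v) = 2 T(v) / (deg(v) (deg(v) - 1)), where T(v) is the triangle count nx.triangles gives for v, and c(v) = 0 when deg(v) < 2. The standard-library sketch below computes it from a plain dict-of-sets adjacency, an assumption made only so the example runs without NetworkX; the helper names are illustrative. One caveat worth checking against your data: if every edge joins a protein to a gene, the graph is bipartite, contains no triangles at all, and every coefficient comes out 0.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Local clustering coefficient of node v in an undirected graph.

    adj maps each node to the set of its neighbors (no self-loops).
    """
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # T(v): number of edges among v's neighbors = triangles through v
    triangles = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * triangles / (k * (k - 1))


def complete_graph(n):
    """Adjacency of the complete graph on nodes 0..n-1."""
    return {u: {w for w in range(n) if w != u} for u in range(n)}


k5 = complete_graph(5)
print(local_clustering(k5, 0))  # 1.0: every pair of neighbors is linked

# Protein-gene edges only (bipartite): no triangles, so the coefficient is 0.
bipartite = {
    "P1": {"GO:A", "GO:B"},
    "P2": {"GO:A", "GO:B"},
    "GO:A": {"P1", "P2"},
    "GO:B": {"P1", "P2"},
}
print(local_clustering(bipartite, "P1"))  # 0.0
```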

> >>> G=nx.complete_graph(5)

Creates a complete graph on 5 vertices.

> >>> print nx.triangles(G,0)
> 6

Counts the number of triangles of G that include vertex 0.

> >>> print nx.triangles(G,with_labels=True)
>
> {0: 6, 1: 6, 2: 6, 3: 6, 4: 6}

If you pass with_labels=True, the function returns a dictionary
keyed on vertex labels, and the values are the number of triangles
of G containing each vertex. In your example, each of the 5
vertices of the complete graph is contained in 6 triangles.
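The 6 can be checked by counting: a triangle through a fixed vertex of the complete graph $K_n$ is determined by choosing 2 of the other $n-1$ vertices, and summing over all vertices counts each triangle three times.

```latex
T(v) = \binom{n-1}{2} = \binom{4}{2} = 6 \quad (n = 5), \qquad
\sum_{v} T(v) = 5 \cdot 6 = 30 = 3 \binom{5}{3} = 3 \cdot 10 .
```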

> >>> print nx.triangles(G,(0,1))
>
> [6, 6]

The fourth line prints the number of triangles containing 0 and 1,
respectively; that is why you get [6, 6]. Each of 0 and 1 is contained
in 6 triangles.
For more details about the triangles function you can see

http://networkx.lanl.gov/reference/generated/networkx.triangles.html#networkx.triangles
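The transcript quoted above uses the 2009-era API. In current NetworkX releases the with_labels keyword no longer exists: nx.triangles(G) already returns a dict keyed by node, a single node gives an int, and a list of nodes gives a dict rather than a list. A sketch assuming a recent release and Python 3:

```python
import networkx as nx

G = nx.complete_graph(5)

# A single node: number of triangles through that node.
print(nx.triangles(G, 0))        # 6

# No second argument: dict mapping every node to its triangle count.
print(nx.triangles(G))           # {0: 6, 1: 6, 2: 6, 3: 6, 4: 6}

# A list of nodes: dict restricted to those nodes.
print(nx.triangles(G, [0, 1]))   # {0: 6, 1: 6}

# nx.clustering builds on the same counts; in K5 every node has coefficient 1.0.
print(nx.clustering(G, 0))       # 1.0
```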


You can check this other site for the Pearson correlation coefficient;
it is a statistics package that includes a function to calculate it.

http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/python/Statistics/
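For reference, the quantity such a function (like numpy's corrcoef) computes for paired samples $(x_i, y_i)$, $i = 1, \dots, n$, with sample means $\bar{x}$ and $\bar{y}$, is Pearson's r, which always lies between -1 and 1:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```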

Fidel