Agilent data input into BINGO


neuman

Mar 17, 2009, 6:10:53 AM
to cytoscape-helpdesk
Hi all,

I'm using Cytoscape v2.6.1 with the BiNGO plugin. I have a problem using
my Agilent G4122F (4x44k) mouse array data as input to BiNGO. I tried
the ProbeID, the TargetID and the GeneSymbol (such as Dpysl3, Pa2g4,
Unc45a) as input, but BiNGO doesn't give me a network for the gene
ontology. Instead it gives me the following message: "The selected
annotation does not produce any classification for the selected nodes.
Maybe you chose the wrong type of gene identifier ?"

I might have the wrong format for my input data. Is there a program or
a database to convert/translate my Agilent data into a format or gene
names that BiNGO can read?

Hope you guys can help me.


Regards
Khoa Nguyen Do, Cand. Scient
Research assistant
Nutritional Immunology Group (NIG)
Center for Biological Sequence Analysis (CBS)
Technical University of Denmark (DTU)
Søltofts Plads bldg. 224
2800 Kgs. Lyngby
Phone +45 4525 2784
email: kh...@bio.dtu.dk



allan_k...@agilent.com

Mar 17, 2009, 11:47:10 PM
to cytoscape...@googlegroups.com
Does your array data have a column with gene symbols? The three gene symbols you provide as examples seem to be just fine. I tested them with the Agilent Literature Search tool: all three genes had aliases in the lexicon Agilent Literature Search keeps for Mus musculus. Also, I was able to generate a network using them as search terms.

Is it possible that there are case-sensitivity issues with BiNGO? Could you try running your query in BiNGO using lower-case gene symbols and see if you get the same lack of results?
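For trying different spellings quickly, a small script can generate the case variants of a symbol list so they can be pasted into BiNGO one set at a time. This is only a sketch; the function name `case_variants` is made up for illustration, and whether BiNGO matching is actually case sensitive is exactly what this test is meant to find out.

```python
def case_variants(symbols):
    """Return lower-case, upper-case and capitalised forms of each symbol."""
    return {
        "lower": [s.lower() for s in symbols],
        "upper": [s.upper() for s in symbols],
        "capitalised": [s.capitalize() for s in symbols],
    }


# The three example symbols from the first message.
symbols = ["Dpysl3", "Pa2g4", "Unc45a"]
variants = case_variants(symbols)

for name, values in variants.items():
    # One space-separated line per variant, ready to paste.
    print(name + ":", " ".join(values))
```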

AllanK

______________________________

Allan Kuchinsky
Agilent Technologies
5301 Stevens Creek Blvd. Mailstop 54U/SC
Santa Clara, CA, 95051
* phone (408) 553-2423
* mailto:allan_k...@agilent.com

swaraj basu

Mar 18, 2009, 3:21:52 AM
to cytoscape...@googlegroups.com
Hello,
I think the problem lies in providing the correct GO annotation file for your BiNGO run. At the bottom of the BiNGO box there is a drop-down menu, select organism/annotation. The annotation behind it is nothing but a list of gene or protein identifiers in one column and their corresponding GO IDs in another column. For your own data you should have one such custom annotation file. Please check http://www.psb.ugent.be/cbd/papers/BiNGO/annotations.htm for further information. I hope I have been of some help to you. Please do ask if any further issue comes up.
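One way to see whether the identifier type is the problem is to count how many of the selected node IDs occur in the annotation at all. The sketch below assumes you already have both lists in memory; `annotation_coverage` and the sample annotation IDs are hypothetical and only illustrate the comparison.

```python
def annotation_coverage(selected_ids, annotated_ids, ignore_case=False):
    """Split selected IDs into those found in the annotation and those missing."""
    if ignore_case:
        normalise = lambda s: s.strip().lower()
    else:
        normalise = lambda s: s.strip()
    annotated = {normalise(a) for a in annotated_ids}
    found = [s for s in selected_ids if normalise(s) in annotated]
    missing = [s for s in selected_ids if normalise(s) not in annotated]
    return found, missing


# Node IDs of three different identifier types taken from this thread.
selected = ["Dpysl3", "A_51_P115147", "NM_009468"]
# Hypothetical identifiers as they might appear in an annotation file.
annotated = ["DPYSL3", "PA2G4", "UNC45A"]

found, missing = annotation_coverage(selected, annotated)
found_ci, missing_ci = annotation_coverage(selected, annotated, ignore_case=True)
print("exact match:", found, "missing:", missing)
print("ignoring case:", found_ci, "missing:", missing_ci)
```

If nothing is found even when case is ignored, the annotation most likely uses a different identifier type than the node IDs.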
Swaraj

neuman

Mar 18, 2009, 10:03:22 AM
to cytoscape-helpdesk
Thanks guys, that helped, but I'm still struggling with some problems.

>>AllanK
Yes, I have a column with GeneSymbols, but also a ProbeID column (e.g.
A_51_P115147, A_51_P115441, A_51_P119923) and a TargetID column (e.g.
AK087419, NM_009468, NM_011119). As you can see, the ProbeID is more
consistent than the TargetID and GeneSymbol columns. Nevertheless, I
tried running my GeneSymbols through the Agilent Literature Search
tool. It gave me a network, but the nodes were labelled with other
gene or protein names instead. Do you have an explanation for this?

I want to run BiNGO with my ProbeIDs, but I can't run it with them as
they are now. Can I convert/translate them to a different format that
BiNGO can understand?

I tried running BiNGO with lower-case gene symbols, but it gave me a
network with only 7 of my 130 genes. Do you know why?


>>Swaraj
I don't understand your reply.
"...your Bingo box there is a drop down menu of select organism/
annotation. This asks for nothing but a list of
Gene or protein identifiers in one column and their corresponding GO
ID in another column."
The BiNGO box doesn't ask you to choose gene or protein identifiers
and their corresponding GO IDs; it only asks you to choose an
organism.

And I don't understand how to translate my data into the correct GO
annotation. Can you help me?

Khoa




allan_k...@agilent.com

Mar 18, 2009, 11:25:24 AM
to cytoscape...@googlegroups.com
The gene symbols you are using are probably aliases for the formal gene name. Agilent Literature Search keeps a lexicon (a dictionary) of gene names for each species. These are based upon Entrez Gene names. Each entry consists of a formal gene name and a list of its aliases. Agilent Literature Search will match a gene for any alias you give it and will assign the formal name for the gene as the node ID. If you hover the mouse over the node, you will see a tooltip with the names of the aliases. Your search terms should be among the aliases.
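The alias lookup described above can be pictured as a dictionary from every known name to the formal name. The sketch below is not the Agilent implementation; the lexicon entries are invented placeholders, and case-insensitive matching is an assumption made for the example.

```python
def build_alias_index(lexicon):
    """Map every formal name and alias (lower-cased) to its formal name."""
    index = {}
    for formal_name, aliases in lexicon.items():
        for name in [formal_name, *aliases]:
            index[name.lower()] = formal_name
    return index


def resolve(term, index):
    """Return the formal name for a search term, or None if it is unknown."""
    return index.get(term.lower())


# Placeholder lexicon: formal name -> list of aliases (not real gene data).
lexicon = {
    "GENE_A": ["alias1", "alias2"],
    "GENE_B": ["alias3"],
}
index = build_alias_index(lexicon)

print(resolve("alias2", index))   # node would be labelled with the formal name
print(resolve("unknown", index))  # no match in the lexicon
```

This is why a node can carry a different label from the term that was typed in: the term is only one of the aliases stored under the formal name.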

AllanK

______________________________

Allan Kuchinsky
Agilent Technologies
5301 Stevens Creek Blvd. Mailstop 54U/SC
Santa Clara, CA, 95051
* phone (408) 553-2423
* mailto:allan_k...@agilent.com

swaraj basu

Mar 19, 2009, 12:48:15 AM
to cytoscape...@googlegroups.com
Hello,
For the list of gene names you have loaded into Cytoscape, you have to prepare a file which classifies each node name of your network according to the Gene Ontology.
As a hypothetical example, take the BRCA1 gene. Suppose I have loaded a network in which the BRCA1 gene has the node ID BOO12 (just hypothetical). If we look up BRCA1 in GO, we find many results from all three categories: Biological Process, Molecular Function and Cellular Component. So we have:
BRCA1, Biological Process: DNA damage response, signal transduction resulting in induction of apoptosis (GO:0008630)
BRCA1, Cellular Component: condensed chromosome (GO:0000793)
BRCA1, Molecular Function: damaged DNA binding (GO:0003684)
...and so on.
There are actually around 68 results for BRCA1. To prepare a BiNGO annotation file from this, I would open a text document and write the following:
(species=YourSpecies)(type=Full)(curator=GO)
 
BOO12  GO:0008630
BOO12  GO:0000793
BOO12  GO:0003684

...and so on for all the GO annotations. I have done this for just one gene; you have to do the same for every gene in your network and put everything in a single file. Then save the file under any suitable name, say myGoannotation.db, in any folder. Now open Cytoscape->BiNGO, go to select organism, select custom, then browse to your file and load it. Select all the nodes of your network and, in the BiNGO window, give the cluster a name under the heading Cluster, for example myrun. Now you are ready to run the software and get the results. I hope this is of some help to you; please do ask if you have any further doubts.
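Writing these lines by hand does not scale, so here is a minimal sketch that produces the layout shown in this message from a dictionary of node IDs and GO IDs. The function names are made up for illustration. The separator and the form of the GO ID are parameters or inputs on purpose, because the exact syntax BiNGO expects should be checked against the annotation page linked earlier in the thread.

```python
def annotation_lines(go_by_node, species="YourSpecies", sep="\t"):
    """Build the header line plus one 'node ID, GO ID' line per annotation."""
    lines = ["(species={})(type=Full)(curator=GO)".format(species)]
    for node_id in sorted(go_by_node):
        for go_id in go_by_node[node_id]:
            lines.append("{}{}{}".format(node_id, sep, go_id))
    return lines


def write_annotation(path, lines):
    """Save the annotation lines as a plain text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")


# The hypothetical BOO12 node from the example above.
go_by_node = {"BOO12": ["GO:0008630", "GO:0000793", "GO:0003684"]}
lines = annotation_lines(go_by_node)
print("\n".join(lines))

# To save the file, call for example:
# write_annotation("myGoannotation.db", lines)
```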
Swaraj
                                                              

swaraj basu

Mar 19, 2009, 1:25:06 AM
to cytoscape...@googlegroups.com
Hello,
To translate your data into GO annotations, you first have to load your network into Cytoscape with the gene symbols as primary IDs. The same gene symbols can then be submitted to an online GO classification tool to get the GO IDs for all of them. The results need to be parsed programmatically to prepare a BiNGO annotation file. I will look for a suitable application and post it to you.
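As a rough idea of what the parsing step could look like, the sketch below assumes the GO results come as tab-separated lines in the GO annotation file (GAF) column order, where the third column is the gene symbol and the fifth is the GO ID, and lines starting with "!" are comments. That input format is an assumption; an online tool may export something different. The database name and object IDs in the sample rows are placeholders.

```python
from collections import defaultdict


def go_ids_by_symbol(gaf_lines, wanted_symbols):
    """Collect GO IDs per gene symbol from GAF-style tab-separated lines."""
    wanted = set(wanted_symbols)
    result = defaultdict(set)
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete annotation row
        symbol, go_id = fields[2], fields[4]
        if symbol in wanted:
            result[symbol].add(go_id)
    return {symbol: sorted(ids) for symbol, ids in result.items()}


# Sample rows; "EX" and the X-numbers are placeholders, not real records.
sample = [
    "!gaf-version: 2.0",
    "EX\tX0001\tBRCA1\t\tGO:0008630",
    "EX\tX0001\tBRCA1\t\tGO:0000793",
    "EX\tX0002\tOTHER\t\tGO:0003684",
]
mapping = go_ids_by_symbol(sample, ["BRCA1"])
print(mapping)
```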
Swaraj

neuman

Mar 19, 2009, 11:05:03 AM
to cytoscape-helpdesk
I appreciate your help, guys, but I need some more help.

>>AllanK
I can see that when I hover the mouse over a node, aliases of the gene
name appear, but not the ones I entered as input.
Another thing: I ran 21 of my genes through Agilent Literature
Search. 53 nodes were included in the network, but only 4 of my genes
appear as nodes (they turned yellow). What is the explanation for
that? Also, when I save the network and open it again, the aliases
don't show up when I hover the mouse over a node. Why is that? Do
they just disappear?

I have approximately 35000 gene symbols, from which I want to build a
network to see through which pathways the different genes are
connected to each other. But Agilent Literature Search is limited to
100 queries. Is there a way to overcome this?

I tried to paste my 35000 gene symbols into the "Terms" box; it took
approximately 3 hours on my "super fast computer". Running the queries
would probably take a whole lot longer, but the limit is 100 queries
anyway.
I want to buy a new computer for this purpose and am looking at this
one:
Dell, Intel Core 2 Quad 3 GHz, 4.0 GB memory, 250 GB hard drive. Do you
think it would be powerful enough to handle this kind of data and
give me these networks? Or is a bigger computer necessary?

>>Swaraj
Now it makes more sense. But as mentioned above, I have 35000 genes, so
it would take a lot of time to do what you describe. Do you think it
is possible to write a program that produces these GO annotations?



allan_k...@agilent.com

Mar 19, 2009, 12:38:02 PM
to cytoscape...@googlegroups.com
The way the literature search tool works is that it first retrieves a set of abstracts based upon the search terms given. Then it parses each abstract and, in each sentence of each abstract, looks for noun-verb phrases. For these noun-verb phrases, any gene in the lexicon can match a noun in the sentence, not just the original search terms. So you can get many interactions that don't specifically include your search terms; all that is required is that one or more of your search terms occurs in the same article.
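To make the effect concrete, here is a much simplified sketch of sentence-level matching. It is not the Agilent algorithm: it skips the PubMed retrieval and the noun-verb phrase parsing, and simply treats any two lexicon genes that appear in the same sentence as an interaction. The gene names and abstracts are invented for the example.

```python
import re
from itertools import combinations


def cooccurrence_edges(abstracts, lexicon_terms):
    """Return pairs of lexicon terms that appear in the same sentence."""
    terms = {t.lower() for t in lexicon_terms}
    edges = set()
    for abstract in abstracts:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", abstract):
            words = set(re.findall(r"[A-Za-z0-9]+", sentence.lower()))
            hits = sorted(words & terms)
            edges.update(combinations(hits, 2))
    return edges


# Invented abstracts, as if retrieved for the single search term "GeneA".
abstracts = [
    "GeneA activates GeneB. GeneC was not studied.",
    "GeneB binds GeneD in mouse liver.",
]
lexicon = ["GeneA", "GeneB", "GeneC", "GeneD"]
edges = cooccurrence_edges(abstracts, lexicon)
print(sorted(edges))
```

The second edge connects two genes that were never search terms, which is the same effect as seeing only a few of your own genes among the nodes.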

How are you saving your network? You need to export it as an Agilent Literature Search network and then import it again as an Agilent Literature Search network. If you just save the network as a SIF file, then information about aliases, search terms, etc. will not be persisted.

Unfortunately, the limit of 100 search terms cannot be exceeded. This limit is due to PubMed policy for accessing their search engine programmatically. In order to keep their servers from being overwhelmed, NCBI have established certain limits on number of queries, number of total hits, and other search parameters.

AllanK

______________________________

Allan Kuchinsky
Agilent Technologies
5301 Stevens Creek Blvd. Mailstop 54U/SC
Santa Clara, CA, 95051
* phone (408) 553-2423
* mailto:allan_k...@agilent.com

swaraj basu

Mar 20, 2009, 1:58:48 AM
to cytoscape...@googlegroups.com
Hello,
Yes, GO IDs can be generated for 35,000 genes, but it is a long and tedious procedure. As far as I know, Agilent Life Sciences probably has precomputed GO annotations on their website; otherwise you will have to search for an application which converts all the gene names to their respective GO IDs. We rely on Blast2GO here, but it only accepts gene and protein sequences. If you can get the sequences of all the genes and compile them in a text file in FASTA format, with the gene short name as the header of each sequence, then it can be run through Blast2GO. Otherwise, I hope Cytoscape has some built-in plugin which can do the job better. The topic is open for discussion.
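If the sequences are available, compiling them into FASTA format with the gene short name as the header is easy to script. A minimal sketch, with a made-up function name and placeholder sequences rather than real gene data:

```python
def to_fasta(sequences, width=60):
    """Format a {name: sequence} dict as FASTA text with wrapped lines."""
    records = []
    for name, seq in sequences.items():
        seq = "".join(seq.split()).upper()  # drop whitespace, normalise case
        wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append(">" + name + "\n" + "\n".join(wrapped))
    return "\n".join(records) + "\n"


# Placeholder names and sequences, only to show the layout.
sequences = {"GeneA": "atggcc aag", "GeneB": "ATGTTT"}
fasta_text = to_fasta(sequences, width=4)
print(fasta_text, end="")
```

A short `width` is used here only to make the wrapping visible; the default of 60 characters per line is a common choice.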
Swaraj

swaraj basu

Mar 20, 2009, 6:14:08 AM
to cytoscape...@googlegroups.com
Hello,
You can go to http://www.chem.agilent.com/en-US/products/instruments/dnamicroarrays/pages/gp58802.aspx. On this page the txt file for your array is listed under the label

Mouse Genome, Whole - Four-Plex

                                                               

Download the text file and save it as an Excel sheet. Then:

1. Open it as a network using Cytoscape->import network from table. Load only the Probe ID as source interaction.
2. Open the same Excel sheet again through Cytoscape->import attribute from table. Select the ProbeID column of the table as the map ID. Your attributes are now loaded.
3. Install the BioMart plugin in Cytoscape.
4. Go to file->import attributes from biomart. In the BioMart window the data source must be ENSEMBL 53 (Mus musculus), the attribute should be target ID and the data type should be RefSeq DNA IDs.
5. In the attribute section at the bottom, select all the boxes for GO Description and GO ID.
6. Import. After a while you will have new GO description and GO ID attributes in your Cytoscape session.

These, along with the Probe ID, can be copied out, and a short script can be run to parse them into an annotation file suitable for BiNGO.
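The short script could look roughly like the sketch below. It assumes the copied attributes are tab-separated with a header row, that the columns are called "ProbeID" and "GO ID", and that a cell may hold several GO IDs; all of these are assumptions about the export, so adjust the names to whatever your table actually contains. The pairing of probe and GO IDs in the sample is invented, not a real annotation.

```python
import csv
import io
import re


def probe_go_pairs(table_text, id_column="ProbeID", go_column="GO ID"):
    """Extract unique (probe ID, GO ID) pairs from a tab-separated table."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    pairs, seen = [], set()
    for row in reader:
        probe = (row.get(id_column) or "").strip()
        # A GO ID is "GO:" followed by seven digits; a cell may hold several.
        for go_id in re.findall(r"GO:\d{7}", row.get(go_column) or ""):
            if probe and (probe, go_id) not in seen:
                seen.add((probe, go_id))
                pairs.append((probe, go_id))
    return pairs


# Invented sample table; the second probe has no GO annotation.
table = (
    "ProbeID\tGO ID\n"
    "A_51_P115147\tGO:0008630, GO:0000793\n"
    "A_51_P115441\t\n"
)
pairs = probe_go_pairs(table)
for probe, go_id in pairs:
    print(probe, go_id)
```

Each pair then becomes one line of the custom annotation file described earlier in the thread.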

Hope this is of help to you.

Swaraj


neuman

Mar 20, 2009, 10:10:54 AM
to cytoscape-helpdesk
Hi guys,

>>AllanK
Yes, I thought so. The Agilent Literature Search tool isn't what I've
been looking for. I want to build a network from my 35000 genes that
shows which pathways they belong to, so I can't use this tool.

>>Swaraj
I tried to follow your protocol, but didn't manage to run this mouse
genome file. When I imported the Excel sheet into Cytoscape, the
filename appeared red in the control panel. Shouldn't it appear yellow?
After that I imported the Excel sheet through Cytoscape->import
attribute from table. Then I should select my ProbeID column as the
map ID, but I can't see where to set it as the map ID. Do I just put a
check mark on the ProbeID column?
No new GO description or GO ID attribute appears in my Cytoscape
session....
Hope you can help me with that one.

Have a nice weekend
khoa





swaraj basu

Mar 21, 2009, 12:14:38 AM
to cytoscape...@googlegroups.com
Hello,
I perfectly understand your problems, but it is a bit difficult to help you sort them out by email. If you familiarise yourself with Cytoscape's input and output functions through a screenshot tutorial, I think you will find your answers. Once you are able to load your network and attributes, you will have no problem using the BioMart plugin. One more thing: for BiNGO to give you results, you ultimately need a network of proteins, that is, nodes connected by numerous edges. Without a pre-built network BiNGO won't function properly. If you want to build a network from your probes, I suggest you load your probes and use the BioNetBuilder plugin to build one: http://err.bio.nyu.edu/cytoscape/bionetbuilder/tutorial/
Swaraj