ClueGO Basics

Enton Funkme

Apr 10, 2015, 5:23:49 PM

to cytoscape...@googlegroups.com

Hello everybody,
After RNA sequencing, I analysed the down- and upregulation of all transcripts in my samples and identified the genes. I am now trying to create a network in ClueGO that visualizes two kinds of pathways: those in which some genes are downregulated while other genes of the same pathway are upregulated, and those that contain only down- or only upregulated genes. So it is basically a cluster comparison.
I have been trying to get along with ClueGO for a week now and I still don't have much of a clue :D
Well, I entered both of my datasets (up- and downregulated genes) in ClueGO and chose different colours and shapes for the nodes (see attachment), but in the network completely different colours are applied and the two clusters are not distinguished at all. Why is that? I read about it in the ClueGO documentation and tried to do it as described, but failed...
Furthermore, I don't understand what happens to the genes that are "not found for cluster #1". What does that mean? Are they not displayed because they don't fit into any term? What does that tell me about my data? I have a dataset with ca. 70 genes in each cluster and only 4-6 genes per cluster appear in the network, no matter how high or low I set the network specificity. The rest of the genes are "not found for any cluster".
My third question: if you look at the attachment, why is "visual perception" written in big letters while the other terms are not, although the nodes have the same colour? I assume it is because it is the "leading term"... But leading what? The group of all terms of the same colour? I chose "# genes/term" as the basis for the leading term, but there are other terms that also contain four genes, just like "visual perception".
Fourth question: What determines the thickness of an edge?
And the last question: Why are terms that are not connected at all in the "show pathways/terms" mode suddenly connected when "show all genes from all pathways/terms" is selected?

I apologize, I am a beginner in ClueGO and I am very sorry if my questions are stupid! I hope I gave you enough information to help me. I would very much appreciate it.

Thank you so very much in advance!
Jette

network example.JPG

Bernhard

Apr 13, 2015, 6:30:18 AM

to cytoscape...@googlegroups.com

Hi Jette,
I will answer directly below each of your questions:

On Friday, April 10, 2015 at 11:23:49 PM UTC+2, Enton Funkme wrote:

Hello everybody,
After RNA sequencing, I analysed the down- and upregulation of all transcripts in my samples and identified the genes. I am now trying to create a network in ClueGO that visualizes two kinds of pathways: those in which some genes are downregulated while other genes of the same pathway are upregulated, and those that contain only down- or only upregulated genes. So it is basically a cluster comparison.
I have been trying to get along with ClueGO for a week now and I still don't have much of a clue :D
Well, I entered both of my datasets (up- and downregulated genes) in ClueGO and chose different colours and shapes for the nodes (see attachment), but in the network completely different colours are applied and the two clusters are not distinguished at all. Why is that? I read about it in the ClueGO documentation and tried to do it as described, but failed...

- The view with different colours is the first view ClueGO creates; it shows grouped functions. Each group, based on the selection you made, gets its own colour and a leading term (the most significant term or the one with the most associated genes; most of the time, as you saw, this term stays the same). To see the difference between the two clusters (gene lists), check the "Cluster" option in the "View Style Settings"; this shows the comparison colours you selected for the two lists. If you check "Significance" instead, you will see the most significantly enriched terms (click the "?" icon for an explanation of the legend).

Furthermore, I don't understand what happens to the genes that are "not found for cluster #1". What does that mean? Are they not displayed because they don't fit into any term? What does that tell me about my data? I have a dataset with ca. 70 genes in each cluster and only 4-6 genes per cluster appear in the network, no matter how high or low I set the network specificity. The rest of the genes are "not found for any cluster".

- "not found for cluster #1" means that the IDs you pasted/loaded are not found in any annotation ClueGO currently has. Judging from your screenshot, you seem to be working with Ensembl transcript IDs for Danio rerio, so it is quite likely that your annotation has not been updated since 2012. I put an updated Ensembl annotation on the web server; you can download it by clicking next to the "# Automatic #" option in the "Load Marker List(s)" menu. There are two files; check both and download them. After that, most of your IDs should be found. ClueGO uses EntrezGene IDs internally, so if an Ensembl ID doesn't match any EntrezGene ID it will not be found and hence not mapped to any GO term either.
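To illustrate why an ID ends up as "not found", here is a minimal Python sketch. It is not ClueGO's actual code: the lookup table, the IDs and the function name are made up for illustration. The only thing taken from the explanation above is that every input ID has to resolve to an internal EntrezGene ID before it can be mapped to a term.

```python
# Hypothetical Ensembl -> EntrezGene lookup table (made-up IDs, illustration only).
ENSEMBL_TO_ENTREZ = {
    "ENSDART00000000001": "100001",
    "ENSDART00000000002": "100002",
}


def split_found_and_missing(input_ids, id_map):
    """Return (found, missing); `found` maps each input ID to its internal ID."""
    found = {}
    missing = []
    for identifier in input_ids:
        internal_id = id_map.get(identifier)
        if internal_id is None:
            # No internal ID, so this gene can never be mapped to a term.
            missing.append(identifier)
        else:
            found[identifier] = internal_id
    return found, missing


cluster_1 = ["ENSDART00000000001", "ENSDART00000000003"]
found, missing = split_found_and_missing(cluster_1, ENSEMBL_TO_ENTREZ)
print("found:", found)
print("not found for cluster #1:", missing)
```

With an outdated lookup table more IDs fall into the `missing` list, which would match the behaviour described in the question.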

My third question: if you look at the attachment, why is "visual perception" written in big letters while the other terms are not, although the nodes have the same colour? I assume it is because it is the "leading term"... But leading what? The group of all terms of the same colour? I chose "# genes/term" as the basis for the leading term, but there are other terms that also contain four genes, just like "visual perception".

- The leading term is the most significant term of a group of terms (if "Groups" is selected, all terms in a functional group have the same colour).

Fourth question: What determines the thickness of an edge?

- The edge thickness is based on the KappaScore value, a kind of similarity measure (like a correlation) for two terms with regard to their shared genes. The more genes two terms share, the higher the score; the fewer they share, the lower (it can range from -1 to 1). See here for more: http://en.wikipedia.org/wiki/Cohens_kappa
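As a rough illustration of how such a score behaves, the following Python sketch computes Cohen's kappa for two terms, treating every gene in a reference set as "annotated" or "not annotated" to each term. This is the textbook formula from the linked article, not code taken from ClueGO; which reference gene set ClueGO actually uses is an assumption here, and the gene names are invented.

```python
def kappa_score(genes_a, genes_b, universe):
    """Cohen's kappa for two terms, based on gene membership within `universe`."""
    universe = set(universe)
    a = set(genes_a) & universe
    b = set(genes_b) & universe
    n = len(universe)
    if n == 0:
        raise ValueError("universe must not be empty")

    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = n - both - only_a - only_b

    observed = (both + neither) / n            # fraction of genes the terms agree on
    p_a = (both + only_a) / n                  # fraction of genes in term A
    p_b = (both + only_b) / n                  # fraction of genes in term B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1.0:
        return 1.0  # degenerate case: both terms contain all genes or none
    return (observed - expected) / (1 - expected)


universe = [f"g{i}" for i in range(10)]
overlapping = kappa_score(["g0", "g1", "g2", "g3"], ["g0", "g1", "g2", "g4"], universe)
identical = kappa_score(["g0", "g1"], ["g0", "g1"], universe)
disjoint = kappa_score(universe[:5], universe[5:], universe)
print(round(overlapping, 3), round(identical, 3), round(disjoint, 3))
```

Two terms with identical gene sets score 1, and two terms that split the reference set between them without any overlap score -1, which matches the range given above.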

And the last question: Why are terms that are not connected at all in the "show pathways/terms" mode suddenly connected when "show all genes from all pathways/terms" is selected?

- If you select this view, the KappaScore links are removed and the links between terms and genes are displayed instead, so terms can appear connected through a gene they share. If you want to use this feature, make sure that you have set a number of genes to display, e.g. 1000, under "CluePedia Options" (on the left side, in red). This means that the first 1000 annotated genes of each term will be added to the network; some GO terms have far more than 1000 annotated genes, and showing them all can make ClueGO quite slow. In this view the edge thickness between term and gene is based on the GO evidence code: experimentally validated genes get a thick line and genes with e.g. IEA (inferred from electronic annotation) get a thin line. You can show the evidence code by checking the corresponding button in the menu.
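The term-to-gene view described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the concrete line widths, the medium width for all other evidence codes, and the function names are invented, and ClueGO's real values may differ. The experimental codes listed are the standard GO experimental evidence codes.

```python
# Standard GO experimental evidence codes.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def edge_width(evidence_code, thick=4.0, medium=2.0, thin=1.0):
    """Map a GO evidence code to a line width (widths are hypothetical)."""
    if evidence_code in EXPERIMENTAL_CODES:
        return thick
    if evidence_code == "IEA":
        return thin
    return medium  # assumption: everything else sits in between


def term_gene_edges(term, annotated_genes, max_genes=1000):
    """Build (term, gene, width) edges for the first `max_genes` annotated genes.

    `annotated_genes` is a list of (gene, evidence_code) pairs in annotation order.
    """
    return [
        (term, gene, edge_width(code))
        for gene, code in annotated_genes[:max_genes]
    ]


annotations = [("geneA", "IDA"), ("geneB", "IEA"), ("geneC", "ISS")]
edges = term_gene_edges("visual perception", annotations, max_genes=2)
print(edges)
```

Capping the list per term is what keeps very large GO terms from flooding the network with thousands of gene nodes.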

I apologize, I am a beginner in ClueGO and I am very sorry if my questions are stupid! I hope I gave you enough information to help me. I would very much appreciate it.

I hope it is a bit clearer now. If some IDs are still not found, we can fix this manually by adding the corresponding annotations to ClueGO, if available.

Best
Bernhard

Enton Funkme

Apr 13, 2015, 3:21:05 PM

to cytoscape...@googlegroups.com

Hello Bernhard,
thank you so much for your detailed reply - it did help me a lot! Especially the updated Ensembl annotation was good advice. I did that and now many more genes are found. Also, the "Cluster" selection in the "View Style Settings" gave the result I had expected.
Still, I don't understand why, depending on the network specificity, some genes are found for a cluster (meaning their annotations were found, as I understood from your reply, right?), and as soon as I change the specificity, totally different genes are found for the cluster and others disappear... The network specificity tells me at which level the groups are created, right? So "global" means groups are mainly about basic things such as DNA binding, ion channels and so on, and "detailed" refers to something like hydrolase activity on esters. Or did I get it wrong? How do genes that are found in the detailed setting disappear in the global setting?

Thanks again for your help, I appreciate it so much!!
Jette

Enton Funkme

Apr 13, 2015, 6:29:05 PM

to cytoscape...@googlegroups.com

Hello Bernhard,
I think I figured it out now, more or less. It would be great, though, if you could answer my last question as well!
I also wonder whether there is an option to include expression values in a network. Depending on the parameter settings, certain genes won't appear in the network because they are just a single gene in a large number of huge pathways. But if my expression value for such a gene is very high (compared to all the other genes in the network), it would be nice to see the gene appear in the network based on that high value. So is there a way to integrate expression values into the network?

Thank you so much again!
Jette

Bernhard

Apr 14, 2015, 5:21:32 AM

to cytoscape...@googlegroups.com

Hi Jette,
the network specificity is just a simple first adjustment to see either very general or very specific terms. The slider changes three options: GO level, min genes per term and min % of genes per term. This means that if you look at very general terms you won't see very specific ones (so you lose them) and vice versa, because the allowed GO levels change, e.g. 2-4 for general terms and 8-14 for very specific ones. You should use something like GO levels from min 3 to max 14. (Keep in mind that a GO term can sit at several levels, so selecting e.g. level 3 would also take terms that are at levels 2, 3 and 4; it is an 'or' selection.) This way most of the GO terms, except very general ones, will be considered for the analysis.
Then select the percentage and number of genes per term; this pre-selection keeps you from ending up with too many terms in the network. The choice depends on the initial gene list(s) you paste/load. If you take e.g. the 200 most up-regulated and the 200 most down-regulated genes, I would suggest a min of 3 or 4 genes per term and a min of 4%. This should give you a reasonable network (of course it also depends on the selected genes and how they map to terms). If the two lists differ in size, you should restrict the larger list a bit more, or vice versa, to get a balanced mapping; otherwise you could end up with e.g. only red terms for list one and no green terms for list two. So ClueGO assumes that you make an initial choice of e.g. up- and down-regulated genes, and then you have to adjust those parameters a bit (several trials) to get an idea of what is going on in your biological sample.
We don't yet have an option to integrate the expression of a gene into the enrichment, but you can load a dataset (lower right panel, "open experimental data") and then map it onto the genes shown in the network.
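As a rough illustration of the pre-selection described above (GO level range, min genes per term, min % of genes per term), here is a small Python sketch. It is not ClueGO's implementation: the `Term` class, the toy terms and the helper names are invented, and it assumes that the percentage is the share of a term's annotated genes that occur in your list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    name: str
    levels: frozenset      # every GO level at which the term occurs
    annotated: frozenset   # all genes annotated to the term


def preselect(terms, gene_list, min_level=3, max_level=14,
              min_genes=3, min_percent=4.0):
    """Return the names of terms that pass the level, count and percentage filters."""
    genes = set(gene_list)
    kept = []
    for term in terms:
        # 'or' selection: one level inside the range is enough.
        in_range = any(min_level <= level <= max_level for level in term.levels)
        hits = genes & term.annotated
        percent = 100.0 * len(hits) / len(term.annotated) if term.annotated else 0.0
        if in_range and len(hits) >= min_genes and percent >= min_percent:
            kept.append(term.name)
    return kept


def make_term(name, levels, hits, fillers):
    """Build a toy term from `hits` listed genes plus `fillers` dummy genes."""
    dummy = {f"{name}_x{i}" for i in range(fillers)}
    return Term(name, frozenset(levels), frozenset(hits) | dummy)


gene_list = ["g1", "g2", "g3", "g4", "g5"]
terms = [
    make_term("very_general", {1, 2}, gene_list, 95),        # no level in 3-14
    make_term("multi_level", {2, 3, 4}, gene_list[:3], 47),  # 3 of 50 genes = 6%
    make_term("too_diluted", {8}, gene_list[:3], 197),       # 3 of 200 genes = 1.5%
    make_term("too_few_hits", {5}, gene_list[:2], 8),        # only 2 genes found
]
kept = preselect(terms, gene_list)
print(kept)
```

A gene that only occurs in terms dropped by one of these filters does not show up in the network, which is why changing the settings can make genes appear and disappear.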
The data format can be several ID columns followed by data columns. All columns that contain only numerical data are treated as data, so be careful with IDs that consist of numbers only; you have to exclude them from the visualisation. The data points for each sample are then visualised in a circular, clockwise way. You can also log-transform the values if needed, filter them, or compute the mean over all samples or over predefined groups of samples (these can be loaded separately).
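The pitfall with numeric IDs can be shown with a short Python sketch. It only mimics the rule stated above (a column counts as data if all of its values are numeric); the column names, values and helper functions are invented and do not reflect ClueGO's internals.

```python
import math


def is_number(text):
    """True if `text` can be parsed as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def data_columns(header, rows, exclude=()):
    """Names of columns whose values are all numeric, minus the excluded ones."""
    columns = []
    for index, name in enumerate(header):
        if name in exclude:
            continue
        if rows and all(is_number(row[index]) for row in rows):
            columns.append(name)
    return columns


def log2_mean(row, header, columns):
    """Mean of the log2-transformed values of `columns` for one row."""
    values = [math.log2(float(row[header.index(name)])) for name in columns]
    return sum(values) / len(values)


header = ["ensembl_id", "entrez_id", "sample_1", "sample_2"]
rows = [
    ["ENSDARG00000000001", "100001", "8.0", "32.0"],
    ["ENSDARG00000000002", "100002", "2.0", "4.0"],
]

naive = data_columns(header, rows)                          # numeric ID slips in
cleaned = data_columns(header, rows, exclude=("entrez_id",))
print(naive)
print(cleaned)
print(log2_mean(rows[0], header, cleaned))
```

Without the explicit exclusion, the purely numeric `entrez_id` column would be picked up as if it were a sample.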
I hope my explanations helped
Best
Bernhard

Enton Funkme

Apr 14, 2015, 7:00:03 PM

to cytoscape...@googlegroups.com

Hello Bernhard,
thank you so much for all your help!! I now understand how to use the specificity slider.
I also tried to include "experimental data" as you suggested to visualize the expression level of my genes. It worked out just as I wanted. ClueGO is indeed an awesome piece of software!

So thanks again, I am so lucky that you helped me so kindly! I hope your explanations will also help some other people.
Best regards, Jette
