Hi Jette,
I will answer directly below each of your questions:
On Friday, April 10, 2015 at 11:23:49 PM UTC+2, Enton Funkme wrote:
Hello everybody,
after RNAsequencing I analysed down- and
upregulation of all transcripts in my samples and identified the genes. I
am now trying to create a network in ClueGO to visualize all pathways
that might contain genes that are downregulated whereas some other genes
of the same pathway might be upregulated and pathways that contain
genes either down- or upregulated. So it is basically a cluster
comparison.
I have now tried to get along with ClueGO for one week and I still don't have too much of a clue :D
Well,
I entered both of my datasets (up and down regulated genes) in ClueGO
and chose different colours and shapes for the nodes (see attachment),
but in the network, completely different colours are applied and no
differences at all. Why is that? I read in the ClueGO documentation
about it, I tried to do it as described but failed...
- The view with different colours is the first view ClueGO creates, and it shows grouped functions. Each group, based on the selection you made, gets its own colour and a leading term (the most significant term, or the one with the most associated genes; most of the time, as you saw, the leading term stays the same either way). To see the difference between your two clusters (gene lists), check the "Cluster" option under "View Style Settings"; this shows the comparison colours you selected for the two clusters. If you check "Significance" instead, you will see the most significantly enriched terms (click the "?" icon for an explanation of the legend).
Furthermore I
don't understand: What happens to the genes that are "not found for
cluster #1". What does that mean? Are they not displayed because they
don't fit into any term? What does that tell me about my data? I mean I
have a dataset with ca. 70 genes in each cluster and only 6-4 genes per
cluster appear in the network no matter how high or low I set the
network specificity. The rest of the genes is "not found for any
cluster".
- "not found for cluster #1" means that the ids you pasted/loaded were not found in any annotation ClueGO currently has. Judging from your screenshot, you seem to be working with Ensembl transcript ids for Danio rerio, so it is quite likely that your annotation has not been updated since 2012. I put an updated Ensembl annotation on the web server; you can download it by clicking next to the "# Automatic #" option in the "Load Marker List(s)" menu. There are two files to download. Check both and download them. Most of your ids should then be found. ClueGO uses EntrezGene ids internally, so if an Ensembl id does not match any EntrezGene id, it will not be found and hence will not be mapped to any GO term.
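As a rough illustration of why ids end up "not found": the lookup boils down to a table from Ensembl ids to EntrezGene ids, and anything missing from that table cannot reach a GO term. The sketch below is not ClueGO code; the function name, the mapping table and the ids in it are made-up placeholders.

```python
def split_by_annotation(ids, ensembl_to_entrez):
    """Separate ids that can be mapped to an EntrezGene id from those that cannot."""
    found = {}
    not_found = []
    for identifier in ids:
        entrez_id = ensembl_to_entrez.get(identifier)
        if entrez_id is None:
            # No EntrezGene id known for this id, so it cannot be mapped to a GO term.
            not_found.append(identifier)
        else:
            found[identifier] = entrez_id
    return found, not_found


# Hypothetical placeholder ids, for illustration only.
mapping = {"TRANSCRIPT_A": "entrez_1", "TRANSCRIPT_B": "entrez_2"}
found, not_found = split_by_annotation(
    ["TRANSCRIPT_A", "TRANSCRIPT_B", "TRANSCRIPT_C"], mapping
)
print(len(found), "found;", len(not_found), "not found")
```

With an outdated mapping table the "not found" list grows, which is why updating the annotation files is the first thing to try.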
My third question would be: If you look into the attachment
- why is "visual perception" written in big letters and the other terms
are not but nodes still have the same colour? I assume it is because
it is the "leading term"... But leading what? The group of all terms of
the same colour? I chose "# genes/term" as the basis for the leading
term but there are other terms, that also imply four genes, just like
"visual perception".
- The leading term is the most significant term of a group of terms (when "Groups" is selected, all terms in a functional group share the same colour).
Fourth question: What determines the thickness of an edge?
- The edge thickness is based on the KappaScore value, a similarity measure (comparable to a correlation) for two terms with respect to their shared genes. The more genes two terms share, the higher the score; the fewer they share, the lower it is (it can range between -1 and 1). See here for more: http://en.wikipedia.org/wiki/Cohens_kappa
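To make the kappa idea concrete, here is a minimal sketch of Cohen's kappa for two terms, treating every gene in a reference set as either annotated or not annotated to each term. This is the textbook formula, not ClueGO's own code; which reference gene set ClueGO uses is an assumption here, and the function name and gene ids are placeholders.

```python
def kappa_score(genes_a, genes_b, universe):
    """Cohen's kappa for two terms, based on which genes of `universe` each one annotates."""
    universe = set(universe)
    genes_a = set(genes_a) & universe
    genes_b = set(genes_b) & universe
    n = len(universe)

    both = len(genes_a & genes_b)      # genes annotated to both terms
    only_a = len(genes_a - genes_b)    # genes annotated to term A only
    only_b = len(genes_b - genes_a)    # genes annotated to term B only
    neither = n - both - only_a - only_b

    observed = (both + neither) / n
    expected = (len(genes_a) * len(genes_b)
                + (only_b + neither) * (only_a + neither)) / (n * n)
    if expected == 1.0:
        # Kappa is undefined here (e.g. both terms empty); returning 0.0 is a choice of this sketch.
        return 0.0
    return (observed - expected) / (1.0 - expected)


# Placeholder gene ids g0..g9, for illustration only.
universe = {f"g{i}" for i in range(10)}
term_a = {"g0", "g1", "g2", "g3"}
term_b = {"g0", "g1", "g2", "g4"}
print(round(kappa_score(term_a, term_b, universe), 3))
```

Two terms with identical gene sets score 1, and two terms whose gene sets are exact complements within the reference set score -1.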
The last question: How come that terms that are not connected at all in
the "show pathways/terms" mode are suddenly connected when "show all
genes from all pathways/terms" is selected?
- If you select this view, the KappaScore links are removed and the links between terms and genes are displayed instead. If you want to use this feature, make sure that you have set a number of genes to display, e.g. 1000, under "CluePedia Options" (on the left side, in red). This means that the first 1000 annotated genes will be added to the network for each term; some GO terms have far more than 1000 annotated genes, and showing them all can make ClueGO quite slow. In this view the edge thickness between term and gene is based on the GO evidence code: experimentally validated genes get a thick line, and genes with e.g. IEA (electronically inferred) get a thin line. You can show the evidence codes by checking the corresponding button in the menu.
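The two rules for this view (a cap on the genes per term, and thickness by evidence code) can be sketched roughly as follows. The evidence codes are standard GO codes, but the numeric widths, the middle width for all other codes, the function names and the gene names are assumptions made for illustration, not ClueGO's actual values.

```python
# Standard GO evidence codes for experimental support.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def edge_width(evidence_code):
    """Return an illustrative line width for a term-gene edge (values are assumptions)."""
    if evidence_code in EXPERIMENTAL_CODES:
        return 3.0  # thick line: experimentally validated
    if evidence_code == "IEA":
        return 1.0  # thin line: electronically inferred
    return 2.0      # assumption of this sketch: everything else in between


def term_gene_edges(term, annotations, max_genes=1000):
    """Build (term, gene, width) edges for the first `max_genes` annotated genes."""
    return [(term, gene, edge_width(code)) for gene, code in annotations[:max_genes]]


# Hypothetical annotations as (gene, evidence code) pairs.
annotations = [("geneA", "IDA"), ("geneB", "IEA"), ("geneC", "ISS")]
for edge in term_gene_edges("visual perception", annotations, max_genes=2):
    print(edge)
```

Here the cap of 2 drops the third gene, in the same way a cap of 1000 would drop everything after the first 1000 annotated genes of a very large term.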
I have to apologize, I
am a beginner in ClueGO and I am very sorry for my maybe stupid
questions!! I hope I gave you enough information to help me. I would
very much appreciate it.
I hope it is a bit clearer now. If some ids are still not found, we can fix this manually by adding the corresponding annotations to ClueGO, if they are available.
Best
Bernhard