Adding new organism--file format

kve...@gmail.com

Aug 23, 2018, 11:50:29 AM

to cytoscape-helpdesk

Hi,

I would like to add my organism to ClueGO. I was informed that I would need to submit the gene set in GO Annotation File format. The genes have not been assigned GO terms. Can I still add my organism? If so, is there a placeholder I should use in the empty GO-related columns?

Thanks,

Karen

Alex Pico

Aug 23, 2018, 4:17:57 PM

to cytoscape...@googlegroups.com, Bernhard

Cc’ing one of the authors of the ClueGO app...

- Alex

--
You received this message because you are subscribed to the Google Groups "cytoscape-helpdesk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cytoscape-helpd...@googlegroups.com.
To post to this group, send email to cytoscape...@googlegroups.com.
Visit this group at https://groups.google.com/group/cytoscape-helpdesk.
To view this discussion on the web visit https://groups.google.com/d/msgid/cytoscape-helpdesk/7ac8e82f-3f2b-4046-8057-42b9a0e5cbad%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bernhard

Aug 25, 2018, 7:11:37 AM

to cytoscape-helpdesk

Hi Karen,

You can add an organism even without GO or KEGG annotations. You will have to provide two files:

1. a gene2accession file in which you define your gene IDs and symbol names (it needs a UniqueID#MyID column and a SymbolID column to work), and

2. an organism properties file in which you set your organism's information.

You can copy those two files from the Organism_Homo Sapiens folder and create your own organism folder in the same way. Note that the organism name you choose is case-sensitive and has to be spelled identically everywhere it is used!

Let's say you choose 'My Custom organism' as the name. You then have to create a folder called

'Organism_My Custom organism' in the /YourUserHomeFolder/ClueGOConfiguration/v2.5.1/ClueGOSourceFiles folder (on Windows, use \ instead of /).
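The naming scheme above can be sketched in Python. This is only an illustration, assuming ClueGO v2.5.1 and a ClueGOConfiguration folder directly under the user's home folder, as in the path above. The helper name `organism_paths` and its `date_stamp` argument are made up for this example and are not part of ClueGO. Deriving every name from a single string keeps the spelling and case identical everywhere.

```python
from pathlib import Path


def organism_paths(organism, date_stamp, home=None, version="v2.5.1"):
    """Return (folder, properties file, gene2accession file) for an organism.

    Every name is built from the same `organism` string, so the
    case-sensitive name cannot drift between folder and files.
    `date_stamp` (e.g. "2018.08.25") is assumed to be a free-form date
    suffix, following the file name used in this thread.
    """
    home = Path(home) if home is not None else Path.home()
    folder = (
        home
        / "ClueGOConfiguration"
        / version
        / "ClueGOSourceFiles"
        / f"Organism_{organism}"
    )
    properties = folder / f"{organism}.properties"
    gene2accession = folder / f"{organism}.gene2accession_{date_stamp}.txt"
    return folder, properties, gene2accession
```

Calling `organism_paths("My Custom organism", "2018.08.25")` and then `folder.mkdir(parents=True, exist_ok=True)` on the first returned value would create the folder; `pathlib` takes care of using / or \ depending on the operating system.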

Then create the two files, e.g. with Excel or LibreOffice:

1. 'My Custom organism.properties' (you can copy this file from the human folder and modify it with your information).

Replace the human information with this (put # at the start of a line to comment it out):

##### General Settings #############################################################
organism.name = My Custom organism

# Set the correct taxonomy ID here if you know it
organism.taxid = 999999
#organism.reactome.id = My Custom organism
#organism.wikipathway.id = My Custom organism
####################################################################################

##### KEGG options #################################################################
#organism.kegg.name = hsa

## If you have a KEGG ortholog column, set 'organism.kegg.name' to 'ko' and specify the name of the column holding the Orthology IDs (e.g. K11594)!
#organism.kegg.name = ko
# e.g. id.column.to.use.for.kegg.name = KO_id

## If your IDs do not match the KEGG internal IDs, set the right ID column here
# e.g. id.column.to.use.for.kegg.name = EnsemblGeneID
####################################################################################

####################################################################################

##### Settings for Gene2Accession update ###########################################
## If you uncomment 'gene.info.url' and 'gene2accession.url', UniProtKB_AC will automatically be created as the main ID in ClueGO

## Uncomment the appropriate line and also the gene2accession.url if you know that your organism is annotated in EntrezGene
#gene.info.url = ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz
#gene.info.url = ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/All_Data.gene_info.gz
#gene.info.url = ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/All_Mammalia.gene_info.gz
#gene.info.url = ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Archaea_Bacteria/All_Archaea_Bacteria.gene_info.gz
#gene.info.url = ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Fungi/All_Fungi.gene_info.gz
#gene.info.url = ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Invertebrates/All_Invertebrates.gene_info.gz
#gene.info.url = ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Non-mammalian_vertebrates/All_Non-mammalian_vertebrates.gene_info.gz
#gene.info.url = ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Organelles.gene_info.gz
#gene.info.url = ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Plants/All_Plants.gene_info.gz
#gene.info.url = ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Plasmids.gene_info.gz
#gene.info.url = ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Protozoa/All_Protozoa.gene_info.gz
#gene.info.url = ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Viruses/All_Viruses.gene_info.gz

#gene2accession.url = ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
#gene2accession.url = file:///home/berni/Download/gene2accession.gz
#gene2ensembl.url = ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2ensembl.gz
####################################################################################

2. 'My Custom organism.gene2accession_2018.08.25.txt', created in Excel with these columns:

UniqueID#MyGeneID	SymbolID	OtherIDs
1	NAT1|AAC1|MNAT|NAT-1|NATI	AA|BB
2	NAT2|AAC2|NAT-2	CC
3	NATP	DD

The first ID in SymbolID (e.g. NAT1) is the one shown in the network. The UniqueID#MyGeneID and SymbolID columns are mandatory; without them it will not work!
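If the table is generated by a script rather than typed into Excel, something like the following Python sketch may help. It assumes the file is plain tab-separated text and that aliases within a cell are joined with |, as the example above suggests; neither detail is confirmed beyond that example. The function names `write_gene_table` and `check_header` are illustrative and not part of ClueGO.

```python
import csv
import io


def write_gene_table(rows, handle, id_column="UniqueID#MyGeneID"):
    """Write (unique_id, symbols, other_ids) rows as a tab-separated table.

    `symbols` and `other_ids` are lists; the first symbol is the one that
    would be displayed, so every gene must have at least one symbol.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([id_column, "SymbolID", "OtherIDs"])
    for unique_id, symbols, other_ids in rows:
        if not symbols:
            raise ValueError(f"gene {unique_id} has no symbol")
        writer.writerow([unique_id, "|".join(symbols), "|".join(other_ids)])


def check_header(header_line):
    """Return True if the header has a UniqueID#... and a SymbolID column."""
    columns = header_line.rstrip("\n").split("\t")
    has_unique_id = any(c.startswith("UniqueID#") for c in columns)
    return has_unique_id and "SymbolID" in columns


# Example data taken from the table in this post.
example_rows = [
    (1, ["NAT1", "AAC1", "MNAT", "NAT-1", "NATI"], ["AA", "BB"]),
    (2, ["NAT2", "AAC2", "NAT-2"], ["CC"]),
    (3, ["NATP"], ["DD"]),
]

buffer = io.StringIO()
write_gene_table(example_rows, buffer)
assert check_header(buffer.getvalue().splitlines()[0])
```

To write a real file, pass a handle opened with `open(path, "w", newline="")` instead of the in-memory buffer.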

If you cannot manage to create the files, let us know and we will create them for you.

Best

Karen Vellacott-Ford

Aug 30, 2018, 4:46:11 PM

to cytoscape...@googlegroups.com

Thank you for your previous response. I have a follow-up question:

I read in the FAQ section that the files are based on information from KEGG, GO, Uniprot-GOA annotations, and NCBI; however, I don’t think the Hessian fly (my organism) gene set has been submitted to these databases. The Official Gene Set files (GFF and fasta files with the nucleic acid and translated amino acid gene sequences) are available on the i5k workspace:

https://i5k.nal.usda.gov/sites/default/files/data/Arthropoda/maydes-%28Mayetiola_destructor%29/Current%20Genome%20Assembly/2.Official%20or%20Primary%20Gene%20Set/OGS1.0/hf_ogs.gff

https://i5k.nal.usda.gov/sites/default/files/data/Arthropoda/maydes-%28Mayetiola_destructor%29/Current%20Genome%20Assembly/2.Official%20or%20Primary%20Gene%20Set/OGS1.0/hf_OGS1.0.cds.fasta

https://i5k.nal.usda.gov/sites/default/files/data/Arthropoda/maydes-%28Mayetiola_destructor%29/Current%20Genome%20Assembly/2.Official%20or%20Primary%20Gene%20Set/OGS1.0/hf_OGS1.0.pep.fasta

Is it still possible for me to add the Hessian fly gene set to ClueGO?

Thanks again,

Karen

kve...@gmail.com

Oct 18, 2018, 1:29:30 PM

to cytoscape-helpdesk

I'm sorry if my question was already answered in your post. I just need to know whether it's okay if I make up these gene IDs for the first time or if they must already be connected to the gene sequences in a database.
