How to create and use custom gene lists


Gaj Stan (BIGCAT)

Jul 4, 2013, 5:34:47 AM
to go-e...@googlegroups.com
Hello all,

I am having trouble generating custom gene lists to analyze in GO-Elite v1.2.5-Py. The online documentation refers twice to a 'custom gene set' and mentions only once that it can be either a txt file containing 2 columns or one of the other supported formats (e.g. gpml). However, I've tried several approaches (explained below), most of which end with a "<FILE> not formatted properly" error.

My main aim is to perform an over-representation analysis (ORA) on several custom gene lists that I generated. These gene lists are based on EntrezGene IDs.

Below are the attempts I made:
  1) After examining an existing relationship file in the /database/ directory (e.g. EntrezGene-KEGG.txt), I created a table with the same 3 columns (ID, a column with an empty header that basically holds the SystemCode, and OntologyID). An example is shown below:

     EntrezGene        OntologyID
     1244    En    GeneList1
     182    En    GeneList1
     80270    En    GeneList1
     954    En    GeneList1
     9971    En    GeneList2
     1581    En    GeneList2
     4853    En    GeneList3
     ...

  After running GO-Elite, everything seemed to work fine (no error message was displayed on screen) and a custom_gene_set.txt file was generated in a new folder. However, there were no results and the custom_gene_set.txt file was empty (apart from the three column headers).

  2) A separate file was created per gene list, containing only the ID and SystemCode. This resulted in a "<FILE> not formatted properly" error. The content of such a file looks like this:
     EntrezGene    SystemCode
     1244    En
     182    En
      ...

  3) A variation of (1) with the 2nd column removed, which again resulted in a "not formatted properly" error.
  4) Followed the structure of the custom_gene_set.txt file, but then got the error "1244 is not a valid systemcode".

I therefore assume that the first approach is the closest to the solution, but I wonder what I did wrong. Does anyone have suggestions on how to proceed? Could it be that custom gene sets rely on Ensembl annotations rather than EntrezGene?

Best,

  -- Stan

Gaj Stan (BIGCAT)

Jul 4, 2013, 5:44:32 AM
to go-e...@googlegroups.com
Problem solved. I changed the SystemCode column to "L" (which is EntrezGene) and it now works fine. It remains odd that the original database association files retain "En" for EntrezGene IDs (-:
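
For anyone generating such a file programmatically, here is a minimal Python sketch of the layout that worked above: three tab-delimited columns (ID, system code, gene list name) with "L" as the system code for EntrezGene. The function name `write_custom_gene_set` and the use of tabs as the delimiter are assumptions for illustration, not something taken from the GO-Elite documentation; the header row simply copies the one from attempt (1).

```python
import csv
import io

# Per this thread, "L" is the system code for EntrezGene ("En" means Ensembl).
ENTREZ_SYSTEM_CODE = "L"


def write_custom_gene_set(gene_lists, handle):
    """Write ID / system code / gene list name rows (hypothetical helper).

    gene_lists maps a gene list name to an iterable of EntrezGene IDs.
    Tab-delimited output is an assumption based on the example table above.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Header copied from attempt (1); the middle column name is empty there.
    writer.writerow(["EntrezGene", "", "OntologyID"])
    for list_name, gene_ids in gene_lists.items():
        for gene_id in gene_ids:
            writer.writerow([gene_id, ENTREZ_SYSTEM_CODE, list_name])


if __name__ == "__main__":
    example = {
        "GeneList1": ["1244", "182", "80270", "954"],
        "GeneList2": ["9971", "1581"],
        "GeneList3": ["4853"],
    }
    buffer = io.StringIO()
    write_custom_gene_set(example, buffer)
    print(buffer.getvalue(), end="")
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` as `handle`.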

Nathan Salomonis

Jul 4, 2013, 7:01:57 PM
to go-e...@googlegroups.com
When the system codes were first created, using En for Ensembl made good sense. However, since LocusLink was renamed EntrezGene, En can now be much more easily confused with it.

Best,
Nathan
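
Since the En/L mix-up is easy to make, a small pre-flight check can catch it before running GO-Elite. The sketch below is only a heuristic under stated assumptions: EntrezGene IDs are plain integers, whereas Ensembl IDs start with "ENS", so a purely numeric ID labelled "En" is probably mislabelled. The function name `find_suspect_rows` is hypothetical and not part of GO-Elite.

```python
def find_suspect_rows(rows):
    """Return (row_number, gene_id, code) for numeric IDs labelled "En".

    rows is an iterable of (gene_id, system_code, gene_list_name) tuples.
    Heuristic only: a numeric ID with code "En" (Ensembl) most likely
    should carry "L" (EntrezGene) instead.
    """
    suspects = []
    for number, (gene_id, code, _gene_list) in enumerate(rows, start=1):
        if code == "En" and gene_id.isdigit():
            suspects.append((number, gene_id, code))
    return suspects


if __name__ == "__main__":
    rows = [
        ("1244", "En", "GeneList1"),  # numeric ID with the Ensembl code: flagged
        ("182", "L", "GeneList1"),    # numeric ID with the EntrezGene code: fine
    ]
    for number, gene_id, code in find_suspect_rows(rows):
        print(f"row {number}: ID {gene_id} has code {code!r}; did you mean 'L'?")
```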