multiple genes per locus for an XLOC id???

Rachel Raynes

Oct 23, 2014, 8:35:45 PM
to tuxedo-to...@googlegroups.com
Hello,
I am using Cuffdiff to analyze C. elegans RNA-seq results.  I finally made it through the pipeline and, on analyzing the output, found a few cases where an XLOC id represents a locus that includes 2 or more different genes.  How do I resolve this problem?  Is this a problem in the code, or do I include both genes in my gene list?

Thanks!
Rachel

Nathan Johnson

Feb 3, 2015, 10:49:59 AM
to tuxedo-to...@googlegroups.com
I'm running into the same problem with no solution currently.


From what I have read, Cuffdiff merges genes that are very close together (<50 bp apart) into one transcript and then runs the statistical analysis to test for differences.  Have you found any solution?  I've tried all of the relevant options in cuffdiff with no change.

Rachel Raynes

Feb 3, 2015, 12:56:43 PM
to tuxedo-to...@googlegroups.com
Hi Nathan.  I've spoken to A LOT of people about this.  What was recommended to me, at least in the case of C. elegans, is to list the most upstream gene first and put the one that follows in parentheses.  I then created a column in my supplementary data file to distinguish whether or not the genes listed together are in an operon.  As expected, they had SL2 sites 90% of the time, which is probably why they were sorted together.  The scientists I've spoken to about this seem to be content with the way that I've handled this issue.  Haven't submitted it for publication yet, so we'll have to wait to see what the reviewers say.  FYI, I also make sure to keep the column with the locus in the supp data file so anyone can look the region up for themselves.
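
A minimal sketch of the labelling convention described above (most upstream gene first, the following gene or genes in parentheses). The `Gene` record, the `locus_label` function, and the gene names are hypothetical and only for illustration; "upstream" is taken here to mean the lowest start on the + strand and the highest end on the - strand, which is an assumption.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical record for one annotated gene inside an XLOC locus."""
    name: str
    start: int   # 1-based start coordinate, as in a GTF file
    end: int     # 1-based inclusive end coordinate
    strand: str  # "+" or "-"


def locus_label(genes: List[Gene]) -> str:
    """Return 'upstream (downstream, ...)' for the genes sharing one locus."""
    if not genes:
        raise ValueError("locus has no genes")
    strands = {g.strand for g in genes}
    if strands == {"-"}:
        # On the minus strand the most upstream gene has the highest end.
        ordered = sorted(genes, key=lambda g: g.end, reverse=True)
    else:
        # Plus strand, or mixed strands (assumption: fall back to
        # plain coordinate order when "upstream" is ambiguous).
        ordered = sorted(genes, key=lambda g: g.start)
    first, rest = ordered[0], ordered[1:]
    if not rest:
        return first.name
    return f"{first.name} ({', '.join(g.name for g in rest)})"


# Example with made-up gene names and coordinates.
print(locus_label([Gene("geneB", 500, 900, "+"), Gene("geneA", 100, 400, "+")]))
```

Keeping the locus coordinates in a separate column, as mentioned above, still lets a reader look the region up regardless of how the label is built.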

A lab at my university that does RNA sequencing with human samples also runs into this issue, though less often, it appears.  They told me that they just merge them as a "super gene".  Not sure what that entails exactly, but I imagine they also just list them together in a similar manner.

Let me know if you find a better solution!  

Nathan Johnson

Feb 4, 2015, 2:29:27 PM
to tuxedo-to...@googlegroups.com
I've run data from dog, rat, mouse, and human, and all of them show a similar problem.  The only consistent feature I have found is the short distance between the genes.  These are also genes with a known structure in the .gtf file, so the program is aware of their existence.
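
One way to check the distance observation is to list neighbouring genes separated by a small gap. This is only a rough sketch: the `close_neighbours` function and the tuple layout are hypothetical, the 50 bp default is taken from the figure quoted earlier in this thread rather than from any documented setting, and only genes adjacent in start order are compared, so nested genes are not handled.

```python
from typing import List, Tuple

# (name, chromosome, start, end), coordinates 1-based and inclusive as in GTF
GeneSpan = Tuple[str, str, int, int]


def close_neighbours(genes: List[GeneSpan], max_gap: int = 50) -> List[Tuple[str, str, int]]:
    """Return (left_gene, right_gene, gap) for adjacent genes closer than max_gap bp."""
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left[1] != right[1]:
            continue  # different chromosomes are never neighbours
        # Number of bases strictly between the two genes; negative means overlap.
        gap = right[2] - left[3] - 1
        if gap < max_gap:
            pairs.append((left[0], right[0], gap))
    return pairs


# Made-up coordinates for illustration.
example = [
    ("g1", "chrI", 100, 200),
    ("g2", "chrI", 230, 400),
    ("g3", "chrI", 1000, 1200),
    ("g4", "chrII", 1210, 1300),
]
print(close_neighbours(example))
```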


I have found a forum thread on the topic where one of the authors of Cufflinks, Dr. Trapnell, posted a response about the problem.  His solution is not present in the existing software, but he at least explains what is happening.

http://seqanswers.com/forums/showthread.php?t=20702&page=2 


I will definitely let you know if I find a solution!

Rachel Raynes

Feb 4, 2015, 2:45:25 PM
to tuxedo-to...@googlegroups.com
I see. If this problem arises consistently because the genes are so close together, then it makes sense that when it happens in C. elegans the genes are usually in an operon.  If this is such a persistent problem, I wonder why I never see it in publications with RNA sequencing data???  If I run across a solution, I will also let you know. :)

Nathan Johnson

Feb 7, 2015, 10:48:21 AM
to tuxedo-to...@googlegroups.com
I finally found the reason why it's occurring!

The problem arises during cuffmerge.  If I use the original .gtf file supplied by Ensembl, I do not see the pattern, but if I use cuffmerge to re-make the .gtf file, then I do.  Comparing the two .gtf files by eye, I do not see anything obviously different.  I'm in the process of writing a program to diagnose and fix the problem.  Send me a private e-mail if you would like me to share it.
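
For anyone who wants to check their own merged annotation in the meantime, here is a rough diagnostic sketch. It is not the program mentioned above. It assumes the merged .gtf carries the XLOC id in the `gene_id` attribute and the reference gene symbol in a `gene_name` attribute, which may not hold for every run; the `multi_gene_loci` function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches GTF attribute pairs of the form: key "value"
ATTR = re.compile(r'(\S+) "([^"]*)"')


def multi_gene_loci(gtf_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each gene_id (XLOC id) to its gene names when there is more than one."""
    names = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        gene_name = attrs.get("gene_name")  # assumption: present in the merged file
        if gene_id and gene_name:
            names[gene_id].add(gene_name)
    return {gid: sorted(found) for gid, found in names.items() if len(found) > 1}
```

A possible use, with the file path as a placeholder: `with open("merged.gtf") as fh: print(multi_gene_loci(fh))`. The result lists only the loci that would need one of the manual treatments discussed in this thread.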

José Basílio

Jun 28, 2018, 10:08:34 AM
to Tuxedo Tools Users
Dear Nathan,

I am running into the same problem with human and mouse (multiple genes per locus for an XLOC id). Have you written the program to diagnose and fix this problem? If so, would you please share it with me?
Thanks a lot.
With Best Regards,

José