psg output using UCSC known gene as gtf input

Zhenhua Wu

Aug 30, 2013, 2:33:44 PM
to psginfe...@googlegroups.com
Hi, I used the UCSC known gene annotation as the GTF input. The output file, gene.results.txt, looks like this:

gene_id chrom   start   end     strand  num_transcripts num_paths       num_parameters  num_reads       log_likelihood  parameters
uc001aaa.3      chr1    11873   14409   +       1       1       0       153     -5645.251773    1,1,1,1
uc001aac.4      chr1    14361   29370   -       1       1       0       1310    -43392.9912489  1,1,1,1,1,1,1,1,1,1,1,1
uc001aae.4      chr1    14361   19759   -       1       1       0       2151    -65180.4352549  1,1,1,1,1,1,1,1,1,1,1
uc001aah.4      chr1    14361   29370   -       1       1       0       1672    -53253.3490026  1,1,1,1,1,1,1,1,1,1,1,1
uc001aai.1      chr1    16857   19759   -       1       1       0       1239    -34201.3210665  1,1,1,1,1,1,1
uc001aak.3      chr1    34610   36081   -       1       1       0       0       0.0     1,1,1,1
uc001aal.1      chr1    69090   70008   +       1       1       0       0       0.0     1,1
uc001aaq.2      chr1    321083  321115  +       1       1       0       0       0.0     1,1
uc001aar.2      chr1    321145  321207  +       1       1       0       0       0.0     1,1
uc001aau.3      chr1    323891  328581  +       1       1       0       2017    -73877.3844539  1,1,1,1


The problem is that every entry has only one transcript and one path. When multiple transcripts have overlapping exons, psg does not combine them into one gene and treat the transcripts as multiple paths of that gene. Is this because the known gene annotation provided by UCSC does not indicate which transcripts are alternative splicing isoforms of the same gene? How can I fix that?
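
To check whether the GTF itself is the cause, here is a minimal sketch that counts how many distinct transcript_id values share each gene_id. It assumes the standard GTF layout (nine tab-separated columns, with `gene_id "..."; transcript_id "...";` in the last one). The function names and the two sample lines are made up for illustration and are not part of psg. If no gene_id has more than one transcript, then the annotation carries no grouping that psg could use.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def transcripts_per_gene(gtf_lines):
    """Map each gene_id to the set of transcript_ids annotated under it."""
    genes = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return genes

def summarize(genes):
    """Return (number of gene_ids, number of gene_ids with >1 transcript)."""
    multi = sum(1 for tx in genes.values() if len(tx) > 1)
    return len(genes), multi

# Hypothetical sample records, shaped like a GTF where gene_id == transcript_id.
sample_gtf = [
    'chr1\tknownGene\texon\t11874\t12227\t.\t+\t.\t'
    'gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";',
    'chr1\tknownGene\texon\t14362\t14829\t.\t-\t.\t'
    'gene_id "uc001aac.4"; transcript_id "uc001aac.4";',
]

if __name__ == "__main__":
    total, multi = summarize(transcripts_per_gene(sample_gtf))
    print(f"{total} gene_ids, {multi} with more than one transcript")
```

To run it on a real file, pass an open file handle instead of `sample_gtf`, for example `transcripts_per_gene(open("knownGene.gtf"))`, where the file name is a placeholder.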

Jeremy