Hi, I used the UCSC known-gene annotation as the GTF input. The output file, gene.results.txt, looks like this:
gene_id chrom start end strand num_transcripts num_paths num_parameters num_reads log_likelihood parameters
uc001aaa.3 chr1 11873 14409 + 1 1 0 153 -5645.251773 1,1,1,1
uc001aac.4 chr1 14361 29370 - 1 1 0 1310 -43392.9912489 1,1,1,1,1,1,1,1,1,1,1,1
uc001aae.4 chr1 14361 19759 - 1 1 0 2151 -65180.4352549 1,1,1,1,1,1,1,1,1,1,1
uc001aah.4 chr1 14361 29370 - 1 1 0 1672 -53253.3490026 1,1,1,1,1,1,1,1,1,1,1,1
uc001aai.1 chr1 16857 19759 - 1 1 0 1239 -34201.3210665 1,1,1,1,1,1,1
uc001aak.3 chr1 34610 36081 - 1 1 0 0 0.0 1,1,1,1
uc001aal.1 chr1 69090 70008 + 1 1 0 0 0.0 1,1
uc001aaq.2 chr1 321083 321115 + 1 1 0 0 0.0 1,1
uc001aar.2 chr1 321145 321207 + 1 1 0 0 0.0 1,1
uc001aau.3 chr1 323891 328581 + 1 1 0 2017 -73877.3844539 1,1,1,1
The problem is that every gene in the output has only one transcript and one path (num_transcripts = 1, num_paths = 1), and each gene_id is just a UCSC transcript ID. When multiple transcripts have overlapping exons, psg does not combine them into one gene with those transcripts as multiple paths of that gene. Is this because the UCSC known-gene annotation does not indicate which transcripts are alternative splice forms of the same gene? How can I fix that?
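One possible workaround, sketched under assumptions: the gene_id column above holds UCSC transcript IDs (uc001aaa.3 and so on), which suggests psg groups transcripts by the GTF `gene_id` attribute, and that the UCSC GTF export sets `gene_id` to the same value as `transcript_id`. If that is right, rewriting `gene_id` so related transcripts share one value should let psg see them as paths of one gene. The script below (hypothetical helper names `cluster_transcripts` and `rewrite_gene_ids`) groups transcripts on the same chromosome and strand whose exons overlap by at least one base, using union-find, and labels each group with its smallest transcript ID. UCSC also publishes a knownIsoforms table that maps known-gene transcripts to clusters; joining on that would be an alternative to overlap clustering.

```python
import re
import sys
from collections import defaultdict

ATTR_RE = re.compile(r'(\S+) "([^"]*)"')
GENE_ID_RE = re.compile(r'\bgene_id "[^"]*"')


def parse_attrs(field):
    # GTF column 9 looks like: gene_id "x"; transcript_id "y";
    return dict(ATTR_RE.findall(field))


def cluster_transcripts(gtf_lines):
    """Map transcript_id -> cluster label. Transcripts on the same chrom and
    strand whose exons overlap (1-based, inclusive coords) share a cluster."""
    exons = []  # (chrom, strand, start, end, transcript_id)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        tid = parse_attrs(cols[8])["transcript_id"]
        exons.append((cols[0], cols[6], int(cols[3]), int(cols[4]), tid))

    parent = {tid: tid for *_, tid in exons}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Sweep exons in genomic order; an exon that starts before the current
    # run's furthest end overlaps the exon that reaches that end.
    exons.sort()
    prev_key, reach_end, reach_tid = None, -1, None
    for chrom, strand, start, end, tid in exons:
        key = (chrom, strand)
        if key == prev_key and start <= reach_end:
            union(reach_tid, tid)
            if end > reach_end:
                reach_end, reach_tid = end, tid
        else:
            prev_key, reach_end, reach_tid = key, end, tid

    groups = defaultdict(list)
    for tid in parent:
        groups[find(tid)].append(tid)
    mapping = {}
    for members in groups.values():
        label = min(members)  # deterministic cluster name
        for tid in members:
            mapping[tid] = label
    return mapping


def rewrite_gene_ids(gtf_lines, mapping):
    """Return GTF lines (no trailing newlines) with gene_id set to the cluster label."""
    out = []
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9:
            tid = parse_attrs(cols[8]).get("transcript_id")
            if tid in mapping:
                new = 'gene_id "%s"' % mapping[tid]
                cols[8] = GENE_ID_RE.sub(lambda m: new, cols[8], count=1)
        out.append("\t".join(cols))
    return out


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        lines = fh.readlines()
    sys.stdout.write("\n".join(rewrite_gene_ids(lines, cluster_transcripts(lines))) + "\n")
```

Usage would be something like `python recluster.py known_genes.gtf > known_genes.clustered.gtf` (both file names hypothetical), then passing the new GTF to psg. Note that pure overlap clustering will also merge distinct genes that share exonic sequence on the same strand, so a curated grouping such as knownIsoforms may be preferable where it is available.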