psg reports zero reads for most genes

Zhenhua Wu
Sep 6, 2013, 4:18:26 PM
to psginfe...@googlegroups.com
I ran psg on one RNA-Seq sample. Part of the output is below:

30393   chrM    7586    9208    +       1       1       0       923933  -24090795.9703  1,1
30400   chrM    14856   15888   +       1       1       0       213843  -4908408.49414  1,1
30395   chrM    10059   10404   +       1       1       0       54425   -1772282.13416  1,1
A1BG-AS1        chr19   58859116        58866549        +       2       2       1       450     -10320.7133257  0.603,0.397,1,1,1,1,1,1,1,1,1,1
31848   chrY    59358328  59360854      -       1       1       0       127     -5211.85729574  1,1,1,1
31680   chrX    155255322 155257848     -       1       1       0       127     -5211.85729574  1,1,1,1
4249    chr11   5686441 5687610 +       1       1       0       53      -849.793969551  1,1
7SK     chr8    1323794 237284409       -       127     127     126     18      -1063.9012368   0.006,0.006,0.006,0.006,0.007,0.006,0.006,0.006,0.006,0.006,0.013,0.006,0.007,0.006,0.006,0.006,0.007,0.007,0.006,0.006,0.006,0.007,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0
.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.012,0.006,0.007,0.006,0.006,0.006,0.1,0.006,0.006,0.006,0.006,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.00
8,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008
,0.008,0.008,0.008,0.008,0.008,0.008,0.008,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1
8923    chr14   107034168 107035191     -       1       1       0       3       -89.9331240695  1,1,1
6441    chr12   52473479  52502034      +       1       1       0       3       -85.3321072409  1,1,1,1
18795   chr21   35269910  35272163      +       1       1       0       3       -38.0337252796  1,1
25333   chr6    168265223 168265426     +       1       1       0       1       -18.6102396834  1,1
..

As you can see, only a few chrM genes have substantial read counts (I ranked the results by the number of reads per gene); nearly all the rest are at or close to zero.

These are the commands I ran (I generated a GTF from the UCSC knownGene table, with the correct gene_id information):

psg_prepare_reference.py -k 1 -g ../fasta/hg19/ -a ./knownGene-isoform-06-30-2013.gtf hg19.knownGene.063013

psg_infer_frequencies.py Nalm_6_12H2_1_DMSO_6hr_L1 ./hg19.knownGene.063013 ./Nalm_6_12H2_1_DMSO_6hr_L1_1.fq ./Nalm_6_12H2_1_DMSO_6hr_L1_2.fq

What is the problem?


Jeremy

Hyunmin Kim
Oct 12, 2013, 1:21:57 AM
to psginfe...@googlegroups.com
Hi Jeremy,
I found the cause of the problem and a solution; see http://code.google.com/p/codedb-bioinformatics/wiki/PsgInfer

Colin Dewey
Oct 12, 2013, 3:40:50 PM
to Hyunmin Kim, psginfe...@googlegroups.com
Hi Hyunmin,

Thanks for documenting these issues.  We are working on a new version that should address them.

The sorting issue actually has to do with gene_ids and not read IDs in the FASTQ files.  To make sure things run correctly with the current version, one needs to ensure that all non-alphanumeric characters in the GTF gene_ids are converted to underscores ('_') ahead of time and also that the environment variable "LC_ALL" is set to "C" (e.g., with the command "export LC_ALL=C").  The latter is needed because of locale-dependent sorting order with the unix "sort" command.
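A minimal Python sketch of the gene_id clean-up described above, assuming a GTF whose attribute column contains entries like `gene_id "A1BG-AS1";` (as in UCSC knownGene exports). The helper names `sanitize_gene_id`, `sanitize_gtf_line`, and `sanitize_gtf` are hypothetical and not part of psg. Only the gene_id value is rewritten; transcript_id and other attributes are left untouched, since the advice above only mentions gene_ids.

```python
import re

# Matches the gene_id attribute in a GTF attribute column,
# e.g.  gene_id "A1BG-AS1"; transcript_id "uc002qsd.4";
GENE_ID_RE = re.compile(r'\bgene_id "([^"]*)"')


def sanitize_gene_id(gene_id: str) -> str:
    """Replace every character that is not an ASCII letter or digit with '_'."""
    return re.sub(r"[^A-Za-z0-9]", "_", gene_id)


def sanitize_gtf_line(line: str) -> str:
    """Rewrite the gene_id value on one GTF line; comment lines pass through."""
    if line.startswith("#"):
        return line
    return GENE_ID_RE.sub(
        lambda m: 'gene_id "%s"' % sanitize_gene_id(m.group(1)), line
    )


def sanitize_gtf(in_path: str, out_path: str) -> None:
    """Copy a GTF file line by line, sanitizing gene_ids along the way."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(sanitize_gtf_line(line))
```

The cleaned file would then be passed to psg_prepare_reference.py with `-a`, and both psg commands run from a shell in which `export LC_ALL=C` has already been executed, so that the unix `sort` calls use byte-order sorting.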

Read IDs in FASTQ files should generally not contain spaces (there may be multiple words on the title line for each read, but the first whitespace-delimited word is considered the "id"), so there shouldn't be any issues with read naming.  However, it is definitely the case that mates from the same read pair need to have matching names up to the last character (which is usually 1 or 2, indicating the mate number).
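A small Python sketch for checking that pairing rule on the two FASTQ files before running psg_infer_frequencies.py. It assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines); the function names are hypothetical and not part of psg.

```python
from itertools import zip_longest
from typing import Iterator, Optional, Tuple


def read_ids(path: str) -> Iterator[str]:
    """Yield each record's id: the first whitespace-delimited word of the
    title line, minus the leading '@'. Assumes 4-line FASTQ records."""
    with open(path) as fq:
        for i, line in enumerate(fq):
            if i % 4 == 0 and line.strip():
                yield line[1:].split()[0]


def mates_match(id1: str, id2: str) -> bool:
    """Mates must share the same name except for the final character,
    which usually holds the mate number (1 or 2)."""
    return id1[:-1] == id2[:-1]


def first_mismatch(
    path1: str, path2: str
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record number, id1, id2) for the first pair that breaks the
    rule, or None if every pair matches. None in the tuple means one file
    ran out of records before the other."""
    pairs = zip_longest(read_ids(path1), read_ids(path2))
    for n, (a, b) in enumerate(pairs, start=1):
        if a is None or b is None or not mates_match(a, b):
            return n, a, b
    return None
```

For the sample in this thread, `first_mismatch("Nalm_6_12H2_1_DMSO_6hr_L1_1.fq", "Nalm_6_12H2_1_DMSO_6hr_L1_2.fq")` would return None if every pair is named consistently, and otherwise point at the first offending record.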

Best,
Colin

--
You received this message because you are subscribed to the Google Groups "PSGInfer Users" group.