psg reports zero reads for most genes

Zhenhua Wu

Sep 6, 2013, 4:18:26 PM

to psginfe...@googlegroups.com

I used psg for one RNA-Seq sample. Part of the results is below:

30393 chrM 7586 9208 + 1 1 0 923933 -24090795.9703 1,1

30400 chrM 14856 15888 + 1 1 0 213843 -4908408.49414 1,1

30395 chrM 10059 10404 + 1 1 0 54425 -1772282.13416 1,1

A1BG-AS1 chr19 58859116 58866549 + 2 2 1 450 -10320.7133257 0.603,0.397,1,1,1,1,1,1,1,1,1,1

31848 chrY 59358328 59360854 - 1 1 0 127 -5211.85729574 1,1,1,1

31680 chrX 155255322 155257848 - 1 1 0 127 -5211.85729574 1,1,1,1

4249 chr11 5686441 5687610 + 1 1 0 53 -849.793969551 1,1

7SK chr8 1323794 237284409 - 127 127 126 18 -1063.9012368 0.006,0.006,0.006,0.006,0.007,0.006,0.006,0.006,0.006,0.006,0.013,0.006,0.007,0.006,0.006,0.006,0.007,0.007,0.006,0.006,0.006,0.007,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0

.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.006,0.012,0.006,0.007,0.006,0.006,0.006,0.1,0.006,0.006,0.006,0.006,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.00

8,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008

,0.008,0.008,0.008,0.008,0.008,0.008,0.008,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,

1,1,1,1,1,1,1,1,1,1,1,1,1

8923 chr14 107034168 107035191 - 1 1 0 3 -89.9331240695 1,1,1

6441 chr12 52473479 52502034 + 1 1 0 3 -85.3321072409 1,1,1,1

18795 chr21 35269910 35272163 + 1 1 0 3 -38.0337252796 1,1

25333 chr6 168265223 168265426 + 1 1 0 1 -18.6102396834 1,1

..

As you can see, only some chrM genes have reads (I ranked the results by the number of reads in each gene); nearly all the other genes have a count of zero.

The commands I ran are below (I generated a GTF from the UCSC knownGene table with the correct gene_id information):

psg_prepare_reference.py -k 1 -g ../fasta/hg19/ -a ./knownGene-isoform-06-30-2013.gtf hg19.knownGene.063013

psg_infer_frequencies.py Nalm_6_12H2_1_DMSO_6hr_L1 ./hg19.knownGene.063013 ./Nalm_6_12H2_1_DMSO_6hr_L1_1.fq ./Nalm_6_12H2_1_DMSO_6hr_L1_2.fq

What is the problem?

Jeremy

Hyunmin Kim

Oct 12, 2013, 1:21:57 AM

to psginfe...@googlegroups.com

Hi, Jeremy

I found what causes the problem and documented a solution here (http://code.google.com/p/codedb-bioinformatics/wiki/PsgInfer)

Colin Dewey

Oct 12, 2013, 3:40:50 PM

to Hyunmin Kim, psginfe...@googlegroups.com

Hi Hyunmin,

Thanks for documenting these issues. We are working on a new version that should address them.

The sorting issue actually has to do with gene_ids and not read IDs in the FASTQ files. To make sure things run correctly with the current version, one needs to ensure that all non-alphanumeric characters in the GTF gene_ids are converted to underscores ('_') ahead of time and also that the environment variable "LC_ALL" is set to "C" (e.g., with the command "export LC_ALL=C"). The latter is needed because the sort order of the Unix "sort" command depends on the locale.
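For anyone who wants to script the gene_id cleanup, here is a minimal Python sketch of the workaround described above. It is not part of PSGInfer; the function names (sanitize_gene_id, sanitize_gtf_line, c_locale_env) are made up for illustration, and it assumes the GTF attribute column uses the usual gene_id "..." form.

```python
import os
import re

# Matches the gene_id attribute in a GTF attribute column, e.g. gene_id "A1BG-AS1";
GENE_ID_RE = re.compile(r'\bgene_id "([^"]*)"')


def sanitize_gene_id(gene_id):
    """Replace every non-alphanumeric character with an underscore."""
    return re.sub(r'[^A-Za-z0-9]', '_', gene_id)


def sanitize_gtf_line(line):
    """Return the GTF line with only its gene_id value sanitized."""
    if line.startswith('#'):
        return line  # leave comment/header lines untouched
    return GENE_ID_RE.sub(
        lambda m: 'gene_id "%s"' % sanitize_gene_id(m.group(1)), line)


def c_locale_env():
    """Copy of the current environment with LC_ALL=C (hypothetical helper),
    suitable for passing as env= to subprocess calls."""
    env = dict(os.environ)
    env['LC_ALL'] = 'C'
    return env
```

One would run sanitize_gtf_line over every line of the GTF and write the result to a new file before running psg_prepare_reference.py, and launch the psg scripts from a shell (or subprocess) where LC_ALL=C is set.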

Read IDs in FASTQ files should generally not contain spaces (there may be multiple words on the title line for each read, but only the first whitespace-delimited word is treated as the ID), so read naming should not cause any issues. However, mates from the same read pair do need to have matching names up to the last character (which is usually 1 or 2, indicating the mate number).
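If you want to check the mate naming before a run, something like the following Python sketch could help. It is only an illustration, not PSGInfer code: the helper names are hypothetical, and it assumes plain four-line-per-record FASTQ files with mates in the same order in both files.

```python
from itertools import islice


def read_ids(handle):
    """Yield the ID (first whitespace-delimited word of the title line,
    without the leading '@') of each 4-line FASTQ record."""
    for title in islice(handle, 0, None, 4):
        words = title[1:].split()
        if words:
            yield words[0]


def mates_match(id1, id2):
    """True if the two IDs agree up to (but not including) the last character."""
    return id1[:-1] == id2[:-1]


def first_mismatch(fq1, fq2):
    """Return (record_number, id1, id2) for the first pair whose names
    disagree, or None if every pair matches."""
    pairs = zip(read_ids(fq1), read_ids(fq2))
    for n, (a, b) in enumerate(pairs, start=1):
        if not mates_match(a, b):
            return n, a, b
    return None
```

Usage would be along the lines of opening the _1.fq and _2.fq files with open() and passing both handles to first_mismatch; a result of None means no naming mismatch was found.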

Best,

Colin
