psg_prepare_reference.py and psg_infer_frequencies.py errors

47 views
Skip to first unread message

GF

unread,
Nov 24, 2013, 4:56:45 AM11/24/13
to psginfe...@googlegroups.com
Hi,

First - I want to thank for PSGInfer- it looks very interesting and I am looking forward to using it.

I have 2 errors with PSGInfer, and would appreciate your help:

(1) I tried running psg_prepare_reference.py on a human gtf files downloaded from igenomes.

Here is the command:

 python /usr/local/src/PSGInfer/psginfer-1.2.0/psg_prepare_reference.py -agenes.gtf -p 3 -g /ngs001/db/genomes/human/ human_igenomes

I am getting the following error:
Error: Records do not have the same value for attribute 'strand'


(2) I ran cufflinks and cuffcompare to define also novel transcripts. I ran psg_prepare_reference.py with the gtf obtained from cufflinks, and this worked (I did not get error here, however would like also to run psginfer  with the standard gtf).

I then tried running psg_infer_frequencies.py, but got the following error:

.
.
.
java -cp /home/local/src/PSGInfer/psginfer-1.2.0/LearnPSG AlignPaired /tmp/psginfer-sample-kfEYRx/74300.psg /tmp/psginfer-sample1-kfEYRx/74300.1.align /tmp/psginfer-sample1-kfEYRx/74300.2.align 100 200 25.000000 1.000000 > /tmp/psginferegf0-kfEYRx/74300.em.out 2> /tmp/psginfer-egf0-kfEYRx/74300.em.out.log
Estimation time: 2474 seconds
Total time: 7371 seconds
Exception in thread Thread-1 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/threading.py", line 530, in __bootstrap_inner
  File "/usr/local/lib/python2.7/threading.py", line 483, in run
  File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 272, in _handle_                                                                                                                           
<type 'exceptions.TypeError'>: 'NoneType' object is not callable


I got the following output files:
 alignment1.bowtie.gz
 alignment1.bowtie.gz.log
 alignment2.bowtie.gz
 alignment2.bowtie.gz.log
 gene.results.debug.txt
 gene.results.txt
 isoform.results.txt


when looking into genes.results.txt - the num_reads is always 0.

Can you please help out with these two errors?

Thanks a lot.

Colin Dewey

unread,
Dec 1, 2013, 4:49:16 PM12/1/13
to psginfe...@googlegroups.com
Hi GF,

(1) This error indicates that your GTF annotation has at least one gene or transcript with exons on both strands.  The latest version (v1.2.1) will output a more informative message (i.e. it gives you the gene_id or transcript_id of the bad annotation) in this case.  Please give this a try and then correct your annotation if necessary.

(2) The error you are receiving unfortunately looks like a lingering bug in Python (http://bugs.python.org/issue9207). It sounds like it should not occur every time.  Fortunately, the output of psg_infer_frequencies should still be fine even if this error occurs.  The issue you are reporting regarding num_reads = 0 for all records is probably because of a bug in handling read names with whitespace in the last version.  Could you give the new version (v1.2.1) a try to see if the issue is resolved?

Best,
Colin

--
You received this message because you are subscribed to the Google Groups "PSGInfer Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to psginfer-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

GF

unread,
Dec 2, 2013, 5:08:23 AM12/2/13
to psginfe...@googlegroups.com
Hi Colin,

Thanks a lot for your reply!

(1) Regarding the gtf:
Is there a way to overcome this? As far as I know only gtf from Ensembel obey this rule. I would like to work with refseq/UCSC genes - is this possible?

(2) I downloaded the new version (I downloaded it to a directory in my account. is it OK, or is it needed to be installed by a sys administrator?)
I ran this with the prepare ref of the cufflinks output.
I don't get errors anymore, but the result gene and isoform files are empty (have only headers).
cat gene.results.txt
gene_id chrom   start   end     strand  num_transcripts num_paths       num_parameters  num_reads       log_likelihood  parameters

Any suggestions?

Thanks a lot.

Colin Dewey

unread,
Dec 2, 2013, 4:09:52 PM12/2/13
to psginfe...@googlegroups.com
Hi GF,

(1) You can use the RefSeq or UCSC gene sets if you are willing to do some processing of them first.  The main issue with these gene sets is the same gene_id is sometimes used for multiple sets of transcripts at different locations (and strands) throughout the genome.  What you will need to do is make sure that sets of transcripts that do not overlap with each other and/or are on opposite strands are given distinct gene_ids.  I have some custom scripts that I use for this purpose.  They are not quite ready for distribution, but if there is enough interest, I could put the time into getting them cleaned up for others to use.

(2) First, yes, you can install the software in your own account.  The empty output is a bit odd.  Could you send the logging output of your run of psg_infer_frequencies?

Colin

GF

unread,
Dec 9, 2013, 12:46:38 AM12/9/13
to psginfe...@googlegroups.com
Hi Colin,

Thanks a lot for the reply. I somehow missed it and saw it only now.

The command:

python psginfer-1.2.1/psg_infer_frequencies.py -p 3 human_psg out_9_12 Sample_1_R1.fastq Sample_1_R2.fastq

Here is the screen output:
Using 3 cores
Reference reading time: 7 seconds
bowtie --best --strata -p 3 -l 25 -a -m 200 human_psg/references Sample_1_R1.fastq 2> out_8_12/alignment1.bowtie.gz.log | sort -k3,3n -t'  ' -T /tmp/psginfe
r-out_8_12-c0YnMq | gzip -c > out_8_12/alignment1.bowtie.gz
Mate 1 alignment time: 4555 seconds
bowtie --best --strata -p 3 -l 25 -a -m 200 human_psg/references Sample_1_R2.fastq 2> out_8_12/alignment2.bowtie.gz.log | sort -k3,3n -t'  ' -T /tmp/psginfe
r-out_8_12-c0YnMq | gzip -c > out_8_12/alignment2.bowtie.gz
Mate 2 alignment time: 4576 seconds
Total alignment time: 9130 seconds
Estimation time: 0 seconds
Total time: 4583 seconds

In the output directory: there are the alignment files (which are not empty), but the  other files include only the header.

Do you have any idea what could be the problem?

Thanks a lot.

Colin Dewey

unread,
Dec 9, 2013, 10:49:26 AM12/9/13
to psginfe...@googlegroups.com
Hi GF,

Thanks for sending the log.  This is odd, it doesn't appear that any of the parameter estimation code is being run.  So that I can debug this situation, could you send me the following?

1. human_psg/index.txt
2. The two files that result from the following commands:

gunzip -c out_8_12/alignment1.bowtie.gz | cut -f3 | gzip > target1.gz
gunzip -c out_8_12/alignment2.bowtie.gz | cut -f3 | gzip > target2.gz

Thanks,
Colin

GF

unread,
Dec 23, 2013, 2:45:33 AM12/23/13
to psginfe...@googlegroups.com
Hi Colin,

It is working now!

Thanks a lot

Colin Dewey

unread,
Dec 23, 2013, 8:11:17 AM12/23/13
to psginfe...@googlegroups.com
Hi GF,

Did you identify what the issue was?  If there is a better error message we can report it would be good to know what the underlying problem was.

Thanks,
Colin

Reply all
Reply to author
Forward
0 new messages