Meaning of V-Phaser2 output, and conversion for V profiler problem


RG

Nov 6, 2013, 12:24:56 PM
to viral-to...@googlegroups.com
Hi, 
Thanks for the great software. 
I have a few questions. 

First, I am a bit unclear as to the meaning of the output from V-Phaser. 
In particular, can you please clarify:
1. What exactly are D1, D2, D3, D4 for Var and i for Cons? Are these insertions, with the number indicating the length of the insertion?
2. What is IA for Var and d for Cons? Or, more complicated:

Ref_Pos  Var  Cons  Strd_bias_pval  Type  Var_perc  SNP_or_LP_Profile
2197     IG   d     0.583           lp    1.086     IAGCATG:1:0 IAGCGTA:0:1 IAGCGTG:384:417 IATCGTG:1:0 IG:35:43 IGGCGTG:0:1 d:3024:3276

Are these different observed insertions, with I indicating an insertion and the subsequent sequence being the observed sequence?

Second, should I be concerned by:
1. Having p-values above 1 for Strd_bias? Actually, it would be great to know whether you are simply doing a Fisher's test, or how this is calculated.
2. The fact that one of these results is significant and the other is not? (There should be no SNPs in this sample; it is sequencing of the plasmid as a control for background from the technique.)

Ref_Pos  Var  Cons  Strd_bias_pval  Type  Var_perc  SNP_or_LP_Profile
9083     T    G     0.4557          snp   43.74     G:17:693 T:535:17
9085     T    G     0.05416         snp   46.35     G:10:630 T:535:18

Finally, I tried to run the conversion script (vph2vprf_format.pl) on an output file. It runs out of memory (I have 64 GB) with this message: Out of memory during array extend at vph2vprf_format.pl line 163
I think it's because there are a few lp entries with >6 variants. Any suggestions other than deleting or manually curating them?

Many thanks, 
Ron 


Xiao Yang

Nov 6, 2013, 1:35:35 PM
to viral-to...@googlegroups.com
Hi Ron, 

See response below.


On Wed, Nov 6, 2013 at 12:24 PM, RG <ron.g...@gmail.com> wrote:
Hi, 
Thanks for the great software. 
I have a few questions. 

First, I am a bit unclear as to the meaning of the output from V-Phaser. 
In particular, can you please clarify:

I refer you to our publication http://www.biomedcentral.com/1471-2164/14/674
which should clarify a lot of questions. 
 
1. What exactly are D1, D2, D3, D4 for Var and i for Cons? Are these insertions, with the number indicating the length of the insertion?

D stands for deletion, i stands for insertion.
The number following D is the number of deleted bases.
And as you guessed, the strings following I (for example IG or IAGCGTG) are the actual observed strings that are
inserted at that location.
 
2. What is IA for Var and d for Cons? Or, more complicated:

Ref_Pos  Var  Cons  Strd_bias_pval  Type  Var_perc  SNP_or_LP_Profile
2197     IG   d     0.583           lp    1.086     IAGCATG:1:0 IAGCGTA:0:1 IAGCGTG:384:417 IATCGTG:1:0 IG:35:43 IGGCGTG:0:1 d:3024:3276

Are these different observed insertions, with I indicating an insertion and the subsequent sequence being the observed sequence?

Yes. d is the placeholder for a deletion. Because IG occurs as the variant, d becomes the consensus
with respect to it, but you do not know what consensus sequence would be deleted.
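
To make the notation concrete, here is a minimal Python sketch that splits one output row into its columns and labels each allele string. It is not part of V-Phaser 2. It assumes whitespace-separated columns in the header order shown above, and the function names and labels (`classify_allele`, `parse_row`, "placeholder") are made up for illustration.

```python
def classify_allele(allele):
    """Label one allele string from the Var, Cons, or profile columns."""
    if allele in ("d", "i"):
        # Lowercase d / i are placeholders, not observed sequence.
        return ("placeholder", allele)
    if allele.startswith("D") and allele[1:].isdigit():
        return ("deletion", int(allele[1:]))   # number of deleted bases
    if allele.startswith("I") and len(allele) > 1:
        return ("insertion", allele[1:])       # observed inserted string
    return ("base", allele)


def parse_row(line):
    """Split one row into named fields plus a list of (allele, count, count)."""
    f = line.split()
    profile = []
    for entry in f[6:]:
        allele, c1, c2 = entry.rsplit(":", 2)
        profile.append((allele, int(c1), int(c2)))
    return {"pos": int(f[0]), "var": f[1], "cons": f[2], "pval": float(f[3]),
            "type": f[4], "var_perc": float(f[5]), "profile": profile}


row = parse_row("2197 IG d 0.583 lp 1.086 IAGCATG:1:0 IAGCGTA:0:1 "
                "IAGCGTG:384:417 IATCGTG:1:0 IG:35:43 IGGCGTG:0:1 d:3024:3276")
for allele, c1, c2 in row["profile"]:
    print(allele, classify_allele(allele), c1, c2)
```

For the row at position 2197 this labels IG as an insertion of "G" and d as a placeholder, matching the explanation above.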
 


Second, should I be concerned by:
1. Having p-values above 1 for Strd_bias? Actually, it would be great to know whether you are simply doing a Fisher's test, or how this is calculated.

No concerns; it is likely a small bug. It's Fisher's test.
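
For reference, a two-sided Fisher's exact test on a 2x2 table can be sketched in plain Python as follows. This is an illustration only: the thread does not say how V-Phaser 2 arranges the counts, so treating the two numbers in each profile entry as per-strand counts is an assumption, and the result need not reproduce the Strd_bias_pval column. The helper names are hypothetical. A correctly computed p-value from this test cannot exceed 1.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):
        # Unnormalised hypergeometric weight of the table whose top-left cell is x.
        return comb(col1, x) * comb(n - col1, row1 - x)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    observed = weight(a)
    # Sum the weights of all tables that are no more likely than the observed one.
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, row1)


def strand_bias_pvalue(cons_entry, var_entry):
    """Entries look like 'G:17:693'; the two counts are ASSUMED to be per-strand."""
    _, c_fwd, c_rev = cons_entry.rsplit(":", 2)
    _, v_fwd, v_rev = var_entry.rsplit(":", 2)
    return fisher_exact_two_sided(int(c_fwd), int(c_rev), int(v_fwd), int(v_rev))


p = strand_bias_pvalue("G:17:693", "T:535:17")
print(0.0 <= p <= 1.0)  # True: the sum of weights never exceeds the normaliser
```

The weights are compared as exact integers, so the test avoids floating-point ties and the final division is the only rounding step.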
 
2. The fact that one of these results is significant and the other is not? (There should be no SNPs in this sample; it is sequencing of the plasmid as a control for background from the technique.)

Ref_Pos  Var  Cons  Strd_bias_pval  Type  Var_perc  SNP_or_LP_Profile
9083     T    G     0.4557          snp   43.74     G:17:693 T:535:17
9085     T    G     0.05416         snp   46.35     G:10:630 T:535:18

In cases like this, although there should not be any variants, that does not necessarily mean
the bases do not occur in your sequencing library. When reads do contain unintended bases,
there is no way of telling whether they should be called variants or not.
Looking at both positions, I believe T and G both occur in the reads.
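
The per-allele read support at these two positions can be checked directly from the SNP_or_LP_Profile column. The sketch below sums the two counts of each entry and converts them to percentages; the function name is made up for illustration, and summing both counts per allele is an assumption that happens to reproduce the Var_perc values quoted above.

```python
def allele_percentages(profile_field):
    """Return {allele: percent of reads} from a field like 'G:17:693 T:535:17'."""
    counts = {}
    for entry in profile_field.split():
        allele, c1, c2 = entry.rsplit(":", 2)
        counts[allele] = int(c1) + int(c2)
    total = sum(counts.values())
    return {allele: 100.0 * n / total for allele, n in counts.items()}


for pos, profile in [(9083, "G:17:693 T:535:17"), (9085, "G:10:630 T:535:18")]:
    pct = allele_percentages(profile)
    print(pos, {allele: round(p, 2) for allele, p in pct.items()})
# 9083 {'G': 56.26, 'T': 43.74}
# 9085 {'G': 53.65, 'T': 46.35}
```

Both alleles are supported by hundreds of reads at each position, which is consistent with T and G both being present in the library.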
 

Finally, I tried to run the conversion script (vph2vprf_format.pl) on an output file. It runs out of memory (I have 64 GB) with this message: Out of memory during array extend at vph2vprf_format.pl line 163
I think it's because there are a few lp entries with >6 variants. Any suggestions other than deleting or manually curating them?

I do not know which version you have. This script has not been publicly released, but it has been
updated locally several times to fix possible bugs. I have attached the most recent
version, but I am not sure whether it fixes the issue.
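
If you want to see which rows might be involved before rerunning the conversion, a small sketch like the one below lists lp rows whose profile holds more than a chosen number of non-consensus entries. Whether such rows actually cause the memory error is only the guess raised in the question and is not confirmed here; the function name and the `max_variants` parameter are hypothetical.

```python
def rows_with_many_variants(lines, max_variants=6):
    """Yield (ref_pos, n_variants) for lp rows with more than max_variants."""
    for line in lines:
        fields = line.split()
        # Skip blank lines, the header, and anything that is not an lp row.
        if len(fields) < 7 or fields[4] != "lp":
            continue
        cons = fields[2]
        # Count profile entries whose allele differs from the consensus.
        variants = [e for e in fields[6:] if e.rsplit(":", 2)[0] != cons]
        if len(variants) > max_variants:
            yield fields[0], len(variants)


example = [
    "Ref_Pos Var Cons Strd_bias_pval Type Var_perc SNP_or_LP_Profile",
    "2197 IG d 0.583 lp 1.086 IAGCATG:1:0 IAGCGTA:0:1 IAGCGTG:384:417 "
    "IATCGTG:1:0 IG:35:43 IGGCGTG:0:1 d:3024:3276",
    "9083 T G 0.4557 snp 43.74 G:17:693 T:535:17",
]
print(list(rows_with_many_variants(example, max_variants=5)))
```

The same function accepts an open file handle in place of the list, since it only iterates over lines.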

Many thanks, 
Ron 


--
- Xiao
vph2vprf_format.pl

RG

Nov 7, 2013, 10:05:57 AM
to viral-to...@googlegroups.com
Thanks, the new file worked. 
However, I now get errors from vprofiler (command: perl ../vprofiler.pl -i profile.txt -o 917_vpro -noendvariant=10 -nt -codon):

readline() on closed filehandle AAALIGN at ../vprofiler.pl line 703.
Use of uninitialized value in numeric lt (<) at ../vprofiler.pl line 718.
rm: cannot remove ‘917_vpro_HWI-D00104:141:H08FNADXX:2:2105:7126:12507/2_AA.mfa’: No such file or directory
rm: cannot remove ‘917_vpro_HWI-D00104:141:H08FNADXX:2:2105:7126:12507/2_AA.mfa.afa’: No such file or directory
for many many reads. 

If I only keep the snp (removing lp) in the variant caller output then the program runs fine. Hence, the problem is dealing with deletions. 
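
The filtering step described here can be sketched in Python as follows. It drops every row whose Type column is lp and passes everything else through unchanged, including the header. It assumes whitespace-separated columns with Type in the fifth position, as in the output shown earlier in this thread; the function name is made up.

```python
def drop_lp_rows(lines):
    """Yield every line except rows whose Type column (5th field) is 'lp'."""
    for line in lines:
        fields = line.split()
        if len(fields) > 4 and fields[4] == "lp":
            continue
        yield line


example = [
    "Ref_Pos Var Cons Strd_bias_pval Type Var_perc SNP_or_LP_Profile",
    "2197 IG d 0.583 lp 1.086 IG:35:43 d:3024:3276",
    "9083 T G 0.4557 snp 43.74 G:17:693 T:535:17",
]
for kept in drop_lp_rows(example):
    print(kept)
```

To filter a real file, pass an open file handle to `drop_lp_rows` and write the yielded lines to a new file, keeping the original output untouched.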

Do you have any recommendations on how to run this with the deletions or insertions still in place?

Thanks again, 
Ron 

Michael C. Zody

Nov 8, 2013, 12:33:55 PM
to viral-to...@googlegroups.com
When using the codon-based variant reporting in V-profiler, if there are indels in the reads, it tries to do a translated amino acid alignment to decide where to put the indels so that they minimally disrupt the codon calls. It looks like what's happening here is that it's failing to create the amino acid alignments and then complaining when it can't read them back in.

To be honest, I'm not sure this feature ever worked. It just turns out that with V-Phaser 2 we get a lot more indel calls than we used to. What will happen is that reads with indels won't contribute to the codon frequency calls.
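
As a toy illustration of that last point (not V-profiler's actual code), the sketch below tallies codons at one position from reads that are assumed to be already aligned to reference coordinates, with "-" marking a gap. Reads containing any gap are skipped, so they add nothing to the codon counts. The input layout and the function name are assumptions made for this example.

```python
from collections import Counter


def codon_counts(aligned_reads, codon_start):
    """Count codons at codon_start (0-based), skipping reads that contain gaps."""
    counts = Counter()
    skipped = 0
    for read in aligned_reads:
        if "-" in read:
            skipped += 1      # read has an indel: no codon call from it
            continue
        codon = read[codon_start:codon_start + 3]
        if len(codon) == 3:   # ignore reads that do not cover the codon
            counts[codon] += 1
    return counts, skipped


reads = ["ATGGCT", "ATGGCA", "ATG-CT", "ATGGCT"]
counts, skipped = codon_counts(reads, 3)
print(dict(counts), skipped)  # {'GCT': 2, 'GCA': 1} 1
```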

Unfortunately, this is not trivially fixable, and is a feature we may remove in the next revision of the software anyway because it causes V-profiler to call codon variants that are not supported by the underlying nucleotide alignments, which confuses people.

Mike

bram.v...@gmail.com

Nov 26, 2013, 8:16:39 AM
to viral-to...@googlegroups.com
Dear Xiao,

I am facing a similar problem: the conversion from V-Phaser to V-Profiler does not seem to finish.
Is there another update of the script available? Since it might help you to look at the input files to see what could have caused the error, I have attached the problematic files.

Best regards,
Bram
 

On Wednesday, November 6, 2013 at 19:35:35 UTC+1, Xiao Yang wrote:
Consensus.fdr.var.txt
PB1_mergedContigsOut_assembly.fa

Xiao Yang

Nov 26, 2013, 1:30:19 PM
to viral-to...@googlegroups.com
Hi Ron,

These files do not create a problem for me. 
See attached script and output.

Xiao
output.txt
vph2vprf_format.pl

Xiao Yang

Nov 26, 2013, 1:30:55 PM
to viral-to...@googlegroups.com
Sorry, I'm directing this msg to Bram.
--
- Xiao

Bram Vrancken

Nov 27, 2013, 3:31:43 PM
to viral-to...@googlegroups.com
Dear Xiao,

Thanks for rerunning the file. It now also runs on my PC; I must have done something else wrong. Sorry for wasting your time!

Bram


Javier Perez Florido

May 27, 2015, 4:57:01 AM
to viral-to...@googlegroups.com, xiao...@broadinstitute.org
Dear Xiao,
Could you please also attach the script dependency named ntfreq_raw_all.pl, which is used inside vph2vprf_format.pl?
Thanks,
Javier