Hi,
Thanks for the great software.
I have a few questions.
First, I am a bit unclear as to the meaning of the output from V-Phaser.
In particular, can you please clarify:
1. What exactly are D1, D2, D3, D4 in the Var column and i in the Cons column? Are these insertions, with the number indicating the length of the insertion?
2. What are IA in the Var column and d in the Cons column? Or, for a more complicated case:
Ref_Pos Var Cons Strd_bias_pval Type Var_perc SNP_or_LP_Profile
2197 IG d 0.583 lp 1.086 IAGCATG:1:0 IAGCGTA:0:1 IAGCGTG:384:417 IATCGTG:1:0 IG:35:43 IGGCGTG:0:1 d:3024:3276
Are these the different insertions observed, with I indicating an insertion and the rest of the string giving the inserted sequence?
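For concreteness, this is how I am currently reading the profile field. It is only a minimal sketch: it assumes each entry is allele:count:count, and that the two counts are reads on the two strands (my guess, not something I found documented). The function names are my own and not part of V-Phaser.

```python
def parse_profile(entries):
    """Split entries such as 'IG:35:43' into (allele, count_1, count_2).

    Assumption: the two numbers are per-strand read counts.
    """
    parsed = []
    for entry in entries:
        allele, first, second = entry.rsplit(":", 2)
        parsed.append((allele, int(first), int(second)))
    return parsed


def allele_percent(parsed, allele):
    """Percentage of all reads at the position that carry the given allele."""
    total = sum(c1 + c2 for _, c1, c2 in parsed)
    hits = sum(c1 + c2 for name, c1, c2 in parsed if name == allele)
    return 100.0 * hits / total


# Profile copied from the Ref_Pos 2197 line above.
profile_2197 = parse_profile(
    "IAGCATG:1:0 IAGCGTA:0:1 IAGCGTG:384:417 IATCGTG:1:0 "
    "IG:35:43 IGGCGTG:0:1 d:3024:3276".split()
)
print(round(allele_percent(profile_2197, "IG"), 3))
```

Read this way, the IG entry accounts for 78 of 7183 reads, which matches the reported Var_perc of 1.086, so I think the counts are being interpreted correctly even if the allele labels are not.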
Second, should I be concerned by:
1. Having p-values above 1 for Strd_bias? It would be great to know whether you are simply doing a Fisher's exact test, or how this is calculated otherwise.
2. The fact that one of these results is significant and the other is not? (There should be no SNPs in this sample; it is a plasmid sequenced as a control for background from the technique.)
Ref_Pos Var Cons Strd_bias_pval Type Var_perc SNP_or_LP_Profile
9083 T G 0.4557 snp 43.74 G:17:693 T:535:17
9085 T G 0.05416 snp 46.35 G:10:630 T:535:18
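In case it helps pin down the calculation, this is the test I would have expected: a two-sided Fisher's exact test on the 2x2 table of allele by strand. It is only a sketch under the assumption that the two numbers after each allele are forward and reverse read counts; the function name is mine, and I do not know whether V-Phaser does anything like this.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Ref_Pos 9083: G:17:693 and T:535:17 (assumed forward:reverse counts).
p_9083 = fisher_exact_two_sided(17, 693, 535, 17)
print(p_9083)
```

On the 9083 counts this gives a p-value far below the reported 0.4557, which is part of why I am asking how Strd_bias_pval is defined.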
Finally, I tried to run the conversion script (vph2vprf_format.pl) on an output file. It runs out of memory (I have 64 GB) with this message:
Out of memory during array extend at vph2vprf_format.pl line 163
I think it's because there are a few lp entries with more than 6 variants. Any suggestions other than deleting or manually curating them?
Many thanks,
Ron