Vprofiler query


Jyoti Sutar

Dec 12, 2013, 5:03:18 AM
to viral-to...@googlegroups.com
Hello,

I have Illumina paired-end read data for an HIV amplicon of interest. So far, I have done the assembly, sorting, indel realignment with GATK, and variant calling with V-Phaser 2. I need to use V-Profiler for analysis and visualization of my variant calls, but I am running into a number of issues. I cannot perform the SAM-to-QLX conversion with the provided Perl script. It does not report any error as such; it just says that the process is 'killed'. I tried the Perl scripts from both the V-Profiler suite and the RC454 package.
Also, could you please clarify what exactly the assembly.fasta file is? Maybe I am making some silly mistake. I took the indel-realigned BAM file, converted it to SAM format with samtools and to FASTQ with bamtools, and then obtained the FASTA and QUAL files from the FASTQ. For the assembly.fasta file, I used the consensus sequence for my amplicon of interest.
The command I used is as follows:

$ perl /mypath/samToQlx.pl HIVRealigned_O.dat.sam HIVreads.fasta HIVreads.qual HIVassembly.fasta HIV.qlx

The samToQlx Perl script works fine with the test data provided.

Please help.

- Jyoti

Michael C. Zody

Dec 12, 2013, 11:02:13 AM
to viral-to...@googlegroups.com
It sounds like you're doing everything properly. I wonder if you don't have enough memory on your machine or are not allowed to use enough memory to complete the larger job. If you try the process with a smaller subset of your real data, can you get it to complete? That's the first explanation that comes to mind for the process being 'killed'.

Mike
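
One way to try the smaller-subset test suggested above is to cut the SAM file down to its header plus the first few alignment records. The following is a minimal Python sketch, not part of V-Profiler or RC454. The function names `subset_sam` and `read_names` are hypothetical, and it assumes the FASTA and QUAL inputs to samToQlx.pl would then need to be restricted to the same read names (that step is not shown).

```python
def subset_sam(lines, max_reads):
    """Return the SAM header lines plus the first max_reads alignment records."""
    header, records = [], []
    for line in lines:
        if line.startswith("@"):
            # SAM header lines start with '@' and are always kept.
            header.append(line)
        elif len(records) < max_reads:
            records.append(line)
        else:
            # Enough records collected; stop reading the rest of the file.
            break
    return header + records


def read_names(sam_lines):
    """Collect the read names (first tab-separated field) of alignment records."""
    return {
        line.split("\t", 1)[0]
        for line in sam_lines
        if not line.startswith("@")
    }


# Example use with files (paths are placeholders):
#   with open("input.sam") as src, open("subset.sam", "w") as dst:
#       dst.writelines(subset_sam(src, 10000))
```

If the conversion completes on the subset but is killed on the full file, that points toward a memory limit rather than a problem with the input format.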

Jyoti Sutar

Dec 13, 2013, 9:46:03 AM
to viral-to...@googlegroups.com
Yes, you were right. It was a memory issue. I got around it and converted the BAM files to QLX format. The QLX files look fine, but when I run vprofiler.pl, I get the following errors, and none of the output files contain any data. Again, the test data runs fine.

My command is

$ perl /Mypath/vprofiler.pl -i HIV4VproInput.txt -o HIV4VproOutput_9 -noendvariant=10 -nt -codon

The output looks like this:
Use of uninitialized value in subtraction (-) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1524, <QLXFILE> line 431856.
Use of uninitialized value in concatenation (.) or string at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1396, <QLXFILE> line 431856.
Use of uninitialized value in subtraction (-) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1506, <QLXFILE> line 431856.
Use of uninitialized value in subtraction (-) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1524, <QLXFILE> line 431856.
Use of uninitialized value in concatenation (.) or string at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1396, <QLXFILE> line 431856.
Use of uninitialized value in subtraction (-) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1506, <QLXFILE> line 431856.
Use of uninitialized value in subtraction (-) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1524, <QLXFILE> line 431856.
Use of uninitialized value in concatenation (.) or string at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1396, <QLXFILE> line 431856.
Use of uninitialized value in subtraction (-) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1506, <QLXFILE> line 431856.
Use of uninitialized value in subtraction (-) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1524, <QLXFILE> line 431856.
Use of uninitialized value in concatenation (.) or string at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1396, <QLXFILE> line 431856.
Use of uninitialized value in subtraction (-) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1506, <QLXFILE> line 431856.
Use of uninitialized value in subtraction (-) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1524, <QLXFILE> line 431856.
Use of uninitialized value in concatenation (.) or string at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1396, <QLXFILE> line 431856.
Use of uninitialized value in subtraction (-) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1506, <QLXFILE> line 431856.
Use of uninitialized value in subtraction (-) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1524, <QLXFILE> line 431856.
Use of uninitialized value in concatenation (.) or string at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1396, <QLXFILE> line 431856.
Use of uninitialized value in subtraction (-) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1506, <QLXFILE> line 431856.
Use of uninitialized value in subtraction (-) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1524, <QLXFILE> line 431856.
Use of uninitialized value in concatenation (.) or string at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1396, <QLXFILE> line 431856.
Use of uninitialized value in subtraction (-) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1506, <QLXFILE> line 431856.
Use of uninitialized value in subtraction (-) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1524, <QLXFILE> line 431856.
Use of uninitialized value in numeric le (<=) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1581, <QLXFILE> line 431856.
Use of uninitialized value in numeric le (<=) at /home/vainav/Atom/NGStoolsinstallations/VpSoftwarePackage/vprofiler.pl line 1588, <QLXFILE> line 431856.
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009

Attaching package: ‘gplots’

The following object is masked from ‘package:stats’:

    lowess

Error in axis(1, at = xv, labels = lv) : no locations are finite
Calls: heatmap.2 -> axis
Execution halted
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009

Attaching package: ‘gplots’

The following object is masked from ‘package:stats’:

    lowess

Error in axis(1, at = xv, labels = lv) : no locations are finite
Calls: heatmap.2 -> axis
Execution halted
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009

Attaching package: ‘gplots’

The following object is masked from ‘package:stats’:

    lowess

Error in axis(1, at = xv, labels = lv) : no locations are finite
Calls: heatmap.2 -> axis
Execution halted


Where am I going wrong? :(

Jyoti Sutar

Dec 14, 2013, 9:49:59 AM
to viral-to...@googlegroups.com
Okay, so as it turns out, my QLX files were not fine: there was a problem with the read names. I followed the pointers mentioned at https://groups.google.com/forum/?hl=en#!topic/viral-tool-users/XY9hwYYqKQ8 and all is working fine now :) Sorry for the trouble, and thank you.





--
Ms. Jyoti Sutar,
PhD student,
Department of Biochemistry and Virology,
National Institute for Research in Reproductive Health,
Mumbai.

Jyoti Sutar

Jan 21, 2014, 9:39:45 AM
to viral-to...@googlegroups.com
Hello,

I am having some issues understanding the V-Profiler results. I have two queries, as follows.

Query 1:
I would be grateful if you could explain how V-Profiler decides to 'reject' certain codons. In my dataset, Sanger sequencing showed serine at a certain position. The same amplicon was also sequenced with Illumina. In the codon frequency table, asparagine is present at that position with 100% frequency, though the coverage seems pretty low.

Nt Position   AA Position   Coverage(HQOnly)   ConsensusCodon   Primary Codon
7606          160           96(10191)          AAT(N)           N (100.00%)(AAT)

 
When I checked the codon details file, serine is listed among the rejected codons, in spite of having many high-quality reads.

7606-7608 (coverage of 11147):
Accepted codons:
AAT (N) 100.00% (LowQual : 12.50%)
Rejected codons:
TAG (*) 39 count (37 HQ reads)
GGA (G) 48 count (44 HQ reads)
AGT (S) 10085 count (9235 HQ reads)
TGA (*) 1 count (0 HQ reads)
TGT (C) 2 count (0 HQ reads)
ATT (I) 24 count (16 HQ reads)
--T (-) 19 count (16 HQ reads)
ATC (I) 5 count (5 HQ reads)
TAT (Y) 15 count (14 HQ reads)
AAC (N) 2 count (1 HQ reads)
AGC (S) 39 count (35 HQ reads)

So I am a little confused about whether I should consider serine in my analysis.

Query 2:

Also, in the V-Profiler analysis, the -haplo flag gives an error for some samples. I understand that this happens because of indels in the data set. Is there any way to fix this? It would be a great help if that could be done.

Thank you,
Jyoti

Michael C. Zody

Jan 22, 2014, 1:23:11 PM
to viral-to...@googlegroups.com
Looking at the code, there are two conditions for a codon to be accepted.

First, all three of the nucleotides in the codon must individually be accepted in the nucleotide-based calls.

Second, the codon itself must appear with all three positions at high quality in at least one read.
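
The two conditions above can be sketched in Python as follows. This is only an illustration of the rule as described in this message, not the actual V-Profiler code. The function name `codon_accepted`, the data layout, and the toy data in the usage note are all hypothetical.

```python
def codon_accepted(codon, start, accepted_nts, reads):
    """
    Apply the two acceptance conditions described above.

    codon:        three-letter codon string, e.g. "AGT"
    start:        nucleotide position of the codon's first base
    accepted_nts: dict mapping position -> set of bases accepted
                  in the nucleotide-based calls (assumed layout)
    reads:        iterable of (bases, hq) pairs, where bases is the
                  three-base string a read shows at this codon and
                  hq is a tuple of three booleans (base is high quality)
    """
    # Condition 1: each base of the codon is individually accepted
    # at its own position in the nucleotide-based calls.
    each_base_accepted = all(
        codon[i] in accepted_nts.get(start + i, set()) for i in range(3)
    )
    # Condition 2: at least one read shows this exact codon with
    # all three positions at high quality.
    seen_fully_hq = any(
        bases == codon and all(hq) for bases, hq in reads
    )
    return each_base_accepted and seen_fully_hq
```

With invented toy data in which only A, A and T are accepted at positions 7606 to 7608, `codon_accepted("AGT", 7606, ...)` returns False even if many reads show AGT at high quality, because G is not an accepted call at the middle position. That is one way a well-supported codon could end up in the rejected list if the nucleotide-based and codon-based alignments disagree.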

Without seeing your data, it's very difficult for me to tell what's going on, but since you mention that you have indels in your data, my guess is that it has to do with the realignment V-Profiler sometimes does around indels when computing codon frequencies. The resulting highest frequency codon may not be supported by the nucleotide data, which uses a purely nucleotide based alignment and thus may have different calls at those positions in the codon.

I believe the solution to both this and your other problem is for us to reimplement V-Profiler to no longer do codon-based realignment of reads with indels. This is not clearly an optimal decision, since viruses with real frame-shifting indels obviously result in different codons, but as you can see, if we shift the alignments so that the same alignment is not used for nucleotide-based and codon-based variant calling, the results are inconsistent and confusing.

Unfortunately, we currently have several projects in house that are higher priority and do not require this feature of the software, so a rewrite of V-Profiler is at fairly low priority for us right now. I apologize for the inconvenience.

Mike


RG

Jan 23, 2014, 12:05:20 PM
to viral-to...@googlegroups.com
Sorry, re-posting this to fix a typo.

Hi,
For what it's worth, I have the same problem. Since V-Profiler is the only tool I know of that works at the codon level (which is great for HIV, where you can have more than one change relative to the reference in the same codon), it is extremely valuable. Hence, I am going to resort to parsing the codon detail file, which has all the info for each codon. I am using the raw variant file from V-Phaser 2 (I find that the strand bias test gets rid of some real mutations, maybe because of such high numbers), and then taking any codons that show up at a reasonable level (e.g. 2% of total) and have a high percentage of HQ reads (e.g. >70%).
Ron
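
The filtering described above could look roughly like the Python sketch below. It is not part of V-Profiler. It assumes the codon detail lines have the form shown earlier in this thread ("AGT (S) 10085 count (9235 HQ reads)"), that "2% of total" means 2% of the coverage stated in the block header, and that the thresholds are the example values given above. The name `candidate_codons` is hypothetical.

```python
import re

# Matches lines such as "AGT (S) 10085 count (9235 HQ reads)".
# Lines in the "Accepted codons" format (percentages) do not match
# and are skipped.
CODON_LINE = re.compile(r"^(\S{3}) \((\S)\) (\d+) count \((\d+) HQ reads\)$")


def candidate_codons(block_lines, coverage, min_frac=0.02, min_hq_frac=0.70):
    """Return (codon, amino_acid, count, hq_count) for codons passing both cutoffs."""
    kept = []
    for line in block_lines:
        match = CODON_LINE.match(line.strip())
        if not match:
            continue
        codon, amino_acid = match.group(1), match.group(2)
        count, hq_count = int(match.group(3)), int(match.group(4))
        if count == 0:
            continue
        frequent_enough = count / coverage >= min_frac
        mostly_high_quality = hq_count / count > min_hq_frac
        if frequent_enough and mostly_high_quality:
            kept.append((codon, amino_acid, count, hq_count))
    return kept
```

Applied to the rejected-codon list for positions 7606-7608 posted earlier (coverage 11147), only AGT (S) passes both cutoffs with these example thresholds.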

Jyoti Sutar

Jan 24, 2014, 1:09:06 AM
to viral-to...@googlegroups.com
Yes, I tried the raw variant file from V-Phaser 2 for the analysis, and it did show a lot more SNPs that I could observe in my Sanger data but had not seen in my NGS data previously. It definitely helped. Thanks a lot.

-Jo