Problems using V-FAT for large DNA viruses


Dan Depledge

Dec 12, 2015, 3:16:14 AM
to Broad Viral Tool Users
Hi,

I've been experimenting with V-FAT to orient and merge de novo contigs against herpesvirus genomes. However, contigs are only aligned within the first ~50 kb of the genome (the genome is ~150 kb), so I suspect a bug. I have since BLASTed all contigs against my reference sequence, and they do in fact map across the whole genome.

V-FAT output

1       4441    contig_0 1-4444
4442    5084    contig_0 4445-5088
5085    5726    contig_10 644-1284
5727    5787    contig_0 5731-5791
5788    6095    contig_0 5792-6099
6096    6403    contig_10 1593-1897
6404    20604   contig_0 6408-20626
20605   20877   contig_0 20627-20905
20878   21150   contig_7 20878-21150
21151   21173   contig_7 21151-21173
21174   21196   contig_4 701-723
21197   23952   contig_4 724-3479
23953   26708   contig_6 2764-5519
26709   28415   contig_6 5520-7226
28416   30122   contig_4 7944-9648
30123   34037   contig_7 30123-34037
34038   38258   contig_7 34038-38258
38259   39208   950N
39209   40423   contig_10 1932-3148
40424   41036   contig_10 3210-3825
41037   43862   2826N
43863   53081   contig_1 9074-18292
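
To put a number on where the merged assembly stops, the layout above can be summarized with a few lines of Python. This is only a minimal sketch: it assumes each row is whitespace-separated as "start end source", and that a source such as 950N marks a gap filled with Ns. The function name summarize_layout is hypothetical and not part of V-FAT.

```python
import re

def summarize_layout(text):
    """Return (last merged position, total N-filled bases) for a V-FAT-style layout.

    Assumes rows of the form "start end source [range]", as in the output above.
    """
    last_end = 0
    gap_bases = 0
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        start, end = int(fields[0]), int(fields[1])
        last_end = max(last_end, end)
        # Rows such as "950N" are assumed to be gaps filled with N.
        if re.fullmatch(r"\d+N", fields[2]):
            gap_bases += end - start + 1
    return last_end, gap_bases

# Last six rows of the V-FAT output above.
sample = """
34038   38258   contig_7 34038-38258
38259   39208   950N
39209   40423   contig_10 1932-3148
40424   41036   contig_10 3210-3825
41037   43862   2826N
43863   53081   contig_1 9074-18292
"""

last_end, gap_bases = summarize_layout(sample)
print(last_end, gap_bases)
```

On these rows the merged sequence ends at position 53081, well short of a ~150 kb genome.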


BLAST output

NODE_17_length_1291_cov_66.5906_ID_851 397 966
NODE_21_length_546_cov_41.7292_ID_859 1024 1400
NODE_9_length_2256_cov_87.8687_ID_835 1496 2359
NODE_9_length_2256_cov_87.8687_ID_835 2417 2814
NODE_9_length_2256_cov_87.8687_ID_835 2846 3796
NODE_23_length_350_cov_73.8278_ID_863 3879 4225
NODE_11_length_2109_cov_68.5535_ID_839 4442 5814
NODE_11_length_2109_cov_68.5535_ID_839 5788 6403
NODE_15_length_1718_cov_73.5874_ID_847 6572 8157
NODE_25_length_275_cov_50.2929_ID_867 8235 8510
NODE_4_length_11427_cov_83.2741_ID_825 9203 20580
NODE_5_length_10795_cov_83.8694_ID_827 20605 31262
NODE_10_length_2166_cov_77.9943_ID_837 21370 21446
NODE_13_length_2014_cov_79.4254_ID_843 31355 33314
NODE_7_length_9463_cov_78.4296_ID_831 33299 42766
NODE_3_length_14268_cov_80.2911_ID_823 42749 56768
NODE_8_length_5697_cov_81.1498_ID_833 56776 62477
NODE_2_length_18292_cov_83.7743_ID_821 62404 71620
NODE_2_length_18292_cov_83.7743_ID_821 62404 71620
NODE_8_length_5697_cov_81.1498_ID_833 62474 62547
NODE_22_length_500_cov_43.0355_ID_861 71865 72353
NODE_12_length_2111_cov_64.2074_ID_841 72749 74223
NODE_12_length_2111_cov_64.2074_ID_841 74306 74784
NODE_1_length_42352_cov_78.8134_ID_819 74789 117153
NODE_5_length_10795_cov_83.8694_ID_827 95263 95381
NODE_25_length_275_cov_50.2929_ID_867 117864 118139
NODE_15_length_1718_cov_73.5874_ID_847 118217 119802
NODE_11_length_2109_cov_68.5535_ID_839 119971 120586
NODE_11_length_2109_cov_68.5535_ID_839 120560 121932
NODE_23_length_350_cov_73.8278_ID_863 122149 122495
NODE_9_length_2256_cov_87.8687_ID_835 122578 123528
NODE_9_length_2256_cov_87.8687_ID_835 123560 123957
NODE_9_length_2256_cov_87.8687_ID_835 124015 124878
NODE_21_length_546_cov_41.7292_ID_859 124974 125350
NODE_17_length_1291_cov_66.5906_ID_851 125408 125977
NODE_17_length_1291_cov_66.5906_ID_851 126406 126600
NODE_17_length_1291_cov_66.5906_ID_851 126685 126838
NODE_17_length_1291_cov_66.5906_ID_851 127119 127472
NODE_18_length_1120_cov_71.8476_ID_853 127587 128706
NODE_19_length_733_cov_48.4177_ID_855 129594 130319
NODE_16_length_1533_cov_73.2527_ID_849 130389 131920
NODE_10_length_2166_cov_77.9943_ID_837 132059 132428
NODE_10_length_2166_cov_77.9943_ID_837 132478 134236
NODE_6_length_9467_cov_79.7971_ID_829 134242 143723
NODE_14_length_1727_cov_88.957_ID_845 143836 145572
NODE_10_length_2166_cov_77.9943_ID_837 145583 145719
NODE_10_length_2166_cov_77.9943_ID_837 145769 146138
NODE_16_length_1533_cov_73.2527_ID_849 146277 147808
NODE_19_length_733_cov_48.4177_ID_855 147878 148603
NODE_18_length_1120_cov_71.8476_ID_853 149491 150610
NODE_17_length_1291_cov_66.5906_ID_851 150725 151078
NODE_17_length_1291_cov_66.5906_ID_851 151359 151512
NODE_17_length_1291_cov_66.5906_ID_851 151597 151791

(Note that the contig names differ because V-FAT renames the contigs during its operation.)
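
For comparison, the span of the reference covered by the BLAST hits can be estimated by merging overlapping hit intervals. This is a minimal sketch, assuming each line is "contig_name start end" with the two numbers being coordinates on the reference; the helper names merge_intervals and covered_bases are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent closed intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def covered_bases(hit_lines):
    """Return (merged intervals, total covered bases) for 'name start end' lines."""
    intervals = []
    for line in hit_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        # Sort the pair in case a hit is reported in reverse orientation.
        start, end = sorted((int(fields[1]), int(fields[2])))
        intervals.append((start, end))
    merged = merge_intervals(intervals)
    return merged, sum(end - start + 1 for start, end in merged)

# Three hits taken from the BLAST output above.
hits = [
    "NODE_11_length_2109_cov_68.5535_ID_839 4442 5814",
    "NODE_11_length_2109_cov_68.5535_ID_839 5788 6403",
    "NODE_15_length_1718_cov_73.5874_ID_847 6572 8157",
]

merged, total = covered_bases(hits)
print(merged, total)
```

Running this over the full hit list and comparing the last merged interval with the end of the V-FAT layout shows how much of the genome the merge step left out.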

Has anyone tried V-FAT with genomes larger than 50 kb? If so, did you run into similar problems, and do you have any solutions?

Thanks,

Dan

Daniel Park

Feb 1, 2016, 11:35:20 AM
to Broad Viral Tool Users
Hi Dan,

I've also recently tried to use V-FAT on viruses from the herpes family, and it fails to produce any output at all (empty output files) despite good coverage data going in. I'm currently attempting to replace it with something MUMmer-based, but that is still a work in progress and has not been merged in yet.

Danny
