analyzing .vcf file


Ananya Pathak

Dec 12, 2016, 10:27:37 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage, Anoop Anand Malik, sbhu...@teri.res.in
Greetings,

I used TASSEL 5 (updated version) to produce the .vcf file directly, instead of a HapMap file, by running the command:


./run_pipeline.bat -ProductionSNPCallerPluginV2 -db ../GBSv2.db -e ApeKI -i ../GBSv2TestData/GBS/Chr9_10-20000000/ -k ../GBSv2TestData/GBS/Pipeline_Testing_key.txt -kmerLength 64 -minPosQS 1 -o ../GBSv2TestData/GBS/tempDir/maizeTestGBSGenosMinQ1.vcf -endPlugin  



In TASSEL 3 the HapMap output gave us a specific sequence from which we could easily identify the SNPs for a particular sample, but I am unable to get a similar result with the updated TASSEL 5.

I am attaching a snapshot of the VCF file I got from the command above. Could you kindly explain how to analyze the results in this file?

Thanks and regards

Ananya Pathak

Lynn Carol Johnson

Dec 12, 2016, 11:15:14 AM
to tas...@googlegroups.com, Anoop Anand Malik, sbhu...@teri.res.in
Are you saying your VCF file has errors? Have you run ProductionSNPCallerPluginV2 requesting an .h5 file as output? Does that contain the data you expect?

--
You received this message because you are subscribed to the Google Groups "TASSEL - Trait Analysis by Association, Evolution and Linkage" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tassel+un...@googlegroups.com.
To post to this group, send email to tas...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tassel/f2a7c383-c844-42a8-98f2-62fbb7178938%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Anoop Anand Malik

Dec 12, 2016, 11:59:54 AM
to Lynn Carol Johnson, tas...@googlegroups.com, sbhushan
Thanks for your prompt reply, Lynn. No, I don't think there is any error in our VCF file. We also got the .h5 file, but we are unable to interpret the SNPs from the VCF file. Please explain how to identify SNPs in our VCF file.

We have also attached a screenshot of our VCF file; how do we identify the SNPs in it?

Thanks
Anoop

Lynn Carol Johnson

Dec 12, 2016, 1:56:57 PM
to Anoop Anand Malik, tas...@googlegroups.com, sbhushan
GBSv2 ProductionSNPCallerPluginV2 returns a list of chromosome/position pairs where a SNP was identified. This list is filtered based on the minimum quality score given in the plugin's minPosQS parameter (which defaults to 0). Your output example shows the list of SNPs with per-taxon information.
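As a minimal sketch of reading those per-taxon calls, the Python below walks the standard VCF 4.x columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then one column per taxon) and turns each taxon's GT field into bases. It assumes a tab-delimited file with GT present in FORMAT, as the VCF specification requires when genotypes are given; `iter_snps`, `allele_call` and the record used in the example (taxonA, taxonB, S1_234563) are hypothetical illustrations, not TASSEL code or data from the posted file.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class SnpRecord(NamedTuple):
    chrom: str
    pos: int                   # 1-based position, as in the VCF POS column
    snp_id: str
    ref: str
    alts: List[str]
    genotypes: Dict[str, str]  # taxon name -> raw GT string, e.g. "0/1"


def iter_snps(lines: Iterable[str]) -> Iterator[SnpRecord]:
    """Yield one SnpRecord per VCF data line."""
    taxa: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            taxa = fields[9:]  # taxon names follow the FORMAT column
            continue
        gt_index = fields[8].split(":").index("GT")
        genotypes = {
            taxon: sample.split(":")[gt_index]
            for taxon, sample in zip(taxa, fields[9:])
        }
        yield SnpRecord(fields[0], int(fields[1]), fields[2], fields[3],
                        fields[4].split(","), genotypes)


def allele_call(gt: str, ref: str, alts: List[str]) -> str:
    """Translate a GT such as '0/1' into bases ('A/G'); a missing '.' becomes 'N'."""
    alleles = [ref] + alts
    sep = "|" if "|" in gt else "/"
    return sep.join("N" if a == "." else alleles[int(a)] for a in gt.split(sep))


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.0",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ttaxonA\ttaxonB",
        "1\t234563\tS1_234563\tA\tG\t.\tPASS\t.\tGT:AD\t0/1:3,2\t./.:0,0",
    ]
    for snp in iter_snps(example):
        for taxon, gt in snp.genotypes.items():
            print(snp.snp_id, snp.chrom, snp.pos, taxon,
                  allele_call(gt, snp.ref, snp.alts))
```

For a real file, `iter_snps(open("maizeTestGBSGenosMinQ1.vcf"))` streams the records one line at a time, so the whole VCF never has to be held in memory.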

If you’re looking for a means to show the tags from your DB where a SNP occurred, you can run the GBSv2 pipeline’s SNPCutPosTagVerificationPlugin, giving it “snp” as the “type” parameter.  Please see the documentation for details on running this plugin against your database.
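Assuming TASSEL's usual S<chromosome>_<position> SNP naming (as in S1_234563 later in this thread), one way to get a -chr/-pos pair for SNPCutPosTagVerificationPlugin is to derive it from a SNP name in the VCF. The Python sketch below only assembles the argument list from the flags used in this thread (-db, -chr, -pos, -strand, -type, -outFile); the helper names are hypothetical, and the -strand value of 1 is simply copied from the posted command, not a recommendation.

```python
import re
import shlex
from typing import List, Tuple

# Assumed TASSEL-style SNP names: S<chromosome>_<position>, e.g. S1_234563.
SNP_NAME = re.compile(r"^S(.+)_(\d+)$")


def parse_snp_name(name: str) -> Tuple[str, int]:
    """Split 'S1_234563' into ('1', 234563)."""
    match = SNP_NAME.match(name)
    if match is None:
        raise ValueError(f"not an S<chr>_<pos> SNP name: {name!r}")
    return match.group(1), int(match.group(2))


def tag_verification_args(snp_name: str, db: str = "../GBSv2.db",
                          out_file: str = "../testout.txt",
                          strand: str = "1") -> List[str]:
    """Build the SNPCutPosTagVerificationPlugin command used in this thread."""
    chrom, pos = parse_snp_name(snp_name)
    return ["./run_pipeline.pl", "-fork1", "-SNPCutPosTagVerificationPlugin",
            "-db", db, "-chr", chrom, "-pos", str(pos),
            "-strand", strand,  # copied from the thread; check the plugin docs
            "-type", "snp", "-outFile", out_file,
            "-endPlugin", "-runfork1"]


if __name__ == "__main__":
    print(shlex.join(tag_verification_args("S1_234563")))
```

The returned list can be handed to `subprocess.run(args, check=True)` from the tassel-5-standalone directory; building a list rather than a single string avoids shell-quoting problems with paths.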



Thanks - Lynn

Ananya Pathak

Dec 14, 2016, 7:01:04 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage, anoopan...@gmail.com, sbhu...@teri.res.in
Hi Lynn

I tried running SNPCutPosTagVerificationPlugin. The input and output are as follows:


administrator@administrator-Inspiron-580s:/media/administrator/01D1ECC34D35E2D0/PTCMB/tassel-5-standalone$ ./run_pipeline.pl -fork1 -SNPCutPosTagVerificationPlugin -db ../GBSv2.db -chr 1 -pos 7323 -strand 1 -type snp -outFile ../testout.txt -endPlugin -runfork1
./lib/colt.jar:./lib/ahocorasick-0.2.4.jar:./lib/batik-awt-util.jar:./lib/batik-css.jar:./lib/batik-dom.jar:./lib/batik-ext.jar:./lib/batik-gui-util.jar:./lib/batik-gvt.jar:./lib/batik-parser.jar:./lib/batik-svg-dom.jar:./lib/batik-svggen.jar:./lib/batik-util.jar:./lib/batik-xml.jar:./lib/biojava-alignment-4.0.0.jar:./lib/biojava-core-4.0.0.jar:./lib/biojava-phylo-4.0.0.jar:./lib/cisd-jhdf5-batteries_included_lin_win_mac.jar:./lib/commons-codec-1.10.jar:./lib/commons-math3-3.4.1.jar:./lib/ejml-0.23.jar:./lib/forester.jar:./lib/geronimo-spec-activation-1.0.2-rc4.jar:./lib/guava-19.0.jar:./lib/htsjdk-1.138.jar:./lib/itextpdf-5.1.0.jar:./lib/javax.json-1.0.4.jar:./lib/jcommon-1.0.6.jar:./lib/jfreechart-1.0.3.jar:./lib/json-simple-1.1.1.jar:./lib/junit-4.10.jar:./lib/log4j-1.2.13.jar:./lib/mail-1.4.jar:./lib/poi-3.0.1-FINAL-20070705.jar:./lib/postgresql-9.4-1201.jdbc41.jar:./lib/slf4j-api-1.7.10.jar:./lib/slf4j-simple-1.7.10.jar:./lib/snappy-java-1.1.1.6.jar:./lib/sqlite-jdbc-3.8.5-pre1.jar:./lib/trove-3.0.3.jar:./lib/xercesImpl.jar:./lib/xml.jar:./lib/xmlParserAPIs.jar:./sTASSEL.jar
Memory Settings: -Xms512m -Xmx1536m
Tassel Pipeline Arguments: -fork1 -SNPCutPosTagVerificationPlugin -db ../GBSv2.db -chr 1 -pos 7323 -strand 1 -type snp -outFile ../testout.txt -endPlugin -runfork1
[main] INFO net.maizegenetics.tassel.TasselLogging - Tassel Version: 5.2.31  Date: October 20, 2016
[main] INFO net.maizegenetics.tassel.TasselLogging - Max Available Memory Reported by JVM: 1365 MB
[main] INFO net.maizegenetics.tassel.TasselLogging - Java Version: 1.8.0_111
[main] INFO net.maizegenetics.tassel.TasselLogging - OS: Linux
[main] INFO net.maizegenetics.tassel.TasselLogging - Number of Processors: 4
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -SNPCutPosTagVerificationPlugin, -db, ../GBSv2.db, -chr, 1, -pos, 7323, -strand, 1, -type, snp, -outFile, ../testout.txt, -endPlugin, -runfork1]
net.maizegenetics.analysis.gbs.v2.SNPCutPosTagVerificationPlugin
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.analysis.gbs.v2.SNPCutPosTagVerificationPlugin: time: Dec 14, 2016 17:19:36
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
SNPCutPosTagVerificationPlugin Parameters
db: ../GBSv2.db
chr: 1
pos: 7323
strand: 1
type: snp
outFile: ../testout.txt

size of all tags in tag table=332832
size of all tissues in tissue table=0
size of all tags in mappingApproach table=2
size of all taxa in taxa table=96
size of all positions in snpPosition table=56039
[pool-1-thread-1] ERROR net.maizegenetics.analysis.gbs.v2.UpdateSNPPositionQualityPlugin - SNPCutPosTagVerificationPlugin: caught error java.lang.NullPointerException
java.lang.NullPointerException
    at net.maizegenetics.dna.tag.TagDataSQLite.getAllelesTagTaxaDistForSNP(TagDataSQLite.java:807)
    at net.maizegenetics.analysis.gbs.v2.SNPCutPosTagVerificationPlugin.processData(SNPCutPosTagVerificationPlugin.java:96)
    at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:111)
    at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1750)
    at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.analysis.gbs.v2.SNPCutPosTagVerificationPlugin: time: Dec 14, 2016 17:19:38
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.gbs.v2.SNPCutPosTagVerificationPlugin: time: Dec 14, 2016 17:19:38: progress: 100%
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - net.maizegenetics.analysis.gbs.v2.SNPCutPosTagVerificationPlugin  Citation: Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. (2007) TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.


This is the main error message: [pool-1-thread-1] ERROR net.maizegenetics.analysis.gbs.v2.UpdateSNPPositionQualityPlugin - SNPCutPosTagVerificationPlugin: caught error java.lang.NullPointerException

Kindly help us fix the error above.


Thanks and regards

Ananya

Lynn Carol Johnson

Dec 14, 2016, 7:54:50 AM
to tas...@googlegroups.com, anoopan...@gmail.com, sbhu...@teri.res.in
The error you posted indicates that the position for which you’ve requested information is not found in the db. Have you tried running this with other positions? Output from SNPQualityProfilerPlugin will give you valid positions found in the db.
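Since the NullPointerException comes from asking about a position the db does not hold, it can help to check the -chr/-pos pair first. A minimal Python sketch, assuming the VCF from ProductionSNPCallerPluginV2 was built from the same database so that its CHROM/POS columns list positions the db knows about; the SNPQualityProfilerPlugin output remains the authoritative list, and `positions_in_vcf` and `check_position` are hypothetical helpers.

```python
from typing import Iterable, Set, Tuple


def positions_in_vcf(lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Collect (CHROM, POS) pairs from the data lines of a VCF."""
    positions: Set[Tuple[str, int]] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and blank lines
        chrom, pos = line.split("\t", 2)[:2]
        positions.add((chrom, int(pos)))
    return positions


def check_position(positions: Set[Tuple[str, int]], chrom: str, pos: int) -> None:
    """Raise LookupError, listing the closest known positions, if (chrom, pos) is absent."""
    if (chrom, pos) in positions:
        return
    closest = sorted((p for c, p in positions if c == chrom),
                     key=lambda p: abs(p - pos))[:5]
    raise LookupError(f"chr {chrom} pos {pos} not in VCF; closest on chr {chrom}: {closest}")
```

For example, `check_position(positions_in_vcf(open("maizeTestGBSGenosMinQ1.vcf")), "1", 7323)` would either pass silently or suggest nearby positions to try with the plugin instead.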

Anoop Anand Malik

May 29, 2018, 9:19:52 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Dear Lynn,

Thanks for all your help in standardizing the TASSEL Linux-based pipeline protocol in our lab. Now that we have a complete VCF file for 96 samples, I would like to clarify some doubts regarding the filtered SNPs.

(1) How can we retrieve the 100 bp raw sequence around a particular SNP from the reads? We want to validate it in the wet lab using SNP primers.

(2) How can we get the actual number of reads sequenced for one particular SNP? For example, if SNP S1_234563 [a/g] is present in the VCF file, how many sequenced reads covered it?
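For question (2), a VCF usually answers this through the standard AD (allelic depths) FORMAT field, which gives, per taxon, the read counts supporting REF and each ALT allele. The Python sketch below assumes your TASSEL VCF carries AD in its FORMAT column (it raises rather than guessing if not) and matches the SNP either by the ID column or by an assumed S<chromosome>_<position> name; `allele_depths` and `total_reads` are hypothetical helpers, and the counts are only as reported by the SNP caller.

```python
from typing import Dict, Iterable, List


def allele_depths(lines: Iterable[str], snp_name: str) -> Dict[str, List[int]]:
    """Return per-taxon allele depths (AD) for one SNP, matched by ID or by S<chr>_<pos>."""
    taxa: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            taxa = fields[9:]  # taxon names follow the FORMAT column
            continue
        if snp_name not in (fields[2], f"S{fields[0]}_{fields[1]}"):
            continue
        fmt = fields[8].split(":")
        if "AD" not in fmt:
            raise ValueError("FORMAT column has no AD (allelic depth) field")
        ad = fmt.index("AD")
        depths: Dict[str, List[int]] = {}
        for taxon, sample in zip(taxa, fields[9:]):
            parts = sample.split(":")
            raw = parts[ad] if ad < len(parts) else "."
            # "." means no depth reported for this taxon
            depths[taxon] = [] if raw == "." else [
                0 if x == "." else int(x) for x in raw.split(",")]
        return depths
    raise KeyError(snp_name)


def total_reads(depths: Dict[str, List[int]]) -> List[int]:
    """Sum allele depths over all taxa: [reads for REF, reads for ALT1, ...]."""
    totals: List[int] = []
    for counts in depths.values():
        for i, count in enumerate(counts):
            if i == len(totals):
                totals.append(0)
            totals[i] += count
    return totals
```

`sum(total_reads(allele_depths(open("maizeTestGBSGenosMinQ1.vcf"), "S1_234563")))` would then give the total reads at that SNP across all 96 samples, while the per-taxon dictionary shows how those reads split between the a and g alleles in each sample.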
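For question (1), one option is to take the flanking sequence from the reference genome rather than from the reads themselves (the tags behind a SNP are what SNPCutPosTagVerificationPlugin reports). As a hedged sketch, assuming you have the reference FASTA the reads were aligned to and that its sequence names match the VCF CHROM values (for example "1" rather than "chr1"), the Python below returns the bases on either side of a SNP with the site written in the [a/g] notation used above; `read_fasta` and `flanking_sequence` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Load a FASTA file into {sequence name: sequence}; names stop at the first space."""
    seqs: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def flanking_sequence(seqs: Dict[str, str], chrom: str, pos: int, ref: str,
                      alt: str, flank: int = 50) -> str:
    """Return `flank` bases either side of a 1-based VCF position, SNP as [REF/ALT]."""
    seq = seqs[chrom]
    i = pos - 1  # VCF POS is 1-based; Python strings are 0-based
    if seq[i] != ref.upper():
        raise ValueError(f"reference has {seq[i]} at {chrom}:{pos}, VCF REF is {ref}")
    left = seq[max(0, i - flank):i]
    right = seq[i + 1:i + 1 + flank]
    return f"{left}[{ref.upper()}/{alt.upper()}]{right}"
```

With the default flank of 50 the result spans 101 bp, 50 bases on each side of the SNP, and the REF check guards against a mismatched reference or chromosome naming. Loading a whole genome this way is fine for a handful of SNPs; for a large genome an indexed lookup such as `samtools faidx` is the more practical route.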


Looking forward to your valuable suggestions.

Thanks and regards

Anoop Anand Malik
