Question about discovery step - Does it really take this long or am I screwing something up here?


Zachary Winn

Oct 26, 2022, 2:15:52 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage

Hi,

I am very new to bash scripting and running pipelines. I have recently taken over the GBS pipeline of the CSU wheat breeding program. My predecessor left me a standard operating procedure that uses TASSEL. I am currently doing a trial discovery run with a small number of lines (n < 400). I have successfully gotten through the following pipeline steps: GBSSeqToTagDBPlugin, TagExportToFastqPlugin, bwa, and SAMToGBSdbPlugin. However, when I get to DiscoverySNPCallerPluginV2, the process slows to a crawl. The program seems to be running properly, but it is taking days to get through a couple of chromosomes, and my predecessor told me this usually takes a couple of hours at most. Attached are my logs for each step. Hopefully I am just doing something wrong, because if it takes this long to run discovery on 400 lines, I would hate to think what would happen if we tried 40,000. I have also attached a screenshot of the lscpu output for the server I am working on.

Here is an example of the code I have been running to do this.

############################################

###Here is an example of discovery written in plain code###

############################################

/mnt/wheatdrive/smallgrainslab/gbs_pipeline/dependencies/tassel-5-standalone/run_pipeline.pl -Xmx450G -fork1 -GBSSeqToTagDBPlugin -e PstI-MspI -i /mnt/wheatdrive/smallgrainslab/gbs_pipeline/test_area -db /mnt/wheatdrive/smallgrainslab/gbs_pipeline/results/discovery_files/disco_2022/avery_x_CO11D1397_discovery.db -k /home/zjwinn/key_files/2022_avery_x_CO11D1397_keyfile.tsv -kmerLength 65 -mnQS 0 -mxKmerNum 5000000 -deleteOldData true -endPlugin

 /mnt/wheatdrive/smallgrainslab/gbs_pipeline/dependencies/tassel-5-standalone/run_pipeline.pl -Xmx450G -fork1 -TagExportToFastqPlugin -db /mnt/wheatdrive/smallgrainslab/gbs_pipeline/results/discovery_files/disco_2022/avery_x_CO11D1397_discovery.db -o /mnt/wheatdrive/smallgrainslab/gbs_pipeline/results/discovery_files/disco_2022/avery_x_CO11D1397_discovery_gbs_tags.fa.gz -endPlugin -runfork1

 bwa mem -t 63 /mnt/wheatdrive/smallgrainslab/gbs_pipeline/ref_genos/wheat/refseqv2.0/iwgsc_refseqv2.0_all_chromosomes.fa /mnt/wheatdrive/smallgrainslab/gbs_pipeline/results/discovery_files/disco_2022/avery_x_CO11D1397_discovery_gbs_tags.fa.gz > /mnt/wheatdrive/smallgrainslab/gbs_pipeline/results/discovery_files/disco_2022/avery_x_CO11D1397_discovery_gbs_tags.sam

 /mnt/wheatdrive/smallgrainslab/gbs_pipeline/dependencies/tassel-5-standalone/run_pipeline.pl -Xmx450G -fork1 -SAMToGBSdbPlugin -i /mnt/wheatdrive/smallgrainslab/gbs_pipeline/results/discovery_files/disco_2022/avery_x_CO11D1397_discovery_gbs_tags.sam -db /mnt/wheatdrive/smallgrainslab/gbs_pipeline/results/discovery_files/disco_2022/avery_x_CO11D1397_discovery.db -aLen 0 -aProp 0.0 -endPlugin -runfork1

 /mnt/wheatdrive/smallgrainslab/gbs_pipeline/dependencies/tassel-5-standalone/run_pipeline.pl -Xmx450G -fork1 -DiscoverySNPCallerPluginV2 -db /mnt/wheatdrive/smallgrainslab/gbs_pipeline/results/discovery_files/disco_2022/avery_x_CO11D1397_discovery.db -mnMAF 0.01 -mnLCov 0.1 -deleteOldData true -ref /mnt/wheatdrive/smallgrainslab/gbs_pipeline/ref_genos/wheat/refseqv2.0/iwgsc_refseqv2.0_all_chromosomes.fa -endPlugin -runfork1

 

Tell me if you need anything else.

 

Thanks,

Zach

 

Zachary Winn (he/his)

PhD Crop Science

Postdoctoral Fellow

Colorado State University

Fort Collins, CO

Phone: [redacted]

 

SAMToGBSdbPlugin.log
DiscoverySNPCallerPluginV2.log
GBSSeqToTagDBPlugin.log
Screenshot_20221025_035350.png
TagExportToFastqPlugin.log

Lynn Carol Johnson

Oct 27, 2022, 8:48:35 AM
to tas...@googlegroups.com

Hi Zach –

 

We updated the sqlite jar recently, and that could be the problem.  We’ve had issues with all the newer versions of this jar causing large GBS queries to hang.

I have a jar that should work, but I’m unable to send it to you as the server blocks the file.  I’ll try sending this to you in a direct message.

 

Lynn

--
You received this message because you are subscribed to the Google Groups "TASSEL - Trait Analysis by Association, Evolution and Linkage" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tassel+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tassel/d6551b01-cbf7-4979-aa6b-12a35cb39576n%40googlegroups.com.


Zachary Winn

Oct 27, 2022, 6:10:46 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage

To people who have found this message and have the same problem,

I worked with Lynn to fix this problem. The issue was that the sqlite jar file in the lib folder of tassel-5-standalone needed to be replaced with a replacement .jar file. To fix this issue, request that a .jar file be sent to you and use it to replace the old .jar file in the lib folder, found at "./tassel-5-standalone/lib/sqlite-jdbc-3.39.2.1.jar". This will allow the discovery to proceed correctly.

Thanks TASSEL team!

-Zach 

Lynn Carol Johnson

Oct 28, 2022, 9:02:19 AM
to tas...@googlegroups.com

The sqlite jar that fixes this problem can be found at:
https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.8.5-pre1/

This is an old jar, but it is the only sqlite jar that works for this issue.  We have seen this issue in the past, so we previously kept the old sqlite jar.  Recently the repository was updated to the new jar to accommodate people who are using Apple computers with the M1 chip.
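
For readers applying this workaround, here is a minimal sketch of the swap. It assumes the jar in that Maven folder follows the usual naming (sqlite-jdbc-3.8.5-pre1.jar) and that TASSEL lives in ./tassel-5-standalone. The helper name swap_sqlite_jar and the backup folder name are made up for illustration and are not part of TASSEL. Later replies in this thread recommend updating TASSEL itself rather than swapping the jar.

```bash
#!/usr/bin/env bash
# Sketch: move the bundled sqlite-jdbc jar(s) out of lib and copy in a replacement.
# swap_sqlite_jar is a hypothetical helper, not part of TASSEL.
swap_sqlite_jar() {
    local lib_dir="${1%/}" new_jar="$2"
    # Backup folder sits next to lib (assumed name), so backed-up jars are not left inside lib.
    local backup_dir="${lib_dir}_sqlite_backup"

    [ -d "$lib_dir" ] || { echo "lib folder not found: $lib_dir" >&2; return 1; }
    [ -f "$new_jar" ] || { echo "replacement jar not found: $new_jar" >&2; return 1; }

    mkdir -p "$backup_dir"
    # Move every existing sqlite-jdbc jar so the old version is not left alongside the replacement.
    local old
    for old in "$lib_dir"/sqlite-jdbc-*.jar; do
        [ -e "$old" ] || continue   # the glob matched nothing
        mv "$old" "$backup_dir/"
    done
    cp "$new_jar" "$lib_dir/"
}

# Example usage (assumed file name and paths; keep the downloaded jar outside lib):
# curl -fLO https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.8.5-pre1/sqlite-jdbc-3.8.5-pre1.jar
# swap_sqlite_jar ./tassel-5-standalone/lib ./sqlite-jdbc-3.8.5-pre1.jar
```

Keeping the old jar in a backup folder makes it easy to restore if the replacement causes other problems.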

 



Mao

Jul 26, 2023, 3:15:04 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Lynn,
It seems I may be having the same issue. Which jar file should I download and use to replace the old one?
Could you directly send it to my email?
Thank you
Mao

Lynn Carol Johnson

Jul 26, 2023, 4:10:07 PM
to tas...@googlegroups.com

Mao –

 

What version of TASSEL are you running?  We put in a fix that we hoped would take care of this issue.

 

Lynn

Mao

Jul 26, 2023, 7:21:23 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Lynn,
I'm using "tassel-5-standalone".
Does that already include the jar file fix?

Mao

Mao

Jul 26, 2023, 7:25:58 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Lynn,
I see that inside the /lib folder there is "sqlite-jdbc-3.39.2.1.jar", which is the file Zachary mentioned.
I am using a copy of TASSEL that a former lab member downloaded previously. So I wonder if this is what's causing the problem, where the
      DiscoverySNPCallerPluginV2 step seems to get stuck at "Calling getCutPosForStrand FORWARD strands..."
If so, which jar file should I download to replace the one inside /lib?
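
One way to tell a stalled step from a slow one is to check whether the log is still being written. This is a minimal sketch, assuming TASSEL's output was redirected to a log file and that GNU stat is available (Linux). The helper name log_status is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: show the last non-empty log line and how long ago the log was last written.
# log_status is a hypothetical helper; pass the file you redirected TASSEL output to.
log_status() {
    local log="$1"
    [ -f "$log" ] || { echo "log not found: $log" >&2; return 1; }

    local last now mtime
    last=$(grep -v '^[[:space:]]*$' "$log" | tail -n 1)   # skip blank lines
    now=$(date +%s)
    mtime=$(stat -c %Y "$log")                            # GNU stat: modification time in epoch seconds
    echo "last line: $last"
    echo "seconds since last write: $(( now - mtime ))"
}

# Example usage (assumed log name):
# log_status DiscoverySNPCallerPluginV2.log
```

If the last line stays at the same message and the seconds count keeps growing across repeated checks, the step is likely hung rather than progressing.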

Thanks

Mao

Terry Casstevens

Jul 26, 2023, 7:35:23 PM
to tas...@googlegroups.com
Please update to the latest TASSEL version.

You can use the command "git pull" in the top-level directory.

Muhammad Atif Wahid

Jul 27, 2023, 8:56:35 AM
to tas...@googlegroups.com
I use Tassel 2.1 for SSR Markers

Mao

Jul 27, 2023, 2:16:51 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Terry et al.
Thanks. 
After updating to the newest TASSEL version, the discovery step is moving forward smoothly and quickly now.

Cheers
Mao

Muhammad Atif Wahid

Jul 27, 2023, 3:03:49 PM
to tas...@googlegroups.com
Tassel 5 analyze SSR Data for association analysis.

Lynn Carol Johnson

Jul 28, 2023, 7:22:24 AM
to tas...@googlegroups.com

Hi Mao –

 

As Terry suggested, do a “git pull” in your tassel-5-standalone folder to get the latest version.  Note this is not a change to the sqlite jar.  The change was to the SQL query in TASSEL that was slow.  Let us know if you see better results once you’ve updated TASSEL.  If you send us the log file we can verify the TASSEL load you are running.
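
The update step can be sketched as follows, assuming tassel-5-standalone was obtained with git clone (a copy downloaded as an archive has no git history to pull). The helper name update_tassel is hypothetical, and --ff-only is a cautious choice made here, not something the TASSEL team specified.

```bash
#!/usr/bin/env bash
# Sketch: update a git-cloned TASSEL folder and report the commit now checked out.
# update_tassel is a hypothetical helper, not part of TASSEL.
update_tassel() {
    local dir="$1"
    # Refuse to continue if the folder is not a git checkout.
    git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1 \
        || { echo "not a git checkout: $dir" >&2; return 1; }
    # --ff-only stops instead of merging if local changes have diverged.
    git -C "$dir" pull --ff-only || return 1
    git -C "$dir" log -1 --format='now at %h (%cd)'
}

# Example usage (assumed path):
# update_tassel ./tassel-5-standalone
```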

 

Lynn

Lynn Carol Johnson

Jul 28, 2023, 7:23:12 AM
to tas...@googlegroups.com

Hi Mao –

 

I didn’t see this message before responding to the previous one.  I’m glad to hear the discovery step is working better now.

 

Thanks - Lynn
