Any tips on speeding up output from HDF5?

P St. Amand

Jan 16, 2026, 4:09:57 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
For large projects, outputting a VCF or Hapmap from the HDF5 is quite slow. Does anyone know of ways to speed up this process? Below are examples of how I am creating VCFs and Hapmaps from HDF5 files. Any tips would be greatly appreciated!


/usr/local/bin/tassel-5-standalone/run_pipeline.pl -Xms80g -Xmx190g -fork1 -h5 HDF5/$Study\_productioHapMap_noKO.h5 -filterAlign -filterAlignMinFreq $MAF -filterAlignRemMinor -export ./hapmap/$Study.vcf -exportType VCF -runfork1 > ./logs/VCFFromHDF5.log

/usr/local/bin/tassel-5-standalone/run_pipeline.pl -Xms80g -Xmx190g -h5 HDF5/$Study\_productioHapMap_noKO.h5 -filterAlign -filterAlignMinFreq $MAF -filterAlignRemMinor -export hapmap/$Study.hmp.txt -exportType Hapmap > ./logs/HapmapFromHDF5.log

Brandon Monier

Jan 16, 2026, 4:19:05 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
A couple of things to try:
  • omit allele depth (e.g., -noDepth) if you don't need it,
  • split and export by chromosome (e.g., run_pipeline.pl -Xmx190g -SplitHDF5ByChromosomePlugin -i HDF5/$Study.h5); this may avoid cache issues that you could experience when exporting a whole genome's worth of variants.
Best,
Brandon M.

P St. Amand

Jan 20, 2026, 10:31:57 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Brandon,

I need the depth info, so that is out for me. I did time splitting by chromosome and outputting a VCF. However, once you include the time to split the HDF5 file, output each of the VCFs, and merge the VCFs, you don't save a significant amount of time: I saved less than 2% of the original 11 hours it took to output a VCF from the non-split HDF5 file.

Thanks for the idea though!
Paul
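
For readers who want to reproduce the merge step mentioned above, here is a minimal sketch in plain shell. It is not from the thread and not part of TASSEL: the function name merge_vcfs is hypothetical, and it assumes every per-chromosome VCF carries the same header lines and sample columns and that the files are passed in the desired chromosome order. It keeps the header from the first file only and appends the data records from all files.

```shell
# Hypothetical helper (not a TASSEL command): concatenate per-chromosome
# VCFs into one VCF on stdout.
# Assumptions: all inputs share identical header lines and sample columns,
# and the arguments are already in the desired chromosome order.
merge_vcfs() {
    if [ "$#" -lt 1 ]; then
        echo "usage: merge_vcfs first.vcf [more.vcf ...]" >&2
        return 1
    fi
    # NR == FNR holds only while awk is reading the first file, so header
    # lines (starting with '#') are printed from that file alone; data
    # records (not starting with '#') are printed from every file.
    awk 'NR == FNR && /^#/ { print } !/^#/ { print }' "$@"
}
```

It could be called as merge_vcfs chr1.vcf chr2.vcf > merged.vcf, where the file names are placeholders because the thread does not say what the split files are called. Listing the files explicitly avoids a shell glob sorting chr10 ahead of chr2. A dedicated tool such as bcftools concat is a more robust option when the headers differ between files.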
