Any tips on speeding up output from HDF5?

P St. Amand

Jan 16, 2026, 4:09:57 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
For large projects, outputting a VCF or Hapmap from the HDF5 is quite slow. Does anyone know of ways to speed up this process? Below are examples of how I am creating VCFs and Hapmaps from HDF5 files. Any tips would be greatly appreciated!


/usr/local/bin/tassel-5-standalone/run_pipeline.pl -Xms80g -Xmx190g -fork1 -h5 HDF5/$Study\_productioHapMap_noKO.h5 -filterAlign -filterAlignMinFreq $MAF -filterAlignRemMinor -export ./hapmap/$Study.vcf -exportType VCF -runfork1 > ./logs/VCFFromHDF5.log

/usr/local/bin/tassel-5-standalone/run_pipeline.pl -Xms80g -Xmx190g -h5 HDF5/$Study\_productioHapMap_noKO.h5 -filterAlign -filterAlignMinFreq $MAF -filterAlignRemMinor -export hapmap/$Study.hmp.txt -exportType Hapmap > ./logs/HapmapFromHDF5.log

Brandon Monier

Jan 16, 2026, 4:19:05 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
A couple of things to try:
  • omit allele depth (e.g., -noDepth) if you don't need it,
  • split the HDF5 file and export by chromosome (e.g., run_pipeline.pl -Xmx190g -SplitHDF5ByChromosomePlugin -i HDF5/$Study.h5). This may avoid cache issues that you could experience when exporting a whole genome's worth of variants.
Best,
Brandon M.
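
The per-chromosome suggestion could be followed by a loop like the one below, which reuses the VCF export flags from the original post once per split file. This is only a sketch under assumptions: the thread does not say where SplitHDF5ByChromosomePlugin writes its output or how it names the files, so SPLIT_DIR and the *.h5 pattern are hypothetical, the MAF default is a placeholder, and DRY_RUN is a helper added here so the commands can be inspected before anything runs.

```shell
# Sketch: export one VCF per chromosome-level HDF5 file.
# ASSUMPTIONS (not confirmed in the thread): the split step has already been
# run and its per-chromosome files are in SPLIT_DIR with an .h5 extension.
TASSEL="${TASSEL:-/usr/local/bin/tassel-5-standalone/run_pipeline.pl}"
SPLIT_DIR="${SPLIT_DIR:-HDF5/by_chr}"   # hypothetical location of split files
OUT_DIR="${OUT_DIR:-hapmap}"
LOG_DIR="${LOG_DIR:-logs}"
MAF="${MAF:-0.05}"                      # placeholder; use your own threshold
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print the commands

export_chromosomes() {
  for h5 in "$SPLIT_DIR"/*.h5; do
    [ -e "$h5" ] || continue            # skip if the pattern matched nothing
    base=$(basename "$h5" .h5)
    # Same flags as the whole-genome VCF export in the original post.
    set -- "$TASSEL" -Xms80g -Xmx190g -fork1 -h5 "$h5" \
      -filterAlign -filterAlignMinFreq "$MAF" -filterAlignRemMinor \
      -export "$OUT_DIR/$base.vcf" -exportType VCF -runfork1
    if [ "$DRY_RUN" = "1" ]; then
      printf '%s\n' "$*"
    else
      mkdir -p "$OUT_DIR" "$LOG_DIR"
      "$@" > "$LOG_DIR/VCFFromHDF5_$base.log"
    fi
  done
}

export_chromosomes
```

With DRY_RUN=1 the loop prints one command per file; setting DRY_RUN=0 runs the exports one after another and writes a separate log for each chromosome.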