Different vcf.gz sizes between different runs

Sajal Sthapit

Feb 1, 2023, 8:41:00 AM
to STITCH imputation
Hi Robbie,

I was running STITCH with different combinations of cores and RAM per core (256 Gb total in all cases) on 1436 samples and 200,000 variants to find a good optimum for my use case, in terms of both run duration and job wait time in the SLURM queue. Here is a summary table of the results. One thing that was unexpected was the variation in the size of the vcf.gz output files (510-522 Mb). Is this expected because some dosage values and genotype calls are going to differ between runs of STITCH? Thank you for your help.

Cores   RAM/core (Gb)   Hours   vcf.gz size (Mb)   Total RAM used (Gb)
    1             256    5.74                522                    66
    2             128    3.91                515                   100
    4              64    2.57                521                   119
    8              32    1.90                521                   122
   16              16    1.42                517                   142
   32               8    1.30                516                   236
   64               4      NA                 NA                   256
  128               2    2.22                510                   187
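
One way to check directly whether genotype calls differ between runs would be to compare two of the output files site by site. The following is only a minimal sketch and not part of STITCH: it assumes both vcf.gz files list the same sites and samples in the same order and that the FORMAT column contains a GT field. The function names `genotype_rows` and `count_discordant` are hypothetical.

```python
import gzip


def normalise(gt):
    """Treat 0/1, 1/0 and 0|1 as the same call by ignoring phase and allele order."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def genotype_rows(path):
    """Yield (site, genotypes) for each record of a gzipped VCF, one record at a time."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
            site = (fields[0], fields[1], fields[3], fields[4])  # CHROM, POS, REF, ALT
            yield site, [normalise(s.split(":")[gt_index]) for s in fields[9:]]


def count_discordant(path_a, path_b):
    """Return (differing, total) genotype calls between two runs.

    Assumes both files contain the same sites and samples in the same order.
    """
    differing = total = 0
    for (site_a, gts_a), (site_b, gts_b) in zip(genotype_rows(path_a), genotype_rows(path_b)):
        if site_a != site_b:
            raise ValueError(f"site mismatch: {site_a} vs {site_b}")
        for gt_a, gt_b in zip(gts_a, gts_b):
            total += 1
            differing += gt_a != gt_b
    return differing, total
```

Calling `count_discordant("run_1core.vcf.gz", "run_2core.vcf.gz")` (hypothetical file names) would return the number of differing calls and the number of calls compared. This looks only at GT; other per-sample fields such as dosages could also differ between runs and would affect the compressed size as well.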

Robbie Davies

Feb 1, 2023, 10:07:05 AM
to Sajal Sthapit, STITCH imputation
Hi,

I think that's a plausible idea. The variation (SD ≈ 5 Mb) is about 1% of the file size, so that seems about right.
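
As a rough check of that figure, the spread can be recomputed from the sizes in the table. This is just a sketch using the seven runs that reported a size (the 64-core run is NA), with the sample standard deviation from Python's standard library.

```python
import statistics

# vcf.gz sizes (Mb) copied from the table; the 64-core run reported NA and is omitted.
sizes_mb = [522, 515, 521, 521, 517, 516, 510]

mean_mb = statistics.mean(sizes_mb)
sd_mb = statistics.stdev(sizes_mb)   # sample standard deviation (n - 1 denominator)
cv_percent = 100 * sd_mb / mean_mb   # spread relative to the mean file size

print(f"mean = {mean_mb:.1f} Mb, SD = {sd_mb:.1f} Mb, CV = {cv_percent:.2f}%")
```

This gives an SD of about 4.3 Mb, or roughly 0.8% of the mean file size, in line with the estimate above.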

Note that some of the memory is shared across cores and some is core-specific, so total RAM use goes up as nCores increases.

Best,
Robbie
