Hostname: <redacted>
Working directory: <redacted>
Start time: Wed Jun 29 21:09:28 2022
Random number seed: 1656562168
385624 MiB RAM detected; reserving 30000 MiB for main workspace.
Using 1 compute thread.
--pmerge-list: 22 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to <path_to_file>/merged_autosome_bfile/ukb.autosomes.renamed.merged.20220627-merge.psam
--pmerge-list: 22 .pvar files scanned.
Concatenation job detected.
Concatenating... 93095623/93095623 variants complete.
Results written to
<path_to_file>/merged_autosome_bfile/ukb.autosomes.renamed.merged.20220627-merge.pgen
+
<path_to_file>/merged_autosome_bfile/ukb.autosomes.renamed.merged.20220627-merge.pvar
.
487409 samples (264247 females, 222956 males, 206 ambiguous; 487409 founders)
loaded from
<path_to_file>/merged_autosome_bfile/ukb.autosomes.renamed.merged.20220627-merge.psam.
Error: Line 1 of
<path_to_file>/merged_autosome_bfile/ukb.autosomes.renamed.merged.20220627-merge.pvar
has fewer tokens than expected.
End time: Thu Jun 30 20:28:07 2022
________________________
To get to this point, I used the following code to rename the variants and convert bgen-formatted UKB genetic data to plink2 binary pgen files, adapted from here: https://groups.google.com/g/plink2-users/c/i942_CQaBc4/m/TQM-b8cKEQAJ
________________________
plink2 \
--bgen $scratchpath/ukb22828_c${c}_b0_v3.bgen ref-first \
--sample $scratchpath/ukb22828_b0_v3_s487203.sample \
--make-pgen \
--memory 10000 \
--threads 1 \
--new-id-max-allele-len 100 missing \
--set-all-var-ids @:#\$r,\$a \
--out $scratchpath/ukb.renamed.chr${c}.${outdate}
________________________
I checked the headers of individual chromosome files to ensure that they were more or less in line with the description that you wrote up here: https://github.com/chrchang/plink-ng/blob/master/pgen_spec/pgen_spec.pdf
psam file header (missing paternal & maternal IDs, phenotype information):
#FID IID SEX
pvar file header:
#CHROM POS ID REF ALT
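
A header check like this could be scripted roughly as follows. This is only a sketch: `header_line` and `summarize_headers` are hypothetical helper names, and it assumes the column-header line is the first line that does not start with `##`.

```shell
# Hypothetical helper: print the column-header line of a .psam or .pvar file,
# taken here to be the first line that does not start with '##'.
header_line() {
    awk '!/^##/ { print; exit }' "$1"
}

# Hypothetical helper: tally the distinct header lines across several files.
# Output is one row per distinct header, prefixed by how many files share it.
summarize_headers() {
    for f in "$@"; do
        header_line "$f"
    done | sort | uniq -c
}
```

For example, `summarize_headers $pgenfiles/ukb.renamed.chr*.${date}.pvar` would be expected to print a single row if all per-chromosome headers agree.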
-----------
I then submitted a merge job, where each line of the file list gives one chromosome's files in the order .pgen .pvar .psam:
________________________
for chr in `seq 1 1 22`
do
echo "$pgenfiles/ukb.renamed.chr${chr}.${date}.pgen $pgenfiles/ukb.renamed.chr${chr}.${date}.pvar $pgenfiles/ukb.renamed.chr${chr}.${date}.psam" >> $autofile
done
# Merge autosomes list using plink2alpha3.3 (downloaded into home directory)
<path-to-files>/plink2_v2.00_3.3/plink2 \
--pmerge-list $autofile \
--make-pgen \
--merge-max-allele-ct 2 \
--threads 1 \
--memory 30000 \
--out $outpath/ukb.autosomes.renamed.merged.${outdate}
_______________________
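
Before launching a merge like this, the list file itself can be sanity-checked. The sketch below assumes each line of the list holds three whitespace-separated paths with no spaces inside them; `check_merge_list` is a hypothetical helper name, not a plink2 feature.

```shell
# Hypothetical helper: read a merge list with three paths per line
# (.pgen .pvar .psam) and report any file that is missing or zero-length.
# Returns 1 if at least one such file was found, 0 otherwise.
check_merge_list() {
    bad=0
    while read -r pgen pvar psam; do
        for f in "$pgen" "$pvar" "$psam"; do
            if [ ! -s "$f" ]; then
                echo "missing or empty: $f"
                bad=1
            fi
        done
    done < "$1"
    return "$bad"
}
```

Running `check_merge_list "$autofile"` ahead of the merge would flag any per-chromosome file that was never written or was left empty. It does not detect files that are non-empty but truncated.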
The output from the merge job also appears to have the same format for the pvar headers, and each individual pvar chromosome file appears to have the correct number of columns as well. We've had some storage issues on the cluster that have led to some incomplete files, but I believe those issues have since been fixed. Is there a mistake in the plink2 command I submitted that is causing this error?
I also tried the merge on a truncated set of chromosomes (1-3) to see whether the error replicates, but that job ran successfully (log file below):
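
Since incomplete files are a possibility, one way to double-check the column counts is to tally tokens per line in each .pvar, including the `-merge.pvar` named in the error message. A minimal sketch, assuming whitespace-delimited columns with no spaces inside fields; `pvar_token_counts` and `pvar_first_line` are hypothetical helper names:

```shell
# Hypothetical helper: report how many lines of a .pvar have each token count.
# Lines starting with '##' are skipped; the '#CHROM' header line is counted
# like a data line. Output rows are "<token count> <number of lines>".
pvar_token_counts() {
    awk '!/^##/ { counts[NF]++ }
         END { for (n in counts) print n, counts[n] }' "$1" | sort -n
}

# Hypothetical helper: print a file's size in bytes followed by its first
# line, which quickly exposes an empty file or a damaged first line.
pvar_first_line() {
    printf 'bytes: %s\n' "$(wc -c < "$1" | tr -d ' ')"
    head -n 1 "$1"
}
```

A complete five-column file should yield a single row starting with `5`; any additional row, or a `bytes: 0` report, would point to a truncated or empty file.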
_______________________
PLINK v2.00a3.3LM 64-bit Intel (3 Jun 2022)
Options in effect:
--make-pgen
--memory 30000
--merge-max-allele-ct 2
--out <path_to_file>/merged_autosome_bfile/ukb.autosomes.renamed.merged.trunc.20220630
--pmerge-list <path_to_file>/merged_autosome_bfile/autosomefiles_pgen_trunc.list
--threads 1
Hostname: <redacted>
Working directory: <path_to_file>/Genetic_Data_Cleaning
Start time: Thu Jun 30 15:46:15 2022
Random number seed: 1656629175
1031989 MiB RAM detected; reserving 30000 MiB for main workspace.
Using 1 compute thread.
--pmerge-list: 3 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to
<path_to_file>/merged_autosome_bfile/ukb.autosomes.renamed.merged.trunc.20220630-merge.psam
.
--pmerge-list: 3 .pvar files scanned.
Concatenation job detected.
Concatenating... 22228534/22228534 variants complete.
Results written to
<path_to_file>/merged_autosome_bfile/ukb.autosomes.renamed.merged.trunc.20220630-merge.pgen
+
<path_to_file>/merged_autosome_bfile/ukb.autosomes.renamed.merged.trunc.20220630-merge.pvar
.
487409 samples (264247 females, 222956 males, 206 ambiguous; 487409 founders)
loaded from
<path_to_file>/merged_autosome_bfile/ukb.autosomes.renamed.merged.trunc.20220630-merge.psam.
22228534 variants loaded from
<path_to_file>/merged_autosome_bfile/ukb.autosomes.renamed.merged.trunc.20220630-merge.pvar.
Note: No phenotype data present.
Writing
<path_to_file>/merged_autosome_bfile/ukb.autosomes.renamed.merged.trunc.20220630.psam
... done.
Writing
<path_to_file>/merged_autosome_bfile/ukb.autosomes.renamed.merged.trunc.20220630.pvar
... done.
Writing
<path_to_file>/merged_autosome_bfile/ukb.autosomes.renamed.merged.trunc.20220630.pgen
... done.
_______________________
I'm currently running a version of the merge script that includes flags for specifying irregular input files (shown below); I'll post an update on whether that job succeeds once it finishes running. Any help would be appreciated! Feel free to also point me to specific documentation - I've read over the plink2 pages but may have missed something. Thanks for your help!
_______________________
<path-to-software>/plink2_v2.00_3.3/plink2 \
--no-parents \
--no-pheno \
--pmerge-list $autofile \
--make-pgen \
--merge-max-allele-ct 2 \
--threads 1 \
--memory 30000 \
--out $outpath/ukb.autosomes.renamed.merged.${outdate}