Hi Chris,
I want to use --ref-allele to create a .pgen file with a new set of reference alleles, and then read the data with pgenlib in Python. I generated the .pgen file directly with
plink2 \
  --pfile /n/scratch3/users/j/jz286/imp_geno/ukb_imp_chr${CHROM}_v3 \
  --ref-allele ${REF_FILE} 10 3 \
  --make-pgen \
  --out ${OUTPUT_PATH}/ukb_imp_chr${CHROM}_v3_aa
and then read the .pgen file in Python with
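import numpy as np
import pgenlib as pg

# pgen_file and n_sample are defined earlier in the notebook
with pg.PgenReader(bytes(pgen_file, encoding="utf8")) as reader:
    mat_X = np.empty([10, n_sample], np.int8)
    reader.read_range(0, 10, mat_X)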
I received the following error message in Python:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-5-bf3d03963356> in <module>
1 # Directly using --ref-allele is not compatible with pgenlib
2 pgen_file = '/n/scratch3/users/j/jz286/imp_geno_aa.direct/ukb_imp_chr22_v3_aa.pgen'
----> 3 with pg.PgenReader(bytes(pgen_file, encoding="utf8")) as reader:
4 mat_X = np.empty([10, n_sample], np.int8)
5 reader.read_range(0, 10, mat_X)
pgenlib.pyx in pgenlib.PgenReader.__cinit__()
AssertionError:
Alternatively, if I first write a .bed fileset (instead of a .pgen file) in the --ref-allele step and then convert that .bed fileset to .pgen, pgenlib reads the resulting .pgen file in Python without this error.
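For reference, the two-step workaround can be sketched roughly as below. This is only an illustration: the helper name build_two_step_commands and the example prefixes are placeholders I made up, and the "10 3" column arguments are simply copied from the command above. Note that a .bed fileset stores hard calls only, so any dosage information in the imputed data would presumably be dropped along the way.

```python
import subprocess


def build_two_step_commands(pfile_prefix, ref_file, bed_prefix, out_prefix,
                            ref_col="10", id_col="3"):
    """Return the two plink2 argument lists for the .bed-intermediate route.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Step 1: apply --ref-allele while writing a .bed/.bim/.fam fileset.
    to_bed = [
        "plink2",
        "--pfile", pfile_prefix,
        "--ref-allele", ref_file, ref_col, id_col,
        "--make-bed",
        "--out", bed_prefix,
    ]
    # Step 2: convert the intermediate .bed fileset to .pgen/.pvar/.psam.
    to_pgen = [
        "plink2",
        "--bfile", bed_prefix,
        "--make-pgen",
        "--out", out_prefix,
    ]
    return [to_bed, to_pgen]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Placeholder prefixes; substitute the real input and output paths.
commands = build_two_step_commands(
    pfile_prefix="input_prefix",
    ref_file="ref_alleles.txt",
    bed_prefix="intermediate_bed_prefix",
    out_prefix="output_prefix",
)
for argv in commands:
    print(" ".join(argv))
```

Calling run_commands(commands) would execute both steps, assuming plink2 is on the PATH.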
Thanks a lot!