Hello,
I have PRS weights (calculated using PRScs, stored in a single .txt file covering all chromosomes) and .bgen files (one per chromosome) for a study population, and I want to calculate the resulting PRS. I am unsure whether I did this correctly, though, specifically when combining the per-chromosome results.
I did the following:
I called plink2 with the --bgen and --score options for each of my bgen files using a command like so:
for i in range(1, 23):
    cmd = f"""plink2 --bgen [path]/bgen_file_{i}.bgen 'ref-first' --rm-dup 'exclude-all' --oxford-single-chr {i} --score [path]prs_coefficients.txt 2 4 6 --out [output-path]"""
    !{cmd}
As output, I obtained 22 .sscore-files with 4 columns:
#IID, ALLELE_CT, NAMED_ALLELE_DOSAGE_SUM, SCORE1_AVG
This made me a bit unsure, because the documentation at
https://www.cog-genomics.org/plink/2.0/formats#sscore lists more columns than that.
Anyway, in R code (not shown here), I multiplied SCORE1_AVG by ALLELE_CT to get a non-averaged sum per study participant and chromosome, and then simply added these sums across chromosomes to obtain the PRS value for each participant. (Simply adding the averages would, I think, bias the result towards SNPs on smaller chromosomes, because each chromosome's average is divided by a different allele count.)
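To make the combination step concrete, here is a minimal Python sketch of what my R code does. It assumes that ALLELE_CT really is the denominator of SCORE1_AVG (which is exactly what I am unsure about), that the .sscore files are tab-delimited, and that the header is the one listed above. The function name combine_sscore is just something I made up for this illustration.

```python
import csv
from collections import defaultdict


def combine_sscore(paths):
    """Sum SCORE1_AVG * ALLELE_CT per sample over several .sscore files."""
    totals = defaultdict(float)
    for path in paths:
        with open(path, newline="") as handle:
            # Assumption: tab-delimited file with a "#IID" header column.
            reader = csv.DictReader(handle, delimiter="\t")
            for row in reader:
                # Assumption: ALLELE_CT is the denominator of SCORE1_AVG,
                # so the product recovers the per-chromosome score sum.
                chrom_sum = float(row["SCORE1_AVG"]) * float(row["ALLELE_CT"])
                totals[row["#IID"]] += chrom_sum
    return dict(totals)
```

Calling combine_sscore with the list of the 22 per-chromosome .sscore paths returns a dictionary that maps each #IID to its summed score.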
Is my thinking here correct? Is "ALLELE_CT" the denominator for the average (and if not, what is)?
I also tried using PLINK 1.9, but judging by the answer here
https://groups.google.com/g/plink2-users/c/iaQn0AC-7SU I think it is not suited to .bgen files. I also saw that there is a --score-list option, but as I understand the documentation, it is meant for multiple weight/score files rather than multiple genotype files. Is that correct?
Best,
Leon Hendrian