Can we add additional columns to ped and map files in PLINK?


Nirodha Epasinghe

Mar 31, 2021, 4:50:46 PM
to plink2-users

Hello,

I need to know whether we can add additional columns to PLINK file formats. Specifically, I am trying to create PLINK-format files from my simulated data. In a .ped file, can we add additional columns (e.g., a column representing the age of each individual) after the first six mandatory columns? I need to store additional information that is important for further studies. Can we also do this in .map files (e.g., to store additional details about SNPs, such as selection coefficients and dominance coefficients)?

If this is not possible in PLINK's standard file formats, is there another way to keep this information in a PLINK file format?

Thank you very much for your time.

Nirodha

Christopher Chang

Mar 31, 2021, 5:48:24 PM
to plink2-users

No, you can't do this in the .ped + .map file format, which has been obsolete for close to a decade. You can with PLINK 2.0's file format; see https://www.cog-genomics.org/plink/2.0/resources#1kg_phase3 for an example where the sample-information file (.psam) has extra "Population" and "SuperPop" columns, and the variant-information file (.pvar) has per-variant annotations in the INFO column.

Nirodha Epasinghe

Mar 31, 2021, 6:02:55 PM
to plink2-users
Hi,

Thank you very much for this. Yes, that answers my question. However, can we create these .psam and .pvar files ourselves? That is, can we create them as text files? After we create them, how does PLINK use these files?

Regards,
Nirodha.

Christopher Chang

Mar 31, 2021, 6:18:36 PM
to plink2-users
See the file formats page.  .psam and .pvar are text formats that you can edit yourself; just avoid adding (non-comment) lines outside of plink2.

Nirodha Epasinghe

Mar 31, 2021, 7:51:13 PM
to plink2-users
Hi,

Thank you very much for this. I went through the page. I think that, with the .psam and .pvar formats, we need to use the binary file format for genotype data. I cannot understand how we get the .pvar and .psam files. Can we provide them as text files? My main interest is that the information from my simulation study needs to be stored in an informative way, and I decided to use the PLINK file format. But with PLINK2, how do I create these files? Can I just add the information to a text file? I really appreciate your help.

Regards,
Nirodha.

Christopher Chang

Mar 31, 2021, 7:57:01 PM
to plink2-users
You can provide any input format listed under https://www.cog-genomics.org/plink/2.0/input , and plink2's --make-pgen command will convert it to a .pgen + .pvar + .psam fileset.  Then you can add the resulting .pvar and/or .psam.

One of the supported input formats is plink 1.x's .bed + .bim + .fam, so for any input format (such as .ped + .map) which isn't yet supported by plink2, you can use plink 1.9 to convert to .bed + .bim + .fam first.
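As a rough sketch of the two-step route described above: the script below only assembles and prints the two commands. It assumes the PLINK 1.9 binary is installed as `plink`, the PLINK 2.0 binary as `plink2`, and a hypothetical fileset prefix `sim` (that is, sim.ped + sim.map); none of these names come from the thread.

```python
import shlex


def conversion_commands(prefix):
    """Build the two conversion commands for a .ped + .map fileset.

    `prefix` is a hypothetical fileset prefix; the binary names `plink`
    (PLINK 1.9) and `plink2` (PLINK 2.0) are assumptions about the install.
    """
    # Step 1: PLINK 1.9 reads prefix.ped + prefix.map, writes .bed + .bim + .fam.
    to_bed = ["plink", "--file", prefix, "--make-bed", "--out", prefix]
    # Step 2: PLINK 2.0 reads .bed + .bim + .fam, writes .pgen + .pvar + .psam.
    to_pgen = ["plink2", "--bfile", prefix, "--make-pgen", "--out", prefix]
    return [to_bed, to_pgen]


if __name__ == "__main__":
    for command in conversion_commands("sim"):
        # Printed only; hand each list to subprocess.run(...) to execute it.
        print(shlex.join(command))
```

After the second command, the resulting .pvar and .psam are plain text files that can be opened and edited.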

Christopher Chang

Mar 31, 2021, 7:57:57 PM
to plink2-users
"Then you can add the resulting .pvar and/or .psam" should read "Then you can add your columns to the resulting .pvar and/or .psam".

Nirodha Epasinghe

Mar 31, 2021, 8:17:49 PM
to Christopher Chang, plink2-users
Hi,

Yes, I understand that .ped and .map are input formats that PLINK supports. My problem is that we cannot add additional details to these file formats. For example, in .ped files we cannot add the sampled individuals' ages as a column, and in .map files we cannot store additional marker information such as dominance coefficients. I was wondering whether, in PLINK2, the .psam and .pvar files are created by the user or by PLINK itself. If PLINK creates them itself, can we then add this information?

Thank you very much for your time,
Nirodha

--
You received this message because you are subscribed to the Google Groups "plink2-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to plink2-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/plink2-users/323d779c-3a57-4253-af73-0c79e9ce59dbn%40googlegroups.com.

Christopher Chang

Mar 31, 2021, 8:25:11 PM
to plink2-users
plink2 --make-pgen can create (.pgen and) .pvar and .psam files from all its supported input formats; then you can edit them.

Nirodha Epasinghe

Apr 1, 2021, 8:02:15 PM
to plink2-users
Hi,

I followed your guidelines and created .psam and .pvar files using PLINK2. Now, to edit them, are there any PLINK commands to follow, or do we need to edit the text files ourselves (i.e., just add our columns of interest to these text files)? Also, can we add any information to .psam and .pvar files?

Regards,
Nirodha.

Nirodha Epasinghe

Apr 6, 2021, 3:20:33 PM
to Christopher Chang, plink2-users
Hi Christopher,

I am still struggling with how to add these additional columns to the .psam and .pvar files. For example, how can I add an INFO field to a .pvar file? Can you help me with this? Thank you very much.

Regards,
Nirodha.

Christopher Chang

Apr 6, 2021, 3:23:35 PM
to plink2-users
You just... add a column?...  Please elaborate on what you're having trouble with.

You can see an example .pvar file with an INFO column, and an example .psam file with two categorical-phenotype columns, at https://www.cog-genomics.org/plink/2.0/resources#1kg_phase3 .

Nirodha Epasinghe

Apr 6, 2021, 3:32:13 PM
to Christopher Chang, plink2-users
Hi,

I created .pvar and .psam files. But these files do not have an INFO column in the .pvar file or an AGE column in the .psam file. I need to add these columns to the two files. I saw these examples. How can I get these columns into the two files? Are there any PLINK commands for that, or can I just add the columns manually?

Regards,
Nirodha. 

Christopher Chang

Apr 6, 2021, 3:37:59 PM
to plink2-users
It's possible to use plink2 to add a phenotype/covariate column to the .psam, but that isn't really any easier than adding the column yourself.  In both cases you're just updating a tab-delimited file.
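To make "just updating a tab-delimited file" concrete, here is a minimal Python sketch that appends one column to the text of a .psam file. The function name, the AGE column, the sample IDs, and the use of "NA" for samples without a value are all illustrative assumptions. It also assumes the file has a header line starting with #FID or #IID, as plink2-written .psam files do.

```python
def add_psam_column(psam_text, column_name, values_by_iid, missing="NA"):
    """Return .psam text with one extra tab-delimited column appended.

    Assumes a header line starting with #FID or #IID that contains an IID
    column; `missing` is written for samples absent from `values_by_iid`.
    """
    out = []
    iid_index = None
    for line in psam_text.splitlines():
        if not line.strip():
            out.append(line)  # keep blank lines untouched
        elif iid_index is None:
            if line.startswith("#FID") or line.startswith("#IID"):
                header = line.lstrip("#").split("\t")
                iid_index = header.index("IID")
                out.append(line + "\t" + column_name)
            else:
                out.append(line)  # other leading comment lines pass through
        else:
            fields = line.split("\t")
            value = values_by_iid.get(fields[iid_index], missing)
            out.append(line + "\t" + str(value))
    if iid_index is None:
        raise ValueError("no #FID/#IID header line found")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    psam = "#IID\tSEX\nind1\t1\nind2\t2\n"  # hypothetical two-sample file
    print(add_psam_column(psam, "AGE", {"ind1": 34, "ind2": 51}), end="")
```

Only the header and data lines gain a field; no lines are added or removed, so the sample order stays in step with the .pgen file.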

plink2 does not have a built-in command to add to the INFO column; it's normally imported from a VCF file.  Refer to the VCF specification for how the INFO column is organized, and what header lines should go with it.
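For illustration, a minimal sketch of adding a one-key INFO column by hand might look like the following. It assumes the .pvar has a #CHROM header line and no existing INFO column. The key name SELCOEF, its description, and the variant IDs are hypothetical. The `##INFO=<...>` header line follows the VCF convention for declaring an INFO key, and "." marks variants without a value. The new column is appended at the end of each line, which matches VCF column order only when the existing header ends with ALT or FILTER; check the file formats page before relying on any other layout.

```python
def add_pvar_info(pvar_text, key, values_by_id, description,
                  number="1", value_type="Float"):
    """Return .pvar text with an INFO column holding a single key.

    Sketch only: expects a #CHROM header line and no existing INFO column.
    """
    meta_line = (f'##INFO=<ID={key},Number={number},Type={value_type},'
                 f'Description="{description}">')
    out = []
    id_index = None
    for line in pvar_text.splitlines():
        if line.startswith("##") or not line.strip():
            out.append(line)  # existing meta lines and blank lines pass through
        elif line.startswith("#CHROM"):
            columns = line.split("\t")
            if "INFO" in columns:
                raise ValueError("sketch only handles files without INFO")
            id_index = columns.index("ID")
            out.append(meta_line)          # declare the key before the header
            out.append(line + "\tINFO")    # appended as the last column
        elif id_index is None:
            raise ValueError("no #CHROM header line found before data")
        else:
            value = values_by_id.get(line.split("\t")[id_index])
            out.append(line + "\t" + ("." if value is None else f"{key}={value}"))
    if id_index is None:
        raise ValueError("no #CHROM header line found")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    pvar = "#CHROM\tPOS\tID\tREF\tALT\n1\t1000\tsnp1\tA\tG\n1\t2000\tsnp2\tC\tT\n"
    print(add_pvar_info(pvar, "SELCOEF", {"snp1": 0.02},
                        "Simulated selection coefficient"), end="")
```

Several keys on one variant would be joined with semicolons (for example `SELCOEF=0.02;DOMCOEF=0.5`), each with its own `##INFO` header line; the sketch above does not handle that case.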

Nirodha Epasinghe

Apr 6, 2021, 3:51:29 PM
to Christopher Chang, plink2-users
Thank you for this. That means adding an INFO column to the .pvar file should also be done manually, right? We need to create it according to the VCF file format and then add it ourselves.

Nirodha.

Christopher Chang

Apr 7, 2021, 12:00:30 PM
to plink2-users
For your use case, it may be simplest to add the column manually.  An alternative is to export to VCF/BCF, use "bcftools annotate", and then convert the result back to plink2-format.
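A hedged sketch of that alternative round trip is below. It only builds and prints the three commands. The prefix `sim`, the annotation file names, and the SELCOEF tag are hypothetical. The annotation table is assumed to be bgzip-compressed and tabix-indexed with columns CHROM, POS, and the tag value, and the header file is assumed to hold the matching `##INFO=<...>` line. Verify the flags against the plink2 and bcftools documentation for your versions.

```python
import shlex


def annotate_roundtrip(prefix, annot_tsv_gz, header_txt, tag):
    """Build export -> bcftools annotate -> re-import commands.

    All file names and the tag are hypothetical placeholders.
    """
    exported = prefix + "_export"
    annotated = prefix + "_annot"
    # 1. Export the plink2 fileset to an uncompressed VCF.
    export = ["plink2", "--pfile", prefix, "--export", "vcf", "--out", exported]
    # 2. Add the INFO tag from the indexed annotation table.
    annotate = ["bcftools", "annotate",
                "-a", annot_tsv_gz,            # bgzipped + tabix-indexed table
                "-h", header_txt,              # file with the ##INFO header line
                "-c", f"CHROM,POS,INFO/{tag}",  # column layout of the table
                "-o", annotated + ".vcf",
                exported + ".vcf"]
    # 3. Convert the annotated VCF back to .pgen + .pvar + .psam.
    reimport = ["plink2", "--vcf", annotated + ".vcf",
                "--make-pgen", "--out", annotated]
    return [export, annotate, reimport]


if __name__ == "__main__":
    for command in annotate_roundtrip("sim", "annot.tsv.gz",
                                      "annot_header.txt", "SELCOEF"):
        print(shlex.join(command))
```

Extra sample columns in the original .psam probably do not survive the trip through VCF, so they may need to be re-attached to the new .psam afterwards.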

HugoH

Apr 28, 2021, 8:07:03 AM
to plink2-users
Hi,
"for any input format (such as .ped + .map) which isn't yet supported by plink2, you can use plink 1.9 to convert to .bed + .bim + .fam first."
What if the input file includes multiallelic variants, which cannot be converted to .bim?
In brief, if I have genotype files with multiallelic variants (such as .vcf) and pedigree information (such as .fam), is there any way to convert these data to .pgen + .psam + .pvar?

Christopher Chang

Apr 28, 2021, 12:27:56 PM
to plink2-users
From the plink2 --vcf documentation: "You can combine [--vcf] with --fam/--psam. If you do, PLINK 2 will verify the sample IDs match and appear in the same order in the two files, and the sample information will be loaded. (It may be necessary to use --double-id/--const-fid/--id-delim to get the IDs to match.)"
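Before combining the two files, it can help to check the ID order yourself. The sketch below is a simplified stand-in for that verification, not plink2's actual logic: it compares the sample IDs on the VCF #CHROM header line (the fields after FORMAT) with the second column (IID) of the .fam file. It ignores family IDs and any --double-id/--const-fid/--id-delim handling, and the function names and example IDs are hypothetical.

```python
def vcf_sample_ids(vcf_text):
    """Sample IDs from the #CHROM header line (fields after FORMAT)."""
    for line in vcf_text.splitlines():
        if line.startswith("#CHROM"):
            return line.split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def fam_iids(fam_text):
    """Within-family IDs: second whitespace-delimited field of each line."""
    return [line.split()[1] for line in fam_text.splitlines() if line.strip()]


def first_mismatch(vcf_text, fam_text):
    """Index of the first differing sample, or None if the orders agree."""
    vcf_ids = vcf_sample_ids(vcf_text)
    fam_ids = fam_iids(fam_text)
    for index, (vcf_id, fam_id) in enumerate(zip(vcf_ids, fam_ids)):
        if vcf_id != fam_id:
            return index
    if len(vcf_ids) != len(fam_ids):
        return min(len(vcf_ids), len(fam_ids))  # one file has extra samples
    return None


if __name__ == "__main__":
    vcf = ("##fileformat=VCFv4.2\n"
           "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\n")
    fam = "f1 ind1 0 0 1 -9\nf1 ind2 0 0 2 -9\n"
    print(first_mismatch(vcf, fam))
```

If the orders agree, a command along the lines of `plink2 --vcf ex.vcf --fam ex.fam --make-pgen --out ex` (file names hypothetical) should load the sample information along with the multiallelic genotypes.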

HugoH

Apr 29, 2021, 5:18:08 AM
to plink2-users
I tried --vcf combined with --fam, but it reported an error, as shown below:
Snipaste_2021-04-29_17-17-03.png
However, ex.bed/.bim/.fam and ex.vcf each work fine separately, as shown below:
Snipaste_2021-04-29_17-09-28.png
Snipaste_2021-04-29_17-11-55.png
I don't know what went wrong.
Thanks.

Chris Chang

Apr 29, 2021, 11:07:59 AM
to HugoH, plink2-users
--fam requires a full file name ("ex.fam").
