EIGENSOFT conversion


Nancy Parmalee

Feb 20, 2020, 11:34:41 AM
to plink2-users
Hi Chris,

I have some data I was given in EIGENSOFT format. I need to merge it with my cohort data, which is in plink format, for analysis. I ran CONVERTF to convert the data I was given into PED format, but the output isn't fully compatible yet. The .pedsnp (.map) file looks like this, which is fine:

1     rs3094315     0.020130       752566 G A
1     rs7419119     0.022518       842013 T G
1    rs13302957     0.024116       891021 G A
1     rs6696609     0.024457       903426 C T
1        rs8997     0.025727       949654 A G
1     rs9442372     0.026288      1018704 A G
1   rs147606383     0.026665      1045331 G A
1     rs4970405     0.026674      1048955 A G
1    rs11807848     0.026711      1061166 T C
1     rs4970421     0.028311      1108637 G A

The first line of the .ped output from CONVERTF looks like this:

1      sampleID 0 0 1 1  3 3  4 4  1 1  4 4  1 1  3 3  3 3  1 1  4 4  1 3  1 1  2 4  4 4  3 3  2 2  2 2  1 1  2 4

I see you are also an author on EIGENSOFT. I hope you can give me some advice. Is there a recode option in plink that can handle the alleles being designated 1, 2, 3, 4? My aim is to make this compatible with my data and merge.

Also, I will have to do some text editing, since the first 6 columns aren't in ped format. The .pedind file looks like:

1      SampleID 0 0 1 1

From the CONVERTF documentation, I believe only FID and IID are preserved in the .ped output, but I wonder if you know for sure whether columns 3 and 4 in the .ped are genotypes or whether they come from the .ind file.

Thanks for all your support.

Christopher Chang

Feb 20, 2020, 11:41:29 AM
to plink2-users
1. The --alleleACGT flag (https://www.cog-genomics.org/plink/1.9/data#update_map) should do what you want.
2. ".pedind" is just a renamed .fam file.  Columns 3 and 4 are parental IIDs, where '0' is treated as unknown.

Nancy Parmalee

Feb 20, 2020, 11:54:27 AM
to Christopher Chang, plink2-users
Thanks for the very quick response. This is incredibly helpful.

--
You received this message because you are subscribed to the Google Groups "plink2-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to plink2-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/plink2-users/2a4801fb-cd8b-4c36-a062-03f5cd52e6c4%40googlegroups.com.

Nancy Parmalee

Feb 20, 2020, 3:19:24 PM
to Christopher Chang, plink2-users
Hi Chris,

As I'm working with the data I realize I have another issue. I don't know if you can help, but if so it would be greatly appreciated.

I ran CONVERTF with 7744 individuals. It ran with no errors; I'm pasting the log below. In the output, the .pedind file still has 7744 lines, but the .ped file has only 2705 lines (counted with wc). I expect one line per individual. Do you have any idea what could be going on?

Thanks,
Nancy

...

parameter file: par.EIGENSTRAT.PED
genotypename: input.geno
snpname: input.snp
indivname: input.ind
outputformat: PED
genotypeoutname: output.ped
snpoutname: output.pedsnp
indivoutname: output.pedind
read 1073741824 bytes
read 1156901328 bytes
packed geno read OK
numvalidind:   7744  maxmiss: 7744001
ped output
##end of convertf run
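
One way to gather more detail about a mismatch like this is to check each .ped line against the expected layout. The sketch below is a hypothetical diagnostic, not part of CONVERTF or PLINK: it assumes a whitespace-delimited .ped with one line per sample and 6 + 2 × (number of SNPs) fields per line, and it only reports line counts and malformed lines. It does not identify the cause. The function names and the tiny demo file are made up for illustration.

```python
import os
import tempfile


def count_lines(path):
    """Count lines in a text file (comparable to `wc -l` for newline-terminated files)."""
    with open(path) as fh:
        return sum(1 for _ in fh)


def summarize_ped(ped_path, n_snps):
    """Return (line count, list of (line number, field count) for malformed lines).

    Assumes each .ped line holds 6 metadata columns plus 2 alleles per SNP.
    """
    expected = 6 + 2 * n_snps
    n_lines = 0
    bad = []
    with open(ped_path) as fh:
        for lineno, line in enumerate(fh, start=1):
            n_lines += 1
            n_fields = len(line.split())
            if n_fields != expected:
                bad.append((lineno, n_fields))
    return n_lines, bad


# Tiny demo file: 3 samples, 2 SNPs, with the second line truncated.
demo = (
    "1 s1 0 0 1 1 A A C C\n"
    "1 s2 0 0 1 1 A A\n"
    "1 s3 0 0 2 1 G G T T\n"
)
with tempfile.TemporaryDirectory() as tmp:
    ped = os.path.join(tmp, "demo.ped")
    with open(ped, "w") as fh:
        fh.write(demo)
    n_lines, bad = summarize_ped(ped, n_snps=2)
    print("lines:", n_lines)
    print("malformed (line number, field count):", bad)
```

For the files in the log above, the number of SNPs would be the line count of output.pedsnp, and the sample count to compare against would be the line count of output.pedind.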

Christopher Chang

Feb 21, 2020, 3:48:03 PM
to plink2-users
Nope, would need more information to make a guess.


