Example conversion PLINK to Merlin


Yann

Aug 27, 2018, 4:24:30 AM
to Mega2
Hi all,

Could you provide an example of the code, files, and command line needed to convert PLINK-format files (.bed/.bim/.fam or .ped/.map) to Merlin format? Can I also supply an additional file (SOLAR format) with pedigree information, so that Merlin takes it into account for association analysis in related individuals?

I have read the documentation.

Format conversion seems to be a basic feature of Mega2, but I am not sure where to look:

mega2_v5.0.0_src/example$ ls
bed.bed    frequency.annotated map.annotated        MEGA2.BATCH.bed    MEGA2.BATCH.preannotated  ped.frequency    pedin.preannotated  penetrance.annotated studyBCF2.bcf.csi  study.pen
bed.bim    go.sh map.ex        MEGA2.BATCH.impute  MEGA2.BATCH.vcf      pedin.05       ped.map   study1.6.chr06.bcf study.bcf.bcfidx   study.phe
bed.fam    impute.impute map.preannotated       MEGA2.BATCH.ped    names.annotated      pedin.annotated  ped.ped   study1.6.map study.fam    study.vcf
datain.05  impute.sample MEGA2.BATCH.annotated  MEGA2.BATCH.post    names.preannotated      pedin.ex       ped.penetrance   study.bcf study.freq    study.vcf.vcfidx
datain.ex  map.05 MEGA2.BATCH.bcf        MEGA2.BATCH.pre    omit.ex      pedin.pre.05     ped.phe   studyBCF2.bcf study.map


Best, Yann

Dan Weeks

Aug 28, 2018, 9:44:52 AM
to mega2...@googlegroups.com
OK, attached is an example interactive run that first reads in the bed.bim, bed.fam, bed.bed example PLINK files into a Mega2 database, and then, in the second Mega2 run, reads from that database and exports the data in Merlin format.   Does this answer your question?

Mega2 does not accept SOLAR format data as input, but the structure of the pedigree can be similarly encoded in the 'bed.fam' PLINK family file.
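As a rough illustration of that re-encoding, here is a minimal Python sketch. It assumes the SOLAR pedigree file is comma-delimited with a header containing id, fa, mo, sex and optionally famid; please check this against your own file. The function name solar_to_fam_lines and the default family ID "FAM1" are made up for this example and are not part of Mega2, PLINK, or SOLAR. The output uses the six PLINK .fam columns (family ID, individual ID, father, mother, sex, phenotype), with the phenotype set to missing.

```python
import csv
import io

# PLINK .fam sex codes: 1 = male, 2 = female, 0 = unknown.
SEX_CODES = {"m": "1", "1": "1", "f": "2", "2": "2"}

def solar_to_fam_lines(solar_csv_text, default_famid="FAM1", missing_pheno="-9"):
    """Convert SOLAR-style pedigree text (assumed columns: id, fa, mo, sex,
    optional famid) into PLINK .fam lines. Hypothetical helper."""
    lines = []
    for raw in csv.DictReader(io.StringIO(solar_csv_text)):
        # Normalize header case and strip stray whitespace.
        rec = {key.strip().lower(): (value or "").strip()
               for key, value in raw.items() if key is not None}
        famid = rec.get("famid") or default_famid
        father = rec.get("fa") or "0"   # blank parent means founder
        mother = rec.get("mo") or "0"
        sex = SEX_CODES.get(rec.get("sex", "").lower(), "0")
        lines.append(" ".join([famid, rec["id"], father, mother, sex, missing_pheno]))
    return lines

example = "id,fa,mo,sex,famid\n1,0,0,M,F1\n2,0,0,F,F1\n3,1,2,M,F1\n"
fam_lines = solar_to_fam_lines(example)
print("\n".join(fam_lines))
```

The individual IDs must of course match those already used in your genotype data, and the individuals must appear in the same order as in the original .fam file, since the .bed file is indexed by position.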

Hope that helps,
-- Dan -- 


example_run.txt

Dan Weeks

Aug 28, 2018, 9:48:07 AM
to mega2...@googlegroups.com
The text of the example run was truncated in the last post when I pasted it in, so in the attached file 'example_run.txt', you should be able to see the complete run. Again, this is an example interactive run that first reads the bed.bim, bed.fam, bed.bed example PLINK files into a Mega2 database, and then, in the second Mega2 run, reads from that database and exports the data in Merlin format.

-- Dan -- 
example_run.txt

Yann

Aug 29, 2018, 10:42:56 AM
to Mega2
Thank you for your detailed answer. I am starting to understand how Mega2 works; it is not exactly what I expected.

I followed the procedure up to the point where the input files (PLINK format) are read. However, I get the following errors:

Input Format: PLINK binary PED format (bed)
Pedigree and map files specified as PLINK format.
omit, penetrance, and frequency files are always in Mega2 format.
Input files will be read in as PLINK or Mega2 format files as appropriate.
Reading PLINK map file for names: MY_PATH/autosomes.bim


===== Errors/warnings of type "read_names": 
ERROR: Line 2029912: Found fewer than 2 columns (at least 2 required).
ERROR: Line 2087807: Found fewer than 2 columns (at least 2 required).
ERROR: Line 2087808: Found fewer than 2 columns (at least 2 required).
ERROR: Line 3817214: Found fewer than 2 columns (at least 2 required).
ERROR: Line 4326159: Found fewer than 2 columns (at least 2 required).
ERROR: Line 4596305: Found fewer than 2 columns (at least 2 required).
ERROR: Line 6883409: Found fewer than 2 columns (at least 2 required).
ERROR: Line 8097321: Found fewer than 2 columns (at least 2 required).
ERROR: Line 12143854: Found fewer than 2 columns (at least 2 required).
ERROR: Line 15120021: Found fewer than 2 columns (at least 2 required).
===== Too many "read_names" records, display is temporarily suspended ..
===== 24 total records of type "read_names" are in MEGA2.ERR

Found lines with less than 2 columns, see MEGA2.ERR for details.
ERROR: Found fatal errors in locus file.
ERROR: Please fix errors and restart.
ERROR: mrecode.cpp:3001 Mega2 terminated. Error "INPUT_DATA_ERROR" (#4).
===========================================================


I looked into the .bim file at the mentioned lines
/mega2_v5.0.0_src/example$ sed -n 2029912p MY_PATH/autosomes.bim
1 1:240676261_G_C 0 240676261 G C
mega2_v5.0.0_src/example$ sed -n 2087807p MY_PATH/autosomes.bim
1 1:246174604_T_G 0 246174604 G T
mega2_v5.0.0_src/example$ sed -n 2087808p MY_PATH/autosomes.bim
1 1:246174989_TAGG_T 0 246174989 T TAGG
/mega2_v5.0.0_src/example$ sed -n 3817214p MY_PATH/autosomes.bim
2 2:180341915_C_T 0 180341915 T C


I am not sure why these errors occur for these lines, since they look much like the others. Any ideas?

Dan Weeks

Jan 24, 2019, 1:27:33 PM
to Mega2
[This query to the Mega2 Google Group was answered offline a while ago, but for completeness we are addressing it here.]

Thank you for sharing more of your input.  We believe the reason Mega2 is generating these errors is that your marker names are longer than it can handle.

The code does an "fgets" into a FILENAME_LENGTH buffer (255 bytes). If, for example, the marker name is 191 bytes, the whole line is much longer than 255 bytes, so the "fgets" will stop reading somewhere in the second allele. The rest of that allele will then be read as a one-token line (i.e., fewer than 2 columns).
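To make the truncation concrete, here is a small Python sketch that imitates how C's fgets consumes a line with a 255-byte buffer (at most 254 characters per call). It is only a simulation, not Mega2's actual code; the helper fgets_like_reads and the .bim line with a 190-base allele are invented for illustration.

```python
import io

BUFFER_SIZE = 255  # FILENAME_LENGTH as quoted above

def fgets_like_reads(text, size=BUFFER_SIZE):
    """Yield chunks the way C's fgets(buf, size, fp) would: at most
    size - 1 characters per call, stopping early after a newline."""
    stream = io.StringIO(text)
    while True:
        chunk = stream.readline(size - 1)
        if not chunk:
            break
        yield chunk

# Hypothetical .bim line whose marker name embeds a long indel allele.
long_allele = "A" * 190
marker = f"1:1000_{long_allele}_T"
bim_line = f"1 {marker} 0 1000 T {long_allele}\n"

chunks = list(fgets_like_reads(bim_line))
for number, chunk in enumerate(chunks, start=1):
    # The second read holds only the tail of the allele: one token.
    print(f"read {number}: {len(chunk)} chars, {len(chunk.split())} column(s)")
```

If each such read is counted as a separate line (an assumption about the internals, not something verified here), the line numbers in the error messages would drift away from the real line numbers in the .bim file after the first over-long line. That could explain why the lines printed with sed look normal.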

While we will work to make Mega2 more robust to this, in the meantime, I think you could get Mega2 to process your data if you either use shorter marker names (e.g., just the chr:position without the allele names or rsIDs), or you set the FILENAME_LENGTH buffer size to be much larger (e.g., larger than the length of the line containing your very longest marker name) and recompile Mega2 (we have not yet tested this second option).
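To locate the offending lines, or to pick a safe buffer size, a short Python sketch such as the following may help. It flags every line that would not fit in a single fgets call for a given buffer size. The function name find_overlong_lines is made up for this example, and lengths are counted in characters, which equals bytes only for plain ASCII files.

```python
def find_overlong_lines(lines, buffer_size=255):
    """Return (line_number, length) for each line that would not fit in one
    fgets call with the given buffer size. Hypothetical helper.

    fgets stores at most buffer_size - 1 characters, newline included."""
    limit = buffer_size - 1
    hits = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\n")) + 1  # count the newline once
        if length > limit:
            hits.append((number, length))
    return hits

# Small self-check with invented lines; for a real file use e.g.
#   with open("autosomes.bim") as handle:
#       hits = find_overlong_lines(handle)
sample = ["1 rs1 0 100 G C\n", "1 " + "x" * 300 + " 0 200 A T\n"]
overlong = find_overlong_lines(sample)
print(overlong)
```

The largest length reported, plus one for the terminating null byte, gives a lower bound for FILENAME_LENGTH if you go the recompile route.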

Using shorter marker names would also have the advantage of making the output generated by your target analysis programs more readable.
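One way to shorten the names is to rewrite the second column of the .bim file as chr:position, as in this Python sketch. The function shorten_bim_marker_names is hypothetical and not part of Mega2 or PLINK. Since several variants can share a position (for example, multi-allelic sites), it appends a counter to keep the names unique; that naming scheme is just one possible choice.

```python
from collections import Counter

def shorten_bim_marker_names(bim_lines):
    """Replace each marker name (column 2 of a PLINK .bim file) with
    chr:position, adding _2, _3, ... when a position repeats."""
    seen = Counter()
    shortened = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        chrom, _old_name, cm, pos, allele1, allele2 = fields
        base = f"{chrom}:{pos}"
        seen[base] += 1
        name = base if seen[base] == 1 else f"{base}_{seen[base]}"
        shortened.append("\t".join([chrom, name, cm, pos, allele1, allele2]))
    return shortened

# Invented example lines in the same style as the ones posted above.
example = [
    "1 1:100_G_C 0 100 G C",
    "1 1:100_G_T 0 100 G T",
    "2 2:50_TAGG_T 0 50 T TAGG",
]
renamed = shorten_bim_marker_names(example)
print("\n".join(renamed))
```

Because the .bed file refers to markers by position rather than by name, only the .bim file needs to be rewritten. Note that the allele columns are left untouched, so a line with very long indel alleles can still exceed the buffer even after renaming.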

-----
P.S. The current release of Mega2 is robust to this.