A very low-importance question:
I noticed that a PLINK2 build from Nov 2024 correctly complained about duplicate sample IDs during a PED/MAP import, but then it went sideways and core-dumped.
I then wondered whether the current PLINK2 build resolves this. And it does, sort of: no error of any kind is given, and a FAM file with duplicate sample IDs is created. Is that intentionally allowed?
Here it is, boiled down.
(base) -bash:uger-d046:~/MattM/PLINK_experiment 1191 $ cat T.map
1 rs12345 0 100000000
1 rs67890 0 111111111
(base) -bash:uger-d046:~/MattM/PLINK_experiment 1192 $ cat T.ped
0 TEST1 0 0 2 0 G G A A
0 TEST1 0 0 2 0 G G A A
(base) -bash:uger-d046:~/MattM/PLINK_experiment 1193 $ $PLINK2BIN --map T.map --ped T.ped --make-bed --out T
PLINK v2.0.0-a.6.2LM 64-bit Intel (24 Nov 2024) cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to T.log.
Options in effect:
--make-bed
--map T.map
--out T
--ped T.ped
Start time: Fri May 30 19:56:15 2025
257237 MiB RAM detected, ~241229 available; reserving 128618 MiB for main
workspace.
Using 1 compute thread.
--pedmap: 2 variants in .map file.
--pedmap: 2 samples present, genotypes extracted to T-temporary.bed.smaj .
Error: Duplicate sample ID "0 TEST1".
*** Error in `/fg/saxenalab/code/PLINK/plink2': corrupted size vs. prev_size: 0x0000000003b27110 ***
======= Backtrace: =========
[0x2897341]
[0x289e7f5]
[0x28a0400]
[0x28a4177]
[0x289a96b]
[0x2898f0e]
[0x28912ef]
[0x521e49]
[0x60f2ea]
[0x41d26e]
[0x2868796]
[0x2868d85]
[0x422029]
======= Memory map: ========
00400000-02aec000 r-xp 00000000 00:51 19155625460 /fg/saxenalab/code/PLINK/plink2
02ceb000-02d1a000 rw-p 026eb000 00:51 19155625460 /fg/saxenalab/code/PLINK/plink2
02d1a000-02dd1000 rw-p 00000000 00:00 0
03b24000-03cc8000 rw-p 00000000 00:00 0 [heap]
2b9bcba91000-2b9bcbcd5000 rw-p 00000000 00:00 0
2b9bcbcd5000-2b9bcbcd6000 ---p 00000000 00:00 0
2b9bcbcd6000-2b9bcbcf7000 rw-p 00000000 00:00 0
2b9bcbd56000-2b9bcbd57000 ---p 00000000 00:00 0
2b9bcbd57000-2b9bcbd77000 rw-p 00000000 00:00 0
2b9bcfcd6000-2bbb366d7000 rw-p 00000000 00:00 0
2bbb38000000-2bbb38033000 rw-p 00000000 00:00 0
2bbb38033000-2bbb3c000000 ---p 00000000 00:00 0
7ffcdf954000-7ffcdf976000 rw-p 00000000 00:00 0 [stack]
7ffcdf9c7000-7ffcdf9c9000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted (core dumped)
(base) -bash:uger-d046:~/MattM/PLINK_experiment 1194 $ ./plink2 --map T.map --ped T.ped --make-bed --out T
PLINK v2.0.0-a.5.25LM 64-bit Intel (15 May 2025) cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to T.log.
Options in effect:
--make-bed
--map T.map
--out T
--ped T.ped
Start time: Fri May 30 19:56:25 2025
257237 MiB RAM detected, ~241225 available; reserving 128618 MiB for main
workspace.
Using 1 compute thread.
--pedmap: 2 variants in .map file.
--pedmap: 2 samples present, genotypes extracted to T-temporary.bed.smaj .
Transposing sample-major .bed to T-temporary.pgen , and setting major alleles
to provisional-REF.
Pass 1/1: transposing and compressing... done.
Transpose complete.
--pedmap: T-temporary.pgen + T-temporary.pvar + T-temporary.psam written.
.bed.smaj and .fam.tmp temporary files deleted.
2 samples (2 females, 0 males; 2 founders) loaded from T-temporary.psam.
2 variants loaded from T-temporary.pvar.
Note: No phenotype data present.
Writing T.fam ... done.
Writing T.bim ... done.
Writing T.bed ... done.
End time: Fri May 30 19:56:25 2025
(base) -bash:uger-d046:~/MattM/PLINK_experiment 1195 $ cat T.fam
0 TEST1 0 0 2 -9
0 TEST1 0 0 2 -9
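
In the meantime, a pre-import check on my side is easy enough. Here is a minimal sketch, assuming the first two whitespace-separated columns of each .ped (or .fam) line are the family ID and sample ID, as in the files above. The function name `duplicate_sample_ids` is my own and is not part of PLINK.

```python
from collections import Counter


def duplicate_sample_ids(lines):
    """Return (FID, IID) pairs that occur on more than one line, in first-seen order.

    Hypothetical helper, not part of PLINK. Assumes columns 1 and 2 of each
    line are the family ID and the within-family sample ID.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        counts[(fields[0], fields[1])] += 1
    return [key for key, n in counts.items() if n > 1]


# The two rows of T.ped from the session above.
ped_lines = [
    "0 TEST1 0 0 2 0 G G A A",
    "0 TEST1 0 0 2 0 G G A A",
]
print(duplicate_sample_ids(ped_lines))  # [('0', 'TEST1')]
```

For a real file, passing an open file handle (for example `open("T.ped")`) in place of `ped_lines` should work the same way, since the function only iterates over lines.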