Should duplicate samples be allowed?

Matthew Maher

May 30, 2025, 4:04:31 PM
to plink2-users
A very low-importance question: 

I noticed that a PLINK2 build from Nov 2024 correctly complained about duplicate sample IDs during a PED/MAP import, but then went sideways and core-dumped.
I then wondered whether the current PLINK2 build resolves this. It does, sort of: no error of any kind is given, and a FAM file with duplicate sample IDs is created. Is that intentionally allowed?

Here it is, boiled down. 

(base) -bash:uger-d046:~/MattM/PLINK_experiment 1191 $ cat T.map
1 rs12345 0 100000000
1 rs67890 0 111111111
(base) -bash:uger-d046:~/MattM/PLINK_experiment 1192 $ cat T.ped
0 TEST1 0 0 2 0 G G A A
0 TEST1 0 0 2 0 G G A A
(base) -bash:uger-d046:~/MattM/PLINK_experiment 1193 $ $PLINK2BIN --map T.map --ped T.ped --make-bed --out T
PLINK v2.0.0-a.6.2LM 64-bit Intel (24 Nov 2024)    cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to T.log.
Options in effect:
  --make-bed
  --map T.map
  --out T
  --ped T.ped

Start time: Fri May 30 19:56:15 2025
257237 MiB RAM detected, ~241229 available; reserving 128618 MiB for main
workspace.
Using 1 compute thread.
--pedmap: 2 variants in .map file.
--pedmap: 2 samples present, genotypes extracted to T-temporary.bed.smaj .
Error: Duplicate sample ID "0 TEST1".
*** Error in `/fg/saxenalab/code/PLINK/plink2': corrupted size vs. prev_size: 0x0000000003b27110 ***
======= Backtrace: =========
[0x2897341]
[0x289e7f5]
[0x28a0400]
[0x28a4177]
[0x289a96b]
[0x2898f0e]
[0x28912ef]
[0x521e49]
[0x60f2ea]
[0x41d26e]
[0x2868796]
[0x2868d85]
[0x422029]
======= Memory map: ========
00400000-02aec000 r-xp 00000000 00:51 19155625460                        /fg/saxenalab/code/PLINK/plink2
02ceb000-02d1a000 rw-p 026eb000 00:51 19155625460                        /fg/saxenalab/code/PLINK/plink2
02d1a000-02dd1000 rw-p 00000000 00:00 0
03b24000-03cc8000 rw-p 00000000 00:00 0                                  [heap]
2b9bcba91000-2b9bcbcd5000 rw-p 00000000 00:00 0
2b9bcbcd5000-2b9bcbcd6000 ---p 00000000 00:00 0
2b9bcbcd6000-2b9bcbcf7000 rw-p 00000000 00:00 0
2b9bcbd56000-2b9bcbd57000 ---p 00000000 00:00 0
2b9bcbd57000-2b9bcbd77000 rw-p 00000000 00:00 0
2b9bcfcd6000-2bbb366d7000 rw-p 00000000 00:00 0
2bbb38000000-2bbb38033000 rw-p 00000000 00:00 0
2bbb38033000-2bbb3c000000 ---p 00000000 00:00 0
7ffcdf954000-7ffcdf976000 rw-p 00000000 00:00 0                          [stack]
7ffcdf9c7000-7ffcdf9c9000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
Aborted (core dumped)
(base) -bash:uger-d046:~/MattM/PLINK_experiment 1194 $ ./plink2 --map T.map --ped T.ped --make-bed --out T
PLINK v2.0.0-a.5.25LM 64-bit Intel (15 May 2025)   cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to T.log.
Options in effect:
  --make-bed
  --map T.map
  --out T
  --ped T.ped

Start time: Fri May 30 19:56:25 2025
257237 MiB RAM detected, ~241225 available; reserving 128618 MiB for main
workspace.
Using 1 compute thread.
--pedmap: 2 variants in .map file.
--pedmap: 2 samples present, genotypes extracted to T-temporary.bed.smaj .
Transposing sample-major .bed to T-temporary.pgen , and setting major alleles
to provisional-REF.
Pass 1/1: transposing and compressing... done.
Transpose complete.
--pedmap: T-temporary.pgen + T-temporary.pvar + T-temporary.psam written.
.bed.smaj and .fam.tmp temporary files deleted.
2 samples (2 females, 0 males; 2 founders) loaded from T-temporary.psam.
2 variants loaded from T-temporary.pvar.
Note: No phenotype data present.
Writing T.fam ... done.
Writing T.bim ... done.
Writing T.bed ... done.
End time: Fri May 30 19:56:25 2025
(base) -bash:uger-d046:~/MattM/PLINK_experiment 1195 $ cat T.fam
0 TEST1 0 0 2 -9
0 TEST1 0 0 2 -9

Chris Chang

May 30, 2025, 6:07:34 PM
to Matthew Maher, plink2-users
The post-error-message segfault is a bug, and is fixed in today's build; thanks for reporting it.

a5.25 does not error out because it's still an alpha 5 build; the sample ID sanity check was introduced with alpha 6.
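
For anyone stuck on a build without that check, duplicates can be caught before import with a few lines of Python. This is only a rough sketch, not PLINK's own logic: `duplicate_sample_ids` is a hypothetical helper name, and it assumes the first two whitespace-delimited columns are FID and IID, as in the T.ped and T.fam shown above.

```python
from collections import Counter


def duplicate_sample_ids(lines):
    """Return the sorted (FID, IID) pairs that occur more than once.

    Hypothetical helper, not part of PLINK. Assumes columns 1 and 2 of
    each line are FID and IID, as in the .ped/.fam files in this thread.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts[(fields[0], fields[1])] += 1
    return sorted(pair for pair, n in counts.items() if n > 1)


# The two rows of T.ped from the original post.
ped_lines = [
    "0 TEST1 0 0 2 0 G G A A",
    "0 TEST1 0 0 2 0 G G A A",
]
print(duplicate_sample_ids(ped_lines))  # [('0', 'TEST1')]
```

To check a real file, pass an open file handle instead of the list, e.g. `duplicate_sample_ids(open("T.ped"))`.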
