Sample IDs in plink.mdist.missing file


Denis

May 6, 2022, 9:58:19 AM
to plink2-users
Hi,

I used the "--cluster missing" option in PLINK 1.9, but there are no sample names in the "plink.mdist.missing" file.

According to the documentation : ".cluster3[.missing] files contain one line per sample, with their FID and IID as the first two fields (not merged with an underscore here), followed by a sequence of nonnegative integers representing the sample's cluster assignment at each stage of the clustering process."

Why are there no sample names in my case, and how can I add them to my "plink.mdist.missing" file? I need them to visualize the distances in R.

Regards,
Denis

Christopher Chang

May 6, 2022, 11:11:14 AM
to plink2-users
Please post a full .log file, and preferably also the input files used in the run (these could be small, as long as they illustrate the issue you're having).

Denis

May 6, 2022, 12:36:06 PM
to plink2-users
Hi,

Thanks for your reply. The "plink.mdist.missing" file is a matrix:

".mdist.missing (identity-by-missingness matrix)
Produced by "--cluster missing".

A triangular space-delimited text file with identity-by-missingness coefficients."

But without sample IDs as the row and column names of the matrix, it is not possible to correctly interpret and visualize the IBM distances between samples. How can I get the same file, but with sample IDs, for further analysis of the matrix?


The log file is:
"
PLINK v1.90b6.26 64-bit (2 Apr 2022)
Options in effect:
  --allow-extra-chr
  --cluster missing
  --keep-allele-order
  --vcf Diploids_GBS.vcf

Hostname: node03
Working directory: /mnt/lustre/tkiy/Diploids_GBS/GBS_pipeline
Start time: Fri May  6 15:44:17 2022

Random number seed: 1651841057
48258 MB RAM detected; reserving 24129 MB for main workspace.
--vcf: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam written.
109444 variants loaded from .bim file.
96 people (0 males, 0 females, 96 ambiguous) loaded from .fam.
Ambiguous sex IDs written to plink.nosex .
Using up to 23 threads (change this with --threads).
Before main variant filters, 96 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.476451.
109444 variants and 96 people pass filters and QC.
Note: No phenotypes present.
IBM matrix written to plink.mdist.missing .
Clustering... done.
Cluster solution written to plink.cluster1 , plink.cluster2 , and
plink.cluster3.missing .
"
Please let me know if my VCF file is still needed to replicate the issue. I'll try to prepare a toy example of my VCF for sending in that case.
Regards,
Denis

Christopher Chang

May 6, 2022, 12:44:44 PM
to plink2-users
Oh, this isn't built into plink 1.x, sorry.  You should use a short shell script that reads the sample IDs from the .fam file and attaches them to the .mdist.missing file in the manner you prefer.
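
A minimal sketch of such a script, written in Python rather than shell so the steps are easier to follow. It rests on several assumptions that are not confirmed in this thread: a .fam file for the same samples is available, the triangular file has one row per sample with row k holding k values (diagonal included) or one row fewer with the diagonal omitted, and FID and IID are joined with an underscore purely as an illustrative choice. All function names are hypothetical.

```python
def read_sample_ids(fam_lines, sep="_"):
    """Build one label per sample from .fam-style lines.

    FID and IID are the first two whitespace-separated fields.
    Joining them with `sep` is an arbitrary choice made for this sketch.
    """
    ids = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) >= 2:
            ids.append(fields[0] + sep + fields[1])
    return ids


def read_triangle(matrix_lines, n):
    """Expand a lower-triangular text matrix into a full n x n list of lists.

    Assumption: either n rows (diagonal included) or n - 1 rows (diagonal
    omitted). In both layouts the k-th row (0-based) holds k + 1 values.
    Cells that are absent from the file stay None.
    """
    rows = [[float(x) for x in line.split()] for line in matrix_lines if line.strip()]
    if len(rows) == n:
        offset = 0  # row k belongs to sample k and ends on the diagonal
    elif len(rows) == n - 1:
        offset = 1  # row k belongs to sample k + 1; no diagonal stored
    else:
        raise ValueError("matrix has %d rows but there are %d samples" % (len(rows), n))
    full = [[None] * n for _ in range(n)]
    for k, row in enumerate(rows):
        if len(row) != k + 1:
            raise ValueError("row %d has %d values, expected %d" % (k + 1, len(row), k + 1))
        i = k + offset
        for j, value in enumerate(row):
            full[i][j] = value
            full[j][i] = value  # mirror into the upper triangle
    return full


def format_labeled(ids, full):
    """Return tab-separated lines: a header of labels, then one labeled row per sample."""
    lines = ["\t".join(["ID"] + ids)]
    for label, row in zip(ids, full):
        cells = ["NA" if v is None else repr(v) for v in row]
        lines.append("\t".join([label] + cells))
    return lines


def label_matrix(fam_path, matrix_path, out_path):
    """File-based wrapper around the three helpers above."""
    with open(fam_path) as fam:
        ids = read_sample_ids(fam)
    with open(matrix_path) as mat:
        full = read_triangle(mat, len(ids))
    with open(out_path, "w") as out:
        out.write("\n".join(format_labeled(ids, full)) + "\n")
```

Calling label_matrix("samples.fam", "plink.mdist.missing", "ibm_labeled.tsv") (the first and last file names are placeholders) would write a square, labeled table that R's read.table(..., header=TRUE, row.names=1) can load. Cells missing from the input are written as NA.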

Denis

May 6, 2022, 12:56:15 PM
to plink2-users
Thank you! I'm wondering whether I'd get a correct matrix (with column and row names) using PLINK 1.07?

Regards,
Denis

Christopher Chang

May 6, 2022, 12:58:46 PM
to plink2-users
"plink 1.x" means plink 1.07 has the same behavior.

Christopher Chang

May 6, 2022, 1:01:12 PM
to plink2-users
(Note that, because of plink 1.x's two-part sample IDs, there is no obvious way to define what you're calling a "correct matrix".)

Denis

May 6, 2022, 1:39:59 PM
to plink2-users
Is the order of the IDs in "plink.cluster3.missing" (the first two fields) identical to the sample order in "plink.mdist.missing"? May I just copy these and paste them into the "plink.mdist.missing" file? Could you confirm that, please?

Regards,
Denis

Christopher Chang

May 6, 2022, 1:43:30 PM
to plink2-users
Yes, the order is the same.
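
Given that the order matches, the first two fields of "plink.cluster3.missing" can serve as the ID source. The following is a rough sketch only: it keeps FID and IID as a separate pair rather than merging them, and it emits one record per sample pair (long format) instead of a square table. It assumes the triangular file has either one row per sample with the diagonal included or one row fewer with the diagonal omitted, which is not stated in this thread. The function names are hypothetical.

```python
def cluster3_ids(cluster3_lines):
    """Return (FID, IID) tuples from the first two fields of .cluster3-style lines."""
    ids = []
    for line in cluster3_lines:
        fields = line.split()
        if len(fields) >= 2:
            ids.append((fields[0], fields[1]))
    return ids


def triangle_to_pairs(matrix_lines, ids):
    """Pair every off-diagonal coefficient with the two samples it refers to.

    Assumption: len(ids) rows (diagonal included) or len(ids) - 1 rows
    (diagonal omitted); the k-th row (0-based) holds k + 1 values either way.
    Returns a list of (sample_a, sample_b, value) tuples.
    """
    rows = [line.split() for line in matrix_lines if line.strip()]
    n = len(ids)
    if len(rows) == n:
        offset = 0  # last value of each row is the sample's own diagonal entry
    elif len(rows) == n - 1:
        offset = 1  # rows start at the second sample; no diagonal stored
    else:
        raise ValueError("matrix has %d rows but there are %d samples" % (len(rows), n))
    pairs = []
    for k, row in enumerate(rows):
        if len(row) != k + 1:
            raise ValueError("row %d has %d values, expected %d" % (k + 1, len(row), k + 1))
        i = k + offset
        for j, value in enumerate(row):
            if i != j:  # skip self-comparisons when the diagonal is present
                pairs.append((ids[j], ids[i], float(value)))
    return pairs
```

Each returned tuple holds two (FID, IID) pairs and their coefficient, so spot-checking a few values against known samples is straightforward before building any plot.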

Denis

May 6, 2022, 1:45:24 PM
to plink2-users
Thank you so much for the clarification!

Best,
Denis
