Mismatched IDs

Edward Blake

Jul 15, 2024, 9:46:29 AM
to plink2-users

I have been stuck on this problem for a couple of days.

I have 1000+ Manta VCF files that were merged with Jasmine. Note that these are structural variants, not SNPs.

I used a function to extract all the IIDs, then created a .psam file with the required columns and the corresponding data assigned to each IID:
#FID IID SEX Phenotype Age

I then ran many checks to see whether anything was mismatched (nothing was).

After that, I created a keep file to analyze around 75% of the samples in the .psam file.

Then I proceeded to run the following:

# Hardcoded paths to the files

# Create the output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Convert VCF to PLINK2 format with sex information and handling pseudoautosomal regions
plink2 --vcf "${VCF_FILE}" \
--make-pgen \
--psam "${PSAM_FILE}" \
--allow-extra-chr \
--vcf-half-call m \
--split-par hg38 \
--keep "${KEEP_FILE}"

echo "Conversion complete. Output files are located at ${OUTPUT_PREFIX}"


PLINK v2.00a5.12LM AVX2 Intel (25 Jun 2024)    www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /homes/eblak01/rtest/sv_pipeline/input/jas/output.log.
Options in effect:
  --keep /homes/eblak01/rtest/sv_pipeline/input/jas/all_SVs_VCF-CaCon.psam
  --out /homes/eblak01/rtest/sv_pipeline/input/jas/output
  --psam /homes/eblak01/rtest/sv_pipeline/input/jas/all_SVs_VCF.psam
  --split-par hg38
  --vcf /SV_VCF/manta_results/manta_samples_merge/statistics/jasmine_merged_noreplicates_HLA_fixed_strands_50bp_sorted.vcf
  --vcf-half-call m

Start time: Mon Jul 15 07:36:34 2024
772687 MiB RAM detected, ~568796 available; reserving 386343 MiB for main
Using up to 80 threads (change this with --threads).
Error: Mismatched IDs between --vcf file and
End time: Mon Jul 15 07:36:34 2024
Conversion complete. Output files are located at /homes/eblak01/rtest/sv_pipeline/input/jas/output

This method works perfectly for my merged SNP VCF.

But for some reason, no matter what I try, there is always a mismatch for my Manta VCF file (merged with Jasmine). Please note that I have also tried this method with SURVIVOR-merged Manta VCF files, and a mismatch still occurs.

Could this problem be due to the fact that I am using Structural Variants instead of SNPs?

Is there some function that can give me a more detailed explanation on why it is mismatched?
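The error message shown above does not say which IDs differ, so one way to get more detail is to compare the VCF's sample list against the .psam directly. The following is a minimal sketch, not a plink2 or bcftools feature. `first_id_mismatch` is a hypothetical helper name. It assumes that the order of samples matters for the comparison (an assumption, not something confirmed in this thread), that the VCF sample IDs have been written one per line to a file (for example with `bcftools query -l`), and that only the IID column of the .psam needs to agree with the VCF sample ID.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of plink2 or bcftools): report the first
# position at which the VCF sample list and the .psam IID column disagree.
#   $1 = file with one VCF sample ID per line
#   $2 = .psam file (header line starting with '#', naming an IID column)
first_id_mismatch() {
  local vcf_ids="$1" psam="$2"
  awk '
    BEGIN { col = 2 }   # assumption: with no header, IID is the second field
    # First file: collect the VCF sample IDs in order, skipping blank lines.
    FILENAME == ARGV[1] { if (NF) vcf[++n] = $1; next }
    NF == 0 { next }
    # .psam header or comment line: locate the IID column if it is named.
    /^#/ { for (i = 1; i <= NF; i++) if ($i == "IID" || $i == "#IID") col = i; next }
    {
      m++
      if ($col != vcf[m]) {
        printf "position %d: VCF has %s, .psam has %s\n", m, (m <= n ? vcf[m] : "(none)"), $col
        found = 1
        exit 1
      }
    }
    END {
      if (found) exit 1
      if (m < n) {
        printf "position %d: VCF has %s, .psam has (none)\n", m + 1, vcf[m + 1]
        exit 1
      }
      print "sample IDs match in order"
    }
  ' "$vcf_ids" "$psam"
}

# Example usage (paths are placeholders):
#   bcftools query -l merged.vcf > vcf_ids.txt
#   first_id_mismatch vcf_ids.txt all_SVs_VCF.psam
```

The function stops at the first disagreement on purpose: if the .psam was sorted differently from the VCF, nearly every row would differ, and the first differing row is usually enough to see what happened.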


Christopher Chang

Jul 15, 2024, 8:46:38 PM
to plink2-users
What are the first 10 sample IDs in the VCF file, and the first 11 lines of the .psam?

Edward Blake

Jul 16, 2024, 3:34 AM
to Christopher Chang, plink2-users
First 10 IDs from my VCF:
(stats) eblak01@targetid-01:/scripts$ bcftools query -l jasmine_merged_noreplicates_HLA_fixed_strands_50bp_sorted.vcf

Head of the .psam file (note it's been reordered by IID):
(stats) eblak01@targetid-01:/scripts$ head all_SVs_VCF.psam
#FID IID SEX Phenotype Age
5 D1 1 3 40
2 D10 1 3 43
35 D100 2 3 67
D1000 D1000 1 1 34
D1001 D1001 2 1 -9
D1002 D1002 1 1 -9
D1003 D1003 1 1 32
D1004 D1004 1 2 63
D1005 D1005 1 1 47

The 10 IDs from the VCF, located in the .psam:
(stats) eblak01@targetid-01:/scripts$  awk 'NR==1 || /D799|D354|D706|D773|D321|D474|D401|D868|D108|D933/' all_SVs_VCF.psam

#FID IID SEX Phenotype Age
53 D108 1 1 33
D1080 D1080 1 1 53
D1081 D1081 1 1 -9
D1084 D1084 1 1 66
D1085 D1085 1 1 52
D1086 D1086 1 1 43
D321 D321 2 1 28
D354 D354 1 1 65
D401 D401 1 2 72
D474 D474 1 2 61
D706 D706 1 2 36
68 D773 1 3 52
D799 D799 1 2 57
D868 D868 2 1 -9
D933 D933 2 1 23
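Note that the awk pattern above matches any line containing one of the listed strings, which is why D1080 through D1086 appear alongside D108. A minimal sketch of an exact match on the IID column follows. `psam_rows_for_ids` is a hypothetical helper name, and it assumes the header is the first line and IID is the second whitespace-separated field, as in the .psam shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the header line plus only those .psam rows whose
# IID (assumed to be field 2) is exactly equal to one of the requested IDs.
#   $1    = .psam file
#   $2... = sample IDs to look up
psam_rows_for_ids() {
  local psam="$1"
  shift
  awk -v ids="$*" '
    # Build a lookup table from the space-separated ID list.
    BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
    # Keep the header (line 1) and rows whose IID is in the table.
    NR == 1 || ($2 in want)
  ' "$psam"
}

# Example usage:
#   psam_rows_for_ids all_SVs_VCF.psam D799 D354 D706 D773 D321 D474 D401 D868 D108 D933
```

Rows are printed in .psam order, not in the order the IDs were given, so this only shows whether each ID is present and what data it carries.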

