Issues with ROH analysis


Gabriel Garrido

Jul 9, 2024, 4:26:26 PM
to plink2-users

Hello, I want to perform an ROH analysis on the rat genomic data supplied in the Ensembl database (https://www.ensembl.org/info/data/ftp/index.html?redirect=no). This is to compare with a particular rat strain that's been recently sequenced, to see if there's a change in the distribution of homozygous zones.

The problem I'm having is that when I create the .bim, .bed, and .fam files using the following code:

plink --vcf rattus_norvegicus.vcf --make-bed --out rnor

I get the error:

PLINK v1.90b7.2 64-bit (11 Dec 2023)   www.cog-genomics.org/plink/1.9/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to rnor.log.
Options in effect:
  --make-bed
  --out rnor
  --vcf rattus_norvegicus.vcf

15733 MB RAM detected; reserving 7866 MB for main workspace.
Error: No samples in .vcf file.

I can circumvent this by adding the --allow-no-samples flag, but this produces an empty .fam file, which makes the ROH analysis error out:

plink --bfile rnor --allow-extra-chr --homozyg

PLINK v1.90b7.2 64-bit (11 Dec 2023)   www.cog-genomics.org/plink/1.9/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --allow-extra-chr
  --bfile rnor
  --homozyg

15733 MB RAM detected; reserving 7866 MB for main workspace.
9572703 variants loaded from .bim file.
Error: Nobody in .fam file.

I'm honestly a bit at a loss about what to do. The data in the .vcf I downloaded looks exactly like the data on the other datasets I have, on which I can run an ROH analysis with zero problems. As you've probably guessed, I'm a newbie at this, but I haven't been able to find anyone else with the same problem.

Chris Chang

Jul 9, 2024, 6:57:20 PM
to Gabriel Garrido, plink2-users
Please spell out what you mean by "looks exactly like the data on the other datasets I have". Did those other datasets have NO GENOTYPES?


Gabriel Garrido

Jul 9, 2024, 7:59:04 PM
to plink2-users
No, I mean that opening it with a text editor yields this sort of data structure:

#CHROM POS ID REF ALT QUAL FILTER INFO
1 37977 rs3318811150 C G . . EVA_4;TSA=SNV

I should've said that the other, ROH-compatible datasets look similar, like this:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT *SAMPLE NAME*
1 5929 . GT G 44.28 PASS AC=2;AF=1;AN=2;DP=2;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=39.5;QD=22.14;SOR=2.303;ANN=G|intergenic_region|MODIFIER|CHR_START-ENSRNOG00000065394|CHR_START-ENSRNOG00000065394|intergenic_region|CHR_START-ENSRNOG00000065394|||n.5930delT|||||| GT:AD:DP:GQ:PL 1/1:0,2:2:6:56,6,0

I'm thinking I'll have to reconstruct the VCF from the FASTA file provided by Ensembl, using GATK or a similar tool.

Chris Chang

Jul 10, 2024, 9:58:56 AM
to Gabriel Garrido, plink2-users
Yes, you need to backtrack and generate a VCF that has genotype calls, since your current one does not have them.
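
For anyone hitting the same error, a quick check before running --make-bed is to count the sample columns in the VCF header. The sketch below assumes an uncompressed, tab-delimited VCF; `count_vcf_samples` is a hypothetical helper name, not a PLINK command. A sites-only file such as the Ensembl one stops at INFO, while a file with genotype calls has a FORMAT column followed by one column per sample.

```shell
#!/bin/sh
# Count the sample columns in a VCF by inspecting its #CHROM header line.
# The first 8 columns (#CHROM..INFO) are fixed; genotype data adds a
# FORMAT column followed by one column per sample.
# count_vcf_samples is a hypothetical helper, not part of PLINK.
count_vcf_samples() {
    # $1: path to an uncompressed, tab-delimited VCF
    awk -F'\t' '
        /^#CHROM/ {
            n = NF - 9              # columns beyond #CHROM..INFO and FORMAT
            print (n > 0 ? n : 0)   # sites-only files report 0
            found = 1
            exit
        }
        END { if (!found) exit 1 }  # no #CHROM header line at all
    ' "$1"
}
```

Calling `count_vcf_samples rattus_norvegicus.vcf` on a file whose header ends at INFO should report 0, which is the situation behind "Error: No samples in .vcf file." A file like the second example in this thread, with FORMAT and a sample column, should report 1 or more.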
