Blast error

44 views
Skip to first unread message

Harry Andrews

unread,
Sep 8, 2025, 10:06:47 PM9/8/25
to dartR
Hi, I'm trying to blast my SNPs against a reference chromosome, but every time I try to run the command I get this error

Command line argument error: Argument "query". File is not accessible:  `/var/folders/t9/kykj7p990rl8yzmskgts36kw0000gn/T//RtmpmLapFB/fasta.input'
Error in if (file_size > 0) { : missing value where TRUE/FALSE needed

My reference .fasta files are in my working directory/project folder

Jose Luis Mijangos

unread,
Sep 10, 2025, 7:44:10 PM9/10/25
to dartR

Hi Harry,

To map your SNPs (dartseq data) to the dolphin reference genome and check where they fall along the chromosomes and whether they’re in coding or non-coding regions, you’ll need to download the following files:

- Latest reference genome: GCF_011762595.2_mTurTru1.mat.Y_genomic.fna.gz
- GFF file: GCF_011762595.2_mTurTru1.mat.Y_genomic.gff.gz

You can find both files here:

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/011/762/595/GCF_011762595.2_mTurTru1.mat.Y/ 

# Install the development version of dartR.base and dartR.popgen from GitHub
# (only needs to be run once, or whenever you want to update)
devtools::install_github("green-striped-gecko/dartR.base@dev")
devtools::install_github("green-striped-gecko/dartR.popgen@dev")
# Load the dartRverse package (which includes dartR.base)
library(dartRverse)
# Specify the path to your reference genome FASTA file
RefGenome <- "GCF_011762595.2_mTurTru1.mat.Y_genomic.fna.gz"
# Load your input genlight object
t1 <- your_genlight
# if necessary, rename your Trimmed sequence filed
t1$other$loc.metrics$TrimmedSequence <- t1$other$loc.metrics$TrimmedSequenceSnp
# Perform BLAST of each locus in t1 against the reference genome
# This annotates each locus with alignment metrics in t2$other$loc.metrics
t2 <- gl.blast(t1, ref_genome = RefGenome)
# Extract the chromosome accession (sacc) from the BLAST results
# and store it as a factor in a new column ‘chromosome’
t2$chromosome <- as.factor(t2$other$loc.metrics$sacc)
# Compute the absolute SNP position by adding the BLAST start coordinate (sstart)
# to the original relative position in the locus object
t2$position <- t2$other$loc.metrics$sstart + t2$position
# SNP density plot
p1 <- gl.plot.snp.density(
  x             = t2,
  bin.size      = 1e+06,
  min.snps      = 50,
  min.length    = 1e+06,
  color.palette = viridis::viridis,
  chr.info      = TRUE,
  plot.title    = "SNP density"
)

gff_file <- "GCF_011762595.2_mTurTru1.mat.Y_genomic.gff"
# returns any overlapping gene(s) for each SNP

r2 <- gl.find.genes.for.loci(
  x        = t2,
  gff.file = gff_file,
  loci     = locNames(t2)
)

Cheers,
Luis 

Harry Andrews

unread,
Sep 17, 2025, 12:55:57 AM9/17/25
to dartR
Hi, the "gl.find.genes.for.loci" command doesn't seem to be working correctly, it only finds genes for 9 loci, out of a dataset containing over 11,000 SNPs. I've created a CSV with locus info "all_info <- data.frame(TrimmedSeq = t2$other$loc.metrics$TrimmedSequence, CloneID = t2$other$loc.metrics$CloneID, LocNames = locNames(t2),Chr = as.character(t2$chromosome), Pos = t2$position)" "write.csv(all_info, "all_info.csv", row.names = FALSE)" based on steps outlined in another conversation, is there any way to compare this directly to my reference GFF?
Reply all
Reply to author
Forward
0 new messages