Blast error

39 views
Skip to first unread message

Harry Andrews

unread,
Sep 8, 2025, 10:06:47 PMSep 8
to dartR
Hi, I'm trying to blast my SNPs against a reference chromosome, but every time I try to run the command I get this error

Command line argument error: Argument "query". File is not accessible:  `/var/folders/t9/kykj7p990rl8yzmskgts36kw0000gn/T//RtmpmLapFB/fasta.input'
Error in if (file_size > 0) { : missing value where TRUE/FALSE needed

My reference .fasta files are in my working directory/project folder

Jose Luis Mijangos

unread,
Sep 10, 2025, 7:44:10 PMSep 10
to dartR

Hi Harry,

To map your SNPs (dartseq data) to the dolphin reference genome and check where they fall along the chromosomes and whether they’re in coding or non-coding regions, you’ll need to download the following files:

- Latest reference genome: GCF_011762595.2_mTurTru1.mat.Y_genomic.fna.gz
- GFF file: GCF_011762595.2_mTurTru1.mat.Y_genomic.gff.gz

You can find both files here:

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/011/762/595/GCF_011762595.2_mTurTru1.mat.Y/ 

# Install the development version of dartR.base and dartR.popgen from GitHub
# (only needs to be run once, or whenever you want to update)
devtools::install_github("green-striped-gecko/dartR.base@dev")
devtools::install_github("green-striped-gecko/dartR.popgen@dev")
# Load the dartRverse package (which includes dartR.base)
library(dartRverse)
# Specify the path to your reference genome FASTA file
RefGenome <- "GCF_011762595.2_mTurTru1.mat.Y_genomic.fna.gz"
# Load your input genlight object
t1 <- your_genlight
# if necessary, rename your Trimmed sequence filed
t1$other$loc.metrics$TrimmedSequence <- t1$other$loc.metrics$TrimmedSequenceSnp
# Perform BLAST of each locus in t1 against the reference genome
# This annotates each locus with alignment metrics in t2$other$loc.metrics
t2 <- gl.blast(t1, ref_genome = RefGenome)
# Extract the chromosome accession (sacc) from the BLAST results
# and store it as a factor in a new column ‘chromosome’
t2$chromosome <- as.factor(t2$other$loc.metrics$sacc)
# Compute the absolute SNP position by adding the BLAST start coordinate (sstart)
# to the original relative position in the locus object
t2$position <- t2$other$loc.metrics$sstart + t2$position
# SNP density plot
p1 <- gl.plot.snp.density(
  x             = t2,
  bin.size      = 1e+06,
  min.snps      = 50,
  min.length    = 1e+06,
  color.palette = viridis::viridis,
  chr.info      = TRUE,
  plot.title    = "SNP density"
)

gff_file <- "GCF_011762595.2_mTurTru1.mat.Y_genomic.gff"
# returns any overlapping gene(s) for each SNP

r2 <- gl.find.genes.for.loci(
  x        = t2,
  gff.file = gff_file,
  loci     = locNames(t2)
)

Cheers,
Luis 

Harry Andrews

unread,
Sep 17, 2025, 12:55:57 AM (8 days ago) Sep 17
to dartR
Hi, the "gl.find.genes.for.loci" command doesn't seem to be working correctly, it only finds genes for 9 loci, out of a dataset containing over 11,000 SNPs. I've created a CSV with locus info "all_info <- data.frame(TrimmedSeq = t2$other$loc.metrics$TrimmedSequence, CloneID = t2$other$loc.metrics$CloneID, LocNames = locNames(t2),Chr = as.character(t2$chromosome), Pos = t2$position)" "write.csv(all_info, "all_info.csv", row.names = FALSE)" based on steps outlined in another conversation, is there any way to compare this directly to my reference GFF?
Reply all
Reply to author
Forward
0 new messages