Re: Blast and SNP calling: Error: Error in if (file_size > 0) { : missing value where TRUE/FALSE needed.

64 views
Skip to first unread message

Md. Shahin Iqbal

unread,
Feb 27, 2023, 6:15:02 PM2/27/23
to da...@googlegroups.com

Hi
I have a DArT metadata mapped to an old reference genome. Now I am planning to align to a new reference genome. I am using R version 4.2.2 (2022-10-31 ucrt) on R studio.

library(dartR) version: 2.7.2
gl.install.vanilla.dartR(flavour = "dev")

res2 <- gl.blast(x="gl1", ref_genome = "Crystal.fasta", task = "megablast",
                Percentage_identity = 70,
                Percentage_overlap = 0.8,
                bitscore = 50,
                number_of_threads = 2,
                verbose = NULL)
I am getting this error:
Starting BLASTing

Command line argument error: Argument "query". File is not accessible:  `C:\Users\22657713\AppData\Local\Temp\Rtmp6zOoIn/fasta.input'
Error in if (file_size > 0) { : missing value where TRUE/FALSE needed

How can I fix the problems?


After blasting, how can I call SNP or any details pipeline about recalling SNP aligned to the new reference genome from DArT metadata?

Thanks in advance
Shahin

--





-----------------------------------------
Md. Shahin Iqbal
Scientific Officer (Plant Breeding)
Pulses Research Centre,
Bangladesh Agricultural Research Institute,
Ishwardi, Pabna, Bangladesh.
Mobile No: +880-1729991507

Jose Luis Mijangos

unread,
Feb 28, 2023, 7:51:37 PM2/28/23
to dartR
Hi Shahin,

Thank you for reporting this bug. The fix is available in the developing version of dartR which can be installed as below:

library(dartR)
gl.install.vanilla.dartR(flavour = "dev")

You can assign the updated chromosome and position information to SNPs as follows:

library(dartR)
# running blast
platypus_blast <- gl.blast(x= platypus.gl,ref_genome = 'chr_1_mOrnAna1.pri.v4.fa')
# assigning new chromosome to SNPs
platypus_blast$chromosome <- as.factor(platypus_blast$other$loc.metrics$sacc)
# assigning new position to SNPs
pos <-  platypus_blast$other$loc.metrics$SnpPosition + platypus_blast$other$loc.metrics$sstart
platypus_blast$position <- as.integer(pos)

Cheers,
Luis

Md. Shahin Iqbal

unread,
Mar 2, 2023, 1:07:46 AM3/2/23
to dartR
Hi Luis

Thanks for fixing the bug. It is now working.

Kind regards
Shahin

kw k

unread,
Sep 5, 2023, 7:43:52 PM9/5/23
to dartR
Hi Luis,

Thank you for providing the code to 'assign the updated chromosome and position information to SNPs.' However, I am still unclear about by the code you demonstrated, how the blasted chromosome and position information were updated in the SNP genelight file that we will be using for filtering and analysis. It seems that your code only copies the chromosome and position columns to the platypus_blast dataframe. Could you please guide me on how to incorporate the blasting results into the genelight file or suggest a method for filtering the loci dataset to align with a specific reference genome (gl.blast)?

Thank you,
Christy

Jose Luis Mijangos

unread,
Sep 5, 2023, 11:23:08 PM9/5/23
to dartR
Hi Christy,

The blast results from the function gl.blast are automatically merged into the genlight object as described in the function documentation:

"If the input is a genlight object: returns a genlight object with one hit per sequence merged to the slot $other$loc.metrics. If the input is a fasta file: returns a dataframe with one hit per sequence."

you can access the function documentation by typing in the R console a question mark followed by the function name, see for example code below, as described at the top of the home page of this google groups forum. 

?gl.blast

Cheers,
Luis 

kw k

unread,
Sep 6, 2023, 12:31:13 AM9/6/23
to dartR
Hi Luis,

Thank you for your prompt reply!

I checked both the input genelight object (i.e. platypus.gl in the example), and the output data frame (i.e. platypus_blast) after running the blast, I am afraid there is no $other$loc.metrics$sacc in both objects - I accessed the loc.metrics by platypus.gl@other$loc.metrics$sacc and platypus_blast$other$loc.metrics$sacc, what returned are 'NULL'. I am sure my gl.blasted ran properly, as I saw the 3 output reports in the tempory folder and they are making sense. 

The function documentation does not say how to isolate the subset of loci that is mapped by gl.blast for further analysis. Could you please guide me on how to access the incorporated blasted result (one hit per sequence) in the genelight, or how to isolate the subset of loci that is mapped by gl.blast?

Your generous help means a lot to me.

Cheers,
Christy

Jose Luis Mijangos

unread,
Sep 6, 2023, 2:23:56 AM9/6/23
to dartR
Hi Christy,

You have to provide a fasta file with the specific reference genome of your species using the parameter "ref_genome". 

Please find attached the fasta file I'm using as example in the code below (chromosome X4). After filtering you should end up with 19 SNPs in res_2. 

After you downloaded, unzipped and move it to your working directory and please try running again the function. 

Please let me know if it worked. 

to filter out those loci that were not mapped to the reference genome you can use:

One way to do it is by filtering on evalue or bitscore, see for example, the code below. You can get more information about these two values here:


library(dartR)
res <- gl.blast(x= platypus.gl,
                ref_genome = "chr_X4_mOrnAna1.pri.v4.fa")
               
res_2 <- gl.filter.locmetric(res,
                             metric = "evalue",
                             lower = min(res$other$loc.metrics$evalue,
                                         na.rm = TRUE),
                             upper = 1e-20)

Cheers,
Luis 
chr_X4_mOrnAna1.pri.v4.fa.gz
Reply all
Reply to author
Forward
0 new messages