WARNING: Cannot add sequence number 66112 (lcl|FTPD00000000.1) because it has zero-length.
[formatdb 2.2.22] FATAL ERROR: Fatal error when adding sequence to BLAST database.
Is there a way to get formatdb to ignore these "zero-length" sequences? makeblastdb gave a similar warning when I used it, but it simply skipped those sequences, so they didn't cause a problem.
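To see which records formatdb is objecting to before deciding how to handle them, a minimal sketch like the following may help. It assumes a plain-text FASTA file, and `list_empty_records` is a hypothetical helper name, not part of formatdb or BLAST. It prints the defline of every record that has no sequence once blank lines are ignored.

```shell
# Hypothetical helper: print the defline of every FASTA record that has no
# sequence residues. Blank lines, spaces, tabs and carriage returns are
# ignored, so a defline followed only by blank lines still counts as empty.
list_empty_records() {
  awk '
    { sub(/\r$/, "") }                 # tolerate CRLF line endings
    /^>/ {
      # A new defline: report the previous record if it collected nothing
      if (header != "" && seqlen == 0) print header
      header = $0
      seqlen = 0
      next
    }
    {
      # Count only non-whitespace characters on sequence lines
      gsub(/[ \t\r]/, "")
      seqlen += length($0)
    }
    END {
      # The last record has no following defline, so check it here
      if (header != "" && seqlen == 0) print header
    }
  ' "$1"
}
```

For example, `list_empty_records input.fasta` (where `input.fasta` is a placeholder for the uncleaned file) would list the offending deflines so they can be removed or fixed.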
Is it possible that getting no BLAST hits is the result of building the database with makeblastdb and then searching it with legacy BLAST?
Sorry if there are too many questions here. Let me know if I should upload anything else.
Thanks for the help!
...I did come across what seems to be the equivalent problem when using makeblastdb: it baulks at blank lines and at deflines without sequences in the FASTA file. If that is the problem here with formatdb, as it appears to be, it’s unclear to me why he didn’t also run into it with makeblastdb. Or maybe he did, but it didn’t throw an error, and his BLAST database was somehow corrupted, contributing to his lack of BLAST hits?
Anyway, I would suggest pre-cleaning the FASTA file before running either of those utilities. I used the following in bash to clean mine and it seemed to do the trick:
# get rid of blank lines ...
grep -v '^$' ./fungi2017.fna > ./fungi2017_noblanks.fna
# ... and deflines without sequences
awk -v RS=">" -v FS="\n" -v ORS="" ' { if ($2) print ">"$0 } ' ./fungi2017_noblanks.fna > ./fungi2017_noblanks_noempty.fna
# get rid of the intermediate file
rm ./fungi2017_noblanks.fna
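If the FASTA file came from Windows, its lines may end in CRLF, in which case a "blank" line is actually a lone carriage return and `grep -v '^$'` will not remove it. A single-pass variant that also strips carriage returns and reports how many empty records it dropped might look like this. `clean_fasta` is a hypothetical name, and the sketch assumes a plain-text FASTA file with one defline per record.

```shell
# Hypothetical helper: copy a FASTA file ($1) to $2, dropping blank lines,
# any lines before the first defline, and records whose defline has no
# sequence. Windows (CRLF) line endings are converted to Unix (LF).
clean_fasta() {
  awk '
    function flush() {
      # Write out the buffered record only if it has sequence lines
      if (header != "" && seq != "") printf "%s\n%s", header, seq
      else if (header != "") dropped++
    }
    { sub(/\r$/, "") }          # strip a trailing carriage return
    /^[ \t]*$/ { next }         # skip blank or whitespace-only lines
    /^>/ { flush(); header = $0; seq = ""; next }
    { seq = seq $0 "\n" }       # buffer sequence lines unchanged
    END {
      flush()
      # Portable way to write a message to stderr from awk
      if (dropped > 0) print dropped " empty record(s) dropped" | "cat 1>&2"
    }
  ' "$1" > "$2"
}
```

Usage would be something like `clean_fasta input.fasta input_clean.fasta`, with both file names as placeholders. Buffering each record until the next defline is what lets empty records be detected in one pass, at the cost of holding one record in memory at a time.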
Hope that helps!
formatdb -i output_noblanks/nifH2017_noblanks_noempty.fasta -p F -o T -n nifH_db_formatdb
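Since `-o T` asks formatdb to parse the sequence IDs and build an index on them, and the taxonomy assignment later maps hits back by ID, it may also be worth confirming that no ID occurs twice in the cleaned file. That duplicates cause trouble here is an assumption, not a confirmed diagnosis; `duplicate_ids` is a hypothetical helper that treats the first word of each defline (without the `>`) as the ID.

```shell
# Hypothetical helper: print each sequence ID (the first word of a defline,
# without the leading ">") that appears on more than one defline.
duplicate_ids() {
  awk '
    { sub(/\r$/, "") }                      # tolerate CRLF line endings
    /^>/ {
      split(substr($0, 2), words, " ")      # words[1] is the ID
      if (++seen[words[1]] == 2) print words[1]
    }
  ' "$1"
}
```

For example, `duplicate_ids output_noblanks/nifH2017_noblanks_noempty.fasta` should print nothing if every ID is unique.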
And then I ran assign_taxonomy.py and still got no BLAST hits, both with my own set of sequences and with a subset of the sequences I used to create the database:
assign_taxonomy.py -i Christian_nif_joined_seqs_Q30_all.fna -t output_noblanks/nifH2017_noblanks_noempty_accession_taxonomy.txt -m blast -b nifH_db_formatdb
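One cheap check is whether the IDs in the reference FASTA actually match the first column of the taxonomy file passed with `-t`. This sketch assumes the QIIME 1 style mapping file (one line per sequence, ID in the first tab-separated column, taxonomy in the second), and `ids_missing_taxonomy` is a hypothetical helper. A mismatch would not by itself explain zero BLAST hits; it only rules out one possible inconsistency between the two reference files.

```shell
# Hypothetical helper: print every FASTA ID ($1) that has no entry in the
# first column of a tab-separated ID-to-taxonomy mapping file ($2).
ids_missing_taxonomy() {
  awk -F '\t' '
    { sub(/\r$/, "") }                          # tolerate CRLF line endings
    FILENAME == ARGV[1] { tax[$1] = 1; next }   # first file read: taxonomy map
    /^>/ {
      split(substr($0, 2), words, " ")          # words[1] is the sequence ID
      if (!(words[1] in tax)) print words[1]
    }
  ' "$2" "$1"
}
```

For example, `ids_missing_taxonomy output_noblanks/nifH2017_noblanks_noempty.fasta output_noblanks/nifH2017_noblanks_noempty_accession_taxonomy.txt` should print nothing if every reference sequence has a taxonomy entry.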
Any ideas on how I should go about solving this, or how to troubleshoot what the problem with the database is?
Thanks,
Christian