[fai_load] fail to open FASTA index for the iteration 4

80 views
Skip to first unread message

xue....@gmail.com

unread,
Feb 27, 2017, 11:28:14 AM2/27/17
to EMIRGE users
Hi,

I meet a very strange error when running EMIRGE when performing the iteration step.


At the first 3 iterations, the program works fine, but suddenly there is no mapping reads in the 4th iterations, and the previous fasta reads just change to many Ns, which caused an error.

I extracted the mapping info here:

# reads processed: 500000
# reads with at least one reported alignment: 490840 (98.17%)
# reads that failed to align: 9160 (1.83%)

# reads processed: 500000
# reads with at least one reported alignment: 487633 (97.53%)
# reads that failed to align: 12367 (2.47%)
  self.posteriors[-1].data = self.posteriors[-1].data / denom[(self.posteriors[-1].col,)]  # index out denom with column indices from coo format.

# reads processed: 500000
# reads with at least one reported alignment: 481723 (96.34%)
# reads that failed to align: 18277 (3.66%)

# reads processed: 500000
# reads with at least one reported alignment: 473542 (94.71%)
# reads that failed to align: 26458 (5.29%)

# reads processed: 500000
# reads with at least one reported alignment: 34 (0.01%)
# reads that failed to align: 499966 (99.99%)
    self.cluster_sequences(consensus_filename)  # merges sequences that have evolved to be the same (USEARCH)

 
We could see at the mapping dropped from 96% to 0.01% in the 4th iteration.
Then I take a look at the fasta file for each iteration, I found that in the 3th iteration, the same fasta record just turn into many Ns:

└─[$]> head iteration_0/emirge/*/*.cons.fasta
==> iteration_0/emirge/iter.00/iter.00.cons.fasta <==
>KM454219.1.1366
AGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCCTTATGCATGCAAGTCGAAC
GAGCTTACCGGGACTTCGGTTCTGGGACGGCTAGTGGCAGACGGCTAAGTAACACGTAAG
CAACCAACCTTCCAGTGGGGGATAATCCGCCGAAAGGCGGTCTAATACCGCATACGATCG
CTCCTTGATGGGGAGTGATGAAAGACTTGTCGCTGGAAGACGGGCTTGCGGCCTATCAGG
TAGTTGGCGGGGTAACGGCCCACCAAGCCGAAGACGGGTACCTGGTCTGAGAGGACGATC
AGGCAGAAGGGGACTGAGACACGGCCCCTACTCCTACGGGAGGCAGCAGCAGGGAATCTT
GCGCAATGGGCGAAAGCCTGACGCAGCGACGCCGCGTGGGGGATGAAGGCCTTCGGGTCG
TAAACTCCTTTTCTGGGGGAAGATAATGACGGTACCCCAGGAATAAGCGACGGCTAACTA
CGTGCCAGCAGCCGCGGTAACACGTAGGTCGCAAGCGTTGTCCGGATTTACTGGGCGTAA

==> iteration_0/emirge/iter.01/iter.01.cons.fasta <==
>KM454219.1.1366
AGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCCTTATGCATGCAAGTCGAAC
GAGCTTACCGGGACTTCGGTTCTGGGACGGCTAGTGGCAGACGGCTAAGTAACACGTAAG
CAACCAACCTTCCAGTGGGGGATAATCCGCCGAAAGGCGGTCTAATACCGCATACGATCG
CTCCTTGATGGGGAGTGATGAAAGACTTGTCGCTGGAAGACGGGCTTGCGGCCTATCAGG
TAGTTGGCGGGGTAACGGCCCACCAAGCCGAAGACGGGTACCTGGTCTGAGAGGACGATC
AGGCAGAAGGGGACTGAGACACGGCCCCTACTCCTACGGGAGGCAGCAGCAGGGAATCTT
GCGCAATGGGCGAAAGCCTGACGCAGCGACGCCGCGTGGGGGATGAAGGCCTTCGGGTCG
TAAACTCCTTTTCTGGGGGAAGATAATGACGGTACCCCAGGAATAAGCGACGGCTAACTA
CGTGCCAGCAGCCGCGGTAACACGTAGGTCGCAAGCGTTGTCCGGATTTACTGGGCGTAA

==> iteration_0/emirge/iter.02/iter.02.cons.fasta <==
>KM454219.1.1366
AGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCCTTATGCATGCAAGTCGAAC
GAGCTTACCGGGACTTCGGTTCTGGGACGGCTAGTGGCAGACGGCTAAGTAACACGTAAG
CAACCAACCTTCCAGTGGGGGATAATCCGCCGAAAGGCGGTCTAATACCGCATACGATCG
CTCCTTGATGGGGAGTGATGAAAGACTTGTCGCTGGAAGACGGGCTTGCGGCCTATCAGG
TAGTTGGCGGGGTAACGGCCCACCAAGCCGAAGACGGGTACCTGGTCTGAGAGGACGATC
AGGCAGAAGGGGACTGAGACACGGCCCCTACTCCTACGGGAGGCAGCAGCAGGGAATCTT
GCGCAATGGGCGAAAGCCTGACGCAGCGACGCCGCGTGGGGGATGAAGGCCTTCGGGTCG
TAAACTCCTTTTCTGGGGGAAGATAATGACGGTACCCCAGGAATAAGCGACGGCTAACTA
CGTGCCAGCAGCCGCGGTAACACGTAGGTCGCAAGCGTTGTCCGGATTTACTGGGCGTAA

==> iteration_0/emirge/iter.03/iter.03.cons.fasta <==
>KM454219.1.1366
AGAGTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

==> iteration_0/emirge/iter.04/iter.04.cons.fasta <==

I attacted the full log file also.
I suspect it's because the reference database and USEARCH. I used SILVA_123.1_SSURef_Nr99_tax_silva_trunc.fasta as reference, and usearch v8.1.1861_i86linux32.

Does someone know why it happened and how to fix this problem?

Best regards,

Yaxin


iteration_0_emirge.log
Reply all
Reply to author
Forward
0 new messages