Hi,
I meet a very strange error when running EMIRGE when performing the iteration step.
At the first 3 iterations, the program works fine, but suddenly there is no mapping reads in the 4th iterations, and the previous fasta reads just change to many Ns, which caused an error.
I extracted the mapping info here:
# reads processed: 500000
# reads with at least one reported alignment: 490840 (98.17%)
# reads that failed to align: 9160 (1.83%)
# reads processed: 500000
# reads with at least one reported alignment: 487633 (97.53%)
# reads that failed to align: 12367 (2.47%)
self.posteriors[-1].data = self.posteriors[-1].data / denom[(self.posteriors[-1].col,)] # index out denom with column indices from coo format.
# reads processed: 500000
# reads with at least one reported alignment: 481723 (96.34%)
# reads that failed to align: 18277 (3.66%)
# reads processed: 500000
# reads with at least one reported alignment: 473542 (94.71%)
# reads that failed to align: 26458 (5.29%)
# reads processed: 500000
# reads with at least one reported alignment: 34 (0.01%)
# reads that failed to align: 499966 (99.99%)
self.cluster_sequences(consensus_filename) # merges sequences that have evolved to be the same (USEARCH)
We could see at the mapping dropped from 96% to 0.01% in the 4th iteration.
Then I take a look at the fasta file for each iteration, I found that in the 3th iteration, the same fasta record just turn into many Ns:
└─[$]> head iteration_0/emirge/*/*.cons.fasta
==> iteration_0/emirge/iter.00/iter.00.cons.fasta <==
>KM454219.1.1366
AGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCCTTATGCATGCAAGTCGAAC
GAGCTTACCGGGACTTCGGTTCTGGGACGGCTAGTGGCAGACGGCTAAGTAACACGTAAG
CAACCAACCTTCCAGTGGGGGATAATCCGCCGAAAGGCGGTCTAATACCGCATACGATCG
CTCCTTGATGGGGAGTGATGAAAGACTTGTCGCTGGAAGACGGGCTTGCGGCCTATCAGG
TAGTTGGCGGGGTAACGGCCCACCAAGCCGAAGACGGGTACCTGGTCTGAGAGGACGATC
AGGCAGAAGGGGACTGAGACACGGCCCCTACTCCTACGGGAGGCAGCAGCAGGGAATCTT
GCGCAATGGGCGAAAGCCTGACGCAGCGACGCCGCGTGGGGGATGAAGGCCTTCGGGTCG
TAAACTCCTTTTCTGGGGGAAGATAATGACGGTACCCCAGGAATAAGCGACGGCTAACTA
CGTGCCAGCAGCCGCGGTAACACGTAGGTCGCAAGCGTTGTCCGGATTTACTGGGCGTAA
==> iteration_0/emirge/iter.01/iter.01.cons.fasta <==
>KM454219.1.1366
AGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCCTTATGCATGCAAGTCGAAC
GAGCTTACCGGGACTTCGGTTCTGGGACGGCTAGTGGCAGACGGCTAAGTAACACGTAAG
CAACCAACCTTCCAGTGGGGGATAATCCGCCGAAAGGCGGTCTAATACCGCATACGATCG
CTCCTTGATGGGGAGTGATGAAAGACTTGTCGCTGGAAGACGGGCTTGCGGCCTATCAGG
TAGTTGGCGGGGTAACGGCCCACCAAGCCGAAGACGGGTACCTGGTCTGAGAGGACGATC
AGGCAGAAGGGGACTGAGACACGGCCCCTACTCCTACGGGAGGCAGCAGCAGGGAATCTT
GCGCAATGGGCGAAAGCCTGACGCAGCGACGCCGCGTGGGGGATGAAGGCCTTCGGGTCG
TAAACTCCTTTTCTGGGGGAAGATAATGACGGTACCCCAGGAATAAGCGACGGCTAACTA
CGTGCCAGCAGCCGCGGTAACACGTAGGTCGCAAGCGTTGTCCGGATTTACTGGGCGTAA
==> iteration_0/emirge/iter.02/iter.02.cons.fasta <==
>KM454219.1.1366
AGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCCTTATGCATGCAAGTCGAAC
GAGCTTACCGGGACTTCGGTTCTGGGACGGCTAGTGGCAGACGGCTAAGTAACACGTAAG
CAACCAACCTTCCAGTGGGGGATAATCCGCCGAAAGGCGGTCTAATACCGCATACGATCG
CTCCTTGATGGGGAGTGATGAAAGACTTGTCGCTGGAAGACGGGCTTGCGGCCTATCAGG
TAGTTGGCGGGGTAACGGCCCACCAAGCCGAAGACGGGTACCTGGTCTGAGAGGACGATC
AGGCAGAAGGGGACTGAGACACGGCCCCTACTCCTACGGGAGGCAGCAGCAGGGAATCTT
GCGCAATGGGCGAAAGCCTGACGCAGCGACGCCGCGTGGGGGATGAAGGCCTTCGGGTCG
TAAACTCCTTTTCTGGGGGAAGATAATGACGGTACCCCAGGAATAAGCGACGGCTAACTA
CGTGCCAGCAGCCGCGGTAACACGTAGGTCGCAAGCGTTGTCCGGATTTACTGGGCGTAA
==> iteration_0/emirge/iter.03/iter.03.cons.fasta <==
>KM454219.1.1366
AGAGTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
==> iteration_0/emirge/iter.04/iter.04.cons.fasta <==
I attacted the full log file also.
I suspect it's because the reference database and USEARCH. I used SILVA_123.1_SSURef_Nr99_tax_silva_trunc.fasta as reference, and usearch v8.1.1861_i86linux32.
Does someone know why it happened and how to fix this problem?
Best regards,
Yaxin