Dear ABySS-users,
We assembled bacterial contigs and submitted them to NCBI. Some of the
contigs contain runs of N's. NCBI has sent us the following queries
about them. Please help us resolve these queries.
[A] Did the assembly program combine the sequences into scaffolds
using runs of N's to represent gaps between ordered and oriented
contiguous sequences? Alternatively, did you randomly merge the
sequences into a single sequence (for example, maybe you just linked
the sequences together by size without using an assembly program)?
[B] Does every N in your sequence represent a gap?
Alternatively, does your sequence include single or short runs of N's
that represent ambiguous base calls? If not every N is a gap, what is
the minimum number of N's that represent a gap? In order for us to add
the assembly_gap features for you, we need to input a minimum gap size
to tell our software which N's to convert.
[C] In each gap, does the number of N's represent the
estimated gap size? Alternatively, are all or some of the gap sizes
unknown? If there are unknown gaps, please specify which ones. For
example, are all of the gaps of 100 N's unknown length?
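
For query [B], it may help to see how long the runs of N's in the
submitted contigs actually are. Below is a minimal Python sketch that
counts N-runs by length, assuming the contigs are in plain FASTA format.
The function name n_run_lengths and the example sequences are made up
for illustration; this is not part of ABySS and says nothing about how
ABySS itself chooses gap sizes.

```python
import re
from collections import Counter


def n_run_lengths(fasta_text):
    """Return a Counter mapping N-run length -> number of runs of that length."""
    counts = Counter()
    seq_parts = []

    def flush():
        # Join the wrapped lines of one record so runs spanning line breaks
        # are counted once, then tally every maximal run of N/n.
        seq = "".join(seq_parts)
        for match in re.finditer(r"[Nn]+", seq):
            counts[len(match.group())] += 1
        seq_parts.clear()

    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            flush()  # a new header ends the previous record
        elif line:
            seq_parts.append(line)
    flush()  # tally the last record
    return counts


# Hypothetical example data, not taken from the real assembly.
example = ">contig1\nACGTNACGT\nNNNNNNNNNN\nACGT\n>contig2\nACNNGT\n"

for length, n_runs in sorted(n_run_lengths(example).items()):
    print(f"{length}\t{n_runs}")
```

If the resulting table shows a clear split between short runs (one or
two N's) and long runs, the shortest length in the long group would be
a candidate for the minimum gap size, subject to confirmation of how
the assembler writes gaps.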
With regards,
M. Milner Kumar