What version of ABySS are you running? Is the duplicated region the full
length of the contig or a portion of the contig? Is the duplicated
region 100% identical, or are there any mismatches?
Cheers,
Shaun
The troubleshooting depends a great deal on whether the duplicated
regions are 100% identical or only very nearly identical. I'll wait to
hear back from you.
What value are you using for the parameter s (seed contig size)? If you
haven't specified a value, the default is 100.
Cheers,
Shaun
In the file ${name}-5.path, the first column is the contig ID, and the
rest of the line is the IDs of the subsequences that compose that
contig. Can you find the two lines for your two contigs and report them
here?
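
If it helps, a few lines of Python can pull those two lines out. This is
only a sketch: it assumes the fields are separated by whitespace,
read_paths is a hypothetical helper (not part of ABySS), and the two
example lines are made up for illustration rather than taken from a real
assembly.

```python
def read_paths(lines):
    """Map each contig ID to the list of component IDs on its line."""
    paths = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # First column is the contig ID, the rest is its path.
        paths[fields[0]] = fields[1:]
    return paths


# Toy input (hypothetical); for a real file, pass an open file handle.
example_lines = [
    "3661 2456+ 253- 2137-",
    "3649 1591+ 3537+ 3171-",
]
paths = read_paths(example_lines)
for contig_id in ("3661", "3649"):
    print(contig_id, " ".join(paths[contig_id]))
```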
Cheers,
Shaun
The two contigs have this sequence in common:
2456+ 253- 2137- 1430- 1794- 3539+ 2166+ 3542+ 3005- 2634- 1448- 986- 1427- 93N 1056- 734- 754- 2457- 658- 2072+ 1273- 247- 1097+ 1099+ 342+ 2295+ 3190- 1874+ 920+ 2601- 3190- 1874+ 1338- 1749- 3316-
This sequence is unique to 3661:
3490- 1260+ 3435+ 55+ 2602- 2207+ 1529- 3118+ 1565+ 99+ 2051+ 2626- 597+ 721+ 395+ 1033- 2770- 16+
and this sequence is unique to 3649 (reverse complement):
1526- 119- 3171+ 3537- 1591-
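
A comparison like this can also be scripted. The sketch below is not how
ABySS does it. It assumes a path is a list of IDs ending in + or -, that
reverse-complementing a path means reversing the order and flipping each
sign, and that gap elements such as 93N are left unchanged. The helper
names and the toy paths at the end are made up.

```python
from difflib import SequenceMatcher

FLIP = {"+": "-", "-": "+"}


def reverse_complement(path):
    """Reverse the path and flip each orientation; gaps (e.g. 93N) are kept."""
    return [p[:-1] + FLIP[p[-1]] if p[-1] in FLIP else p
            for p in reversed(path)]


def shared_run(a, b):
    """Longest contiguous run common to a and b, trying b in both strands."""
    best = []
    for candidate in (b, reverse_complement(b)):
        matcher = SequenceMatcher(None, a, candidate, autojunk=False)
        m = matcher.find_longest_match(0, len(a), 0, len(candidate))
        if m.size > len(best):
            best = a[m.a:m.a + m.size]
    return best


# Toy paths (hypothetical): the second is stored on the opposite strand.
path_a = ["7+", "1+", "2-", "3+"]
path_b = ["3-", "2+", "1-", "8+"]
print(shared_run(path_a, path_b))
```

Whatever is left of each path outside the shared run is the part unique
to that contig.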
What are the sizes of the contigs unique to 3649? You can find these
contigs in the ${name}-[345].fa files.
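
One way to look up those sizes is to count the bases in the FASTA
records directly. A minimal sketch, assuming the contig ID is the first
word of each header line; contig_lengths is a hypothetical helper and
the records below are toy data.

```python
def contig_lengths(fasta_lines, wanted):
    """Return {contig ID: sequence length} for the IDs in `wanted`."""
    lengths = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            contig_id = fields[0] if fields else None
            # Only track records whose ID was asked for.
            current = contig_id if contig_id in wanted else None
            if current is not None:
                lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


# Toy records (hypothetical); for a real file, pass an open file handle.
records = [">101", "ACGTACGTACGT", ">102", "ACGTA", ">103", "ACGT", "ACGT"]
sizes = contig_lengths(records, {"101", "103"})
print(sizes)
```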
As a last resort, you could edit the ${name}-5.path file manually to
remove the duplicated sequence and run MergeContigs to generate a new
${name}-contigs.fa file.
Cheers,
Shaun
I've seen this behaviour in ABySS before. As Shaun said, there are
specific parts of the contigs that are unique and differ in the path,
I guess. I've seen some cases where you have almost the same contig
but in reverse complement. Since there is no strand-specific
sequencing, shouldn't these contigs be the same? Of course, there are
differences, but they could be due to sequencing errors or maybe true
SNPs...
The only thing is that here we have a team for manual curation, but
as Anthony said, it would be great to report these things, or perhaps
to have a tool or script to perform the filtering.
Cheers.
Paired-end reads linking the smaller contig 1591 to a larger contig
caused the smaller contig to be extended and the larger contig to be
duplicated. Reducing this sort of duplication is an active area of
development for ABySS.
The file ${name}-contigs.dot lists overlapping contigs. For example in
the following line, the contigs 10140 and 10455 overlap by 2516 bp.
"10140+" -> "10455+" [d=-2516]
How long are your reads, and what is your coverage depth?
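
The overlap lines in the .dot file described above can be read with a
short script. A minimal sketch, assuming every edge line looks like the
example and that a negative d is the overlap in bp; parse_edge is a
hypothetical name, and any other attributes inside the brackets are
ignored by this pattern.

```python
import re

# "source" -> "target" [... d=<distance> ...]
EDGE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"\s*\[.*?\bd=(-?\d+)')


def parse_edge(line):
    """Return (source, target, distance), or None if the line is not an edge."""
    m = EDGE.search(line)
    if m is None:
        return None
    return m.group(1), m.group(2), int(m.group(3))


edge = parse_edge('"10140+" -> "10455+" [d=-2516]')
if edge is not None:
    source, target, distance = edge
    overlap = max(0, -distance)  # negative distance means the contigs overlap
    print(source, target, overlap)
```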
Cheers,
Shaun
k=31 is quite small for 75-bp reads with 25x coverage. Have you tried
larger values of k? Say around 50?
Cheers,
Shaun