abyss "best practices"?

Sven

Oct 20, 2015, 4:35:58 AM

to ABySS

Hi,

I was wondering what the "best practices" are for data pre-processing for whole-genome assemblies with abyss.

E.g. some assemblers require the fastq data to be unprocessed (=raw), because quality clipping/error correction and/or duplicate removal is done within the assembly process itself.

What about abyss? What are the requirements/recommendations for (Illumina) input data?

best,
Sven

Ben Vandervalk

Oct 20, 2015, 12:44:12 PM

to Sven, ABySS

Hi Sven,

There are no official best practices, but we (the ABySS team) tend to get good results by doing one or more of the following:

* quality trimming with the abyss-pe `q` option, typically with `q=15` (we do this mostly when we want to reduce memory usage)

* merge overlapping read pairs with `abyss-mergepairs`

* merge non-overlapping read pairs with `konnector`

We don't normally use error correction tools for preprocessing, since IMO ABySS has good internal algorithms for error correction.

For post-processing, we often use `abyss-sealer` for gap filling.

Common gotchas for `abyss-pe` parameter settings:

* minimum alignment length `l` is too high (yielding poor paired-end alignments in the contig and scaffolding stages) -- by default `l` is assigned the same value as `k`, which is often too high, since `l` specifies the minimum length of a perfect match required in both reads of a pair. Setting `l=40` or `l=50` is typically a good choice.

* minimum sequence length cutoff `s` for contig and scaffolding stages is too low. According to Shaun Jackman (ABySS guru), it should be at least 2*k. I think it is `s=200` by default, whereas I normally set it to `s=1000` to filter out some junk.

I plan to fix those default parameter settings in the next ABySS release.
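
As a rough illustration of the settings above (not an official recipe), here is a small Python sketch that assembles an `abyss-pe` command line with `q=15`, `l=40` and `s=1000`, and refuses an `s` below 2*k. The function name, the assembly name `bird`, the value `k=64`, and the read file names are placeholders assumed for the example; the command is only printed, not run.

```python
import shlex


def abyss_pe_command(name, k, reads, q=15, l=40, s=1000):
    """Build an abyss-pe argument list (hypothetical helper, nothing is executed)."""
    # Rule of thumb from this thread: the length cutoff s should be at least 2*k.
    if s < 2 * k:
        raise ValueError(f"s={s} is below 2*k={2 * k}")
    return [
        "abyss-pe",
        f"name={name}",
        f"k={k}",
        f"q={q}",  # quality trimming threshold
        f"l={l}",  # minimum alignment length
        f"s={s}",  # minimum sequence length for the contig/scaffold stages
        f"in={' '.join(reads)}",  # space-separated list of read files
    ]


# Placeholder inputs, assumed for illustration only.
cmd = abyss_pe_command("bird", 64, ["reads_1.fq.gz", "reads_2.fq.gz"])
print(shlex.join(cmd))
```

Building the arguments as a list and quoting them with `shlex.join` keeps the space-separated `in=` value intact when the line is pasted into a shell.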

- Ben


Sven

Oct 20, 2015, 2:46:37 PM

to ABySS, sir.sv...@gmail.com

Hi Ben,

thanks for the insights into how you guys work with abyss .. :-)
I'll check the suggestions with my test dataset ...

best regards,
Sven

Sven

Oct 21, 2015, 7:28:01 AM

to ABySS, sir.sv...@gmail.com

Hi Ben,

while trying to apply your suggestions I noticed that I need some clarification on the three steps you mentioned.
I know this is very basic stuff, but I still need to get a feeling for abyss and its friends ;-)

(1) quality trimming
(2) merge overlapping read pairs
(3) merge non-overlapping read pairs

I am especially unsure about steps (2) and (3). Merging of reads should take place before the assembly, correct?
So the above mentioned steps could also be read as "Sometimes we do (1), mainly for reducing memory consumption, in other cases we do (2) and (3) before the assembly."?
Assuming quality clipping is done on unmerged reads..?

There's little information about the abyss-* tools and their application within an assembly pipeline. Or at least I haven't found any docs addressing this issue. Hints and pointers are welcome ;-)

thanks for your patience,
Sven

Ben Vandervalk

Oct 21, 2015, 1:04:51 PM

to Sven, ABySS

Hi Sven,

Your understanding is correct -- read merging should be done before assembly and quality trimming is generally pointless on merged reads (the read merging tools clean up most of the sequencing errors as a side effect of the merging). Merging overlapping reads prior to assembly is standard practice these days and there are many tools to do it (abyss-mergepairs, PEAR, COPE, FLASh, etc.). Merging non-overlapping reads (e.g. `konnector`, `GapFiller` by Nadalin et al.) is more ambitious and `konnector` is probably the best in that category currently (IMHO as one of the authors).

You are right that the documentation of the ABySS tools could be a lot better. Right now, the README.md and doc/abyss-pe.1 (man page) are the main source of documentation for the main assembly pipeline and the `--help` output is the main source of documentation for the individual tools. Sealer has a nice README with examples in the `Sealer` subdirectory. If you are planning to try konnector, it would probably help to have a look through the papers as it is a complex tool with many parameters [1][2].

- Ben

[1] http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6999126
[2] http://www.biomedcentral.com/1755-8794/8/S3/S1

Sven

Oct 22, 2015, 2:29:58 AM

to ABySS, sir.sv...@gmail.com

Hi Ben,

thanks for your assistance.

Our insert sizes are usually larger than 2x the read length (currently I am working with a bird genome dataset; the PE libs have 500 and 1000bp inserts, sequenced with 2x100bp runs).
So the fraction of merged reads is only about 10% ... nevertheless I can use abyss-mergepairs to a) get *some* merged pairs and b) quality clip the non-merged reads (`--trim-quality=N`).
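
A quick back-of-the-envelope check of why so few pairs merge, as a minimal Python sketch. It assumes that "insert size" means the full fragment length including both reads; the function name is made up for the example.

```python
def pair_overlap(read_length, fragment_length):
    """Nominal overlap in bp between the two reads of a pair.

    A positive value means the reads overlap by that many bases; a negative
    value means there is an unsequenced gap of that size between them.
    Assumes fragment_length covers both reads (an assumption of this sketch).
    """
    return 2 * read_length - fragment_length


# Libraries mentioned in this thread: 2x100bp reads, 500bp and 1000bp inserts.
for fragment in (500, 1000):
    overlap = pair_overlap(100, fragment)
    if overlap > 0:
        print(f"{fragment}bp fragment: reads overlap by {overlap}bp")
    else:
        print(f"{fragment}bp fragment: gap of {-overlap}bp between the reads")
```

Under that assumption the nominal overlap is negative for both libraries, so only pairs from fragments much shorter than the nominal size can be merged by overlap; the rest would need a tool that fills the gap between the reads.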

Thanks for the pointers to some more reading stuff :-)

best,
Sven

Sven

Oct 22, 2015, 4:14:31 AM

to ABySS, sir.sv...@gmail.com

well, ok, using konnector should do the trick .. I will test ..

best,
Sven
