You can use abyss-pe with single-end reads like so:
abyss-pe name=assembly k=$k n=10 v=-v se=input.bam
Your coverage is 10x, which is low:
$ calc 16e6*50/80e6
10
If you consider that only 58% of your reads map, it's even lower (5.8x).
Assembly requires at least 20x coverage for reasonable results. 50x or
more is better.
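
For reference, the coverage arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes that 16e6 is the number of reads, 50 is the read length in bp, and 80e6 is the genome size in bp.

```python
def estimate_coverage(num_reads, read_length, genome_size, mapped_fraction=1.0):
    """Expected fold coverage: reads * read length / genome size.

    mapped_fraction scales the result down to count only reads that map.
    All argument names are illustrative, not part of ABySS.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length * mapped_fraction / genome_size


# Numbers taken from the calculation above (assumed meaning: reads, bp, bp).
raw = estimate_coverage(16e6, 50, 80e6)
mapped_only = estimate_coverage(16e6, 50, 80e6, mapped_fraction=0.58)
print(f"all reads: {raw:.1f}x, mapped reads only: {mapped_only:.1f}x")
```

Swapping in a different target size (for example a transcriptome rather than the whole genome) changes the estimate proportionally.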
With low-coverage high-quality data, you could try an overlap assembler
(rather than a de Bruijn assembler) such as String Graph Assembler
(SGA):
https://github.com/jts/sga
You'll want to try adjusting the minimum overlap parameter.
Cheers,
Shaun
Most likely, the smaller assemblies are subsets of the larger
assemblies, and there will be nothing (or very little) to be gained by
merging these assemblies.
If you do try an overlap assembler as I suggested in my previous email,
you could try assembling the reads along with the ABySS contigs. I'd be
curious to hear of your experience if you try this approach.
Cheers,
Shaun
On Tue, 2011-03-22 at 21:59 -0700, bioinfo wrote:
Ah, I didn't realize that you had RNAseq data. We have a separate
software package for that, Trans-ABySS (which uses ABySS). It merges the
assemblies using BLAT.
http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss
and the discussion group:
https://groups.google.com/forum/#!forum/trans-abyss
Those N50 values sound very low, though. Did you calculate them using
abyss-fac (included with ABySS)?
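
If you want to sanity-check the numbers by hand, here is a minimal sketch of the textbook N50 definition: the contig length at which the running total, counted from the longest contig down, first reaches half of the assembly size. The function name is hypothetical, and abyss-fac may filter out short contigs before computing its statistics, so its output can differ from this.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list).

    Textbook definition only; not a reimplementation of abyss-fac.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings us to half the total bases.
        if 2 * running >= total:
            return length
    return 0


# Toy example: total is 54, and 10 + 9 + 8 = 27 reaches half of it.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```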
Cheers,
Shaun
On Wed, 2011-03-23 at 12:35 -0700, tsuc...@gmail.com wrote:
> Thanks Shaun for the reply!!
> Actually, the smaller assemblies have an N50 about double that of the
> larger assemblies. For example, for k=20, my N50 is just 24, whereas
> for k approaching 35, the N50 goes up to 45. Since N50 is not a
> reliable metric for judging assemblies of RNAseq data, I am not sure
> which k value is appropriate. At the same time, the larger assemblies
> have very short contigs.
>
> Actually, our coverage is not 10x, as you stated in your earlier
> email. The genome size is 80 Mb, but the genes may total only around
> 8-10 Mb. Since this is RNAseq data, my assumption is that our coverage
> is well over 50x. I will certainly try an overlap assembler and see
> how it works. Do you know of any good assemblers that do this type of
> job specifically?
>
> Thanks
>
> Sucheta
You can use abyss-pe for assembly of single-end data like so:
abyss-pe name=${name} k=${k} E=0 n=10 v=-v se=input.bam
Can you report the results of running abyss-fac on your contigs?
You're best off contacting the Trans-ABySS team on their discussion
forum for Trans-ABySS-specific issues.
https://groups.google.com/forum/#!forum/trans-abyss
Cheers,
Shaun
You can pass multiple files to abyss-fac:
abyss-fac k??/assembly-1.fa
The FASTA header output by ABySS is
>ID LENGTH COVERAGE
COVERAGE is the total k-mer coverage of the contig, which is not the
same as read coverage.
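
As an illustration, the header can be split into its three fields like this. The names below are hypothetical. The mean k-mer coverage step is an assumption on my part: it divides the total by the number of k-mers in the contig (LENGTH - k + 1), so check it against the ABySS documentation for your version.

```python
from typing import NamedTuple


class ContigHeader(NamedTuple):
    contig_id: str
    length: int
    kmer_coverage: int  # total k-mer coverage, not read coverage


def parse_abyss_header(line):
    """Parse a '>ID LENGTH COVERAGE' FASTA header line."""
    fields = line.strip().lstrip(">").split()
    if len(fields) < 3:
        raise ValueError(f"expected '>ID LENGTH COVERAGE', got: {line!r}")
    return ContigHeader(fields[0], int(fields[1]), int(fields[2]))


def mean_kmer_coverage(header, k):
    """Assumed formula: total k-mer coverage / (LENGTH - k + 1)."""
    num_kmers = header.length - k + 1
    if num_kmers <= 0:
        raise ValueError("contig is shorter than k")
    return header.kmer_coverage / num_kmers


# Made-up header for illustration only.
header = parse_abyss_header(">12 100 810")
print(header, mean_kmer_coverage(header, k=20))
```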
The ${name}-bubbles.fa file contains variant sequences. You can align
these sequences back to your contigs.
With RNAseq data, small values of k better assemble less-expressed
transcripts, and larger values of k better assemble more-expressed
transcripts. For this reason, rather than picking a single value of k, I
recommend merging the assemblies. Trans-ABySS uses BLAT to merge the
assemblies.
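
To produce one assembly per value of k for merging, a small script can generate the abyss-pe command lines. This is a rough sketch under some assumptions: the helper names and the k range are made up, each assembly goes in its own k?? directory (matching the abyss-fac glob above), and it relies on abyss-pe accepting make's -C option to change directory. It only prints the commands; it does not run them.

```python
import os


def abyss_pe_command(k, reads, name="assembly"):
    """Build an abyss-pe command for single-end reads at one value of k.

    Assumes a directory named k<k> already exists (e.g. created with mkdir).
    The read path is made absolute because -C changes the working directory.
    """
    return [
        "abyss-pe", "-C", f"k{k}",
        f"name={name}", f"k={k}", "E=0", "n=10", "v=-v",
        f"se={os.path.abspath(reads)}",
    ]


def sweep_commands(k_values, reads, name="assembly"):
    """One command per k; the choice of k values is up to you."""
    return [abyss_pe_command(k, reads, name) for k in k_values]


# Illustrative range only: k = 20, 25, 30, 35.
for command in sweep_commands(range(20, 36, 5), "input.bam"):
    print(" ".join(command))
```

The resulting k??/assembly-1.fa files can then be passed to abyss-fac together, and to Trans-ABySS for merging.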
Cheers,
Shaun