running ABySS


Seta

Apr 5, 2011, 10:03:32 AM
to ABySS
Hello,

Just checking that I correctly understood what I read on this group:

1) You cannot do parallel runs with single-end reads
2) You can use the "abyss-pe name=assembly k=28 n=10 in=name.fastq
assembly-3.fa" command to assemble single end data
3) BUT you can also use abyss -k28 name.fastq -o name_contig.fa to
assemble single end data.

Is this correct? If so, what is the difference between the last two
commands? And how can you check whether ABySS uses the sparse hash?
And what does it mean when the %memory used to run ABYSS drops after
half an hour, and how can this be solved?

Thanks for your help.

Rod Docking

Apr 5, 2011, 12:50:30 PM
to Seta, ABySS
Hi Seta:


> 1) You cannot do parallel runs with single-end reads

I’m not quite sure what you mean here.  You can certainly run the parallel assembler (ABYSS-P) using only single end reads.  Indeed, the first stage of the assembly process (i.e., the single end stage) doesn’t discriminate between single-end and paired-end reads – it’s all k-mers at that point.


> 2) You can use the "abyss-pe name=assembly k=28 n=10 in=name.fastq
> assembly-3.fa" command to assemble single end data

This is correct.  If you run your same command with the additional “--recon” argument (a dry run), you can see that ABYSS, AdjList, PopBubbles, and MergeContigs will be run.


> 3) BUT you can also use abyss -k28 name.fastq -o name_contig.fa to
> assemble single end data.

This is also correct!  In this case you are directly calling the single-end assembler (the executable name should be upper case: “ABYSS”).  The difference between the two commands is that in the first, you’re calling the paired-end pipeline, but only specifying enough arguments for the first few steps.  In the second case, you’re calling the single-end assembler directly.

> And how can you check whether ABySS uses the sparse hash?

When you configured ABySS, you should have seen messages like:

checking google/sparse_hash_map usability... yes
checking google/sparse_hash_map presence... yes
checking for google/sparse_hash_map... yes

In addition, if sparsehash wasn’t found, you’ll see something like:

warning: ABySS should be compiled with Google sparsehash to
    reduce memory usage. It may be downloaded here:
    http://code.google.com/p/google-sparsehash
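
If the configure messages have already scrolled away, one option is to save them on the next configure run and search the saved copy. This is only a sketch: the file name `configure.out` and the helper `check_sparsehash` are made up for illustration, and it assumes the output was captured with something like `./configure 2>&1 | tee configure.out`.

```shell
# Report whether a saved copy of configure's output shows that the
# google/sparse_hash_map header was found.
# Assumption: $1 is a file holding configure's terminal output
# (hypothetical default name: configure.out).
check_sparsehash() {
    out="${1:-configure.out}"
    # -F matches the message literally, so the dots are not treated as wildcards.
    if grep -qF 'checking for google/sparse_hash_map... yes' "$out"; then
        echo "sparsehash: found"
    else
        echo "sparsehash: not found"
    fi
}
```

Calling `check_sparsehash configure.out` prints one of the two messages; "not found" corresponds to the case where configure prints the warning quoted above.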

> And what does it mean when the %memory used to run ABYSS drops after a
> half an hour, and how can this be solved?

Again, I’m not quite sure I understand you here – usually memory usage dropping is a good thing!  Does the assembly process fail or did you see any error messages?

Hope that helps,
Rod


--
Rod Docking
Canada's Michael Smith Genome Sciences Centre
rdoc...@bcgsc.ca
www.bcgsc.ca

Seta

Apr 5, 2011, 2:53:17 PM
to ABySS
Hello Rod,

Thank you for your help! I'm only a bit confused about the last part,
on memory usage:
Whenever I start running ABySS on my Mac, it uses almost 95-100% of the
working memory for the first half hour. However, this drops to 2-3%
after 30 minutes. From then on nothing really seems to happen, even if
I let it run for more than 12 hours. I wonder how this can be and how I
can solve this problem...

Cheers,
Seta

Rod Docking

Apr 5, 2011, 3:05:37 PM
to Seta, ABySS
Hi Seta:

    Could you report the output of running ABySS with verbose output enabled?  e.g.:

    ABYSS -v -k28 name.fastq -o name_contig.fa

    I'm curious which stage is taking so long.  Could you also confirm that you're using the most recent release (ABySS 1.2.6)?

Cheers,
Rod  

Seta

Apr 5, 2011, 3:53:02 PM
to ABySS
Hi,
I'm using the most recent release, and the last thing the verbose
output tells me before it slows down is:
Read 3000000 reads. Hash load: 68055256 / 74066549 = 0.919 using 0 B
Read 3100000 reads. Hash load: 69908696 / 74066549 = 0.944 using 0 B
Read 3200000 reads. Hash load: 71744388 / 74066549 = 0.969 using 0 B
Read 3300000 reads. Hash load: 73649464 / 74066549 = 0.994 using 0 B


Rod Docking

Apr 5, 2011, 4:59:35 PM
to Seta, ABySS
Hi Seta:

    It looks like you're running out of memory - how much RAM do you have?  You might have to move to a machine with more memory available.  Alternatively, you could try assembling at a different value for k.
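
To watch for this on later runs, the load factor can be pulled out of the verbose output. The sketch below assumes the `ABYSS -v` output was saved to a file and uses the "Hash load: used / total = ratio" line format shown earlier in this thread; the helper name `last_hash_load` and the file name `abyss.log` are made up for illustration.

```shell
# Print the most recent hash load factor found in saved ABYSS -v output.
# Assumption: $1 is a file (e.g. abyss.log, a hypothetical name) with lines like
#   Read 3300000 reads. Hash load: 73649464 / 74066549 = 0.994 using 0 B
last_hash_load() {
    awk '/Hash load:/ {
             # The ratio is the field right after the "=" sign.
             for (i = 1; i < NF; i++) if ($i == "=") load = $(i + 1)
         }
         END { if (load != "") print load }' "$1"
}
```

In the log posted above, the last reported value is 0.994, reached just before the run stalled.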

Regards,
Rod

Seta

Apr 6, 2011, 1:40:02 AM
to ABySS
I have 4 GB of RAM... isn't that enough?

Rod Docking

Apr 6, 2011, 1:06:59 PM
to Seta, ABySS
Hi Seta:

    We typically do our assemblies on machines with much more RAM than that.  The amount of RAM required will depend on how many reads you're trying to assemble, the size and complexity of the genome, and the value of k that you choose.  Can you give me an idea of how many reads you're trying to assemble?  
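
A quick way to get the read count is sketched below. It assumes an uncompressed FASTQ file with exactly four lines per read (no wrapped sequence lines); the helper name `count_fastq_reads` is made up for illustration.

```shell
# Count reads in a FASTQ file.
# Assumption: uncompressed FASTQ with exactly 4 lines per record.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}
```

For the file in this thread that would be `count_fastq_reads name.fastq`.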

  You could also try re-compiling ABySS with a smaller maximum k-mer size, which reduces memory usage, using an argument to configure like:
./configure --enable-maxk=32

Good luck,
Rod