ABySS Assembly

Meaghan Pimsler

May 25, 2012, 10:20:14 AM
to ABySS
Hi everyone. Like others, I am a total noob, just about to start the
third year of my PhD. I am doing a de novo transcriptome assembly of
a non-model fly that has no reference genome, on a cluster, using
Illumina HiSeq paired-end reads. The reads are all 100 bp long.

I thought I had figured things out: ABySS compiled properly, with
the Boost libraries and MPI. However, I don't think ABySS is running
properly, because I am not getting a contigs.fa file. This might be
because I am using version 1.3.2 (so I can use Trans-ABySS down the
line), or maybe it's the options I'm using. Below I have copied the
abyss-pe command I've been using. The data files are 42 GB each
(forward and reverse), and I have not added a /1 or /2 to the read
names. I'm trying to run it in parallel on 8 CPUs on a single node.

    abyss-pe OVERLAP_OPTIONS=--no-scaffold SIMPLEGRAPH_OPTIONS=--no-scaffold \
      -j8 k=25 n=10 q=20 c=100 \
      in='/path/to/BA3_T_F_forward.fastq /path/to/BA3_T_F_reverse.fastq' \
      name=/path/to/BA3_T_F

The output files I'm getting are:

BA3_T_F_5-1.adj
20Mb BA3_T_F_5-1.fa
64Kb BA3_T_F_5-1.path
3.5Mb BA3_T_F_5-2.adj
116Kb BA3_T_F_5-2.path
3.1Mb BA3_T_F_5-3.adj
19Mb BA3_T_F_5-3.fa
20b BA3_T_F_5-3.sam.gz
2.8Mb BA3_T_F_5-bubbles.fa
261Kb BA3_T_F_5-indel.fa

Any explanation of these files, or of why I'm not getting a contigs.fa
output, would be greatly appreciated. I am hoping to get reasonable
output so I can start creating multi-k assemblies to feed into
Trans-ABySS, but first ABySS needs to be working.

Shaun Jackman

May 25, 2012, 5:52:18 PM
to Meaghan Pimsler, ABySS
Hi Meaghan,

The -j8 should be j=8. That only affects the number of CPUs, though.
Unless you only want the most highly expressed transcripts, you probably don’t want to add the c=100 parameter.
Neither of these is the cause of your trouble, though.
Add v=-v to your command line to ask for verbose logging and post the resulting log.

Cheers,
Shaun
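Applied to the original command line, the three suggestions above (j=8 in place of -j8, no c= parameter, and v=-v for verbose logging) give roughly the following. This is only a sketch: run_abyss is a hypothetical wrapper, not part of ABySS, the other parameters are kept unchanged from the first post, and the paths are that post's placeholders.

```shell
#!/bin/sh
# Hypothetical wrapper around abyss-pe with the suggested changes applied:
#   j=8 replaces -j8, c=100 is dropped, v=-v turns on verbose logging.
# The remaining options are copied unchanged from the original command.
# Arguments: forward FASTQ, reverse FASTQ, output name prefix.
run_abyss() {
  fwd=$1
  rev=$2
  name=$3
  abyss-pe j=8 k=25 n=10 q=20 v=-v \
    OVERLAP_OPTIONS=--no-scaffold \
    SIMPLEGRAPH_OPTIONS=--no-scaffold \
    in="$fwd $rev" \
    name="$name"
}
```

For example, run_abyss /path/to/BA3_T_F_forward.fastq /path/to/BA3_T_F_reverse.fastq /path/to/BA3_T_F > abyss.log 2>&1 keeps the verbose log in a file (the name abyss.log is arbitrary) that can then be posted to the list. The in value is quoted so that both FASTQ paths reach abyss-pe as a single in= argument, as in the original command.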

Meaghan Pimsler

May 25, 2012, 8:40:30 PM
to ABySS
Awesome, Mr. Jackman! Thank you! I'll let you know how it goes with
these changes.

Regards,
Meaghan

Meaghan Pimsler

May 27, 2012, 12:29:53 PM
to ABySS
Dear Mr. Jackman-

I am disappointed to report that it did not work. I called the
program with the suggested changes (j=8, and c=10 instead of c=100).

Based on the output of

    qstat -f <job number>

the main issue is that the virtual memory used was higher than the RAM
on the node, and so

    resources_used.cput = 00:00:00

I changed the job script to run on only one CPU on the node, and
ABySS is now running with the same files. This is likely an
infrastructure issue more than anything else.

Thanks for your advice!

Regards,
Meaghan
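The qstat -f check described above can be narrowed down to the resource lines. A minimal sketch, assuming Torque/PBS-style field names (Resource_List.* for what the job requested, resources_used.* for what it has consumed); job_resources is a hypothetical helper name.

```shell
#!/bin/sh
# Keep only the requested (Resource_List.*) and consumed (resources_used.*)
# resource lines from Torque/PBS "qstat -f" output read on stdin.
# Usage (job id is a placeholder): qstat -f <job number> | job_resources
job_resources() {
  grep -E '^[[:space:]]*(Resource_List|resources_used)\.'
}
```

With the output filtered this way, resources_used.vmem can be compared directly with the physical memory of the node, and repeated calls show whether resources_used.cput is advancing at all.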

Shaun Jackman

May 28, 2012, 1:39:44 PM
to Meaghan Pimsler, ABySS
Hi Meaghan,

The abyss-map aligner first indexes the contig sequence, and then aligns the reads to the contigs. The indexing step takes the most memory, roughly ten times the size of the FASTA file, but it can be run on a different machine with more memory. Run

    abyss-index -v BA3_T_F_5-3.fa

which will create two files, BA3_T_F_5-3.fa.fai and BA3_T_F_5-3.fa.fm, then resubmit your job.

In the meantime, there are unitigs in the file BA3_T_F_5-3.fa. You could start looking at their sequences if you want an early start on your analysis.

Cheers,
Shaun
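The steps in this reply can be sketched as three small shell functions. They are hypothetical helpers, not ABySS commands: index_mem_estimate applies the rough ten-times-the-FASTA-size rule from the reply (a ballpark figure, not a guarantee), build_index runs abyss-index -v and checks for the .fai and .fm files it should create, and count_unitigs counts the FASTA header lines in a file such as BA3_T_F_5-3.fa.

```shell
#!/bin/sh
# Rough memory estimate for abyss-index, in bytes: about ten times the
# FASTA file size, following the rule of thumb in the reply.
index_mem_estimate() {
  bytes=$(wc -c < "$1")
  echo $((bytes * 10))
}

# Run on a machine with enough memory. abyss-index is expected to write
# <file>.fai and <file>.fm next to the FASTA file; ls fails if either is missing.
build_index() {
  abyss-index -v "$1" && ls -l "$1.fai" "$1.fm"
}

# Count the unitigs (FASTA records) by counting header lines starting with ">".
count_unitigs() {
  grep -c '^>' "$1"
}
```

For the 19Mb BA3_T_F_5-3.fa listed in the first post, the ten-fold rule works out to roughly 190Mb for indexing. Once build_index BA3_T_F_5-3.fa has produced both index files, the job can be resubmitted as described above.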