De novo paired-end assembly with ABySS: asking for help


Mehmet göktay

Jan 6, 2015, 4:31:44 PM
to abyss...@googlegroups.com
Hi everyone,

In our laboratory we are not interested in a whole-genome assembly; we only need contigs long enough to develop SSR markers.

After trying a couple of different assembly tools, I decided to use ABySS. In our case I want to settle on good values for the "n" and "k" parameters. I ran several assemblies with n values from 2 to 8,
keeping k=25 for all of these runs.

Because our coverage is so low, the longest contig is only around 30 kb (in the scaffolds file, with n=2, k=25).

- My first question: after each run I focused on the scaffolds file rather than the contigs file, because it seems to contain longer sequences. Is this a normal approach?

- My second question: does anyone have an idea how I can improve my assembly? Even though I'm not interested in a whole-genome assembly, I expected to get longer contigs and scaffolds.
Do you think playing with the "n" and "k" parameters is enough?
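One quick way to check whether the scaffolds really are more contiguous than the contigs is to compare summary statistics such as N50 for both files. ABySS ships `abyss-fac` for exactly this (e.g. `abyss-fac asm-contigs.fa asm-scaffolds.fa`, where `asm` stands for whatever `name=` was used). The function below is only a minimal awk sketch of what N50 measures; `fasta_n50` is a hypothetical helper, not part of ABySS. Keep in mind that scaffold lengths include the gap N's, so part of the extra length is unknown sequence.

```shell
# Hypothetical helper (not part of ABySS): report sequence count, longest
# sequence and N50 for one FASTA file. N50 is the length L such that
# sequences of length >= L together hold at least half of all bases.
# Gap N's inside scaffolds are counted as bases here.
fasta_n50() {
    awk '
        /^>/ { if (len) print len; len = 0; next }  # new record: emit previous length
        { len += length($0) }                         # add (possibly wrapped) sequence lines
        END { if (len) print len }
    ' "$1" |
    sort -rn |                                        # longest sequences first
    awk -v file="$1" '
        { lens[NR] = $1; total += $1 }
        END {
            run = 0; n50 = 0
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run >= total / 2) { n50 = lens[i]; break }
            }
            printf "%s\tn=%d\tmax=%d\tN50=%d\n", file, NR, lens[1], n50
        }'
}
```

Usage: `fasta_n50 asm-contigs.fa` followed by `fasta_n50 asm-scaffolds.fa`; a clearly higher N50 for the scaffolds supports using that file.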



Thank you for your kind answers,
Mehmet

Ben Vandervalk

Jan 6, 2015, 4:58:56 PM
to Mehmet göktay, abyss...@googlegroups.com
Hi Mehmet,

Sure, using the scaffolds file sounds reasonable to me. Depending on whether your downstream analysis tools can tolerate 'N's, you might want to split the scaffolds into scaftigs (by breaking them wherever runs of N's occur). You can do that with:

$ abyss-fatoagp -f scaftigs.fa scaffolds.fa > /dev/null
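`abyss-fatoagp -f` is the ABySS way to do this. Purely to illustrate what "scaftigs" are, here is a minimal awk sketch that splits every scaffold wherever a run of N (or n) occurs. `split_scaftigs` and its `<scaffold>_<i>` naming are hypothetical; abyss-fatoagp's rules for which N runs count as gaps, and how it names the pieces, may differ.

```shell
# Hypothetical helper (not part of ABySS): split each scaffold at runs of
# N/n and write every non-empty piece as its own FASTA record.
# Pieces are named <scaffold id>_<1,2,...>.
split_scaftigs() {
    awk '
        function flush(   n, parts, i, k) {
            if (name == "") return
            n = split(seq, parts, "[Nn]+")   # ERE: one or more gap characters
            k = 0
            for (i = 1; i <= n; i++)
                if (length(parts[i]) > 0)     # skip empty pieces from leading/trailing N runs
                    printf ">%s_%d\n%s\n", name, ++k, parts[i]
        }
        /^>/ { flush(); name = substr($1, 2); seq = ""; next }  # new scaffold header
        { seq = seq $0 }                                         # join wrapped sequence lines
        END { flush() }
    ' "$1"
}
```

Usage: `split_scaftigs asm-scaffolds.fa > scaftigs.fa`.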

FYI, there is a list of assembly parameters in the README.md: https://github.com/bcgsc/abyss#assembly-parameters.  'k' is probably the most important parameter.  It might be worth setting the minimum coverage threshold ('c') and minimum erosion coverage ('e') to zero and seeing if that helps, i.e.

$ abyss-pe c=0 e=0 <... your other params ...>

If you have overlapping paired-end reads, you can merge them with abyss-mergepairs.  (Assembling with merged reads generally allows you to use a larger k and improves contiguity.)

Hope that helps,

- Ben

--
You received this message because you are subscribed to the Google Groups "ABySS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to abyss-users...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mehmet göktay

Jan 6, 2015, 6:28:32 PM
to Ben Vandervalk, abyss...@googlegroups.com
Hi Ben,

Thank you for your kind advice. If possible, I would also like to ask another question about parameter sets. My plan was to first fix "n" based on the outcome of a couple of assemblies, and then iterate over a range of k values to decide which k fits best. (Until now I have only used k=25.)
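The plan above (fix n, then iterate over k) can be sketched with the same pattern the ABySS README suggests for optimizing k: one assembly per k, each in its own directory, compared afterwards with `abyss-fac`. The function name `sweep_k`, the assembly name `asm`, the read file names and the k range (21 to 41 in steps of 4) are assumptions for illustration only; k must not exceed the read length.

```shell
# Hypothetical helper: with n fixed, run one ABySS assembly per k value.
# Each run gets its own directory (k21, k25, ...) so outputs do not collide.
# Usage: sweep_k <n> <reads_1> <reads_2>   (read paths relative to the current dir)
sweep_k() {
    local n=$1 reads1=$2 reads2=$3 k
    for k in $(seq 21 4 41); do                 # illustrative k range
        mkdir -p "k$k"
        # -C makes abyss-pe (a make-based driver) run inside k$k, hence the ../ paths
        abyss-pe -C "k$k" name=asm k="$k" n="$n" in="../$reads1 ../$reads2"
    done
    # Compare contiguity statistics of all runs side by side
    abyss-fac k*/asm-scaffolds.fa
}
```

Usage: `sweep_k 2 reads_1.fa reads_2.fa`. Running each k in a separate directory costs disk space, but it keeps every run inspectable and makes the final `abyss-fac` comparison a single command.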

My question is: is there a better way to find the best k? I found a program called kmergenie, but as far as I understand it doesn't accept paired-end reads, so I couldn't use it.

For this specific purpose, could you give me any advice?

Best Regards,
Mehmet

--
Mehmet Göktay, MSc student
Department of Molecular Biology and Genetics
Izmir Institute of Technology
35430, Urla, Izmir, TURKEY


Ben Vandervalk

Jan 6, 2015, 6:36:27 PM
to Mehmet göktay, abyss...@googlegroups.com
Hi Mehmet,

I have never used kmergenie, but I don't see any reason why you can't run it on paired-end read data.  (It mentions paired-end reads in the README.)
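As far as I know, kmergenie accepts either a single read file or a text file listing several read files, one per line, so both mates of a paired-end library can simply be listed together (counting k-mers does not need the pairing information). A minimal sketch, assuming `kmergenie` is on the PATH; `run_kmergenie_pe` and `read_files.txt` are hypothetical names:

```shell
# Hypothetical helper: run kmergenie on both files of a paired-end library.
# kmergenie is given a text file that lists the read files, one per line.
run_kmergenie_pe() {
    local reads1=$1 reads2=$2 list=read_files.txt
    printf '%s\n' "$reads1" "$reads2" > "$list"
    kmergenie "$list"
}
```

Usage: `run_kmergenie_pe reads_1.fq reads_2.fq`. The k it suggests can then be confirmed with a few ABySS runs around that value.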

Personally, I prefer to run a set of assemblies across different k values to determine which k is best, since it leaves no room for doubt.  But if you don't have the computing resources to do that, kmergenie is a good option.

- Ben

Mehmet göktay

Jan 6, 2015, 6:38:16 PM
to Ben Vandervalk, abyss...@googlegroups.com
Hi Ben,

Thank you for your quick response. I'll double check it. 

Best Regards,
Mehmet