Merging assemblies with ABySS


Jennifer Shelton

Aug 8, 2013, 12:49:18 PM
to abyss...@googlegroups.com
Hi,

I have two genome assemblies. One was assembled from Sanger data, the other from Illumina data. The Illumina assembly was performed with ABySS over a range of k-mer sizes. The unitigs from these assemblies and the Illumina reads were then assembled with ABySS to merge the products of the single k-mer assemblies. This merge strategy is based on the following post by Shaun Jackman: "Another approach that I’ve used before and works reasonably well is to run the individual k assemblies to the unitigs stage, then reassemble the reads plus the unitigs from the first assemblies at a larger value of k." https://groups.google.com/d/msg/abyss-users/RXIbiucgmPs/VotHOWKcjhMJ

Has anyone had success merging finished assemblies with ABySS? I also have access to the Sanger and Illumina reads and the Illumina unitigs.

We have Fosmid and BAC end sequences for scaffolding the Sanger reads, and paired-end long jumping distance libraries for the Illumina data.

Our first thought was to assemble the unitigs from the single k-mer assemblies and the contigs from the Sanger assembly as our single-end library. We would then use our Fosmid ends, BAC ends, and long jumping distance libraries as our mate-pair libraries (because it seems to me that ABySS only expects long-distance pairing information from mate-pair libraries).

Does anyone have any thoughts, suggestions, or warnings regarding this idea?

Thanks

Shaun Jackman

Aug 8, 2013, 2:18:48 PM
to Jennifer Shelton, ABySS
Hi Jennifer,

Your strategy seems quite reasonable to me. I'll be interested to hear
how it works for you. Is your Sanger coverage deep enough to assemble
on its own, and which software did you use for the Sanger assembly? My
only suggestion would be to try including the Sanger reads in the
assembly in place of or along with the Sanger assembly.

The Fosmid and BAC-end libraries should work at the scaffolding stage,
except that DistanceEst will be very slow (currently the time
complexity unfortunately scales with the square of the fragment size).
You can use `LIBRARY_de=--mean` to use a faster but less accurate
distance estimation algorithm during scaffolding, where LIBRARY is the
name of the library. Be sure to inspect the fragment size histograms
(LIBRARY-6.hist) to confirm that they have the expected size and shape.
You may have to replace them with a synthetic histogram. You should
also try decreasing the minimum number of reads needed from the Fosmid
and BAC-end libraries required for scaffolding (default 10) with the
parameter `LIBRARY_n=3` (for example, depending on your coverage).
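
For illustration, the layout discussed in this thread could be written out as an abyss-pe argument list. This is only a sketch under stated assumptions: the library names (`pe`, `jump`, `fosmid`, `bac`), every file name, the assembly name, and `k=64` are hypothetical placeholders, and passing the Illumina reads as a paired-end library follows the quoted "reads plus the unitigs" approach rather than anything confirmed here. The script only builds and prints the command, with `--dry-run` included so that abyss-pe would list its steps instead of running them.

```python
import shlex


def build_abyss_pe_args(name, k, pe_libs, mp_libs, se_files,
                        relaxed_libs=(), min_pairs=3, dry_run=True):
    """Return an abyss-pe argument list. Nothing is executed here."""
    args = ["abyss-pe"]
    if dry_run:
        args.append("--dry-run")
    args += [f"name={name}", f"k={k}"]
    if pe_libs:
        # Paired-end libraries are used for assembly and scaffolding.
        args.append("lib=" + " ".join(pe_libs))
    if mp_libs:
        # Mate-pair libraries are used for scaffolding only.
        args.append("mp=" + " ".join(mp_libs))
    for lib, files in list(pe_libs.items()) + list(mp_libs.items()):
        args.append(f"{lib}=" + " ".join(files))
    for lib in relaxed_libs:
        # Per-library overrides suggested in the reply above.
        args.append(f"{lib}_n={min_pairs}")
        args.append(f"{lib}_de=--mean")
    if se_files:
        args.append("se=" + " ".join(se_files))
    return args


# All names and files below are hypothetical placeholders.
args = build_abyss_pe_args(
    name="merged",
    k=64,
    pe_libs={"pe": ["illumina_1.fq", "illumina_2.fq"]},
    mp_libs={
        "jump": ["jump_1.fq", "jump_2.fq"],
        "fosmid": ["fosmid_ends_1.fa", "fosmid_ends_2.fa"],
        "bac": ["bac_ends_1.fa", "bac_ends_2.fa"],
    },
    se_files=["unitigs_all_k.fa", "sanger_contigs.fa"],
    relaxed_libs=["fosmid", "bac"],
)
print(shlex.join(args))
```

The `_n` and `_de` overrides are applied only to the Fosmid and BAC-end libraries here, since those are the ones the reply singles out; the right value of `_n` depends on coverage.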

Hope this helps. Cheers,
Shaun

Jennifer Shelton

Sep 30, 2013, 5:06:35 PM
to abyss...@googlegroups.com, Jennifer Shelton, Shaun Jackman
Thanks,

I am trying your suggestions. ABySS failed to create a histogram, and the .dist files appear incomplete (e.g. 465609 ; 465630 ;465661 ;465901 ;466226 ;466325 ;466354 ;). From your comments it sounded like this might happen, but I thought I should try it to be sure.

When you suggest creating a synthetic histogram, have you ever done this yourself? Is there anything in particular I should keep in mind (e.g. does the number of pairs listed in the histogram have to match the number of pairs in the library)? Should I also attempt to create a .dist file?

Thanks again,
Jennifer

Shaun Jackman

Oct 19, 2013, 8:55:35 PM
to Jennifer Shelton, ABySS, Jennifer Shelton
Hi Jennifer,

Yes, I have had occasion to use a synthetic histogram. The number of pairs in the histogram does not need to add up to any particular number. You can create the histogram with whichever tool suits you, but RStudio works well. Once the histogram file is created, you can resume the abyss-pe pipeline. It will start up where it left off. Use the `abyss-pe --dry-run` option to make sure of this. The next command that it will run should be DistanceEst.
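
As a rough sketch of one way to build such a histogram outside of R, the script below writes a bell-shaped fragment size distribution and reads it back to check its mean and spread. Several things here are assumptions rather than facts from this thread: that the histogram file is plain text with one `size<TAB>count` pair per line (worth comparing against a `.hist` file that ABySS produced for another library), the file name `fosmid-6.hist`, and the 35,000 bp mean and 3,500 bp standard deviation, which are placeholders for the real library's expected insert size.

```python
import math
import os
import tempfile


def synthetic_histogram(mean, sd, peak=1000, span=4.0):
    """Return {fragment_size: count} following a normal curve.

    `peak` is the count at the mean; the total is arbitrary, since the
    counts do not need to add up to any particular number.
    """
    lo = max(1, round(mean - span * sd))
    hi = round(mean + span * sd)
    hist = {}
    for size in range(lo, hi + 1):
        z = (size - mean) / sd
        count = round(peak * math.exp(-0.5 * z * z))
        if count > 0:
            hist[size] = count
    return hist


def write_hist(hist, path):
    """Write size<TAB>count lines (assumed file layout)."""
    with open(path, "w") as out:
        for size in sorted(hist):
            out.write(f"{size}\t{hist[size]}\n")


def read_hist(path):
    """Read a two-column histogram file back into a dict."""
    hist = {}
    with open(path) as handle:
        for line in handle:
            size, count = line.split()
            hist[int(size)] = int(count)
    return hist


def summarize(hist):
    """Return (total pairs, mean, standard deviation) of a histogram."""
    n = sum(hist.values())
    mean = sum(size * count for size, count in hist.items()) / n
    var = sum(count * (size - mean) ** 2 for size, count in hist.items()) / n
    return n, mean, math.sqrt(var)


# Placeholder numbers; substitute the expected insert size of the library.
hist = synthetic_histogram(mean=35000, sd=3500)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fosmid-6.hist")  # hypothetical file name
    write_hist(hist, path)
    roundtrip = read_hist(path)

pairs, est_mean, est_sd = summarize(roundtrip)
print(f"pairs={pairs} mean={est_mean:.0f} sd={est_sd:.0f}")
```

The same `summarize` function can be pointed at a histogram that ABySS did produce, as a quick check that its size and shape are what the library preparation would predict.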

Good luck! Cheers,
Shaun

Alejandro Sanchez

Oct 19, 2013, 9:08:26 PM
to Shaun Jackman, Jennifer Shelton, Jennifer Shelton, ABySS

Hi all,

If I may, I would recommend GARM for this task.

http://garm-meta-assem.sourceforge.net

It is very easy to use. We'll release a new version soon, and it will be published in a couple of months.

Let me know if you need some help.

Cheers.

hicham538

Jul 11, 2014, 10:10:08 AM
to abyss...@googlegroups.com, sjac...@gmail.com, kstate.bio...@gmail.com, jennifer...@gmail.com, ale...@ibt.unam.mx
Hi Alejandro,
I've installed GARM and everything seems to have installed correctly, but when I tried it with the example data I got the error below.

Do you have any idea what the problem could be?

Thank you very much

----------------------------------------------------------------------------------------------------------------------------------------------------
4: FINISHING DATA
sh: contigs.fa.modname.fasta.min_200_split_all.fasta.vs.fasta.mgaps: No such file or directory
ERROR: postnuc returned non-zero
cat: contigs.fa.modname.fasta.min_200_split_all.fasta.vs.[0-9]*.coords: No such file or directory
RUNNING O-L-C PIPELINE ##
0 sequence found
0 sequence found

fasta files empty or wrong location...
readline() on closed filehandle COORDS at /mnt/scratch/users/pab_001_uma/hicham/sseneg_genomics/join_scaffolds/join_garm/GARM/lib/GetData.pm line 24.
readline() on closed filehandle FASTA at /mnt/scratch/users/pab_001_uma/hicham/sseneg_genomics/join_scaffolds/join_garm/GARM/lib/GetData.pm line 208.
readline() on closed filehandle FASTA at /mnt/scratch/users/pab_001_uma/hicham/sseneg_genomics/join_scaffolds/join_garm/GARM/lib/GetData.pm line 208.
readline() on closed filehandle FASTA at /mnt/scratch/users/pab_001_uma/hicham/sseneg_genomics/join_scaffolds/join_garm/GARM/lib/GetData.pm line 208.
readline() on closed filehandle FASTA at /mnt/scratch/users/pab_001_uma/hicham/sseneg_genomics/join_scaffolds/join_garm/GARM/lib/GetData.pm line 208.
readline() on closed filehandle FASTA at /mnt/scratch/users/pab_001_uma/hicham/sseneg_genomics/join_scaffolds/join_garm/GARM/lib/GetData.pm line 208.
454.genome1_scaffold00011_1     I can't find this sequence... ARGH!!!!
454.genome1_scaffold00011_2     I can't find this sequence... ARGH!!!!
454.genome1_scaffold00011_3     I can't find this sequence... ARGH!!!!
.
.
.
.

----------------------------------------------------------------------------------------------------------------------------------------------------