Hi Theodore,
I have CC'd Ben Vandervalk, who is the lead developer of ABySS.
I think it really depends on which aspects of the genome's biology you are interested in. For example, a gene that is split across multiple contigs could be joined into a single scaffold.
Also, you can still scaffold the contigs with just the paired-end reads. However, you would get longer scaffolds from a mate-pair library.
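As a rough illustration of the two cases above (paired-end reads only, versus paired-end plus mate-pair), here is a minimal Python sketch that builds an abyss-pe argument list. It assumes the `name=`, `k=`, `lib=` and `mp=` parameters as described in the ABySS README; the helper name `abyss_pe_command`, the library names and the file names are hypothetical, so please check the parameters against your installed version before relying on it.

```python
def abyss_pe_command(name, k, pe_libs, mp_libs=None):
    """Build an argument list for abyss-pe (hypothetical helper).

    pe_libs / mp_libs map a library name to its list of read files.
    The parameter names (name, k, lib, mp) are assumed from the ABySS README.
    """
    mp_libs = mp_libs or {}
    args = ["abyss-pe", f"name={name}", f"k={k}"]
    # Paired-end libraries are listed by name in lib=...
    args.append("lib=" + " ".join(pe_libs))
    # Mate-pair libraries, if any, are listed by name in mp=...
    if mp_libs:
        args.append("mp=" + " ".join(mp_libs))
    # Each library name is then bound to its read files.
    for lib, files in {**pe_libs, **mp_libs}.items():
        args.append(f"{lib}=" + " ".join(files))
    return args


# Paired-end reads only: scaffolding is still attempted from the read pairs.
pe_only = abyss_pe_command("ecoli", 64, {"pe1": ["reads_1.fq", "reads_2.fq"]})

# With an additional mate-pair library, used for longer-range scaffolding.
pe_and_mp = abyss_pe_command(
    "ecoli", 64,
    {"pe1": ["reads_1.fq", "reads_2.fq"]},
    {"mp1": ["mp_1.fq", "mp_2.fq"]},
)
```

Either list can be passed to `subprocess.run(args, check=True)`; the value `k=64` is only a placeholder and should be tuned for the data set.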
Regards,
Ka Ming
--
Ka Ming Nip
Graduate Student | Dr. Inanc Birol Lab (BTL)
Canada's Michael Smith Genome Sciences Centre
________________________________________
From: trans...@googlegroups.com [trans...@googlegroups.com] On Behalf Of Tianhe Yu [theod...@gmail.com]
Sent: February 22, 2017 6:16 PM
To: trans...@googlegroups.com
Subject: Re: Benchmark for Abyss
Hi Ka Ming,
Thanks a lot for your last reply. I am now using the E. coli data as input and it works. However, I am a bit concerned about the scaffolding step.
I read on the ABySS website that abyss-pe can only assemble the reads into contigs, and that mate-pair data is required to assemble the contigs into scaffolds. So my questions are:
1. From a biological point of view, is it meaningful to assemble only to contigs and not to scaffolds?
Take the E. coli data you mentioned above: presumably it can never be assembled into scaffolds if there is no mate-pair library.
2. Could you suggest another data set that includes a mate-pair library and is smaller than a few GB?
Thanks a lot!
Regards,
Theodore
From: trans...@googlegroups.com [trans...@googlegroups.com] On Behalf Of Tianhe Yu [theod...@gmail.com]
Sent: February 16, 2017 9:15 PM
To: Trans-ABySS
Subject: Benchmark for Abyss
Hi guys,
I am a CS student trying to accelerate the ABySS algorithm from a pure computer science perspective.
We have just started profiling the job, that is, measuring how much time each part of the assembly process takes.
We only care about how much time each part takes, and we will try to accelerate the most time-consuming part first.
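To make the profiling idea concrete, here is a minimal Python sketch of coarse, wall-clock timing per stage. The helper `time_stages` is hypothetical and not part of ABySS; it simply runs a list of commands in order and ranks them from slowest to fastest.

```python
import subprocess
import time


def time_stages(stages, run=subprocess.run):
    """Run each (label, argv) stage in order and time it.

    Returns a list of (label, seconds) sorted with the slowest stage first.
    `run` is injectable so the timing logic can be tried without ABySS installed.
    """
    timings = []
    for label, argv in stages:
        start = time.perf_counter()
        run(argv, check=True)  # raises if the stage fails
        timings.append((label, time.perf_counter() - start))
    return sorted(timings, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumption: abyss-pe accepts the stage targets "unitigs", "contigs" and
    # "scaffolds", and each call only does the work not yet done by the
    # previous one. The file names and k value are placeholders.
    base = ["abyss-pe", "name=ecoli", "k=64", "in=reads_1.fq reads_2.fq"]
    stages = [(target, base + [target])
              for target in ("unitigs", "contigs", "scaffolds")]
    for label, seconds in time_stages(stages):
        print(f"{label}: {seconds:.1f} s")
```

This only gives per-stage wall-clock time; finding hot spots inside a single stage would need a real profiler such as perf or gprof.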
The only document I found is "Abyss for Intel Xeon Processor", which uses ERR194147, a 2 x 50 GB human genome data set that is too large for our purposes.
An input data set smaller than 1 GB would be large enough for us to profile. Any suggestions on which specific data set we could use for this?
Thanks a lot in advance.
Cheers,
Theodore
--
You received this message because you are subscribed to the Google Groups "Trans-ABySS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trans-abyss...@googlegroups.com.