Genome Sequencing Costs


Mega

Feb 23, 2013, 8:13:42 AM
to diy...@googlegroups.com
Hi, 

Here is an article that says that in 2012, sequencing 1 Mbp already cost $0.10?!
http://singularityhub.com/2011/03/05/costs-of-dna-sequencing-falling-fast-look-at-these-graphs/

If that were true, one could have the genome of e.g. the bioluminescent fungus Panellus stipticus sequenced without spending much money??
(Note: there are European and American P. stipticus, and only the American one is bioluminescent! While I guess they share some 99.8% of their genes, the mutations (new genes) obviously make it luminescent.)

http://mycor.nancy.inra.fr/blogGenomes/?tag=mushroom Here is the inky cap, which has 37 megabases according to this page. That would mean $3.70 for its entire genome? I would even pay $37   :D


But is it possible to get this sequenced as a private customer (or also via university) at this price?? 



Jeswin

Feb 23, 2013, 10:00:10 AM
to diy...@googlegroups.com
On Sat, Feb 23, 2013 at 8:13 AM, Mega <masters...@gmail.com> wrote:
> Hi,
>
> Here is an article that says that in 2012 sequencing 1 Mbp costed 0.1$
> already?!?
> http://singularityhub.com/2011/03/05/costs-of-dna-sequencing-falling-fast-look-at-these-graphs/
>
>

How is this done nowadays? I looked at wikipedia on whole-genome
sequencing. They mention nanopore sequencing and pyrosequencing. Are
those methods up and running?

I looked at the shotgun sequencing method. If I understand correctly,
that needs the dna fragments to be subcloned? Won't that take many
hours? You can't just drop some purified DNA and let the machine piece
it together, right?

Cory Tobin

Feb 23, 2013, 3:20:44 PM
to diy...@googlegroups.com
That price is somewhat misleading. The way genome sequencing works is
you break the DNA into many small pieces and sequence the small
fragments. To be able to assemble all these small reads into a large,
contiguous chromosome the reads need to be overlapping. In fact, you
need a lot of overlap. If you are sequencing a genome for the first
time (de novo) then you probably need 100x coverage to be able to get
a good assembly. That means for every base pair in your genome you
need at least 100 reads to cover that spot. If you are re-sequencing
an already sequenced genome the necessary coverage goes down, but you
still need more than 1x coverage for a number of different reasons,
such as error minimization.
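
One way to see why 1x is not enough, even before errors come into it: a rough sketch using the standard idealized model in which reads land uniformly at random, so per-base depth is roughly Poisson-distributed. The function name is made up for illustration, and real libraries are less uniform than this assumes.

```python
import math


def expected_uncovered_fraction(coverage: float) -> float:
    """Expected fraction of bases hit by zero reads at a given average coverage.

    Idealized assumption: read start positions are uniform and independent,
    so per-base depth is approximately Poisson with mean `coverage`.
    """
    return math.exp(-coverage)


for c in (1, 5, 10, 30):
    frac = expected_uncovered_fraction(c)
    print(f"{c:>3}x coverage: ~{frac:.2%} of bases expected to be uncovered")
```

At 1x average coverage this model leaves roughly 37% of the genome unread. Assembly additionally needs reads that overlap each other, not just bases that are covered, which is part of why de novo targets sit so much higher.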

So if you are de novo sequencing something with a 37Mbp genome, you
will probably need at least 3.7Gbp worth of sequencing.
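
The arithmetic behind that figure, as a small sketch. The 37 Mbp genome size, the 100x target and the $0.10 per Mbp headline rate all come from this thread; the function names are just illustrative.

```python
def required_sequencing_bp(genome_size_bp: int, coverage: int) -> int:
    """Raw bases needed to reach an average coverage over the whole genome."""
    return genome_size_bp * coverage


def naive_cost_usd(total_bp: int, usd_per_mbp: float) -> float:
    """Cost if sequencing could really be bought per megabase (it can't)."""
    return total_bp / 1_000_000 * usd_per_mbp


GENOME_BP = 37_000_000   # inky cap genome size quoted in the first post
COVERAGE = 100           # de novo coverage target suggested above
USD_PER_MBP = 0.10       # headline rate from the linked article

total_bp = required_sequencing_bp(GENOME_BP, COVERAGE)
print(f"{total_bp / 1e9:.1f} Gbp of raw sequence needed")
print(f"${naive_cost_usd(total_bp, USD_PER_MBP):.2f} at the headline per-Mbp rate")
```

So even taking the headline rate at face value, the raw sequencing alone is on the order of hundreds of dollars rather than a few dollars.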

A couple of other complicating factors...

You can't just give your sample to a sequencing company and say, "give
me 1 million bases worth of sequencing." Assuming you are using an
Illumina HiSeq machine, your sample is going to be loaded into a thing
called a flow cell. That flow cell will give you a certain quantity
of base pairs regardless of how much you paid. That quantity depends
on how well your sample was prepared, how long the reads are, the
phase of the moon etc. So the cost is fixed per flow cell regardless
of how many of those reads you actually needed to get good coverage of
your genome.
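
A minimal sketch of what "fixed cost per flow cell" does to the price per base you actually need. As placeholder numbers it borrows the MiSeq figures given further down the thread ($900 in reagents for about 4.5 Gb); it treats yield as constant, which, as noted above, it is not.

```python
import math


def run_cost_usd(needed_bp: int, yield_per_run_bp: int, price_per_run_usd: int) -> int:
    """You pay for whole runs, however few of the bases you actually need."""
    runs = max(1, math.ceil(needed_bp / yield_per_run_bp))
    return runs * price_per_run_usd


RUN_YIELD_BP = 4_500_000_000   # assumed constant yield per run
RUN_PRICE_USD = 900            # reagent cost per run, excluding prep and labor

for needed in (1_000_000, 3_700_000_000, 5_000_000_000):
    cost = run_cost_usd(needed, RUN_YIELD_BP, RUN_PRICE_USD)
    per_mbp = cost / (needed / 1e6)
    print(f"need {needed / 1e6:>6.0f} Mbp -> ${cost} total, ${per_mbp:.2f} per Mbp needed")
```

Asking for a single megabase still costs a full run, so the effective per-Mbp price only approaches the headline number when the run is nearly full.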

Also, you have to factor in the cost of sample preparation. A lot of
work goes into preparing a sample for sequencing and the company will
probably charge you for it unless they have already factored that in
to the price. If you do a DNA extraction/purification (which costs
money) and send it off for sequencing, the company may determine it's
not pure enough or the DNA is too fragmented and you will probably get
charged for the assays they ran even though you got zero data.

Additionally, assuming you are de novo sequencing, after your first
run you go to assemble all those reads into a full chromosome. You
will probably find that you can't get them to assemble into a single
contiguous chromosome. You get a bunch of fragments which vary in
size from kilobases to megabases, called "contigs." How do you fill
in the gaps? Assuming the gaps are caused by regions of long repeats,
you can try to do some very long read sequencing, like 454 or PacBio
to bridge the gaps. If you think the gaps are caused by regions of
very high GC content, you can try to do some more Illumina sequencing
with an alternative kit that favors GC, or at least doesn't bias
towards regions of 40-60% GC. If you still can't fill in the gaps you
could try to scaffold your contigs against the genome of a related
species which already has a finished genome. This will give you a way
to order your contigs and estimate the size of the gaps so you can try
to use PCR to amplify the gaps. If you can clone the gap into a
plasmid or BAC then you can start Sanger sequencing on the gap DNA.
All of this costs more money.

The upshot is that (1) it's impossible to determine upfront how much
it will cost to get a complete genome of X megabases, and (2) you
can't sequence your mushroom for $37, unfortunately.

Sorry for the wall of text :)

-cory

Andreas Sturm

Feb 23, 2013, 3:25:15 PM
to diy...@googlegroups.com
Ok thanks!!  Though it's a pity ;) 

jlund256

Feb 24, 2013, 11:36:05 AM
to diy...@googlegroups.com
Currently, an Illumina MiSeq run costs $900 in reagents and generates 4.5Gb of sequence.  The sequence is 2 x 150 bp paired end sequence.  As one of the other commenters mentioned, 100X sequencing is the standard coverage target for de novo sequencing.  This means 1-2 genomes can be combined and sequenced together in one run.  A university core can likely be found that would perform the run for $150-$500.   
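
Whether one or two genomes fit in a run depends on genome size. A quick sketch of the coverage each genome would get when a run is split evenly, assuming the 37 Mbp inky cap size from the first post (the P. stipticus genome size is not given in this thread) and perfectly even pooling:

```python
def coverage_per_genome(run_yield_bp: int, genome_size_bp: int, n_genomes: int) -> float:
    """Average coverage per genome when one run is shared evenly."""
    return run_yield_bp / (genome_size_bp * n_genomes)


RUN_YIELD_BP = 4_500_000_000   # MiSeq yield quoted above
GENOME_BP = 37_000_000         # assumed genome size (inky cap figure)

for n in (1, 2):
    cov = coverage_per_genome(RUN_YIELD_BP, GENOME_BP, n)
    print(f"{n} genome(s) per run: ~{cov:.0f}x each")
```

Under these assumptions one genome gets a bit over 100x, while two would share the run at roughly 60x each, so pooling two only meets the 100x target for genomes of about 22 Mbp or smaller.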

This amount of sequence will allow the genome to be assembled into 50-200 contigs covering ~99% of the genome.  The gaps between contigs would be small.

A sequencing library needs to be prepared from the fungus DNA sample.  Short DNA fragments are generated and ligated to oligos with DNA barcodes and sequences compatible with the Illumina flow cell. 

Preparing a sequencing library using Illumina's Nextera XT kit is very easy, and costs about $50 / sample for a 24 sample kit.  Making a DIY sequencing library would cost about $250 in oligos, and some minor other equipment (gel boxes, PCR setup, etc).

So the cost for one fungus genome with DIY sequencing library:
$400 sequencing library
$1050-$1400 sequencing run
------------------
$1450 - $1800 total

Sequencing two fungus genomes with a DIY sequencing library:
$700 seq. libraries
$1050-$1400 seq. run
------------------
$875 - $1050 total per genome

Sequencing two fungus genomes with a Nextera sequencing library:
--22 samples in the Nextera kit left over
$1500 Nextera libraries
$1050-$1400 seq. run
------------------
$1275 - $1450 total per genome
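
For anyone who wants to plug in their own quotes, here is a small sketch that recomputes per-genome totals from the line items above: library cost plus the $1050-$1400 run cost, divided by the number of genomes sharing the run. The function and scenario names are illustrative only.

```python
def per_genome_cost(library_total_usd, run_cost_range_usd, n_genomes):
    """Return (low, high) cost per genome for libraries plus one shared run."""
    low, high = run_cost_range_usd
    return ((library_total_usd + low) / n_genomes,
            (library_total_usd + high) / n_genomes)


RUN_COST_RANGE = (1050, 1400)   # reagents plus core facility fee, from above

# scenario name -> (total library cost in USD, genomes sharing the run)
scenarios = {
    "one genome, DIY library": (400, 1),
    "two genomes, DIY libraries": (700, 2),
    "two genomes, Nextera kit": (1500, 2),
}

for name, (library_usd, n) in scenarios.items():
    low, high = per_genome_cost(library_usd, RUN_COST_RANGE, n)
    print(f"{name}: ${low:.0f} - ${high:.0f} per genome")
```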

Cheers,

Jim Lund

Dietrich Dehlinger

Feb 26, 2013, 7:12:50 PM
to diy...@googlegroups.com
You don't need to use the whole capacity of a single HiSeq lane (of which there are 8-16). You can dump multiple samples in one lane and attach a short DNA tag to each sample that acts as an indexing code.

Cathal Garvey

Feb 27, 2013, 8:55:41 AM
to diy...@googlegroups.com

Separate molecules in the same lane, or ligated end-on-end? Wouldn't
parallel reactions lead to multiple peaks simultaneously, leading to
difficulty establishing which molecule has which nucleotide?*

*Unless you did something really hackish with real-time-quantitative
PCR methods and added your molecules in discernibly different molar
ratios, so you could try to correlate emission peak height with each
separate solution... O_o

--
Please note my new email: cathal...@cathalgarvey.me
PGP Key: 988B9099
Bitmessage: BM-opSmZfNZHSzGDwdD5KzTnuKbzevSEDNXL
Twitter: @onetruecathal
Code: https://gitorious.org/~cathalgarvey
Blog: http://www.indiebiotech.com

Cory Tobin

Feb 27, 2013, 2:18:43 PM
to diy...@googlegroups.com
> Separate molecules in the same lane, or ligated end-on-end? Wouldn't
> parallel reactions lead to multiple peaks simultaneously, leading to
> difficulty establishing which molecule has which nucleotide?*

Illumina HiSeq can handle millions of reads per lane. The problem
isn't separating the peaks, but rather differentiating which read came
from which sample, assuming you merged multiple samples together. The
solution is to ligate a different short sequence on to the end of each
sample. So, for example, sample 1 gets ATGGCA ligated on the end and
sample 2 gets GCCTAA. That way you can separate the reads out later.
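
A toy sketch of that sorting step (demultiplexing), using the two example tags above. It assumes the tag sits at the very end of each read and must match exactly; real pipelines typically read the index separately and tolerate a mismatch or two. The function name and the sample labels are made up for illustration.

```python
from collections import defaultdict

# Example tags from the text above, mapped to illustrative sample names.
BARCODES = {"ATGGCA": "sample_1", "GCCTAA": "sample_2"}


def demultiplex(reads, barcodes=BARCODES):
    """Bin reads by their trailing barcode and strip the barcode off."""
    tag_len = len(next(iter(barcodes)))  # assumes all tags share one length
    bins = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[-tag_len:])
        if sample is None:
            bins["unassigned"].append(read)          # no exact tag match
        else:
            bins[sample].append(read[:-tag_len])     # keep only the insert
    return dict(bins)


pooled = ["ACGTACGTATGGCA", "TTTTGGGGGCCTAA", "ACGTACGTACGTAC"]
for sample, seqs in demultiplex(pooled).items():
    print(sample, seqs)
```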

-cory

jlund256

Feb 27, 2013, 4:23:23 PM
to diy...@googlegroups.com, cathal...@cathalgarvey.me
The way Illumina next-gen sequencing works is different from Sanger sequencing.  In Illumina sequencing, DNA is amplified on the surface of a glass slide using primers attached to the surface.  The DNA is dilute enough that each PCR is seeded by a single strand of DNA, and the strands are far enough apart that each spot of PCR-amplified DNA is separated from neighboring DNA fragments.

Sequencing is done by adding fluorescently labeled nucleotides with a reversible terminator.  Only one base gets added to each piece of DNA per step.  A picture is taken of the surface, and each DNA fragment spot is one of four colors.  Then the terminator is removed and the cycle gets repeated.  The sequence of colors at a particular spot indicates the sequence of that fragment.
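
To make the "sequence of colors at a spot" idea concrete, here is a toy sketch of reading bases out of a stack of per-cycle images. The color-to-base mapping and the spot names are invented for illustration; they are not Illumina's actual dye assignments or data format.

```python
# Hypothetical dye assignment, for illustration only.
COLOR_TO_BASE = {"red": "A", "green": "C", "blue": "G", "yellow": "T"}


def call_bases(cycles):
    """Build one read per spot from a list of per-cycle {spot: color} images."""
    reads = {}
    for image in cycles:                      # one image per chemistry cycle
        for spot, color in image.items():     # every spot reports one color
            reads[spot] = reads.get(spot, "") + COLOR_TO_BASE[color]
    return reads


# Three cycles, two spots: each dict stands in for one picture of the surface.
cycles = [
    {"spot1": "red", "spot2": "blue"},
    {"spot1": "green", "spot2": "blue"},
    {"spot1": "yellow", "spot2": "red"},
]
print(call_bases(cycles))
```

Because every spot is read in parallel from the same picture, the number of reads scales with the number of spots on the surface, not with the number of cycles.
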
Jim Lund

jlund256

Feb 28, 2013, 10:04:46 AM
to diy...@googlegroups.com, cathal...@cathalgarvey.me
Here is some more info on next generation sequencing.  Illumina uses reversible terminators, and other groups have developed other chemistries.  Basically, the 3' OH gets blocked, and then the blocking group is chemically or enzymatically cleaved:
http://jimlund.org/blog/pics/Metzker_2009.pdf
 
The other next generation sequencing companies (454, Ion Torrent, SOLiD) do the sequencing differently. This talk has some useful slides on the other technologies:
https://www.iths.org/sites/www.iths.org/files/eventmedia/ITHS_2ndGenerationSequencingTalk.pdf
 
The Church lab's papers are worth looking up, they developed a DIY next gen system using emulsion PCR:
http://arep.med.harvard.edu/gmc_pub.html
Especially: Challenges of Sequencing by Synthesis (2009), Overview of DNA sequencing strategies (2008), Polony DNA sequencing (2006), and Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome (2005).
 
Jim Lund

Nathan McCorkle

Feb 28, 2013, 4:53:37 PM
to diy...@googlegroups.com
Cool! My thinking is throw the reversible terminators in a pipeline
with terminal transferase, and voila, green chemistry DNA synthesis
(well depending on the reversible terminators!)

--
-Nathan

Patrik D'haeseleer

Feb 28, 2013, 5:50:55 PM
to diy...@googlegroups.com
On Thursday, February 28, 2013 1:53:37 PM UTC-8, Nathan McCorkle wrote:
> Cool! My thinking is throw the reversible terminators in a pipeline
> with terminal transferase, and voila, green chemistry DNA synthesis
> (well depending on the reversible terminators!)

That is precisely how a lot of DNA synthesis works. It's also how a lot of microarrays get synthesized, by using light-directed synthesis directly on the chip. In fact, there are strong links between easily/reliably achievable read lengths in DNA sequencing, oligo lengths in DNA synthesis, and probe lengths in microarrays.

Most of the current NGS methods are also called "sequencing-by-synthesis" precisely for this reason. A counterexample would be nanopore sequencing, where the DNA is read by pulling it through a pore, rather than by synthesizing a complementary strand one base at a time.

Patrik

Nathan McCorkle

Feb 28, 2013, 5:54:41 PM
to diy...@googlegroups.com


On Feb 28, 2013 2:50 PM, "Patrik D'haeseleer" <pat...@gmail.com> wrote:
>
> On Thursday, February 28, 2013 1:53:37 PM UTC-8, Nathan McCorkle wrote:
>>
>> Cool! My thinking is throw the reversible terminators in a pipeline
>> with terminal transferase, and voila, green chemistry DNA synthesis
>> (well depending on the reversible terminators!)
>
>
> That is precisely how a lot of DNA synthesis works. It's also how a lot of microarrays get synthesized, by using light-directed synthesis directly on the chip.

I thought that was normal phosphoramidite synthesis with a photogenerated acid for activation.

> In fact, there are strong links between easily/reliably achievable read lengths in DNA sequencing, oligo lengths in DNA synthesis, and probe lengths in microarrays.
>
> Most of the current NGS methods are also called "sequencing-by-synthesis"

Yeah, but they use a template; terminal transferase doesn't depend on a template to extend ssDNA.

> precisely for this reason. A counterexample would be nanopore sequencing, where the DNA is read by pulling it through a pore, rather than by synthesizing a complementary strand one base at a time.
>
> Patrik
>


Jason Bobe

Mar 1, 2013, 9:57:09 PM
to diy...@googlegroups.com, cathal...@cathalgarvey.me
On Thursday, February 28, 2013 10:04:46 AM UTC-5, jlund256 wrote:
> Here is some more info on next generation sequencing.

Illumina has a bunch of videos and tutorials on their site for various aspects of the MiSeq workflow.

I recently came across a company (haven't used them yet) that has developed an NGS service marketplace, basically trying to find and sell "excess capacity":

Jason