Multiplexing in a flow cell lane by bar coding

Kevin M. Carr

Aug 7, 2008, 10:04:40 AM

to Solexa User Group

Hello,

I have a researcher who wishes to do targeted resequencing of a ~100kb
region from 100-200 individuals. We are planning to generate the targeted
DNA by pooling of tiled long-range PCR products. Obviously the amount of
sequence generated in a single lane of the GA flow cell is sufficient to
cover many individuals. I was hoping to use bar coding of the individual
samples (exactly like the 454 MID tag procedure) to multiplex samples in
each lane of flow cell. I have heard tell that there are people out there
doing this with Illumina but my searches for specifics have thus far proved
fruitless.

Do any of you good folks have any experience or protocols you might like to
share for multiplexing samples by bar code in an Illumina GA run?

Thanks in advance,

Kevin M. Carr

**************************
Bioinformatics Specialist
Research Technology
Support Facility
202-D Biochemistry Bldg.
Michigan State University
East Lansing, MI 48824

Ph: (517) 353-6794
Fax:(517) 353-8638
**************************

Brenton

Aug 8, 2008, 7:46:25 AM

to solexa

We have been doing multiplexing by using PCR primers that have the
Illumina GA sequencing primer, a 3 nt barcode, followed by a common
region that primes all of the sequences we are interested in looking
at. In our case, we are studying alternative splicing and using
constitutive exon sequences for the common region, but I imagine you
could ligate a set of linkers to your DNA fragments and then use
barcoded PCR primers to index them. The problem with this, of course,
is that you sacrifice a good portion of the read for the common region
- in our case this isn't a problem, but I would imagine that for most
applications it would be. Nonetheless, using this technique we can
pool up to 64 samples in one lane and easily tease them apart after
the run. It works great.
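
The figure of 64 samples per lane follows from the barcode length: a 3 nt barcode over four bases gives 4^3 = 64 distinct sequences. As a minimal Python sketch (the function name `all_barcodes` is made up for illustration and is not part of any Illumina software), the full set can be enumerated like this:

```python
from itertools import product

def all_barcodes(length, alphabet="ACGT"):
    """Return every possible barcode of the given length, in lexicographic order."""
    return ["".join(bases) for bases in product(alphabet, repeat=length)]

barcodes = all_barcodes(3)
print(len(barcodes))   # 64
print(barcodes[:4])    # ['AAA', 'AAC', 'AAG', 'AAT']
```

Using all 64 leaves no room for error detection, since any single miscalled base turns one valid barcode into another.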

Cheers,

Brent

Kittler, Ellen Ph. D.

Aug 8, 2008, 9:28:25 AM

to sol...@googlegroups.com

Our users have employed several strategies. One of the more successful
is to make our own version of the Adapters (linkers) using the Illumina
sequence with a 6nt bar code at the 3' end. Then proceed as usual with
the Illumina PCR primers and sequencing with the Illumina sequencing
primer. We use non-barcoded positions for a first mini-run to get the
offsets and matrix, then run the whole flow cell. For Eland, we mask
those bar code positions by changing the GERALD.cfg to
#:USE_BASES nnnnnnYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
#:READ_LENGTH 30
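
To make the effect of such a mask concrete, here is a small Python sketch. It only illustrates what a mask of the form shown above means (positions marked `n` are skipped, positions marked `Y` are kept); it is not how Eland or GERALD is implemented, and the function names are hypothetical. The mask is assumed to be six `n` followed by thirty `Y`, matching READ_LENGTH 30.

```python
def apply_use_bases(read, mask):
    """Keep the bases whose mask position is 'Y'; drop those marked 'n'."""
    if len(read) < len(mask):
        raise ValueError("read is shorter than the mask")
    return "".join(base for base, flag in zip(read, mask) if flag == "Y")

def split_barcode(read, barcode_len=6):
    """Return (barcode, insert) for a read that starts with the barcode."""
    return read[:barcode_len], read[barcode_len:]

mask = "n" * 6 + "Y" * 30          # assumed layout: 6 masked + 30 used bases
read = "ACGTAC" + "G" * 30         # toy read: 6 nt barcode + 30 nt insert

print(apply_use_bases(read, mask))  # the 30 insert bases only
print(split_barcode(read)[0])       # ACGTAC
```

The barcode itself still has to be recovered from the unmasked read so that each aligned read can be traced back to its sample.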

Is there a better way to do this?

~ Ellie & Nemo
http://www.umassmed.edu/nemo

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ellen LW Kittler, PhD
UMass Center for AIDS Research, and
Program in Molecular Medicine
UMass Medical School
373 Plantation Street, Room 207
Worcester MA 01605
(508) 856 - 6137 phone
(508) 856 - 6187 fax

John

Aug 11, 2008, 3:08:24 PM

to solexa

Hi Kevin,

From an Illumina seminar this past Friday we learned that "sample
indexing" is currently in beta testing with a six base index per
sample. Initially this will enable multiplexing of 12 samples per
lane because the first release will include 12 six-base indexes. I
also understand that the paired-end module is required for indexing
with the GA.

Regards,
John

Kevin M. Carr

Aug 11, 2008, 6:12:05 PM

to Solexa User Group

John,

I had also asked this question directly to Illumina and they told me
basically what you did; i.e. "We're going to have a kit out some time this
year", but they didn't mention the paired-end requirement (which would be a
deal killer since we don't have the paired-end unit). I can't really
understand why one would need the paired-end to make this work if it is just
a 6nt index stuck between the adapter and your DNA. Do you recall the
reason why the paired-end module was required?

Kevin M. Carr

John

Aug 13, 2008, 1:00:35 PM

to solexa

Kevin,

I am given to understand that "sequence indexing" is a paired-end
protocol: one read will read the tag and the other read will read
the index. The reason that this is not done as a single read is that
the same index would be read repeatedly and this would adversely
affect the matrix, the phasing and pre-phasing. Doing it as a
paired-end read was considered the best way to get around the matrix
and phasing/pre-phasing problem.
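
The base-balance concern can be illustrated with a toy calculation. The Python sketch below (function names are hypothetical, and the four reads are invented toy data) reports, for each cycle, the fraction of reads that carry the most common base. Cycles that cover a shared index come out at 1.0, while cycles over varied insert sequence come out close to 0.25.

```python
from collections import Counter

def base_composition(reads):
    """Return one Counter of base counts per cycle (read position)."""
    n_cycles = min(len(r) for r in reads)
    return [Counter(r[i] for r in reads) for i in range(n_cycles)]

def max_base_fraction(reads):
    """Fraction of reads showing the most common base, per cycle."""
    n = len(reads)
    return [max(c.values()) / n for c in base_composition(reads)]

# Toy data: every read starts with the same 6 nt index, then a varied insert.
reads = ["ACGTAC" + insert for insert in ["ACGT", "CGTA", "GTAC", "TACG"]]
print(max_base_fraction(reads))
# [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.25, 0.25, 0.25, 0.25]
```

This says nothing about how the pipeline actually estimates the matrix or phasing; it only shows why index cycles look unbalanced compared with genomic cycles.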

John Garner
Computational Biologist
NIH/NIA

Kevin M. Carr

Aug 13, 2008, 1:58:47 PM

to Solexa User Group

John,

Illumina confirmed that their indexing scheme is based on the paired-end
protocol but they did not elaborate on the reasons. Your explanation seems
reasonable. However I would think it would be sufficient to run one lane of
control DNA and use that exclusively for doing matrix/phasing calculations;
that is what we do now for DGEx runs since you can't count on the DGEx tags
to be balanced.

I also found a very brief description of the indexing protocol in an
Illumina document on the GAII upgrade
(http://www.illumina.com/seminars/archive/Illumina_GA_Update.pdf). From
this it appears that the index sequence sits between the adapter and the
sequence anchoring the DNA to the flow cell. They introduce a third primer
and sequencing step (besides the two for the paired-end) to sequence the
index.

I found another presentation by a company called Fasteris Life Sciences
(http://www.fasteris.com/pdf/2008-06-06_Fasteris_Seminar.pdf) which
describes indexing single reads in the manner I had envisioned, by placing a
2-4nt index between the adapter and your DNA. They don't describe how they
handle the matrix/phasing calculation issue, but my guess would be using a
control lane.

I guess I will be designing my own indexed adapters. I know that the
indexes cannot be totally random. For example you would not want a
situation where missing a base incorporation on an index makes it appear to
be another index in your set.
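
One way to screen a candidate index set for this failure mode is sketched below in Python. The model is an assumption made for illustration: a missed incorporation is treated as a single skipped base, the rest of the index shifts left, and the first base of the insert (unknown, so all four are tried) fills the last index position. Function names and the example indexes are hypothetical.

```python
def deletion_variants(index, alphabet="ACGT"):
    """All same-length strings that could be read if one base of `index` is skipped."""
    variants = set()
    for i in range(len(index)):
        shortened = index[:i] + index[i + 1:]
        for next_base in alphabet:          # unknown first base of the insert
            variants.add(shortened + next_base)
    return variants

def deletion_collisions(indexes):
    """Return (source, target) pairs where a skipped base in source can read as target."""
    index_set = set(indexes)
    collisions = set()
    for idx in indexes:
        for variant in deletion_variants(idx):
            if variant != idx and variant in index_set:
                collisions.add((idx, variant))
    return sorted(collisions)

# Hypothetical candidate set: skipping the first A of ACGT can read as CGTA.
print(deletion_collisions(["ACGT", "CGTA", "TTGG"]))  # [('ACGT', 'CGTA')]
```

A set that returns an empty list under this check can still be vulnerable to substitutions, so it would normally be combined with a mismatch-distance check.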

Do the good folks here have any other thoughts on index sequence design?

Thanks,

james@CRUK

Aug 14, 2008, 6:34:46 AM

to solexa

Hi Kevin,
We have users that have tried the barcoding with unique adapters and
sacrificing 4bp of the 3' sequence. It works well and for most
applications does not really lose too much sequence. I think for a
100kb region a 31bp read or possibly a 41bp read (with lower quality
at the 5' end) would align nicely, since you have such a small bit of
the genome present.
You will probably get all 200 individuals in a single lane this way.
Our first run was four samples multiplexed and the data was as good
as single sample analysis.
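
Whether a given number of individuals fits in one lane comes down to simple arithmetic on the lane yield. The Python sketch below shows the calculation; the reads-per-lane value is a placeholder chosen purely to make the example run, not a measured GA yield, and the function name is hypothetical. It also assumes samples and amplicons are pooled perfectly evenly, which real pools will not be.

```python
def mean_coverage(reads_per_lane, usable_read_len, n_samples, region_bp):
    """Average per-sample fold coverage, assuming perfectly even pooling."""
    total_bases = reads_per_lane * usable_read_len
    return total_bases / (n_samples * region_bp)

# Placeholder inputs: substitute the yield of your own instrument and run.
reads_per_lane = 5_000_000      # ASSUMPTION, not from this thread
usable_read_len = 31 - 4        # 31bp read minus a 4bp barcode
n_samples = 200
region_bp = 100_000

print(mean_coverage(reads_per_lane, usable_read_len, n_samples, region_bp))  # 6.75
```

Rearranging the same formula gives the number of samples that can share a lane for a target coverage.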

To clarify the Paired End indexing strategy: Illumina are using the
PE module to perform a second read AFTER the first real sequencing
read. This is a 6bp run and is quick, means you get high quality
sequence on all reads, and could be extended to allow very redundant
barcodes to remove the possibility of mistaking one sample for
another due to sequencing errors. It requires unique 5' adapters for
each barcode. 96 would seem like a sensible number of barcode adapter
molecules to make, but I hear that Illumina will not be releasing this
many. They may release the sequences (and modifications?) in the near
future, though, and we can stop pretending they are confidential!
If you are doing PE runs then I believe the barcode is the middle
read.
This strategy makes a PE module almost a requirement for most GAs.
James.

Dan Turner

Aug 14, 2008, 9:08:32 AM

to solexa

No, they're not going to tell you - the way the indexes are added
means that the primer with the index has no modifications. It's
cheaper like this: it costs a fortune to get your primers modified in
the secret way, and it wouldn't be viable to get it done to each of
your e.g. 96 indexing primers.

Dan

ECO

Aug 15, 2008, 11:25:40 AM

to solexa

Hi Kevin,

We are dealing with a similar situation on the SOLiD and are designing
our own barcodes. A very helpful paper is:

http://www.nature.com/nmeth/journal/v5/n3/abs/nmeth.1184.html

...which helps with misassignment errors.

-=Eric

elaneyk

Aug 18, 2008, 10:31:23 AM

to solexa

Hi everyone,

On the subject of illumina indexes, have a look at page 128 of this
document for an illustration of the method:
http://www.wipo.int/pctdb/images4/PCT-PAGES/2008/152008/08041002/08041002.pdf,

Elaine

Aaron Liston

Aug 27, 2008, 1:53:52 PM

to solexa

Kevin et al: Our paper on Solexa multiplexing is now online at
http://nar.oxfordjournals.org/cgi/content/full/gkn502

Kevin M. Carr

Aug 27, 2008, 6:50:10 PM

to Solexa User Group

Aaron,

Thanks for posting this; in many ways it is very similar to what our
researcher here wants to do, so it will be very helpful. After reading the
paper I have a couple of questions if you don't mind.

First, you used tiled PCR to cover the cp genome which is what we suggested
to our researcher to cover the 100kb region he is interested in. Given the
disparity in coverage you observed for the various amplicon tiles, would you
now choose a different method; for example a DNA capture array?

Second, your supplementary material shows that you modified the adapters for
both ends. I may be missing something, but why modify the "1" adapters (e.g.
CCT1, GGT1)? These end up at the other end of the DNA strand from the
sequencing primer so it seems to me that a common adapter for this end
should be fine. Am I not understanding the library preparation protocol for
the Illumina properly?

Finally, you showed that you can get misincorporations in your index
sequences. Given the sequences you chose it appears that you could
misassign reads if there is a particular misincorporated base. For example,
in the MPLX S1 sample two of the indexes you used were AAT and ATT; an A->T
substitution in the first case or T->A in the second would change the
assignment of that read. It seems that there isn't any way to detect and
correct these. Did you just accept that this may happen but that the
frequency was low enough to ignore?
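
The AAT/ATT case can be found mechanically by listing every pair of indexes that differ at fewer than two positions. A minimal Python sketch follows; the function names are hypothetical, and the four-index set is an assumption built from sequences mentioned in this post, not the actual composition of the MPLX S1 pool.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def risky_pairs(indexes, min_distance=2):
    """Pairs of indexes closer than min_distance, with their distance."""
    return [(a, b, hamming(a, b))
            for a, b in combinations(indexes, 2)
            if hamming(a, b) < min_distance]

# Assumed example set; only AAT and ATT are one substitution apart.
print(risky_pairs(["AAT", "ATT", "CCT", "GGT"]))  # [('AAT', 'ATT', 1)]
```

With a minimum distance of 2, a single substitution produces a sequence that matches no index, so the read can be discarded instead of misassigned.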

Thanks,

Kevin M. Carr

Aaron Liston

Aug 28, 2008, 12:29:06 PM

to solexa

Kevin: See my replies below:

On Aug 27, 3:50 pm, "Kevin M. Carr" <ca...@msu.edu> wrote:
> Aaron,
>
> Thanks for posting this; in many ways it is very similar to what our
> researcher here wants to do so will be very helpful. After reading the
> paper I have a couple of questions if you don't mind.
>
> First, you used tiled PCR to cover the cp genome which is what we suggested
> to our researcher to cover the 100kb region he is interested. Given the
> disparity in coverage you observed for the various amplicon tiles would you
> now choose a different method; for example a DNA capture array?

We have explored some alternative methods, but so far none have the
robustness of tiled PCR.

>
> Second, your supplementary material shows that you modified the adapters for
> both ends. I may be missing something but why modify the "1" adapters (e.g.
> CCT1, GGT1). These end up at the other end of the DNA strand from the
> sequencing primer so it seems to me that a common adapter for this end
> should be fine. Am I not understanding the library preparation protocol for
> the Illumina properly?

The adapters need to be complementary and double stranded at their 3'
end for the ligation step of library preparation. Note that we also
retained the T overhang for cloning.

>
> Finally, you showed that you can get misincorporations in your index
> sequences. Given the sequences you chose it appears that you could
> misassign reads if there is a particular misincorporated base. For example,
> in the MPLX S1 sample two of the indexes you used were AAT and ATT; an A->T
> substitution in the first case or T->A in the second would change the
> assignment of that read. It seems that there isn't any way to detect and
> correct these. Did you just accept that this may happen but that the
> frequency was low enough to ignore?

Correct. Since we average 80X coverage for our targets, we have been
able to ignore these misassignments. We have now begun using 4 bp tags
with 2 mismatches between them, and this further reduces their
frequency.
Aaron
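
A tag set with at least 2 mismatches between every pair can be built with a simple greedy pass, sketched below in Python. This is an illustration under stated assumptions, not the procedure used for the tags described above: candidates are visited in alphabetical order and kept if they are far enough from everything already chosen. Function names are hypothetical.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def pick_tags(length=4, min_distance=2, alphabet="ACGT"):
    """Greedily collect tags so that every pair differs at >= min_distance positions."""
    chosen = []
    for candidate in ("".join(p) for p in product(alphabet, repeat=length)):
        if all(hamming(candidate, tag) >= min_distance for tag in chosen):
            chosen.append(candidate)
    return chosen

tags = pick_tags(length=4, min_distance=2)
print(len(tags), tags[:5])
```

A practical design would probably also drop homopolymer tags such as AAAA and balance base composition across the set; the sketch only enforces the mismatch distance.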

Kevin M. Carr

Aug 28, 2008, 2:45:03 PM

to Solexa User Group

Aaron Liston wrote:
>
> The adapters need to be complementary and double stranded at their 3'
> end for the ligation step of library preparation. Note that we also
> retained the T overhang for cloning.

The sound you hear is me slapping my head and grunting a Homer Simpsonesque
DOH!.

The fact is I had not completely grokked the nature of the Illumina "Y"
adapters for library preparation. The tumblers finally clicked, thanks.

Thanks also for your answers to the other questions.

james@CRUK

Aug 29, 2008, 5:32:30 PM

to solexa

What do readers of this thread think about the idea of longer barcodes
that could be unique? 10bp would give 1M barcodes, for instance, which
would allow lots of error checking and little chance of
misinterpretation due to contamination of runs. As the reads speed up
and get longer, this would seem reasonably simple to incorporate
into our process. Comments?
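
The "1M" figure is 4^10 = 1,048,576. Because only a tiny fraction of those sequences would actually be used, an observed barcode with an error can often be matched to the nearest valid one. The Python sketch below shows one simple way to do that; the function names and the three 10bp barcodes are invented for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_barcode(observed, barcodes, max_mismatches=1):
    """Return the single barcode within max_mismatches of observed, else None."""
    hits = [b for b in barcodes if hamming(observed, b) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None

print(4 ** 10)  # 1048576

# Hypothetical 10bp barcodes, far apart from one another.
barcodes = ["ACGTACGTAC", "TGCATGCATG", "GGAACCTTGG"]
print(assign_barcode("ACGTACGTAA", barcodes))  # ACGTACGTAC (one mismatch corrected)
print(assign_barcode("AAAAAAAAAA", barcodes))  # None (too far from every barcode)
```

Reads that return None can be set aside, which is how a sparse barcode set also helps flag contamination from another run.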

Melissa Wong

Sep 2, 2008, 3:35:56 AM

to solexa

This is the question I've been thinking about lately. Why a 6 nt barcode?
Longer and more unique barcodes are an attractive idea. SNP genotyping
using NGS will be possible provided you hire someone to sort out all
those data for you. :p
Instead of pooling more barcoded samples, isn't it better to have more
lanes in a flow cell?

james@CRUK

Sep 2, 2008, 2:45:55 PM

to solexa

I believe the limit on lanes is down to fluid dynamics. It probably
makes more sense to have no lanes at all and purely rely on barcoding
to deconvolute samples. The Heliscope flow cell is huge and there is a
lot of room inside the GA...GA3 anyone?

solexer

Sep 23, 2008, 12:42:38 PM

to solexa

Hi list,

Sorry to jump in, but how about analysis of such multiplexed data?
Would you divide the Eland output into different files according to the
barcode (so end up with as many files as samples multiplexed) and then
feed it into SSAHA or Maq?

Thanks for your input,

Dave.

~S

Sep 26, 2008, 12:11:30 PM

to solexa

good question..
I came here essentially for the bioinformatics concern on dealing with
this bar-coded data!

Naive way - grep for the code at start of the read, use those as
separate inputs to maq, ssaha..
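
The naive approach amounts to binning reads on an exact match of the leading barcode. A minimal Python sketch, assuming the barcode sits at the very start of each read and all barcodes have the same length (the function name, sample names and reads are invented for illustration):

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Group reads by exact barcode prefix and trim the barcode off.

    `barcodes` maps barcode sequence -> sample name; all barcodes are
    assumed to have the same length. Unmatched reads go to 'unassigned'.
    """
    barcode_len = len(next(iter(barcodes)))
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        bins[barcodes.get(tag, "unassigned")].append(insert)
    return dict(bins)

barcodes = {"ACG": "sample1", "TGA": "sample2"}
reads = ["ACGTTTT", "TGACCCC", "GGGAAAA", "ACGGGGG"]
print(demultiplex(reads, barcodes))
# {'sample1': ['TTTT', 'GGGG'], 'sample2': ['CCCC'], 'unassigned': ['AAAA']}
```

Each bin would then be written to its own file and aligned separately. Exact matching throws away any read with an error in the barcode, which is the main weakness of the grep-style method.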

David A.G

Sep 29, 2008, 11:30:56 AM

to sol...@googlegroups.com

Thanks for the answer,

I take it that if grep-ing is the naive way, there is another way, isn't there?
Has anyone seen the recent paper by DW Craig about modifying Illumina's GA pipeline? (Nature Methods 2008 Sep 14)

Dave

~S

Oct 2, 2008, 12:00:35 PM

to solexa

I am looking for the 'another' way :)

~S

Oct 13, 2008, 3:21:39 PM

to solexa

The link does not work anymore !

On Aug 18, 9:31 am, elaneyk <elan...@gmail.com> wrote:
> Hi everyone,
>
> On the subject of illumina indexes, have a look at page 128 of this
> document for an illustration of the method: http://www.wipo.int/pctdb/images4/PCT-PAGES/2008/152008/08041002/0804...,

elaneyk

Oct 14, 2008, 9:59:03 AM

to solexa

Apologies for that; try this link:
http://www.wipo.int/pctdb/en/wo.jsp?wo=2008041002&IA=GB2007003798&DISPLAY=DOCS

or go to http://www.wipo.int/portal/index.html.en and search for
"WO/2008/041002". The information that I found useful was in the "Initial
Publication without ISR" document under the documents tab. Also, you
can search for "solexa chesterford" and find a whole plethora of
patents related to solexa/illumina sequencing technology.

**It's handy to know that the patents list the catalogue numbers of
some of the sample prep kit components ;-) A word of warning however,
the components can differ between protocols e.g. the phusion enzyme
for genomic prep is F531 and for mRNA-seq prep it's F530, same enzyme,
completely different concentrations.....**

Elaine

james@cancer

Oct 17, 2008, 7:18:03 AM

to solexa

Has everyone interested in barcoding read the Craig et al Nature
Methods paper?
http://tinyurl.com/6xjctj
It is a good one on the design, use and analysis of 5' indexes.
Illumina will produce a pre-release protocol soon, though, that will
probably allow barcoding. I suspect this might be as described above
by Craig. Hopefully the official Paired-end module barcoding will come
soon after.
ABI are talking about releasing a 96-well indexing kit by the end of
this year!

Perera, Anoja

Oct 22, 2008, 5:57:36 PM

to sol...@googlegroups.com

Yes, and the protocol is pretty straightforward. I think this method will be ideal for cases where paired-end might be too much. We plan to test this out very soon.
