CASAVA 1.6 GERALD

Sivakumar Gowrisankar

Apr 14, 2010, 11:24:23 AM

to sol...@googlegroups.com

Hi

I have been very frustrated with the data analysis of multiplexed samples from Illumina.
Here is my problem:

I have a 76-cycle read 1 and a 7-cycle index read (read 2). Since we are a core lab, we do not actually care what those indexes are. We do, however, want to account for the indexes and separate them from the reads before alignment. Earlier versions had the option of giving USE_BASES Y76I7. This would separate the index and place it on the header line in the s_N_sequence.txt file. With the new version, however, this option fails with "Read too long". If I use USE_BASES Y76,I7, it truncates the read to 76 cycles and ignores the index altogether, without placing it on the header line of the s_N_sequence.txt file. I do not want to use the demultiplex.pl option, as I do not know which barcodes are present. There is a --qseq-mask option in the demultiplex.pl program, which I can supposedly use only IF I know which barcodes are present.

Overall I find handling multiplexed data very confusing with Illumina's pipeline program. Does anyone else face similar problems?

Thanks
Siva

Sivakumar Gowrisankar
Bioinformatician
Partners Healthcare Center for Personalized Genetic Medicine

Leath Tonkin

Apr 14, 2010, 12:14:57 PM

to sol...@googlegroups.com

Hi Siva,

You are probably stuck using the demultiplex option. Just make a sample sheet and put the number "0" (zero) in the barcode field. This will bin everything into an "unknown" directory and you should be able to proceed from there. We mix regular and index libraries on the same run and I'm extremely frustrated too. It's been almost a month and I still don't have Gerald working correctly for the non-indexed lanes. I ended up concatenating all the qseq files and dumping them on the end customer to sort out all the reads.
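For anyone who wants to do the same concatenation, here is a minimal Python sketch. It is not part of CASAVA; it assumes the usual per-tile naming s_<lane>_<read>_<tile>_qseq.txt, and the function name concat_qseq and its arguments are made up for illustration.

```python
import glob
import os
import shutil

def concat_qseq(qseq_dir, lane, read, out_path):
    """Concatenate the per-tile qseq files of one lane and read into out_path.

    Assumes files are named s_<lane>_<read>_<tile>_qseq.txt (an assumption,
    check your own run folder). Returns the number of tile files merged.
    """
    pattern = os.path.join(qseq_dir, "s_%d_%d_*_qseq.txt" % (lane, read))
    tiles = sorted(glob.glob(pattern))  # sorted so tiles stay in order
    with open(out_path, "wb") as out:
        for tile in tiles:
            with open(tile, "rb") as src:
                shutil.copyfileobj(src, out)  # stream, do not load into memory
    return len(tiles)
```

Calling concat_qseq(run_dir, 1, 1, "lane1_read1_qseq.txt") would merge all read 1 tiles of lane 1, where run_dir is whatever directory holds your qseq files. Write the output outside the qseq directory, or under a name that does not end in _qseq.txt, so that a rerun does not pick it up as a tile.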

Good luck and I hope this helps.

Leath

Leath Tonkin, PhD
Manager, Vincent J. Coates Genomics Sequencing Laboratory
QB3/University of California, Berkeley
B206 Stanley Hall
MC 3220
Berkeley, CA 94720-3220
(510) 666-3372
lto...@berkeley.edu
http://qb3.berkeley.edu/gsl/ <-- New website address! Please use only this to bookmark the site.


Abhishek Pratap

Apr 14, 2010, 12:18:09 PM

to sol...@googlegroups.com

Hi Siva

I see what you are saying. I share your frustration with demultiplexing the runs, but normally one is aware of the indexes used by the wet lab folks.

If you are using the CASAVA v1.6 pipeline, I guess you can still demultiplex. I have used the option USE_BASES Y76,I6n,Y76. If this doesn't work well with CASAVA v1.6, it surely does work with GA 1.4 GERALD.

I6n is important. Due to noise, the last base of the index is ignored.
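To make the mask notation concrete, here is a small Python sketch of how a string like Y76,I6n,Y76 can be read: one comma-separated part per read, each letter followed by an optional repeat count, Y for a base that is used, I for an index base and n for an ignored cycle. This is only my reading of the notation as used in this thread, not GERALD's own parser, and the function names are invented. It does not handle wildcards.

```python
import re

def expand_use_bases(mask):
    """Expand a USE_BASES-style mask into one per-cycle flag string per read."""
    reads = []
    for part in mask.split(","):
        flags = ""
        # Each token is a letter (Y, N or I, any case) plus an optional count.
        for letter, count in re.findall(r"([YyNnIi])(\d*)", part):
            flags += letter.upper() * (int(count) if count else 1)
        reads.append(flags)
    return reads

def apply_flags(bases, flags):
    """Split one read into (used_bases, index_bases) according to its flags."""
    used = "".join(b for b, f in zip(bases, flags) if f == "Y")
    index = "".join(b for b, f in zip(bases, flags) if f == "I")
    return used, index

masks = expand_use_bases("Y76,I6n,Y76")
print([len(m) for m in masks])            # [76, 7, 76]
print(masks[1])                           # IIIIIIN
print(apply_flags("ACGTACG", masks[1]))   # ('', 'ACGTAC')
```

The last line shows the effect of I6n on a 7-cycle index read: the first six bases are kept as the index and the seventh is dropped.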

Let me know if you have more questions.

-Abhi

Eric Cabot

Apr 15, 2010, 1:42:43 PM

to sol...@googlegroups.com

Hi Leath,

If I recall correctly, samples associated with a zero in the barcode
field of the SampleSheet file will have the actual index sequences
stripped from the index field of the sequence files in the unknown
directory. (A statement from Illumina suggests that this is an
oversight.) From looking at various files in unknown directories, I've
come up with an (untested) workaround: use an index that is not in
use, e.g., if you don't expect a given index sequence in your samples,
put it in the SampleSheet. The resulting sequence files in unknown
won't be index-stripped, so you can sort the reads accordingly. I
use a Perl program for this purpose.
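As an illustration of that kind of sorting (in Python rather than Perl, and not the program mentioned above), the sketch below groups 4-line FASTQ-style records by the index in the header. It assumes headers of the form @machine:lane:tile:x:y#INDEX/read; the function name and the example records are made up.

```python
from collections import defaultdict

def bin_by_index(lines):
    """Group 4-line FASTQ-style records by the index found in each header."""
    bins = defaultdict(list)
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header = record[0]
            # Assumed header layout: @machine:lane:tile:x:y#INDEX/read
            if "#" in header:
                index = header.rsplit("#", 1)[-1].split("/", 1)[0]
            else:
                index = ""  # no index field on this header
            bins[index].append(record)
            record = []
    return bins

# Invented records, only to show the grouping.
example = [
    "@HWI-EAS:1:1:10:20#ACGTAC/1", "ACGT", "+", "hhhh",
    "@HWI-EAS:1:1:11:21#TTGCAA/1", "GGCC", "+", "hhhh",
    "@HWI-EAS:1:1:12:22#ACGTAC/1", "TTAA", "+", "hhhh",
]
for index, records in sorted(bin_by_index(example).items()):
    print(index, len(records))
```

Counting the records per index this way also gives a quick list of which barcodes are actually present in a lane when they are not known in advance.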

While we are talking about de-multiplexing with CASAVA 1.6, be aware of
the fact that the multiplexedGERALD.pl program doesn't support threading.
For this reason, I usually navigate to each resulting GERALD directory and
run the make files directly with the -j option.
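A rough Python sketch of that loop, for what it is worth. The directory names are placeholders, the helper names are invented, and by default it only prints what it would run; nothing here is part of CASAVA.

```python
import subprocess

def make_commands(gerald_dirs, jobs):
    """Pair each GERALD directory with the make command to run inside it."""
    return [(d, ["make", "-j", str(jobs)]) for d in gerald_dirs]

def run_all(gerald_dirs, jobs, dry_run=True):
    """Run (or, with dry_run, just print) make -j <jobs> in each directory."""
    for directory, cmd in make_commands(gerald_dirs, jobs):
        if dry_run:
            print("cd %s && %s" % (directory, " ".join(cmd)))
        else:
            # check_call raises if make exits with a non-zero status
            subprocess.check_call(cmd, cwd=directory)

# Placeholder directory names; substitute the GERALD directories of your run.
run_all(["GERALD_dir_A", "GERALD_dir_B"], jobs=8)
```

The directories are processed one after another, so the parallelism comes only from the -j value passed to make.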

Eric L. Cabot, Ph.D.
Lead Bioinformaticist
Advanced Genome Analysis Resource
University of Wisconsin Biotechnology Center

Leath Tonkin

Apr 15, 2010, 3:06:54 PM

to sol...@googlegroups.com

Hi Eric,

In my case, when I've used zero in the barcode field, the barcode gets incorporated into the read 1 qseq file in the unknown folder that results from the demultiplexing script. I just looked at one of the qseq files and verified the presence of the index read to make sure my memory hadn't gone completely. However, I don't know whether running GERALD with zero for the barcode will strip the index out of the downstream sequence.txt or export.txt files.

When I was on the phone with Illumina about CASAVA 1.6, they mentioned there is a CASAVA 1.7 beta that will parallelize multiplexedGerald without going into all the subdirectories.

Leath

hemant kelkar

Apr 16, 2010, 1:01:01 PM

to sol...@googlegroups.com

Leath,

We have found that one can run demultiplexGerald jobs in "parallel" in all 12 directories by using "distmake" on SGE. Be sure to specify 12 job slots when you qsub.

Hemant
