HiSeq data archive

Sivakumar Gowrisankar

Aug 12, 2010, 10:39:15 AM
to sol...@googlegroups.com
Hello Everyone

We have a new HiSeq and I am trying to determine which files are worth storing long term. For example, I am planning to store the following:

1. Bcl Raw files
2. Fastq files (s_N_sequence.txt)
3. Export files (s_N_export.txt)

I am wondering if we will ever need the raw BCL files. The only reason I can think of is if someone wants to use the unfiltered reads (before the chastity filter) in their analysis. Given that the HiSeq produces a tremendous amount of data (close to 100 million reads) with a high %PF (>80%), has anyone encountered a researcher wanting ALL of the raw reads?

It would also be nice to hear from the community which files are being archived for the HiSeq.

Thanks
Siva
PCPGM

Andrew Gagne

Aug 12, 2010, 10:46:53 AM
to sol...@googlegroups.com
We routinely have users who want qseq or intensity files from GA runs.

We expect a number of users will want the BCL files, although the export file contains the unfiltered reads as well, so perhaps that would be enough?

andrew

--
You received this message because you are subscribed to the Google Groups "solexa" group.
To post to this group, send email to sol...@googlegroups.com.
To unsubscribe from this group, send email to solexa+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/solexa?hl=en.

Sivakumar Gowrisankar

Aug 12, 2010, 10:56:42 AM
to sol...@googlegroups.com
That is true! But some users might want just the s_N_sequence.txt file, in which case some lanes won't have the unfiltered reads. I guess to start with we will archive the BCL files as well and then phase them out if needed.
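
For reference, the chastity filter result travels with each read in the qseq format, which is why a filtered s_N_sequence.txt cannot be turned back into the full read set. Here is a minimal Python sketch, assuming the usual 11-column tab-separated qseq layout with the filter flag (1 = passed, 0 = failed) in the last column. The function name and the read-name layout are made up for illustration and are not part of any Illumina tool.

```python
def qseq_to_fastq(lines, keep_failed=False):
    """Yield FASTQ records from qseq lines.

    Assumed layout: 11 tab-separated columns (machine, run, lane, tile,
    x, y, index, read number, sequence, quality, filter flag).
    Quality strings are passed through unchanged (no offset conversion).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 11:
            raise ValueError("expected 11 qseq columns, got %d" % len(fields))
        machine, run, lane, tile, x, y, index, read_no, seq, qual, flag = fields
        if flag != "1" and not keep_failed:
            continue  # read failed the chastity filter
        # Illustrative read name; adjust to whatever your pipeline expects.
        name = "%s_%s:%s:%s:%s:%s#%s/%s" % (
            machine, run, lane, tile, x, y, index, read_no)
        # qseq writes uncalled bases as '.', FASTQ convention is 'N'.
        yield "@%s\n%s\n+\n%s\n" % (name, seq.replace(".", "N"), qual)
```

Calling it with keep_failed=True gives the unfiltered set, which is only possible while the qseq (or BCL plus filter) files are still around.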

Siva

hemant kelkar

Aug 12, 2010, 11:01:18 AM
to sol...@googlegroups.com
A number of people probably use SRF files for archiving/submitting data to GEO/SRA.

Can the "sequenceread" project handle data from HiSeq machines? What are people who already have HiSeqs doing?

- Hemant

Davide Cittaro

Aug 12, 2010, 11:03:31 AM
to sol...@googlegroups.com

On Aug 12, 2010, at 5:01 PM, hemant kelkar wrote:

> A number of people probably use SRF files for archiving/submitting data to GEO/SRA.
>

It's a compact way to store data after all... Also, the srf2fastq utility gives a one-step route from the archive to FASTQ (with qualities already scaled to Sanger format).
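
To show what "scaled to Sanger format" means in practice, here is a small Python sketch of the re-encoding. It assumes the input qualities are Phred scores stored with an ASCII offset of 64 (as written by Illumina pipeline 1.3 and later) and that the target is the Sanger offset of 33. It is only an illustration of the arithmetic, not what srf2fastq does internally, and the function name is hypothetical.

```python
def illumina64_to_sanger33(qual):
    """Re-encode a Phred+64 quality string as Phred+33 (Sanger)."""
    out = []
    for ch in qual:
        q = ord(ch) - 64  # recover the Phred score
        if q < 0:
            raise ValueError("character %r is below the Phred+64 range" % ch)
        out.append(chr(q + 33))  # write it back with the Sanger offset
    return "".join(out)
```

For example, 'h' (Q40 at offset 64) becomes 'I', and 'B' (Q2) becomes '#'.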

> Can the "sequenceread" project handle data from HiSeq machines?

I wish I had a HiSeq to test this :-)

d

--
Davide Cittaro
daweo...@gmail.com
http://daweonline.googlepages.com/

Davide Cittaro

Aug 12, 2010, 10:50:27 AM
to sol...@googlegroups.com
Hi, 

On Aug 12, 2010, at 4:46 PM, Andrew Gagne wrote:

> We routinely have users who want qseq or intensity files from GA runs.
>
> We expect a number of users will want the BCL files, although the export file contains the unfiltered reads as well, so perhaps that would be enough?


We are going to buy a HiSeq, and I'm pretty concerned about data storage... What are BCL files? Will it be possible to produce SRF (with the Staden package) from qseq files?

d

Andrew Gagne

Aug 13, 2010, 5:06:37 PM
to sol...@googlegroups.com
Illumina has stopped producing qseq files, as that was one of the bottlenecks. I believe BCLs are basically the equivalent of a line in a qseq file... you can still generate qseq files from BCL.
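
To make the BCL/qseq relationship a bit more concrete: as far as I know, a BCL file holds the calls for every cluster of one tile at one cycle, so a qseq line corresponds to the same cluster position read across all the cycle files. Here is a rough Python sketch of decoding one file, assuming the commonly described layout (a 4-byte little-endian cluster count, then one byte per cluster with the base in the low 2 bits and the quality in the high 6 bits, and an all-zero byte meaning no call). The function name is hypothetical, and the layout should be checked against Illumina's documentation before relying on it.

```python
import struct

BASES = "ACGT"

def decode_bcl(data):
    """Decode the raw bytes of one BCL file into (base, quality) tuples."""
    # Assumed header: unsigned 32-bit little-endian number of clusters.
    (n_clusters,) = struct.unpack_from("<I", data, 0)
    body = data[4:4 + n_clusters]
    if len(body) != n_clusters:
        raise ValueError("truncated BCL data")
    calls = []
    for byte in body:  # iterating over bytes yields ints in Python 3
        if byte == 0:
            calls.append(("N", 0))  # no call
        else:
            calls.append((BASES[byte & 0b11], byte >> 2))
    return calls
```

Rebuilding a read then means taking element i from the decoded list of each cycle in turn, which is essentially the conversion step that used to happen automatically.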

Sivakumar Gowrisankar

Aug 13, 2010, 6:02:17 PM
to sol...@googlegroups.com
As a matter of fact, you have to generate qseq files from BCL to do any kind of processing. Even CASAVA 1.7 does not handle BCL files directly.

Bruce

Aug 13, 2010, 6:44:32 PM
to solexa
Exactly right! Illumina has done nobody any favors with regard to BCL
files. BCL files had always been generated, then automatically
converted to qseq (by Bustard, I think). They simply took out the
'automatic' step, now leave it up to us to convert to qseq, and claim
they are saving us lots of space. It will be a big deal when pipelines
make use of BCLs directly. That said, they are consistently
doing a great job with yields, read numbers, and quality.
