Solexa short paired-end reads

balanagireddy

Dec 4, 2009, 10:00:29 PM
to solexa
Hi Everyone,

I am a graduate student from New York. I am developing a de novo
assembler for short paired-end data, and I want to evaluate it on data
from different sequencing machines. I evaluated it on ABI SOLiD data
and the results are pretty good. As Solexa is so widely used, I want to
evaluate my assembler on Solexa data as well, but I have had a hard
time finding Solexa data on the internet. Can someone give me pointers
to short paired-end data generated by Solexa?

Thank you,

Regards
Bala

Abhishek Pratap

Dec 4, 2009, 10:26:32 PM
to sol...@googlegroups.com
Hi Bala

You could use the SRA repository at NCBI to find Solexa data.



-Abhi


--

You received this message because you are subscribed to the Google Groups "solexa" group.
To post to this group, send email to sol...@googlegroups.com.
To unsubscribe from this group, send email to solexa+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/solexa?hl=en.



bala nagi reddy

Dec 4, 2009, 10:35:37 PM
to sol...@googlegroups.com
Hi Abhishek,

Thanks for your reply. I downloaded data from this website and it is
in FASTQ format. Is there any tool to convert short-read data from
FASTQ format to FASTA format?

Thanks.

Regards
Bala Mudiam
+1 631 216 2388

bala nagi reddy

Dec 4, 2009, 10:38:26 PM
to sol...@googlegroups.com
Hi,

I have one more question regarding this link.

http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=main&m=main&s=main

I searched for the term "illumina" and was forwarded to
http://www.ncbi.nlm.nih.gov/sites/entrez?db=sra&term=illumina

Is there a way to distinguish between short paired-end reads and
single-end reads?

Thanks for your help.

bala nagi reddy

Dec 4, 2009, 10:49:20 PM
to sol...@googlegroups.com
Hi,

I figured out that I can use fq_all2std.pl, part of the open source
MAQ project, to convert FASTQ format to FASTA format.

After conversion the data looks like this.

>SRR000224.1
GGGGGTATAGAGATGAAATGCGGTGCACGAAGTTGCATAGCCGAAGCACAAAGTAAAATATCCACAGCATAAAGTCATATAATAACTGCATTAATTCTAA
>SRR000224.2
ATATTATGTTTAAATCCTATAAACAGTATATAACTTTTCTTTAGGTACAAAACTTTAAAGTAATTAAACTAAATAATGAATAACGTTATTGCTGGTTGGTAAAATACTAGACCATATTGG
>SRR000224.3
CAGATTTAAAAACTTTTTNGTTTTTATTCTTTATACTATTATTATCTCGTTAGAAATAAATAGTTGTAATAATGGTAGTTATGTAGATAAAGTAGCTGCTAGCAGCAT
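
For readers who want to see what such a conversion involves, here is a minimal Python sketch. It is not the fq_all2std.pl script and makes no claim to match its behavior; it assumes plain four-line FASTQ records (header, sequence, "+" line, quality line) with unwrapped sequences, and the function names `fastq_records` and `fastq_to_fasta` are made up for illustration.

```python
from typing import Iterable, Iterator, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from four-line FASTQ records.

    Assumption: every record is exactly four lines, with no wrapped sequences.
    """
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        try:
            seq = next(it).strip()
            next(it)  # '+' separator line, ignored
            next(it)  # quality line, not needed for FASTA
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        # Keep only the first whitespace-delimited token as the read ID.
        yield header[1:].split()[0], seq


def fastq_to_fasta(lines: Iterable[str]) -> str:
    """Return FASTA text (one '>' header line and one sequence line per read)."""
    out = []
    for read_id, seq in fastq_records(lines):
        out.append(f">{read_id}")
        out.append(seq)
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__":
    example = ["@read.1 some description\n", "ACGTN\n", "+\n", "IIIII\n"]
    print(fastq_to_fasta(example), end="")
```

Because the quality line is consumed by position rather than by content, a quality string that happens to start with "@" is not mistaken for a header.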

The short paired reads input from SOLiD to my program looks like this.

>a366827-
CATCATAGTGAACGTCCAGTGGCCTA
>b364679-
ATGAAATGACAAATGAGCCATTTGGT
>a80486+
ACTTACAGGGCTACATGAAACTTAAT
>b82452+
CATCGTCGTCCCTGCAACACAAAATA
>a538763+
CTTATATTCCAGGCATCTTGTGCCAA

Here, the 1st k-mer corresponds to the left k-mer and the 2nd corresponds to the l-mer.

How can I deduce this from the FASTA data I converted above?
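
One possible approach, sketched here under a strong assumption: the converted records shown above carry no mate information in their IDs, so this only applies to input where the two mates of a pair share a read name and are marked with "/1" and "/2" suffixes, a common convention in Illumina FASTQ exports. The helper names `read_fasta` and `pair_mates` are hypothetical, and the sketch only groups mates into (name, left, right) tuples; it does not reproduce the a/b naming of the SOLiD input.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) from FASTA lines; wrapped sequences are joined."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def pair_mates(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Group reads named '<id>/1' and '<id>/2' into (id, left, right) tuples.

    Assumption: mates are identified only by the /1 and /2 suffixes.
    Reads without such a suffix are treated as unpaired and skipped.
    """
    pending: Dict[str, Dict[str, str]] = {}
    pairs: List[Tuple[str, str, str]] = []
    for name, seq in records:
        base, sep, mate = name.rpartition("/")
        if not sep or mate not in ("1", "2"):
            continue  # no mate suffix, so the read cannot be paired
        slot = pending.setdefault(base, {})
        slot[mate] = seq
        if len(slot) == 2:
            # Both mates seen: emit the pair in (left, right) order.
            pairs.append((base, slot["1"], slot["2"]))
            del pending[base]
    return pairs


if __name__ == "__main__":
    fasta = [">r1/1", "ACGT", ">r1/2", "GGCC"]
    for base, left, right in pair_mates(read_fasta(fasta)):
        print(base, left, right)
```

The mates do not need to be adjacent in the file, since unmatched reads are held in a dictionary until their partner appears; for very large files a two-file, same-order layout would avoid that memory cost.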

Thank you.

Regards
Bala Mudiam
+1 631 216 2388

praveen.premas

Dec 13, 2009, 12:01:20 PM
to sol...@googlegroups.com
Dear Bala,
Let me try and get you some Solexa data for assembly.
Give me your FTP info.
cheers
Praveen Gupta
