Sven
Jan 17, 2012, 6:00:14 PM
to EA Utils
Hi,
I am the first one :-)
I am not sure whether this is a problem or not...
Maybe you can comment on that.
I have a (bad) Illumina dataset, in this case one lane of ChIP-seq.
I'd like to use fastq-mcf to
a) trim by quality
b) clip primers
c) get basic statistics
As a first attempt, I put three FASTA-formatted sequences in the
adapter file:
TruSeq_Universal_Adapter
P5_APr
P7_APr
all in 5'->3' direction.
Running fastq-mcf resulted in:
fastq-mcf -l 10 -P 33 -o MySample.fq_Clipped2 illuminaAdaptors.fasta MySample.fastq
Scale used: 2.2
Phred: 33
Trim 'start': 1 from MySample.fastq
Threshold used: 251 out of 100000
Adapter TruSeq_Universal_Adapter (AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT): counted 54623 at the 'end' of 'MySample.fastq', clip set to 1
Adapter P5_Amplification_Primer (AATGATACGGCGACCACCGAG): counted 54623 at the 'end' of 'MySample.fastq', clip set to 1
Files: 1
Total reads: 17536194
Too short after clip: 11405490
Clipped 'end' reads: Count: 3449370, Mean: 1.80, Sd: 0.89
Trimmed 14052427 reads by an average of 13.69 bases on quality < 7
Then I added the reverse complement of the "reverse" primer P7 to the
adapter file (as P7_APr_RevComp) and got:
fastq-mcf -l 10 -P 33 -o MySample.fq_Clipped3 illuminaAdaptors.fasta MySample.fastq
Scale used: 2.2
Phred: 33
Trim 'start': 1 from MySample.fastq
Threshold used: 251 out of 100000
Adapter TruSeq_Universal_Adapter (AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT): counted 54623 at the 'end' of 'MySample.fastq', clip set to 1
Adapter P5_APr (AATGATACGGCGACCACCGAG): counted 54623 at the 'end' of 'MySample.fastq', clip set to 1
Adapter P7_APr_RevComp (TCGTATGCCGTCTTCTGCTTG): counted 10582 at the 'end' of 'MySample.fastq', clip set to 2
Files: 1
Total reads: 17536194
Too short after clip: 7623003
Clipped 'end' reads: Count: 8224158, Mean: 13.85, Sd: 5.62
Trimmed 14052427 reads by an average of 13.69 bases on quality < 7
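For reference, the reverse complement of a primer can be derived with a few lines of Python. This is only a minimal sketch: the helper name reverse_complement is hypothetical, it handles plain A/C/G/T only (no IUPAC ambiguity codes), and the example input is the P7_APr_RevComp sequence copied from the log above.

```python
# Translation table mapping each base to its complement (A/C/G/T only;
# IUPAC ambiguity codes are NOT handled in this sketch).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    # Complement every base, then reverse the string.
    return seq.translate(COMPLEMENT)[::-1]


# P7_APr_RevComp as reported in the fastq-mcf log above.
p7_revcomp = "TCGTATGCCGTCTTCTGCTTG"

# Reverse-complementing it again gives the opposite-strand orientation.
print(reverse_complement(p7_revcomp))
```

Applying the function twice returns the original sequence, which is a quick sanity check that an adapter file entry was flipped correctly.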
The final stats are (unexpectedly) different, especially:
Too short after clip: 11405490
vs.
Too short after clip: 7623003
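To put the two runs side by side, the counts can be expressed as fractions of the total reads. The sketch below only uses numbers copied from the two summaries above; the dictionary layout and the fraction helper are hypothetical names, and "not too short" here means only total reads minus the "Too short after clip" count, which may not equal the number of reads actually written.

```python
# Values copied from the two fastq-mcf summaries above.
TOTAL_READS = 17536194

runs = {
    "without P7_APr_RevComp": {"too_short": 11405490, "clipped_end": 3449370},
    "with P7_APr_RevComp": {"too_short": 7623003, "clipped_end": 8224158},
}


def fraction(count, total=TOTAL_READS):
    """Return count as a fraction of the total read count."""
    return count / total


for label, stats in runs.items():
    # Reads that were not discarded as too short (an approximation, see above).
    not_too_short = TOTAL_READS - stats["too_short"]
    print(
        f"{label}: too short {fraction(stats['too_short']):.1%}, "
        f"clipped at end {fraction(stats['clipped_end']):.1%}, "
        f"not too short {not_too_short}"
    )

# Absolute difference in discarded reads between the two runs.
difference = runs["without P7_APr_RevComp"]["too_short"] - runs["with P7_APr_RevComp"]["too_short"]
print(f"difference in 'too short' reads: {difference}")
```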
How should I provide the adapter sequences: always 5'->3', or "as
read" by the software?
Can you comment on the statistics? Is there something I have missed?
Thanks,
Sven