Difference in trimming behavior between DynamicTrim and SolexaQA?


Michael Gooch

Nov 28, 2011, 4:59:04 PM
to solexaq...@googlegroups.com, Everson,Richard, Singh,Prashant, Farooq,Umar (res)
I am a bit confused by the FAQ and the limited information in the usage
strings provided by the scripts. Is there a difference in overall
trimming behavior between SolexaQA.pl and DynamicTrim.pl? (I know the
algorithms employed by the scripts are different, as one uses R and
matrix2png, but I am asking more specifically about trimming here.) If
there are differences, what are the major ones? How does
LengthSort.pl deal with mate pairs when one read is shorter than the cutoff?

M. Gooch

MPC

Nov 28, 2011, 5:37:24 PM
to solexaqa-users
Hi Michael,

Yes, I think you've gotten a bit confused about what the programs
actually do.

SolexaQA:
Reports on sequence quality in a lane of Illumina data. SolexaQA does
not modify the original sequences in any way, including by trimming
them. However, SolexaQA does report what the sequences might look
like after trimming with DynamicTrim.

DynamicTrim:
Trims each read individually based on base quality scores.

LengthSort:
Sorts reads into size bins, such as reads of 50 bp or longer and
reads shorter than this threshold. For paired-end data,
LengthSort returns two paired files with forward and reverse reads
that both meet the threshold, a single file with unpaired reads that
meet the threshold (reads whose mate was too short), and a single file
of discarded reads shorter than the threshold. (There is more
information in the DynamicTrim manual.)
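
As a rough illustration of the paired-end behavior described above, here is a minimal Python sketch. It is not LengthSort's actual Perl code: the function name `length_sort_pairs`, the bin names, and the 50-bp default are assumptions made up for this example, and it works on in-memory sequence strings rather than FASTQ files.

```python
def length_sort_pairs(pairs, cutoff=50):
    """Sort paired reads into bins by length.

    `pairs` is a list of (forward_seq, reverse_seq) tuples.
    Bin names are hypothetical, chosen only for this sketch.
    """
    bins = {"paired_1": [], "paired_2": [], "single": [], "discard": []}
    for fwd, rev in pairs:
        fwd_ok = len(fwd) >= cutoff
        rev_ok = len(rev) >= cutoff
        if fwd_ok and rev_ok:
            # Both mates pass: keep them together in the two paired bins.
            bins["paired_1"].append(fwd)
            bins["paired_2"].append(rev)
        else:
            # The pair is broken: a passing mate becomes an unpaired read,
            # a failing mate is discarded.
            for seq, ok in ((fwd, fwd_ok), (rev, rev_ok)):
                bins["single" if ok else "discard"].append(seq)
    return bins


if __name__ == "__main__":
    toy = [("A" * 60, "C" * 55), ("A" * 60, "C" * 20), ("A" * 10, "C" * 5)]
    for name, reads in length_sort_pairs(toy).items():
        print(name, [len(r) for r in reads])
```

In this sketch, a pair with one short mate contributes its long mate to the unpaired bin and its short mate to the discard bin, which is the scenario asked about in the original question.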

To give more information about trimming, the SolexaQA package
implements two trimming algorithms. The first returns the longest
contiguous read segment for which the quality score at each base is
greater than a user-supplied quality cutoff. The second returns the
result of the trimming algorithm implemented in BWA:

http://bio-bwa.sourceforge.net/bwa.shtml

The two trimming algorithms are described here:

http://solexaqa.sourceforge.net/questions.htm#trim
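
To make the two trimming rules more concrete, here is a hedged Python sketch. It is not the DynamicTrim source: the function names are made up, the input is a plain list of integer quality scores, and the second function follows the rule as stated in the BWA manual (trim to argmax_x of the sum over i = x+1..l of (INT - q_i), applied only if q_l < INT), without any extra details a real implementation may add.

```python
def longest_segment_trim(quals, cutoff):
    """Return (start, end) of the longest run with every quality > cutoff.

    quals[start:end] is the kept segment; ties keep the earliest run.
    """
    best_start, best_end = 0, 0
    run_start = None
    for i, q in enumerate(quals):
        if q > cutoff:
            if run_start is None:
                run_start = i
            if i + 1 - run_start > best_end - best_start:
                best_start, best_end = run_start, i + 1
        else:
            run_start = None
    return best_start, best_end


def bwa_style_trim(quals, cutoff):
    """Return the kept read length under the 3'-end rule in the BWA manual."""
    length = len(quals)
    # The rule only applies when the last base is below the cutoff.
    if length == 0 or quals[-1] >= cutoff:
        return length
    best_len, best_sum, running = length, 0, 0
    # Walk from the 3' end, accumulating (cutoff - quality); keep the
    # position where that running sum is largest.
    for x in range(length - 1, -1, -1):
        running += cutoff - quals[x]
        if running > best_sum:
            best_sum, best_len = running, x
    return best_len


if __name__ == "__main__":
    quals = [30, 32, 10, 35, 36, 37, 8, 5]
    print(longest_segment_trim(quals, 20))  # (3, 6)
    print(bwa_style_trim(quals, 20))        # 6
```

Note how the two rules differ on the same read: the first keeps only the longest clean stretch (positions 3 to 5), while the second trims only from the 3' end and so keeps the low-quality base at position 2.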

All of the above is implemented in plain Perl. R and matrix2png are
only used in the SolexaQA program, and then only to produce graphical
plots of data quality.

It might be worth creating some toy test files and playing around with
DynamicTrim and LengthSort. This should quickly show you how the
programs behave under any given scenario.
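
One possible way to build such toy files is sketched below in Python. The helper names, the file name `toy.fastq`, and the three quality profiles are all made up for illustration. The sketch assumes Phred+33 quality encoding; some older Illumina pipelines used an offset of 64 instead, so adjust `offset` to match your data.

```python
def toy_fastq_record(name, quals, offset=33):
    """Build one four-line FASTQ record with a chosen quality profile.

    The sequence is all 'A'; only the qualities matter for trimming tests.
    """
    seq = "A" * len(quals)
    qual_string = "".join(chr(q + offset) for q in quals)
    return f"@{name}\n{seq}\n+\n{qual_string}\n"


def write_toy_fastq(path, profiles):
    """Write one record per (name, quality list) entry in `profiles`."""
    with open(path, "w") as handle:
        for name, quals in profiles.items():
            handle.write(toy_fastq_record(name, quals))


if __name__ == "__main__":
    write_toy_fastq("toy.fastq", {
        "all_good": [35] * 60,
        "bad_tail": [35] * 40 + [5] * 20,
        "bad_middle": [35] * 25 + [5] * 10 + [35] * 25,
    })
```

Running DynamicTrim and then LengthSort on a file like this, and comparing the output read lengths against the known profiles, should make the behavior of each script easy to see.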

Best
-Murray
