Hi Nicole,
you can download the script "purge_PCR_duplicates.pl" from:
Click on "Raw", then save the page as ASCII text. Under Unix, make the file executable with:
$ sudo chmod +x purge_PCR_duplicates.pl
Then run it from the same directory:
$ ./purge_PCR_duplicates.pl -h
... for more explanation.
I have added the calculation of median quality scores across all single-end reads within a set of PCR duplicates. This should be better than just randomly picking the quality score string of one PCR copy. Currently, the paired-end sequence (and its quality score string) reported for a unique fragment is simply the first one that the script finds. As soon as I have time again, I will add the determination of a consensus sequence for the PE sequences in a set of PCR duplicates. Also, be aware that each sequencing error in the single-end reads will, in almost all cases, generate a new unique sequence. That means the output will have a higher proportion of reads with sequencing errors than the input. Many sequencing errors can be identified and corrected if the reads carrying them belong to a set of PCR duplicates (i.e. reads from the same restriction fragment). I will add that at a later stage.
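To illustrate the idea of a median quality string, here is a minimal sketch in Python. It is not the code from purge_PCR_duplicates.pl. The function name median_quality is made up for this example, and it assumes that all reads in a duplicate set have the same length and that the median is taken separately at each read position. It uses the "low" median, so with an even number of reads the result is always a quality character that was actually observed.

```python
from statistics import median_low

def median_quality(quals):
    """Return the per-position median of equal-length quality strings.

    Hypothetical helper for illustration only. The median is taken on
    the raw character codes, which gives the same ranking as the
    underlying quality scores regardless of the encoding offset.
    """
    if not quals:
        raise ValueError("need at least one quality string")
    length = len(quals[0])
    if any(len(q) != length for q in quals):
        raise ValueError("all quality strings must have the same length")
    # zip(*quals) yields one tuple of characters per read position
    return "".join(
        chr(median_low([ord(c) for c in column])) for column in zip(*quals)
    )

# Three single-end reads from one (assumed) set of PCR duplicates
duplicates = ["IIII", "II5I", "I#5I"]
print(median_quality(duplicates))
```

A single low-quality base call in one copy (the "#" above) does not pull down the reported quality at that position, whereas picking one copy at random might return exactly that string.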
If you have any questions or need assistance with running the script, don't hesitate to ask me,
claudius