Hello, everyone,
We’re using PRINSEQ for quality control of some publicly available metagenomic sequences, but we are having trouble cleaning paired-end samples from SRA. We ran PRINSEQ with the following command:
perl prinseq-lite.pl -verbose -fastq /SRA/SRR1275449_1.fastq -fastq2 /SRA/SRR1275449_2.fastq -min_len 100 -ns_max_p 1 -out_format 1 -out_good /SRA/SRR1275449_good -seq_id "SRR1275449_"
We only got the singleton output files and the bad files; the good output files were not generated. According to the PRINSEQ website, both good output files should be produced in addition to the singleton files. Could anyone help us?
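Since the command filters with -min_len 100, one thing that may be worth ruling out is whether the reads in each file actually reach that length. The following is only a rough sketch and not part of PRINSEQ: `count_min_len` is a hypothetical helper name, and it assumes uncompressed FASTQ with exactly four lines per record.

```shell
# Hypothetical helper (not a PRINSEQ feature): for a 4-line-per-record FASTQ
# file, print "<reads with length >= min_len> <total reads>".
count_min_len() {
    # $1 = FASTQ file, $2 = minimum read length
    awk -v min="$2" '
        NR % 4 == 2 { total++; if (length($0) >= min) kept++ }  # sequence lines only
        END { print kept + 0, total + 0 }
    ' "$1"
}

# Example calls, using the paths from the PRINSEQ command:
# count_min_len /SRA/SRR1275449_1.fastq 100
# count_min_len /SRA/SRR1275449_2.fastq 100
```

If the first number is much smaller than the second for either file, the length filter alone might explain why few or no pairs survive, although we have not verified that this is what happens here.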
We downloaded the .sra files and extracted the fastq files using the following command line:
fastq-dump --gzip --skip-technical --readids --dumpbase --split-files --clip SRRXXX.sra > stats_SRRXXX.txt
We checked the output FASTQ files and found the following header format:
For the extracted ‘SRR1275449_1.fastq’ file:
@SRR1275449.1.1 HWI-ST332R:341:H81YRADXX:1:110
TGATTTAAATCAGTTGTTACCTCTCCTAAC
+SRR1275449.1.1 HWI-ST332R:341:H81YRADXX:1:110
BCBFFFFFHHHHHIJJJJJJJJJJJJJJJJ
@SRR1275449.2.1 HWI-ST332R:341:H81YRADXX:1:110
ACGAGGGCAGGGAACTCAGTTACTTTTTTC
+SRR1275449.2.1 HWI-ST332R:341:H81YRADXX:1:110
@@@DDDDFHHFHHJJJJGGEGIIIHIIJJJ
@SRR1275449.3.1 HWI-ST332R:341:H81YRADXX:1:110
CTTCGTTGTAGTAATGTTTTCAAGCAAAAG
+SRR1275449.3.1 HWI-ST332R:341:H81YRADXX:1:110
CCCFFEFFHHHFHIJJGIJJJJJJJJJJJJ
For the extracted ‘SRR1275449_2.fastq’ file:
@SRR1275449.1.2 HWI-ST332R:341:H81YRADXX:1:110
TACGGAAGTTCCTGCCTCCATATTGAATTC
+SRR1275449.1.2 HWI-ST332R:341:H81YRADXX:1:110
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJ
@SRR1275449.2.2 HWI-ST332R:341:H81YRADXX:1:110
ACACTTCTCGCAACTTCTTATTGCTGGCTC
+SRR1275449.2.2 HWI-ST332R:341:H81YRADXX:1:110
CCCFFFFFHHHHHIIJJJJJIIGIEHGIJJ
@SRR1275449.3.2 HWI-ST332R:341:H81YRADXX:1:110
GAACTTCAAAGAAAATAATGTGAGAGGCCA
+SRR1275449.3.2 HWI-ST332R:341:H81YRADXX:1:110
CCCFFFFFHHHGHJJJJJJJJJIJJJJJJJ
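Because `--readids` makes fastq-dump write each header as accession.spot.readid, the IDs in the two files differ in their last field (.1 versus .2). We have not confirmed whether this affects how PRINSEQ matches mates, but a rough way to check that the files agree once that field is removed could look like the sketch below. `spot_ids` and `shared_spots` are hypothetical helper names, and the code assumes four-line FASTQ records whose IDs end in .1 or .2 as shown above.

```shell
# Hypothetical helper: print the read ID of every record with the leading "@"
# and the trailing read number removed,
# e.g. "@SRR1275449.3.2 HWI-..." -> "SRR1275449.3".
spot_ids() {
    awk 'NR % 4 == 1 { id = $1; sub(/^@/, "", id); sub(/\.[12]$/, "", id); print id }' "$1"
}

# Hypothetical helper: count records in the second file whose stripped ID
# also occurs in the first file.
shared_spots() {
    awk '
        FNR % 4 == 1 {
            id = $1; sub(/^@/, "", id); sub(/\.[12]$/, "", id)
            if (NR == FNR) seen[id] = 1      # first file: remember the ID
            else if (id in seen) n++         # second file: count matches
        }
        END { print n + 0 }
    ' "$1" "$2"
}

# Example call, using the paths from the PRINSEQ command:
# shared_spots /SRA/SRR1275449_1.fastq /SRA/SRR1275449_2.fastq
```

If the count equals the number of reads in each file, the mates line up by spot and differ only in the read number suffix.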
Note: while the program is running, both good output files are generated, but when it finishes only the singleton files remain in the output folder.
I would really appreciate any help getting PRINSEQ to work with these files.
Best wishes