I am trying to use PRINSEQ with paired-end reads.
I have read that "The sequence identifiers for two matching paired-end sequences in separate files can be marked by /1 and /2, or _L and _R, or _left and _right, or must have the exact same identifier in both input files."
Could someone show me an example of valid input data?
I am not sure, but I think mine are marked by .1 and .2 (see the excerpts below).
Can you confirm whether I have to "transform" them?
And if so, how should I do that?
Thank you
Marie
SRR496757_1.fastq:
@SRR496757.1 B802KKABXX:8:1:1321:1990 length=90
ATTCAAAAGTATCACAATTGAGCTTGAAAATCACACGAGCTGTATTTTTTTTTTGTCAACAGCGAGGAGAGAACTACACAAGCAAAAAAG
+SRR496757.1 B802KKABXX:8:1:1321:1990 length=90
=GGGGDEFFCGGFFGDAGGGBGGGGGGFGDEGGF?EGDFEGGEEGFEEC@6?;<37*?@?@@?@?A5)@?4/7&/86:48EBDEC@@?##
@SRR496757.2 B802KKABXX:8:1:1737:1916 length=90
NTAAATCTCAATTGAAGGCATGACTTCGGCGAATTTCGACAGACACCCGCATGTGGCAAGCTGTTCAGTTCGAGTTCAGTTCGACCCCCC
+SRR496757.2 B802KKABXX:8:1:1737:1916 length=90
#**-(27272EEEE?EEEEEEEE?EEEEE>9@@@@?A@A9@@@@?@>9>>BBBB<0:=<7133-.)30+47770099999>8>>>BBBBB
SRR496757_2.fastq:
@SRR496757.1 B802KKABXX:8:1:1321:1990 length=90
CCCCCCCATGGCACAGTCACAAGTAGTATTAAAGGTAGCCCCGGGCTACAGACGATACTACAAAAGATAGAATACCAGTACAGTCTTTTT
+SRR496757.1 B802KKABXX:8:1:1321:1990 length=90
GGDFGG>GGGG?ED?DBDDDAAB?:DB?DDCCCCC:;@=>DDDDDBD?ACAA-A>CC-C:=?############################
@SRR496757.2 B802KKABXX:8:1:1737:1916 length=90
GGAGTATATAGTGCCGGTTGCCGCTATAGTGCCGGCCTTATTGGCTGGGGGGGAACCAAAAAACCGGACAGAAAATAAAGGGGGGTCTAT
+SRR496757.2 B802KKABXX:8:1:1737:1916 length=90
GGEGGGGF:FACEEEEEEEE5CDDD>@B?@EB:=ADDCEE:BEEC-CA=?########################################
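A possible check before renaming anything: in the excerpts above, the header lines are identical in both files, and the .1 and .2 appear to number successive reads (SRA spot numbers) rather than mark the mate, so the "exact same identifier" rule quoted above may already be satisfied. The shell function below is a minimal sketch for checking this across whole files, assuming standard 4-line FASTQ records, mates in the same order in both files, and a POSIX awk. The name count_id_mismatches is hypothetical, and this is not how PRINSEQ itself parses the files.

```shell
# Hypothetical helper (not part of PRINSEQ): count records whose mate
# identifiers disagree between two FASTQ files. The identifier is taken
# as the first whitespace-delimited token of each header (line 1 of every
# 4-line record), with the leading "@" and any trailing /1, /2, _L, _R,
# _left or _right removed. Assumes both files list the mates in the same order.
count_id_mismatches() {
    awk '
        function norm(s) {
            sub(/^@/, "", s)
            sub(/(\/[12]|_L|_R|_left|_right)$/, "", s)
            return s
        }
        FNR % 4 == 1 {
            if (NR == FNR) ids[FNR] = norm($1)      # first file: remember the id
            else if (ids[FNR] != norm($1)) n++      # second file: compare
        }
        END { print n + 0 }
    ' "$1" "$2"
}
```

For example, running count_id_mismatches SRR496757_1.fastq SRR496757_2.fastq prints 0 when every pair of identifiers agrees.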
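One way is to append /1 and /2 to the identifier on each header and "+" line. paste - - joins every two lines with a tab, sed appends the suffix to the first token of each joined line, and tr splits them back apart:
paste - - < file_1.fastq | sed 's/^\(\S*\)/\1\/1/' | tr "\t" "\n" > file_1_renamed.fastq
paste - - < file_2.fastq | sed 's/^\(\S*\)/\1\/2/' | tr "\t" "\n" > file_2_renamed.fastq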
See: https://edwards.sdsu.edu/research/changing-the-label-of-paired-end-sequences-in-fastq-files/
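The sed commands above rely on GNU sed (\S is a GNU extension), and they also turn a bare "+" separator line into "+/1", which stricter FASTQ parsers may reject because it no longer matches the "@" line. A minimal sketch of a portable alternative using POSIX awk, assuming strict 4-line records and a space (not a tab) between the identifier and the rest of the header; the function name add_mate_suffix is hypothetical:

```shell
# Hypothetical helper: append a mate suffix (e.g. /1 or /2) to the
# identifier, i.e. everything before the first space, on each header line,
# and on the "+" line only when it repeats the identifier (a bare "+" is
# left unchanged). Assumes strict 4-line FASTQ records.
add_mate_suffix() {
    # $1 = input FASTQ file, $2 = suffix such as /1 or /2
    awk -v sfx="$2" '
        function tag(line,    i) {
            i = index(line, " ")
            if (i == 0) return line sfx          # no description: append at end
            return substr(line, 1, i - 1) sfx substr(line, i)
        }
        NR % 4 == 1                   { $0 = tag($0) }   # "@" header line
        NR % 4 == 3 && length($0) > 1 { $0 = tag($0) }   # "+" line with a title
        { print }
    ' "$1"
}
```

Usage: add_mate_suffix file_1.fastq /1 > file_1_renamed.fastq, and likewise with /2 for the second file.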