Hi Prof. Dobin,
Thank you for your response.
By the sentence "Merge mate1 and mate2 unmapped files to FASTQ_1 and FASTQ_2",
I mean that I am trying to merge the "Unmapped.out.mate1/2" files from the STAR output into the "FASTQ_1/2" files, respectively, before aligning to the hg19 reference genome.
Let me give a bit more detail about the data:
The data downloaded from the public resource comes as two sets of files: aligned BAM files and unmapped read files (Unmapped.out.mate1/2).
The public resource used STAR for the alignment, with hg38 as the reference genome.
To ensure that the two FASTQ files are in the same read order, I first sort the BAM file by read name with samtools: samtools sort -n
I then run samtools fastq, which gives me two FASTQ files, _1.gz and _2.gz.
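A minimal sketch of these two steps, assuming samtools is on the PATH. The function name bam_to_fastq, the SAMTOOLS variable, and the file prefixes are placeholders for illustration, not the exact commands that were run:

```shell
# Hypothetical helper: name-sort a BAM file, then split it into paired FASTQ files.
# SAMTOOLS can be overridden to point at a specific samtools binary.
SAMTOOLS=${SAMTOOLS:-samtools}

bam_to_fastq() {
    in_bam=$1   # aligned BAM from the public resource (placeholder name)
    prefix=$2   # output prefix, giving <prefix>_1.fastq.gz and <prefix>_2.fastq.gz

    # Sort by read name so that both mates of a pair are adjacent.
    "$SAMTOOLS" sort -n -o "${prefix}.namesorted.bam" "$in_bam" &&
    # Write mate 1 and mate 2 to separate files; reads flagged as neither or
    # both mates (-0) and singletons (-s) are discarded here, and -n keeps
    # read names free of the /1 and /2 suffixes.
    "$SAMTOOLS" fastq -1 "${prefix}_1.fastq.gz" -2 "${prefix}_2.fastq.gz" \
        -0 /dev/null -s /dev/null -n "${prefix}.namesorted.bam"
}

# Example call (placeholder file names):
# bam_to_fastq sample.bam sample
```

Discarding singletons with -s /dev/null is a choice made for this sketch so that the two output files stay the same length; it may not be what is wanted for every data set.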
Now I want to merge the unmapped reads into these two FASTQ files.
The rationale is that reads which did not map to hg38 may still map to hg19.
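The merge itself could look roughly like the sketch below. It assumes the Unmapped.out.mate1/2 files are uncompressed text, as STAR writes them, and that both contain the same reads in the same order; the function name merge_unmapped and all file names are placeholders:

```shell
# Hypothetical helper: append a plain-text STAR unmapped-mate file to a gzipped FASTQ.
# Concatenated gzip members form a valid gzip stream, so the existing file
# does not have to be decompressed first.
merge_unmapped() {
    fastq_gz=$1    # e.g. sample_1.fastq.gz from samtools fastq
    unmapped=$2    # e.g. Unmapped.out.mate1 (assumed uncompressed)
    merged_gz=$3   # output file, e.g. merged_1.fastq.gz

    { cat "$fastq_gz" && gzip -c "$unmapped"; } > "$merged_gz"
}

# Example calls (placeholder file names). mate1 goes with _1 and mate2 with _2,
# so that paired reads stay at the same position in both merged files:
# merge_unmapped sample_1.fastq.gz Unmapped.out.mate1 merged_1.fastq.gz
# merge_unmapped sample_2.fastq.gz Unmapped.out.mate2 merged_2.fastq.gz
```

Writing to a new merged file rather than appending in place keeps the original FASTQ files intact in case the merge has to be redone.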
Please let me know whether it is correct to proceed this way with the unmapped reads, or whether there is an alternative way to process them.
Thanks,
Pawan