I have two groups of paired fastq (R1 & R2)
"GroupeA": {
"R1": "T1_R1_200.fastq.gz",
"R2": "T1_R2_200.fastq.gz"}
"GroupeB": {
"R1": "T1__R1_201.fastq.gz",
"R2": "T1__R2_201.fastq.gz" }
I'm applying star on each group. (one for groupA and then groupB). I write only the important options used.
STAR
+" --readFilesIn GroupeAr1 GroupeAr2"
+" --readFilesCommand zcat"
+" --outSAMtype BAM SortedByCoordinate "
+" --quantMode GeneCounts"
+" --outSAMattrIHstart 0 "
So here I generated the 2 logfinal.out and 2 readsperGene.tab. For example here I will extract for the same gene , the number of reads mapped in all readperGene.tab, to add them and get the global value . Wrong ? Also do that to get total uniquely mapped reads using intermediates logfinal.out.
Then I merge them using samtools :
samtools merge -r final.bam A.bam B.bam
I finaly have one bam that I will pass to :
STAR --runMode inputAlignmentsFromBAM" \
#+" --quantMode GeneCounts " \
+" --outWigType wiggle " \
+" --outWigStrand Stranded" \
+" --outWigNorm RPM"
Does the fact that I'm doing that on an unsorted bam can influe the wig ouput ?
Note for htseq counts : I have test on an unsorted bam. I was thinking that after the merge the bam is still sorted but that not the case. So I re sorted the final bam (samtools sort) and now I try the following if it match :
htseq-count -f bam -s reverse -r pos T6rep2.bam /home/jean-philippe.villemin/mount_archive2/commun.luco/ref/genes/GRCh38_PRIM_GENCODE_R25/gencode.v25.primary_assembly.annotation.gtf > genes.htseq-con3.txt