Hi Alex,
I've been trying to use the BAM files produced by STAR as input to RSEM, but I am having some trouble. As I mentioned in my previous messages, I indexed my de novo transcriptome with STAR (without a GFF) and aligned the reads with the following command:
$ STAR --runMode alignReads --runThreadN 2 --genomeDir transcriptome_index/ \
    --readFilesIn C1.paired.1.fastq.gz C1.paired.2.fastq.gz --readFilesCommand gunzip -c \
    --outFilterMultimapNmax 50 --outFilterScoreMinOverLread 0.3 --outFilterMatchNminOverLread 0.3 \
    --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample_C1/
I checked the BAM file with "rsem-sam-validator", and it is not valid input for RSEM. The error is:
$ rsem-sam-validator Aligned.sortedByCoord.out.bam
.
Only find one mate for paired-end read K00124:558:HFKJWBBXX:7:1101:22690:35774!
Please make sure that the two mates of a paired-end read are adjacent to each other.
The input file is not valid!
So I tried to convert it with "convert-sam-for-rsem", but the result is still not valid; "rsem-sam-validator" says:
Clipping or padding is detected (cigar S) for read K00124:558:HFKJWBBXX:7:1101:1010:6379!
RSEM currently doest not support clipping or padding.
The input file is not valid!
Lastly, I tried sorting the file by read name with:
$ samtools sort -n -@ 1 -m 1G -o Aligned.sortedByCoord.out.forRSEM.bam Aligned.sortedByCoord.out.bam
But the result is also invalid; "rsem-sam-validator" says:
"The two mates of paired-end read K00124:558:HFKJWBBXX:7:1101:1010:6379 are marked as both mate1 or both mate2! The input file is not valid!"
Running RSEM on each of these files gives a different error. With the BAM produced by STAR, I get:
Read K00124:558:HFKJWBBXX:7:1126:4501:44775: The adjacent two lines do not represent the two mates of a paired-end read! (RSEM assumes the two mates of a paired-end read should be adjacent)
With the output of samtools sort -n, I get:
Read K00124:558:HFKJWBBXX:7:1101:1010:6379: The adjacent two lines do not represent the two mates of a paired-end read! (RSEM assumes the two mates of a paired-end read should be adjacent)
With the output of "convert-sam-for-rsem", I get:
Read K00124:558:HFKJWBBXX:7:1101:1010:6379: RSEM currently does not support gapped alignments, sorry!
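Both the validator's clipping message and RSEM's gapped-alignment error concern the CIGAR field; the alignments of this read include 46S104M (46 soft-clipped bases, then 104 matched bases). As a rough check, assuming RSEM only accepts mapped records whose CIGAR is a single M operation, this bash sketch (count_non_simple_cigar is a hypothetical helper, not part of RSEM or samtools) counts the records that would fail that test:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count mapped SAM records (stdin, e.g. from `samtools view`)
# whose CIGAR (column 6) is anything other than a single match operation like 150M.
# Soft clips (S), insertions/deletions (I/D), skips (N), etc. all count.
count_non_simple_cigar() {
  awk -F'\t' '
    /^@/ { next }                # skip header lines
    $6 == "*" { next }           # no CIGAR (e.g. unmapped record)
    $6 !~ /^[0-9]+M$/ { n++ }    # anything but a single M operation
    END { print n + 0 }'
}
```

Usage: samtools view Aligned.sortedByCoord.out.bam | count_non_simple_cigar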
When I look up one of these reads, this is what I see in the original BAM from STAR:
$ samtools view Aligned.sortedByCoord.out.bam | grep 'K00124:558:HFKJWBBXX:7:1101:1010:6379'
K00124:558:HFKJWBBXX:7:1101:1010:6379 419 TRINITY_DN1347_c0_g1_i21.p2 32 1 46S104M = 161 279 CAGGAACACATCTCATAAAACAACATGTTGAAATTGTTCGTCCTTTCAGTGCTCGTAGCAGCCGCGTCCTCTGCTGGACTTGGCGGACTCGGGGCTGTTTATGGAGCTGGCTATGGTTATGGTTACCATGCCCCATTGACTGTAGCCCAC AAFFFJJJJJJJFJJJJFJJJJJJJFJJJFJJJFJJJJJJJFJJFJJJFFJJFAJJJFFFJJJJJFAJFJJJJJJJJJJJFJFJ7FFJJJJJJJJAAJJJJJJJJFJFJ7JJJJJJJFFFJJJFJJJJJFF<AJFJJJ<FJJJJJJ)AJF NH:i:3 HI:i:3 AS:i:248 nM:i:2
K00124:558:HFKJWBBXX:7:1101:1010:6379 339 TRINITY_DN1347_c0_g1_i21.p2 161 1 150M = 32 -279 CAGTCACCACCGTCGCCCATGCTCCAGTAGCTACCGTCGCCCATGCTCCAGTAACCGGTGTCCGGGCAGTTTATGGAGCTGGTTATGGTTATGGTTACAATGCGCCATTGACTGTAGCACACACACCAGCTGTAAGTTATGCTGCTCCAG F-JFJFJF<JJJJJJJJJFFJFFFJJJJJJFFJJJJJJJJJJFJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJAFAJJJJJJFJFJJJJJJJJJJJJJFJJJJFJJJJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJFJJJFJJJFFFAA NH:i:3 HI:i:3 AS:i:248 nM:i:2
K00124:558:HFKJWBBXX:7:1101:1010:6379 419 TRINITY_DN1347_c0_g1_i45.p2 32 1 46S104M = 161 279 CAGGAACACATCTCATAAAACAACATGTTGAAATTGTTCGTCCTTTCAGTGCTCGTAGCAGCCGCGTCCTCTGCTGGACTTGGCGGACTCGGGGCTGTTTATGGAGCTGGCTATGGTTATGGTTACCATGCCCCATTGACTGTAGCCCAC AAFFFJJJJJJJFJJJJFJJJJJJJFJJJFJJJFJJJJJJJFJJFJJJFFJJFAJJJFFFJJJJJFAJFJJJJJJJJJJJFJFJ7FFJJJJJJJJAAJJJJJJJJFJFJ7JJJJJJJFFFJJJFJJJJJFF<AJFJJJ<FJJJJJJ)AJF NH:i:3 HI:i:2 AS:i:248 nM:i:2
K00124:558:HFKJWBBXX:7:1101:1010:6379 339 TRINITY_DN1347_c0_g1_i45.p2 161 1 150M = 32 -279 CAGTCACCACCGTCGCCCATGCTCCAGTAGCTACCGTCGCCCATGCTCCAGTAACCGGTGTCCGGGCAGTTTATGGAGCTGGTTATGGTTATGGTTACAATGCGCCATTGACTGTAGCACACACACCAGCTGTAAGTTATGCTGCTCCAG F-JFJFJF<JJJJJJJJJFFJFFFJJJJJJFFJJJJJJJJJJFJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJAFAJJJJJJFJFJJJJJJJJJJJJJFJJJJFJJJJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJFJJJFJJJFFFAA NH:i:3 HI:i:2 AS:i:248 nM:i:2
K00124:558:HFKJWBBXX:7:1101:1010:6379 163 TRINITY_DN1347_c0_g1_i49.p2 32 1 46S104M = 161 279 CAGGAACACATCTCATAAAACAACATGTTGAAATTGTTCGTCCTTTCAGTGCTCGTAGCAGCCGCGTCCTCTGCTGGACTTGGCGGACTCGGGGCTGTTTATGGAGCTGGCTATGGTTATGGTTACCATGCCCCATTGACTGTAGCCCAC AAFFFJJJJJJJFJJJJFJJJJJJJFJJJFJJJFJJJJJJJFJJFJJJFFJJFAJJJFFFJJJJJFAJFJJJJJJJJJJJFJFJ7FFJJJJJJJJAAJJJJJJJJFJFJ7JJJJJJJFFFJJJFJJJJJFF<AJFJJJ<FJJJJJJ)AJF NH:i:3 HI:i:1 AS:i:248 nM:i:2
K00124:558:HFKJWBBXX:7:1101:1010:6379 83 TRINITY_DN1347_c0_g1_i49.p2 161 1 150M = 32 -279 CAGTCACCACCGTCGCCCATGCTCCAGTAGCTACCGTCGCCCATGCTCCAGTAACCGGTGTCCGGGCAGTTTATGGAGCTGGTTATGGTTATGGTTACAATGCGCCATTGACTGTAGCACACACACCAGCTGTAAGTTATGCTGCTCCAG F-JFJFJF<JJJJJJJJJFFJFFFJJJJJJFFJJJJJJJJJJFJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJAFAJJJJJJFJFJJJJJJJJJJJJJFJJJJFJJJJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJFJJJFJJJFFFAA NH:i:3 HI:i:1 AS:i:248 nM:i:2
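To make the FLAG column above easier to read, here is a small bash sketch that decodes it. decode_flag is a hypothetical helper; the bit values come from the SAM specification and the names mimic those printed by `samtools flags`, which does the same job if samtools is at hand.

```shell
#!/usr/bin/env bash
# Decode a SAM FLAG integer into bit names (bit values from the SAM specification;
# names mimic those printed by `samtools flags`).
decode_flag() {
  local flag=$1
  local -a names=()
  (( flag & 0x1 ))   && names+=(PAIRED)         # template has multiple segments
  (( flag & 0x2 ))   && names+=(PROPER_PAIR)    # each segment properly aligned
  (( flag & 0x4 ))   && names+=(UNMAP)          # this segment unmapped
  (( flag & 0x8 ))   && names+=(MUNMAP)         # mate unmapped
  (( flag & 0x10 ))  && names+=(REVERSE)        # SEQ is reverse-complemented
  (( flag & 0x20 ))  && names+=(MREVERSE)       # mate is reverse-complemented
  (( flag & 0x40 ))  && names+=(READ1)          # first segment (mate 1)
  (( flag & 0x80 ))  && names+=(READ2)          # last segment (mate 2)
  (( flag & 0x100 )) && names+=(SECONDARY)      # secondary alignment
  (( flag & 0x200 )) && names+=(QCFAIL)
  (( flag & 0x400 )) && names+=(DUP)
  (( flag & 0x800 )) && names+=(SUPPLEMENTARY)
  local IFS=,                                   # join names with commas
  echo "$flag ${names[*]}"
}

# FLAG values seen in the records above
for f in 419 339 163 83; do
  decode_flag "$f"
done
```

Decoded this way, each of the three alignments is one READ1 record on the reverse strand (83 primary, 339 secondary) plus one READ2 record (163 primary, 419 secondary); only the i49 pair is primary.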
In the name-sorted BAM (samtools sort -n):
$ samtools view Aligned.sortedByCoord.out.forRSEM.bam | grep 'K00124:558:HFKJWBBXX:7:1101:1010:6379'
K00124:558:HFKJWBBXX:7:1101:1010:6379 339 TRINITY_DN1347_c0_g1_i21.p2 161 1 150M = 32 -279 CAGTCACCACCGTCGCCCATGCTCCAGTAGCTACCGTCGCCCATGCTCCAGTAACCGGTGTCCGGGCAGTTTATGGAGCTGGTTATGGTTATGGTTACAATGCGCCATTGACTGTAGCACACACACCAGCTGTAAGTTATGCTGCTCCAG F-JFJFJF<JJJJJJJJJFFJFFFJJJJJJFFJJJJJJJJJJFJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJAFAJJJJJJFJFJJJJJJJJJJJJJFJJJJFJJJJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJFJJJFJJJFFFAA NH:i:3 HI:i:3 AS:i:248 nM:i:2
K00124:558:HFKJWBBXX:7:1101:1010:6379 339 TRINITY_DN1347_c0_g1_i45.p2 161 1 150M = 32 -279 CAGTCACCACCGTCGCCCATGCTCCAGTAGCTACCGTCGCCCATGCTCCAGTAACCGGTGTCCGGGCAGTTTATGGAGCTGGTTATGGTTATGGTTACAATGCGCCATTGACTGTAGCACACACACCAGCTGTAAGTTATGCTGCTCCAG F-JFJFJF<JJJJJJJJJFFJFFFJJJJJJFFJJJJJJJJJJFJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJAFAJJJJJJFJFJJJJJJJJJJJJJFJJJJFJJJJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJFJJJFJJJFFFAA NH:i:3 HI:i:2 AS:i:248 nM:i:2
K00124:558:HFKJWBBXX:7:1101:1010:6379 83 TRINITY_DN1347_c0_g1_i49.p2 161 1 150M = 32 -279 CAGTCACCACCGTCGCCCATGCTCCAGTAGCTACCGTCGCCCATGCTCCAGTAACCGGTGTCCGGGCAGTTTATGGAGCTGGTTATGGTTATGGTTACAATGCGCCATTGACTGTAGCACACACACCAGCTGTAAGTTATGCTGCTCCAG F-JFJFJF<JJJJJJJJJFFJFFFJJJJJJFFJJJJJJJJJJFJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJAFAJJJJJJFJFJJJJJJJJJJJJJFJJJJFJJJJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJFJJJFJJJFFFAA NH:i:3 HI:i:1 AS:i:248 nM:i:2
K00124:558:HFKJWBBXX:7:1101:1010:6379 419 TRINITY_DN1347_c0_g1_i21.p2 32 1 46S104M = 161 279 CAGGAACACATCTCATAAAACAACATGTTGAAATTGTTCGTCCTTTCAGTGCTCGTAGCAGCCGCGTCCTCTGCTGGACTTGGCGGACTCGGGGCTGTTTATGGAGCTGGCTATGGTTATGGTTACCATGCCCCATTGACTGTAGCCCAC AAFFFJJJJJJJFJJJJFJJJJJJJFJJJFJJJFJJJJJJJFJJFJJJFFJJFAJJJFFFJJJJJFAJFJJJJJJJJJJJFJFJ7FFJJJJJJJJAAJJJJJJJJFJFJ7JJJJJJJFFFJJJFJJJJJFF<AJFJJJ<FJJJJJJ)AJF NH:i:3 HI:i:3 AS:i:248 nM:i:2
K00124:558:HFKJWBBXX:7:1101:1010:6379 419 TRINITY_DN1347_c0_g1_i45.p2 32 1 46S104M = 161 279 CAGGAACACATCTCATAAAACAACATGTTGAAATTGTTCGTCCTTTCAGTGCTCGTAGCAGCCGCGTCCTCTGCTGGACTTGGCGGACTCGGGGCTGTTTATGGAGCTGGCTATGGTTATGGTTACCATGCCCCATTGACTGTAGCCCAC AAFFFJJJJJJJFJJJJFJJJJJJJFJJJFJJJFJJJJJJJFJJFJJJFFJJFAJJJFFFJJJJJFAJFJJJJJJJJJJJFJFJ7FFJJJJJJJJAAJJJJJJJJFJFJ7JJJJJJJFFFJJJFJJJJJFF<AJFJJJ<FJJJJJJ)AJF NH:i:3 HI:i:2 AS:i:248 nM:i:2
K00124:558:HFKJWBBXX:7:1101:1010:6379 163 TRINITY_DN1347_c0_g1_i49.p2 32 1 46S104M = 161 279 CAGGAACACATCTCATAAAACAACATGTTGAAATTGTTCGTCCTTTCAGTGCTCGTAGCAGCCGCGTCCTCTGCTGGACTTGGCGGACTCGGGGCTGTTTATGGAGCTGGCTATGGTTATGGTTACCATGCCCCATTGACTGTAGCCCAC AAFFFJJJJJJJFJJJJFJJJJJJJFJJJFJJJFJJJJJJJFJJFJJJFFJJFAJJJFFFJJJJJFAJFJJJJJJJJJJJFJFJ7FFJJJJJJJJAAJJJJJJJJFJFJ7JJJJJJJFFFJJJFJJJJJFF<AJFJJJ<FJJJJJJ)AJF NH:i:3 HI:i:1 AS:i:248 nM:i:2
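In this name-sorted listing, the three READ1 records (flags 339, 339, 83) come before the three READ2 records (419, 419, 163), so the first two consecutive records are both mate 1, which matches the validator's "both mate1 or both mate2" message. A minimal bash sketch of that adjacency check follows; check_adjacent_mates is a hypothetical helper that only approximates what the validator reports and is not RSEM's own implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read SAM text (e.g. from `samtools view`) on stdin, take the
# records two at a time, and report the first pair whose names differ or whose
# mate bits are not one READ1 (0x40) plus one non-READ1 record.
# Exit status: 0 if every pair looks like two mates, 1 otherwise.
check_adjacent_mates() {
  awk -F'\t' '
    /^@/ { next }                               # skip header lines
    first == "" { first = $1; flag1 = $2; next } # start a new pair
    {
      # POSIX awk has no bitwise AND, so test bit 0x40 (READ1) arithmetically
      r1_a = int(flag1 / 64) % 2
      r1_b = int($2 / 64) % 2
      if ($1 != first) {
        print "names differ: " first " / " $1; bad = 1; exit
      }
      if (r1_a == r1_b) {
        print "both mate1 or both mate2: " first; bad = 1; exit
      }
      first = ""                                # pair is fine, reset
    }
    END {
      if (!bad && first != "") { print "unpaired last record: " first; bad = 1 }
      exit bad
    }'
}
```

Usage: samtools view Aligned.sortedByCoord.out.forRSEM.bam | check_adjacent_mates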
In the BAM produced by "convert-sam-for-rsem":
$ samtools view Aligned.sortedByCoord.out.forRSEMconv.bam.tmp.bam | grep 'K00124:558:HFKJWBBXX:7:1101:1010:6379'
K00124:558:HFKJWBBXX:7:1101:1010:6379 419 TRINITY_DN1347_c0_g1_i21.p2 32 1 46S104M = 161 279 CAGGAACACATCTCATAAAACAACATGTTGAAATTGTTCGTCCTTTCAGTGCTCGTAGCAGCCGCGTCCTCTGCTGGACTTGGCGGACTCGGGGCTGTTTATGGAGCTGGCTATGGTTATGGTTACCATGCCCCATTGACTGTAGCCCAC AAFFFJJJJJJJFJJJJFJJJJJJJFJJJFJJJFJJJJJJJFJJFJJJFFJJFAJJJFFFJJJJJFAJFJJJJJJJJJJJFJFJ7FFJJJJJJJJAAJJJJJJJJFJFJ7JJJJJJJFFFJJJFJJJJJFF<AJFJJJ<FJJJJJJ)AJF NH:i:3 HI:i:3 AS:i:248 nM:i:2
K00124:558:HFKJWBBXX:7:1101:1010:6379 339 TRINITY_DN1347_c0_g1_i21.p2 161 1 150M = 32 -279 CAGTCACCACCGTCGCCCATGCTCCAGTAGCTACCGTCGCCCATGCTCCAGTAACCGGTGTCCGGGCAGTTTATGGAGCTGGTTATGGTTATGGTTACAATGCGCCATTGACTGTAGCACACACACCAGCTGTAAGTTATGCTGCTCCAG F-JFJFJF<JJJJJJJJJFFJFFFJJJJJJFFJJJJJJJJJJFJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJAFAJJJJJJFJFJJJJJJJJJJJJJFJJJJFJJJJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJFJJJFJJJFFFAA NH:i:3 HI:i:3 AS:i:248 nM:i:2
K00124:558:HFKJWBBXX:7:1101:1010:6379 419 TRINITY_DN1347_c0_g1_i45.p2 32 1 46S104M = 161 279 CAGGAACACATCTCATAAAACAACATGTTGAAATTGTTCGTCCTTTCAGTGCTCGTAGCAGCCGCGTCCTCTGCTGGACTTGGCGGACTCGGGGCTGTTTATGGAGCTGGCTATGGTTATGGTTACCATGCCCCATTGACTGTAGCCCAC AAFFFJJJJJJJFJJJJFJJJJJJJFJJJFJJJFJJJJJJJFJJFJJJFFJJFAJJJFFFJJJJJFAJFJJJJJJJJJJJFJFJ7FFJJJJJJJJAAJJJJJJJJFJFJ7JJJJJJJFFFJJJFJJJJJFF<AJFJJJ<FJJJJJJ)AJF NH:i:3 HI:i:2 AS:i:248 nM:i:2
K00124:558:HFKJWBBXX:7:1101:1010:6379 339 TRINITY_DN1347_c0_g1_i45.p2 161 1 150M = 32 -279 CAGTCACCACCGTCGCCCATGCTCCAGTAGCTACCGTCGCCCATGCTCCAGTAACCGGTGTCCGGGCAGTTTATGGAGCTGGTTATGGTTATGGTTACAATGCGCCATTGACTGTAGCACACACACCAGCTGTAAGTTATGCTGCTCCAG F-JFJFJF<JJJJJJJJJFFJFFFJJJJJJFFJJJJJJJJJJFJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJAFAJJJJJJFJFJJJJJJJJJJJJJFJJJJFJJJJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJFJJJFJJJFFFAA NH:i:3 HI:i:2 AS:i:248 nM:i:2
K00124:558:HFKJWBBXX:7:1101:1010:6379 163 TRINITY_DN1347_c0_g1_i49.p2 32 1 46S104M = 161 279 CAGGAACACATCTCATAAAACAACATGTTGAAATTGTTCGTCCTTTCAGTGCTCGTAGCAGCCGCGTCCTCTGCTGGACTTGGCGGACTCGGGGCTGTTTATGGAGCTGGCTATGGTTATGGTTACCATGCCCCATTGACTGTAGCCCAC AAFFFJJJJJJJFJJJJFJJJJJJJFJJJFJJJFJJJJJJJFJJFJJJFFJJFAJJJFFFJJJJJFAJFJJJJJJJJJJJFJFJ7FFJJJJJJJJAAJJJJJJJJFJFJ7JJJJJJJFFFJJJFJJJJJFF<AJFJJJ<FJJJJJJ)AJF NH:i:3 HI:i:1 AS:i:248 nM:i:2
K00124:558:HFKJWBBXX:7:1101:1010:6379 83 TRINITY_DN1347_c0_g1_i49.p2 161 1 150M = 32 -279 CAGTCACCACCGTCGCCCATGCTCCAGTAGCTACCGTCGCCCATGCTCCAGTAACCGGTGTCCGGGCAGTTTATGGAGCTGGTTATGGTTATGGTTACAATGCGCCATTGACTGTAGCACACACACCAGCTGTAAGTTATGCTGCTCCAG F-JFJFJF<JJJJJJJJJFFJFFFJJJJJJFFJJJJJJJJJJFJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJAFAJJJJJJFJFJJJJJJJJJJJJJFJJJJFJJJJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJFJJJFJJJFFFAA NH:i:3 HI:i:1 AS:i:248 nM:i:2
Here are the corresponding reads from the FASTQ files:
$ zgrep -A3 'K00124:558:HFKJWBBXX:7:1101:1010:6379' C1.paired.1.fastq.gz
@K00124:558:HFKJWBBXX:7:1101:1010:6379 1:N:0:AACTCACC
CTGGAGCAGCATAACTTACAGCTGGTGTGTGTGCTACAGTCAATGGCGCATTGTAACCATAACCATAACCAGCTCCATAAACTGCCCGGACACCGGTTACTGGAGCATGGGCGACGGTAGCTACTGGAGCATGGGCGACGGTGGTGACTG
+
AAFFFJJJFJJJFJJJJJJJJJJJJFJFJJJJJJJJJJJJJJJJJJFJJJJFJJJJJJJJJJJJJFJFJJJJJJAFAJJJJJJJJJJJJFJFJJJJJJJJJJJJJJJFJJJJJJJJJJFFJJJJJJFFFJFFJJJJJJJJJ<FJFJFJ-F
$ zgrep -A3 'K00124:558:HFKJWBBXX:7:1101:1010:6379' C1.paired.2.fastq.gz
@K00124:558:HFKJWBBXX:7:1101:1010:6379 2:N:0:AACTCACC
CAGGAACACATCTCATAAAACAACATGTTGAAATTGTTCGTCCTTTCAGTGCTCGTAGCAGCCGCGTCCTCTGCTGGACTTGGCGGACTCGGGGCTGTTTATGGAGCTGGCTATGGTTATGGTTACCATGCCCCATTGACTGTAGCCCAC
+
AAFFFJJJJJJJFJJJJFJJJJJJJFJJJFJJJFJJJJJJJFJJFJJJFFJJFAJJJFFFJJJJJFAJFJJJJJJJJJJJFJFJ7FFJJJJJJJJAAJJJJJJJJFJFJ7JJJJJJJFFFJJJFJJJJJFF<AJFJJJ<FJJJJJJ)AJF
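Per the SAM specification, SEQ (and QUAL) of reverse-strand records (FLAG bit 0x10, e.g. 83 and 339 above) are stored reverse-complemented, so mate 1 from C1.paired.1.fastq.gz appears reversed in the BAM, while mate 2 matches the 46S104M records directly. A small bash sketch for comparing them; revcomp is a hypothetical helper, and it assumes the `rev` utility (util-linux or BSD) is installed:

```shell
#!/usr/bin/env bash
# Reverse-complement a DNA string: reverse it, then swap A<->T and C<->G.
# Characters other than ACGT/acgt (e.g. N) are reversed but otherwise unchanged.
# Assumes the `rev` utility (util-linux / BSD) is installed.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}
```

For example, with mate1 holding the sequence line of mate 1 and seq83 the SEQ field of its FLAG 83 record, [ "$(revcomp "$mate1")" = "$seq83" ] && echo identical compares the two.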
Do you have any advice on how to overcome this problem? Any help is welcome.
Thank you in advance and sorry for all the questions,
Lucila.