Hi Rob,
thank you very much for the fast reply!
Your awk script seems to work, but unfortunately I still can't run the quantification.
Does the awk script modify the reference .fa the way you intended?
Head of the original reference .fa:
>ENST00000448914.1 cdna:known chromosome:GRCh38:14:22449113:22449125:1 gene:ENSG00000228985.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRDD3 description:T cell receptor delta diversity 3 [Source:HGNC Symbol;Acc:HGNC:12256]
ACTGGGGGATACG
>ENST00000631435.1 cdna:known chromosome:GRCh38:CHR_HSCHR7_2_CTG6:142847306:142847317:1 gene:ENSG00000282253.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRBD1 description:T cell receptor beta diversity 1 [Source:HGNC Symbol;Acc:HGNC:12158]
GGGACAGGGGGC
>ENST00000632684.1 cdna:known chromosome:GRCh38:7:142786213:142786224:1 gene:ENSG00000282431.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRBD1 description:T cell receptor beta diversity 1 [Source:HGNC Symbol;Acc:HGNC:12158]
GGGACAGGGGGC
Head of the reference .fa after processing with your awk script:
>ENST00000448914
ACTGGGGGATACG
>ENST00000631435
GGGACAGGGGGC
>ENST00000632684
GGGACAGGGGGC
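For comparison, the transformation visible in the two heads above (keep only the first token of each header line and drop the trailing ".N" version suffix) can be sketched in Python. This is an assumption based only on the before/after output; the actual awk script is not shown here, and the function name strip_fasta_headers is hypothetical.

```python
import re

# Matches a trailing Ensembl-style version suffix such as ".1" on an ID.
VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_fasta_headers(lines):
    """Yield FASTA lines with each header reduced to its unversioned ID.

    Assumption: a header is '>' followed by the transcript ID and optional
    whitespace-separated annotation, as in the Ensembl cDNA file above.
    Sequence lines are passed through unchanged (minus the line ending).
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            fields = line[1:].split()
            tx_id = fields[0] if fields else ""
            yield ">" + VERSION_SUFFIX.sub("", tx_id)
        else:
            yield line


if __name__ == "__main__":
    original = [
        ">ENST00000448914.1 cdna:known chromosome:GRCh38:14:22449113:22449125:1",
        "ACTGGGGGATACG",
    ]
    for out in strip_fasta_headers(original):
        print(out)
```

Since the function accepts any iterable of lines, an open file handle can be passed in directly; the output lines carry no line endings, so they would need to be joined with "\n" when writing a new .fa.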
The warning/error I get is:
...
WARNING: Transcript ENST00000572106 appears in the reference but did not appear in the BAM
WARNING: Transcript ENST00000576073 appears in the reference but did not appear in the BAM
WARNING: Transcript ENST00000573592 appears in the reference but did not appear in the BAM
...
The reference .fa (processed with awk) no longer has version numbers, but Salmon still can't find the IDs in the BAM, even though IDs in the same format (without version numbers) do appear in the BAM.
Head of the BAM:
UNC12-SN629:170:D075HACXX:4:2301:1931:47690 419 ENST00000620552 5102 3 48M = 5142 88 CCTGGGCAAGGGGGACTTCGTGTCGCTGGCACTGCGGGACCGCCGCCT @@=DBAA?C;=C811?;:911C?;DEAGB7DDBFF645@E;9>>B?B@ NH:i:2 HI:i:1
UNC12-SN629:170:D075HACXX:4:2301:1931:47690 339 ENST00000620552 5142 3 48M = 5102 -88 CGCCGCCTGGAGTTCCGCTACGACCTGGGCAAGGGGGCAGCGGTCATC B@FHHIIJJJJIIGJIIJJJJIGJJJJJIHJJJJJGHHHHFFFFFCCB NH:i:2 HI:i:1
UNC12-SN629:170:D075HACXX:4:2301:1931:47690 163 ENST00000379370 5102 3 48M = 5142 88 CCTGGGCAAGGGGGACTTCGTGTCGCTGGCACTGCGGGACCGCCGCCT @@=DBAA?C;=C811?;:911C?;DEAGB7DDBFF645@E;9>>B?B@ NH:i:2 HI:i:2
Does the BAM look OK? I assume it is, because it worked with RSEM.
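One way to check which reference IDs are actually missing is to compare the FASTA headers with the @SQ lines of the BAM header, for example as printed by `samtools view -H`. In a SAM/BAM header, each @SQ line names one reference sequence in its SN: field. The following is a minimal Python sketch under that assumption; the helper names (sq_names, fasta_ids, missing_from_bam) are hypothetical and not part of Salmon or STAR.

```python
def sq_names(header_lines):
    """Return the reference names (SN: tag) from SAM @SQ header lines."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            # Header fields are tab-separated; SN values contain no whitespace.
            for field in line.split()[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def fasta_ids(fasta_lines):
    """Return the first whitespace-delimited token of each FASTA header."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return ids


def missing_from_bam(fasta_lines, header_lines):
    """Reference IDs (in FASTA order) with no matching @SQ entry."""
    in_bam = sq_names(header_lines)
    return [tx for tx in fasta_ids(fasta_lines) if tx not in in_bam]


if __name__ == "__main__":
    # Hypothetical toy input, not taken from the real files.
    fasta = [">ENST00000620552", "ACGT", ">ENST00000572106", "ACGT"]
    header = ["@HD\tVN:1.4", "@SQ\tSN:ENST00000620552\tLN:6000"]
    print(missing_from_bam(fasta, header))
```

The comparison is exact string matching, so it also shows whether the FASTA and the BAM header disagree on version suffixes. An empty result would mean every reference ID has an @SQ entry in the BAM header.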
Do I need the GTF for the quantification? This is the command I'm running:
salmon quant -p 2 -t /san/zyto/MS/STAR_aligner/GRCh38/TRANSKRIPOME/Homo_sapiens.GRCh38.cdna.all_TCproc.fa -g /san/zyto/MS/STAR_aligner/GRCh38/annotation/Homo_sapiens.GRCh38.85.gtf -l IU -a ./star_out_fqAligned.toTranscriptome.out.bam -o STAR_Salmon_quant
Thanks again!
Best
Martin