Hi Alex,
Thanks for STAR, it's truly a star, and you are a star for making STAR, I guess.
I'm working with a data set for which there is no reference genome. However, we have fairly good transcriptome data (in a FASTA file), so we are using it as a "reference genome" to map our other RNA-seq data from the same organism against, and it works fine. I tried both STAR and BWA-MEM, and both gave me pretty much exactly the same result. I obviously don't need to worry about splicing. We don't have any annotation that I know of, and even if we do, it isn't in a standard format such as GTF/GFF. We are interested in read counts per transcript, which really means per individual contig name as it appears in the BAM file.
I will write a Python script to count reads per "contig" so that we can estimate differential expression and so on. However, I thought it would be great if a feature like that were introduced into STAR. Currently there aren't any tools that I know of that count reads per contig.
The way I am going to approach this is to build a big hash with contig/transcript names as keys and counters as values, incrementing the counter every time a read is associated with that contig name, and then just write the keys and values out as a table. There might be better ways.
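Something along these lines is what I have in mind. It is only a minimal sketch, not a finished script. It assumes the alignments are available as SAM text (a BAM file would need converting first), that the contig name is the third tab-separated column (RNAME), and that unmapped, secondary and supplementary alignments should be skipped so each read is counted at most once. The function names and the command-line handling are my own placeholders.

```python
import sys
from collections import Counter

# FLAG bits as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_reads_per_contig(sam_lines):
    """Count primary, mapped alignments per contig name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines such as @HD, @SQ, @PG.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a valid alignment record
        flag = int(fields[1])
        contig = fields[2]
        # Assumption: count each read once, so ignore unmapped reads and
        # secondary/supplementary alignments.
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY) or contig == "*":
            continue
        counts[contig] += 1
    return counts


def write_table(counts, out=sys.stdout):
    """Write one 'contig<TAB>count' row per contig, sorted by contig name."""
    for contig, n in sorted(counts.items()):
        out.write(f"{contig}\t{n}\n")


# Hypothetical usage: python count_contigs.py alignments.sam
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as sam_file:
        write_table(count_reads_per_contig(sam_file))
```

Note that with paired-end data this counts each mate separately, and contigs with no reads at all simply do not appear in the table, so both would need handling before any differential expression analysis.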
P.S. I've been exploring STAR's splicing features and they seem pretty cool.
Regards,
Kirill