Hi everyone,
I'm working with de novo assembled transcriptomes from several species for which no genomes or reference transcriptomes are available. I assembled the transcriptomes in-house, then used kallisto to quantify each set of reads against its corresponding assembly to obtain TPM values. All pretty standard.
The problem is that I'm trying to compare the abundance of specific protein domains across these assemblies: whether a given domain is expressed much higher or lower in some species relative to others, that sort of thing. I know that kallisto's TPM values are normalized for transcript length and sequencing depth, but since we're comparing between completely different assemblies, do I need to scale for library size too? If so, how? The methods I've come across for this all require a transcript-to-gene mapping file, which isn't feasible in our case.
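To make the quantity I'm comparing concrete, here is a minimal sketch of the per-domain summary for one species. It assumes the standard kallisto abundance.tsv columns (target_id, length, eff_length, est_counts, tpm). The inline data is a toy stand-in, and the transcript-to-domain table and the helper names (`read_tpm`, `sum_tpm_by_domain`) are hypothetical, with the domain IDs only as placeholders.

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for one species' kallisto abundance.tsv (made-up values).
ABUNDANCE_TSV = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t1200\t1000.0\t100\t250000.0\n"
    "tx2\t700\t500.0\t50\t250000.0\n"
    "tx3\t2200\t2000.0\t400\t500000.0\n"
)

# Hypothetical transcript -> domain hits (e.g. from a domain search on
# the translated contigs); IDs are placeholders for illustration.
TRANSCRIPT_DOMAINS = {
    "tx1": {"PF00069"},
    "tx2": {"PF00069", "PF07714"},
}


def read_tpm(handle):
    """Return {transcript_id: tpm} from a kallisto-style TSV handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["target_id"]: float(row["tpm"]) for row in reader}


def sum_tpm_by_domain(tpm, transcript_domains):
    """Sum TPM over all transcripts that carry each domain."""
    totals = defaultdict(float)
    for transcript, domains in transcript_domains.items():
        for domain in domains:
            # Transcripts missing from the quantification contribute 0.
            totals[domain] += tpm.get(transcript, 0.0)
    return dict(totals)


tpm = read_tpm(io.StringIO(ABUNDANCE_TSV))
domain_tpm = sum_tpm_by_domain(tpm, TRANSCRIPT_DOMAINS)
for domain, total in sorted(domain_tpm.items()):
    print(domain, total)
```

Since TPM sums to one million within each quantification, each per-domain total here is effectively a share of that assembly's total expression, and it is these per-species totals that I would like to compare.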
Any suggestions or advice would be greatly appreciated.
Thanks!
~ Shabnam.