I would like some advice on the results of my Corset run. One cluster contains more than 10,000 sub-clusters (Y), and some of these sub-clusters have different annotations. For example, Cluster123.15471 and Cluster123.18358 returned different nr hits: leptin receptor gene and heat shock factor 2 protein, respectively. I am planning to perform gene-level differential expression analysis (using tximport), but these mixed annotations could introduce a large bias in the DGE analysis. Would it be advisable to treat each sub-cluster ID as a unique cluster instead? I used Salmon and Corset with the default settings to generate the clusters.
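To make the question concrete, here is a minimal sketch of the two ways the transcript-to-gene mapping could be built. It assumes the Corset clusters output is a two-column, tab-separated table (transcript ID, cluster ID). The function names, the `collapse_subclusters` flag, and the transcript IDs `tx_a`, `tx_b`, `tx_c` are hypothetical and only for illustration.

```python
import csv
import io
from collections import Counter


def read_tx2gene(handle, collapse_subclusters=False):
    """Return (transcript_id, gene_id) pairs from a two-column, tab-separated table.

    Assumption: column 1 is the transcript ID and column 2 is the cluster ID.
    With collapse_subclusters=True, everything after the first '.' in the
    cluster ID is dropped, so all sub-clusters share one gene ID.
    """
    pairs = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        transcript_id, cluster_id = row[0], row[1]
        if collapse_subclusters:
            cluster_id = cluster_id.split(".", 1)[0]
        pairs.append((transcript_id, cluster_id))
    return pairs


def transcripts_per_gene(pairs):
    """Count how many transcripts map to each gene ID."""
    return Counter(gene_id for _, gene_id in pairs)


# Hypothetical rows; only the cluster IDs come from the question above.
EXAMPLE = (
    "tx_a\tCluster123.15471\n"
    "tx_b\tCluster123.15471\n"
    "tx_c\tCluster123.18358\n"
)

# Option 1: each sub-cluster ID is its own gene.
per_subcluster = transcripts_per_gene(read_tx2gene(io.StringIO(EXAMPLE)))
# Option 2: all sub-clusters of Cluster123 are merged into one gene.
per_cluster = transcripts_per_gene(
    read_tx2gene(io.StringIO(EXAMPLE), collapse_subclusters=True)
)

print(per_subcluster)
print(per_cluster)
```

Either set of pairs could then be written out as a two-column table and supplied as the `tx2gene` argument of tximport. With the first option, the leptin receptor and heat shock factor 2 sub-clusters stay separate; with the second, they are summed into a single gene.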
Any suggestions would be appreciated. Thank you so much!