I am working on a project to identify transcription factor binding sites (TFBSs) in the promoter regions of genes from 40 hierarchical clusters, and I have run into a challenge that I hope STREME may be able to address.
How can I identify novel motifs in the promoter regions of these genes, where I define the promoter as the 1000 bp upstream of the transcription start site (TSS)? To avoid picking up TFBSs that belong to neighboring genes unrelated to the ones I am studying, should I restrict the analysis to genes whose nearest upstream neighbor transcribed on the opposite strand is more than 1000 bp away? Or can STREME handle this itself, so that the novel motifs it reports are specific to the defined promoter regions and do not include motifs from the overlapping promoters of neighboring genes?
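To make the filter I am describing concrete, here is a minimal Python sketch of it. It is independent of STREME, and everything in it is an assumption made for illustration: the `Gene` record, the 0-based half-open coordinates, the example genes, and the choice to treat "too close" as "an opposite-strand gene overlaps the 1000 bp promoter window".

```python
from dataclasses import dataclass

PROMOTER_LEN = 1000  # promoter = 1000 bp upstream of the TSS


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 0-based, half-open [start, end)."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def promoter_window(gene, length=PROMOTER_LEN):
    """Return the (start, end) interval covering `length` bp upstream of the TSS."""
    if gene.strand == "+":
        # TSS is at gene.start; upstream means lower coordinates.
        return max(0, gene.start - length), gene.start
    # For "-" strand genes the TSS is at gene.end; upstream means higher coordinates.
    return gene.end, gene.end + length


def has_opposite_strand_neighbor(gene, all_genes, length=PROMOTER_LEN):
    """True if any opposite-strand gene on the same chromosome overlaps the promoter window."""
    win_start, win_end = promoter_window(gene, length)
    for other in all_genes:
        if other.name == gene.name or other.chrom != gene.chrom:
            continue
        if other.strand == gene.strand:
            continue
        # Standard half-open interval overlap test.
        if other.start < win_end and other.end > win_start:
            return True
    return False


def filter_genes(genes, length=PROMOTER_LEN):
    """Keep only genes whose promoter window is free of opposite-strand genes."""
    return [g for g in genes if not has_opposite_strand_neighbor(g, genes, length)]


# Made-up example: geneA and geneB form a divergent pair 200 bp apart.
example_genes = [
    Gene("geneA", "chr1", 5000, 7000, "+"),
    Gene("geneB", "chr1", 4200, 4800, "-"),
    Gene("geneC", "chr1", 20000, 22000, "+"),
    Gene("geneD", "chr2", 500, 900, "-"),
]

kept = filter_genes(example_genes)
print([g.name for g in kept])
```

With this filter, both members of a closely spaced divergent pair are dropped, and only the promoters of the remaining genes in each cluster would be extracted as FASTA and given to STREME as the primary sequences.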
I would also like to know more about the site percentage (coverage) that is reported as part of the results section when running STREME.
Thank you
Fathima Ashraf