Hello,
I could not find an answer to this in the documentation, so I wanted to clarify something about motif discovery with STREME.
In the STREME manual, under "Running time and memory usage", it says about the different alphabets: "STREME runs about twice as slow on DNA sequences as on RNA sequences because STREME treats DNA sequences as double-stranded and RNA sequences as single-stranded."
My sequencing data comes from a single-stranded DNA library, and I run motif discovery on it. All the motifs I am looking for therefore occur on only one strand: the strand I pass to STREME in the input file. If I specify the alphabet as DNA (or leave it unspecified), STREME will assume double-stranded DNA and scan both my sequences and their reverse complements, correct? And if I specify RNA as the alphabet instead, it will only search the strand I pass to STREME, correct? Apart from T versus U (T is still recognized as U), is there any other difference between these two options?
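To make sure I understand the difference correctly, here is a minimal Python sketch of what I assume "double-stranded" versus "single-stranded" means for counting motif sites. It is only my own illustration, not how STREME is actually implemented; the function names and the example read are hypothetical.

```python
# Hypothetical illustration only: this is NOT STREME's implementation.
# It contrasts counting a motif on the input strand only (single-stranded,
# RNA-like) with counting it on both strands (double-stranded, DNA-like).

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_sites(seq: str, motif: str) -> int:
    """Count (possibly overlapping) exact matches of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def count_motif(seq: str, motif: str, double_stranded: bool) -> int:
    """Count sites on the input strand, plus the reverse complement if requested."""
    hits = count_sites(seq, motif)
    if double_stranded:
        hits += count_sites(reverse_complement(seq), motif)
    return hits


if __name__ == "__main__":
    read = "TTGTCCAA"  # hypothetical read from a ssDNA library
    motif = "GGAC"
    # GGAC is absent from the read itself; only its reverse complement GTCC is present.
    print("single-stranded:", count_motif(read, motif, double_stranded=False))
    print("double-stranded:", count_motif(read, motif, double_stranded=True))
```

In this toy case the single-stranded count is 0 and the double-stranded count is 1: the site exists only on the strand that is not in my library, which is exactly the kind of hit I want to avoid.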
Furthermore, is it possible to force STREME to treat sequences as single-stranded when using the DNA alphabet? Alternatively, can I write a custom alphabet file that is identical to the DNA alphabet but treated as single-stranded, like RNA? If so, how can I tell whether an alphabet file is treated as single-stranded or double-stranded?
Best,
Corbin