Counting repetitions of a sub-sequence.


mgrant...@gmail.com

Oct 8, 2014, 4:08:33 PM
to biogo...@googlegroups.com
Hi,

Sorry for the newbie question.
I am trying to do something simple, but I am not familiar enough with Go to see an easy way to do it.
I am reading a FASTA file, and I need to determine how many times a certain sequence occurs in it.
Can someone suggest a way?

Thanks for your time.
   Michael

Dan Kortschak

Oct 8, 2014, 4:11:05 PM
to mgrant...@gmail.com, biogo...@googlegroups.com
What do you mean by "repeating" and "sequence"?

How long are the expected repeats, and how exact does the match between repeats need to be?

mgrant...@gmail.com

Oct 9, 2014, 3:54:42 PM
to biogo...@googlegroups.com, mgrant...@gmail.com
I am looking for something similar to biopython's Bio.motifs. In my case the subsequences are short (about 5-10 letters), and I am looking for perfect matches only.
Before using this library I used strings.Count.
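
For reference, the strings.Count approach can be sketched with the standard library alone. This is only an illustration under some assumptions: countInFasta is a hypothetical helper (not part of biogo), the input is plain FASTA text, and the wrapped sequence lines of each record are joined before counting so that a match spanning a line break is not missed. Note that strings.Count counts non-overlapping occurrences.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countInFasta is a hypothetical helper: it counts non-overlapping exact
// occurrences of motif in each record of FASTA-formatted text, keyed by
// the record's header line (without the leading '>').
func countInFasta(fastaText, motif string) map[string]int {
	counts := make(map[string]int)
	var (
		name string
		have bool // true once a header line has been seen
		seq  strings.Builder
	)
	// flush stores the count for the record collected so far.
	flush := func() {
		if have {
			counts[name] = strings.Count(seq.String(), motif)
		}
		seq.Reset()
	}
	// bufio.Scanner has a default line-length limit; that is fine for
	// conventionally wrapped FASTA, but very long lines would need Buffer.
	sc := bufio.NewScanner(strings.NewReader(fastaText))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "":
			continue
		case strings.HasPrefix(line, ">"):
			flush()
			name = strings.TrimPrefix(line, ">")
			have = true
		default:
			// Join wrapped sequence lines into one string per record.
			seq.WriteString(line)
		}
	}
	flush()
	return counts
}

func main() {
	fastaText := ">seq1\nACGTAC\nGTTACG\n>seq2\nAAAA\n"
	counts := countInFasta(fastaText, "ACG")
	fmt.Println(counts["seq1"], counts["seq2"])
}
```

In this example one of the "ACG" matches in seq1 straddles the line break, which is why the lines are joined first.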

Dan Kortschak

Oct 9, 2014, 4:22:59 PM
to mgrant...@gmail.com, biogo...@googlegroups.com, mgrant...@gmail.com
I think your best bet would be the index/suffixarray package in the standard library then. (There is a PWM package in biogo that handles cases as small as yours, but it does more than you need here.) You will need to convert the sequence letters to []byte, but alphabet.Letters can be converted at essentially zero cost using alphabet.LettersToBytes.
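
A minimal sketch of the suffix array approach, using only the standard library. The helper name countOccurrences and the example sequence are made up for illustration; in practice the []byte would come from the biogo sequence via alphabet.LettersToBytes as described above. Unlike strings.Count, Index.Lookup reports every match position, so overlapping occurrences are counted too.

```go
package main

import (
	"fmt"
	"index/suffixarray"
)

// countOccurrences is a hypothetical helper: it returns the number of exact
// occurrences of motif in seq, including overlapping ones.
func countOccurrences(seq, motif []byte) int {
	// Building the index costs time up front; reuse it when searching
	// the same sequence for many motifs.
	idx := suffixarray.New(seq)
	// A negative n asks Lookup for all match offsets (in no particular order).
	// Lookup returns nil for an empty motif or when nothing matches.
	return len(idx.Lookup(motif, -1))
}

func main() {
	seq := []byte("ACGTACGTTACGAAAA") // made-up example sequence
	fmt.Println(countOccurrences(seq, []byte("ACG")))
	fmt.Println(countOccurrences(seq, []byte("AAA")))
}
```

For a single motif on a single sequence the index may not pay for itself; it becomes worthwhile when the same sequence is searched repeatedly.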

I have partially implemented a BWT search, but it is nowhere near ready.

Dan

Michael Grantham

Oct 10, 2014, 7:30:43 PM
to Dan Kortschak, biogo...@googlegroups.com
Thanks!