version 0.18: PWM rewrite

COTRASIF

Mar 17, 2009, 7:02:14 AM
to cotr...@googlegroups.com
The PWM finder has been rewritten and has been in operation since March 8.
It is rock-stable and has been extensively tested.

This (and over 190 git commits since version 0.17) brings COTRASIF to
version 0.18.

We have started testing full-genome search for the PWM method.

==========

Two bugs were found in the HMM finder:
1. inability to converge on automatic cut-off, when submitted
sequences are long (e.g. 43 nucleotides long); we are working on an
improved cut-off estimator, which will be able to handle such cases.
2. uncontrolled memory consumption at specific (but valid) input
conditions; this is also being fixed now.

We will report in this thread as soon as these two are fixed.

Meanwhile, if your HMM task stays in status 'running' for several
hours, or gets status 'failed', please use the PWM finder.
Once the two bugs are fixed, we will re-run your HMM tasks, and
you will get a notification with a link to the results.

==========

An article about COTRASIF has been published recently.

If you use COTRASIF in your research and find it useful, please cite:

Bogdan Tokovenko, Rostyslav Golda, Oleksiy Protas, Maria Obolenskaya
and Anna El'skaya (2009) Nucleic Acids Res., doi:10.1093/nar/gkp084

http://nar.oxfordjournals.org/cgi/content/abstract/gkp084v1

==========

If you have any questions, suggestions or feature requests, please
write to us at cotr...@biomed.org.ua.

Bogdan

Mar 17, 2009, 7:30:53 AM
to cotrasif
Note: I'm delaying the update to Ensembl release 53 until the HMM bugs
are fixed.

Bogdan

Mar 18, 2009, 5:29:05 AM
to cotrasif
> 1. inability to converge on automatic cut-off, when submitted
> sequences are long (e.g. 43 nucleotides long); we are working on an
> improved cut-off estimator, which will be able to handle such cases.

This bug was fixed yesterday; cut-off convergence is now achieved much
faster.

However, note that sequences over 20 nucleotides long have a high
total Ic (information content) and would need a very low similarity
threshold (cut-off) to produce any results. As it is not yet possible
to specify the similarity threshold manually for the HMM method, a task
with long sequences will most probably yield an empty result.
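
To illustrate why total Ic grows with sequence length, here is a minimal
Python sketch. It assumes the common definition of per-position
information content for DNA (2 bits minus the column entropy, uniform
background, no small-sample correction); COTRASIF's own Ic calculation
may differ, and all function names are made up for this example.

```python
import math
from collections import Counter


def position_frequency_matrix(seqs):
    """Count base occurrences per column of equal-length aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same length")
    return [Counter(s[i] for s in seqs) for i in range(length)]


def column_ic(counts):
    """Information content of one column in bits: 2 + sum(p * log2(p)).

    Assumes a uniform background over A, C, G, T (hence the constant 2).
    """
    total = sum(counts.values())
    return 2.0 + sum((n / total) * math.log2(n / total)
                     for n in counts.values() if n)


def total_ic(seqs):
    """Sum of per-column information content over the whole site."""
    return sum(column_ic(col) for col in position_frequency_matrix(seqs))


short_site = ["TATAAA", "TATATA", "TATAAG"]
# Same conservation pattern repeated four times, i.e. a 24-nucleotide site.
long_site = [s * 4 for s in short_site]

print(total_ic(short_site), total_ic(long_site))
```

With the same per-position conservation, the four-fold longer site has
four times the total Ic, so a fixed similarity threshold becomes
correspondingly harder to reach.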

Until we add manual cut-off specification to HMM, I suggest using the
PWM method with a lower cut-off for long sequences, e.g. 0.6-0.65 for
a matrix about 40 nucleotides long.
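
As a rough illustration of what such a cut-off does, the Python sketch
below scans a sequence with a log-odds PWM and keeps windows whose
rescaled score reaches the threshold. The rescaling (0 for the worst
possible match, 1 for the best) is an assumption made for this example
and is not necessarily how COTRASIF computes similarity; the
pseudocount, the uniform background and all function names are likewise
hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(seqs, pseudocount=0.5, background=0.25):
    """Log-odds PWM (one dict per column) from equal-length aligned sequences."""
    n = len(seqs)
    pwm = []
    for i in range(len(seqs[0])):
        column = [s[i] for s in seqs]
        pwm.append({
            b: math.log2((column.count(b) + pseudocount)
                         / (n + 4 * pseudocount) / background)
            for b in BASES
        })
    return pwm


def similarity(pwm, window):
    """Window score rescaled to [0, 1] between worst and best possible match.

    Assumes the window contains only A, C, G, T and that the matrix is not
    completely uniform (otherwise best and worst scores coincide).
    """
    score = sum(col[b] for col, b in zip(pwm, window))
    worst = sum(min(col.values()) for col in pwm)
    best = sum(max(col.values()) for col in pwm)
    return (score - worst) / (best - worst)


def scan(pwm, sequence, cutoff):
    """Return (position, similarity) for each window scoring >= cutoff."""
    width = len(pwm)
    hits = []
    for pos in range(len(sequence) - width + 1):
        sim = similarity(pwm, sequence[pos:pos + width])
        if sim >= cutoff:
            hits.append((pos, sim))
    return hits


pwm = build_pwm(["TATAAA", "TATATA", "TATAAG"])
print(scan(pwm, "GGTATAAAGG", 0.9))
```

Lowering the cut-off (say, from 0.9 to 0.6) can only add hits, which is
why a long matrix that returns nothing at a strict threshold may still
produce results at a more permissive one.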

If there is sufficient demand, I'll make a PFM/PWM/Ic utility available
online for easier conversion of sequences to matrices.

Bogdan

Mar 18, 2009, 4:28:06 PM
to cotrasif
I've decreased the delay between task submission (queuing) and task
execution; this should noticeably shorten the wait for task results.

Bogdan

Mar 19, 2009, 11:13:56 AM
to cotrasif
> 2. uncontrolled memory consumption at specific (but valid) input
> conditions

This is now fixed.

I'm waiting for additional testing results; after that I will update
to Ensembl 53.