We have recently integrated 398 PFMs from Transfac 7.0 Public.
Some of the imported matrices had to be re-normalized; see the
Help page for details.
With this integration, COTRASIF is now at version 0.17.
Looking back at the development roadmap we published
earlier, we have completed its first point and (temporarily - until we add plants) got rid of the "background task" of adding more genome-wide promoter collections. The Transfac integration delayed the full-genome-sequence search feature, which will be made publicly available soon.
Below you will find an updated version of the development roadmap. It has only grown larger, and it is still quite imprecise about implementation dates: all the dates given are approximate, as some features will arrive sooner and some later than promised here.
1. Provide a full-genome-sequence search option as an addition to the existing functionality. January 2009.
2. Implement "TFBS enrichment analysis" for promoters; this should noticeably ease the selection of targets for experimental verification. February 2009.
3. Improve task management: add the ability to give descriptive names to submitted tasks, and to retrieve (and delete?) previous results. Add the ability for the user to enter a list of Gene_IDs to search in, so as to avoid searching the whole genome when it is not necessary. Add a hyperlinked HTML format for the results file, and download links for compressed results files (these will be handy for genome-wide search results). Add a Statistics page listing the current state of the internal COTRASIF database. February-March 2009.
4. Enable public (API/GET/web-service) access to the internal COTRASIF database of promoters, for all genomes; make dumps available.
5. Implement automatic calculation of the PWM threshold (similar to how it is done for HMM).
6. Improve the conservation filter: a) allow users to specify %id and orthology types for filtering, b) introduce position-constrained filtering (with distance measured from the TSS of each of the orthologous genes); consider implementing multi-species (as opposed to two-species) filtering.
7. Write a new development roadmap. Options to choose from: a) include nucleosome position prediction as an additional filter (not a favourite one), b) integrate Gene Ontology enrichment analysis (not a favourite one, but it should be added at least as an external application link), c) add a plant genomes database to the COTRASIF import pipeline (if this is not done earlier, which is quite possible), d) add support for transcription modules (groups of TFBS), together with support for searching with multiple matrices/sequence sets simultaneously, e) make our HMM search binary available for download (this is not as easy as it seems). If you want a feature that is missing from this list, let us know.
Bogdan