version 0.21: multiple changes

COTRASIF
Oct 12, 2009, 10:59:26 AM
to cotr...@googlegroups.com
After a period of silence, version 0.21 brings the following changes (in no specific order):
  • Fix: The conservation filter (also known as the orthology filter) no longer relies on protein_percent_identity. This should make an update to the latest E!56 release possible, and should allow the internal database of promoters to be brought back in sync with the orthology data. As Ensembl now provides more orthology-related data, some numeric filters could be re-integrated later. Any relevant references, comments and suggestions are welcome.
  • Change: To compensate for the loss of "protein percent identity" filtering, two new features are now exposed to the user:
    1. Ability to select the "allowed" orthology types.
    2. Ability to specify the allowed delta in the TFBS-to-TSS distance when comparing orthologous promoters (so-called position-constrained conservation/orthology filtering).
  • Fix: blank screen upon task submission (WSOD).*
  • Fix: Some time ago, the PWM search mode started ignoring pseudocounts for fractional frequency matrices. This led to -infinity values in the calculated PWM. As it is not logical to add pseudocounts back, the solution was to apply a minimal value of 10^-9 to all zero-value cells in the submitted fractional PFM.*
  • Enhancement: A new column, 'TSS-relative position', was added to the results files. It simplifies the immediate interpretation of TFBS location, and it also helped to implement position-constrained conservation filtering.
  • Enhancement: Support for a user-provided cut-off for the HMM search method was added.*
  • Enhancement: Usability of the "Start HMM search" page was improved. Previously, it was not clear enough that the PFM box is optional; a text label saying so has been added to avoid confusion. The optional PFM box is now also empty by default (it used to be pre-filled with an example), and it was moved below the sequences box to show which input is more important.
  • Enhancement: Added a 10-second auto-redirect from the "Task submitted successfully" page to the task status page; some people were missing the link to the task status page.*
  • Update: The help page was extended to describe all the modifications introduced.
  • Fix: genomewide binaries died with a segmentation fault when, e.g., -cutoff was specified instead of --cutoff. Now, if an unrecognized option is found, the program exits with a descriptive error message.
  • Enhancement: It is now officially possible to use the genomewide binaries for TFBS search in non-Ensembl FASTA files. Non-Ensembl FASTA headers are detected automatically and a warning message is printed once to the screen; after that the program proceeds normally, but uses only the sequence identifier and no other data from the FASTA record header.
  • Fix: genomewide binaries, when used on non-Ensembl FASTA files, were unable to properly handle -1 strands. Now all records with non-Ensembl FASTA headers are set to strand '1' by default.
  • Enhancement: Automatic tests were added for 3 use cases of the genomewide binaries. Previously, only the web binaries had automatic tests.
  • Enhancement: The help page was extended to include a detailed discussion of TRANSFAC versus JASPAR matrices. The discussion is linked from the matrix selection/edit form.
*: items marked with an asterisk were released to the website before the release of version 0.21
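
For readers wondering what the zero-cell fix for fractional PFMs amounts to, here is a minimal sketch. It is an illustration only, not the actual COTRASIF code: the function names, the log2 scoring and the uniform background of 0.25 are all assumptions; only the 10^-9 floor is taken from the notes above.

```python
import math

MIN_FREQ = 1e-9  # floor value quoted in the release notes


def floor_zero_cells(pfm, min_value=MIN_FREQ):
    """Return a copy of a fractional PFM with every zero cell set to min_value."""
    return [[min_value if cell == 0 else cell for cell in row] for row in pfm]


def pfm_to_pwm(pfm, background=0.25):
    """Convert frequencies to log-odds scores.

    Assumption: log2(frequency / background) with a uniform background;
    the real scoring formula may differ.
    """
    return [[math.log2(cell / background) for cell in row] for row in pfm]


# Hypothetical fractional PFM: rows are A, C, G, T; columns are motif positions.
pfm = [
    [0.50, 0.0, 1.0],
    [0.25, 0.0, 0.0],
    [0.25, 1.0, 0.0],
    [0.00, 0.0, 0.0],
]

# Without the floor, log2(0) is undefined (-infinity in the limit).
pwm = pfm_to_pwm(floor_zero_cells(pfm))
```

With the floor applied, every cell of the resulting matrix is finite; formerly-zero cells simply receive a strongly negative score.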

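The position-constrained conservation filtering mentioned in the list above can be pictured roughly like this. Again, this is only a sketch under stated assumptions: the function names and the plain lists of TSS-relative positions are hypothetical, and the real filter also takes the selected orthology types into account.

```python
def position_conserved(position, ortholog_positions, delta):
    """True if any hit in the orthologous promoter lies within delta bp
    of the given TSS-relative position."""
    return any(abs(position - other) <= delta for other in ortholog_positions)


def filter_hits(hits, ortholog_hits, delta):
    """Keep only those hits that have a counterpart at a similar
    TSS-relative position in the orthologous promoter."""
    return [p for p in hits if position_conserved(p, ortholog_hits, delta)]


# Hypothetical TSS-relative positions of TFBS hits in two orthologous promoters.
species_a_hits = [-120, -300, -45]
species_b_hits = [-110, -50, -800]

conserved = filter_hits(species_a_hits, species_b_hits, delta=10)
```

Here the hits at -120 and -45 are kept (counterparts at -110 and -50 fall within 10 bp), while the hit at -300 is dropped.
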

An update to E!56 will be started now, and its completion will be reported separately.
It is advised not to use COTRASIF until a successful update has been reported.

Despite the changes to downloadable binaries, they are still in beta status.


Finally, I would like to thank Noboru Sakabe and Zhang Bo for useful feedback on COTRASIF, which made many of the fixes/enhancements possible.


Bogdan Tokovenko