POY 5.0.0 release announcement


Ward C Wheeler

Aug 15, 2013, 9:00:28 PM
to po...@googlegroups.com
POY users,

We are pleased to announce the release of POY5. Many new features 
have been added and others improved, including maximum likelihood, chromosomal 
and genomic analysis, and custom alphabet sequences. 

Windows and OSX Binaries, source, and documentation are available 

Changes since POY4 are listed below.

As always, please contact us with bug reports and suggestions.

Enjoy,
The POY5 Team

CHANGES BETWEEN 5.0 beta2 and 5.0 Official Release

New Features:
    - Status messages for progress when using Model Selection and Fixed States
      transformation commands.
    - report(diagnosis) outputs only columns with relevant information.
    - Improved error messages.
    - Updated Documentation.
    - Updated Test Suite and Tutorials.
    - New option for optimization level of dynamic likelihood (see below).

Bugs Fixed:
    - Script analysis dropped the graphsupports command from the finalized script.
    - Due to a number of issues when loading trees containing taxa missing from
      the loaded data, such trees can no longer be loaded. The application
      reports the missing trees and skips loading them. This report can be used
      to generate a select command that filters terminals, such as
        select(terminals, not files:("missing_terminals_file"))
    - General NonAdditive characters caused an error under Exhaustive DO; the
      appropriate functions were not being called (missed in previous testing).
    - The logic was reversed when processing the level and orientation arguments
      for reading BreakInv and chromosome characters.
    - Loading data with an unequal number of fragments (separated by #) did not
      cause an error; instead, the absent sections were filled with "missing" data.
    - Issues with synonym files and processing trees (Ron Clouse)
    - Build under dynamic likelihood joined two nodes with different models.
    - The order of NEXUS blocks placed the Assumptions block first instead of last.

New Commands (see manual for full explanations):
    - set( opt:exhaustive_dyn) will optimize the dynamic likelihood model
      directly, instead of via an implied alignment.



CHANGES BETWEEN 5.0 beta1 and 5.0 beta2

New Features:
    - Reporting trees with branch lengths worked for likelihood characters but
      did not report parsimony branch lengths. The new command prints branch
      lengths computed by three different methods (see below). These options
      are ignored under likelihood, where we always report the parameter value
      that maximizes the log-likelihood.
    - report with graphtrees, asciitrees, and trees using the collapse option
      has been extended to use the branch lengths mentioned above (see command
      description below).
    - Custom alphabet characters can now be transformed to static characters
      via static approx. This resolves an issue with characters that can be
      represented as prealigned and with how to transform properly between
      the two representations.
    - Added support for the OCaml Parmap library for generating fixed-state cost
      matrices.
    - Added command report(robinson_foulds) (see command description below)

Bugs Fixed:
    - Selecting terminals caused an error in the ncurses display by writing to
      an incorrect window (Louise Crowley).
    - Transforming to likelihood under a model selection criterion (aic, aicc,
      bic) from a parsimony TCM whose gap cost differed from its substitution
      cost failed, because the data included an added entry representing the
      cost of a gap. This entry is now filtered out of the transformation, as in
      other likelihood transformations on those characters.
    - Custom-alphabet implied alignment produced an alignment with extra indels.
      This was due to an encoding issue that became relevant when mixing custom
      alphabet characters and levels.
    - Issue with missing data in custom-alphabet prealigned characters resolved.
    - When generating a cost matrix with an all-elements row/column from a cost
      matrix with a non-zero diagonal, values were replaced with zeros; they
      should have been the minimum of the row.
    - No error was reported when reading sequence data with unequal numbers of
      fragments, so the computation proceeded as if the absent fragments were
      missing data. Now (as in POY4) we report an error to the user.

New Commands (see manual for full explanations):
    - report command for parsimony branch lengths
        report(trees:(branches:min))
            - minimum number of changes is reported on branches
        report(trees:(branches:max))
            - maximum number of changes is reported on branches
        report(trees:(branches:single))
            - number of changes reported based on single assignment of dynamic
              characters
    - report command for tree distances using the Robinson-Foulds distance metric
        report(robinson_foulds)
            - will print matrix to terminal window
        report("OUTFILE", robinson_foulds)
            - will print matrix to file OUTFILE

Changed Commands (see manual for full explanations):
    - The report command for collapsed branches has changed. Instead of
      collapse:true or collapse:false, the option now takes the same values as
      the branch lengths above: collapse:min, collapse:max, and collapse:single.
      A branch is collapsed when its length, as defined above, equals 0.0.
        report(trees:(collapse:single))
        report("tree.pdf", graphtrees:collapse:min)
        report( asciitrees:collapse:max )
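The collapse rule can be sketched as follows. The sketch assumes each branch length has already been computed with min, max, or single. Every internal branch whose length equals 0.0 is removed, and the child's subtrees are attached directly to the parent, which produces a polytomy. The Node class and newick helper are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: Optional[str] = None  # leaf label; None for internal nodes
    length: float = 0.0         # length of the branch above this node
    children: List["Node"] = field(default_factory=list)


def collapse_zero_branches(node: Node) -> Node:
    """Return a new tree in which every internal branch of length 0.0
    is collapsed, merging that child's subtrees into its parent."""
    new_children: List[Node] = []
    for child in node.children:
        child = collapse_zero_branches(child)
        if child.children and child.length == 0.0:
            # Zero-length internal branch: splice the grandchildren upward.
            new_children.extend(child.children)
        else:
            new_children.append(child)
    return Node(node.name, node.length, new_children)


def newick(node: Node) -> str:
    """Minimal topology-only Newick writer for inspecting results."""
    if not node.children:
        return node.name or ""
    return "(" + ",".join(newick(c) for c in node.children) + ")"


if __name__ == "__main__":
    tree = Node(children=[
        Node(length=0.0, children=[Node("A", 1.0), Node("B", 2.0)]),
        Node(length=1.0, children=[Node("C", 1.0), Node("D", 1.0)]),
    ])
    print(newick(collapse_zero_branches(tree)))  # (A,B,(C,D))
```

Only internal branches are considered for collapsing, because removing a terminal branch would remove a taxon. The original tree is left unmodified.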
 


CHANGES BETWEEN 5.0 alpha3 and 5.0 beta

New Features:
    - Consistency in alignment procedures across the application. The trace-back
      procedures now produce the same result for affine alignment (with a gap
      opening cost of 0), normal alignment, the faster alignment procedure
      (newkkonen), and the space-saving algorithm.
    - Continuous characters are now fully supported beyond the 0-255 range, up
      to the maximum integer size on the machine. Although this is slightly
      slower, characters that fit within the 0-255 range are still vectorized
      as before.
    - build(N,random) no longer performs a random Wagner build; instead, it
      generates and diagnoses a random topology.
    - Sankoff and sequence characters with matrices having non-zero diagonal
      elements are now supported.
    - Information theoretic model selection procedures have been implemented
      under static and dynamic likelihood (see new command below). This command
      can be used on multiple character sets and data types, in which case the
      models selected for each data set are combined on the final tree(s). A
      tree must be in memory when the command is executed. The command analyzes
      all possible models for each tree and selects the best one according to
      the chosen information criterion.
    - Updated build/compile procedures for different environments.
    - Implemented No Common Mechanism likelihood model (see new command below).
    - Bootstrap probabilities (BP) under likelihood have been implemented, using
      the same command previously used for BP with other characters.
    - Better memory usage when loading multiple files of the same TCM.
    - Speed increases in the diagnosis of normal and affine sequence alignment.
    - Converted most functions (all that we found and could convert) to tail
      recursion to avoid stack overflows; this should also improve speed.

Bugs Fixed:
    - Character selection procedures (through the IDENTIFIERS in the command
      structure) have been verified, and where necessary they are implemented as
      special cases of one another to lower the chance of future bugs. (Thanks
      to Fernando Marques).
    - Fixed partitioned dynamic likelihood characters with multiple models, or
      in combination with static characters, causing optimization failures.
    - transform(prealigned) raised an error on static characters; the command
      only works on dynamic characters, so static characters are now ignored.
    - The likelihood model optimization routine returned a matrix of NaN when
      branch lengths were subnormal; a minimum value is now used to avoid this.
    - Error in report(seq_stats) when missing data is present.
    - Proper handling of missing data in iterative:exact and iterative:approx.
      Previously, missing data could be assigned to the median nodes of
      characters in certain situations, resulting in zero-cost assignments in
      subtrees, as well as errors in the median assignment functions. (Thanks to
      Denis Jacob Machado)
    - elikelihood for static likelihood characters had a bug in counting
      'uninformative' data when estimating transition probabilities.
    - Single assignment functions for the newkkonen alignment procedure were not
      calling the proper median function.
    - Static likelihood takes '?' into account correctly under fifth state (gap
      as an additional state) models. Previously it was interpreted as a gap,
      now it is interpreted as missing, like gaps in four-state models.
    - Static likelihood was not taking into account missing data correctly.
    - Corrected tree diagnosis with a non-zero diagonal TCM.
    - Selecting unique topologies could choose a suboptimal likelihood tree when
      two copies of a topology differed in models or branch lengths; we now
      select the lower-cost of the two.
    - transform(likelihood:(...)) followed by transform(parsimony) left the
      application in an error state and produced costs that did not match those
      from before the likelihood transform. The transform can now be done to
      recover the parsimony costs, or vice versa.
    - A join caused a failure in fuse, which in turn caused tree diagnosis to
      fail.

New Commands (see manual for full explanations):
    - Command to set the optimization thoroughness for the likelihood procedures
      has been added. The command,
        set(opt:coarse)
        set(opt:exhaustive)
        set(opt:no_opt)
      determines the number of passes for the optimization algorithm, and the
      convergence factors for the numerical routines.
    - Information theoretic model selection for likelihood uses the same
      transform command as (e)likelihood, but replaces the model (e.g., jc69,
      gtr) with an information theoretic criterion (aic, aicc, or bic). For
      example,
        transform(likelihood:(aic,rates:gamma:(4)))
    - No Common Mechanism (ncm) has been added as an additional model under
      likelihood. This is for static characters only. For example,
        transform(likelihood:(ncm))
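The three criteria above have standard textbook definitions. Each uses the maximized log-likelihood ln L, the number of free parameters k, and the sample size n:

- AIC = 2k - 2 ln L
- AICc = AIC + 2k(k+1)/(n-k-1)
- BIC = k ln n - 2 ln L

For all three, the model with the lowest score is preferred. The Python sketch below applies these formulas to fits that have already been computed. The release notes do not say how POY counts k and n (for example, whether branch lengths are included). The log-likelihoods and parameter counts in the example are invented purely for illustration.

```python
import math
from typing import Dict, Tuple


def aic(lnl: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * lnl


def aicc(lnl: float, k: int, n: int) -> float:
    """AIC with small-sample correction; undefined when n <= k + 1."""
    if n <= k + 1:
        raise ValueError("AICc requires n > k + 1")
    return aic(lnl, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(lnl: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * lnl


def select_model(fits: Dict[str, Tuple[float, int]], n: int,
                 criterion: str = "aic") -> str:
    """Return the model name with the lowest criterion score.

    fits maps a model name to (maximized ln L, number of free parameters).
    """
    def score(lnl: float, k: int) -> float:
        if criterion == "aic":
            return aic(lnl, k)
        if criterion == "aicc":
            return aicc(lnl, k, n)
        if criterion == "bic":
            return bic(lnl, k, n)
        raise ValueError(f"unknown criterion: {criterion}")

    return min(fits, key=lambda name: score(*fits[name]))


if __name__ == "__main__":
    # Hypothetical fits, not POY output.
    fits = {"jc69": (-1050.0, 1), "gtr": (-1040.0, 9)}
    for crit in ("aic", "aicc", "bic"):
        print(crit, select_model(fits, 500, crit))
```

With these made-up numbers, AIC and AICc prefer gtr, while BIC prefers jc69. BIC's ln n penalty per parameter is larger than AIC's constant penalty of 2, so BIC favors simpler models as n grows.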

Changed Commands (see manual for full explanations):
    - Reading prealigned characters (other than nucleotides) has been unified
      with the normal command structure. For example,
        read( custom_alphabet:("DATAFILE", "MATRIXFILE"; init3D:true) )
      now is,
        read( prealigned:( custom_alphabet:("DATAFILE"), tcm:("MATRIXFILE")) )
      The previous command structure was only briefly implemented in an alpha.

Known Issues:
    - Pre-aligned affine data reports an incorrect cost. This option for
      analysis has been turned off and an error is reported.
    - Diagnosis with level over 5 causes a segfault. This is probably a memory
      issue, as the function would exceed the limits of many computers.
    - The newkkonen space-saving alignment command (set(space_saving_alignment))
      has been deactivated because it causes a segfault.
    - Output of 'help' commands generated from the LaTeX docs is poorly formatted.



CHANGES BETWEEN 5.0 alpha2 and 5.0 alpha3

Bugs Fixed:
    - Continuous characters are fully supported from Hennig86/Nona files. The
      format that POY reads is unchanged (integers separated by spaces, missing
      data represented as '?', and ranges defined in square brackets separated
      by spaces). Continuous characters are limited to a maximum range of 255.
      (Thanks to Edmundo Gonzalez)
    - Costs displayed on trees were incorrect for partitions/sets of characters.
      They now represent the overall tree cost.



CHANGES BETWEEN 5.0 alpha1 and 5.0 alpha2

New Features:
    - orientation set to true by default for breakinversion data-type.
    - faster alignment under low-mem settings

Bugs Fixed:
    - Fixed the Makefile in the src directory to perform the install, and
      removed the Makefile/configuration in the root directory. These files
      duplicated the ones in the src directory and added no value to the
      compilation process.
    - When configuring with --enable-mpi, the interface now defaults to flat.
      This requirement is often forgotten, and there is no reason we cannot
      handle it automatically. Setting the interface to anything else will
      override that choice and report a warning.
    - Added an error message for input sequences with different numbers of
      fragments (fragments are divided by '#'). (bug report by Torsten Dikow).
    - Fixed minor errors in the Makefile and configure scripts. Also removed the
      Makefile and configure script from the root directory to avoid confusion
      and ease maintenance. (bug report by Jan De Laet).
    - report(lkmodel), which reports the likelihood model, did not work properly
      in certain situations, such as when called without identifiers. (bug
      report by John Denton).
    - Transform likelihood with multiple types (for example a combination of
      static and dynamic) would fail in the transform due to alphabet size
      issues. We partition the data now between static and dynamic and then
      apply the transform to the characters. (bug report by Fernando Marques).
    - Non-affine alignment was used for affine models under parsimony; this has
      been corrected, including for the affine low-memory Ukkonen alignment.
    - The dynamic linking rule for compiling the supramap and cmxs libraries was
      missing from our myocamlbuild file. (bug report by Travis Treseder).
    - Diagnosing Static and Dynamic Likelihood mixed models caused errors due to
      demarcation of sets of data. Resolved so we group by the type of model
      being applied to the characters as well as pre-defined sets and data
      classes.
    - Diagnosis for Dynamic Likelihood characters did not work on leaf nodes.
    - Backtrace works the same in POY4 and POY5 for normal alignment; the issue
      concerned which sequence is preferred when inserting indels.
    - Fixed an affine alignment bug when aligning two sequences at a position
      where both have gap polymorphisms. This is a very rare case.

New Commands:
    - set(space_saving_alignment)
    - set(normal_alignment)
        - these commands turn the low-memory alignment procedure on and off,
          respectively. Default is off.

Changed Commands:
    - transform(chromosome:(newkkonen, ...))
    - transform(genome:(newkkonen, ...))
        - this option is specified in the low-memory/space-saving alignment
          procedure mentioned above.



CHANGES BETWEEN 4.1.2.1 and 5.0 alpha1

New Features:
    - Added likelihood criterion for diagnosing trees.
        - Added methods of optimization for likelihood during build/swap/fuse
        - Support for dynamic and static characters under a variety of models.
        - Most Parsimonious and Maximum Average Likelihood cost models.
        - Static and Dynamic character support for likelihood, including any
          alphabet size (e.g., discrete morphological characters, amino acids, ...)
    - Added a variety of median solvers for rearrangements.
    - Support for Genome and Chromosome characters using the Mauve annotator.
    - New selection method for polymorphic data in fixed state characters.
    - Added level support for all alphabet sizes.
    - Changed default TCM to 1,1.
    - Updated configure scripts for newer versions of gcc and ocaml.
    - Low memory option for alignment of sequences.
    - Changed command for transform for dealing with identifiers.
    - Data files must use a single type of delimiter.
    - Choice of equally costly medians can be user specified.
    - pre-aligned for custom-alphabet and amino-acid characters.
    - Allow additional medians by search-based command for fixed state
      characters.
    - Diagnosis output of internal assignments for fixed state characters now
      prints the taxon name.
    - Amino-acid characters no longer use 3D alignment in the up-pass by default.
    - Better support for manipulating and calling character sets in data
    - Graphic output for mauve outlining alignment of blocks and rearrangements

Bugs Fixed:
    - Memory leaks in grappa interface.
    - Building random trees does not use a modified Wagner build.
    - Missing data is presented as a '?' in output from implied alignments.
    - Avoid rediagnosing trees before certain operations.
    - Issues in reading prealigned phylip files.
    - Better support for all features of NEXUS files.
        - Added POY block to nexus files for our specific needs, including
          likelihood, chromosome, genome, and dynamic character information.
    - Detecting file types is more accurate.
    - Parsing file types has better support.
    - Can transform Break Inversion and Custom alphabet with cost matrix file.
    - Replaced command dynamic_pam to deal with new datatypes.
        - now chromosome, genome, breakinv, etc.
    - Priority for backtrace in the alignments standardized between floating
      point alignment (dynamic likelihood), affine, and sequence characters.
    - Nexus output is fully produced when called in the report command
      --includes trees, set information, data, etc.
    - Support for scientific notation of floating point numbers in parsers.
    - Custom Alphabet and Break Inversion data characters are case sensitive.
    - Fixed and improved POY help documentation 
    - Initial cost for downpass in fixed state characters was incorrect (up-pass
      and final costs were correct).
    - Status messages during branch and bound build (after every 1% complete).
    - 3D option set to false is observed after re-diagnosis.
    - all element code (X) in amino acid is treated as a polymorphism
    - Improved max_time behavior in searching
    - Fixed cost issue in rearrangement for annotated characters

Features eliminated:
    - dynamic_pam command has been replaced by the commands chromosome, genome,
      breakinv, and custom_alphabet.

New Commands:
    - transform( likelihood:( ... ) )
    - transform( genome:( ... ) )
    - transform( chromosome:( ... ) )
    - transform( breakinv:( ... ) )
    - transform( custom_alphabet:( ... ) )
    - transform( parsimony )
    - transform( level:INT )
    - set( partition:( ... ))
    - set( codon_partition:( ... ))
    - swap/fuse/build( optimize:(model:(...),branches:(...)) )
    - report( trees:(branches) )
    - report( lkmodel )

Changed Commands:
    - transform( [IDS], (transformations,...) )
    - read( custom_alphabet:([datafile],[costmatrix],[prealigned]))





Ward Wheeler
Division of Invertebrate Zoology
American Museum of Natural History
Central Park West at 79th Street
New York, NY 10024-5192
USA




http://www.wiley.com/WileyCDA/WileyTitle/productCd-047067170X.html

b...@optusnet.com.au

Aug 16, 2013, 12:36:40 AM
to po...@googlegroups.com
Congratulations to the POY5 team!
I am looking forward to compiling this new version and trying it.
Best wishes
Buz Wilson



