RTG Core 3.7 / RTG Tools 3.7

RTG Announcements

Aug 25, 2016, 5:11:45 PM
to RTG Announcements
Real Time Genomics are pleased to announce the availability of new releases of our full analysis suite, RTG Core, and our utility package, RTG Tools.  This release includes new features and performance improvements. Some of the highlights of this release:

* Improvements to mapping speed when aligning targeted sequencing data. This feature makes use of a per-reference hash blacklist which is constructed once per reference genome and can yield significant speed improvement.  In addition, several changes were made to reduce peak memory use during mapping.

* Variant callers now allow the optional inclusion of expected germline allele balance terms in the Bayesian model.  On a genome-wide scale, this generally results in a reduction in false-positive calls, although sensitivity may be reduced for variants which do not follow allele balance expectations, such as mosaic de novo variants.

* Several improvements to the somatic caller. These include the ability to enable output of germline variants (due to the joint calling, accuracy of calling germline variants during somatic calling is typically higher than separately calling germline variants from the normal sample alone). The somatic caller now has the ability to explicitly model the expected somatic allelic fraction, for use in cases where the tumor heterogeneity is expected to be low. Additional options allow the output of records at sites exceeding user-specified thresholds for non-reference evidence. We have also included an AVR model specifically built for somatic calling which provides more accurate scoring than the regular germline AVR models.

* Several improvements to the variant comparison tools.  vcfeval now includes the ability to evaluate matches across confident-region boundaries according to GA4GH recommended practice.  vcfeval can be used to compare against "sample-free" VCFs such as ExAC/COSMIC/dbSNP, and its runtime has also been significantly improved.  In addition, the rocplot command can now produce precision-sensitivity graphs, and can output SVG as a more publication-ready format.

If you haven't used RTG Core before (or maybe even if you have), we suggest you run the demo-family.sh script that runs through a short end-to-end demonstration of sex-aware and pedigree-aware family variant calling, including de novo variant detection and variant evaluation with vcfeval. (It also makes a nice demo of our comprehensive simulation tools.)

Commercial users of RTG Core may download the update from our website at http://realtimegenomics.com/products/rtg-core-downloads. Non-commercial users can download the update from our website at http://realtimegenomics.com/products/rtg-core-non-commercial or build from the source on github at https://github.com/RealTimeGenomics/rtg-core.

Users of RTG Tools, which is made freely available for non-commercial or commercial use alike, can download the new version from our website at http://realtimegenomics.com/products/rtg-tools or build from the source code on github at https://github.com/RealTimeGenomics/rtg-tools.

Note: RTG now requires Java 8, so for those using the "nojre" RTG download or who are building from source, make sure you have Java 8 installed.

Detailed changes are listed below by area.  For more information on new features, see the RTG Operations Manual (which is now available in both PDF and HTML).

### Basic Formatting and Mapping

* format: Automatically installs reference genome configuration
  information when a recognized reference genome is being formatted to
  SDF. Also outputs a reminder in cases where the input looks like a
  reference genome but is not one of the recognized genomes.

* sdf2cg: New command to allow the export of Complete Genomics data
  that has been formatted as SDF to Complete Genomics TSV read format.

* map/cgmap: TLEN was not being correctly computed in the presence of
  soft clipping and back steps. This has now been corrected.

* map/cgmap: Several reductions in peak memory use during mapping.

* map: Significant speed improvement when mapping highly targeted
  sequencing data, using the mechanism of a repetitive hash blacklist.
  This is enabled via the new flag --reference-blacklist. A separate
  tool 'hashdist' is used for this one-off blacklist construction.

* hashdist: New command that can be used to analyse the uniqueness of
  k-mers contained within a reference sequence and to produce a
  reference hash blacklist.

* calibrate: New flags --exclude-bed and --exclude-vcf can be used to
  exclude sites of known genomic variation during the computation of
  calibration data. It is not currently possible to supply this
  information to the automatic calibration that is carried out during
  mapping; this will be added in a future release.
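The TLEN fix follows from how the SAM specification defines the observed template length: it runs from the leftmost mapped base to the rightmost mapped base of the pair, so soft-clipped bases must not be counted. The following is a minimal Python sketch of that definition for two reads mapped to the same reference, using hypothetical helper names (`reference_span`, `observed_tlen`). It does not handle the Complete Genomics back steps mentioned above and is not RTG's implementation.

```python
from __future__ import annotations

import re

# CIGAR operations as defined in the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases; S (soft clip) and H do not.
REF_CONSUMING = set("MDN=X")


def reference_span(pos: int, cigar: str) -> tuple[int, int]:
    """1-based inclusive [start, end] of the aligned, non-clipped bases."""
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + ref_len - 1


def observed_tlen(pos1: int, cigar1: str, pos2: int, cigar2: str) -> int:
    """Signed TLEN reported for read 1 of a pair on the same reference."""
    start1, end1 = reference_span(pos1, cigar1)
    start2, end2 = reference_span(pos2, cigar2)
    length = max(end1, end2) - min(start1, start2) + 1
    # Positive for the leftmost segment, negative for the rightmost one.
    return length if start1 <= start2 else -length


if __name__ == "__main__":
    print(observed_tlen(100, "5S95M", 300, "90M10S"))  # 290
    print(observed_tlen(300, "90M10S", 100, "5S95M"))  # -290
```

In this example, counting the clipped bases as if they were aligned would give 305 instead of 290.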
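The idea behind a reference hash blacklist can be illustrated by counting how often each k-mer occurs in a reference and flagging those above a cutoff. This is a minimal Python sketch assuming a toy in-memory sequence, a hypothetical `build_kmer_blacklist` helper, and arbitrary `k` and `max_count` values; it is not how `hashdist` is implemented and says nothing about its actual hashing scheme or thresholds.

```python
from __future__ import annotations

from collections import Counter


def build_kmer_blacklist(sequence: str, k: int, max_count: int) -> set[str]:
    """Return k-mers occurring more than max_count times in sequence.

    Hypothetical illustration: windows containing N are skipped, and only
    the forward strand is counted (a real tool would also need to consider
    reverse complements).
    """
    seq = sequence.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if "N" not in seq[i:i + k]
    )
    return {kmer for kmer, n in counts.items() if n > max_count}


if __name__ == "__main__":
    reference = "ACGTACGTACGTACGTTTGCAN"
    # The tandem repeat makes four k-mers over-represented.
    print(sorted(build_kmer_blacklist(reference, k=4, max_count=2)))
    # ['ACGT', 'CGTA', 'GTAC', 'TACG']
```

Because the counts depend only on the reference, a blacklist like this needs to be built once per reference genome and can then be reused for every mapping run against it.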

### Variant Calling

* snp/family/population/somatic: These callers expect calling to be
  carried out on alignments for which calibration information has been
  computed. They now require the explicit use of the --no-calibration
  flag in order to proceed without it.

* snp/family/population/somatic: These commands now output a warning
  if too many "excessive coverage" situations are encountered, as this
  usually signifies that the user has incorrectly calibrated their
  mappings or has failed to supply an appropriate coverage parameter
  to the caller. In addition, these commands output a warning if it
  appears that calibration has not been computed from correct regions
  for targeted data.

* snp/family/population/somatic: New flag --min-base-quality which
  allows explicit ignoring of base calls which do not meet the
  specified minimum phred quality score. These bases will be treated
  the same as an N and will not contribute to allele counts.  The
  default is to consider all bases.

* family/population/somatic: The semantics of --max-coverage has
  changed from being the total coverage across all samples, to being
  the average per-sample coverage.  This flag is typically only used
  when running without calibration, and this change makes the default
  behaviour more scalable with varying numbers of samples.

* snp: An explicitly specified --ploidy flag now overrides the ploidy
  obtained from reference genome configuration (if present).
  Previously the ploidy specified in the reference genome would take
  precedence.

* snp/family/population/somatic: Fixed an incorrect (and sometimes
  non-deterministic) computation of the PUR FORMAT annotation. This
  does not affect primary calling but could result in changes in AVR
  score.

* snp/family/population/somatic: Updated the Bayesian model to include
  a term for the expected allele balance. This is disabled by default,
  and can be enabled with the new flag --enable-allelic-fraction. This
  option gives improved precision for regular germline calling, but
  sensitivity to mosaic variants or those within CNV regions may be
  reduced.

* snp/somatic: The new flags --min-variant-allelic-depth and
  --min-variant-allelic-fraction can be used to enable output at sites
  where these thresholds are met, even if the caller would not
  otherwise make a call. Note that these flags do not act as a filter:
  they do not prevent the caller from outputting records at sites
  where the thresholds are not met.

* somatic: New flag --include-germline which instructs the somatic
  caller to also output variants which have been identified as
  germline variants.

* somatic: New flag --enable-somatic-allelic-fraction which instructs
  the Bayesian model to include a term for the expected somatic
  allelic fraction in the calling.  This flag is most appropriate when
  tumor heterogeneity is low.

* somatic: A new pre-built AVR model is provided for somatic calling
  which provides better scoring for somatic variants than the regular
  AVR models. This new model, "illumina-somatic.avr", is selected by
  default by the somatic caller.
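The --min-base-quality behaviour can be sketched as a pileup count in which low-quality bases are skipped exactly as an N would be. This minimal Python illustration assumes one pileup position given as a string of base calls plus Phred+33 ASCII quality characters; the `allele_counts` helper is hypothetical and not part of RTG.

```python
from collections import Counter


def allele_counts(bases: str, quals: str, min_base_quality: int = 0,
                  phred_offset: int = 33) -> Counter:
    """Count alleles at one pileup position, ignoring low-quality bases.

    Bases whose phred score is below min_base_quality are skipped, the same
    as an N, so they contribute to no allele. The default of 0 keeps every
    base call.
    """
    counts: Counter = Counter()
    for base, qual_char in zip(bases.upper(), quals):
        phred = ord(qual_char) - phred_offset
        if base == "N" or phred < min_base_quality:
            continue
        counts[base] += 1
    return counts


if __name__ == "__main__":
    # 'I' encodes phred 40 and '#' encodes phred 2 in Phred+33.
    print(allele_counts("AAAGG", "II#I#"))                       # all 5 bases
    print(allele_counts("AAAGG", "II#I#", min_base_quality=20))  # 3 bases
```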
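The change to --max-coverage is easiest to see with a little arithmetic. In this minimal Python sketch (hypothetical helper names; the strict ">" comparison is an assumption), a trio with depths 35, 40 and 30 exceeds a threshold of 50 under the old total-coverage reading (105 > 50) but not under the new per-sample average (35).

```python
from __future__ import annotations


def exceeds_total_coverage(per_sample_depth: list[int], max_coverage: float) -> bool:
    """Previous behaviour (sketch): threshold applies to the summed depth."""
    return sum(per_sample_depth) > max_coverage


def exceeds_average_coverage(per_sample_depth: list[int], max_coverage: float) -> bool:
    """New behaviour (sketch): threshold applies to the mean per-sample depth."""
    return sum(per_sample_depth) / len(per_sample_depth) > max_coverage


if __name__ == "__main__":
    trio = [35, 40, 30]
    print(exceeds_total_coverage(trio, 50))    # True: 105 > 50
    print(exceeds_average_coverage(trio, 50))  # False: 35 <= 50
```

Under the per-sample reading, the same threshold stays meaningful whether the caller is given three samples or thirty.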
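One common way to express an expected germline allele balance is a binomial likelihood of the observed ALT read count under each genotype's expected ALT fraction. The Python sketch below illustrates that idea only; the helper names and the 0.01 error rate are assumptions, and RTG's actual Bayesian model (see the Operations Manual) is more involved.

```python
import math


def log_binomial(k: int, n: int, p: float) -> float:
    """Log probability of k successes in n trials with success rate p."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)


def allele_balance_log_likelihoods(alt_reads: int, depth: int,
                                   error_rate: float = 0.01) -> dict[str, float]:
    """Log-likelihood of the ALT read count under each diploid genotype.

    Expected ALT fractions: about error_rate for 0/0, 0.5 for 0/1 and
    about 1 - error_rate for 1/1 (illustrative values only).
    """
    expected = {"0/0": error_rate, "0/1": 0.5, "1/1": 1 - error_rate}
    return {gt: log_binomial(alt_reads, depth, p) for gt, p in expected.items()}


if __name__ == "__main__":
    for alt in (0, 20, 40):
        ll = allele_balance_log_likelihoods(alt, 40)
        print(alt, max(ll, key=ll.get))
```

A variant whose ALT fraction sits well below 0.5, such as a mosaic variant, fits none of these expectations well, which is why enabling such a term can reduce sensitivity for those cases.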

### Variant Processing and Analysis

* vcfsubset/vcffilter: New flag --no-header which omits the output of
  the VCF header.

* vcffilter: New option --keep-expr to allow filtering records based
  on simple JavaScript expressions with natural VCF field access. For
  example 'NA12878.DP > NA12892.DP' to select records from a trio
  call-set where the depth of NA12878 is greater than that of her
  mother. See the user manual for more information and examples.

* vcffilter: New option --javascript to allow advanced filtering and
  other processing of the VCF file using powerful JavaScript
  filters. These scripts can contain initial setup, per-record
  actions, and end functions. See the user manual for more information
  and examples.

* vcfeval: Specifying ALT as either the baseline or the call sample
  name instructs vcfeval to match against all non-reference diploid
  (or haploid, if using --squash-ploidy) genotypes that can be formed
  from the declared ALTs. This permits matching against a VCF that
  contains no sample column, for example to find hits against a
  sample-free VCF such as ExAC or COSMIC.

* vcfeval: New flag --evaluation-regions, which adds support for
  matching across high-confidence/false-positive regions such as those
  supplied with GIAB or Illumina Platinum Genomes truth sets according
  to GA4GH recommendations. In summary, only matches against baseline
  variants within these regions count as true positives and only
  non-matched call variants made within these regions count as false
  positives.

* vcfeval: Now outputs additional true positive statistics for the
  unweighted calls, so you can see the simple count of true positives
  in call representation.  When computing precision, this uses the
  unweighted call count in the denominator, to reduce representation
  bias in the precision.

* vcfeval: Significant speed increase (often 2x speed up for typical
  WGS comparisons).

* vcfeval: New output mode 'roc-only' which skips the output of VCF
  files and only produces the ROC data files and summary metrics. This
  reduces run-time and the size of the output directories when doing
  many runs.

* vcfeval: Command line score field specification permits INFO.<name>
  form, for consistency with JavaScript expression notation, although
  the old form of INFO=<name> is still supported.

* rocplot: Added the ability to plot precision-sensitivity graphs via
  the new flag --precision-sensitivity.  In the interactive GUI the
  graph type can also be changed on the fly via a dropdown chooser.

* rocplot: Added the ability to output images in SVG format, both in
  non-interactive mode via the new flag --svg, and when saving images
  from the interactive GUI.

* rocplot: Improved the default labelling of curves by including the
  score field if available.

* rocplot: The curve palette size has been increased in order to allow
  easier differentiation when more than 8 curves are being displayed
  at once.

* rocplot: (GUI) Fixed an annoying bug that could occur when trying to
  edit the title of the plot or of the curves. Several other minor GUI
  improvements have been made, such as the ability to use the
  mouse-wheel to scroll large lists of curves.
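The set of genotypes considered when ALT is given as a sample name can be enumerated directly from the number of declared ALT alleles. This minimal Python sketch, using a hypothetical `candidate_genotypes` helper, lists every non-reference diploid genotype, or every single ALT when ploidy is squashed to haploid.

```python
from __future__ import annotations

from itertools import combinations_with_replacement


def candidate_genotypes(num_alts: int, squash_ploidy: bool = False) -> list[tuple[int, ...]]:
    """All non-reference genotypes formable from num_alts declared ALTs."""
    if squash_ploidy:
        # Haploid: each declared ALT allele on its own.
        return [(allele,) for allele in range(1, num_alts + 1)]
    alleles = range(num_alts + 1)  # 0 is REF; 1..num_alts are the ALTs
    # Unordered diploid pairs, excluding the homozygous-reference 0/0.
    return [gt for gt in combinations_with_replacement(alleles, 2) if gt != (0, 0)]


if __name__ == "__main__":
    print(candidate_genotypes(2))
    # [(0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
    print(candidate_genotypes(2, squash_ploidy=True))
    # [(1,), (2,)]
```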
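The --evaluation-regions counting rule can be sketched once matching has already been done, which is the point of the feature: calls just outside a region may still match baseline variants inside it. This minimal Python illustration assumes hypothetical pre-classified variant lists and BED-style (0-based, half-open) regions. Counting unmatched baseline variants as false negatives only inside the regions is an assumption consistent with GA4GH practice; the text above states the rule only for true and false positives.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int  # 1-based position, as in VCF


def in_regions(v: Variant, regions: list[tuple[str, int, int]]) -> bool:
    """True if v lies in any BED interval (0-based start, exclusive end)."""
    return any(v.chrom == chrom and start < v.pos <= end for chrom, start, end in regions)


def tally(baseline_matched: list[Variant], baseline_unmatched: list[Variant],
          calls_unmatched: list[Variant],
          regions: list[tuple[str, int, int]]) -> dict[str, int]:
    """Count TP/FN/FP, only considering variants inside the regions."""
    return {
        # Only matches against baseline variants inside the regions are TPs.
        "TP": sum(in_regions(v, regions) for v in baseline_matched),
        # Assumption: unmatched baseline variants outside are ignored too.
        "FN": sum(in_regions(v, regions) for v in baseline_unmatched),
        # Only unmatched calls made inside the regions are FPs.
        "FP": sum(in_regions(v, regions) for v in calls_unmatched),
    }
```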
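vcfeval reports true positives separately in baseline and call representation, because the same variant may be written as a different number of records in the two VCFs. The summary metrics can then be sketched as follows: sensitivity uses the baseline-side count, while precision uses the call-side count so that it is not biased by the baseline's representation. The `summary_metrics` helper is hypothetical.

```python
def summary_metrics(tp_baseline: int, tp_call: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, sensitivity, F-measure).

    Sensitivity is computed in baseline representation and precision in
    call representation; F-measure is their harmonic mean.
    """
    sensitivity = tp_baseline / (tp_baseline + fn) if tp_baseline + fn else 0.0
    precision = tp_call / (tp_call + fp) if tp_call + fp else 0.0
    if precision + sensitivity == 0:
        return precision, sensitivity, 0.0
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, f_measure


if __name__ == "__main__":
    # 90 baseline variants found, 10 missed; 85 matching call records, 15 unmatched.
    print(summary_metrics(tp_baseline=90, tp_call=85, fp=15, fn=10))
```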

### Other

* aview: Now defaults to showing base colors in the terminal. Use
  --no-base-colors to disable this.

* aview: Better error handling for invalid SAM records.

* aview: New flag --print-soft-clipped-bases to display soft-clipped
  bases.

* chrstats: New flag --output-pedigree that can be used to create a
  default pedigree file based on the mappings of multiple samples,
  using inferred sample sex where possible.

* many: In several cases where a flag could be specified multiple
  times, it is now possible to supply a comma-separated list of
  values. These are indicated in the output of --help.

* many: Most utility commands which write VCF files now do so
  asynchronously, often resulting in significant speed improvements.

* all: The distribution now includes an HTML version of the operations
  manual in addition to the PDF version.

* all: The minimum Java requirement for RTG is now Java 8.
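Accepting both repeated flags and comma-separated lists is a common command-line pattern. RTG itself is written in Java, so this is only a Python sketch of the behaviour using `argparse`; the `--sample` flag and the `CommaSeparatedAppend` action are hypothetical names for illustration.

```python
import argparse


class CommaSeparatedAppend(argparse.Action):
    """Accumulate values from repeated flags and comma-separated lists."""

    def __call__(self, parser, namespace, values, option_string=None):
        # Copy so that the shared default list is never mutated.
        items = list(getattr(namespace, self.dest, None) or [])
        items.extend(v for v in values.split(",") if v)
        setattr(namespace, self.dest, items)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--sample", action=CommaSeparatedAppend, default=[])
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--sample", "a,b", "--sample", "c"])
    print(args.sample)  # ['a', 'b', 'c']
```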
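Asynchronous output generally means handing finished records to a background writer so that record processing and I/O can overlap. This is a minimal Python sketch of that producer/consumer structure, assuming a hypothetical `AsyncLineWriter` class; it is not RTG's (Java) implementation. A single consumer reading a FIFO queue keeps records in their original order.

```python
import io
import queue
import threading


class AsyncLineWriter:
    """Write lines from a background thread, preserving their order."""

    _DONE = object()  # sentinel telling the writer thread to stop

    def __init__(self, out, maxsize: int = 1024):
        self._out = out
        # A bounded queue stops a fast producer from using unbounded memory.
        self._queue = queue.Queue(maxsize=maxsize)
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is self._DONE:
                break
            self._out.write(item)

    def write(self, line: str):
        self._queue.put(line)

    def close(self):
        # Flush everything queued so far, then wait for the thread to exit.
        self._queue.put(self._DONE)
        self._thread.join()


if __name__ == "__main__":
    buf = io.StringIO()
    writer = AsyncLineWriter(buf)
    for i in range(3):
        writer.write(f"record{i}\n")
    writer.close()
    print(buf.getvalue(), end="")
```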
