RTG Core 3.9 / RTG Tools 3.9

RTG Announcements

Mar 21, 2018, 10:45:19 PM
to RTG Announcements
Real Time Genomics are pleased to announce the availability of new releases of our full analysis suite, RTG Core, and our utility package, RTG Tools.  This release includes new features and performance improvements. Some of the highlights of this release:

* Representation improvements to variant caller outputs. The various variant calling commands default to an alternative algorithm for representing haplotype calls as smaller components. While the underlying haplotype calls are the same as previous releases, the decomposed representation used in VCF is now more granular, so results can look quite different on the surface.

* Small-variant evaluation improvements. vcfeval allows optional preprocessing of input VCF files to decompose large calls to smaller constituents, which can permit longer calls to receive partial credit during accounting.  In addition, vcfeval snp and indel ROC outputs now include precision and sensitivity metrics. For more information see the user manual. vcfeval also supports matching variants that occur inside a spanning deletion.

* Support for structural variant evaluation. This release includes beta commands for comparing structural variant calls such as translocations, inversions, and sequence-resolved larger insertions and deletions, using a similar workflow to vcfeval/rocplot. The new svdecompose command converts higher-level SV events and longer insertions and deletions into break-ends. In conjunction with this, the new bndeval command runs a comparison between a baseline and called break-end dataset. Outputs are VCF and ROC data files that are compatible with rocplot.

* Java 9 compatibility testing. RTG is compatible with Java 9, although currently we recommend Java 8 for performance reasons. Also note that due to differences in Java Math library implementation between Java 8 and Java 9, in rare situations minor output differences may be observed when comparing results obtained using Java 8 with Java 9.  Builds that include a bundled JRE have been updated to the latest JRE 8u161.

* Improvements to the AVR models that perform variant scoring. The variant callers include new predictor attributes and all AVR models have been rebuilt to take advantage of these.
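
For readers who want to sanity-check the precision and sensitivity columns mentioned in the vcfeval highlight above, the following is a minimal sketch of the standard definitions. The function name and the example counts are hypothetical, and keeping separate true-positive counts for the baseline and call sides is an assumption made for illustration; the user manual remains the reference for how vcfeval computes these metrics.

```python
def precision_sensitivity(tp_baseline, tp_call, fp, fn):
    """Return (precision, sensitivity, f_measure) from match counts.

    Hypothetical helper using the textbook definitions; it is not taken
    from the RTG source code.
    """
    called = tp_call + fp        # everything the caller reported
    truth = tp_baseline + fn     # everything present in the baseline
    precision = tp_call / called if called else 0.0
    sensitivity = tp_baseline / truth if truth else 0.0
    denom = precision + sensitivity
    # F-measure is the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f_measure


# Made-up counts, for illustration only.
p, s, f = precision_sensitivity(tp_baseline=90, tp_call=88, fp=12, fn=10)
print(f"precision={p:.3f} sensitivity={s:.3f} f-measure={f:.3f}")
```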

Commercial users of RTG Core may download the update from our website at http://realtimegenomics.com/products/rtg-core-downloads. Non-commercial users can download the update from our website at http://realtimegenomics.com/products/rtg-core-non-commercial or build from the source on github at https://github.com/RealTimeGenomics/rtg-core.

Users of RTG Tools, which is made freely available for non-commercial or commercial use alike, can download the new version from our website at http://realtimegenomics.com/products/rtg-tools or build from the source code on github at https://github.com/RealTimeGenomics/rtg-tools.


Detailed changes are listed below by area.  For more information on new features, see the RTG Operations Manual which is included within the distribution as HTML and PDF.

### Basic Formatting and Mapping

* format: In addition to minimum and maximum length of input and output
  sequences, now outputs the mean length of the sequences.

* petrim: This command is now available in RTG Tools.

* petrim: New flag --mismatch-adjustment allows updating bases within
  reads when non-matching bases are encountered in the overlap.

* petrim: Output summary and length distribution information.

* sammerge: New flag --no-header to omit the header from the output.

* map/cgmap: Output SAM/BAM records include an XC:A:A attribute for
  those reads unmapped due to no index hits. (The mapping summary.txt
  output has also been altered slightly to account for this.)

* map/cgmap: The HTML output reports include read summary status counts.

* map: Direct mapping of fastq data containing 0 length sequences could
  result in an exception or incorrect quality data being associated with
  a sequence in the output BAM. This has been fixed.

* map: Prevent exception when using a SAM/BAM read group without a
  sample tag specified. We now mandate a sample field be present.
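
The length statistics that format now reports (minimum, maximum, and mean) are easy to reproduce when checking a small input by hand. The sketch below is a hypothetical helper, not RTG code, and it assumes that zero-length sequences count toward the mean like any other record.

```python
def length_summary(sequences):
    """Return (min, max, mean) of sequence lengths, or None if empty.

    Hypothetical helper for illustration; zero-length sequences are
    assumed to be included in all three statistics.
    """
    lengths = [len(s) for s in sequences]
    if not lengths:
        return None
    return min(lengths), max(lengths), sum(lengths) / len(lengths)


# Four made-up reads, one of them empty.
reads = ["ACGT", "ACGTACGT", "", "AC"]
print(length_summary(reads))
```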

### Variant Calling

* snp: Prevent exception when using a SAM/BAM read group without a
  sample tag specified. We now mandate a sample field be present.

* family/population: Fix an arithmetic overflow during calculation of
  priors in Hardy-Weinberg.

* variant callers: The default representation used for the output of
  complex haplotype calls now breaks these calls into smaller components
  than previously. This behaviour is selectable via an advanced flag:
  --Xtrim-split={none,standard,trim,align}.

* variant callers: The default AVR model is now illumina-wgs.avr rather
  than illumina-exome.avr.  When processing exome data, we would
  recommend only using the illumina-exome model if you are specifically
  interested in ranking variant calls outside of target regions.

* somatic: The VAF annotation is produced by default (previously this
  annotation was only produced when using the --min-allelic-fraction /
  --min-allelic-count flags).

* avrbuild: Multi-thread the loading of training VCF files.

* discord: Various improvements, primarily improving compatibility with
  third party BAM files and to better handle sequencing with smaller
  average fragment lengths.

* cnvponbuild: A region label column is not required (one can be
  specified with the new flag --label-column-name).

* cnvponbuild: The name of the input column supplying coverage levels
  can be overridden with the new flag --coverage-column-name.

* segment: New flag --min-panel-coverage allows specifying a minimum
  normalized coverage threshold applied to the input panel of normals
  file.
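
As background for the Hardy-Weinberg priors mentioned in the family/population item above, here is a minimal sketch of the textbook expected genotype frequencies for a diploid site with any number of alleles. It is not RTG's implementation and does not reproduce the overflow that was fixed; the function name and the example allele frequencies are made up for illustration.

```python
from itertools import combinations_with_replacement


def hardy_weinberg_priors(allele_freqs):
    """Map each unordered diploid genotype (i, j) to its expected frequency.

    Textbook Hardy-Weinberg: p_i^2 for homozygotes and 2 * p_i * p_j for
    heterozygotes. Hypothetical helper, not taken from RTG.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    priors = {}
    indices = range(len(allele_freqs))
    for i, j in combinations_with_replacement(indices, 2):
        p, q = allele_freqs[i], allele_freqs[j]
        priors[(i, j)] = p * p if i == j else 2 * p * q
    return priors


# Made-up frequencies for a REF allele and two ALT alleles.
for genotype, prior in hardy_weinberg_priors([0.7, 0.2, 0.1]).items():
    print(genotype, round(prior, 4))
```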

### Variant Processing and Analysis

* vcffilter: New flags --min-alleles/--max-alleles to filter by number
  of alleles. For example, --min-alleles=2 --max-alleles=2 for biallelic
  sites only.

* vcffilter: New flag --fail-samples to allow setting the FT FORMAT
  field of samples that fail the filtering criteria.

* vcffilter: Fix JavaScript filtering so that setting an INFO field to
  the value '1' is no longer interpreted as setting a flag type INFO
  field.

* vcffilter/vcfannotate: New flag --add-header to supply extra header
  lines, either as literal lines or read from file.

* vcfannotate: New flag --annotation to allow adding several computed
  annotations to the VCF records. See the user manual for the list of
  available annotations.

* vcfsubset: Rather than aborting when trying to process VCFs that do
  not contain header declarations for fields to be manipulated, just
  warn and continue.

* vcfstats: Improvement in counting of partial calls, and do not issue a
  warning when polyploid calls are encountered. There has been a slight
  change in output format regarding partial calls, so check any scripts
  that may be parsing vcfstats output.

* vcfmerge: The --preserve-format flag also applies when two input records
  contain calls for the same sample at the same reference position and
  span.

* vcfmerge: The existing flag --add-header now allows lines read from
  file.

* vcfmerge: New flag --input-list-file to allow supplying the VCFs to
  merge via a text file.

* vcfeval: New flag --decompose to allow decomposing VCF files prior to
  evaluation. This permits some degree of partial credit allocation for
  callers that produce longer complex calls rather than breaking calls
  into small constituents. Warning: When this flag is used, output VCF
  files will contain decomposed allele representations, but with
  annotations from the original records, so any annotations that depend
  on un-decomposed variant representations (e.g. allelic depths, GL)
  may no longer be meaningful. Records that have been decomposed
  contain ORP and ORL locations indicating the position and length of
  the original variants to allow backtracking through the decomposition.

* vcfeval: The ROC data files corresponding to variant type subsets
  (e.g. SNP-specific and indel-specific) now include the additional
  metrics, such as sensitivity and precision, that were previously
  present only in the full ROC data file. See the user manual for more
  information about how these metrics are computed for these subsets.

* vcfeval: Improvements to --ref-overlap: when ref bases can be removed
  from either side of a variant, the side that minimizes overlaps with
  other variants is now chosen.

* vcfeval: Algorithm adjustment to permit more frequent syncing, helping
  to reduce instances where variants are too complex to evaluate.

* vcfeval: Support for the '*' ALT allele that indicates a spanning
  deletion.

* rocplot: Produce a more informative error message when trying to open
  the GUI when running in a headless environment.

* rocplot: (GUI) Remember zoom levels independently for ROC and
  Precision/Recall graphs for better behaviour when swapping back and
  forth.

* rocplot: (GUI) A secondary crosshair is available by shift-click
  placement which allows displaying the difference in metrics between
  the two points.

* rocplot: (GUI) Permit curve interpolation (this can be important for
  precision recall curves with sparse data, since linear interpolation
  in precision/recall space can be misleading).

* vcfdecompose: New command to decompose complex variants into smaller
  components.

* svdecompose: New command to break structural variant DUP/INV/DEL
  events and longer sequence-resolved insertions and deletions into
  constituent break ends for evaluation with bndeval.

* bndeval: New command to compare breakend call sets.  This command
  provides a similar workflow to vcfeval in terms of output files and
  use of rocplot for benchmarking call sets.
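
The rocplot item above notes that linear interpolation in precision/recall space can be misleading. The sketch below illustrates why, using the commonly described count-based approach: interpolate the true-positive and false-positive counts between two operating points and recompute precision at each step. Whether rocplot uses exactly this scheme is not stated in these notes, so treat the function names and the example counts as assumptions for illustration only.

```python
def interpolate_pr(tp_a, fp_a, tp_b, fp_b, total_positives):
    """Return (recall, precision) points strictly between A and B.

    One point is produced per additional true positive, with false
    positives assumed to grow linearly in between.
    """
    if tp_b <= tp_a:
        raise ValueError("point B must have more true positives than A")
    fp_per_tp = (fp_b - fp_a) / (tp_b - tp_a)
    points = []
    for x in range(1, tp_b - tp_a):
        tp = tp_a + x
        fp = fp_a + fp_per_tp * x
        points.append((tp / total_positives, tp / (tp + fp)))
    return points


def linear_precision(recall, a, b):
    """Precision on the straight line joining PR points a and b."""
    (r_a, p_a), (r_b, p_b) = a, b
    return p_a + (p_b - p_a) * (recall - r_a) / (r_b - r_a)


# Made-up example: 20 baseline variants; A = (5 TP, 5 FP), B = (10 TP, 30 FP).
a, b = (5 / 20, 5 / 10), (10 / 20, 10 / 40)
for recall, precision in interpolate_pr(5, 5, 10, 30, 20):
    naive = linear_precision(recall, a, b)
    print(f"recall={recall:.2f} count-based={precision:.3f} linear={naive:.3f}")
```

In this example the straight line between the two points sits above the count-based curve at every intermediate recall, so a sparse curve drawn with straight segments would overstate precision.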

### Other

* pedfilter: New filtering options to select portions of an input
  pedigree: --keep-family allows retaining particular families;
  --keep-ids allows selecting particular individuals from the larger
  pedigree.

* aview: New flags --sort-sample and --print-sample.

* many: The --no-index flag has been removed. This option was of little
  use since index files are almost always generated on the fly rather
  than as a separate pass. The behaviour is currently still available
  in this release via --Xno-index, but will be removed in the future.

* many: The use of --Xforce to write into an existing directory will
  now remove any pre-existing log file / done file / progress file.

* many: Colorized command line help. Whether this is enabled is
  automatically determined, but can be disabled using RTG_JAVA_OPTS
  (either per-command or in rtg.cfg) using
  -Drtg.default-markup=none. See the user manual for more information.

* many: Single region restrictions can now be specified using the syntax
  <chr>:<pos>~<size> to denote the range surrounding <pos> by <size> on
  each side.

* many: Miscellaneous bugfixes and improvements to error handling.

* misc: version and crash talkbacks attempt to indicate to the user if a
  new version is available.

* misc: Update to htsjdk 2.14.3.

* misc: Update rtg launcher script to accept Java 9. However, for
  performance reasons we recommend using Java 8 for computationally
  intensive analysis such as mapping and variant calling.

* misc: Update bundled JRE to 1.8.0_161.
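
To make the new <chr>:<pos>~<size> region syntax concrete, here is a minimal sketch of how such a specification expands into a start/end range. The parser is hypothetical and not RTG code; it assumes 1-based inclusive coordinates and clamps the start at position 1, neither of which is stated in these notes.

```python
import re

# <chr>:<pos>~<size>; the sequence name may itself contain ':' characters.
_REGION = re.compile(r"^(?P<chrom>.+):(?P<pos>\d+)~(?P<size>\d+)$")


def parse_padded_region(spec):
    """Expand '<chr>:<pos>~<size>' into (chrom, start, end).

    Assumptions (not from the release notes): coordinates are 1-based
    and inclusive, and the start is clamped so it never drops below 1.
    """
    match = _REGION.match(spec)
    if match is None:
        raise ValueError(f"not a <chr>:<pos>~<size> region: {spec!r}")
    pos = int(match.group("pos"))
    size = int(match.group("size"))
    return match.group("chrom"), max(1, pos - size), pos + size


print(parse_padded_region("chr1:1000~50"))
print(parse_padded_region("chrX:10~100"))
```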
