Real Time Genomics are pleased to announce the availability of new releases of our full analysis suite, RTG Core, and our utility package, RTG Tools. This release includes new features and performance improvements. Some of the highlights of this release:
* Representation improvements to variant caller outputs. The various variant calling commands default to an alternative algorithm for representing haplotype calls as smaller components. While the underlying haplotype calls are the same as previous releases, the decomposed representation used in VCF is now more granular, so results can look quite different on the surface.
* Small-variant evaluation improvements. vcfeval allows optional preprocessing of input VCF files to decompose large calls into smaller constituents, which can permit longer calls to receive partial credit during accounting. In addition, the vcfeval SNP and indel ROC outputs now include precision and sensitivity metrics. For more information see the user manual. vcfeval also supports matching variants that occur inside a spanning deletion.
* Support for structural variant evaluation. This release includes beta commands for comparing structural variant calls such as translocations, inversions, and sequence-resolved larger insertions and deletions, using a similar workflow to vcfeval/rocplot. The new svdecompose command converts higher-level SV events and longer insertions and deletions into break-ends. In conjunction with this, the new bndeval command runs a comparison between a baseline and called break-end dataset. Outputs are VCF and ROC data files that are compatible with rocplot.
* Java 9 compatibility testing. RTG is compatible with Java 9, although currently we recommend Java 8 for performance reasons. Also note that due to differences in Java Math library implementation between Java 8 and Java 9, in rare situations minor output differences may be observed when comparing results obtained using Java 8 with Java 9. Builds that include a bundled JRE have been updated to the latest JRE 8u161.
* Improvements to the AVR models that perform variant scoring. The variant callers include new predictor attributes and all AVR models have been rebuilt to take advantage of these.
Commercial users of RTG Core may download the update from our website at
http://realtimegenomics.com/products/rtg-core-downloads. Non-commercial users can download the update from our website at
http://realtimegenomics.com/products/rtg-core-non-commercial or build from the source on github at
https://github.com/RealTimeGenomics/rtg-core.
Users of RTG Tools, which is made freely available for non-commercial or commercial use alike, can download the new version from our website at
http://realtimegenomics.com/products/rtg-tools or build from the source code on github at
https://github.com/RealTimeGenomics/rtg-tools.
Detailed changes are listed below by area. For more information on new features, see the RTG Operations Manual, which is included within the distribution as HTML and PDF.
### Basic Formatting and Mapping
* format: In addition to minimum and maximum length of input and output
sequences, now outputs the mean length of the sequences.
* petrim: This command is now available in RTG Tools.
* petrim: New flag --mismatch-adjustment allows updating bases within
reads when non-matching bases are encountered in the overlap.
* petrim: Output summary and length distribution information.
* sammerge: New flag --no-header omits the SAM header from the output.
* map/cgmap: Output SAM/BAM records include an XC:A:A attribute for
those reads unmapped due to no index hits. (The mapping summary.txt
output has also been altered slightly to account for this.)
* map/cgmap: The HTML output reports include read summary status counts.
* map: Direct mapping of fastq data containing zero-length sequences
could result in an exception or incorrect quality data being associated
with a sequence in the output BAM. This has been fixed.
* map: Prevent exception when using a SAM/BAM read group without a
sample tag specified. We now mandate a sample field be present.
### Variant Calling
* snp: Prevent exception when using a SAM/BAM read group without a
sample tag specified. We now mandate a sample field be present.
* family/population: Fix an arithmetic overflow during calculation of
priors in Hardy-Weinberg.
* variant callers: The default representation used for the output of
complex haplotype calls now breaks these calls into smaller components
than previously. This behaviour is selectable via an advanced flag:
--Xtrim-split={none,standard,trim,align}.
* variant callers: The default AVR model is now illumina-wgs.avr rather
than illumina-exome.avr. When processing exome data, we would
recommend only using the illumina-exome model if you are specifically
interested in ranking variant calls outside of target regions.
* somatic: The VAF annotation is now produced by default (previously this
annotation was only produced when using the --min-allelic-fraction /
--min-allelic-count flags).
* avrbuild: Multi-thread the loading of training VCF files.
* discord: Various improvements, primarily to improve compatibility with
third-party BAM files and to better handle sequencing with smaller
average fragment lengths.
* cnvponbuild: A region label column is not required (one can be
specified with the new flag --label-column-name).
* cnvponbuild: The name of the input column supplying coverage levels
can be overridden with the new flag --coverage-column-name.
* segment: New flag --min-panel-coverage allows specifying a minimum
normalized coverage threshold applied to the input panel of normals
file.
### Variant Processing and Analysis
* vcffilter: New flags --min-alleles/--max-alleles to filter by number
of alleles. For example, --min-alleles=2 --max-alleles=2 for biallelic
sites only.
* vcffilter: New flag --fail-samples to allow setting the FT FORMAT
field of samples that fail the filtering criteria.
* vcffilter: Fix JavaScript interpreting the setting of an INFO field to
the value '1' as setting a flag type INFO field.
* vcffilter/vcfannotate: New flag --add-header to supply extra header
lines, either as literal lines or read from file.
* vcfannotate: New flag --annotation to allow adding several computed
annotations to the VCF records. See the user manual for the list of
available annotations.
* vcfsubset: Rather than aborting when trying to process VCFs that do
not contain header declarations for fields to be manipulated, just
warn and continue.
* vcfstats: Improvement in counting of partial calls, and do not issue a
warning when polyploid calls are encountered. There has been a slight
change in output format regarding partial calls, so check any scripts
that may be parsing vcfstats output.
* vcfmerge: The --preserve-format flag also applies when two input
records contain calls for the same sample at the same reference position
and span.
* vcfmerge: The existing flag --add-header now allows lines read from
file.
* vcfmerge: New flag --input-list-file to allow supplying the VCFs to
merge via a text file.
* vcfeval: New flag --decompose to allow decomposing VCF files prior to
evaluation. This permits some degree of partial credit allocation for
callers that produce longer complex calls rather than breaking calls
into small constituents. Warning: When this flag is used, output VCF
files will contain decomposed allele representations, but with
annotations from the original records, so any annotations that depend
on un-decomposed variant representations (e.g. allelic depths or GL)
may no longer be meaningful. Records that have been decomposed
contain ORP and ORL annotations indicating the position and length of
the original variants to allow backtracking through the decomposition.
* vcfeval: The ROC data files corresponding to variant type subsets
(e.g. SNP-specific and indel-specific) now include the additional
metrics, such as sensitivity and precision, that were previously present only in the
full ROC data file. See the user manual for more information about how
these metrics are computed for these subsets.
* vcfeval: Improve --ref-overlap handling in cases where reference bases
can be removed from either side of a variant, by choosing the side that
minimizes overlaps with other variants.
* vcfeval: Algorithm adjustment to permit more frequent syncing, helping
to reduce instances where variants are too complex to evaluate.
* vcfeval: Support for the '*' ALT allele that indicates a spanning
deletion.
* rocplot: Produce a more informative error message when trying to open
the GUI when running in a headless environment.
* rocplot: (GUI) Remember zoom levels independently for ROC and
Precision/Recall graphs for better behaviour when swapping back and
forth.
* rocplot: (GUI) A secondary crosshair is available by shift-click
placement which allows displaying the difference in metrics between
the two points.
* rocplot: (GUI) Permit curve interpolation (this can be important for
precision recall curves with sparse data, since linear interpolation
in precision/recall space can be misleading).
* vcfdecompose: New command to decompose complex variants into smaller
components.
* svdecompose: New command to break structural variant DUP/INV/DEL
events and longer sequence-resolved insertions and deletions into
constituent break ends for evaluation with bndeval.
* bndeval: New command to compare breakend call sets. This command
provides a similar workflow to vcfeval in terms of output files and
use of rocplot for benchmarking call sets.
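To make the vcffilter --min-alleles/--max-alleles example above concrete, here is a minimal Python sketch of what filtering to biallelic sites means for a single VCF record. It is not RTG's implementation. The function names `allele_count` and `keep_record` are hypothetical, and the sketch assumes that the allele count includes the REF allele (so REF plus one ALT counts as 2), which is consistent with --min-alleles=2 --max-alleles=2 selecting biallelic sites.

```python
def allele_count(vcf_line: str) -> int:
    """Count alleles in one VCF data line, including REF (assumed convention)."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt = fields[4]  # ALT is the fifth column of a VCF data line
    # A '.' ALT means no alternate allele is declared for this record.
    alts = [] if alt == "." else alt.split(",")
    return 1 + len(alts)  # 1 for REF, plus each ALT allele


def keep_record(vcf_line: str, min_alleles: int = 2, max_alleles: int = 2) -> bool:
    """Return True if the record's allele count lies within the inclusive bounds."""
    return min_alleles <= allele_count(vcf_line) <= max_alleles


if __name__ == "__main__":
    records = [
        "chr1\t100\t.\tA\tG\t.\tPASS\t.",      # REF + 1 ALT
        "chr1\t200\t.\tA\tG,T\t.\tPASS\t.",    # REF + 2 ALTs
        "chr1\t300\t.\tA\t.\t.\tPASS\t.",      # REF only
    ]
    for rec in records:
        print(rec.split("\t")[1], keep_record(rec))
```

With the default bounds of 2 and 2, only the first record is kept. Widening `max_alleles` would also admit multiallelic sites.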
### Other
* pedfilter: New filtering options to select portions of an input
pedigree: --keep-family allows retaining particular families;
--keep-ids allows selecting particular individuals from the larger
pedigree.
* aview: New flags --sort-sample and --print-sample.
* many: The --no-index flag has been removed. This option was of little
use since index files are almost always generated on the fly rather
than as a separate pass. The behaviour is currently still available
in this release via --Xno-index, but will be removed in the future.
* many: The use of --Xforce to write into an existing directory will
now remove any pre-existing log file / done file / progress file.
* many: Colorized command line help. Whether this is enabled is
automatically determined, but can be disabled using RTG_JAVA_OPTS
(either per-command or in rtg.cfg) using
-Drtg.default-markup=none. See the user manual for more information.
* many: Single region restrictions can now be specified using the syntax
<chr>:<pos>~<size> to denote the range surrounding <pos> by <size> on
each side.
* many: Miscellaneous bugfixes and improvements to error handling.
* misc: version and crash talkbacks attempt to indicate to the user if a
new version is available.
* misc: Update to htsjdk 2.14.3.
* misc: Update rtg launcher script to accept Java 9. However, for
performance reasons we recommend using Java 8 for computationally
intensive analysis such as mapping and variant calling.
* misc: Update bundled JRE to 1.8.0_161.
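As an illustration of the new `<chr>:<pos>~<size>` region syntax described above, the following Python sketch expands such a specification into an explicit start and end. It is not RTG's parser. The `Region` type and `expand_padded_region` function are hypothetical names, and the sketch assumes 1-based inclusive coordinates and clamps the start at position 1; the actual handling of sequence boundaries in RTG may differ.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


# Greedy chromosome match so that sequence names containing ':' still parse.
_PADDED = re.compile(r"^(?P<chrom>.+):(?P<pos>\d+)~(?P<size>\d+)$")


def expand_padded_region(spec: str) -> Region:
    """Expand '<chr>:<pos>~<size>' to the range surrounding pos by size on each side."""
    m = _PADDED.match(spec)
    if m is None:
        raise ValueError(f"not a <chr>:<pos>~<size> region: {spec!r}")
    pos = int(m.group("pos"))
    size = int(m.group("size"))
    # Clamping at 1 is an assumption made for this sketch.
    return Region(m.group("chrom"), max(1, pos - size), pos + size)


if __name__ == "__main__":
    print(expand_padded_region("chr1:1000~100"))
```

Under these assumptions, `chr1:1000~100` covers positions 900 through 1100 on chr1.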