RTG Core 3.4 Release / non-commercial availability, including source code

RTG Announcements

Dec 22, 2014, 10:29:36 PM
Just in time for the holidays, we are pleased to announce our new 
release!

We are especially excited to be making RTG Core available 
for non-commercial academic research under improved terms 
(see LICENCE) in response to the feedback we have been receiving. 
The main highlights are:

* Free for non-commercial academic research

* Unlimited duration (no license key file required)

* Source code available on github at: https://github.com/RealTimeGenomics/rtg-core

* The non-commercial release is available for download now via

If you have any problems or questions, you can contact us at 
sup...@realtimegenomics.com and we'll do our best to help you out.
If you require a license for commercial use, or wish to purchase 
commercial support, contact us via in...@realtimegenomics.com.

Below are the release notes for RTG Core 3.4. We aim 
to produce an updated release of RTG Tools but couldn't fit it in just 
yet -- look for that in the new year.

=== Release Notes for RTG Core 3.4 ===

Below are the release notes for RTG Core, upon which RTG Core 3.4
is built.  Not all features described below may be included in this
product.

RTG Core 3.4 (2014-12-20)
-------------------------

Major features of this release:

* Added the ability to run variant calling only on a list of regions
  provided via BED file.  This results in a large speed improvement
  when performing exome variant calling, by avoiding computation
  associated with off-target locations, as well as permitting fast
  variant calling of target sites from whole genome data, or running
  variant calling in haploid mode in areas of loss-of-heterozygosity.

* Added the ability to perform variant calling for sites where the
  reference is unknown but where reads have been mapped. This can be
  used to fill in gaps in draft reference assemblies.  This includes
  sites where an N is observed in the reference, larger N-blocks
  where reads have been mapped spanning the N-block, and large
  N-blocks where reads are anchored on one side by known reference.

* Workflow improvements to human pipeline processing to identify
  mislabelled samples or incorrect pedigree.  At the end of read
  mapping, average coverage levels across chromosomes are examined and
  a warning is issued if there appear to be gross chromosomal
  abnormalities or if the coverage levels do not match expected levels
  for the sex of the individual specified. A standalone tool for this
  is also provided.  Similarly, the mendelian analysis tool now
  computes concordance with pedigree and issues a warning if low
  concordance indicates a parent or child is inconsistent with the
  supplied pedigree.  In addition we have added two commands for
  manipulating, extracting information from, and summarizing pedigree
  files.

* New commands for metagenomics taxonomy and reference database
  management.  Previously using metagenomics databases other than those
  pre-built by RTG was difficult and error-prone.  Three commands have
  been added to allow taxonomy construction starting from an NCBI
  taxonomy dump, filtering the taxonomy based on user criteria, and
  validating the structure of a metagenomics species reference
  database.
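
To make the region-restricted calling in the first highlight more concrete, here is a minimal Python sketch of testing whether a variant position falls inside a set of BED regions. It is not RTG's implementation; the function names `load_bed` and `in_regions` are hypothetical. It relies only on the standard conventions that BED intervals are 0-based and half-open while VCF positions are 1-based.

```python
from bisect import bisect_right


def load_bed(lines):
    """Parse BED lines into {chrom: sorted, merged list of (start, end)}.

    BED coordinates are 0-based and half-open: [start, end).
    """
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))

    merged = {}
    for chrom, intervals in regions.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                # Overlapping or adjacent: extend the previous interval.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_regions(regions, chrom, pos):
    """Return True if the 1-based position `pos` lies inside a region."""
    intervals = regions.get(chrom, [])
    zero_based = pos - 1
    # Index of the last interval whose start is <= zero_based.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]
```

For example, with a region `chr1 100 200`, the 1-based position 101 is the first position inside the region and 200 is the last. A per-position test like this is only an illustration of the coordinate handling; it does not model how a caller treats complex calls that span a region boundary.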


Detailed changes are listed below by area.  Please read these through
fully, as some command-line flags have changed, so updates to your
pipeline scripts may be required. For more information on new
features, see the RTG Operations Manual.


=== Basic Formatting and Mapping

* map/cgmap/mapf: As an alternative to supplying --sex to specify the
  sex of the individual being mapped, you may specify a pedigree file
  containing the sex information for the sample.  This requires you to
  have either formatted the read set with read-group information or to
  supply read group information at mapping time (the advantage of this
  feature is that it lets you minimize the number of command-line
  differences for each sample being mapped).

* map/cgmap: When mapping using a reference containing sex chromosome
  information, average per-chromosome coverage information is used to
  issue warnings when it is likely that the incorrect mapping sex has
  been specified or if any autosomes have abnormal coverage levels
  (perhaps indicating a chromosomal abnormality).  This feature
  requires you to be using a reference genome SDF containing chromosome
  information, as described in the RTG Operations Manual.

* chrstats: New command to perform standalone average coverage
  reporting and checking against expected coverage levels from
  calibrated mapping files.  This is essentially the same check that is
  performed during mapping, but allows multiple mapping files to be
  provided (either if multiple mapping runs were performed for a
  single sample, or for batch reporting for multiple samples).

* calibrate: New option --merge to allow merging multiple alignment
  files into a single output file while performing calibration.  For
  example, this can reduce the number of I/O operations needed to go
  from multiple, uncalibrated, unindexed third party input files to a
  single calibrated indexed BAM file.

* calibrate: New option --threads to allow calibration of multiple files to
  use multiple cores.  (Currently this option only takes effect when
  used with the --merge option, not regular multi-file calibration.)
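
The coverage check described for map/cgmap and chrstats above can be pictured with a small Python sketch. This is only an illustration of the idea, not the method RTG uses: the function name `check_sex_coverage`, the chromosome names, the expected ratios, and the tolerance are all assumptions made for the example.

```python
from statistics import median


def check_sex_coverage(coverage, declared_sex, tolerance=0.25):
    """Return warning strings when coverage disagrees with expectations.

    coverage: dict mapping chromosome name to average coverage.
    declared_sex: "male" or "female".
    tolerance: allowed deviation in the ratio to autosomal coverage
               (a hypothetical value chosen for this sketch).
    """
    sex_chroms = ("chrX", "chrY")
    autosomal = [c for name, c in coverage.items() if name not in sex_chroms]
    # The median is less affected by a single abnormal autosome than the mean.
    baseline = median(autosomal)

    # Expected coverage relative to autosomes (assumed diploid autosomes).
    expected = {
        "male": {"chrX": 0.5, "chrY": 0.5},
        "female": {"chrX": 1.0, "chrY": 0.0},
    }[declared_sex]

    warnings = []
    for name, ratio in expected.items():
        observed = coverage.get(name, 0.0) / baseline
        if abs(observed - ratio) > tolerance:
            warnings.append(
                f"{name}: coverage ratio {observed:.2f} does not match "
                f"{ratio:.2f} expected for {declared_sex}"
            )
    for name, cov in coverage.items():
        if name in sex_chroms:
            continue
        observed = cov / baseline
        if abs(observed - 1.0) > tolerance:
            warnings.append(f"{name}: abnormal coverage ratio {observed:.2f}")
    return warnings
```

Calling it with the same coverage values under both declared sexes shows how a mislabelled sample would surface as warnings on the sex chromosomes.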


=== Variant Calling

* snp/family/population/somatic: New flag --bed-regions, adds the
  ability to only perform calling on the regions specified via a BED
  file.  This is more efficient than applying BED filtering via
  --filter-bed.  However, note that the results can sometimes differ,
  due to edge effects of complex calling regions that cross region
  boundaries.

* snp/family/population/somatic: Implemented variant calling across
  N's in the reference.  (This was previously occurring in some cases
  where mappings across the N contain indels, but has now been fully
  implemented).  Calls where the reference is not a valid allele due to
  containing an N are annotated with an NREF INFO tag for easy
  filtering, and contain neither QUAL nor GL values.

* snp: As an alternative to supplying --sex to specify the sex of the
  individual for variant calling, you may specify a pedigree file
  containing the sex information for the sample.  This can reduce the
  number of command-line differences when processing multiple samples.

* family/population/somatic: Better error handling when input mappings
  contain a record that does not correspond to one of the samples
  being called.

* snp/family/population/somatic: Fixed a hang that could occur when
  trying to clean up after an out-of-memory error.

* snp/family/population/somatic: Fixed a rare crash that could occur
  at the end of chromosomes.

* somatic: Previously stored a somatic score indicating the likelihood
  of the variant being a somatic variant in the QUAL field.  This is
  not strictly according to the VCF spec, so this score has been moved
  to the new NCS INFO field.

* vcfannotate: The --fill-ac-an flag now does not add an AC annotation
  when no ALTs are present in a record.

* vcffilter: New flag --region to extract and filter only the variants
  contained within a single specified region.

* vcffilter: New flag --bed-regions to extract and filter only
  variants contained within the regions contained in a BED file.

* vcffilter: Better error handling when applying criteria that require
  GT be present to files that are missing the GT field.

* vcfmerge: The default behaviour has changed when merging variants at
  the same position where the ALTs are different and the variants
  contain FORMAT fields that cannot be automatically merged
  (Number=A,G,R, or the special case of the AD FORMAT field).  Now
  these FORMAT fields are removed to allow the merge to proceed.  There
  is a new flag --preserve-formats to instead output separate variants
  that keep those FORMAT fields.

* vcfeval: New flag --baseline-tp that allows additionally outputting
  the baseline version of true positive variants (the regular tp.vcf
  contains the called representation of true positive variants).

* vcfeval: --squash-ploidy treats heterozygous calls in baseline and
  calls as homozygous ALT to allow a lenient comparison.  Note that
  genotypes at multi-allelic sites where neither allele is REF simply
  choose the ALT with the highest index.

* vcfeval: Fixed an exception that could occur when processing variants
  missing GT information for some samples.

* vcfeval: Fixed an exception that could occur when provided variants
  that were outside the bounds of the supplied reference genome.

* vcfeval: Fixed an inconsistency when handling ROC files in locales
  where ',' is the decimal separator.

* mendelian: The default is now to perform checks only on non-failing
  variants. The --pass flag has been removed, and a new flag added
  --all-records in order to obtain the behaviour of checking all
  variant records regardless of filters.

* mendelian: Now performs concordance checking to detect sample
  mislabelling and incorrect pedigree.

* mendelian: Removed --male and --female flags, which were only needed
  for VCFs produced by versions of RTG prior to 2.7.  If required,
  alternative pedigree information can be supplied via the --pedigree
  flag.
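
As a rough illustration of the concordance idea behind the mendelian changes above, the following Python sketch computes the fraction of sites at which a child's genotype is consistent with its parents. It is a simplified sketch under stated assumptions, not the tool's algorithm: it handles only diploid autosomal genotypes, ignores de novo mutation and sex chromosomes, and the function names are hypothetical. Genotypes are tuples of allele indices, as in a VCF GT field.

```python
def is_mendelian_consistent(child, father, mother):
    """True if the child genotype can take one allele from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def concordance(trios):
    """Fraction of fully genotyped sites that are Mendelian consistent.

    trios: iterable of (child, father, mother) genotype tuples, where a
    missing allele is represented as None.  Returns None if no site
    could be checked.
    """
    checked = 0
    consistent = 0
    for child, father, mother in trios:
        if None in child + father + mother:
            continue  # skip sites with any missing genotype
        checked += 1
        if is_mendelian_consistent(child, father, mother):
            consistent += 1
    return consistent / checked if checked else None
```

A low value from a calculation like this, for a trio expected to be related, is the kind of signal that would suggest a mislabelled sample or incorrect pedigree. The threshold at which the tool warns is not stated here, so none is assumed in the sketch.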


=== Metagenomics

* ncbi2tax: New tool to generate an RTG taxonomy file from an NCBI
  taxonomy dump.

* taxfilter: New tool for the custom filtering of taxonomy files and
  metagenomic reference SDFs containing taxonomy information.

* taxstats: New tool for verifying the contents of a metagenomic
  reference SDF.
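
For readers unfamiliar with the NCBI taxonomy dump mentioned above, this Python sketch reads parent links from lines in the usual `nodes.dmp` layout (fields separated by `|`, with the root node 1 listed as its own parent) and reports nodes that cannot be traced back to the root. It says nothing about the RTG taxonomy file format or about what taxstats actually checks; `parse_nodes` and `find_problems` are hypothetical names used only for this example.

```python
def parse_nodes(lines):
    """Map each tax_id to its parent tax_id from nodes.dmp-style lines."""
    parents = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2 or not fields[0]:
            continue
        parents[int(fields[0])] = int(fields[1])
    return parents


def find_problems(parents, root=1):
    """Return sorted tax_ids whose ancestry does not lead to the root.

    A node is reported if following parent links hits an unknown
    tax_id or revisits a node (a cycle) before reaching the root.
    """
    problems = []
    for node in parents:
        seen = set()
        current = node
        while current != root:
            if current in seen or current not in parents:
                problems.append(node)
                break
            seen.add(current)
            current = parents[current]
    return sorted(problems)
```

The walk is repeated from every node, which is simple rather than fast; for a full taxonomy one would likely cache nodes already known to reach the root.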


=== Other

* sdfsubseq: The output sequence name is the same as the input
  sequence if the coordinates are unchanged.

* many: Added the ability to read BED from stdin by specifying '-' as
  the BED file name (this is not supported in cases where a region
  restriction is also being applied to the file, as this would require
  the BED to be tabix indexed).

* many: Added the ability to read VCF from stdin by specifying '-' as
  the VCF file name (not supported in cases where a region restriction
  is also being applied to the file, as this would require the VCF to
  be tabix indexed).

* many: Users of Linux bash can enable command and flag
  completion. See the file rtg-bash-completion in the scripts
  directory for more information.

* bgzip: New flag --no-terminate allows the omission of the block gzip
  termination block. This permits advanced users to compress multiple
  files for later fast concatenation (the termination block should be
  present on the final file only).

* bgzip: New flag --compression-level allows altering the degree of
  compression (thus speed) from 1 (least but fast) to 9 (best but
  slow).

* rocplot: GUI mode has better error handling when there is no
  graphical environment.

* rocplot: PNG output mode will attempt to use headless mode to
  prevent an error when the graphical environment is unavailable.

* popsim: Speed improvements.

* readsim/cgsim: Added the --sam-rg flag to set the read group
  information to be stored in the output SDF. Removed --diploid-input
  as the recommended way to simulate diploid genomes is to use
  samplereplay or the --output-sdf option of
  samplesim/childsim/denovosim.

* readsimeval: New command for evaluating the accuracy of mapping reads
  generated by readsim.

* pedfilter: New command for pedigree file filtering and simple
  manipulation and conversion between pedigree PED files and
  pedigree-augmented VCF headers.

* pedstats: New command for extracting and summarizing information
  contained in a pedigree file.

* aview: The flag --dont-display-dots has been renamed to
  --no-dots for consistency.
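
The bgzip --no-terminate note above relies on the fact that a block gzip (BGZF) file ends with a fixed 28-byte empty block. The Python sketch below shows the concatenation idea: drop the terminator from each part and append it once at the end. This is an illustration only, not how RTG implements it. The helper names are hypothetical, and in the usage example ordinary gzip members produced by Python's `gzip` module stand in for true BGZF blocks.

```python
import gzip

# The 28-byte empty BGZF block that marks end-of-file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200"
    "1b00" "0300" "00000000" "00000000"
)


def has_eof_marker(data):
    """True if the compressed bytes end with the BGZF terminator block."""
    return data.endswith(BGZF_EOF)


def strip_eof_marker(data):
    """Remove a trailing terminator block if one is present."""
    return data[: -len(BGZF_EOF)] if has_eof_marker(data) else data


def concatenate(parts):
    """Join compressed parts so only the final output carries a terminator."""
    return b"".join(strip_eof_marker(p) for p in parts) + BGZF_EOF


# Usage: one part already terminated, one written without a terminator.
parts = [gzip.compress(b"chr1\n") + BGZF_EOF, gzip.compress(b"chr2\n")]
joined = concatenate(parts)
```

Because concatenated gzip members decompress to the concatenation of their contents, `joined` decompresses to the two inputs in order, with the terminator present only once at the end.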
