Opus Sample Rate

1 view

Skip to first unread message

Charlotta Menchaca

unread,

Aug 5, 2024, 12:25:41 AM8/5/24

to vasgardprinla

Opusis a lossy audio coding format developed by the Xiph.Org Foundation and standardized by the Internet Engineering Task Force, designed to efficiently code speech and general audio in a single format, while remaining low-latency enough for real-time interactive communication and low-complexity enough for low-end embedded processors.[4][5] Opus replaces both Vorbis and Speex for new applications, and several blind listening tests have ranked it higher-quality than any other standard audio format at any given bitrate until transparency is reached, including MP3, AAC, and HE-AAC.[6][7]

Opus combines the speech-oriented LPC-based SILK algorithm and the lower-latency MDCT-based CELT algorithm, switching between or combining them as needed for maximal efficiency.[4] Bitrate, audio bandwidth, complexity, and algorithm can all be adjusted seamlessly in each frame. Opus has the low algorithmic delay (26.5 ms by default)[8] necessary for use as part of a real-time communication link, networked music performances, and live lip sync; by trading off quality or bitrate, the delay can be reduced down to 5 ms. Its delay is exceptionally low compared to competing codecs, which require well over 100 ms, yet Opus performs very competitively with these formats in terms of quality per bitrate.[9]

As an open format standardized through RFC 6716, a reference implementation called libopus is available under the New BSD License. The reference has both fixed-point and floating-point optimizations for low- and high-end devices, with SIMD optimizations on platforms that support them. All known software patents that cover Opus are licensed under royalty-free terms.[10] Opus is widely used as a voice over IP (VoIP) codec in applications such as Discord,[11] WhatsApp,[12][13][14] and the PlayStation 4.[15]

Opus supports constant and variable bitrate encoding from 6 kbit/s to 510 kbit/s (or up to 256 kbit/s per channel for multi-channel tracks), frame sizes from 2.5 ms to 60 ms, and five sampling rates from 8 kHz (with 4 kHz bandwidth) to 48 kHz (with 20 kHz bandwidth, the human hearing range). An Opus stream can support up to 255 audio channels, and it allows channel coupling between channels in groups of two using mid-side coding.

Opus has very short latency (26.5 ms using the default 20 ms frames and default application setting), which makes it suitable for real-time applications such as telephony, Voice over IP and videoconferencing; research by Xiph led to the CELT codec, which allows the highest quality while maintaining low delay. In any Opus stream, the bitrate, bandwidth, and delay can be continually varied without introducing any distortion or discontinuity; even mixing packets from different streams will cause a smooth change, rather than the distortion common in other codecs. Unlike Vorbis, Opus does not require large codebooks for each individual file, making it more efficient for short clips of audio and more resilient.

The Opus format is based on a combination of the full-bandwidth CELT format and the speech-oriented SILK format, both heavily modified: CELT is based on the modified discrete cosine transform (MDCT) that most music codecs use, using CELP techniques in the frequency domain for better prediction, while SILK uses linear predictive coding (LPC) and an optional Long-Term Prediction filter to model speech. In Opus, both were modified to support more frame sizes, as well as further algorithmic improvements and integration, such as using CELT's range encoder for both types. To minimize overhead at low bitrates, if latency is not as pressing, SILK has support for packing multiple 20 ms frames together, sharing context and headers; SILK also allows Low Bitrate Redundancy (LBRR) frames, allowing low-quality packet loss recovery. CELT includes both spectral replication and noise generation, similar to AAC's SBR and PNS, and can further save bits by filtering out all harmonics of tonal sounds entirely, then replicating them in the decoder.[16] Better tone detection is an ongoing project to improve quality.

The format has three different modes: speech, hybrid, and CELT. When compressing speech, SILK is used for audio frequencies up to 8 kHz. If wider bandwidth is desired, a hybrid mode uses CELT to encode the frequency range above 8 kHz. The third mode is pure-CELT, designed for general audio. SILK is inherently VBR and cannot hit a bitrate target, while CELT can always be encoded to any specific number of bytes, enabling hybrid and CELT mode when CBR is required.

SILK supports frame sizes of 10, 20, 40 and 60 ms. CELT supports frame sizes of 2.5, 5, 10 and 20 ms. Thus, hybrid mode only supports frame sizes of 10 and 20 ms; frames shorter than 10 ms will always use CELT mode. A typical Opus packet contains a single frame, but packets of up to 120 ms are produced by combining multiple frames per packet. Opus can transparently switch between modes, frame sizes, bandwidths, and channel counts on a per-packet basis, although specific applications may choose to limit this.

The reference implementation is written in C and compiles on hardware architectures with or without a floating-point unit, although floating-point is currently required for audio bandwidth detection (dynamic switching between SILK, CELT, and hybrid encoding) and most speed optimizations.

Opus packets are not self-delimiting, but are designed to be used inside a container of some sort which supplies the decoder with each packet's length. Opus was originally specified for encapsulation in Ogg containers, specified as audio/ogg; codecs=opus, and for Ogg Opus files the .opus filename extension is recommended.[2] Opus streams are also supported in Matroska,[17] WebM,[18] MPEG-TS,[19] and MP4.[20]

An optional self-delimited packet format is defined in an appendix to the specification.[22] This uses one or two additional bytes per packet to encode the packet length, allowing packets to be concatenated without encapsulation.

Opus allows the following bandwidths during encoding. Opus compression does not depend on the input sample rate; timestamps are measured in 48 kHz units even if the full bandwidth is not used. Likewise, the output sample rate may be freely chosen. For example, audio can be input at 16 kHz yet be set to encode only narrowband audio.[23]

Opus was proposed for the standardization of a new audio format at the IETF, which was eventually accepted and granted by the codec working group. It is based on two initially separate standard proposals from the Xiph.Org Foundation and Skype Technologies S.A. (now Microsoft). Its main developers are Jean-Marc Valin (Xiph.Org, Octasic, Mozilla Corporation, Amazon), Koen Vos (Skype), and Timothy B. Terriberry (Xiph.Org, Mozilla Corporation, Amazon). Among others, Juin-Hwey (Raymond) Chen (Broadcom), Gregory Maxwell (Xiph.Org, Wikimedia), and Chris Montgomery (Xiph.Org) were also involved.

The development of the CELT part of the format goes back to thoughts on a successor for Vorbis under the working name Ghost. As a newer speech codec from the Xiph.Org Foundation, Opus replaces Xiph's older speech codec Speex, an earlier project of Jean-Marc Valin. CELT has been worked on since November 2007.

The SILK part has been under development at Skype since January 2007 as the successor of their SVOPC, an internal project to make the company independent from third-party codecs like iSAC and iLBC and respective license payments.

In July 2010, a prototype of a hybrid format was presented that combined the two proposed format candidates SILK and CELT. In September 2010, Opus was submitted to the IETF as proposal for standardization. For a short time the format went under the name of Harmony before it got its present name in October 2010.[26] At the beginning of February 2011, the bitstream format was tentatively frozen, subject to last changes.[27] Near the end of July 2011, Jean-Marc Valin was hired by the Mozilla Corporation to continue working on Opus.[28]

In November 2011, the working group issued the last call for changes on the bitstream format. The bitstream has been frozen since January 8, 2012.[29] On July 2, 2012, Opus was approved by the IETF for standardization.[30] The reference software entered release candidate state on August 8, 2012.[31] The final specification was released as RFC 6716 on September 10, 2012.[32][33] and versions 1.0 and 1.0.1 of the reference implementation libopus were released the day after.

On December 5, 2013, libopus 1.1 was released,[34] incorporating overall speed improvements and significant encoder quality improvements: Tonality estimation boosts bitrate and quality for previously problematic samples, like harpsichords; automated speech/music detection improves quality in mixed audio; mid-side stereo reduces the bitrate needs of many songs; band precision boosting for improved transients; and DC rejection below 3 Hz. Two new VBR modes were added: unconstrained for more consistent quality, and temporal VBR that boosts louder frames and generally improves quality.

libopus 1.2 Beta was released on May 24, 2017. libopus 1.2 was released on June 20, 2017.[35] Improvements brought in 1.2 allow it to create fullband music at bitrates as low as 32 kbit/s, and wideband speech at just 12 kbit/s.[36]

libopus 1.3.1 was released on April 12, 2019.[40] This Opus 1.3.1 minor release fixes an issue with the analysis on files with digital silence (all zeros), especially on x87 builds (mostly affects 32-bit builds). It also includes two new features:

The codec is under active development.[45] The current focus is on adding a deep learning based redundancy encoder that enhances packet loss concealment by embedding one second of recovery data in each encoded packet. The deep redundancy (DRED) algorithm was developed by among others Jean-Marc Valin, Ahmed Mustafa, Jan Bthe, Timothy Terriberry, Chris Montgomery, Michael Klingbeil, and Paris Smaragdis from Amazon Web Services[46] with sponsorship to open source the algorithm and subsequently extend the IETF standard from Sid Rao.[47] This encoder is a backwards compatible change to the codec enabling customers to easily upgrade applications to take advantage of this machine learning capability. A draft RFC is underway to standardize the new capability.[48] This RFC is one of the first attempts to standardize a deep learning algorithm in the IETF.