Comparingnormalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech
Guangyan Zhang, Thomas Merritt, Sam Ribeiro, Biel Tura-Vecino, Kayoko Yanagisawa, Kamil Pokora, Abdelhamid Ezzerg, Sebastian Cygert, Ammar Abbas, Piotr Bilinski, Roberto Barra-Chicote, Daniel Korzekwa, Jaime Lorenzo-Trueba
LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers
Peidong Wang, Eric Sun, Jian Xue, Yu Wu, Long Zhou, Yashesh Gaur, Shujie Liu, Jinyu Li
Dual Acoustic Linguistic Self-supervised Representation Learning for Cross-Domain Speech Recognition
Zhao Yang, Dianwen Ng, Chong Zhang, Xiao Fu, Rui Jiang, Wei Xi, Yukun Ma, Chongjia Ni, Eng Siong Chng, Bin Ma, Jizhong Zhao
The ART of Conversation: Measuring Phonetic Convergence and Deliberate Imitation in L2-Speech with a Siamese RNN
Zheng Yuan, Aldo Pastore, Dorina de Jong, Hao Xu, Luciano Fadiga, Alessandro D'Ausilio
Classification of Multi-class Vowels and Fricatives From Patients Having Amyotrophic Lateral Sclerosis with Varied Levels of Dysarthria Severity
Chowdam Venkata Thirumala Kumar, Tanuka Bhattacharjee, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Prasanta Kumar Ghosh
Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Yochai Blau, Rohan Agrawal, Lior Madmony, Gary Wang, Andrew Rosenberg, Zhehuai Chen, Zorik Gekhman, Genady Beryozkin, Parisa Haghani, Bhuvana Ramabhadran
Multi-Head State Space Model for Speech Recognition
Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales
Topological Data Analysis for Speech Processing
Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil Cherniavskii, Serguei Barannikov, Irina Piontkovskaya, Sergey Nikolenko, Evgeny Burnaev
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation
Kangwook Jang, Sungnyun Kim, Se-Young Yun, Hoirin Kim
Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning
Yuanbo Hou, Siyang Song, Cheng Luo, Andrew Mitchell, Qiaoqiao Ren, Weicheng Xie, Jian Kang, Wenwu Wang, Dick Botteldooren
AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in the SUPERB Benchmark
Gaobin Yang, Jun Du, Maokui He, Shutong Niu, Baoxiang Li, Jiakui Li, Chin-Hui Lee
Towards hate speech detection in low-resource languages: Comparing ASR to acoustic word embeddings on Wolof and Swahili
Christiaan Jacobs, Nathanal Carraz Rakotonirina, Everlyn Asiko Chimoto, Bruce A. Bassett, Herman Kamper
How to Estimate Model Transferability of Pre-Trained Speech Models?
Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara Sainath
NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning
Kamer Ali Yuksel, Thiago Castro Ferreira, Golara Javadi, Mohamed Al-Badrashiny, Ahmet Gunduz
Description and Analysis of ABC Submission to NIST LRE 2022
Pavel Matejka, Anna Silnova, Josef Slavček, Ladislav Mosner, Oldřich Plchot, Michal Klčo, Junyi Peng, Themos Stafylakis, Lukš Burget
Exploring the Impact of Pretrained Models and Web-Scraped Data for the 2022 NIST Language Recognition Evaluation
Tanel Alume, Kunnar Kukk, Viet-Bac Le, Claude Barras, Abdel Messaoudi, Waad Ben Kheder
Advances in Language Recognition in Low Resource African Languages: The JHU-MIT Submission for NIST LRE22
Jess Villalba, Jonas Borgstrom, Maliha Jahan, Saurabh Kataria, Leibny Paola Garcia, Pedro Torres-Carrasquillo, Najim Dehak
Re-investigating the Efficient Transfer Learning of Speech Foundation Model using Feature Fusion Methods
Zhouyuan Huo, Khe Chai Sim, Dongseong Hwang, Tsendsuren Munkhdalai, Tara Sainath, Pedro M. Mengibar
On the (In)Efficiency of Acoustic Feature Extractors for Self-Supervised Speech Representation Learning
Titouan Parcollet, Shucong Zhang, Rogier van Dalen, Alberto Gil C. P. Ramos, Sourav Bhattacharya
Automatic speaker recognition with variation across vocal conditions: a controlled experiment with implications for forensics
Vincent Hughes, Jessica Wormald, Paul Foulkes, Philip Harrison, Finnian Kelly, David van der Vloed, Poppy Welch, Chenzi Xu
Generating Multilingual Gender-Ambiguous Text-to-Speech Voices
Konstantinos Markopoulos, Georgia Maniati, Georgios Vamvoukakis, Nikolaos Ellinas, Georgios Vardaxoglou, Panos Kakoulidis, Junkwang Oh, Gunu Jho, Inchul Hwang, Aimilios Chalamandaris, Pirros Tsiakoulis, Spyros Raptis
"Select language, modality or put on a mask!" Experiments with Multimodal Emotion Recognition
Paweł Bujnowski, Bartłomiej Kuźma, Bartłomiej Paziewski, Jacek Rutkowski, Joanna Marhula, Zuzanna Bordzicka, Piotr Andruszkiewicz
When Words Speak Just as Loudly as Actions: Virtual Agent Based Remote Health Assessment Integrating What Patients Say with What They Do
Vikram Ramanarayanan, David Pautler, Lakshmi Arbatti, Abhishek Hosamath, Michael Neumann, Hardik Kothare, Oliver Roesler, Jackson Liscombe, Andrew Cornish, Doug Habberstad, Vanessa Richter, David Fox, David Suendermann-Oeft, Ira Shoulson
5G-IoT Cloud based Demonstration of Real-Time Audio-Visual Speech Enhancement for Multimodal Hearing-aids
Ankit Gupta, Abhijeet Bishnu, Mandar Gogate, Kia Dashtipour, Tughrul Arslan, Ahsan Adeel, Amir Hussain, Tharmalingam Ratnarajah, Mathini Sellathurai
FC-MTLF: A Fine- and Coarse-grained Multi-Task Learning Framework for Cross-Lingual Spoken Language Understanding
Xuxin Cheng, Wanshi Xu, Ziyu Yao, Zhihong Zhu, Yaowei Li, Hongxiang Li, Yuexian Zou
Tensor decomposition for minimization of E2E SLU model toward on-device processing
Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe
5IDER: Unified Query Rewriting for Steering, Intent Carryover, Disfluencies, Entity Carryover and Repair
Jiarui Lu, Bo-Hsiang Tseng, Joel Ruben Antony Moniz, Site Li, Xueyun Zhu, Hong Yu, Murat Akbacak
On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer
Md Asif Jalal, Pablo Peso Parada, Jisi Zhang, Mete Ozay, Karthikeyan Saravanan, Myoungji Han, Jung In Lee, Seokyeong Jung
A Two-stage Progressive Neural Network for Acoustic Echo Cancellation
Zhuangqi Chen, Xianjun Xia, Cheng Chen, Xianke Wang, Yanhong Leng, Li Chen, Roberto Togneri, Yijian Xiao, Piao Ding, Shenyi Song, Pingjian Zhang
Real-Time Personalised Speech Enhancement Transformers with Dynamic Cross-attended Speaker Representations
Shucong Zhang, Malcolm Chadwick, Alberto Gil C. P. Ramos, Titouan Parcollet, Rogier van Dalen, Sourav Bhattacharya
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
Jiatong Shi, Dan Berrebbi, William Chen, En-Pei Hu, Wei-Ping Huang, Ho-Lam Chung, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition
Yist Y. Lin, Tao Han, Haihua Xu, Van Tung Pham, Yerbolat Khassanov, Tze Yuang Chong, Yi He, Lu Lu, Zejun Ma
Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding
Tian-Hao Zhang, Hai-Bo Qin, Zhi-Hao Lai, Song-Lu Chen, Qi Liu, Feng Chen, Xinyuan Qian, Xu-Cheng Yin
Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts
Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur
OTF: Optimal Transport based Fusion of Supervised and Self-Supervised Learning Models for Automatic Speech Recognition
Li Fu, Siqi Li, Qingtao Li, Fangzhu Li, Liping Deng, Lu Fan, Meng Chen, Youzheng Wu, Xiaodong He
Automatic Prediction of Language Learners' Listenability Using Speech and Text Features Extracted from Listening Drills
Yingxiang Gao, Jaehyun Choi, Nobuaki Minematsu, Noriko Nakanishi, Daisuke Saito
Assessment of Non-Native Speech Intelligibility using Wav2vec2-based Mispronunciation Detection and Multi-level Goodness of Pronunciation Transformer
Ram C. M. C. Shekar, Mu Yang, Kevin Hirschi, Stephen Looney, Okim Kang, John H. L. Hansen
TaylorBeamixer: Learning Taylor-Inspired All-Neural Multi-Channel Speech Enhancement from Beam-Space Dictionary Perspective
Andong Li, Weixin Meng, Guochen Yu, Wenzhe Liu, Xiaodong Li, Chengshi Zheng
How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics
Joonyong Park, Shinnosuke Takamichi, Tomohiko Nakamura, Kentaro Seki, Detai Xin, Hiroshi Saruwatari
A Multimodal Investigation of Speech, Text, Cognitive and Facial Video Features for Characterizing Depression With and Without Medication
Michael Neumann, Hardik Kothare, Doug Habberstad, Vikram Ramanarayanan
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion
Edresson Casanova, Christopher Shulby, Alexander Korolev, Arnaldo Candido Junior, Anderson da Silva Soares, Sandra Alusio, Moacir Antonelli Ponti
Effective Training of Attention-based Contextual Biasing Adapters with Synthetic Audio for Personalised ASR
Burin Naowarat, Philip Harding, Pasquale D'Alterio, Sibo Tong, Bashar Awwad Shiekh Hasan
3a8082e126