4:00pm, Patrick Pérez, Kyutai, France
Title: A multistream multimodal foundation model for real-time voice-based applications
Abstract: A unique way for humans to seamlessly exchange information and emotion, speech should be a key means for us to communicate with and through machines. This is not yet the case. In an effort to progress toward this goal, we introduce a versatile speech-text decoder-only model that can serve a number of voice-based applications. In particular, it has allowed us to build Moshi, the first-ever full-duplex spoken-dialogue system (with no latency and no imposed speaker turns), as well as Hibiki, the first simultaneous voice-to-voice translation model with voice preservation to run on a mobile phone. This multistream multimodal model can also be turned into a visual-speech model (VSM) via cross-attention with visual information, which allows Moshi to freely discuss an image while maintaining its natural conversation style and low latency. This talk will provide an illustrated tour of this research.
Short bio: Patrick Pérez is CEO at Kyutai, a non-profit open-science AI lab based in Paris. Prior to this, Patrick was at Valeo as VP of AI and Scientific Director of valeo.ai (2018-2023), and with Technicolor (2009-2018), Inria (1993-2000, 2004-2009) and Microsoft Research Cambridge (2000-2004) as a research scientist. His research interests lie in reliable multimodal AI for the benefit of all.
5:00pm, Amel Bennaceur, The Open University, UK
Title: Engineering Safe and Socially-Aware AI Systems
Abstract: Artificial Intelligence (AI) systems often require interaction and cooperation with humans to achieve their goals. However, human behaviour is uncertain and complex, and so it can be difficult to reason about it formally. Specifying, designing, implementing, and deploying AI systems able to cooperate with humans is challenging but crucial for the reliability and trustworthiness of those AI systems. In this talk, I will present IDEA: an adaptive software architecture that enables cooperation between humans and autonomous systems by leveraging the social identity approach. This approach establishes that group membership drives human behaviour. Identity and group membership are crucial during emergencies, as they influence cooperation among survivors. IDEA systems infer the social identity of surrounding humans, thereby establishing their group membership. By reasoning about groups, we limit the number of cooperation strategies the system needs to explore. IDEA systems select a strategy from the equilibrium analysis of game-theoretic models that represent interactions between group members and the IDEA system. I will show how this approach can extend AI systems’ capability to enhance the resilience of local communities and emergency services in the aftermath of a mass emergency in an urban area by facilitating their effective cooperation. I will show how we can use synthesis techniques to also ensure the safety of the cooperation. Finally, I will reflect on the move to allyship between humans and AI systems, considering the role of requirements and alignment.
Bio: Dr. Amel Bennaceur is an associate professor and director of research at the School of Computing at the Open University, UK. Her research focuses on formally-grounded and practice-informed software engineering methods and techniques to ensure the trustworthiness and resilience of intelligent systems. She has published the results of this work in 60+ papers in top journals and conferences (TOSEM, TSE, Middleware, and ECSA) in research areas such as Software Engineering and Distributed Systems. She has contributed to several EU and EPSRC research projects.