Hi there,
We, the guest editors, have prepared an exciting new International Journal of Computer Vision (IJCV) Special Issue on Audio-Visual Generation!
We cordially invite you or your colleagues to submit related papers to the Special Issue.
The call for papers can be found here.
** Special Issue "Audio-Visual Generation" **
The ability to simulate and reason about the physical world is central to human intelligence. We perceive our surroundings and construct mental models, which we might call “world simulators”, that allow us to internally simulate possible outcomes, enabling reasoning, planning, and action. Similarly, developing a world simulator is crucial for building human-like AI systems that can interact effectively with dynamic and complex environments. Recent research has shown that high-fidelity video generation models are a promising path toward building such comprehensive and efficient world simulators.
However, the physical world is inherently multimodal. Human perception relies not only on visual stimuli but also on sound. Sound often conveys critical information that complements what we can see, providing a richer and more nuanced understanding of the environment. To create world simulators capable of mimicking human-like perception and reasoning, it is crucial to develop coherent audio-visual generative models. Despite this, most modern approaches focus on vision-only or language-visual modalities and pay less attention to understanding and generating integrated audio-visual signals.
This special issue aims to spotlight the exciting yet underexplored field of audio-visual generation as a key stepping stone towards achieving multimodal world simulators. Our goal is to prioritize innovative approaches that explore this multimodal integration, advancing both the generation and analysis of audio-visual content. Beyond these approaches, we also aim to explore the broader impacts of this research. Moreover, in line with the classical concept of analysis-by-synthesis, advances in audio-visual generation can foster improvements in analysis and understanding methods, reinforcing the symbiotic relationship between these two areas.
This research is not merely about content creation; it holds the potential to form a fundamental building block for more advanced, human-like AI systems.
Topics of interest: This special issue invites research articles tackling the challenges and proposing novel creative ideas in audio-visual generation. Potential topics of interest include, but are not limited to:
· Audio and image/video generation
· Audio-conditional X generation
· Speech and talking avatar generation
· Advanced audio-visual adaptor or interface
· Benchmark and dataset
· Ethical considerations and social impact
· Generic topics and applications related to audio-visual generation
** Submission Guidelines **
Please submit via IJCV Editorial Manager: www.editorialmanager.com/visi
Choose "SI: Audio-Visual Generation" from the dropdown.
Submitted papers should present original, unpublished work relevant to one of the topics of the Special Issue. All submitted papers will be evaluated by at least two independent reviewers on the basis of relevance, significance of contribution, technical quality, scholarship, and quality of presentation. It is the policy of the journal that no submission, or substantially overlapping submission, be published or be under review at another journal or conference at any time during the review process. Manuscripts will be subject to a peer-review process and must conform to the author guidelines available on the IJCV website at: https://www.springer.com/11263
** Important Dates **
· Manuscript submission deadline: 15 March 2025
· First review notification: 25 May 2025
· Revised manuscript submission: 10 July 2025
· Final review notification: 10 August 2025
· Final manuscript submission: 20 September 2025
· Publication: Fall 2025
Best regards,
IJCV SI Audio-Visual Generation Guest Editors,
With Tae-Hyun Oh, Shiqi Yang, Zhixiang Wang, Sergey Tulyakov, Stavros Petridis, Vicky Kalogeiton, Ming-Hsuan Yang