Accepted papers
===============
Title: Bayesian Learning-driven Prototypical Contrastive Loss for Class-Incremental Learning
Authors: Nisha L. Raichur, Lucas Heublein, Tobias Feigl, Alexander Rügamer, Christopher Mutschler, Felix Ott
Abstract: The primary objective of methods in continual learning is to learn tasks in a sequential manner over time (sometimes from a stream of data), while mitigating the detrimental phenomenon of catastrophic forgetting. This paper proposes a method to learn an effective representation between previous and newly encountered class prototypes. We propose a prototypical network with a Bayesian learning-driven contrastive loss (BLCL), tailored specifically for class-incremental learning scenarios. We introduce a contrastive loss that incorporates novel classes into the latent representation by reducing intra-class and increasing inter-class distance. Our approach dynamically adapts the balance between the cross-entropy and contrastive loss functions with a Bayesian learning technique. Experimental results conducted on the CIFAR-10, CIFAR-100, and ImageNet100 datasets for image classification and images of a GNSS-based dataset for interference classification validate the efficacy of our method, showcasing its superiority over existing state-of-the-art approaches.
URL: https://openreview.net/forum?id=dNWaTuKV9M
---
Title: Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting
Authors: Inkyu Shin, Qihang Yu, Xiaohui Shen, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen
Abstract: Recent advancements in zero-shot video diffusion models have shown promise for text-driven video editing, but challenges remain in achieving high temporal consistency. To address this, we introduce Video-3DGS, a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors. Our approach utilizes a two-stage 3D Gaussian optimization process tailored for editing dynamic monocular videos. In the first stage, Video-3DGS employs an improved version of COLMAP, referred to as MC-COLMAP, which processes original videos using a Masked and Clipped approach. For each video clip, MC-COLMAP generates the point clouds for dynamic foreground objects and complex backgrounds. These point clouds are utilized to initialize two sets of 3D Gaussians (Frg-3DGS and Bkg-3DGS) aiming to represent foreground and background views. Both foreground and background views are then merged with a 2D learnable parameter map to reconstruct full views. In the second stage, we leverage the reconstruction ability developed in the first stage to impose temporal constraints on the video diffusion model. This approach ensures temporal consistency in the edited videos while maintaining high fidelity to the editing text prompt. We further propose a recursive and ensembled refinement by revisiting the denoising step and guidance scale used in the video diffusion process with Video-3DGS. To demonstrate the efficacy of Video-3DGS in both stages, we conduct extensive experiments across two related tasks: Video Reconstruction and Video Editing. Video-3DGS trained with 3k iterations significantly improves video reconstruction quality (+3 PSNR, +7 PSNR) and training efficiency (1.9×, 4.5× faster) over NeRF-based and 3DGS-based state-of-the-art methods on the DAVIS dataset, respectively. Moreover, it enhances video editing by ensuring temporal consistency across 58 dynamic monocular videos.
URL: https://openreview.net/forum?id=s1zfBJysbI
---
New submissions
===============
Title: A Comprehensive Survey of Contamination Detection Methods in Large Language Models
Abstract: With the rise of Large Language Models (LLMs) in recent years, abundant new opportunities are emerging, but also new challenges, among which contamination is quickly becoming critical. Business applications and fundraising in Artificial Intelligence (AI) have reached a scale at which a few percentage points gained on popular question-answering benchmarks could translate into tens of millions of dollars, placing high pressure on model integrity. At the same time, it is becoming increasingly hard, if not impossible, to keep track of the data that LLMs have seen, as closed-source models like GPT-4 and Claude-3 do not divulge any information on their training sets. As a result, contamination becomes a major issue: LLMs’ performance may no longer be reliable, as high performance may be at least partly due to previous exposure to the data. This limitation jeopardizes real capability improvement in the field of NLP, yet there remains a lack of methods for efficiently detecting contamination. In this paper, we survey all recent work on contamination detection with LLMs, analyzing their methodologies and use cases to shed light on the appropriate usage of contamination detection methods. Our work calls the NLP research community’s attention to systematically taking contamination bias into account in LLM evaluation.
URL: https://openreview.net/forum?id=SxNMjbtdFm
---
Title: TRA: Better Length Generalisation with Threshold Relative Attention
Abstract: Transformers struggle with length generalisation, displaying poor performance even on basic tasks. We test whether these limitations can be explained through two key failures of the self-attention mechanism. The first is the inability to fully remove irrelevant information. The second is tied to position: even if the dot product between a key and query is highly negative (i.e., an irrelevant key), learned positional biases may unintentionally up-weight such information, which is dangerous when distances fall out of distribution. Put together, these two failure cases lead to compounding generalisation difficulties. We test whether they can be mitigated through the combination of (a) selective sparsity, which completely removes irrelevant keys from the attention softmax, and (b) contextualised relative distance, where distance is only considered between the query and the keys that matter. We show how refactoring the attention mechanism with these two mitigations in place can substantially improve the generalisation capabilities of decoder-only transformers.
URL: https://openreview.net/forum?id=yNiBUc2hMW
---
Title: On the Efficiency of Diffusion Models in Generating Plausible Designs
Abstract: Diffusion-based generative models have huge potential for creating novel structural images in generative design, where the user heavily values design plausibility, e.g., no floating material or missing parts. However, such models often require many denoising steps to achieve satisfactory plausibility, resulting in high computation costs; with far fewer steps, plausibility cannot be ensured. This paper addresses this trade-off and proposes an efficient training and inference method that can achieve the same or better plausibility than existing models while reducing the sampling time. We determine the noise schedule based on the evolution of pixel-value distributions in the forward diffusion process. Compared to previous models, e.g., DDPM and EDM, our method concentrates the noise schedule on a range of noise levels that highly influence the structural modeling and thereby achieves high efficiency in inference without compromising the visual quality or design plausibility. We apply this noise schedule to the EDM method on two structural data sets, BIKED and Seeing3DChairs. On BIKED images, for instance, our noise schedule significantly improves the quality of generated designs compared to EDM: the rate of plausible designs rises from 83.4% to 93.5%, and FID drops from 7.84 to 4.87.
URL: https://openreview.net/forum?id=rtNldOBA9N
---