🤗 Daily Paper Newsletter
Hope you found some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.

Asynchronous Local-SGD Training for Language Modeling
Published at 2024-01-18
#Natural Language Processing
#Optimization and Learning Algorithms

This paper presents an empirical study of asynchronous Local-SGD for training language models. The results show that asynchronous Local-SGD takes more iterations to converge than synchronous Local-SGD. In response, the paper proposes a method that uses a delayed Nesterov momentum update and adjusts each worker's local training steps based on its computation speed. The proposed method matches the perplexity per update step of synchronous Local-SGD and significantly reduces the wall cloc...
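To make the idea of speed-dependent local steps concrete, here is a minimal sketch. It assumes a simple proportional rule, where each worker gets a step count scaled by its speed relative to the fastest worker, so that all workers finish a round in roughly the same wall-clock time. The function name `allocate_local_steps` and its parameters `speeds` and `base_steps` are hypothetical, and the paper's actual schedule may differ.

```python
def allocate_local_steps(speeds, base_steps):
    """Assign local step counts proportional to worker speed.

    Assumption (not taken from the paper): the fastest worker runs
    `base_steps` steps and slower workers run proportionally fewer,
    with a floor of one step per worker.
    """
    if not speeds or base_steps <= 0:
        raise ValueError("need at least one worker and a positive step budget")
    if any(s <= 0 for s in speeds):
        raise ValueError("worker speeds must be positive")
    fastest = max(speeds)
    return [max(1, round(base_steps * s / fastest)) for s in speeds]


# Three workers measured at 4, 2, and 1 steps per second.
steps = allocate_local_steps([4.0, 2.0, 1.0], base_steps=100)
print(steps)  # [100, 50, 25]
```

With this allocation, each worker needs about 25 seconds per round (100/4, 50/2, 25/1), so no worker sits idle waiting for a slower one.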
Read More

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Published at 2024-01-18
#Deep Learning
#Computer Vision

This paper presents a conditional diffusion model called Compose and Conquer (CnC) that integrates control over three-dimensional object placement with disentangled representations of global stylistic semantics from multiple exemplar images. CnC uses depth disentanglement training and soft guidance to localize multiple conditions in a disentangled manner, allowing perception of objects at varying depths and versatile composition of localized objects with different global semantics....
Read More

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Published at 2024-01-18
#Deep Learning
#Natural Language Processing

This paper introduces DeepSpeed-FastGen, a high-throughput text generation system for LLMs using Dynamic SplitFuse, a novel prompt and generation composition strategy. It leverages DeepSpeed-MII and DeepSpeed-Inference to provide an efficient serving system for LLMs with non-persistent and persistent deployment options. This system delivers up to 2.3x higher throughput, 2x lower latency, and up to 3.7x lower tail latency compared to state-of-the-art systems. It offers a detailed benchmarking met...
Read More

GARField: Group Anything with Radiance Fields
Published at 2024-01-18
#Computer Vision

This paper presents GARField, an approach for decomposing 3D scenes into a hierarchy of semantically meaningful groups from posed image inputs. GARField embraces group ambiguity through physical scale by optimizing a scale-conditioned 3D affinity feature field. The field can derive a hierarchy of possible groupings via automatic tree construction or user interaction. GARField effectively extracts groups at different levels and represents multi-view consistent groupings with higher fidelity than ...
Read More

ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization
Published at 2024-01-18
#Unsupervised Learning
#Deep Learning
#Optimization and Learning Algorithms
#Computer Vision

This paper presents ICON, an optimization procedure for training Neural Radiance Fields (NeRF) from 2D video frames without requiring pose initialization. ICON estimates initial guesses for poses based on smooth camera motion and introduces confidence, an adaptive measure of model quality used to reweight gradients. ICON achieves superior performance on the CO3D and HO3D datasets compared with methods that use SfM poses....
Read More

ReFT: Reasoning with Reinforced Fine-Tuning
Published at 2024-01-18
#Supervised Learning
#Reinforcement Learning
#Natural Language Processing

This paper proposes a technique called ReFT (Reinforced Fine-Tuning) for enhancing the reasoning capabilities of Large Language Models (LLMs). ReFT improves upon the typical Supervised Fine-Tuning (SFT) approach by employing reinforcement learning to learn from multiple annotated reasoning paths instead of relying solely on given CoT data. ReFT significantly outperforms SFT, and further improvements can be made using inference-time strategies. The technique is effective in improving the generaliz...
Read More

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
Published at 2024-01-18
#Computer Vision
#Natural Language Processing
#Deep Learning
#Explainable AI and Interpretability

This paper presents SceneVerse, the first million-scale 3D vision-language dataset, and introduces Grounded Pre-training for Scenes (GPS), a unified pre-training framework for 3D vision-language learning. The dataset and framework address major challenges in 3D vision-language grounding: the complexity of 3D scenes, the scarcity of data, and the lack of a unified learning framework. GPS outperforms prior methods on existing benchmarks and demonstrates potential for zero-shot transfer in 3D vision-language tasks. P...
Read More

TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
Published at 2024-01-18
#Computer Vision
#Deep Learning

TextureDreamer is a novel image-guided texture synthesis method that can transfer highly detailed textures from a small number of input images to target 3D shapes across different categories. This method takes inspiration from recent advancements in diffusion models and significantly improves texture quality. Experiments show that TextureDreamer produces highly realistic, semantically meaningful textures, surpassing the visual quality of previous state-of-the-art methods....
Read More

UniVG: Towards UNIfied-modal Video Generation
Published at 2024-01-18
#Computer Vision
#Deep Learning
#Natural Language Processing

Diffusion-based video generation has received extensive attention and achieved considerable success within both the academic and industrial communities. However, current efforts are mainly concentrated on single-objective or single-task video generation, such as generation driven by text, by image, or by a combination of text and image. This cannot fully meet the needs of real-world application scenarios, as users are likely to input images and text conditions in a flexible manner, either indivi...
Read More

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Published at 2024-01-18
#Deep Learning
#Computer Vision

This paper proposes VideoCrafter2, a method to overcome data limitations in training high-quality video diffusion models by leveraging low-quality videos and synthesized high-quality images. The authors analyze the connection between the spatial and temporal modules of video models and the distribution shift toward low-quality videos, and observe that fully training all modules results in a stronger coupling between them. They then finetune spatial modules with high-quality images to shift the distribution to higher quality...
Read More

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Published at 2024-01-18
#Computer Vision
#Deep Learning

This paper proposes a new vision backbone, Vision Mamba (Vim), for efficient visual representation learning using bidirectional state space models (SSMs) with position embeddings. Vim is shown to achieve higher performance than existing vision transformers like DeiT, while using fewer computational resources and less memory. This makes Vim a potential next-generation backbone for foundation vision models. Code is available at https://github.com/hustvl/Vim ...
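As a rough illustration of what "bidirectional" means for a state space model, the toy sketch below runs a scalar linear recurrence over a sequence in both directions and sums the two outputs, so each position sees context from both sides. This is not the Vim implementation: Mamba-style models use input-dependent parameters and vector-valued states, whereas the fixed scalars `a`, `b`, `c`, the sum-based merge, and the function names here are all simplifying assumptions.

```python
def ssm_scan(xs, a, b, c):
    """Causal scan of a scalar linear SSM: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def bidirectional_ssm(xs, a=0.5, b=1.0, c=1.0):
    """Run the scan forward and backward, then sum the aligned outputs (assumed merge rule)."""
    forward = ssm_scan(xs, a, b, c)
    backward = ssm_scan(xs[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(forward, backward)]


# An impulse in the middle of the sequence influences both neighbours.
print(bidirectional_ssm([0.0, 1.0, 0.0]))  # [0.5, 2.0, 0.5]
```

A forward-only scan would give `[0.0, 1.0, 0.5]` for the same input, leaving the first position with no information about the impulse that follows it. This matters for images, where patches have no natural left-to-right order.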
Read More

Tags are generated by Google's Gemini Pro API, and summaries are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.