🤗 Daily Paper(2024-01-18)

deep.di...@gmail.com

Jan 18, 2024, 3:13:59 PM

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers you the curated list of papers by 🤗 Daily Papers.

project page

🤗 daily paper

Asynchronous Local-SGD Training for Language Modeling

Published at 2024-01-18

#Natural Language Processing #Optimization and Learning Algorithms

This paper presents an empirical study of asynchronous Local-SGD for training language models. The results show that asynchronous Local-SGD takes more iterations to converge than synchronous Local-SGD. To close this gap, the authors propose a method that uses a delayed Nesterov momentum update and adjusts each worker's local training steps based on its computation speed. The proposed method matches the perplexity per update step of synchronous Local-SGD and significantly reduces the wall cloc...
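As a rough illustration of the Local-SGD idea summarized above, the following toy sketch runs a synchronous round on one-dimensional quadratic losses and scales each worker's local step count by its speed. It is not the paper's algorithm: the delayed Nesterov momentum update is omitted, and the names `local_sgd_round` and `steps_by_speed`, the proportional scaling rule, and the toy loss are all assumptions made for illustration.

```python
def local_sgd_round(theta, worker_targets, local_steps, lr=0.1):
    """One synchronous Local-SGD round on toy losses 0.5 * (theta - target)**2.

    Each worker starts from the shared parameter, takes its own number of
    local SGD steps, and the server averages the resulting parameter deltas.
    """
    deltas = []
    for target, steps in zip(worker_targets, local_steps):
        local = theta
        for _ in range(steps):
            grad = local - target  # gradient of the toy quadratic loss
            local -= lr * grad
        deltas.append(local - theta)
    # Server update: apply the average of the workers' deltas.
    return theta + sum(deltas) / len(deltas)


def steps_by_speed(speeds, max_steps):
    """Hypothetical rule: local steps proportional to relative worker speed."""
    fastest = max(speeds)
    return [max(1, round(max_steps * s / fastest)) for s in speeds]


if __name__ == "__main__":
    theta = 0.0
    steps = steps_by_speed([1.0, 2.0, 4.0], max_steps=8)
    for _ in range(50):
        theta = local_sgd_round(theta, [1.0, 2.0, 3.0], steps)
    print(steps, theta)
```

With equal step counts the shared parameter settles at the mean of the worker targets; with unequal step counts the faster workers pull it toward their own targets, which is one reason the choice of local steps matters.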

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Published at 2024-01-18

#Deep Learning #Computer Vision

This paper presents a conditional diffusion model called Compose and Conquer (CnC) that integrates control over three-dimensional object placement with disentangled representations of global stylistic semantics from multiple exemplar images. CnC uses depth disentanglement training and soft guidance to localize multiple conditions in a disentangled manner, allowing perception of objects at varying depths and versatile composition of localized objects with different global semantics....

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

Published at 2024-01-18

#Deep Learning #Natural Language Processing

This paper introduces DeepSpeed-FastGen, a high-throughput text generation system for LLMs using Dynamic SplitFuse, a novel prompt and generation composition strategy. It leverages DeepSpeed-MII and DeepSpeed-Inference to provide an efficient serving system for LLMs with non-persistent and persistent deployment options. This system delivers up to 2.3x higher throughput, 2x lower latency, and up to 3.7x lower tail latency compared to state-of-the-art systems. It offers a detailed benchmarking met...

GARField: Group Anything with Radiance Fields

Published at 2024-01-18

#Computer Vision

This paper presents GARField, an approach for decomposing 3D scenes into a hierarchy of semantically meaningful groups from posed image inputs. GARField embraces group ambiguity through physical scale by optimizing a scale-conditioned 3D affinity feature field. The field can derive a hierarchy of possible groupings via automatic tree construction or user interaction. GARField effectively extracts groups at different levels and represents multi-view consistent groupings with higher fidelity than ...

ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization

Published at 2024-01-18

#Unsupervised Learning #Deep Learning #Optimization and Learning Algorithms #Computer Vision

The paper presents ICON, an optimization procedure for training Neural Radiance Fields (NeRF) from 2D video frames without requiring pose initialization. ICON estimates initial guesses for poses based on smooth camera motion and introduces confidence, an adaptive measure of model quality used to reweight gradients. ICON achieves superior performance on the CO3D and HO3D datasets compared with methods that use SfM poses....

ReFT: Reasoning with Reinforced Fine-Tuning

Published at 2024-01-18

#Supervised Learning #Reinforcement Learning #Natural Language Processing

This paper proposes a technique called ReFT (Reinforced Fine-Tuning) for enhancing the reasoning capabilities of Large Language Models (LLMs). ReFT improves upon the typical Supervised Fine-Tuning (SFT) approach by employing reinforcement learning to learn from multiple annotated reasoning paths instead of relying solely on given CoT data. ReFT significantly outperforms SFT and further improvements can be made using inference-time strategies. The technique is effective in improving the generaliz...

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

Published at 2024-01-18

#Computer Vision #Natural Language Processing #Deep Learning #Explainable AI and Interpretability

This paper presents SceneVerse, the first million-scale 3D vision-language dataset, and introduces Grounded Pre-training for Scenes (GPS), a unified pre-training framework for 3D vision-language learning. The dataset and framework address major challenges in 3D vision-language grounding: the complexity of 3D scenes, the scarcity of data, and the lack of a unified learning framework. GPS outperforms prior methods on existing benchmarks and demonstrates potential for zero-shot transfer in 3D vision-language tasks. P...

SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

Published at 2024-01-18

#Deep Learning #Generative Models #Diffusion Models #Image Generation #Natural Language Processing #Computer Vision

SiT is a family of generative models based on Diffusion Transformers that allows for flexible design choices in generative models built on dynamical transport. By carefully introducing various ingredients, SiT surpasses DiT on the conditional ImageNet 256x256 benchmark with an FID-50K score of 2.06....

TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion

Published at 2024-01-18

#Computer Vision #Deep Learning

TextureDreamer is a novel image-guided texture synthesis method that can transfer highly detailed textures from a small number of input images to target 3D shapes across different categories. This method takes inspiration from recent advancements in diffusion models and significantly improves texture quality. Experiments show that TextureDreamer produces highly realistic, semantically meaningful textures, surpassing the visual quality of previous state-of-the-art methods....

UniVG: Towards UNIfied-modal Video Generation

Published at 2024-01-18

#Computer Vision #Deep Learning #Natural Language Processing

Diffusion-based video generation has received extensive attention and achieved considerable success within both the academic and industrial communities. However, current efforts are mainly concentrated on single-objective or single-task video generation, such as generation driven by text, by image, or by a combination of text and image. This cannot fully meet the needs of real-world application scenarios, as users are likely to input images and text conditions in a flexible manner, either indivi...

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Published at 2024-01-18

#Deep Learning #Computer Vision

This paper proposes VideoCrafter2, a method to overcome data limitations in training high-quality video diffusion models by leveraging low-quality videos and synthesized high-quality images. The authors analyze the connection between spatial and temporal modules in video models and the distribution shift to low-quality videos, and observe that full training of all modules results in a stronger coupling. They then finetune spatial modules with high-quality images to shift the distribution to higher quality...

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Published at 2024-01-18

#Computer Vision #Deep Learning

This paper proposes a new vision backbone, Vision Mamba (Vim), for efficient visual representation learning using bidirectional state space models (SSMs) with position embeddings. Vim is shown to achieve higher performance than existing vision transformers like DeiT, while using fewer computational resources and less memory. This makes Vim a potential next-generation backbone for foundation vision models, and code is available at: https://github.com/hustvl/Vim....
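To make the phrase "bidirectional state space model" more concrete, here is a heavily simplified sketch: a scalar linear recurrence is scanned over a token sequence in both directions and the two outputs are summed. This is not the Vim implementation (which lives in the repository linked above); the fixed scalar parameters `a`, `b`, `c`, the shared weights for both directions, and the function names are assumptions chosen for illustration.

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Scan a scalar linear SSM: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def bidirectional_ssm(xs, a=0.5, b=1.0, c=1.0):
    """Sum a forward scan and a backward scan so each position sees both sides."""
    fwd = ssm_scan(xs, a, b, c)
    # Scan the reversed sequence, then flip the outputs back into input order.
    bwd = ssm_scan(xs[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(fwd, bwd)]


if __name__ == "__main__":
    # An impulse in the middle spreads to both neighbours, not only rightwards.
    print(bidirectional_ssm([0.0, 1.0, 0.0]))
```

A forward-only scan would leave the first position at zero for this input; the backward pass is what lets earlier positions receive information from later ones, which matters for images because patches have no natural causal order.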

Tags are generated by Google's Gemini Pro API, and the summary is generated by Upstage's SOLAR mini chat model derived from SOLAR-10.7B open LLM.

Visit Developer's Social Media
