Hello,
Google has recently made quite a push in AI products. Below you can find a collection of interesting AI applications, most of them from Google:
1. GQN: Neural scene representation and rendering
In this work, published in Science (Open Access version), we introduce the Generative Query Network (GQN), a framework within which machines learn to perceive their surroundings by training only on data obtained by themselves as they move around scenes. Much like infants and animals, the GQN learns by trying to make sense of its observations of the world around it. In doing so, the GQN learns about plausible scenes and their geometrical properties, without any human labelling of the contents of scenes.
2. Phenaki: Realistic video generation from open-domain textual descriptions
We present Phenaki, a model that can synthesize realistic videos from textual prompt sequences.
Generating videos from text is particularly challenging due to various factors, such as high computational cost, variable video lengths, and limited availability of high quality text-video data.
3. W.A.L.T: Photorealistic video generation via diffusion modeling
We present W.A.L.T, a transformer-based approach for photorealistic video generation via diffusion modeling. Our approach has two key design decisions. First, we use a causal encoder to jointly compress images and videos within a unified latent space, enabling training and generation across modalities. Second, for memory and training efficiency, we use a window attention architecture tailored for joint spatial and spatiotemporal generative modeling. Taken together, these design decisions enable us to achieve state-of-the-art performance on established video (UCF-101 and Kinetics-600) and image (ImageNet) generation benchmarks without using classifier-free guidance. Finally, we also train a cascade of three models for the task of text-to-video generation consisting of a base latent video diffusion model, and two video super-resolution diffusion models to generate videos of 512×896 resolution at 8 frames per second.
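The window attention mentioned above restricts self-attention to local token windows instead of the full sequence, which is where the memory savings come from. As a rough illustration only (a minimal single-head NumPy sketch, not W.A.L.T's actual architecture, which applies spatial and spatiotemporal windows over video latents):

```python
import numpy as np

def window_attention(x, window_size):
    """Single-head self-attention computed independently within each
    non-overlapping window of the sequence (seq_len must be divisible
    by window_size in this simplified sketch)."""
    seq_len, d = x.shape
    out = np.empty_like(x)
    for start in range(0, seq_len, window_size):
        w = x[start:start + window_size]           # tokens in this window only
        scores = w @ w.T / np.sqrt(d)              # scaled dot-product scores
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        out[start:start + window_size] = weights @ w
    return out

x = np.random.default_rng(0).standard_normal((16, 8))
y = window_attention(x, window_size=4)
print(y.shape)  # (16, 8): same shape, but each token attended to only 4 tokens
```

Because each window is processed independently, cost scales with the window size rather than the full sequence length, and tokens in one window are unaffected by changes in another.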
4. VideoPoet: A simple modeling method that can convert any autoregressive language model or large language model (LLM) into a high-quality video generator.
5. Lumiere: A text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion, a pivotal challenge in video synthesis.
6. Lyria: Announcing our most advanced music generation model, and two new AI experiments designed to open a new playground for creativity.
7. SynthID: A toolkit that watermarks and identifies AI-generated content. These tools embed digital watermarks directly into AI-generated images, audio, text or video. In each modality, SynthID's watermarking technique is imperceptible to humans but detectable for identification.
8. Project Astra: A universal AI agent that is helpful in everyday life.
9. Veo: Our most capable generative video model.
10. Imagen 3: Our highest quality text-to-image model.
11. Google DeepMind: Bringing together two of the world's leading AI labs, Google Brain and DeepMind, into a single, focused team led by CEO Demis Hassabis. Over the last decade, the two teams were responsible for some of the biggest research breakthroughs in AI, many of which underpin the flourishing AI industry we see today.
12. GPT-4o (OpenAI): GPT-4o ("o" for "omni") is a step towards much more natural human-computer interaction. It accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is notably better at vision and audio understanding compared to existing models.
Regards.