Hello,
Google has recently made quite a push in AI products. Below you can find a collection of interesting AI applications, most of them from Google:
1. GQN: Neural scene representation and rendering
In this work, published in Science (Open Access version), we introduce the Generative Query Network (GQN), a framework within which machines learn to perceive their surroundings by training only on data obtained by themselves as they move around scenes. Much like infants and animals, the GQN learns by trying to make sense of its observations of the world around it. In doing so, the GQN learns about plausible scenes and their geometrical properties, without any human labelling of the contents of scenes.
2. Phenaki: Realistic video generation from open-domain textual descriptions
We present Phenaki, a model that can synthesize realistic videos from textual prompt sequences.
Generating videos from text is particularly challenging due to various factors, such as high computational cost, variable video lengths, and limited availability of high quality text-video data.
3. W.A.L.T: Photorealistic video generation via diffusion modeling
We present W.A.L.T, a transformer-based approach for photorealistic video generation via diffusion modeling. Our approach has two key design decisions. First, we use a causal encoder to jointly compress images and videos within a unified latent space, enabling training and generation across modalities. Second, for memory and training efficiency, we use a window attention architecture tailored for joint spatial and spatiotemporal generative modeling. Taken together, these design decisions enable us to achieve state-of-the-art performance on established video (UCF-101 and Kinetics-600) and image (ImageNet) generation benchmarks without using classifier-free guidance. Finally, we also train a cascade of three models for the task of text-to-video generation consisting of a base latent video diffusion model, and two video super-resolution diffusion models to generate videos of 512×896 resolution at 8 frames per second.
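The window attention mentioned above restricts self-attention to local token windows instead of the full sequence, which is where the memory savings come from. As a rough illustration only (a minimal single-head NumPy sketch, not W.A.L.T's actual architecture, which applies spatial and spatiotemporal windows over video latents):

```python
import numpy as np

def window_attention(x, window_size):
    """Single-head self-attention computed independently within each
    non-overlapping window of the sequence (seq_len must be divisible
    by window_size in this simplified sketch)."""
    seq_len, d = x.shape
    out = np.empty_like(x)
    for start in range(0, seq_len, window_size):
        w = x[start:start + window_size]           # tokens in this window only
        scores = w @ w.T / np.sqrt(d)              # scaled dot-product scores
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        out[start:start + window_size] = weights @ w
    return out

x = np.random.default_rng(0).standard_normal((16, 8))
y = window_attention(x, window_size=4)
print(y.shape)  # (16, 8): same shape, but each token attended to only 4 tokens
```

Because each window is processed independently, cost scales with the window size rather than the full sequence length, and tokens in one window are unaffected by changes in another.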
4. VideoPoet: A simple modeling method that can convert any autoregressive language model or large language model (LLM) into a high-quality video generator.
5. Lumiere: A text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion, a pivotal challenge in video synthesis.
6. Lyria: Announcing our most advanced music generation model, and two new AI experiments designed to open a new playground for creativity.
7. SynthID: A toolkit that watermarks and identifies AI-generated content. These tools embed digital watermarks directly into AI-generated images, audio, text or video. In each modality, SynthID's watermarking technique is imperceptible to humans but detectable for identification.
8. Project Astra: A universal AI agent that is helpful in everyday life.
9. Veo: Our most capable generative video model.
10. Imagen 3: Our highest quality text-to-image model.
11. Google DeepMind: Bringing together two of the world's leading AI labs, Google Brain and DeepMind, into a single, focused team led by CEO Demis Hassabis. Over the last decade, the two teams were responsible for some of the biggest research breakthroughs in AI, many of which underpin the flourishing AI industry we see today.
12. GPT-4o (OpenAI): GPT-4o ("o" for "omni") is a step towards much more natural human-computer interaction. It accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is notably better at vision and audio understanding compared to existing models.
Regards.