Google Gemini Omni Brings Multimodal AI Video Creation to Life

Covertly AI
9 hours ago

Google is taking another major step in generative AI with the launch of Gemini Omni, a new family of AI models designed to “create anything from any input.” The first model in the lineup, Gemini Omni Flash, focuses on video creation. It can take a mix of text, images, audio, and video and turn those inputs into a single polished, coherent video. Rather than simply stitching different media together, Omni is built to reason across all of them, using Gemini’s broader understanding of the world to create videos with stronger realism, context, and storytelling.

Gemini Omni builds on Google’s earlier work with multimodal AI. Google says Gemini was designed from the beginning to understand different types of information, including text, code, audio, images, and video. Last year, Google’s Nano Banana image model brought Gemini’s intelligence into image generation and editing, helping users restore old photos, design from sketches, and visualize ideas. Now, Omni takes that same idea further by moving into video, where users can create and edit scenes through simple conversation instead of complicated software.

One of Omni’s biggest features is natural language video editing. Users can ask the model to change a specific part of a clip, transform the whole scene, add new objects or characters, or keep refining the video over multiple instructions. Google says the model can keep characters consistent, remember earlier details of the scene, and better model physical behavior such as gravity, motion, fluid movement, and energy. For example, Google showed prompts that turn a sculpture into bubbles, make a mirror ripple like liquid, or create a claymation-style explainer about protein folding.

The model is also meant to connect creativity with real-world knowledge. Google says Gemini Omni can use its understanding of science, history, culture, and physical movement to create videos that are not only realistic but also meaningful. This could make it useful for educational explainers, creative storytelling, advertising, filmmaking, and social media content. Omni Flash can currently create videos up to 10 seconds long, although Google said this cap is a product decision rather than a hard model limit. Longer videos are expected in the future, along with more advanced professional uses.

Google is also introducing avatar creation through Omni, allowing users to make videos that look and sound like them. To reduce the risk of deepfakes, users will need to go through a setup process, including recording themselves and speaking a series of numbers. Google says all videos made with Omni will also include its SynthID digital watermark, which helps people verify that the content was generated with Gemini. The company says this is part of its wider effort to make AI-generated content more transparent and safer to identify online.

For now, Gemini Omni Flash is being positioned mainly as a consumer-friendly tool. It is rolling out through the Gemini app, Google Flow, and YouTube Shorts, with free access starting on YouTube Shorts and the YouTube Create app. Google AI Plus, Pro, and Ultra subscribers will also get access globally. Developers and enterprise customers are expected to get API access in the coming weeks. Although Omni starts with video, Google’s longer-term goal is much bigger: a model that can eventually create images, audio, video, and more from almost any type of input.

Works Cited

Bellan, Rebecca. “Google’s Gemini Omni Turns Images, Audio, and Text into Video: And That’s Just the Start.” TechCrunch, 19 May 2026, techcrunch.com/2026/05/19/googles-gemini-omni-turns-images-audio-and-text-into-video-and-thats-just-the-start/.

Kavukcuoglu, Koray. “Introducing Gemini Omni.” Google Blog, 19 May 2026, blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni/.

Peters, Jay. “Gemini Omni Is a New Family of AI Models Meant to ‘Create Anything.’” The Verge, 19 May 2026, www.theverge.com/tech/933552/google-gemini-ai-omni-flash-media-video-io-2026.

“Gemini Omni.” YouTube, 2026, i.ytimg.com/vi/2m5BCWB02jY/hq720.jpg.

“Screenshot of Google Gemini Omni.” CNET, 19 May 2026, www.cnet.com/a/img/resize/942cb2975d371aa50ac700e7deb35abf5b6aeb3f/hub/2026/05/19/67780fc6-f94e-4df4-a94e-2fa7b2421516/screenshot-2026-05-19-at-11-15-01am.png.
