Google DeepMind Launches Gemini Omni AI Model That Can Create and Edit Video From Any Input

Google’s new Gemini Omni Flash model lets users generate and edit video using text, images, audio, or existing footage — with built-in watermarking and a rollout across YouTube and the Gemini app.

Imagine describing a short film in plain English, uploading a handful of photos, and watching a coherent video appear — then tweaking it by simply typing “make the sky look like dusk” or “keep the same character but change the background to a beach.” That’s the promise behind Google DeepMind’s latest announcement: Gemini Omni, a new AI model family built to generate and edit video from virtually any combination of inputs.

For anyone who’s ever struggled with video editing software, or paid a freelancer to put together a 30-second promotional clip, this could change how everyday people and small businesses approach content creation. But it also raises some serious questions about what we can trust when we watch a video online.

What Gemini Omni Actually Does

Google DeepMind announced Gemini Omni as a model family designed to “create anything from any input — starting with video.” The first model released under this banner is Gemini Omni Flash, which accepts text, images, audio, and video as inputs and can produce video outputs from any combination of those.

What sets it apart from earlier tools is that it’s trained natively across all those modalities at once, rather than stitching together separate specialist systems. Before Omni, Google used distinct models for different tasks — Veo for video generation, Imagen for images, separate systems for audio. Omni is meant to bring all of that under one roof, allowing the model to reason across different types of media simultaneously.

One of the headline features is conversational video editing. Users can modify a video through natural language instructions — saying what they want changed — while the model keeps the physics, character continuity, and scene consistency intact. Google describes Omni as having an intuitive grasp of how the physical world works: gravity, fluid dynamics, how objects interact. Whether that holds up in practice across complex real-world footage is unclear, but it’s an ambitious claim.

Google also says Omni is grounded in factual knowledge — history, science, culture — aiming to bridge what it calls “photorealism and meaningful storytelling.”

Where You’ll Find It

Gemini Omni Flash is being rolled out across several Google products: the Gemini app, Google Flow (Google’s AI creative workflow tool), and YouTube Shorts. YouTube Create is also set to gain Omni-based features, and Google says these will be available at no additional cost to users of that app.

Broader access is tied to Google’s AI subscription tiers. In the US, those are priced at around $7.99 per month for AI Plus (roughly £6), $19.99 per month for AI Pro (roughly £15), and $249.99 per month for AI Ultra (roughly £190). UK pricing hasn’t been formally confirmed, and regional availability may vary, so don’t expect every feature to reach the UK at the same time as the US.

Google has also indicated plans to extend Omni beyond video over time, with images and audio as future output modalities. The @GoogleDeepMind account described it as “our first step towards a model that can create anything from anything — starting with video,” signalling a longer-term roadmap rather than a finished product.

The Safety Question: Watermarks and Provenance

Given how convincing AI-generated video is becoming, Google has built two layers of transparency into Omni’s outputs.

Every video generated by Omni carries a SynthID watermark — an invisible signal embedded in the footage that’s designed to survive common edits like cropping and re-encoding. It won’t be visible to the naked eye, but it can be detected by compatible tools. Google says verification will be available through the Gemini app, with support being extended to Chrome and Google Search.

Alongside that, Omni outputs include C2PA Content Credentials: cryptographically signed metadata that travels with the file and signals that it was AI-generated. C2PA is a cross-industry standard developed by the Coalition for Content Provenance and Authenticity, and its adoption by Google aligns with wider efforts to create consistent signals for synthetic media.

Critics, however, point out that watermarking only helps if platforms and users actually check for it. Not every service will adopt these standards, and bad actors motivated to misuse the technology may find ways around them. The EU AI Act and UK government proposals on AI regulation both highlight the need for provenance tools — but the regulatory frameworks are still catching up with the technology.

Concerns From the Creative Sector

The UK’s creative industries contributed around £126 billion in gross value added to the economy in 2022, according to the Department for Culture, Media and Sport. A significant portion of that comes from exactly the kinds of roles, such as video editors, animators, VFX artists and production staff, whose work tools like Omni are designed to make more accessible to non-specialists.

That’s not a simple story. Easier tools can open up creative work to people who couldn’t previously afford professional production. But they also put pressure on the people who do that work for a living, and questions about training data, copyright, and the rights of content creators whose work may have been used to train these models remain unresolved and, in some cases, actively contested.

Demis Hassabis, chief executive of Google DeepMind, has spoken publicly about Omni and related models as steps towards more general-purpose AI. The company frames Omni as a “world model” — a system that doesn’t just generate images but understands how the world works. It’s a bold framing, and one that places Omni in the same broad conversation as OpenAI’s multimodal GPT-4o and similar systems from other companies racing to build AI that can perceive and reason across different types of information.

What This Means for Kent Residents

Kent residents who use YouTube, the Gemini app, or Google’s wider suite of products are likely to encounter Omni-powered features as they roll out in the UK — above all in YouTube Shorts and YouTube Create, where AI-assisted video tools are already being integrated. Small businesses and creators across the county could find it cheaper and faster to produce promotional or social media video without specialist skills, though UK pricing and availability haven’t yet been confirmed. There’s also a media literacy angle worth keeping in mind: as AI-generated video becomes more common and more convincing, knowing how to question whether footage is real becomes a genuinely useful skill for all of us — whether we’re watching local news clips, council communications, or content shared in community groups.

Source: @GoogleDeepMind
