Google Launches Veo 2 - What is it?

Google Veo 2 is one of those AI tools that just appeared on everyone’s feed—suddenly, everyone’s talking about it. But what is it, really? And why does it matter?

Let’s clear the air.

What is Google Veo 2?

Google—specifically Google DeepMind, a subsidiary of Alphabet Inc. (the same parent company behind Google Search, YouTube, etc.)—has launched an AI model called Veo 2. And suddenly, a lot of people are talking about it. In this industry, a new product launches into the market every day, just to test its survival. Everyone’s waiting to see: what will make people click?

Which product will be worth it? Which won’t? And because of this daily supply rush, most people don’t even notice every new tool that launches.

It’s like Instagram Reels—people create content on so many topics that demand doesn’t match the supply. Sure, it’s a feature, and yes, it’s nice to see. But do you really need to see every single reel? Probably not. Similarly, not every new product catches the audience’s eye.

Why Is Everyone Talking About It?

Well, the ones that do catch attention—they boom.

People love tools that help them solve their problems. And right now, everyone wants to be a content creator. We saw that with the whole Ghibli thing too—yes, it stirred up some controversy, but people used it anyway. Why? Because it made creating easier, and everyone loves to create.

Even you’re here, reading this, probably because you want to build or make something.

That’s where Veo comes in. It’s a model from DeepMind, and its integration with YouTube helps creators overcome a big hurdle—friction. Making high-quality content takes effort. Veo helps cut that down.

YouTube Integrates Google Veo 2 = Less Friction

This isn’t the first time YouTube has tried to reduce friction. It has already tested features like:

  • Green Screen (changing video backgrounds),

  • Cut (reusing clips from other videos),

  • Sound Library (providing free music tracks).

These were useful—but minor. Veo 2 is different. It lets users generate full-blown video content from just a prompt.

Now imagine this:

You type in a sentence.
You choose a vertical or widescreen format.
Boom—a polished video clip, ready to post.

That’s the real deal. It reduces production time and makes content creation accessible to people without fancy gear or editing skills.

And let’s be real—when others start using it, the rest of us feel that FOMO. We don’t want to be left behind in the creator economy.

This helps producers gain traction while viewers get value in exchange.

How Does Veo 2 Actually Work?

Veo 2 is a generative video model developed by Google DeepMind. It turns short text or image prompts into high-quality, cinematic videos.

Right now, it supports just two aspect ratios:

  • 9:16 (for Shorts),

  • 16:9 (for standard YouTube content).

Still no 4:5 support, so it’s not ideal for Instagram posts—but good enough for the Google ecosystem.

Here’s the current input format:

| Parameter | Type | Description |
| --- | --- | --- |
| prompt | String | Descriptive text guiding the video content (e.g., “sunset over ocean”). |
| image | Base64 string or URI | Optional. An image to guide video generation. Recommended size: 1280x720 or 720x1280 pixels. |
| aspect_ratio | String | Defines video orientation. Options: "16:9" (landscape), "9:16" (portrait). Note: 9:16 isn’t supported by veo-3.0-generate-preview. |
| durationSeconds | Integer | Length of the video in seconds. Acceptable values: 5 to 8. Default is 8. |
| sampleCount | Integer | Number of video variations to generate. Range: 1 to 4. |
| seed | Integer | Optional. Sets a seed for deterministic outputs. Range: 0 to 4,294,967,295. |
| negativePrompt | String | Optional. Specifies elements to exclude from the video. |
| enhancePrompt | Boolean | Optional. If true, uses Gemini to enhance the prompt. Default is true. |
| personGeneration | String | Controls generation of people/faces. Options: "allow_adult" (default), "dont_allow". |
| generateAudio | Boolean | Required for veo-3.0-generate-preview. Determines if audio is generated. |
| storageURI | String | Optional. Cloud Storage URI to save the output video. If not provided, the video is returned as base64-encoded bytes. |

Sources: Google Cloud Vertex AI Documentation
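Putting the parameters together, a request body might look like the sketch below. This is a minimal illustration only—the field names mirror the table above, but treat the exact payload shape as an assumption and verify it against the current Vertex AI docs:

```python
# Sketch of a Veo text-to-video request body (Vertex AI style).
# Field names follow the parameter table above; verify against Google's docs.

def build_veo_request(prompt: str, aspect_ratio: str = "16:9") -> dict:
    """Assemble the JSON body for a text-to-video generation request."""
    if aspect_ratio not in ("16:9", "9:16"):
        raise ValueError("Veo 2 only supports 16:9 and 9:16")
    return {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "aspectRatio": aspect_ratio,
            "durationSeconds": 8,       # 5 to 8 allowed; 8 is the default
            "sampleCount": 1,           # 1 to 4 variations
            "enhancePrompt": True,      # let Gemini rewrite the prompt
            "personGeneration": "allow_adult",
        },
    }

body = build_veo_request("sunset over calm ocean", "9:16")
```

Notice the aspect-ratio guard: since only two ratios are supported, it’s worth failing fast on anything else (like 4:5) before spending quota.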

Let’s pull back the curtain and look at the architecture behind Google Veo 2. Here’s a simplified view of what happens when you hit “generate”:

This system is built on Google Cloud Vertex AI and powered by DeepMind’s multimodal video model. When you send a prompt (text/image), it flows through a pipeline:

  1. Frontend (Notebook/API) sends your inputs.

    Example: "sunset over calm ocean" or an image of a beach.

    Note: Keep it visual and cinematic.

  2. Prompt Validator / Cleaner
    Note: Google’s system likely strips emojis, sanitizes strings, checks policy violations (e.g., hate, nudity).

  3. Gemini Enhancer

    Example: Your raw prompt "robot in city" becomes "a shiny humanoid robot walking through a neon-lit futuristic cityscape at night"

    Note: Gemini’s magic makes prompts more cinematic—optional but powerful.

  4. Veo API Layer

    Note: This is the versioned gateway like veo-3.0-generate-preview, which handles user inputs and forwards requests to the correct model.

  5. Billing + Quota Checker

    Note: If you don’t enable billing or exceed quotas, this step blocks inference. Treat this as a “soft wall.”

  6. Safety Filter / Moderation

    Note: Based on personGeneration, it may prevent generating faces, kids, NSFW scenes, etc.

  7. Veo 2 Core Model

    Note: The star of the show—turns prompt/image into 8-second video with optional audio.

  8. Async Job Manager

    Example: You ask for 3 samples. Backend queues 3 jobs, polls status, retries on fail.

    Note: Adds fault tolerance + scalability.

  9. Output Formatter

    Options: base64 for web return, OR gs://your-bucket/video.mp4 for persistent storage.

    Note: Ideal for automation pipelines and larger workflows.
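The nine steps above can be sketched as a plain pipeline. Everything here is illustrative—the function names and blocklist are mine, not Google’s—but it captures the order of operations:

```python
# Illustrative pipeline mirroring the steps above; all names are hypothetical.

BLOCKED_TERMS = {"nsfw"}  # stand-in for Google's real policy checks

def clean_prompt(prompt: str) -> str:
    """Step 2: sanitize the raw prompt (strip emojis / non-ASCII, trim)."""
    return "".join(ch for ch in prompt if ch.isascii()).strip()

def enhance_prompt(prompt: str) -> str:
    """Step 3: stand-in for Gemini's cinematic rewrite."""
    return f"a cinematic shot of {prompt}, high detail"

def moderate(prompt: str) -> bool:
    """Step 6: reject prompts that trip the safety filter."""
    return not any(term in prompt.lower() for term in BLOCKED_TERMS)

def generate_video(prompt: str, sample_count: int = 1) -> list[str]:
    """Steps 1-9 end to end: returns one output URI per requested sample."""
    prompt = clean_prompt(prompt)
    prompt = enhance_prompt(prompt)
    if not moderate(prompt):
        raise ValueError("prompt rejected by safety filter")
    # Step 8: queue one render job per sample (sequential here for clarity).
    return [f"gs://my-bucket/video_{i}.mp4" for i in range(sample_count)]

uris = generate_video("robot in city", sample_count=3)
```

In production these stages run asynchronously with retries, but the sequencing—clean, enhance, moderate, render, format—is the same.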

In short:

  • Vertex AI handles API orchestration, including billing, auth, and Gemini prompt enhancement.

  • The Veo 2 model processes your prompt into a short video.

The pipeline feels similar to how LLM APIs operate, only here the model returns high-res motion instead of text tokens.
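One practical difference from chat-style LLM calls: video generation is a long-running job, so the client polls until the operation reports done. A hedged sketch of that loop (the `fetch_status` callable is a stand-in for whatever operation-status call the real SDK exposes):

```python
import time

def poll_operation(fetch_status, interval: float = 0.0, max_tries: int = 10):
    """Poll a long-running video job until it reports done.

    fetch_status is a stand-in for the real SDK's operation-status call;
    it should return a dict like {"done": bool, "response": ...}.
    """
    for _ in range(max_tries):
        op = fetch_status()
        if op.get("done"):
            return op.get("response")
        time.sleep(interval)  # back off between polls
    raise TimeoutError("video generation did not finish in time")

# Simulate a job that completes on the third poll.
states = iter([{"done": False}, {"done": False},
               {"done": True, "response": "gs://my-bucket/out.mp4"}])
result = poll_operation(lambda: next(states))
```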

Billing Pitfalls

Well… I tried using it. But here’s what I found:

You need either:

  • a Gemini API key with enough quota, or

  • a Google Cloud project with billing enabled.

Now maybe it’s free after that setup—but I couldn’t explore fully. Why? Because of one major caveat.

Google Cloud billing is like a postpaid plan.
It doesn’t warn you when you’re entering paid territory.
You can rack up charges just by playing around.

So I backed off. Will check again with more control over billing later.

Who should use it?

Imagine a small YouTuber trying to post fresh content daily but lacking fancy editing tools or a team. With Veo 2, they just type a prompt like “morning coffee vibes with city skyline” and get a ready-to-post video—saving hours of work.

If you’ve ever delayed posting because editing takes forever, Veo 2 is your shortcut. Just prompt and post.

Or picture a social media manager handling multiple brands. Instead of shooting and editing clips for every platform, they generate quick videos using Veo’s vertical format for Stories—speeding up their workflow dramatically.

But remember the billing risk: if they’re not careful, experimenting with Veo 2 on Google Cloud could lead to surprise charges—kind of like getting an unexpected phone bill after trying a new app.

In my honest opinion, Veo 2 shows a lot of promise for the creator economy, but beware the billing pitfall if you’re trying it out. Treat it like a beta toy — fun to explore but watch your wallet. For now, it’s more of a tool for experimenters than the everyday user.

If Google refines pricing and adds more aspect ratios, this could be a real threat to Canva’s video tools and CapCut.


Written by

Anish Srivastava

Currently pursuing advancements in the software development industry while actively contributing to the open-source community. Focused on developing web applications and refactoring codebases to improve efficiency and maintainability.