Introduction
The VideoGen API turns a script, voiceover, or slideshow into a finished video with visuals, narration, and captions. Start a workflow with a single API call, then poll or subscribe to webhooks for the result. The same API also exposes standalone tools for generating individual images, video clips, voiceovers, and more.
Base URL
What you can build
The API has two layers: full video workflows that produce a finished video, and standalone tools that generate a single asset.
Workflows
Workflows are the core of the API. Each one creates a VideoGen project and runs the full generation pipeline (visuals, narration, captions) from a single input. Start a workflow with one POST /v1/workflows/* call, then poll the run or wait for a webhook. When it finishes, you get a projectUrl and can export an MP4 through the Projects API.
Available workflows: script to video, voiceover to video, and slideshow to video.
See the Workflows guide for inputs, options, and the run lifecycle.
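The start-then-poll pattern described above can be sketched as a small helper. This is a minimal illustration, not the official client: the fetch_run callable, the "status" field, and the status values "completed" and "failed" are assumptions made for the example, so check the Workflows guide for the real run shape. The HTTP call is left to the caller so the sketch does not guess at endpoint paths or authentication.

```python
import time
from typing import Any, Callable, Mapping

# Assumed terminal status names; the real values are defined by the API.
TERMINAL_STATUSES = frozenset({"completed", "failed"})


def wait_for_run(
    fetch_run: Callable[[str], Mapping[str, Any]],
    run_id: str,
    *,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Mapping[str, Any]:
    """Poll a run until it reaches a terminal status or the timeout elapses.

    fetch_run is a caller-supplied (hypothetical) function that performs the
    GET request for a run and returns the decoded JSON body.
    """
    deadline = clock() + timeout
    while True:
        run = fetch_run(run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            raise TimeoutError(
                f"run {run_id} still {run.get('status')!r} after {timeout}s"
            )
        sleep(interval)
```

Passing sleep and clock as parameters keeps the loop testable without real delays. In production, a webhook subscription avoids polling altogether.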
Tools
Tools generate one asset at a time, with no project or pipeline. Each POST /v1/tools/* call is asynchronous and returns a toolExecutionId to poll. Available tools:
- Generate images from text or an existing image.
- Generate video clips from text, an image, or a video.
- Convert text to speech with 100+ voices.
- Generate sound effects and music from a prompt.
- Create avatar videos with a presenter.
- Upscale images and video, remove image and video backgrounds, vectorize images, and add 3D motion to a still image.
See the REST API reference for every tool, its request fields, and response shape.
Conventions
Timestamps
Every numeric timestamp field in the API (expiresAt, occurredAt, createdAt, and any future additions) is an integer representing seconds since the Unix epoch (UTC, no milliseconds). For example, 1745409600 corresponds to 2025-04-23T12:00:00Z.
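Because the values are in seconds rather than milliseconds, they can be converted with the standard library alone. The helper names below are illustrative and not part of the API.

```python
from datetime import datetime, timezone


def from_api_timestamp(seconds: int) -> datetime:
    """Convert an API timestamp (whole seconds since the epoch) to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def to_api_timestamp(moment: datetime) -> int:
    """Convert an aware datetime to whole seconds since the epoch."""
    if moment.tzinfo is None:
        # Naive datetimes would be interpreted in local time, so reject them.
        raise ValueError("expected a timezone-aware datetime")
    return int(moment.timestamp())


print(from_api_timestamp(1745409600).isoformat())  # 2025-04-23T12:00:00+00:00
```

Runtimes whose date types expect milliseconds (for example, JavaScript's Date constructor) need the value multiplied by 1000 first.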