Interacting with the Video Outline

This guide explains how to turn a text prompt into a detailed video outline and then convert that outline into a fully rendered video. It also covers the optional parameters available for customization and describes the structure of the outline sections.

Generating a Video Outline from a Prompt

The process begins with the /prompt-to-outline endpoint. This endpoint accepts a text prompt, optionally supplemented with website URLs and a target script word count, and returns a structured video outline. The outline consists of one or more sections, each containing text content and optional metadata such as voice, title, or subtitle. This structure serves as the blueprint for the final video.

Example Request

Below is an example JSON payload for generating a video outline:

{
  "prompt": "What is MCP Protocol?",
  "websiteUrls": [
    "https://modelcontextprotocol.io/introduction",
    "https://www.anthropic.com/news/model-context-protocol"
  ],
  "targetScriptWordCount": 250
}

In this example, the prompt requests an explanation of the MCP Protocol. The API uses the provided URLs as additional context and attempts to generate an outline that approximates the target word count.
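The request above can be assembled programmatically. The following is a minimal Python sketch, not an official client: the base URL is a placeholder, the helper name build_outline_request is hypothetical, and authentication is omitted because this guide does not describe it. The sketch only builds the request; sending it is left as a comment.

```python
import json
import urllib.request

# Placeholder base URL (an assumption); substitute the real API host.
BASE_URL = "https://api.example.com"


def build_outline_request(prompt, website_urls=None, target_word_count=None):
    """Build a POST request for /prompt-to-outline (hypothetical helper)."""
    payload = {"prompt": prompt}
    # Optional fields are included only when a value is supplied.
    if website_urls:
        payload["websiteUrls"] = list(website_urls)
    if target_word_count is not None:
        payload["targetScriptWordCount"] = int(target_word_count)
    return urllib.request.Request(
        url=f"{BASE_URL}/prompt-to-outline",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_outline_request(
    "What is MCP Protocol?",
    website_urls=["https://modelcontextprotocol.io/introduction"],
    target_word_count=250,
)
# To send it: urllib.request.urlopen(request), then parse the JSON response.
```

The returned outline can be inspected or edited before it is passed to the next endpoint.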

Converting the Outline to Video

After a video outline is generated, the next step is to pass it to the /outline-to-video endpoint. This endpoint renders the structured outline as a video, using the details specified within the outline together with additional parameters that control the video's style, dimensions, and audio.

Example Request with Advanced Optional Parameters

In addition to the basic parameters, the outline can include several optional parameters that customize video generation. The following example uses them:

{
  "outline": {
    "sections": [
      {
        "text": "The Model Context Protocol (MCP) is revolutionizing how artificial intelligence interacts with various data sources.",
        "voice": "Matilda",
        "title": "Introduction to MCP",
        "subtitle": "Bridging AI and Data",
        "overlayType": "TITLE_SCREEN"
      },
      {
        "text": "MCP empowers applications to provide dynamic context to large language models by integrating multiple data sources seamlessly.",
        "voice": "Matilda"
      },
      {
        "text": "Its open-source design fosters innovation, inviting developers to contribute and create more intelligent, context-aware systems.",
        "voice": "Matilda"
      }
    ],
    "useGetty": true,
    "useGenerativeImage": false,
    "imageGenStyle": "",
    "musicUrl": "https://example.com/audio/background-track.mp3",
    "musicVolume": 0.75,
    "captionDetails": {
      "captionFontName": "Verdana",
      "captionFontWeight": 700,
      "captionFontSize": 75,
      "captionTextColor": {
        "red": 255,
        "green": 255,
        "blue": 255
      },
      "captionTextJustification": "CENTER",
      "captionVerticalAlignment": "BOTTOM",
      "captionStrokeColor": {
        "red": 0,
        "green": 0,
        "blue": 0
      },
      "captionStrokeWeight": 2,
      "captionBackgroundStyleType": "WRAPPED",
      "captionBackgroundColor": {
        "red": 0,
        "green": 0,
        "blue": 0
      },
      "captionBackgroundBorderRadius": 1,
      "captionBackgroundOpacity": 0.5,
      "captionIsHidden": false
    }
  },
  "aspectRatio": {
    "width": 16,
    "height": 9
  },
  "minDimensionPixels": 1080,
  "webhookUrl": "https://your-webhook-url.com/endpoint"
}
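A payload of this shape can also be assembled in code, which avoids hand-written JSON mistakes such as unquoted keys. The following is a minimal Python sketch: the helper name build_video_payload is hypothetical, and its default values simply mirror the example above rather than any documented API defaults.

```python
import json


def build_video_payload(outline, width=16, height=9,
                        min_dimension_pixels=1080, webhook_url=None):
    """Wrap an outline dict in a request body for /outline-to-video.

    Hypothetical helper; defaults mirror the example payload only.
    """
    if not outline.get("sections"):
        raise ValueError("outline must contain at least one section")
    payload = {
        "outline": outline,
        "aspectRatio": {"width": width, "height": height},
        "minDimensionPixels": min_dimension_pixels,
    }
    if webhook_url:
        payload["webhookUrl"] = webhook_url
    return payload


outline = {
    "sections": [
        {"text": "MCP connects language models to data sources.",
         "voice": "Matilda"}
    ],
    "useGetty": True,
    "musicVolume": 0.75,
}
# json.dumps guarantees quoted keys and no trailing commas.
body = json.dumps(
    build_video_payload(outline,
                        webhook_url="https://your-webhook-url.com/endpoint"),
    indent=2,
)
```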

Explanation of the Optional Parameters

useGetty:
A boolean flag that instructs the API to use Getty Images as a source for backgrounds.

useGenerativeImage:
When set to true, the API generates images using a generative image model.

imageGenStyle:
This parameter accepts a URL that points to a style template or a reference image. The URL guides the generative image model to produce visuals that match the specified style, thereby achieving a customized artistic look.

musicUrl and musicVolume:
These parameters control the background audio for the video. The musicUrl should point to the desired audio file, while musicVolume sets the playback volume.

captionDetails:
This object allows precise customization of the appearance of text overlays (captions) in the video:

captionFontName, captionFontWeight, and captionFontSize determine the typeface, weight, and size for captions.

captionTextColor defines the text color using RGB values.

captionTextJustification sets the horizontal alignment (for example, "CENTER", "LEFT", or "RIGHT").

captionVerticalAlignment specifies whether the caption should be aligned at the top, middle, or bottom (in this example, "BOTTOM").

captionStrokeColor and captionStrokeWeight control the outline (stroke) color and thickness around the text.

captionBackgroundStyleType indicates the style of the background behind the caption (for example, "SOLID"; the example above uses "WRAPPED").

captionBackgroundColor sets the background color behind the caption text.

captionBackgroundBorderRadius defines the degree of rounding for the background corners.

captionBackgroundOpacity controls the transparency level of the background.

captionIsHidden is a flag that, when set to true, causes the captions to be hidden entirely.
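The captionDetails object repeats the same red/green/blue structure three times, so a small helper can reduce duplication. The following is a minimal Python sketch: the helper names rgb and caption_details are hypothetical, and the 0 to 255 channel range is an assumption based on the values in the example, not a documented constraint.

```python
def rgb(red, green, blue):
    """Return a color object in the shape used inside captionDetails."""
    for name, value in (("red", red), ("green", green), ("blue", blue)):
        # Assumption: channels are integers from 0 to 255, as in the example.
        if not isinstance(value, int) or not 0 <= value <= 255:
            raise ValueError(f"{name} must be an integer from 0 to 255, got {value!r}")
    return {"red": red, "green": green, "blue": blue}


def caption_details(font_name="Verdana", font_size=75, hidden=False):
    """Build a captionDetails object; values mirror the example payload."""
    return {
        "captionFontName": font_name,
        "captionFontWeight": 700,
        "captionFontSize": font_size,
        "captionTextColor": rgb(255, 255, 255),      # white text
        "captionTextJustification": "CENTER",
        "captionVerticalAlignment": "BOTTOM",
        "captionStrokeColor": rgb(0, 0, 0),          # black outline
        "captionStrokeWeight": 2,
        "captionBackgroundStyleType": "WRAPPED",
        "captionBackgroundColor": rgb(0, 0, 0),      # black background
        "captionBackgroundBorderRadius": 1,
        "captionBackgroundOpacity": 0.5,
        "captionIsHidden": hidden,
    }


details = caption_details(font_size=60)
```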

Understanding the Section Structure

The video outline is composed of one or more sections that determine how the content is segmented and presented. Each section may include fields such as:

text: The script for the section.

voice: An optional field specifying the voice-over to be used during the section.

title and subtitle: Headers that emphasize key points or provide additional context.

overlayType: A specifier that determines the type of visual overlay (for example, "TITLE_SCREEN" for a title screen).
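The section fields listed above can be modeled as a small data type. The following is a minimal Python sketch: the Section class is a hypothetical illustration rather than part of any official SDK, and it treats every field except text as optional, which is an assumption drawn from the examples in this guide.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Section:
    """One outline section; only `text` is assumed to be required."""
    text: str
    voice: Optional[str] = None
    title: Optional[str] = None
    subtitle: Optional[str] = None
    overlay_type: Optional[str] = None

    def to_dict(self):
        # Map the Python field name to the JSON key and drop unset fields.
        renames = {"overlay_type": "overlayType"}
        return {
            renames.get(key, key): value
            for key, value in asdict(self).items()
            if value is not None
        }


intro = Section(
    text="The Model Context Protocol (MCP) connects AI to data sources.",
    voice="Matilda",
    title="Introduction to MCP",
    overlay_type="TITLE_SCREEN",
)
sections = [intro.to_dict(), Section(text="MCP is open source.").to_dict()]
```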

Flexible Use of Sections

If a complete script is already available, there is no requirement to create multiple sections. The entire script may be packaged into a single section and passed to the /outline-to-video endpoint.

Below is an example JSON payload that utilizes a single section:

{
  "outline": {
    "sections": [
      {
        "text": "Welcome to an in-depth exploration of the Model Context Protocol (MCP). This video will provide a comprehensive overview of how MCP is transforming the manner in which artificial intelligence integrates with multiple data sources. The discussion will cover the protocol's architecture, benefits, and practical applications in modern systems.",
        "voice": "Matilda"
      }
    ]
  }
}

In this scenario, the complete script is contained within the text field of a single section. This method simplifies the outline structure when multiple segments are unnecessary.
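Wrapping an existing script in an outline can be sketched in Python as follows. The helper name outline_from_script is hypothetical, and the optional paragraph splitting (on blank lines) is an illustrative convention, not behavior of the API.

```python
def outline_from_script(script, voice=None, split_paragraphs=False):
    """Package a finished script as an /outline-to-video request body."""
    if split_paragraphs:
        # One section per blank-line-separated paragraph.
        chunks = [part.strip() for part in script.split("\n\n") if part.strip()]
    else:
        # The whole script becomes a single section.
        chunks = [script.strip()]
    if not chunks or not chunks[0]:
        raise ValueError("script is empty")
    sections = []
    for chunk in chunks:
        section = {"text": chunk}
        if voice:
            section["voice"] = voice
        sections.append(section)
    return {"outline": {"sections": sections}}


script = "Welcome to an overview of MCP.\n\nFirst, the architecture."
single = outline_from_script(script, voice="Matilda")
multiple = outline_from_script(script, voice="Matilda", split_paragraphs=True)
```

Here single holds one section containing the full script, while multiple holds one section per paragraph.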

Conclusion

Together, the /prompt-to-outline and /outline-to-video endpoints convert a text prompt into a rendered video. The optional parameters customize the visuals, background audio, and captions. The section structure is also flexible: use multiple sections when the content calls for distinct segments, or a single section when a complete script is already prepared.
