Video models

Compare every video generation model available in Kyoso.

Select a model by typing @ in the agent input. Models differ in the durations, aspect ratios, and frame attachments they support. A video generation typically takes 30–120 seconds.

Models

| Model | Best for | Durations | Aspect ratios | Frames you can attach | Audio |
|---|---|---|---|---|---|
| Kling O3 | Real-time, action-focused 4K video | 3–15s | 16:9, 9:16, 1:1 | Start, end, video ref | Keeps source audio |
| Veo 3.1 Fast | Fast, cinematic video generation | 4 / 6 / 8s | 16:9, 9:16 | Start, end | Generates audio |
| Sora 2 | Realistic, detailed, long-form video | 4 / 8 / 12s | 16:9, 9:16 | Start | |
| Kling V3 Standard | Physics-accurate 4K with camera control | 3–15s | 16:9, 9:16, 1:1 | Start | Generates audio |
| Kling O1 | Quick, vivid, action-focused synthesis | 5 / 10s | 16:9, 9:16, 1:1 | Start, end, video ref | |
| Grok Imagine | Real-time cinematic, movie-style physics | 3–15s | 7 ratios (widest range) | Start, video ref | |
| LTX 2 Fast | Open-source 4K with synced audio for fast iteration | 4 / 8s | 16:9, 9:16 | Start | Generates audio |

How to choose

  • Need start and end frames? Use Kling O3, Veo 3.1 Fast, or Kling O1.
  • Need synced audio? Use Veo 3.1 Fast, Kling V3 Standard, or LTX 2 Fast.
  • Need a video as a style reference? Use Kling O3, Kling O1, or Grok Imagine.
  • Need long clips? Sora 2 goes up to 12s; Kling O3, Kling V3 Standard, and Grok Imagine go up to 15s.
  • Need vertical, square, and landscape support? Kling O3, Kling V3 Standard, and Kling O1 all offer 16:9, 9:16, and 1:1; Grok Imagine has the widest range, with 7 aspect ratios.
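The rules above amount to filtering the Models table by your requirements. The Python sketch below illustrates that filtering. It is hypothetical and not a Kyoso API: `VideoModel`, `MODELS`, and `shortlist` are invented names, and the capabilities are transcribed from the table. Aspect ratios are left out because Grok Imagine's 7 ratios aren't listed individually, and only models marked "Generates audio" count as producing audio (Kling O3 keeps source audio instead).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoModel:
    name: str
    max_seconds: int          # longest clip length listed in the table
    frames: frozenset         # attachable inputs: "start", "end", "video_ref"
    generates_audio: bool     # True only where the table says "Generates audio"


# Hypothetical transcription of the Models table (aspect ratios omitted).
MODELS = [
    VideoModel("Kling O3", 15, frozenset({"start", "end", "video_ref"}), False),
    VideoModel("Veo 3.1 Fast", 8, frozenset({"start", "end"}), True),
    VideoModel("Sora 2", 12, frozenset({"start"}), False),
    VideoModel("Kling V3 Standard", 15, frozenset({"start"}), True),
    VideoModel("Kling O1", 10, frozenset({"start", "end", "video_ref"}), False),
    VideoModel("Grok Imagine", 15, frozenset({"start", "video_ref"}), False),
    VideoModel("LTX 2 Fast", 8, frozenset({"start"}), True),
]


def shortlist(frames=frozenset(), needs_audio=False, min_seconds=0):
    """Return names of models meeting every requirement, in table order."""
    return [
        m.name
        for m in MODELS
        if set(frames) <= m.frames                  # every requested frame type is supported
        and (m.generates_audio or not needs_audio)  # audio only matters when requested
        and m.max_seconds >= min_seconds            # can produce a clip at least this long
    ]
```

For example, `shortlist({"start", "end"}, needs_audio=True)` returns `["Veo 3.1 Fast"]`, the only model in the table that accepts both start and end frames and also generates audio.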
