Sora is a text-to-video AI model developed by OpenAI that can generate videos up to a minute long based on user prompts.
Sora is a diffusion model, that generates videos from a starting point of static noise. While large language models (LLMs) have text tokens, Sora has visual patches, an effective representation for models of visual data. Patches are scalable and allow generative models to be trained on a range of video and image types. Sora is a generalist model for visual data, generating videos and images of diverse durations, aspect ratios, and resolutions, outputting up to one minute of high-definition video. It can generate entire videos at once, extend previously generated videos to make them longer, add missing frames to an existing video, and animate an existing still to generate a video.