Product attributes
Sora is a text-to-video AI model developed by OpenAI that can generate videos up to a minute long from user prompts. Sora can generate scenes with multiple characters, specific types of motion, accurate details of both the subject and the background, and multiple shots within a single generated video with persistent characters and a consistent visual style. OpenAI announced Sora on February 15, 2024, making the model available to red teamers to assess potential harms and risks. With the announcement, OpenAI also made the model available to visual artists, designers, and filmmakers to gather feedback on its performance.
Sora is a diffusion model: it generates a video by starting with static noise and gradually transforming it by removing the noise over many steps. Where large language models (LLMs) use text tokens, Sora uses visual patches, an effective representation for models of visual data. Patches are scalable and allow generative models to be trained on a wide range of video and image types. Sora is a generalist model for visual data that generates videos and images of diverse durations, aspect ratios, and resolutions, outputting up to one minute of high-definition video. It can generate an entire video at once, extend a previously generated video to make it longer, fill in missing frames in an existing video, and animate an existing still image into a video.
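OpenAI describes this denoising process only at a high level. The sketch below is a minimal, hypothetical illustration of the general idea, generating a video by starting from static noise and removing a little noise at each step; the `denoise_step` function, tensor shapes, and step count are assumptions for illustration, not Sora's actual implementation.

```python
import numpy as np

# Hypothetical illustration of diffusion-style video generation:
# start from pure static noise and repeatedly remove a little noise.
# All shapes and the denoiser itself are placeholders, not Sora internals.

FRAMES, HEIGHT, WIDTH, CHANNELS = 16, 64, 64, 3
NUM_STEPS = 50

def denoise_step(video: np.ndarray, step: int) -> np.ndarray:
    """Placeholder denoiser: a real model would predict the noise
    component (conditioned on the text prompt) and subtract it."""
    predicted_noise = np.random.normal(scale=0.01, size=video.shape)
    return video - predicted_noise

# Start from static noise ...
video = np.random.normal(size=(FRAMES, HEIGHT, WIDTH, CHANNELS))

# ... and gradually transform it over many denoising steps.
for step in range(NUM_STEPS):
    video = denoise_step(video, step)
```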
Sora uses a transformer architecture similar to that of OpenAI's GPT models. At a high level, Sora turns videos into patches by compressing them into a lower-dimensional latent space and decomposing that representation into spacetime patches. The model builds on previous OpenAI research behind the Dall-E and GPT models; in particular, it uses the recaptioning technique from Dall-E 3, which involves generating highly descriptive captions for the visual training data.
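OpenAI does not disclose the patch sizes or the exact patchification scheme. The snippet below is a rough sketch of the general idea of spacetime patches: a (latent) video tensor is cut into small space-time blocks, and each block is flattened into a token-like vector for a transformer to process. All dimensions and names here are illustrative assumptions.

```python
import numpy as np

# Illustrative spacetime "patchification": cut a video (or its latent
# representation) into space-time blocks and flatten each block into a
# token-like vector. All sizes are made up for the example.
frames, height, width, channels = 16, 64, 64, 4   # e.g. a latent video
pt, ph, pw = 4, 8, 8                               # patch size in time, height, width

latent_video = np.random.randn(frames, height, width, channels)

patches = (
    latent_video
    .reshape(frames // pt, pt, height // ph, ph, width // pw, pw, channels)
    .transpose(0, 2, 4, 1, 3, 5, 6)                # group the patch axes together
    .reshape(-1, pt * ph * pw * channels)          # one flat vector per patch
)

print(patches.shape)  # (number_of_patches, patch_dimension) -> (256, 1024)
```

Each row of `patches` plays a role analogous to a text token in an LLM, which is what makes the representation scalable across videos of different durations, resolutions, and aspect ratios.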
OpenAI states that Sora has a number of weaknesses, including difficulty accurately simulating the physics of complex scenes and not understanding specific instances of cause and effect. Sora can also confuse the spatial details of a prompt or struggle with precise descriptions of events that unfold over time. OpenAI is working with red teamers who are adversarially testing Sora in areas such as misinformation, hateful content, and bias. The company states it is also building tools to detect misleading content and is leveraging existing safety methods built for other products, such as Dall-E 3. OpenAI did not disclose details of the footage Sora was trained on, stating only that the training corpus contained publicly available videos and videos licensed from copyright owners.