OpenAI video generation AI "Sora" announced! Amazing quality [What can we do?]

https://www.youtube.com/watch?v=O24JLP45Mxo&t=894s

This time, OpenAI has announced the video generation AI "Sora", so let's look at a sample video. We will also review the technical report and see what we can do. The quality was shocking, and I was speechless from start to finish.


Sora
https://openai.com/sora

https://openai.com/research/video-generation-models-as-world-simulators

1)
We're teaching AI to understand and simulate the physical world in motion, with the goal of training models to help humans solve problems that require real-world interaction.

We present Sora, our text-to-video model. Sora can generate videos of up to one minute in length while maintaining visual quality and adherence to the user's prompt.

2)
Today, Sora becomes available to red teamers to assess critical areas for harms or risks. We are also giving access to some visual artists, designers and filmmakers to get feedback on how to evolve the model to be most helpful to creative professionals.

We're sharing the progress of our research early so that we can work with and get feedback from people outside of OpenAI and give the public a sense of what AI capabilities are on the horizon.

3)
Sora can create complex scenes with multiple characters, specific types of movement, and accurate details of the subject and background. The model understands what the user has asked for in the prompt and how those things exist in the physical world.

4)
The model deeply understands language, enabling it to interpret prompts accurately and create compelling characters that express vivid emotions. Sora can also produce multiple shots within a single generated video, accurately portraying characters and visual style.

5)
The current model has shortcomings. It can struggle to simulate the physics of a complex scene accurately and may not understand certain instances of cause and effect. For example, a person may bite into a cookie, but the cookie may not have a bite mark afterwards.

The model may also confuse spatial details of a prompt, for example, confusing left and right, and struggle with precise descriptions of events that take place over time, such as following a particular camera trajectory.

6)
Safety
We're taking several safety steps before we make Sora available in OpenAI's products. We are working with red teamers - domain experts in misinformation, hateful content and bias - who will adversarially test the model.

For example, once in an OpenAI product, our text classifier will check and reject text prompts that violate our terms of use, such as those that request extreme violence, sexual content, hateful imagery, celebrity likenesses, or the IP of others. We've also developed robust image classifiers that check the frames of every video generated to ensure it complies with our terms of use before it's shown to the user.

We're engaging with policymakers, educators and artists worldwide to understand their concerns and identify positive use cases for this new technology. Despite extensive research and testing, we cannot predict all the beneficial ways people will use our technology or how people will abuse it. Learning from real-world use is critical to developing and releasing increasingly safe AI systems over time.

7)
Research techniques
Sora is a diffusion model that generates a video by starting with what looks like static noise and gradually transforming it by removing the noise over many steps.
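Conceptually, that reverse-diffusion process is a loop: start from random noise shaped like a video and repeatedly nudge it toward a clean sample. The toy sketch below illustrates only the loop structure, not Sora's actual architecture; the `predicted_clean` stand-in (here simply zeros) is where a real model would run a learned denoising network.

```python
import numpy as np

def denoise_step(x, step, total_steps):
    """One toy reverse-diffusion step: blend the current noisy sample
    toward a 'predicted clean video'. A real model would use a learned
    neural network's prediction here; zeros is a stand-in."""
    predicted_clean = np.zeros_like(x)
    alpha = step / total_steps          # how far along the denoising schedule we are
    return alpha * predicted_clean + (1 - alpha) * x

def generate(shape=(8, 16, 16, 3), total_steps=50, seed=0):
    """Start from pure Gaussian noise (frames x height x width x channels)
    and iteratively denoise, as a diffusion video model does conceptually."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)      # the 'static noise' starting point
    for step in range(1, total_steps + 1):
        x = denoise_step(x, step, total_steps)
    return x

video = generate()
print(video.shape)  # (8, 16, 16, 3) - 8 toy frames
```

After the final step the sample has converged to the stand-in target; with a trained network in place, the same loop converges to a plausible video instead.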

Sora builds on previous research in DALL-E and GPT models. It uses the recaptioning technique from DALL-E 3, which involves generating highly descriptive captions for the visual training data. This allows the model to follow the user's textual instructions in the generated video more faithfully.
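The recaptioning idea itself is simple to state in code: run a captioning model over the visual training data and train on the (descriptive caption, video) pairs instead of the original sparse captions. This is a hypothetical sketch of the data-preparation step only; `captioner` stands in for a learned captioning model, and the names are illustrative, not OpenAI's.

```python
def recaption_dataset(videos, captioner):
    """Recaptioning (conceptually, as in DALL-E 3's training recipe):
    replace sparse original captions with highly descriptive generated
    ones, then train text-to-video on these richer pairs."""
    return [(captioner(video), video) for video in videos]

# Toy stand-ins: videos are just IDs, and the captioner returns a verbose description.
videos = ["clip_001", "clip_002"]
toy_captioner = lambda v: f"A detailed, shot-by-shot description of {v}"

pairs = recaption_dataset(videos, toy_captioner)
print(pairs[0][0])  # "A detailed, shot-by-shot description of clip_001"
```

The payoff comes at generation time: a model trained on dense, descriptive captions has learned a much tighter mapping from words to visual details, so it follows user prompts more faithfully.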

In addition to generating a video from text instructions alone, the model can take an existing still image and create a video from it, animating its contents with accuracy and attention to small details. The model can also extend an existing video or fill in missing frames. Read more in our technical report.
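One common way diffusion models support all three modes (animating a still, extending a video, filling in frames) is masked conditioning: at every denoising step, the frames the user supplied are written back in, so the model only has to invent the masked-out frames. The sketch below shows that clamping pattern with the same toy denoiser as before; it is an illustration of the general technique, not Sora's published method.

```python
import numpy as np

def generate_with_conditioning(known_frames, mask, total_steps=50, seed=0):
    """Toy masked-conditioning loop: `mask` is 1 where frames are given
    (e.g. an input still image) and 0 where they must be generated.
    Clamping the known frames each step keeps them pixel-exact while
    the denoiser fills in the rest."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(known_frames.shape)
    for step in range(1, total_steps + 1):
        alpha = step / total_steps
        x = (1 - alpha) * x                        # toy denoise toward zeros
        x = mask * known_frames + (1 - mask) * x   # clamp the given frames
    return x

frames = np.ones((8, 4, 4, 3))               # pretend 8-frame video of all ones
mask = np.zeros((8, 1, 1, 1))
mask[0] = 1                                  # only the first frame is given
out = generate_with_conditioning(frames, mask)
print(out[0].min())  # 1.0 - the conditioning frame is preserved exactly
```

Extending a video is the same trick with the existing frames masked as known and new frames appended as unknown; frame interpolation masks the endpoints and generates the middle.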

Sora is a foundation for models that can understand and simulate the real world, a capability that we believe will be an essential milestone in achieving AGI.

Creating a video from text
Sora is an AI model that can create realistic and imaginative scenes from text instructions.

https://openai.com/sora