OpenAI's Sora: The text-to-video model
Introduction
In this article, I'll introduce Sora, a text-to-video model developed by OpenAI. I will explore its capabilities, limitations, and safeguards, provide examples to illustrate how it works, and share my personal insights.
What is Sora
Sora is a text-to-video model from OpenAI that can generate up to one minute of high-quality video from text instructions (prompts). Sora was built by OpenAI, based in San Francisco, California, and announced on February 15, 2024. At the time of writing, it is not yet available to the public. OpenAI's findings suggest that scaling video generation models is a viable path toward building universal simulators of the real world. Sora represents visual data as patches, much as large language models (LLMs) represent text as tokens, because patches have previously proven to be an effective representation for models of visual data.
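To make the patch idea concrete, here is a minimal sketch in Python with NumPy of how a video can be cut into "spacetime patches" and flattened into a token-like sequence. This is an illustration under assumptions, not Sora's code, which is not public: it works on raw pixels for simplicity, whereas OpenAI's technical report describes extracting patches from a compressed latent representation of the video. The function name `to_spacetime_patches` and the patch sizes are hypothetical.

```python
import numpy as np

def to_spacetime_patches(video, pt, ph, pw):
    """Split a (T, H, W, C) video into flattened spacetime patches.

    Hypothetical helper for illustration. Returns an array of shape
    (num_patches, pt * ph * pw * C): one row ("token") per patch.
    """
    T, H, W, C = video.shape
    # Assumption: dimensions divide evenly (a real system might pad or resize).
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    # Split each axis into (number of patches, patch length).
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Put the patch-grid axes first, then the within-patch axes.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # Flatten every patch into a single vector.
    return x.reshape(-1, pt * ph * pw * C)

# Hypothetical toy clip: 16 frames of 64x64 RGB noise.
video = np.random.default_rng(0).random((16, 64, 64, 3))
tokens = to_spacetime_patches(video, pt=4, ph=16, pw=16)
print(tokens.shape)  # (64, 3072)
```

Each row covers 4 frames of a 16x16 pixel region, so a 16-frame 64x64 clip becomes a sequence of 4 x 4 x 4 = 64 tokens, which is the kind of sequence a transformer can consume in the same way an LLM consumes text tokens.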
Examples of Sora
All videos on the Sora website were generated directly by Sora without modification.
Here are a few examples of videos generated by Sora from text instructions.
Example 1
Prompt (text-instruction): Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.
Output
Click the image to watch the video generated from the prompt:
Example 2
Prompt: A young man at his 20s is sitting on a piece of cloud in the sky, reading a book.
Output
Click the image to watch the video generated from the prompt:
At first glance, these videos look real; without being told, you would not guess that they were generated by AI from text instructions (prompts). Watch more videos generated from text instructions here.
Capabilities of Sora
Sora has the following capabilities:
- Sora can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background.
- Sora can generate compelling characters that express vibrant emotions, thanks to its deep understanding of language.
- Sora can generate videos from text instructions (prompts).
- Sora can animate images generated by the DALL·E 3 model.
For example:
Prompt: A Shiba Inu dog wearing a beret and black turtleneck.
- Sora can extend an existing video or fill in its missing parts. This method can be used to extend a video both forward and backward to produce a seamless infinite loop.
- Sora can edit existing images and videos from text prompts. This technique enables Sora to transform the styles and environments of input videos zero-shot, that is, without any additional training.
Input video:
Prompt: make it go underwater
Output video:
- Sora can connect videos by interpolating between two input videos, blending them into a single video with a seamless transition.
Video 1:
Video 2:
Output video (video 1 and video 2 connected into one): if you weren't told, you wouldn't know the transition was created with Sora.
- Sora is also capable of generating images.
For example: A snowy mountain village with cozy cabins and a northern lights display, high detail and photorealistic dslr, 50mm f/1.2
Weaknesses
The current Sora model has weaknesses. It may struggle to accurately simulate the physics of a complex scene and may not understand specific instances of cause and effect. For example:
Prompt: Basketball through hoop then explodes.
Output:
Weakness in Output: An example of inaccurate physical modeling and unnatural object “morphing.”
After hitting the net, the basketball reappears and passes through it as though it were not a solid object.
These lapses in the current model point to areas where further research and development are needed to improve it.
Advantages of Sora
OpenAI's Sora could transform how many jobs are done. In my opinion, it will make some jobs redundant, but it will also create new jobs and opportunities, such as:
Prompt engineers for video content.
Digital marketers can easily create video adverts for businesses.
Teachers, instructors, and tutors can generate videos from prompts to explain a scenario.
Videographers, animators, and content creators can quickly try out ideas for planning video content, or edit an existing video, using text instructions.
With Sora, more high-quality videos can be generated in less time. This is an advantage for content creators who spend hours producing a 30-second advert.
Safety
At the time of publishing this article, OpenAI is working with red teamers, who are domain experts, to assess critical areas for harm or risk. Sora builds on the existing safety methods developed for DALL·E 3, and OpenAI plans to add new ones. Behind the scenes, the OpenAI team is building safeguards such as a text classifier that checks and rejects harmful text input, tools for detecting misleading content, and the ability to tell when a video was generated by Sora.
Personal Opinions
On safety, I feel that Sora should be made available once the public is ready to comply with its safety policies. The main concern is the kind of text instructions people can give Sora, even though there may be algorithms to check for impersonation and other side effects.
For example, imagine a text instruction that has Sora generate political propaganda using the identities of real people, or content that could lead to chaos. Not everyone will realize, or believe, that it is AI-generated. This could be one of Sora's limitations, because such misuse could be harmful to humanity.
As for its impact on jobs, more jobs will be created, even though Sora will reshape many jobs or make them redundant. It's a tool, and the best way to respond is to learn how to take advantage of its capabilities.
Resources for Further Research
Conclusion
OpenAI's Sora is another tool like ChatGPT, but it creates high-quality video from text instructions (prompts). Sora's capabilities will have both negative and positive impacts on different careers and jobs. Learning how to use Sora is one way to be better positioned for the specializations and jobs it will create. For example, a video animator who spends time using scripts to make animations could instead write effective prompts to get high-quality video in a short time.
Let me know what you think about Sora in the comment section.
Found this helpful? Kindly share it with others, and feel free to use the comment section for questions, answers, and contributions.
Written by
Alemoh Rapheal Baja
I'm a software engineer with over 5 years of experience in technology and a track record of building web applications and mobile applications, as well as technical writing. I'm always open to exploring innovations in ICT and using them creatively to build solutions, advancing my career in software engineering, and contributing to the growth of tech communities around the world. I'm also passionate about working with organizations, especially in tech, security, and the airline industry, to learn and to make contributions that will be a lasting legacy. Tech is Easy to Learn - https://amazon.com/dp/B0B8T838K