Words in motion! US-based startup unveils AI model that generates videos from mere words

ChatGPT is dominating the headlines on generative AI right now. Yet there is a lot more to generative AI than ChatGPT and the language models behind it. Text-to-image is already becoming part of mainstream conversation, and brewing in the background is generative AI capable of converting text to video.

What is text-to-video AI?
Simply put, text-to-video AI generates videos from nothing but words. Key in a text prompt, and the AI model produces a video based on it. US-based startup Runway has showcased its Gen-2 model, which can do that, with a caveat or two.

Is this a ‘new’ thing?
Not really. It is very much like DALL-E, the text-to-image model developed by OpenAI, the creator of ChatGPT, and it likewise relies on generative AI models that interpret text prompts. The results are captivating and could certainly catch the fancy of many across the world.

Is ‘Big Tech’ not involved in text-to-video?
It very much is. In September 2022, Meta showcased a rather obviously named tool, Make-A-Video. Given just a few words or lines of text, Make-A-Video creates videos using generative AI, though those videos have no sound. Here’s what Meta CEO Mark Zuckerberg said about it: “It’s much harder to generate video than photos because beyond correctly generating each pixel, the system also has to predict how they’ll change over time.” Just a week later, as if on cue, Google announced a similar generative AI model called Imagen Video. “Given a text prompt, Imagen Video generates high-definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models,” Google said. Google also showcased another model, Phenaki, which is aimed at creating long-form videos from text inputs.

What are the challenges with text-to-video AI?
The challenges are many, ranging from operational to ethical. Perhaps that’s one of the reasons why only demos of text-to-video generative AI models have emerged so far. For starters, generating a video from text might sound easy and fascinating, but describing an entire video in words alone is hard. One has to be incredibly precise with the prompts, or the model could generate the video equivalent of gibberish. There are also ethical challenges. AI-generated videos could be the next weapon in the misinformation arsenal, and deepfakes could become an even bigger problem. Considering the fast-paced developments in the field of AI, it could be only a matter of time before text-to-video gets out of exploration mode and becomes mainstream.
