Meta, the company behind Facebook and Instagram, is developing new artificial intelligence (AI) tools for content creation and editing on its social media platforms.
On November 16, Meta announced in a blog post the introduction of two new AI models designed to enhance content creation and editing.
The first, Emu Video, builds on Meta’s existing Emu model and can generate video clips from text and image inputs. The second, Emu Edit, focuses on image editing with greater precision.
Both models remain in the experimental phase, but Meta says the initial results point to promising applications for content creators, artists and animators.
According to Meta’s blog post, Emu Video was trained with a “factorized” approach that divides the process into two steps, allowing the model to respond to different kinds of inputs: it first generates an image conditioned on a text prompt, then generates a video conditioned on both the text prompt and the generated image. Meta says this two-step approach makes models for video generation more efficient to train.
This factorized design also keeps the pipeline simple: Emu Video stands out for using just two diffusion models to produce four-second clips at 512×512 resolution and a fluid 16 frames per second.
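As a rough illustration, here is a minimal Python sketch of that two-step flow. The function names and shapes are assumptions drawn from the figures above (four seconds, 16 fps, 512×512); Emu Video itself is not publicly callable, so the stubs below only mimic the control flow and tensor shapes:

```python
import numpy as np

# Hypothetical stand-ins for the two diffusion models described in the post;
# these stubs only make the factorized control flow concrete.

def generate_image(prompt: str) -> np.ndarray:
    """Step 1: produce a 512x512 RGB image conditioned on the text prompt."""
    return np.zeros((512, 512, 3), dtype=np.uint8)

def generate_video(prompt: str, init_image: np.ndarray) -> np.ndarray:
    """Step 2: produce a clip conditioned on both the text prompt and the
    step-1 image: 4 seconds x 16 fps = 64 frames."""
    num_frames = 4 * 16
    return np.broadcast_to(init_image, (num_frames, *init_image.shape)).copy()

def emu_video_factorized(prompt: str) -> np.ndarray:
    image = generate_image(prompt)        # text -> image
    return generate_video(prompt, image)  # (text, image) -> video

clip = emu_video_factorized("a corgi surfing a wave at sunset")
print(clip.shape)  # (64, 512, 512, 3)
```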
In a separate advancement, Meta introduced Emu Edit for image editing. The tool can add or remove backgrounds, alter colors and shapes, and make both local and global changes to an image. The goal is not just realistic output but precise pixel modification that follows the user’s specific editing request.
Meta emphasizes precision in the editing process. For example, if a user asks to add the text “Aloha!” to a baseball cap in an image, the model should alter only the cap, incorporating the new text while leaving the cap’s original appearance, and the rest of the image, untouched. Meta says this close adherence to the user’s instruction is what sets the tool apart.
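In code terms, the contract described above might look like the following hypothetical interface. Emu Edit has no public API, so the function here is an invented placeholder that only illustrates the expected input, output and preservation property:

```python
import numpy as np

# Hypothetical instruction-based editing interface. The key property the
# article describes: only pixels relevant to the instruction change, so
# comparing input and output should show a small, localized difference.

def edit_image(image: np.ndarray, instruction: str) -> np.ndarray:
    """Placeholder for the model: return an edited copy of `image`."""
    edited = image.copy()
    # ... real model inference would modify only the instructed region ...
    return edited

photo = np.zeros((512, 512, 3), dtype=np.uint8)
result = edit_image(photo, "Add the text 'Aloha!' to the baseball cap")

# Pixels untouched by the edit should be identical to the input.
changed = np.any(photo != result, axis=-1)
print(f"{changed.mean():.1%} of pixels changed")  # ideally only the cap region
```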
To train Emu Edit, Meta built a dataset of 10 million synthesized samples covering a wide range of computer vision and image editing tasks. Each sample pairs an input image with a detailed task description and a corresponding targeted output image, which the company believes makes it the largest dataset of its kind.
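Based only on that description, a single training record plausibly bundles three fields. The class and field names below are assumptions for illustration, not Meta’s published schema:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class EditSample:
    """One synthesized training sample, per the description above."""
    input_image: np.ndarray   # the image before editing
    task: str                 # natural-language description of the edit
    target_image: np.ndarray  # the desired result of applying the edit

# Illustrative example of what one of the ~10 million samples could hold.
sample = EditSample(
    input_image=np.zeros((512, 512, 3), dtype=np.uint8),
    task="Change the background to a beach at sunset",
    target_image=np.zeros((512, 512, 3), dtype=np.uint8),
)
```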
These new models build on Meta’s underlying Emu model, which was trained on 1.1 billion pieces of data, including photos and captions shared by users on Facebook and Instagram, Meta CEO Mark Zuckerberg said during the Meta Connect event in September.