NVIDIA Enters the Text-to-Image Fray
By Adam Pease
NVIDIA recently announced the release of its first major text-to-image diffusion tool, called eDiffi.
The tool enters a rapidly changing market that has seen many new entrants lately.
In this second edition of the Aragon Research Blog Series for Content AI, we explore the implications of this announcement.
2022 has been rife with news about text-to-image artificial intelligence, which can interpret a user’s written prompt and return a generated image in seconds.
Models such as OpenAI’s DALL-E 2 and Stability AI’s Stable Diffusion have demonstrated impressive leaps in AI’s ability to generate human-quality images.
NVIDIA has long been an important background player for many state-of-the-art machine learning models that depend on NVIDIA chips and its CUDA programming architecture.
Now, NVIDIA has dipped its toes into the water with the release of a new text-to-image model, eDiffi.
eDiffi stands out from other text-to-image models like DALL-E 2 in the architecture of its algorithm.
Today’s diffusion models rely on algorithms called encoders that help the AI system understand what image a user’s text might refer to.
In the past, providers like OpenAI have selected one encoder to use for their algorithms.
However, eDiffi is the first to combine multiple state-of-the-art encoders to provide a richer and more varied understanding of image content.
The resulting algorithm allows users to more carefully modify aspects of image style, editing the image more precisely.
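To make the multi-encoder idea concrete, here is a minimal sketch in Python. It is not NVIDIA’s implementation: the encoders are hypothetical stand-ins that hash each word into a few numbers, and the names make_toy_encoder, combine_encodings, encoder_a, and encoder_b are invented for illustration. Real systems use large learned encoders and more sophisticated conditioning, but the sketch shows the basic point that pairing two encoders gives the image generator a wider description of each word than either encoder provides alone.

```python
import hashlib
from typing import Callable, List

Vector = List[float]
Encoder = Callable[[str], List[Vector]]


def make_toy_encoder(name: str, dim: int) -> Encoder:
    """Build a hypothetical stand-in encoder (not a real model).

    Each whitespace-separated token is hashed into `dim` numbers in [0, 1],
    so the output is deterministic but carries no learned meaning.
    """
    if not 1 <= dim <= 32:
        raise ValueError("dim must be between 1 and 32 (SHA-256 yields 32 bytes)")

    def encode(prompt: str) -> List[Vector]:
        vectors = []
        for token in prompt.lower().split():
            digest = hashlib.sha256(f"{name}:{token}".encode("utf-8")).digest()
            vectors.append([byte / 255.0 for byte in digest[:dim]])
        return vectors

    return encode


def combine_encodings(prompt: str, encoders: List[Encoder]) -> List[Vector]:
    """Concatenate each token's vectors from every encoder, feature-wise."""
    per_encoder = [encode(prompt) for encode in encoders]
    combined = []
    # zip(*per_encoder) groups the vectors that describe the same token.
    for token_vectors in zip(*per_encoder):
        merged = []
        for vector in token_vectors:
            merged.extend(vector)
        combined.append(merged)
    return combined


# Two assumed encoders with different output widths (4 and 6 numbers per token).
encoder_a = make_toy_encoder("encoder_a", dim=4)
encoder_b = make_toy_encoder("encoder_b", dim=6)
conditioning = combine_encodings("a red fox", [encoder_a, encoder_b])
print(len(conditioning), len(conditioning[0]))  # prints: 3 10
```

Each of the three words ends up with ten numbers, four from the first toy encoder and six from the second. Feature-wise concatenation is only one plausible way to merge encoder outputs and was chosen here for simplicity.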
Will NVIDIA Win the Diffusion Model Arms Race?
While NVIDIA’s new model brings interesting capabilities to the table, it still faces strong competition from other players such as Midjourney, which recently released a striking update to its image generation model.
In the current landscape of text-to-image AI, it is difficult to imagine a monopoly emerging, not least because there is no clear consensus about the best pricing model for these kinds of AI systems.
Still, NVIDIA has advantages as a hardware provider that may become clearer as it moves its generative content research out of the R&D stage and brings a product to market.
Aragon Research expects that model quality will converge around a similar median, with different models achieving slightly varied results on different benchmarked tasks, such as generating artistic versus photorealistic images.
Success in the market will depend on how providers market the relative merits of their models and develop platforms that improve the user experience for content creators.
Bottom Line
NVIDIA’s eDiffi represents yet another foray into text-to-image from a significant technology player, and shouldn’t be overlooked.
The fierce competition over generative content is pushing the market forward quickly, and it remains to be seen which provider will find the right way to package and market these solutions.