Catching Up with Generative Computer Vision in the New Year: OpenAI GLIDE
by Adam Pease
On the heels of this year’s Consumer Electronics Show (CES), there is much to be excited about in tech in the new year. But as we watch new technologies get revealed, it can be easy to forget how transformative 2021 was as well. In this blog post, we review an exciting but easy-to-miss development in AI that came at the end of last year.
OpenAI and Generative Content
OpenAI, a Bay Area-based artificial intelligence company and Microsoft partner, recently announced the release of a new text-to-image model for generating visual content called GLIDE (Guided Language to Image Diffusion for Generation and Editing). GLIDE builds on OpenAI’s earlier model DALL-E (whose name combines the Spanish surrealist painter Salvador Dalí with Pixar’s WALL-E), which applies the approach behind the company’s GPT-3 text prediction model to images. When it was first revealed, DALL-E was lauded for its ability to produce relatively convincing images from text prompts. However, these fabrications, while convincing at a distance, often betrayed the telltale smudges of AI-generated images.
The results of GLIDE, by contrast, speak for themselves. Previous generative networks were capable of impressive results, but they struggled to produce images that could fool a human observer, often giving away their fabrication with smudges and bleeding edges. GLIDE’s results show clear edges and definite shapes.
While the full model is not available for public use, OpenAI’s demonstration of its power is nothing short of astounding. In many cases, the model was able to create photorealistic renditions of scenes that had never occurred, such as a convincing photograph of a hedgehog using a calculator. The recent crop of images appears far more consistent and higher in quality than the results of the DALL-E model, but time will tell if the results can be replicated.
Computer Vision and the Future of Content
Last year, Aragon published a Research Note predicting future trends in the computer vision market, and one key trend that we assessed was the rise of generative content. We predicted that organizations will turn to AI to produce visual content for them in the near future.
With GLIDE, it appears that the underlying technology for photorealistic generative content is already here. How long before marketing teams make use of AI models to visualize entire campaigns? With a model appropriately packaged as a SaaS product and delivered in a no-code interface, content marketers could quickly be creating anything from a stylized logo to a photorealistic product shoot featuring human beings that never existed.
AI Content and Usage Restrictions
While the results of GLIDE are impressive, it will likely be some time before these tools become table stakes for any content creator. The prohibitive compute power and processing time required to produce visual content mean that the value of generative technology will likely be linked to providers’ ability to supply the right kind of hardware support and packaging to make the user experience painless. Further, the GLIDE model, like many OpenAI products, is only partly accessible to the public. Microsoft’s exclusive GPT-3 license has led to a strict set of usage requirements designed to ensure ethical and safe use of AI products, but this also has the effect of limiting the growth of the open source ecosystem. It is too early to say exactly how these restrictions will shake out in terms of copyright law, but we expect that copyright issues will arise, considering that the model’s output is assembled from preexisting data that is not completely in the public domain.
Nevertheless, when looking at GLIDE’s results, it is perhaps a bit comforting to know that it is not yet possible for anyone with a computer to quickly generate a photorealistic image of whatever is in their imagination. Usage requirements and ethical guidelines will be essential in a world where computers are capable of producing fake images that human beings can no longer distinguish from real ones. The ramifications that the widespread use of photorealistic content generation will have on society at large cannot be overstated.
The reveal of GLIDE at the end of 2021 represents yet another OpenAI breakthrough that is driving the AI market forward. The photorealistic results of the algorithm suggest that the arrival of generative content is a question not of if but of when, and that when it does come, it will change the way we look at content forever.