Google introduced the Imagen neural network, which generates pictures from text descriptions.
And it does it at least as well as DALL-E 2.
Daria Gromova
Google has announced Imagen, a neural network that turns a text query into images. It is a direct competitor to OpenAI's DALL-E 2, and in some scenarios it works even better.
To understand a text query, the neural network uses large language models, the natural language processing algorithms that underpin systems like GPT-3.
The system works in three stages. First, it draws a small 64 x 64 pixel image and iteratively refines it until it better matches the original request. The image is then upscaled to 256 x 256 pixels, with Imagen filling in the details. In the third stage, the same process is repeated on the canvas of the final size, 1024 x 1024 pixels.
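For readers who find it easier to see the flow in code, here is a minimal Python sketch of this three-stage cascade. Everything in it is a hypothetical stand-in (Imagen's code is not public): the stub classes only illustrate how a text embedding conditions a base model and two successive upscaling models.

```python
import numpy as np

# Hypothetical stand-ins for Imagen's components: the real system uses
# diffusion models conditioned on embeddings from a frozen language model.
class StubDiffusionModel:
    def sample(self, cond, size, low_res=None):
        # A real model would iteratively denoise toward an image that
        # matches `cond` (and upscale `low_res` in the later stages);
        # here we just return a blank canvas of the requested size.
        return np.zeros((size, size, 3))

def generate(prompt, text_encoder, base, sr_256, sr_1024):
    emb = text_encoder(prompt)                                   # text -> embedding
    img_64 = base.sample(cond=emb, size=64)                      # stage 1: 64 x 64 draft
    img_256 = sr_256.sample(cond=emb, size=256, low_res=img_64)  # stage 2: refine details
    return sr_1024.sample(cond=emb, size=1024, low_res=img_256)  # stage 3: final canvas

image = generate(
    "A panda making latte art",
    text_encoder=lambda p: np.random.rand(128),  # stand-in for the language model
    base=StubDiffusionModel(),
    sr_256=StubDiffusionModel(),
    sr_1024=StubDiffusionModel(),
)
print(image.shape)  # (1024, 1024, 3)
```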
The research paper notes that Imagen understands complex queries better than DALL-E 2. For example, for the prompt "A panda making latte art," DALL-E 2 produced only latte art depicting a panda, while Google's neural network managed mostly correct results, with the panda actually making the drink.
But Google also admits that neither neural network coped with the prompt "A horse riding an astronaut": both stubbornly put the astronaut on the horse, not the other way around. Both clearly have room to grow.
Evaluations by independent human raters show that Imagen outperforms DALL-E 2 both in image quality and in how well the result matches the prompt. And although such a comparison is inherently subjective, the results are still impressive, given that DALL-E 2 has so far been the unattainable benchmark that no neural network of a similar purpose could match.
In any case, for now Imagen remains an experimental project that ordinary users cannot access. It is unclear how long it will take before Google builds a publicly available service around it.
Cover: Google