Text To Image

Text-to-Image (also written text2image, txt2img, etc.) is the name for a family of machine learning algorithms that synthesize images from an arbitrary input text.

The most popular Text-to-Image model is DALLE-2 by OpenAI. Unfortunately it is closed source, but there is a waiting list for access (as of April 2022). Luckily, many open source implementations are available!

Since mid-2021, people have been experimenting with combining CLIP, a then newly released ML model by OpenAI, with image-generating models like BigGAN or VQGAN. The resulting models are named VQGAN+CLIP and CLIP-guided-Diffusion. An older attempt to generate images from text (before CLIP) was named AttnGAN.
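To make the idea of "combining CLIP with an image generating model" a bit more concrete, here is a toy sketch of the guidance loop in plain Python. It does not use the real CLIP or VQGAN: the generator, the image encoder and the text embedding are random linear stand-ins, and all function names (make_toy_models, score, guide) are hypothetical. The gradient is estimated with finite differences to keep the example dependency-free, whereas the real tools obtain it by backpropagation through CLIP and the generator. What the sketch shows is only the principle: nudge the generator's latent input so that the encoded image moves closer to the encoded text.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def cosine(u, v):
    """Cosine similarity, the measure CLIP-style models use to compare embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def make_toy_models(seed=0, latent_dim=8, image_dim=16, embed_dim=4):
    """Build random linear stand-ins (NOT real VQGAN/CLIP) for illustration."""
    rng = random.Random(seed)
    gen_w = [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(image_dim)]
    enc_w = [[rng.gauss(0, 1) for _ in range(image_dim)] for _ in range(embed_dim)]
    # Stand-in for the embedding of the text prompt.
    text_embedding = [rng.gauss(0, 1) for _ in range(embed_dim)]

    def generator(z):
        # Stand-in for the image generator: latent vector -> "image".
        return matvec(gen_w, z)

    def image_encoder(image):
        # Stand-in for the image encoder: "image" -> embedding.
        return matvec(enc_w, image)

    return generator, image_encoder, text_embedding


def score(z, generator, image_encoder, text_embedding):
    """How well the image generated from z matches the text embedding."""
    return cosine(image_encoder(generator(z)), text_embedding)


def guide(z, generator, image_encoder, text_embedding, steps=200, lr=0.1, eps=1e-4):
    """Gradient ascent on the latent z; a step is kept only if it improves the score."""
    z = list(z)
    current = score(z, generator, image_encoder, text_embedding)
    for _ in range(steps):
        # Finite-difference gradient estimate (assumption: replaces backprop).
        grad = []
        for i in range(len(z)):
            bumped = list(z)
            bumped[i] += eps
            bumped_score = score(bumped, generator, image_encoder, text_embedding)
            grad.append((bumped_score - current) / eps)
        candidate = [zi + lr * gi for zi, gi in zip(z, grad)]
        candidate_score = score(candidate, generator, image_encoder, text_embedding)
        if candidate_score > current:
            z, current = candidate, candidate_score
        else:
            lr *= 0.5  # step was too large, try a smaller one next time
    return z


if __name__ == "__main__":
    generator, image_encoder, text_embedding = make_toy_models(seed=0)
    rng = random.Random(1)
    z_start = [rng.gauss(0, 1) for _ in range(8)]
    z_end = guide(z_start, generator, image_encoder, text_embedding)
    print("similarity before:", score(z_start, generator, image_encoder, text_embedding))
    print("similarity after: ", score(z_end, generator, image_encoder, text_embedding))
```

In the actual VQGAN+CLIP notebooks, the latent would be VQGAN's latent code, the image encoder and text embedding would come from CLIP, and the optimization would be handled by a deep learning framework rather than by hand.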

On Twitter, Reddit, etc. you can find an explosion of visual media (images, films) created with these tools. Here are some links for inspiration:

Inspiration

  • Reddit DALLE-2: https://www.reddit.com/r/dalle2/top/
  • Weird DALL-E Mini Creations (Twitter): https://twitter.com/weirddalle
  • Weird DALL-E Mini Creations (Reddit): https://www.reddit.com/r/weirddalle/
  • https://twitter.com/images_ai
  • Reddit with lots of examples: https://www.reddit.com/r/bigsleep/

Reading & Watching

  • Video by Vox “The AI that creates any picture you want, explained”: https://www.youtube.com/watch?v=SVcsDDABEkM
  • Video about DALLE-2 (OpenAI): https://openai.com/dall-e-2/
  • Video by the great YouTube channel “Artificial Images” (Derrick Schultz): “How does CLIP Text-to-image generation work?”: https://www.youtube.com/watch?v=-b7xKWeADHQ
  • “VQGAN+CLIP how does it work” blogpost (08/2021) by Alexa Steinbrück (XLab Burg Halle): https://alexasteinbruck.medium.com/vqgan-clip-how-does-it-work-210a5dca5e52
  • “AI-Generated Art Scene Explodes as Hackers Create Groundbreaking New Tools” in: VICE (07/2021): https://www.vice.com/en/article/n7bqj7/ai-generated-art-scene-explodes-as-hackers-create-groundbreaking-new-tools
  • “Alien Dreams: An Emerging Art Scene” by Charlie Snell: https://ml.berkeley.edu/blog/posts/clip-art/
  • “Doorways” by Ryan Moulton: https://moultano.wordpress.com/2021/08/23/doorways/

Tools

(sorted from easy to more difficult)

Easy-to-use Web Applications (No Coding required)

Google Colab notebooks

Learn what Google Colab is here: Google Colab

GitHub Repos

Big projects

  • DALLE-2: https://openai.com/dall-e-2/
  • DALLE-Mega (open source)
  • DALLE-Mini (open source)
  • Midjourney

More (Lists)

Technicalities

  • How did they arrive at the VQGAN+CLIP architecture? “Tree of Knowledge” visualisation by LJ Miranda: https://ljvmiranda921.github.io/assets/png/vqgan/tree_of_knowledge.png (taken from the blogpost “The illustrated VQGAN”: https://ljvmiranda921.github.io/notebook/2021/08/08/clip-vqgan/)
  • “Explaining the code of the popular text-to-image algorithm (VQGAN+CLIP in PyTorch)” (04/2022) by Alexa Steinbrück (XLab Burg Halle): https://alexasteinbruck.medium.com/explaining-the-code-of-the-popular-text-to-image-algorithm-vqgan-clip-a0c48697a7ff
