DeepFloyd

DeepFloyd is an AI research lab within Stability AI that is developing a text-to-image generation model.

Is a: Product

Product attributes

Industry: Generative AI
Product Parent Company: Stability AI
Competitors: Dall-E 2, Midjourney
Overview

DeepFloyd is a multimodal AI research lab developing a text-to-image generator model called IF. The DeepFloyd team works within Stability AI. IF is designed to improve on other AI models with respect to generating text and captions in images based on the prompt provided. Stability AI released a non-commercial research preview of DeepFloyd IF on April 28, 2023, providing research labs the opportunity to examine and experiment with the text-to-image model. Stability AI plans to release IF as a fully open-source model in the future.

Examples of images generated using DeepFloyd IF.

IF is a modular, cascaded, pixel diffusion model, which means:

  • Modular: the model consists of several neural networks that solve independent tasks such as generating images from prompts or upscaling.
  • Cascaded: IF models high-resolution data in a cascading manner using a series of individually trained models at different resolutions. The process begins with a base model that produces unique low-resolution samples, which are then upscaled by successive models known as amplifiers.
  • Diffusion: the base and super-resolution models are diffusion models, in which a Markov chain of steps gradually injects random noise into data and the model learns to reverse that process to generate new samples.
  • Pixel: this diffusion is implemented at the pixel level, unlike latent diffusion models (such as Stable Diffusion) that operate on latent representations.
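
The noising half of the diffusion process described above can be sketched in a few lines. This is a minimal illustration, not DeepFloyd's implementation: the linear noise schedule, its start and end values, the 1000-step length, and all function names are assumptions chosen for the example, and the "image" is a toy list of pixel values.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t] is the product of (1 - beta[s]) for all s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(pixels, t, alpha_bars, rng):
    """Sample the noisy x_t directly from the clean x_0 in pixel space."""
    a = alpha_bars[t]
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [signal * p + noise * rng.gauss(0.0, 1.0) for p in pixels]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

x0 = [0.5] * 16                              # toy flattened image
x_mid = add_noise(x0, 499, alpha_bars, rng)  # partly noised
x_end = add_noise(x0, 999, alpha_bars, rng)  # almost pure noise
```

Because `alpha_bars` shrinks toward zero, the share of the original signal falls at every step. A trained diffusion model runs this chain in reverse, starting from noise.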

Images are generated in three stages. Before the first stage, the text prompt is passed through the frozen T5-XXL language model, which converts it into a text representation that the diffusion models are conditioned on.

  1. The base diffusion model transforms natural language text into a 64x64 image. DeepFloyd has trained three versions of the base model, each with a different number of parameters: IF-I 400M, IF-I 900M, and IF-I 4.3B.
  2. To ‘amplify’ the image, two text-conditional super-resolution models (Efficient U-Net) are applied to the output of the base model. The first of these upscales the 64x64 image to a 256x256 image. Two versions of this model are available: IF-II 400M and IF-II 1.2B.
  3. The second super-resolution diffusion model is then applied to produce a vivid 1024x1024 image. This third-stage model, IF-III, has 700M parameters.
Diagram showing the image generation process of DeepFloyd IF and the various models it uses.
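
The flow of resolutions through the three stages can be sketched as follows. Every function here is a hypothetical stand-in: `encode_prompt` is not T5-XXL, `base_stage` is not IF-I, and `upscale_stage` is plain nearest-neighbour upscaling rather than a diffusion model. Only the stage order, the resolutions, and the parameter counts come from the description above.

```python
# Model variants named in the text, keyed by stage.
STAGE_VARIANTS = {
    "IF-I": ["400M", "900M", "4.3B"],   # base model, 64x64 output
    "IF-II": ["400M", "1.2B"],          # 64x64 -> 256x256
    "IF-III": ["700M"],                 # 256x256 -> 1024x1024
}

def encode_prompt(prompt):
    """Stand-in for the frozen text encoder: returns a dummy embedding."""
    return [float(ord(ch) % 7) for ch in prompt]

def base_stage(embedding, size=64):
    """Stand-in for the base model: a flat size x size grayscale grid."""
    fill = (sum(embedding) % 256) / 255.0
    return [[fill] * size for _ in range(size)]

def upscale_stage(image, embedding, factor=4):
    """Stand-in for a super-resolution stage (nearest-neighbour upscaling).

    The embedding argument mirrors text conditioning but is unused here.
    """
    return [
        [px for px in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]

def generate(prompt):
    embedding = encode_prompt(prompt)            # encoded once, reused
    stage1 = base_stage(embedding)               # 64x64
    stage2 = upscale_stage(stage1, embedding)    # 256x256
    stage3 = upscale_stage(stage2, embedding)    # 1024x1024
    return stage1, stage2, stage3

stages = generate("a sign that says hello")
sizes = [(len(img), len(img[0])) for img in stages]
```

The modular design shows up in the structure: each stage only needs the previous stage's image and the shared text embedding, so a stage could be swapped for another variant without touching the rest.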

Features

DeepFloyd IF features include:

Deep text prompt understanding

IF's generation pipeline uses the large language model T5-XXL-1.1 as a text encoder. A large number of text-image cross-attention layers also provides better alignment between the prompt and the image.

Text descriptions in images

By incorporating the T5 model, IF generates coherent and clear text alongside objects with different properties appearing in various spatial relations.

Photorealism

IF achieves a zero-shot FID score of 6.66 on the COCO dataset. FID (Fréchet Inception Distance) is a metric used to evaluate the performance of text-to-image models; lower scores are better.
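
For reference, FID is commonly defined as the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. The notation below is the standard one and is not taken from the DeepFloyd materials: \mu_r, \Sigma_r are the mean and covariance of the real-image features, and \mu_g, \Sigma_g are those of the generated-image features.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

When the two sets of feature statistics are identical, both terms vanish and the score is 0.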

Aspect ratio shifts

IF can generate images with a non-standard aspect ratio, vertical or horizontal, as well as the standard square aspect.

Zero-shot image-to-image translations

Image modification is possible by resizing the original image to 64x64 pixels, adding noise through forward diffusion, and then using backward diffusion with a new prompt to denoise the image. The style can be changed further in the super-resolution modules by supplying a text prompt.
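
A rough sketch of that image-to-image flow is shown below. It is an illustration under stated assumptions, not the DeepFloyd code: `signal_kept` is a hypothetical knob for how much of the original image survives the noising step, and `denoise_with_prompt` is a placeholder that returns its input unchanged where a real system would run the base diffusion model.

```python
import math
import random

def resize_to_square(image, size=64):
    """Nearest-neighbour resize of a 2D grid to size x size."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]

def forward_diffuse(image, signal_kept, rng):
    """Blend the image with Gaussian noise (one-shot forward diffusion)."""
    signal = math.sqrt(signal_kept)
    noise = math.sqrt(1.0 - signal_kept)
    return [[signal * px + noise * rng.gauss(0.0, 1.0) for px in row]
            for row in image]

def denoise_with_prompt(noisy, prompt):
    """Hypothetical placeholder for prompt-guided backward diffusion."""
    return noisy

def image_to_image(image, new_prompt, signal_kept=0.5, seed=0):
    rng = random.Random(seed)
    small = resize_to_square(image)                     # step 1: resize
    noisy = forward_diffuse(small, signal_kept, rng)    # step 2: add noise
    return denoise_with_prompt(noisy, new_prompt)       # step 3: denoise

source = [[float(r + c) for c in range(128)] for r in range(128)]
edited = image_to_image(source, "a watercolor painting")
```

A lower `signal_kept` discards more of the source image, which would give the new prompt more influence over the result.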

Training

DeepFloyd IF was trained on a custom high-quality LAION-A dataset, containing 1B image-text pairs. LAION-A is an aesthetic subset of the English part of the LAION-5B dataset. It was obtained after deduplication based on similarity hashing, extra cleaning, and other modifications to the original dataset. The DeepFloyd team’s custom filters were used to remove watermarked, NSFW, and other inappropriate content.
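
The source does not say which similarity hash was used. As one possible illustration, the sketch below deduplicates with a simple average hash and a Hamming-distance threshold. The hash choice, the threshold, and the function names are all assumptions, and the images are assumed to be already resized to the same small grayscale grid.

```python
def average_hash(image):
    """Bit tuple with 1 wherever a pixel is brighter than the image mean."""
    pixels = [px for row in image for px in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if px > mean else 0 for px in pixels)

def hamming(a, b):
    """Number of positions at which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))

def deduplicate(images, max_distance=1):
    """Return indices of images whose hash is not close to a kept one."""
    kept, kept_hashes = [], []
    for index, image in enumerate(images):
        h = average_hash(image)
        if all(hamming(h, other) > max_distance for other in kept_hashes):
            kept.append(index)
            kept_hashes.append(h)
    return kept

images = [
    [[0, 0], [255, 255]],     # original
    [[10, 5], [250, 240]],    # near-duplicate with slightly shifted values
    [[255, 0], [255, 0]],     # structurally different image
]
kept = deduplicate(images)
```

Here the second image hashes to the same bits as the first and is dropped, while the third differs in two bits and is kept.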

Limitations and bias

DeepFloyd IF does not achieve perfect photorealism. It was trained primarily with English captions, which limits its ability to return accurate images for prompts in other languages, and texts and images from other languages are likely to be insufficiently accounted for. While filters were applied, the LAION dataset used to train the model does contain adult, violent, and sexual content. IF may also reinforce or exacerbate social biases.

License

DeepFloyd IF was initially released under a research license, with plans to move to a permissive license. Anyone deploying the model in production must not only follow the license but also accept full liability for the deployment. Stability AI believes research on DeepFloyd IF can lead to the development of novel applications in various domains including art, design, storytelling, virtual reality, accessibility, and more. Possible areas and tasks include:

  • Generation of artistic imagery and use in design
  • Safe deployment of models which have the potential to generate harmful content
  • Probing and understanding the limitations and biases of generative models
  • Applications in educational or creative tools
  • Research on generative models

Excluded uses of IF include:

  • Out-of-scope use: the model was not trained to produce factual or true representations of people or events, so using it to generate such content is beyond its abilities.
  • Misuse and malicious use: using the model to generate content that is cruel to individuals is a misuse of the model.

Further Resources

Title: Building The Next Large Model: DeepFloyd LLM + Text-to-Image = IF (Stability AI)
Link: https://www.youtube.com/watch?v=vlxnDNVkWFo
Type: Web
Date: April 7, 2023
