DeepFloyd, a fresh multimodal research AI lab that is a part of Stability AI, releases IF — a text-to-image cascaded pixel diffusion model.

DeepFloyd IF is a modular neural network based on the cascaded approach.

– IF is built with multiple neural modules (independent neural networks that tackle specific tasks), joining forces within a single architecture to produce a synergistic effect.

– IF generates high-resolution images in a cascading manner: the action kicks off with a base model that produces low-resolution samples, which are then boosted by a series of upscale models to create stunning high-resolution images.

– IF’s base and super-resolution models adopt diffusion models, making use of Markov chain steps to introduce random noise into the data, before reversing the process to generate new data samples from the noise.

– IF operates within the pixel space, as opposed to latent diffusion (e.g. Stable Diffusion) that depends on latent image representations