via The Verge…
Apple has released an open-source AI model called MGIE that edits images based on natural language instructions. MGIE, short for MLLM-Guided Image Editing, uses multimodal large language models (MLLMs) to interpret user commands and perform pixel-level manipulations. The model handles several kinds of edits, including Photoshop-style modification, global photo optimization, and local editing. MGIE is the result of a collaboration with researchers from the University of California, Santa Barbara.
MGIE integrates MLLMs into the image editing process in two ways. First, it uses MLLMs to derive expressive instructions from user input: a terse request is expanded into an explicit description of the change. For example, given the input “make the sky more blue,” MGIE can produce the instruction “increase the saturation of the sky region by 20%.”
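To make the example instruction concrete, here is a minimal sketch of what “increase the saturation of the sky region by 20%” could mean at the pixel level. It is not MGIE's code: the function name `boost_saturation`, the boolean `sky_mask`, and the reading of “by 20%” as multiplying saturation by 1.2 are all assumptions made for illustration, and it uses only Python's standard `colorsys` module.

```python
import colorsys

def boost_saturation(pixels, mask, factor=1.2):
    """Scale saturation by `factor` for pixels where `mask` is True.

    pixels: list of (r, g, b) tuples with channels in 0..255
    mask:   list of booleans, one per pixel (hypothetical "sky region")
    """
    out = []
    for (r, g, b), selected in zip(pixels, mask):
        if not selected:
            out.append((r, g, b))  # leave pixels outside the region untouched
            continue
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        s = min(1.0, s * factor)   # assumption: "by 20%" means s * 1.2, capped at 1
        r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
        out.append((round(r2 * 255), round(g2 * 255), round(b2 * 255)))
    return out

# Two toy pixels: a pale blue "sky" pixel and a green "grass" pixel.
pixels = [(120, 160, 200), (90, 140, 60)]
sky_mask = [True, False]
edited = boost_saturation(pixels, sky_mask)
```

Only the masked pixel changes, and its hue and brightness are preserved. The point of an expressive instruction is that it names this kind of region and operation explicitly, where the original request (“more blue”) did not.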
Second, it uses MLLMs to generate a “visual imagination,” a latent representation of the desired edit. This representation captures the essence of the edit and guides the pixel-level manipulation. MGIE’s training scheme jointly optimizes the instruction derivation, visual imagination, and image editing modules.
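The data flow described above can be sketched with toy stand-ins. Everything here is hypothetical: `derive_instruction` is a lookup table in place of the MLLM, `imagine` produces a deterministic list of numbers in place of a learned latent, and `joint_loss` simply sums per-module losses as one plausible reading of “jointly optimizes.” None of these names come from the MGIE repository.

```python
from dataclasses import dataclass

@dataclass
class EditPlan:
    expressive_instruction: str
    imagination: list  # toy stand-in for the latent "visual imagination"

def derive_instruction(user_input: str) -> str:
    # Hypothetical lookup standing in for the MLLM's instruction derivation.
    table = {
        "make the sky more blue":
            "increase the saturation of the sky region by 20%",
    }
    return table.get(user_input, user_input)

def imagine(instruction: str, dim: int = 4) -> list:
    # Toy latent: fixed-size vector computed from character codes.
    # A real model would produce a learned representation instead.
    codes = [ord(c) for c in instruction]
    total = 255 * max(len(codes), 1)
    return [sum(codes[i::dim]) / total for i in range(dim)]

def plan_edit(user_input: str) -> EditPlan:
    # Stage 1: expand the request; stage 2: turn it into a latent guide.
    instruction = derive_instruction(user_input)
    return EditPlan(instruction, imagine(instruction))

def joint_loss(module_losses: dict) -> float:
    # Assumption: joint optimization minimizes the sum of module losses,
    # so one backward pass would update all modules together.
    return sum(module_losses.values())

plan = plan_edit("make the sky more blue")
loss = joint_loss({"instruction": 0.5, "imagination": 0.25, "editing": 0.25})
```

In this sketch the editing stage would consume `plan.imagination` as its guide, and a single scalar `loss` ties the three modules together during training.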
MGIE is available as an open-source project on GitHub. The project includes a demo notebook that shows how to use MGIE for various editing tasks. Users can also try MGIE through a web demo on Hugging Face Spaces.