SiatMMLab/Awesome-Diffusion-Model-Based-Image-Editing-Methods

License: MIT

The repository is based on our survey Diffusion Model-Based Image Editing: A Survey.

Yi Huang*, Jiancheng Huang*, Yifan Liu*, Mingfu Yan*, Jiaxi Lv*, Jianzhuang Liu*, Wei Xiong, He Zhang, Liangliang Cao, Shifeng Chen

Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS), Adobe Inc, Apple Inc, Southern University of Science and Technology (SUSTech)

Abstract

Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual content in an unconditional or input-conditional manner. The core idea behind them is learning to reverse the process of gradually adding noise to images, allowing them to generate high-quality samples from a complex distribution. In this survey, we provide an exhaustive overview of existing methods using diffusion models for image editing, covering both theoretical and practical aspects in the field. We delve into a thorough analysis and categorization of these works from multiple perspectives, including learning strategies, user-input conditions, and the array of specific editing tasks that can be accomplished. In addition, we pay special attention to image inpainting and outpainting, and explore both earlier traditional context-driven and current multimodal conditional methods, offering a comprehensive analysis of their methodologies. To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval, featuring an innovative metric, LMM Score. Finally, we address current limitations and envision some potential directions for future research.
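
The "gradually adding noise" process mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard DDPM closed form, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, and a linear beta schedule; the function names, the schedule endpoints, and the three-value toy "image" are illustrative choices, not taken from the survey. Only the forward (noising) direction is shown; a diffusion model is trained to reverse it.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed here; other schedules are common too)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0, t, abar, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    signal = math.sqrt(abar[t])
    sigma = math.sqrt(1.0 - abar[t])
    return [signal * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
abar = alpha_bars(linear_betas(1000))

x0 = [0.5, -0.25, 1.0]                  # toy stand-in for image pixels
x_early = add_noise(x0, 0, abar, rng)    # almost identical to x0
x_late = add_noise(x0, 999, abar, rng)   # almost pure Gaussian noise
```

Because abar_t shrinks toward zero as t grows, the signal term fades and the sample approaches pure Gaussian noise, which is the starting point for generation and for many inversion-based editing methods.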

🔖 News!!!

📌 We are actively tracking the latest research and welcome contributions to our repository and survey paper. If your studies are relevant, please feel free to contact us.

📰 2024-10-25: Our benchmark EditEval_v2 is now released.

📰 2024-03-22: The template for computing LMM Score using GPT-4V, along with a leaderboard comparing several leading methods, is now released.

📰 2024-03-14: Our benchmark EditEval_v1 is now released.

📰 2024-03-06: We have established a template for paper submissions. To use it, click the New Issue button within Issues (or click here), select the Paper Submission Form, and complete it following the guidelines provided.

📰 2024-02-28: Our comprehensive survey paper, summarizing related methods published before February 1, 2024, is now available.

🔍 BibTeX

If you find this work helpful in your research, please cite the paper and give the repository a ⭐.

@article{huang2024diffusion,
  title={Diffusion Model-Based Image Editing: A Survey},
  author={Huang, Yi and Huang, Jiancheng and Liu, Yifan and Yan, Mingfu and Lv, Jiaxi and Liu, Jianzhuang and Xiong, Wei and Zhang, He and Chen, Shifeng and Cao, Liangliang},
  journal={arXiv preprint arXiv:2402.17525},
  year={2024}
}

Papers

Training-Based

Training-Based: Domain-Specific Editing

Title Publication Date
TexFit: Text-Driven Fashion Image Editing with Diffusion Models AAAI 2024 2024.03
CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation NeurIPS 2023 2023.10
Stylediffusion: Controllable disentangled style transfer via diffusion models ICCV 2023 2023.08
Hierarchical diffusion autoencoders and disentangled image manipulation WACV 2024 2023.04
Towards Real-time Text-driven Image Manipulation with Unconditional Diffusion Models arXiv 2023 2023.04
Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models CVPR workshop 2023 2022.12
Diffstyler: Controllable dual diffusion for text-driven image stylization TNNLS 2024 2022.11
Diffusion Models Already Have A Semantic Latent Space ICLR 2023 2022.10
Egsde: Unpaired image-to-image translation via energy-guided stochastic differential equations NeurIPS 2022 2022.07
Diffusion autoencoders: Toward a meaningful and decodable representation CVPR 2022 2021.11
Diffusionclip: Text-guided diffusion models for robust image manipulation CVPR 2022 2021.10
Unit-ddpm: Unpaired image translation with denoising diffusion probabilistic models arXiv 2021 2021.04

Training-Based: Reference and Attribute Guided Editing

Title Publication Date
MagicEraser: Erasing Any Objects via Semantics-Aware Control ECCV 2024 2024.10
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control CVPR 2024 2023.12
A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting arXiv 2023 2023.12
DreamInpainter: Text-Guided Subject-Driven Image Inpainting with Diffusion Models arXiv 2023 2023.12
Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model ACM MM 2023 2023.10
Face Aging via Diffusion-based Editing BMVC 2023 2023.09
Anydoor: Zero-shot object-level image customization CVPR 2024 2023.07
Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model ICASSP 2024 2023.06
Text-to-image editing by image information removal WACV 2024 2023.05
Reference-based Image Composition with Sketch via Structure-aware Diffusion Model CVPR workshop 2023 2023.04
PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor CVPR 2024 2023.03
Imagen editor and editbench: Advancing and evaluating text-guided image inpainting CVPR 2023 2022.12
Smartbrush: Text and shape guided object inpainting with diffusion model CVPR 2023 2022.12
ObjectStitch: Object Compositing With Diffusion Model CVPR 2023 2022.12
Paint by example: Exemplar-based image editing with diffusion models CVPR 2023 2022.11

Training-Based: Instructional Editing

Title Publication Date
FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction arXiv 2024 2024.09
EditWorld: Simulating World Dynamics for Instruction-Following Image Editing arXiv 2024 2024.05
InstructGIE: Towards Generalizable Image Editing arXiv 2024 2024.03
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models CVPR 2024 2023.12
InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following arXiv 2023 2023.12
Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation CVPR 2024 2023.12
Emu edit: Precise image editing via recognition and generation tasks arXiv 2023 2023.11
Guiding instruction-based image editing via multimodal large language models ICLR 2024 2023.09
Instructdiffusion: A generalist modeling interface for vision tasks CVPR 2024 2023.09
MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers arXiv 2023 2023.09
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation NeurIPS 2023 2023.08
Inst-Inpaint: Instructing to Remove Objects with Diffusion Models arXiv 2023 2023.04
HIVE: Harnessing Human Feedback for Instructional Visual Editing CVPR 2024 2023.03
DialogPaint: A Dialog-based Image Editing Model arXiv 2023 2023.01
Learning to Follow Object-Centric Image Editing Instructions Faithfully EMNLP 2023 2023.01
Instructpix2pix: Learning to follow image editing instructions CVPR 2023 2022.11

Training-Based: Pseudo-Target Retrieval-Based Editing

Title Publication Date
Text-Driven Image Editing via Learnable Regions CVPR 2024 2023.11
iEdit: Localised Text-guided Image Editing with Weak Supervision arXiv 2023 2023.05
ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation arXiv 2023 2023.05

Testing-Time Finetuning

Testing-Time Finetuning: Denoising Model Finetuning

Title Publication Date
Kv inversion: Kv embeddings learning for text-conditioned real image action editing arXiv 2023 2023.09
Custom-edit: Text-guided image editing with customized diffusion models CVPR workshop 2023 2023.05
Unitune: Text-driven image editing by fine tuning an image generation model on a single image ACM TOG 2023 2022.10

Testing-Time Finetuning: Embeddings Finetuning

Title Publication Date
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing NeurIPS 2023 2023.09
Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models ICCV 2023 2023.05
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models CVPR 2023 2022.12
Null-text inversion for editing real images using guided diffusion models CVPR 2023 2022.11

Testing-Time Finetuning: Guidance with Hypernetworks

Title Publication Date
StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing arXiv 2023 2023.05
Inversion-based creativity transfer with diffusion models CVPR 2023 2022.11

Testing-Time Finetuning: Latent Variable Optimization

Title Publication Date
StableDrag: Stable Dragging for Point-based Image Editing arXiv 2024 2024.03
FreeDrag: Feature Dragging for Reliable Point-based Image Editing CVPR 2024 2023.12
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing CVPR 2024 2023.11
MagicRemover: Tuning-free Text-guided Image inpainting with Diffusion Models arXiv 2023 2023.10
Dragondiffusion: Enabling drag-style manipulation on diffusion models ICLR 2024 2023.07
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing CVPR 2024 2023.06
Delta denoising score ICCV 2023 2023.04
Directed Diffusion: Direct Control of Object Placement through Attention Guidance AAAI 2024 2023.02
Diffusion-based Image Translation using disentangled style and content representation ICLR 2023 2022.09

Testing-Time Finetuning: Hybrid Finetuning

Title Publication Date
Forgedit: Text Guided Image Editing via Learning and Forgetting arXiv 2023 2023.09
LayerDiffusion: Layered Controlled Image Editing with Diffusion Models arXiv 2023 2023.05
Sine: Single image editing with text-to-image diffusion models CVPR 2023 2022.12
Imagic: Text-Based Real Image Editing With Diffusion Models CVPR 2023 2022.10

Training and Finetuning Free

Training and Finetuning Free: Input Text Refinement

Title Publication Date
User-friendly Image Editing with Minimal Text Input: Leveraging Captioning and Injection Techniques arXiv 2023 2023.06
ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation arXiv 2023 2023.05
InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions arXiv 2023 2023.05
Preditor: Text guided image editing with diffusion prior arXiv 2023 2023.02

Training and Finetuning Free: Inversion/Sampling Modification

Title Publication Date
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing arXiv 2024 2024.12
Inversion-Free Image Editing with Natural Language CVPR 2024 2023.12
Fixed-point Inversion for Text-to-image diffusion models arXiv 2023 2023.12
Tuning-Free Inversion-Enhanced Control for Consistent Image Editing arXiv 2023 2023.12
The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing ICLR 2024 2023.11
LEDITS++: Limitless Image Editing using Text-to-Image Models CVPR 2024 2023.11
A latent space of stochastic diffusion models for zero-shot image editing and guidance ICCV 2023 2023.10
Effective real image editing with accelerated iterative diffusion inversion ICCV 2023 2023.09
Fec: Three finetuning-free methods to enhance consistency for real image editing arXiv 2023 2023.09
Iterative multi-granular image editing using diffusion models WACV 2024 2023.09
ProxEdit: Improving Tuning-Free Real Image Editing With Proximal Guidance WACV 2024 2023.06
Diffusion self-guidance for controllable image generation NeurIPS 2023 2023.06
Diffusion Brush: A Latent Diffusion Model-based Editing Tool for AI-generated Images arXiv 2023 2023.06
Null-text guidance in diffusion models is secretly a cartoon-style creator ACM MM 2023 2023.05
Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models arXiv 2023 2023.05
An Edit Friendly DDPM Noise Space: Inversion and Manipulations CVPR 2024 2023.04
Training-Free Content Injection Using H-Space in Diffusion Models WACV 2024 2023.03
Edict: Exact diffusion inversion via coupled transformations CVPR 2023 2022.11
Direct inversion: Optimization-free text-driven real image editing with diffusion models arXiv 2022 2022.11

Training and Finetuning Free: Attention Modification

Title Publication Date
Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing CVPR 2024 2024.03
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models arXiv 2023 2023.12
Tf-icon: Diffusion-based training-free cross-domain image composition ICCV 2023 2023.07
Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models NeurIPS 2023 2023.06
Conditional Score Guidance for Text-Driven Image-to-Image Translation NeurIPS 2023 2023.05
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing ICCV 2023 2023.04
Localizing Object-level Shape Variations with Text-to-Image Diffusion Models ICCV 2023 2023.03
Zero-shot image-to-image translation ACM SIGGRAPH 2023 2023.02
Shape-Guided Diffusion With Inside-Outside Attention WACV 2024 2022.12
Plug-and-play diffusion features for text-driven image-to-image translation CVPR 2023 2022.11
Prompt-to-prompt image editing with cross attention control ICLR 2023 2022.08

Training and Finetuning Free: Mask Guidance

Title Publication Date
Grounded-Instruct-Pix2Pix: Improving Instruction Based Image Editing with Automatic Target Grounding ICASSP 2024 2024.03
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance ACM MM 2024 2023.12
ZONE: Zero-Shot Instruction-Guided Local Editing CVPR 2024 2023.12
Watch your steps: Local image and scene editing by text instructions arXiv 2023 2023.08
Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models NeurIPS 2023 2023.06
Differential Diffusion: Giving Each Pixel Its Strength arXiv 2023 2023.06
PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing arXiv 2023 2023.06
FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference AAAI 2024 2023.05
Inpaint anything: Segment anything meets image inpainting arXiv 2023 2023.04
Region-aware diffusion for zero-shot text-driven image editing CVM 2023 2023.02
Text-guided mask-free local image retouching ICME 2023 2022.12
DiffEdit: Diffusion-based semantic image editing with mask guidance ICLR 2023 2022.10
Blended latent diffusion SIGGRAPH 2023 2022.06
Blended diffusion for text-driven editing of natural images CVPR 2022 2021.11

Training and Finetuning Free: Multi-Noise Redirection

Title Publication Date
Object-aware Inversion and Reassembly for Image Editing ICLR 2024 2023.10
Ledits: Real image editing with ddpm inversion and semantic guidance arXiv 2023 2023.07
Sega: Instructing diffusion using semantic dimensions NeurIPS 2023 2023.01
The stable artist: Steering semantics in diffusion latent space arXiv 2022 2022.12

Benchmark EditEval_v1

EditEval_v1 is a benchmark tailored for the evaluation of general diffusion-model-based image editing algorithms. It contains 50 high-quality images selected from Unsplash, each accompanied by a source text prompt, a target editing prompt, and a text editing instruction generated by GPT-4V. The benchmark covers the seven most popular editing tasks across semantic, stylistic, and structural editing as defined in our paper: object addition, object replacement, object removal, background change, overall style change, texture change, and action change. Click here to download this dataset!

Benchmark EditEval_v2

EditEval_v2 is an enhanced benchmark designed to evaluate general diffusion-model-based image editing algorithms. This version expands upon its predecessor by including 150 high-quality images selected from Unsplash. Each image is paired with a source text prompt, a target editing prompt, and a text editing instruction generated by GPT-4V. EditEval_v2 continues to cover the seven most popular specific editing tasks across semantic, stylistic, and structural editing as defined in our paper: object addition, object replacement, object removal, background change, overall style change, texture change, and action change. Click here to download this dataset!
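
As a rough illustration of the benchmark layout described above, each item can be modelled as one image plus three pieces of text and a task label. This is a minimal sketch only: the class name, field names, file path, and example prompts are hypothetical and do not reflect the released file format, which should be checked in the dataset itself.

```python
from collections import Counter
from dataclasses import dataclass

# The seven editing tasks named in the benchmark description.
TASKS = (
    "object addition",
    "object replacement",
    "object removal",
    "background change",
    "overall style change",
    "texture change",
    "action change",
)


@dataclass(frozen=True)
class EditEvalEntry:
    """One benchmark item. Field names are hypothetical, not the released schema."""

    image_path: str
    source_prompt: str   # describes the original image
    target_prompt: str   # describes the desired edited image
    instruction: str     # text editing instruction
    task: str            # one of TASKS

    def __post_init__(self):
        # Reject labels outside the seven documented tasks.
        if self.task not in TASKS:
            raise ValueError(f"unknown editing task: {self.task!r}")


def count_by_task(entries):
    """Return how many entries fall under each of the seven tasks."""
    counts = Counter(entry.task for entry in entries)
    return {task: counts.get(task, 0) for task in TASKS}


# Made-up example entry, for illustration only.
example = EditEvalEntry(
    image_path="images/0001.jpg",
    source_prompt="a dog sitting on a bench",
    target_prompt="a cat sitting on a bench",
    instruction="replace the dog with a cat",
    task="object replacement",
)
print(count_by_task([example]))
```

Keeping the task label on every entry makes it straightforward to report results per editing task as well as overall.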

Leaderboard

To make LMM Score easy to apply, we provide a template for computing it with GPT-4V. The template comes with step-by-step instructions and all required materials. We also provide a leaderboard comparing representative methods evaluated with LMM Score on our EditEval_v1 benchmark, which can be found here.
