This project combines Stable Diffusion (SDXL/SDXL-Lightning) with the BLIP (Bootstrapping Language-Image Pre-training) captioning model to provide an interactive image colorization experience. Users can influence the colors generated for objects in an image, making the colorization process more personalized and creative.
- Interactive Colorization: Users can specify desired colors for different objects in the image.
- ControlNet Approach: A ControlNet retrained for colorization allows SDXL to better adapt to the image colorization task.
- High-Quality Outputs: Leverages recent advances in diffusion models to generate vibrant and realistic colorizations.
- User-Friendly Interface: Easy-to-use interface for seamless interaction with the model.
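To give a concrete (if simplified) picture of the captioning half of the pipeline, the snippet below generates a caption for an input image with BLIP via Hugging Face `transformers`. The checkpoint name `Salesforce/blip-image-captioning-base` and the file name `photo.jpg` are placeholders for illustration and may differ from what this project actually uses.

```python
# Minimal BLIP captioning sketch; the checkpoint name is an assumption,
# not necessarily the one used by this project.
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("photo.jpg").convert("RGB")        # hypothetical input image
inputs = processor(images=image, return_tensors="pt")
caption_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(caption_ids[0], skip_special_tokens=True)
print(caption)
```

The generated caption can then be edited (for example, to "a red car parked on a street") to steer the colors of specific objects during colorization.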
To set up the project locally, follow these steps:
- Clone the Repository:

  ```bash
  git clone https://github.com/nick8592/text-guided-image-colorization.git
  cd text-guided-image-colorization
  ```
- Install Dependencies: Make sure you have Python 3.7 or higher installed. Then, install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

  Install `torch` and `torchvision` matching your CUDA version:

  ```bash
  pip install torch torchvision --index-url https://download.pytorch.org/whl/cuXXX
  ```

  Replace `XXX` with your CUDA version (e.g., `118` for CUDA 11.8). For more info, see PyTorch Get Started.
- Download Pre-trained Models:

  | Models | Hugging Face (Recommended) | Other |
  |---|---|---|
  | SDXL-Lightning Caption | link | link (2kNJfV) |
  | SDXL-Lightning Custom Caption (Recommended) | link | link (KW7Fpi) |

  Place the downloaded checkpoint in the following folder structure:

  ```
  text-guided-image-colorization/sdxl_light_caption_output
  └── checkpoint-30000
      ├── controlnet
      │   ├── diffusion_pytorch_model.safetensors
      │   └── config.json
      ├── optimizer.bin
      ├── random_states_0.pkl
      ├── scaler.pt
      └── scheduler.bin
  ```
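As a quick sanity check that the checkpoint is in the right place, the weights in the `controlnet` folder should load directly with `diffusers`. This is a sketch based on the folder layout above; the dtype choice is an assumption.

```python
# Verify the downloaded ControlNet checkpoint loads; the path follows the
# folder structure shown above, and float16 is an assumption.
import torch
from diffusers import ControlNetModel

controlnet = ControlNetModel.from_pretrained(
    "sdxl_light_caption_output/checkpoint-30000/controlnet",
    torch_dtype=torch.float16,
)
print(f"{sum(p.numel() for p in controlnet.parameters()) / 1e6:.1f}M ControlNet parameters")
```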
- Run the `gradio_ui.py` script:

  ```bash
  python gradio_ui.py
  ```
- Open the provided URL in your web browser to access the Gradio-based user interface.
- Upload an image and use the interface to control the colors of specific objects in the image. The model can also generate a colorized image without a specific prompt.
- The model will generate a colorized version of the image based on your input (or automatically, if no prompt is given). See the demo video; a programmatic sketch of the same flow follows below.
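If you prefer to drive the model from Python rather than through the Gradio UI, the sketch below shows roughly what that could look like with `diffusers`. Treat it as an unverified outline: the SDXL base model name, image size, prompt, step count, and the assumption that the grayscale image is passed directly as the ControlNet conditioning are illustrative guesses; `gradio_ui.py` is the authoritative implementation.

```python
# Hypothetical programmatic colorization sketch; gradio_ui.py is the
# authoritative implementation.
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "sdxl_light_caption_output/checkpoint-30000/controlnet",
    torch_dtype=torch.float16,
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed base model
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Grayscale input, replicated to 3 channels as the conditioning image.
gray = Image.open("input.jpg").convert("L").convert("RGB").resize((1024, 1024))

result = pipe(
    prompt="a photo of a red car parked next to a green tree",  # color guidance
    image=gray,
    num_inference_steps=8,  # small step count in the spirit of SDXL-Lightning
    controlnet_conditioning_scale=1.0,
).images[0]
result.save("colorized.png")
```

To reproduce the repository's few-step behavior you would likely also need to load the SDXL-Lightning UNet and its matching scheduler rather than the vanilla SDXL base used here.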
You can find more details about the dataset usage in the Dataset-for-Image-Colorization repository.
For training, you can use one of the following scripts:
- `train_controlnet.sh`: Trains a model using Stable Diffusion v2
- `train_controlnet_sdxl.sh`: Trains a model using SDXL
- `train_controlnet_sdxl_light.sh`: Trains a model using SDXL-Lightning
Although the training code for SDXL is provided, I wasn't able to train the model myself due to a lack of GPU resources, so you may run into errors when trying to train it.
For evaluation, you can use one of the following scripts:
- `eval_controlnet.sh`: Evaluates the model using Stable Diffusion v2 for a folder of images.
- `eval_controlnet_sdxl_light.sh`: Evaluates the model using SDXL-Lightning for a folder of images.
- `eval_controlnet_sdxl_light_single.sh`: Evaluates the model using SDXL-Lightning for a single image.
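For a quick ad-hoc check of a single result against its ground truth, a simple pixel-level metric such as PSNR can be computed as below. This is purely illustrative; it is not necessarily what the evaluation scripts report, and the file names are placeholders.

```python
# Illustrative only: PSNR between a colorized output and its ground-truth
# image. Not necessarily what the eval_controlnet_* scripts compute.
import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    """Peak signal-to-noise ratio for uint8 RGB images of equal size."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 20.0 * np.log10(255.0 / np.sqrt(mse))

colorized = np.array(Image.open("colorized.png").convert("RGB"))
reference = np.array(
    Image.open("ground_truth.png").convert("RGB").resize(colorized.shape[1::-1])
)
print(f"PSNR: {psnr(colorized, reference):.2f} dB")
```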
Ground truth images are provided solely for reference purposes in the image colorization task.
| Grayscale Image | Colorized Result | Ground Truth |
|---|---|---|
This project is licensed under the MIT License. See the LICENSE file for more details.