Juan A. Rodríguez, David Vázquez, Issam Laradji, Marco Pedersoli, Pau Rodríguez
ServiceNow Research, Montréal, Canada
ÉTS Montreal, University of Québec
FigGen is a latent diffusion model that generates scientific figures conditioned on text from the papers they belong to (text-to-figure). We use OCR-VQGAN to project scientific figures (images) into a latent representation, and train a latent diffusion model in that space as the generator. We jointly train a BERT transformer to learn text embeddings and perform text-to-figure generation.
This code is adapted from Latent Diffusion at CompVis/stable-diffusion.
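The latent diffusion stage described above can be illustrated with a toy sketch. It assumes the standard DDPM forward-noising process with a linear beta schedule, which this README does not spell out. The four-number list stands in for an OCR-VQGAN latent, and the zero predictor stands in for the text-conditioned denoiser. All function names are hypothetical and are not taken from this repository.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed schedule, for illustration)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_s), usually written alpha_bar_t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(z0, t, alpha_bars, rng):
    """Sample z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]
    return zt, eps


def mse(pred, target):
    """Mean squared error, the usual noise-prediction training loss."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

# Stand-in for a flattened latent produced by the image encoder.
z0 = [0.5, -1.0, 0.25, 2.0]
zt, eps = add_noise(z0, 500, alpha_bars, rng)

# A trained denoiser would predict eps from (zt, t, text embedding);
# a zero predictor is used here only so the loss can be computed.
loss = mse([0.0] * len(eps), eps)
print(f"toy loss: {loss:.4f}")
```

In the real model the latents are image-shaped tensors and the denoiser is a neural network; the sketch only shows how a noisy latent and its training target are formed.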
Abstract
The generative modeling landscape has experienced tremendous growth in recent years, particularly in generating natural images and art. Recent techniques have shown impressive potential in creating complex visual compositions with high realism and quality. However, state-of-the-art methods have focused on the narrow domain of natural images, while other distributions remain unexplored. In this paper, we introduce the problem of text-to-figure generation, that is, creating scientific figures of papers from text descriptions. We present FigGen, a diffusion-based approach for text-to-figure generation, and discuss the main challenges of the proposed task. Code and models are available in this repository.
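Create a conda environment named figgen, and activate it. The environment name is set by the name field in environment.yaml, so if the activate command fails, use the name listed there: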
conda env create -f environment.yaml
conda activate figure-diffusion
pip install -e .
- Download the Paper2Fig100k dataset from Zenodo and extract it in a data folder. Download the trained models from HuggingFace and extract them in a models folder. You will need the image encoder and the diffusion model.
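Before editing the configs, it can help to confirm that both folders are in place. The following is a minimal sketch, assuming the folders are named data and models and sit in the directory the script is run from; the helper name is hypothetical.

```python
from pathlib import Path


def missing_paths(root, expected=("data", "models")):
    """Return the expected sub-folders that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


# Check the current working directory (assumed to be the repository root).
missing = missing_paths(".")
if missing:
    print("Missing folders:", ", ".join(missing))
else:
    print("Found data and models folders.")
```

The check only looks at folder names; it does not verify that the dataset or checkpoints inside them are complete.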
Modify the config files in configs/figure-diffusion/fig-gen-{...}.yaml to point to the correct paths. You must set ckpt_path (in model.first_stage_config) and json_file (in data) to the corresponding paths.
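The two edits above can also be scripted. The sketch below works on a plain nested dictionary and searches for the keys by name, because the exact nesting inside the config files is not shown here. The sample config shape and both paths are placeholders for illustration, and the helper names are hypothetical.

```python
def find_section(node, name):
    """Return the first value stored under key `name`, searching depth-first."""
    if isinstance(node, dict):
        if name in node:
            return node[name]
        for value in node.values():
            found = find_section(value, name)
            if found is not None:
                return found
    return None


def set_all(node, key, value):
    """Set every occurrence of `key` below `node`; return how many were set."""
    changed = 0
    if isinstance(node, dict):
        for k in list(node):
            if k == key:
                node[k] = value
                changed += 1
            else:
                changed += set_all(node[k], key, value)
    return changed


# Hypothetical config shape, for illustration only; the real files may nest differently.
config = {
    "model": {"first_stage_config": {"params": {"ckpt_path": "CHANGE_ME"}}},
    "data": {"params": {"train": {"params": {"json_file": "CHANGE_ME"}}}},
}

# Placeholder paths, not the actual file names of the released assets.
n_ckpt = set_all(find_section(config, "first_stage_config"), "ckpt_path",
                 "models/image_encoder.ckpt")
n_json = set_all(find_section(config, "data"), "json_file", "data/train.json")
print("ckpt_path entries set:", n_ckpt, "| json_file entries set:", n_json)
```

To apply this to a real file, the dictionary could be loaded with a YAML parser such as PyYAML (yaml.safe_load) and written back with yaml.safe_dump, keeping in mind that comments in the file are lost on a round trip. The helpers only descend into mappings, so keys nested inside YAML lists would not be reached.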
To train the latent diffusion model from scratch, run the following command:
python main.py --config configs/figure-diffusion/fig-gen-{...}.yaml
Some qualitative results of our model. We show the text description of the figure, the generated figure, and the ground truth figure. Check the paper for more results.
- Automatically download Paper2Fig100k dataset (from Zenodo) and trained models (from HF)
High-Resolution Image Synthesis with Latent Diffusion Models by Rombach et al., CVPR 2022 Oral.
OCR-VQGAN: Taming Text-within-Image Generation by Rodriguez et al., WACV 2023.
If you use this code, please cite the following paper:
@article{rodriguez2023figgen,
title={FigGen: Text to Scientific Figure Generation},
author={Rodriguez, Juan A and Vazquez, David and Laradji, Issam and Pedersoli, Marco and Rodriguez, Pau},
journal={arXiv preprint arXiv:2306.00800},
year={2023}
}
Juan A. Rodríguez (joanrg.ai@gmail.com).