
ConFIDential: Sampling Bias in Ground Truth Based Generative Model Evaluation

This is the repository of our seminar paper (PDF) for the practical "Visual Representation Learning" at the chair of Prof. Ommer at LMU Munich. In our project we investigated the effects of sampling bias in FID computation using the models DiT, latent-diffusion, MDTv2, MaskDiT, Guided Diffusion, StyleGAN-XL, U-DiTs, VAR, LlamaGen, U-ViT, and MAR.


Evaluating the quality of deep generative models in computer vision is challenging, especially in aligning with human judgment. Traditional metrics such as Fréchet Inception Distance (FID) are widely used, but their standard computation introduces an unaddressed sampling bias. This involves generating a representative image sample according to a uniform class distribution, which completely ignores the class distribution underlying the ground truth dataset. This paper highlights the statistical error caused by this systemic bias and its impact on ground truth based metrics. We further empirically investigate its influence on FID by generating images according to uniform and ground truth class distributions. Our experiments on ten major generative models reveal discrepancies in FID results when different sampling methods are used. Based on our theoretical and empirical findings, we advocate for sampling according to the class distribution of the ground truth dataset to ensure consistent and reliable evaluations.


Each of the ten models has its own folder. To generate images with these models for your own research, see the Image Generation section.

The scripts folder includes helper files that build the different class distributions for ~50k images from a folder of 80k generated images, as discussed in the paper.

One script generates distribution .txt files according to any of the three discussed protocols; the other creates three image folders containing images that follow the real class distribution and the two variants of the uniform class distribution.
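
The three protocols can be sketched as per-class image counts. This is a minimal illustration, not the repository's script: the function names are hypothetical, and rounding each class independently in the real-distribution case is an assumption (it would explain why that sample holds approximately, not exactly, 50k images).

```python
import random
from collections import Counter


def uniform_fixed(num_classes=1000, per_class=50):
    """Protocol 1: the same number of images for every class."""
    return [per_class] * num_classes


def uniform_random(num_classes=1000, total=50_000, seed=42):
    """Protocol 2: draw a class uniformly at random `total` times."""
    rng = random.Random(seed)
    drawn = Counter(rng.choices(range(num_classes), k=total))
    return [drawn[c] for c in range(num_classes)]


def real_proportional(real_counts, total=50_000):
    """Protocol 3: scale the ground truth class counts down to ~`total`.

    Assumption: each class is rounded independently, so the result sums
    to approximately `total` rather than exactly `total`.
    """
    n = sum(real_counts)
    return [round(total * c / n) for c in real_counts]


if __name__ == "__main__":
    toy_real = [1300, 732, 1000, 968]  # made-up class sizes for illustration
    print(uniform_fixed(4, 10))
    print(uniform_random(4, 40))
    print(real_proportional(toy_real, total=40))
```

Whichever protocol is used, every per-class count has to stay at or below the 80 images generated for that class.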

These folders can then be used as input for Fréchet distance (FD) calculation with e.g. dgm-eval.
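
For reference, the Fréchet distance between two Gaussians fitted to embedding sets has the standard closed form below. The symbols are notation introduced here for illustration: $\mu_r, \Sigma_r$ are the mean and covariance of the ground truth embeddings and $\mu_g, \Sigma_g$ those of the generated images.

```latex
\mathrm{FD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Because $\mu_g$ and $\Sigma_g$ are estimated from the generated sample, changing the class distribution of that sample changes both terms.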

The following FDs were obtained using a global random seed of 42 unless otherwise specified.

FID Results (dgm-eval: FD using Inception embeddings)

| Model | Uniform (50 per class) | Uniform (50k random draws from 1000 classes) | Real (underlying ImageNet1k distribution, ~50k) |
| --- | --- | --- | --- |
| VAR | 5.36 | 5.41 | 5.42 |
| MDT | 2.28 | 2.30 | 2.27 |
| DiT | 2.82 | 2.83 | 2.79 |
| LDM | 3.56 | 3.54 | 3.53 |
| StyleGAN-XL (seed=1000) | 2.60 | 2.56 | 2.60 |
| StyleGAN-XL | 2.61 | 2.55 | 2.56 |
| MaskDiT | 2.32 | 2.34 | 2.30 |
| LlamaGen | 2.81 | 2.79 | 2.78 |
| U-ViT | 2.73 | 2.70 | 2.66 |
| U-DiT | 2.98 | 2.95 | 2.93 |
| MAR | 2.18 | 2.21 | 2.15 |

FDD Results (dgm-eval: FD using DINOv2 embeddings)

| Model | Uniform (50 per class) | Uniform (50k random draws from 1000 classes) | Real (underlying ImageNet1k distribution, ~50k) |
| --- | --- | --- | --- |
| VAR | 117.5 | 118.0 | 117.39 |
| MDT | 57.82 | 58.0 | 57.5 |
| DiT | 68.0 | 68.5 | 67.5 |
| LDM | 132.45 | 133.53 | 133.56 |
| StyleGAN-XL (seed=1000) | 133.85 | 133.80 | 132.88 |
| StyleGAN-XL | 133.56 | 133.53 | 132.45 |
| MaskDiT | 59.0 | 59.5 | 58.5 |
| LlamaGen | 68.0 | 67.5 | 67.0 |
| U-ViT | 64.87 | 65.36 | 65.56 |
| U-DiT | 70.5 | 70.0 | 69.5 |
| MAR | 55.0 | 56.0 | 54.5 |
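
To read the discrepancies off the tables, one option is to subtract the real-distribution score from each uniform score. The sketch below does this for a few rows copied from the FID table above; the helper name `gap_to_real` is hypothetical and not part of the repository.

```python
# A few rows copied from the FID table above, in column order:
# (uniform 50 per class, uniform random choice, real distribution).
fid = {
    "VAR": (5.36, 5.41, 5.42),
    "MDT": (2.28, 2.30, 2.27),
    "DiT": (2.82, 2.83, 2.79),
    "LDM": (3.56, 3.54, 3.53),
}


def gap_to_real(scores):
    """Return {model: (uniform_fixed - real, uniform_random - real)}."""
    return {
        model: (round(u_fixed - real, 2), round(u_rand - real, 2))
        for model, (u_fixed, u_rand, real) in scores.items()
    }


if __name__ == "__main__":
    for model, gaps in gap_to_real(fid).items():
        print(model, gaps)
```

A positive gap means the uniform protocol reports a higher (worse) score than sampling from the real class distribution; a negative gap means the opposite.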

Image Generation

First, run the following commands to make sure all submodules are initialized correctly:

git clone
cd fid-flaws
git submodule update --init --recursive

To generate 80 images for each ImageNet1k class using StyleGAN-XL, please run the following commands:

cd stylegan-xl
python \
--outdir=samplesheet --trunc=1.0 \
--network= \
--num-classes 1000 \
--num-samples-per-class 80 \
--batch-size 32

To generate 80 images for each ImageNet1k class using latent-diffusion, please run the following commands:

cd latent-diffusion
conda env create -f environment.yaml
conda activate ldm

To generate 80 images for each ImageNet1k class using MDTv2, please run the following commands:

cd MDT
conda create -n MDT python==3.10
conda init
conda activate MDT

pip install -r requirements.txt


python --tf32

To generate 80 images for each ImageNet1k class using MaskDiT, please run the following commands:

cd MaskDiT
conda create -n MaskDiT python==3.10
conda activate MaskDiT
pip install -r requirements.txt

python3 --name vae --dest assets/stable_diffusion
bash scripts/

python \
--config configs/test/maskdit-256.yaml \
--cfg_scale GUIDANCE_SCALE \
--num_images_per_class IMAGE_PER_CLASS \
--tf 32

To generate 80 images for each ImageNet1k class using VAR, please run the following commands:

cd VAR
conda create -n var python==3.10
conda activate var
pip install -r requirements.txt

To generate 80 images for each ImageNet1k class using DiT, please run the following commands:

cd DiT
conda env create -f environment.yml
conda activate DiT
python \
--num_classes 1000 \
--cfg-scale 1.5 \
--batch_size 32 \
--images_per_class 80 \
--ckpt /path/to/

To generate 80 images for each ImageNet1k class using mar, please run the following commands:

cd mar
conda env create -f environment.yaml
conda activate mar

python util/

python \
--batch-size 32 \
--cfg-scale 1.5 \
--cfg-schedule constant \
--samples-per-class 80 \
--tf32 # the tf32 will accelerate the generation 

To generate 80 images for each ImageNet1k class using LlamaGen, please run the following commands:

cd LlamaGen
conda create -n LlamaGen python==3.11
conda activate LlamaGen
pip install -r requirements.txt

mkdir pretrained_models
cd pretrained_models
cd ..

python \
--batch-size 32 \
--cfg-scale 1.65 \
--gpt-model GPT-3B \
--ckpt ./pretrained_models/ \
--vq-ckpt ./pretrained_models/ \
--from-fsdp \
--num-samples-per-class 80 \
--tf32 # the tf32 will accelerate the generation 

To generate 80 images for each ImageNet1k class using U-DiT, please run the following commands:

cd U-DiT
conda create -n U-DiT python==3.11
conda activate U-DiT
pip install -r requirements.txt


python \
--batch-size 32 \
--model U-DiT-L \
--cfg-scale 1.5 \
--image-size 256 \
--tf32 # the tf32 will accelerate the generation

To generate 80 images for each ImageNet1k class using U-ViT, please run the following commands:

cd U-ViT
conda create -n U-ViT python==3.11
conda activate U-ViT
pip install torch torchvision --extra-index-url  # install torch-1.13.1
pip install accelerate==0.12.0 absl-py ml_collections einops wandb ftfy==6.1.1 transformers==4.23.1

apt install gdown
gdown 13StUdrjaaSXjfqqF7M47BzPyhMAArQ4u
gdown 10nbEiFd4YCHlzfTkJjZf45YcSMCN34m6

pip install -U xformers
pip install -U --pre triton

python \
--batch-size 32 \
--cfg-scale 0.4 \
--steps 50 \
--num-samples-per-class 80 \
--tf32 # the tf32 will accelerate the generation 
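
Most of the commands above follow the same pattern: 80 samples for each of the 1000 classes, generated in batches of 32. As a rough sketch of how such a run can be laid out, the hypothetical helper below splits the class labels into batches. How each model's sampling script actually orders its labels is not shown here, so the class-by-class ordering is an assumption.

```python
def class_label_batches(num_classes=1000, per_class=80, batch_size=32):
    """Yield lists of class labels covering `per_class` samples per class.

    Labels are laid out class by class (an assumption), then cut into
    consecutive batches; the last batch may be shorter than `batch_size`.
    """
    labels = [c for c in range(num_classes) for _ in range(per_class)]
    for start in range(0, len(labels), batch_size):
        yield labels[start:start + batch_size]


if __name__ == "__main__":
    batches = list(class_label_batches())
    print(len(batches), "batches,", sum(len(b) for b in batches), "labels")
```

With these defaults, 80,000 labels divide evenly into batches of 32, and a batch can span two classes.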

FID score calculation

To calculate the FID score with dgm-eval, run the following commands:

conda create --name dgm-eval pip python==3.10
conda activate dgm-eval
git clone
cd dgm-eval
pip install -e .

python -m dgm_eval \
--model inception \
--metrics fd \
--save \
--nsample 1500000

Set the --model flag to inception or dinov2 to calculate FID or FDD, respectively. With the --save flag, the computed representations of each image folder are saved in the dgm-eval/experiments folder.
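
Evaluating the three image folders with both embedding models means six dgm-eval runs. The sketch below only builds the command lines and does not execute them. The flags are those shown above; the folder names are placeholders, and passing the ground truth folder and the generated folder as positional arguments is an assumption to check against the dgm-eval documentation.

```python
import sys
from itertools import product


def dgm_eval_commands(real_dir, generated_dirs,
                      models=("inception", "dinov2"), nsample=1_500_000):
    """Build one dgm-eval command per (embedding model, generated folder)."""
    commands = []
    for model, gen_dir in product(models, generated_dirs):
        commands.append([
            sys.executable, "-m", "dgm_eval",
            real_dir, gen_dir,  # assumption: folders are passed positionally
            "--model", model,
            "--metrics", "fd",
            "--save",
            "--nsample", str(nsample),
        ])
    return commands


if __name__ == "__main__":
    # Placeholder folder names, not paths from the repository.
    folders = ["uniform_fixed", "uniform_random", "real"]
    for cmd in dgm_eval_commands("imagenet_train", folders):
        print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real folders.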








