
Releases: rkhamilton/vqgan-clip-generator

1.2.0

22 Oct 03:03

Important change to handling initial images
I discovered that the code I started from deviated significantly in how it handled initial images, and I carried that deviation over into my code. The expected behavior is that passing any value for init_weight drives the algorithm to preserve the original image in the output. The code I was using had replaced this with an (interesting) experimental approach: the initial-image loss pushed the output toward an all-grayscale, flat image, with the effect decaying over iterations. If you set init_weight very high, instead of ending up with your initial image, you would get a flat gray image.

The line of code used in all other VQGAN+CLIP repos computes the mean squared error between the output tensor z (the current output image) and the original tensor z_orig (the initial image):

F.mse_loss(self._z, self._z_orig) * self.conf.init_weight / 2

The line of code used in the upstream copy that I started from is very different, with an effect that weakens as the iteration count grows:

F.mse_loss(self._z, torch.zeros_like(self._z_orig)) * ((1/torch.tensor(iteration_number*2 + 1))*self.conf.init_weight) / 2
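For illustration, here is a minimal sketch that puts both variants behind a single switch. The helper function, its arguments, and the method names are hypothetical; only the two loss expressions come from the snippets above.

import torch
import torch.nn.functional as F

def init_image_loss(z, z_orig, init_weight, iteration_number, method='decay'):
    # Hypothetical helper contrasting the two init_weight behaviors.
    if method == 'original':
        # Standard VQGAN+CLIP behavior: penalize drift from the initial
        # latent, so a higher init_weight keeps the output closer to the
        # source image.
        return F.mse_loss(z, z_orig) * init_weight / 2
    if method == 'decay':
        # Experimental upstream behavior: penalize distance from a zero
        # latent (pushing toward a flat gray image), with the pressure
        # decaying as iteration_number grows.
        decay = 1 / torch.tensor(iteration_number * 2 + 1)
        return F.mse_loss(z, torch.zeros_like(z_orig)) * (decay * init_weight) / 2
    raise ValueError(f'unknown init_weight_method: {method}')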

New features:

  • Alternate methods for maintaining init_image are provided.
    • 'decay' is the method used in this package from v1.0.0 through v1.1.3, and remains the default. This gives a more stylized look. Try values of 0.1-0.3.
    • 'original' is the method from the original Katherine Crowson colab notebook, and is in common use in other notebooks. This gives a look that stays closer to the source image. Try values of 1-2.
    • Specify the method using config.init_weight_method = 'original' if desired, or config.init_weight_method = 'decay' for the default. A usage sketch follows this list.
  • Story prompts no longer cycle back to the first prompt when the end is reached.
  • encode_video syntax change: input_framerate is now a required argument. As before, if output_framerate differs from input_framerate, frames will be interpolated.
  • PNG outputs now include data chunks describing the generation conditions. You can view these properties using ImageMagick: "magick identify -verbose my_image.png"
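As a usage sketch of the new init_weight_method option (the import paths, the prompt text, and the placement of init_weight on the config are assumptions; generate.single_image itself appears in the 1.1.1 example below):

from vqgan_clip import generate
from vqgan_clip.engine import VQGAN_CLIP_Config

config = VQGAN_CLIP_Config()
config.init_weight_method = 'original'  # or 'decay' (the default)
config.init_weight = 1.5                # 'original' method: try values of 1-2
generate.single_image(eng_config = config,
        text_prompts = 'a watercolor landscape',  # placeholder prompt
        init_image = 'input_image.jpg',
        iterations = 500,
        output_filename = 'output.png')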

1.1.3

19 Oct 22:28

Bug Fixes

  • generate.restyle_video* functions no longer rename the source files; original filenames are preserved. As part of this fix, video_tools.extract_video_frames() now uses a naming format consistent with generate.restyle_video. All video tools now name frames frames_%12d.png.

1.1.2

19 Oct 22:28
  • When generating videos, the PyTorch random number generator was reseeded for every frame of video instead of keeping the same seed. This is now fixed, and video is more consistent from frame to frame.
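The general shape of the fix, as a minimal sketch (the loop and the generate_frame stand-in are illustrative, not the package's internals):

import torch

def generate_frame(i):
    # Hypothetical stand-in for the per-frame VQGAN+CLIP generation step.
    return torch.randn(3, 64, 64)

torch.manual_seed(42)  # seed once, before the frame loop (the fix)
for i in range(10):
    # The bug was the equivalent of reseeding here, inside the loop, so every
    # frame started from a different random state and the video flickered.
    frame = generate_frame(i)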

1.1.1

19 Oct 22:27

By user request, it is now possible to set Engine.conf.model_dir to store downloaded models in a subfolder of the current working directory.

# Import paths below are assumed from the package layout.
from vqgan_clip import generate, esrgan
from vqgan_clip.engine import VQGAN_CLIP_Config

# Store Real-ESRGAN model files in a ./models subfolder of the working directory.
esrgan.inference_realesrgan(input='.\\video_frames',
        output_images_path='upscaled_video_frames',
        face_enhance=False,
        model_dir='models')

# Store VQGAN model files in the same ./models subfolder.
config = VQGAN_CLIP_Config()
config.model_dir = 'models'
generate.single_image(eng_config = config,
        image_prompts = 'input_image.jpg',
        iterations = 500,
        save_every = 10,
        output_filename = 'output.png')  # placeholder filename

1.1.0

19 Oct 22:26
Pre-release

This is a significant change that breaks compatibility.

New features:

  • Real-ESRGAN integration for upscaling images and video. This can be used on generated media or existing media.
  • In order to accommodate flexible upscaling, all generate.*_video() methods have been changed to generate only folders of images (and renamed generate.*_video_frames()). You then optionally call the upscaler, followed by the video encoder; see the sketch after this list.
  • All examples have been updated to include upscaling.
  • Model files for VQGAN and Real-ESRGAN are dynamically downloaded and cached in your PyTorch hub folder instead of the ./models subfolder of your working folder. You provide a URL and filename for the model to the vqgan_clip_generator.Engine object; if no local copy is available, it is downloaded, and an already-downloaded copy is reused rather than downloaded again. This should give you a cleaner project folder / working directory, and allow model reuse across multiple project folders.
    • These files will need to be manually removed when you uninstall vqgan_clip_generator. On Windows, model files are stored in ~\.cache\torch\hub\models
    • You can copy your existing downloaded model files to ~\.cache\torch\hub\models and they will be used and not re-downloaded.
  • Initial images (init_image) used to initialize VQGAN+CLIP image generation are now passed as an argument to the generate.* methods instead of being set on the Engine configuration. Requiring access to the Engine config just to set an initial image was confusing; the philosophy is that the Engine config should rarely need to be touched except to set your output image size. Internally, generate.* methods simply copy init_image into the Engine config structure, but exposing it as a generate.* argument is clearer.
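A sketch of the new frames-then-encode workflow (the import paths, argument names, and framerate values are assumptions; only the function naming pattern and the esrgan call come from the notes above):

from vqgan_clip import generate, esrgan, video_tools
from vqgan_clip.engine import VQGAN_CLIP_Config

config = VQGAN_CLIP_Config()

# 1. Generate a folder of frames; generate.*_video() methods are now
#    generate.*_video_frames(), and init_image is a generate.* argument.
generate.video_frames(eng_config = config,
        init_image = 'input_image.jpg',          # moved out of Engine config
        text_prompts = 'an oil painting')        # placeholder prompt

# 2. Optionally upscale the generated frames with Real-ESRGAN.
esrgan.inference_realesrgan(input = 'video_frames',
        output_images_path = 'upscaled_video_frames',
        face_enhance = False)

# 3. Encode the (upscaled) frames to video.
video_tools.encode_video(output_file = 'output.mp4',
        input_framerate = 30)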

Known issues:

  • Story prompts aren't working when restyling videos; only the initial prompts (before the ^) are used. The prompt cycling needs to change to be based on the video frame rather than the iteration, since iterations reset for each frame.
  • Unit tests don't cover Real-ESRGAN yet.
  • The Colab notebook isn't fully tested for these changes yet.

1.0.0

18 Oct 01:00
Pre-release

First feature-complete release.
Be aware that a significant change is planned for v1.1 that will break compatibility for video generation. It will also introduce easy (hopefully) integration with Real-ESRGAN for upscaling.