
Releases: rkhamilton/vqgan-clip-generator

2.3.2

11 Nov 13:44

Bug Fixes

  • style_transfer did not correctly pass an init_weight to image(), resulting in non-changing video output. Fixes issue #61.

2.3.1

11 Nov 11:55

Bug Fixes

  • extract_video_frames() had a misleading error message, which has been clarified.
  • restyle_video_naive.py had some bugs because it hadn't been kept up to date with API changes. It should run now, but it still uses the deprecated restyle_video function, so you should move to style_transfer.py as your example starting point.
  • extract_video_frames() now does a better job of purging existing files from the extraction folder.

2.3.0

07 Nov 15:46

New Features

  • Added init_weight as an option to generate.image(). It works the same as in the other functions, but was previously missing from image().
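
A minimal sketch of the new option, assuming the VQGAN_CLIP_Config object and the eng_config/output_filename parameter names used in the project's examples; the file names and prompt are placeholders:

```python
from vqgan_clip import generate, VQGAN_CLIP_Config

config = VQGAN_CLIP_Config()
config.init_image = 'input.jpg'  # placeholder starting image
config.init_weight = 0.5         # nonzero weight keeps the result near the init image

generate.image(
    eng_config=config,
    text_prompts='an impressionist landscape',
    output_filename='output.png',
)
```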

Bug Fixes

  • esrgan.inference_realesrgan() will now raise an exception if it is unable to load an image from the passed file or folder.
  • generate.* functions now check for valid function inputs, and will raise exceptions appropriately.

2.2.0

05 Nov 20:02

This release changes style_transfer to work better at low iterations_per_frame. Previously it reset the training (gradient) state with each new frame of video; now that state is preserved across frames.
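
An illustrative sketch of the change, not the library's internals: the optimizer that trains the latent vector is created once, before the frame loop, so its state carries over between frames instead of restarting.

```python
import torch

# Hypothetical stand-ins for the real generation loop.
num_video_frames = 3
iterations_per_frame = 2
z = torch.randn(1, 8, requires_grad=True)   # latent vector being trained
target = torch.zeros(1, 8)                  # dummy optimization target

# Key change: create the optimizer ONCE, outside the frame loop, so its
# state (e.g. Adam moment estimates) is preserved from frame to frame.
opt = torch.optim.Adam([z], lr=0.1)

for frame in range(num_video_frames):
    for _ in range(iterations_per_frame):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(z, target)
        loss.backward()
        opt.step()
    # ... save the frame generated from z here ...
```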

The cut_method is also now saved to media metadata.

2.1.1

04 Nov 22:15

Bug Fixes

  • generate.video_frames was not working for low iterations_per_frame. This is now corrected for non-zooming videos. As long as zoom_scale==1.0 and shift_x and shift_y are 0, you can freely set iterations_per_frame to 1 and get expected results.
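
For example (a sketch, assuming the config object and parameter names used elsewhere in the project's examples; the prompt and frame count are placeholders):

```python
from vqgan_clip import generate, VQGAN_CLIP_Config

config = VQGAN_CLIP_Config()

# Non-zooming video: zoom_scale of 1.0 and no shift, so iterations_per_frame
# can be as low as 1.
generate.video_frames(
    eng_config=config,
    text_prompts='a coral reef',
    num_video_frames=120,
    iterations_per_frame=1,
    zoom_scale=1.0,
    shift_x=0,
    shift_y=0,
)
```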

Known Issues

There is still an issue with iterations_per_frame < ~5 when zoom_scale is not 1.0 or shift_x/shift_y are nonzero: it takes more iterations_per_frame than expected to see progress in the result. For the time being, use a higher iterations_per_frame when using these parameters than when you are not.

2.1.0

03 Nov 21:25

This release adds support for multiple export filetypes in addition to PNG. Exports to JPG or PNG will have metadata embedded that describes the media generation settings. PNG files already stored metadata in PNG data chunks; JPG files, supported as of 2.1, store metadata in the exif fields XPTitle and XPComment. Other export filetypes are supported for still images, provided they are types supported by Pillow.
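
For instance, the embedded fields can be read back with Pillow (a sketch; 'output.jpg' is a placeholder, and the decoding assumes the usual UTF-16 encoding of exif XP* fields):

```python
from PIL import Image
from PIL.ExifTags import TAGS

img = Image.open('output.jpg')  # placeholder path
for tag_id, value in img.getexif().items():
    name = TAGS.get(tag_id, tag_id)
    if name in ('XPTitle', 'XPComment'):
        # Exif XP* fields hold UTF-16-LE encoded bytes.
        print(name, bytes(value).decode('utf-16-le').rstrip('\x00'))
```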

I ran a lot of side-by-side comparisons of different cut_method approaches and found that the 'kornia' method produces more interesting small details in results than the 'original' and 'sg3' methods, so I've changed the default cut_method to 'kornia'. You can get the old behavior back by setting config.cut_method='original' if desired. I'm doing more detailed comparisons of ways to create cuts and exploring alternatives for the future.
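
A quick sketch of restoring the old default (assuming the VQGAN_CLIP_Config object and eng_config parameter name used throughout the project's examples):

```python
from vqgan_clip import generate, VQGAN_CLIP_Config

config = VQGAN_CLIP_Config()
config.cut_method = 'original'  # default as of 2.1.0 is 'kornia'

generate.image(eng_config=config, text_prompts='a lighthouse in a storm')
```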

API changes

  • Engine.save_current_output() argument png_info renamed to img_metadata.
  • _functional.copy_PNG_metadata replaced with _functional.copy_image_metadata. The new function handles both jpg and png files.

Bug Fixes

  • The Real-ESRGAN script was not handling folders of files whose names contain spaces and special characters.
  • Fixed extracting video frames from folders with long paths containing spaces.
  • Improvements to progress bar accuracy for generate.video_frames().
  • Fixed regression in RIFE wrapper. Now tested and working on Google Colab.
  • The tqdm progressbar has been updated to work correctly in Jupyter notebooks.
  • video_tools.encode_video() fixed to work on Linux systems (Google Colab).

2.0.0

29 Oct 16:42

This release introduces major improvements to style transfers, in which VQGAN style is applied to an existing video. The improvements should result in videos that are more consistent from frame to frame (less flicker). Associated with the style transfer improvements, there are major changes in the video generation API to make it easier to calculate video durations.

API changes

  • generate.style_transfer added with the new video generation features.
  • generate.zoom_video_frames and generate.video_frames have been combined to a single function: generate.video_frames. If you do not specify zoom_scale, shift_x, or shift_y, these values default to 0, and non-zooming images are generated.
  • generate.video_frames arguments changed. iterations and save_every are removed. New arguments are provided to make it easier to calculate video durations (see the sketch after this list).
    • num_video_frames : Set the number of video frames (images) to be generated.
    • iterations_per_frame : Set the number of vqgan training iterations to perform for each frame of video. Higher numbers are more stylized.
  • generate.multiple_images removed. Functionally it was identical to repeatedly running generate.single_image.
  • generate.single_image renamed to generate.image.
  • generate.single_image argument change_prompt_every is removed. It is not relevant for generating a single image.
  • generate.restyle_video renamed to generate.restyle_video_legacy. It will be removed in a future version.
  • generate.restyle_video_naive removed. Use generate.style_transfer instead.
  • video_tools.RIFE_interpolation added as a wrapper to the arXiv2020-RIFE inference_video.py script.
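
A sketch of the new duration-oriented arguments (assuming the eng_config and text_prompts parameter names used in the project's examples; values are placeholders):

```python
from vqgan_clip import generate, VQGAN_CLIP_Config

config = VQGAN_CLIP_Config()

# 150 frames at 30 training iterations each; encoded at 30 fps this
# yields a five-second video.
generate.video_frames(
    eng_config=config,
    text_prompts='a garden of glass flowers',
    num_video_frames=150,
    iterations_per_frame=30,
)
```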

New Features

  • generate.video_frames lets you specify the video frames on which prompts should change, using the argument change_prompts_on_frame. E.g., to change prompts on frames 150 and 200, use change_prompts_on_frame = [150,200]. Examples are updated with this argument; see the sketch after this list.
  • video_tools now sets ffmpeg to output on error only.
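
A sketch of the prompt-change argument (the '^' separator between prompt sets follows the project's prompt syntax; treat it and the other names here as assumptions):

```python
from vqgan_clip import generate, VQGAN_CLIP_Config

config = VQGAN_CLIP_Config()

# Three prompt sets separated by '^'; the active set changes on
# frames 150 and 200.
generate.video_frames(
    eng_config=config,
    text_prompts='a forest ^ a city at night ^ a desert',
    num_video_frames=300,
    iterations_per_frame=30,
    change_prompts_on_frame=[150, 200],
)
```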

Bug Fixes

  • The upscaling video example file had a bug in the ffmpeg command.
  • The generate.encode_video method was not producing files with the expected framerate.
  • Many problems were resolved that impacted paths that included spaces. In general, be sure to pass f-strings as paths (f'my path{os.sep}here').

1.3.0

24 Oct 19:04
Compare
Choose a tag to compare

This release adds smoothing to the output of video_frames and restyle_video_frames. The smoothing is done by combining a user-specifiable number of latent vectors (z) and averaging them together using a modified exponentially weighted moving average (EWMA). The approach creates a sliding window over the z vectors (of length z_smoothing_buffer). The center of this window is considered the key frame and has the greatest weight in the result. As frames move away from the center of the buffer, their weight decreases exponentially, by a factor of (1-z_smoothing_alpha)**offset_from_center.

To increase the temporal smoothing, increase the buffer size. To increase the weight of the key frame of video, increase the z_smoothing_alpha. More smoothing will combine adjacent z vectors, which will blur rapid motion from frame to frame.
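
An illustrative sketch of that weighting scheme, not the library's internals; the helper name and tensor shapes are hypothetical:

```python
import torch

def smooth_z(z_buffer, alpha):
    """Weighted average over a sliding window of z vectors; the center
    (key) frame gets the largest weight, decaying by (1 - alpha)**offset."""
    center = len(z_buffer) // 2
    weights = torch.tensor(
        [(1.0 - alpha) ** abs(i - center) for i in range(len(z_buffer))]
    )
    weights = weights / weights.sum()           # normalize to sum to 1
    stacked = torch.stack(z_buffer)             # (buffer_len, *z_shape)
    shape = (-1,) + (1,) * (stacked.dim() - 1)  # broadcast weights over z dims
    return (stacked * weights.view(shape)).sum(dim=0)

# Example: a buffer of 5 z vectors (z_smoothing_buffer=5), key frame in the middle.
zs = [torch.randn(1, 8) for _ in range(5)]
z_smoothed = smooth_z(zs, alpha=0.7)
```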

1.2.2

23 Oct 13:32

Test coverage increased to include all generate, esrgan, and video_tools functions.

Bug Fixes

  • generate.extract_video_frames was still saving jpgs; it now saves only png files.

1.2.1

22 Oct 18:05

New Features

  • Video metadata is encoded by the encode_video function in the title (text prompts) and comment (generator parameters) fields.

Bug Fixes

  • generate.restyle_video* functions no longer re-load the VQGAN network each frame, which results in a 300% speed-up in running this function. This means that training doesn't start over each frame, so the output will look somewhat different than in earlier versions.
  • generate functions no longer throw a warning when the output file argument doesn't have an extension.
  • v1.2.0 introduced a bug where images were saved to output/output/filename. This is fixed.