fix for video_frames with iterations_per_frame < 5
rkhamilton committed Nov 4, 2021
1 parent 4e9ac4a commit a317fe6
Showing 2 changed files with 20 additions and 13 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,10 @@
# v2.1.1
**Bug Fixes**
* generate.video_frames was not working for low iterations_per_frame. This is now corrected for non-zooming videos: as long as zoom_scale==1.0 and both shift_x and shift_y are 0, you can set iterations_per_frame as low as 1 and get the expected results.

**Known Issues**
There is still an issue with iterations_per_frame < ~5 when zoom_scale != 1.0, or when shift_x or shift_y is nonzero: it takes more iterations_per_frame than expected to see progress in the result. For the time being, use a higher iterations_per_frame when zooming or shifting than you would otherwise.
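
For reference, these settings map onto calls like the following (a sketch only: just the parameters named in this changelog are shown, and required arguments such as prompts, engine config, and output paths are omitted — see the project README for the full signature):

```python
from vqgan_clip import generate

# Non-zooming video: iterations_per_frame can now be as low as 1.
generate.video_frames(num_video_frames=150,
                      iterations_per_frame=1,
                      zoom_scale=1.0,   # no zoom
                      shift_x=0,        # no shift
                      shift_y=0)

# Zooming / shifting video: keep iterations_per_frame higher (> ~5)
# until the known issue above is resolved.
generate.video_frames(num_video_frames=150,
                      iterations_per_frame=15,
                      zoom_scale=1.02,
                      shift_x=1,
                      shift_y=0)
```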

# v2.1.0
This release adds support for multiple export filetypes in addition to PNG. Exports to JPG or PNG have metadata embedded that describes the media generation settings. PNG files already stored metadata in PNG data chunks in earlier releases; JPG files, available in 2.1, store metadata in the exif fields XPTitle and XPComment. Other export filetypes are supported for still images, provided they are [types supported by Pillow](https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html).
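
The embedded settings can be read back with Pillow. A minimal sketch, assuming hypothetical filenames; note that depending on the Pillow version, XP* exif values may come back as bytes or as a tuple of ints:

```python
from PIL import Image
from PIL.ExifTags import TAGS

# PNG: settings are stored in text data chunks, exposed as a dict.
png = Image.open('my_generation.png')   # hypothetical filename
print(png.text)

# JPG: settings are stored in the XPTitle and XPComment exif fields.
jpg = Image.open('my_generation.jpg')   # hypothetical filename
exif = jpg.getexif()
for tag_id in (0x9C9B, 0x9C9C):         # standard exif IDs for XPTitle, XPComment
    raw = exif.get(tag_id)
    if raw is not None:
        data = raw if isinstance(raw, bytes) else bytes(raw)
        # XP* fields are UTF-16LE encoded, null-terminated strings.
        print(TAGS.get(tag_id, tag_id), data.decode('utf-16-le').rstrip('\x00'))
```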

26 changes: 13 additions & 13 deletions src/vqgan_clip/generate.py
@@ -338,19 +338,19 @@ def video_frames(num_video_frames,
         eng.encode_and_append_prompts(current_prompt_number, parsed_text_prompts, parsed_image_prompts, parsed_noise_prompts)

         # Zoom / shift the generated image
-        pil_image = TF.to_pil_image(eng.output_tensor[0].cpu())
-        if zoom_scale != 1.0:
-            new_pil_image = VF.zoom_at(pil_image, output_image_size_x/2, output_image_size_y/2, zoom_scale)
-        else:
-            new_pil_image = pil_image
-
-        if shift_x or shift_y:
-            new_pil_image = ImageChops.offset(new_pil_image, shift_x, shift_y)
-
-        # Re-encode and use this as the new initial image for the next iteration
-        eng.convert_image_to_init_image(new_pil_image)
-
-        eng.configure_optimizer()
+        if zoom_scale != 1.0 or shift_x or shift_y:
+            pil_image = TF.to_pil_image(eng.output_tensor[0].cpu())
+            if zoom_scale != 1.0:
+                new_pil_image = VF.zoom_at(pil_image, output_image_size_x/2, output_image_size_y/2, zoom_scale)
+            else:
+                new_pil_image = pil_image
+
+            if shift_x or shift_y:
+                new_pil_image = ImageChops.offset(new_pil_image, shift_x, shift_y)
+            # Re-encode and use this as the new initial image for the next iteration
+            eng.convert_image_to_init_image(new_pil_image)
+            eng.configure_optimizer()

         if verbose:
             # display some statistics about how the GAN training is going whenever we save an interim image
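
Reduced to its control flow, the fix only pays the decode / re-encode / optimizer-reset round trip when a frame is actually transformed; otherwise the engine keeps training the same latent, which is why iterations_per_frame=1 now makes visible progress. A minimal sketch with a stub engine (not the project's real classes):

```python
class StubEngine:
    """Stands in for the VQGAN engine; only the two calls used by the fix."""
    def convert_image_to_init_image(self, pil_image):
        print('re-encoded frame as new init image')
    def configure_optimizer(self):
        print('optimizer rebuilt from scratch')

def end_of_frame(eng, zoom_scale=1.0, shift_x=0, shift_y=0):
    # Mirrors the patched logic: skip the round trip entirely when no
    # zoom or shift was requested for this frame.
    if zoom_scale != 1.0 or shift_x or shift_y:
        transformed_frame = None  # stands in for the zoomed/shifted PIL image
        eng.convert_image_to_init_image(transformed_frame)
        eng.configure_optimizer()

end_of_frame(StubEngine())                   # prints nothing: latent untouched
end_of_frame(StubEngine(), zoom_scale=1.02)  # triggers re-encode + reset
```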
