Utility to compress images on the server with nearly no downtime through duplication

LucaMozzo/WebServerImageCompressor


Image Compressor for Web Servers

Screenshot example

This program was created to reduce the size of the images stored on web servers and so speed up page loading. Despite my efforts to make the program as generic as possible, I must admit that it was designed around my need to batch-compress all the images in one of my customers' Prestashop stores.

How it works

The program duplicates the entire tree of the filesystem (starting from the specified path), and converts all the images in the subfolders, keeping the same folder structure.

The idea is that after running the program you could effortlessly switch to the new folder tree with compressed images with nearly no downtime (the time to rename the folder).
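The duplication step described above can be sketched roughly as follows. This is an illustration only, not the code in compress.py: the function names `mirror_tree` and `compress_with_pillow`, the extension list, and the choice to copy non-image files unchanged are all assumptions made for the example.

```python
import os
import shutil


def compress_with_pillow(src_path, dst_path, quality=70):
    """Re-encode one image with Pillow (assumed library, imported lazily)."""
    from PIL import Image  # the PIL module is provided by the Pillow package

    with Image.open(src_path) as img:
        # `quality` matters for lossy formats such as JPEG; other writers ignore it.
        img.save(dst_path, quality=quality, optimize=True)


def mirror_tree(source, output, convert=compress_with_pillow,
                extensions=(".jpg", ".jpeg", ".png")):
    """Recreate the folder tree of `source` under `output`, converting images.

    Returns the list of source paths whose conversion raised an exception.
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(source):
        # Map e.g. source/a/b to output/a/b so the structure is preserved.
        relative = os.path.relpath(dirpath, source)
        target_dir = os.path.normpath(os.path.join(output, relative))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src_path = os.path.join(dirpath, name)
            dst_path = os.path.join(target_dir, name)
            if not name.lower().endswith(extensions):
                # Assumption: files that are not images are copied unchanged.
                shutil.copy2(src_path, dst_path)
                continue
            try:
                convert(src_path, dst_path)
            except Exception:
                failures.append(src_path)
    return failures
```

Because the output tree mirrors the source tree path for path, replacing one folder with the other later is only a rename.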

Usage

  1. Clone the repo on your web server: git clone https://github.com/LucaMozzo/WebServerImageCompressor.git
  2. Enter the folder: cd WebServerImageCompressor
  3. Install the libraries (the PIL module is installed through the Pillow package):
    python3 -m pip install console-progressbar
    python3 -m pip install Pillow
  4. Run the script, e.g. python3 compress.py --source ~/source --output ~/destination --quality 70 --logs ~/failures.log

| Argument name | Required | Description |
| --- | --- | --- |
| --source | Yes | The base directory where the images are |
| --output | Yes | The base directory where the compressed images will be saved |
| --quality | Yes | A value 1-100 of the output quality, where 100 is the current quality (no compression) |
| --logs | No | The file where to write the failures |
| --threads | No | The number of threads to use to compress. Defaults to 10 |
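For readers who want to see how such an interface can be declared, here is a minimal argparse sketch of the arguments listed above. It is not taken from compress.py; `build_parser` and `quality_value` are hypothetical names, and the range check on --quality is an assumption based on the documented 1-100 range.

```python
import argparse


def quality_value(text):
    """Parse --quality and reject values outside the documented 1-100 range."""
    value = int(text)
    if not 1 <= value <= 100:
        raise argparse.ArgumentTypeError("quality must be between 1 and 100")
    return value


def build_parser():
    parser = argparse.ArgumentParser(
        description="Batch-compress images into a duplicate folder tree")
    parser.add_argument("--source", required=True,
                        help="base directory where the images are")
    parser.add_argument("--output", required=True,
                        help="base directory for the compressed images")
    parser.add_argument("--quality", required=True, type=quality_value,
                        help="output quality, 1-100")
    parser.add_argument("--logs", default=None,
                        help="file where failures are written (optional)")
    parser.add_argument("--threads", type=int, default=10,
                        help="number of threads to use (default: 10)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```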

Example of application on a Prestashop store

Prestashop stores the product images in the folder img/p/. Let's assume our Prestashop installation is in /var/www/html/prestashop.

We would run the script like this:

python3 compress.py --source /var/www/html/prestashop/img/p/ --output /var/www/html/prestashop/img2/p/ --quality 70 --logs ~/failures.log

Then check the failed images in the output logs file and make adjustments as needed.

To switch from the current images to the compressed ones, we swap the folder names:

mv /var/www/html/prestashop/img /var/www/html/prestashop/img_old && mv /var/www/html/prestashop/img2 /var/www/html/prestashop/img

Now the original images are in the folder img_old and the compressed ones are in img, which Prestashop will use for future requests.
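If you would rather script the swap than type it by hand, the same two renames can be sketched in Python. `swap_folders` is a hypothetical helper, not part of this repo; it assumes all three paths are on the same filesystem, so that each rename is a single quick operation, and it undoes the first rename if the second one fails.

```python
import os


def swap_folders(current, compressed, backup):
    """Move `current` to `backup`, then `compressed` to `current`."""
    if os.path.exists(backup):
        raise FileExistsError(f"backup folder already exists: {backup}")
    os.rename(current, backup)          # e.g. img  -> img_old
    try:
        os.rename(compressed, current)  # e.g. img2 -> img
    except OSError:
        # Roll back so the site keeps serving the original images.
        os.rename(backup, current)
        raise
```

For the example above the call would be swap_folders("/var/www/html/prestashop/img", "/var/www/html/prestashop/img2", "/var/www/html/prestashop/img_old").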

Performance considerations

One of the parameters that you can specify is the number of threads. The number of threads needs to be considered carefully before running the script.

More threads =/= less time to complete

Creating a thread has an overhead, so this overhead needs to be worth the effort. For example (using random numbers here) if creating a thread takes 1ms and the operations to be executed also take 1ms, you're probably better off performing those operations sequentially. What I'm saying here is that the time you save by parallelizing the work should be higher than the time spent scheduling the threads.
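A small experiment makes the point concrete. The sketch below times a deliberately trivial task run sequentially and through a thread pool; `tiny_task` is a stand-in invented for this example, and the use of `ThreadPoolExecutor` is an assumption about how one might parallelize, not a description of compress.py. No timings are quoted here because they depend on the machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def tiny_task(n):
    """Stand-in for an operation that is cheaper than scheduling a thread."""
    return n * n


def run_sequential(items):
    return [tiny_task(n) for n in items]


def run_threaded(items, threads):
    # pool.map preserves input order, so the results match the sequential run.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(tiny_task, items))


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    items = list(range(20_000))
    _, t_seq = timed(run_sequential, items)
    _, t_thr = timed(run_threaded, items, 10)
    print(f"sequential: {t_seq:.4f}s, 10 threads: {t_thr:.4f}s")
```

When each task is this small, the threaded run typically spends most of its time handing work to the pool, which is the overhead described above.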

The experiment

In this section I approach this problem experimentally. I have a folder with multiple subfolders, which together contain 5772 images stored on an HDD. The total size of those images is ~105MB, and their dimensions vary (from 64x64 to over 1000x1000 pixels).

I then ran the script with different numbers of threads and plotted the execution time against the number of threads. Here's the result:

Performance chart

It's clear that going past 15 threads is not worth it in this case: from there on the run time stays between roughly 24 and 30 seconds, with no consistent improvement.

This chart has been made with the average of 3 runs for each number of threads. The data is in the table below:

| Number of threads | Average run time (s) |
| --- | --- |
| 1 | 66.344 |
| 2 | 40.107 |
| 3 | 37.052 |
| 5 | 36.959 |
| 10 | 36.694 |
| 15 | 26.149 |
| 20 | 29.735 |
| 30 | 26.316 |
| 40 | 29.288 |
| 50 | 28.023 |
| 60 | 24.855 |
| 85 | 24.028 |
| 100 | 26.974 |

It must also be said that the run time is variable. The standard deviation of 5 data points is 2.1663 s (relative standard deviation, RSD = 7.76%).
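To read the table in terms of speedup over a single thread, the averages can be divided out as in the sketch below. The run times are copied from the table above; `speedup` and `relative_std_dev` are helper names made up for this example, and using the sample standard deviation for the RSD is an assumption, since the text does not say which form was used.

```python
import statistics

# Average run times in seconds, copied from the table above: threads -> seconds.
run_times = {
    1: 66.344, 2: 40.107, 3: 37.052, 5: 36.959, 10: 36.694,
    15: 26.149, 20: 29.735, 30: 26.316, 40: 29.288, 50: 28.023,
    60: 24.855, 85: 24.028, 100: 26.974,
}


def speedup(threads):
    """How many times faster than the single-threaded run."""
    return run_times[1] / run_times[threads]


def relative_std_dev(samples):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(samples) / statistics.mean(samples) * 100


if __name__ == "__main__":
    for threads in run_times:
        print(f"{threads:>3} threads: {speedup(threads):.2f}x")
```

With these numbers, 15 threads gives about 2.54x and the best entry (85 threads) about 2.76x, so the extra 70 threads buy a difference that is smaller than the run-to-run variation reported above.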

CPU & Disk utilisation

As you would expect, more operations done in "parallel" mean higher resource utilisation and, more importantly, fewer resources available for servicing incoming requests (more threads to be scheduled = less CPU time for each of the threads). This means that running this script on a production server with limited resources will slow down its response time. The extent of the slowdown has not been measured, as it's extremely variable.
