This repository contains a production-grade code service for face clustering, built on a cropped celebrity faces dataset scraped from Pinterest.
The dataset is available on Kaggle. Here is the Dataset Link!
- Download the data from Kaggle. You can use the whole dataset or just a subset for the project; I used a subset due to computing limitations.
- Build input_data.py for input processing (creating a sample dataset).
- Generate 128-dimensional face encodings for the cropped face data using the face_recognition library, and save the encodings in a pickle file (a minimal sketch of this step follows this list).
- Create face clusters from the generated facial embeddings, build montages of the clustered faces, and save the cluster results into unique face folders named by label id.
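Below is a minimal sketch of the encoding step, assuming the sample dataset sits under a local folder and using placeholder paths; the repo's own script presumably reads such paths from its yaml config instead.

```python
# Sketch only: walk the sample dataset, compute a 128-d encoding per face
# with the face_recognition library, and pickle the results.
# DATASET_DIR and ENCODINGS_PATH are hypothetical placeholder paths.
import os
import pickle

import face_recognition

DATASET_DIR = "src/DATASETS/sample_dataset"   # hypothetical path
ENCODINGS_PATH = "encodings.pickle"           # hypothetical path

data = []
for root, _, files in os.walk(DATASET_DIR):
    for fname in files:
        if not fname.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        path = os.path.join(root, fname)
        image = face_recognition.load_image_file(path)
        # The dataset images are already cropped faces, so typically one
        # encoding (or none, if detection fails) comes back per image.
        encodings = face_recognition.face_encodings(image)
        if encodings:
            data.append({"image_path": path, "encoding": encodings[0]})

with open(ENCODINGS_PATH, "wb") as f:
    pickle.dump(data, f)
```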
- After cloning the repo, you will need to update the config yaml files. Changing the directory/file paths to your own is mandatory; the other yaml settings are optional and you can experiment with them.
- You can ignore or delete the input_data.py file if you are going to use the whole dataset.
- If you want a subset of the dataset, make minor changes in the input_data.py file, where the first 100 images are currently hardcoded (see the sketch after this list).
- This service is built with different environments in mind: local, dev, staging, prod. If you don't need environment-related code, simply remove the code where env_value is used; it appears in many places, and a considerable amount of code can also be removed from the init() function defined at the beginning of each class.
- The dataset is not uploaded to this repository because of its size. You can download it from the Kaggle link given above.
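As referenced above, here is a rough sketch of the sampling idea behind input_data.py, with hypothetical paths and the hardcoded image count; the actual script's variable names and structure may differ.

```python
# Sketch only: copy the first N images from the full dataset into a sample
# folder. The paths and the hardcoded limit below are placeholders; the repo
# presumably takes these from SOURCE_DIR / DESTINATION_DIR / NUMBER_OF_IMAGES
# in its yaml config.
import os
import shutil

SOURCE_DIR = "E:/face_clustering/105_classes_pins_dataset"          # placeholder
DESTINATION_DIR = "E:/face_clustering_service/src/DATASETS/sample_dataset"  # placeholder
NUMBER_OF_IMAGES = 100  # change this hardcoded limit to sample more or fewer images

os.makedirs(DESTINATION_DIR, exist_ok=True)

copied = 0
for root, _, files in os.walk(SOURCE_DIR):
    for fname in sorted(files):
        if copied >= NUMBER_OF_IMAGES:
            break
        shutil.copy(os.path.join(root, fname), DESTINATION_DIR)
        copied += 1
    if copied >= NUMBER_OF_IMAGES:
        break
```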
git clone https://github.com/sachelsout/celebrities_face_clustering.git
pip install -r requirements.txt
Here is one of the yaml files; this one is for the input step. You will need to change the source and destination directories to your own folder paths, and make similar path changes in the other yaml files.
local:
  SOURCE_DIR:
    - name: SOURCE_DIR
      value: 'E:/face_clustering/105_classes_pins_dataset'
  DESTINATION_DIR:
    - name: DESTINATION_DIR
      value: 'E:/face_clustering_service/src/DATASETS/sample_dataset'
  NUMBER_OF_IMAGES:
    - name: NUMBER_OF_IMAGES
      value: "10"
dev:
  SOURCE_DIR:
    - name: SOURCE_DIR
      value: 'E:/face_clustering/105_classes_pins_dataset'
  DESTINATION_DIR:
    - name: DESTINATION_DIR
      value: 'E:/face_clustering_service/src/DATASETS/sample_dataset'
  NUMBER_OF_IMAGES:
    - name: NUMBER_OF_IMAGES
      value: "10"
While running the app, pass the env_value as well. env_value is the environment in which you are going to run the service, e.g. local, dev, staging, or prod.
python3 app.py -e <env_value>
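Here is a small sketch of how the -e flag and the env-keyed yaml can fit together, assuming PyYAML and a hypothetical file named input_config.yaml with the structure shown above; the repo's actual argument parsing and config loading may differ.

```python
# Sketch only: read the environment from -e and pick that block from the
# env-keyed yaml config. File name and key access pattern are assumptions
# based on the sample yaml shown above.
import argparse

import yaml

parser = argparse.ArgumentParser()
parser.add_argument("-e", "--env_value", required=True,
                    choices=["local", "dev", "staging", "prod"])
args = parser.parse_args()

with open("input_config.yaml") as f:          # hypothetical file name
    config = yaml.safe_load(f)[args.env_value]

source_dir = config["SOURCE_DIR"][0]["value"]
destination_dir = config["DESTINATION_DIR"][0]["value"]
number_of_images = int(config["NUMBER_OF_IMAGES"][0]["value"])
print(source_dir, destination_dir, number_of_images)
```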
Here is a sample montage of a celebrity's clustered faces. As the montage shows, the service clusters the faces successfully. Faces that could not be clustered are stored in a folder with the label id '-1'.
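For reference, here is a rough sketch of the clustering step using scikit-learn's DBSCAN, a common choice for face clustering whose noise label of -1 matches the '-1' folder described above; the service's actual algorithm and parameters may differ, and the paths are placeholders.

```python
# Sketch only: cluster the pickled 128-d encodings and copy each image into
# a folder named after its cluster label. DBSCAN assigns -1 to faces it
# cannot place in any cluster. eps/min_samples values are illustrative.
import os
import pickle
import shutil

import numpy as np
from sklearn.cluster import DBSCAN

ENCODINGS_PATH = "encodings.pickle"           # hypothetical path
CLUSTERS_DIR = "src/DATASETS/face_clusters"   # hypothetical path

with open(ENCODINGS_PATH, "rb") as f:
    data = pickle.load(f)

encodings = np.array([d["encoding"] for d in data])

# eps and min_samples control how tight a cluster must be; tune as needed.
labels = DBSCAN(metric="euclidean", eps=0.5, min_samples=3).fit_predict(encodings)

for d, label in zip(data, labels):
    label_dir = os.path.join(CLUSTERS_DIR, str(label))
    os.makedirs(label_dir, exist_ok=True)
    shutil.copy(d["image_path"], label_dir)
```

The montages themselves can then be rendered per cluster folder from the images saved above.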