Proof of concept (POC) for similarity search over paintings by abstract features.
An image vectorizer (a convolutional neural network trained to classify the genre and style of paintings) is used to extract different vector representations from approximately 75,000 paintings.
The image metadata and the extracted vectors are indexed in Elasticsearch.
Clients can search for similar paintings by letting Elasticsearch compute vector similarity using either cosine similarity or the L2 norm (Euclidean distance).
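As a rough illustration of the two measures, the sketch below computes them in plain Python and builds a `script_score` request body of the kind Elasticsearch accepts for `dense_vector` fields. The field name `style_vector` and the helper names are assumptions made for illustration, not taken from this project. `cosineSimilarity` and `l2norm` are built-in Elasticsearch script functions, but their exact argument syntax has changed between 7.x releases, so check the version in use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors (0.0 = identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_query(query_vector, field="style_vector", metric="cosine", size=10):
    """Build a script_score search body; `field` is a hypothetical dense_vector field."""
    if metric == "cosine":
        # Shift by 1.0 because Elasticsearch scores must not be negative.
        source = f"cosineSimilarity(params.query_vector, '{field}') + 1.0"
    elif metric == "l2":
        # Invert the distance so that closer vectors receive higher scores.
        source = f"1 / (1 + l2norm(params.query_vector, '{field}'))"
    else:
        raise ValueError(f"unknown metric: {metric}")
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": source,
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
    }
```

The body returned by `similarity_query` could be sent to `localhost:9200/paintings/_search`. Cosine similarity ignores vector length and compares direction only, while the L2 variant also penalizes differences in magnitude.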
-
Download the painter-by-numbers dataset from Kaggle
-
Create a volume with the downloaded paintings
docker volume create -d local --opt device=~/art-classification/painter-by-numbers/train --opt o=bind --opt type=none paintings
-
Create a Kafka topic for the images:
docker exec -it kafka /bin/bash kafka-topics --create --zookeeper zookeeper:2181 --replication-factor 1 --partitions 2 --topic paintings
-
Create the Elasticsearch sink connector for the paintings topic
curl -X POST -H "Content-Type: application/json" -d @paintings.connector.json localhost:8083/connectors
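The contents of paintings.connector.json are not shown here. As a hedged sketch, a Confluent Elasticsearch sink connector definition for this setup might look like the following, written as a Python dict that serializes to the JSON body the Kafka Connect REST API expects. The connector name matches the one used in the status URL and the topic matches the one created above. The Elasticsearch host name `elasticsearch` and the remaining settings are assumptions, and further settings may be required depending on the connector version.

```python
import json

# Hypothetical connector definition; only the name and topic come from this README.
connector = {
    "name": "elastic-paintings-connector",
    "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "tasks.max": "1",
        "topics": "paintings",
        # Assumed service name of the Elasticsearch container.
        "connection.url": "http://elasticsearch:9200",
        # Derive document ids from topic, partition and offset instead of the record key.
        "key.ignore": "true",
        # Do not derive an Elasticsearch mapping from record schemas.
        "schema.ignore": "true",
    },
}

# Print the JSON that would be stored in paintings.connector.json.
print(json.dumps(connector, indent=2))
```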
-
Check the status with
curl http://localhost:8083/connectors/elastic-paintings-connector/tasks/0/status
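The status endpoint returns a small JSON document whose `state` field reports values such as RUNNING or FAILED. A minimal sketch for checking it from Python, using only the standard library (the function names are made up for illustration):

```python
import json
import urllib.request

STATUS_URL = "http://localhost:8083/connectors/elastic-paintings-connector/tasks/0/status"


def task_state(payload):
    """Extract the task state (e.g. RUNNING, FAILED) from a status response body."""
    return json.loads(payload)["state"]


def fetch_task_state(url=STATUS_URL):
    """Query Kafka Connect and return the task state; needs the stack to be running."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return task_state(response.read().decode("utf-8"))
```

Calling `fetch_task_state()` while Kafka Connect is up should return the current state of task 0. A failed task additionally carries a `trace` field with the stack trace.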
-
Create the schema mapping for the Elasticsearch index with
curl -X PUT "localhost:9200/paintings?pretty" -H 'Content-Type: application/json' -d @paintings.mapping.json
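The contents of paintings.mapping.json are not shown here. For orientation, a mapping that stores vectors for similarity scoring typically declares them as `dense_vector` fields with a fixed number of dimensions. The sketch below is an assumption about what such a file could contain: the field names and the dimension count are hypothetical and must match what the vectorizer actually emits.

```python
import json

VECTOR_DIMS = 512  # assumed; must equal the length of the vectors the network emits

# Hypothetical index mapping in the typeless Elasticsearch 7.x format.
mapping = {
    "mappings": {
        "properties": {
            "filename": {"type": "keyword"},
            "artist": {"type": "keyword"},
            "title": {"type": "text"},
            "style_vector": {"type": "dense_vector", "dims": VECTOR_DIMS},
            "genre_vector": {"type": "dense_vector", "dims": VECTOR_DIMS},
        }
    }
}

# Print the JSON that would be stored in paintings.mapping.json.
print(json.dumps(mapping, indent=2))
```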
-
Run the Python script to process all images and publish the image representations to Kafka
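The script itself is not reproduced here. The following is only a sketch of how one record per painting might be serialized and sent, assuming the third-party kafka-python package, a broker reachable at `localhost:9092`, and hypothetical field names; the actual script may use a different client and record layout.

```python
import json


def build_record(filename, vectors):
    """Return (key, value) bytes for one painting.

    `vectors` maps a field name to a sequence of numbers.
    """
    document = {"filename": filename}
    for field, vector in vectors.items():
        # Convert to plain floats so that numpy values serialize as JSON numbers.
        document[field] = [float(x) for x in vector]
    return filename.encode("utf-8"), json.dumps(document).encode("utf-8")


def publish(records, topic="paintings", bootstrap_servers="localhost:9092"):
    """Send (key, value) records to Kafka; requires the kafka-python package."""
    from kafka import KafkaProducer  # imported lazily so build_record works without it

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    for key, value in records:
        producer.send(topic, key=key, value=value)
    producer.flush()  # wait until buffered records have been sent
```

For example, `publish([build_record("1.jpg", {"style_vector": [0.1, 0.2]})])` would send a single JSON document to the paintings topic.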
The sink connector will ensure that processed records are stored in the previously created Elasticsearch index.
-
Run the gallery-backend with
mvn spring-boot:run
-
Run the gallery-frontend with
nvm use && npm install && npm start
-
Open http://localhost:4200/ to see the result