In each of these tests, every document consists of a single field (text or image). The images were hosted locally on a Python image server.
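For reference, the image hosting can be as simple as Python's built-in HTTP server; the directory and port below are placeholders, not the exact setup used.

```python
# Minimal local image server for the benchmark (illustrative, not the exact
# server used): serves ./test_images at http://localhost:8222/<filename>
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler

handler = functools.partial(SimpleHTTPRequestHandler, directory="./test_images")
HTTPServer(("localhost", 8222), handler).serve_forever()
```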
Model Name | Image Indexing Time (CBS = 100) | Text Indexing Time (CBS = 100) | Image Indexing Time (CBS = 50) | Text Indexing Time (CBS = 50) | Image Indexing Time (CBS = 10) | Text Indexing Time (CBS = 10) | Image Indexing Time (CBS = 1) | Text Indexing Time (CBS = 1) |
---|---|---|---|---|---|---|---|---|
Vit-B/32 * | 18 | 7 | 19 | 8 | 26 | 14 | 70 | 65 |
fast/Vit-B/32 ** | 17 | 6 | 36 | 8 | 44 | 14 | 80 | 80 |
Vit-L/14 | 74 | 9 | 74 | 11 | 80 | 15 | 129 | 65 |
fast/Vit-L/14 | 58 | 9 | 410 | 10 | 420 | 28 | 500 | 139 |
openclip/Vit-L/14 | 76 | 11.8 | 78 | 13 | 89 | 22 | 220 | 14 |
opencv/Vit-L-14/cuda | 73 | 9 | 77 | 11 | 88 | 15 | 218 | 65 |
opencv/Vit-L-14/trt (TODO) | 73 | 9 | 77 | 11 | 88 | 15 | 218 | 65 |
onnx/ViT-L/14 | 64 | 9 | 60 | 10 | 71 | 28 | 226 | 139 |
For onnx/ViT-L/14, the processing speed converges over time: indexing starts at around 150 ms/doc and converges to about 64 ms/doc after 40 batches.
Note:
- CBS == client_batch_size
- Indexing times are in ms per document
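For context, a minimal sketch of how the indexing times above could be collected, assuming the Marqo Python client's add_documents accepts a client_batch_size argument; the index name, URLs, and document count are placeholders.

```python
# Sketch of the indexing benchmark (illustrative harness, not the exact script):
# time add_documents() for each client batch size and report ms per document.
import time
import marqo

mq = marqo.Client(url="http://localhost:8882")
docs = [{"_id": str(i), "image": f"http://localhost:8222/img_{i}.jpg"}
        for i in range(1000)]

for cbs in (100, 50, 10, 1):
    start = time.perf_counter()
    mq.index("benchmark-index").add_documents(docs, client_batch_size=cbs)
    ms_per_doc = (time.perf_counter() - start) * 1000 / len(docs)
    print(f"CBS={cbs}: {ms_per_doc:.1f} ms/doc")
```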
Models | Time cost | Difference | Comments |
---|---|---|---|
ViT-L/14 | 18.6 ms ± 60.2 µs | N/A | The inference speed is unexpectedly fast in this unit test |
open-clip/ViT-L/14 | 66.9 ms ± 435 µs | N/A | This is a more reasonable speed on pytorch |
cuda:onnx/ViT-L/14 | 55.7 ms ± 166 µs | 9e-6 | Using clip_onnx package |
tensorrt:onnx/ViT-L/14 | 47.7 ms ± 639 µs | 9e-6 | The environment is unstable; it has very strict version requirements on onnxruntime, CUDA, and TensorRT |
TorchDynamo | 21 ms ± 234 µs | N/A | Basically this is just another route to the onnx/tensorrt backends, so it does not help; link |
kernl.ai | N/A | N/A | Requires Python > 3.9 and GPU compute capability > 8 (a g5 instance, maybe); link |
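For the onnx rows, a rough sketch of the conversion path using the clip_onnx package (usage follows the package's README; the model choice, file paths, and dummy inputs are placeholders):

```python
# Sketch of the cuda:onnx/ViT-L/14 path via clip_onnx (illustrative).
import clip
import numpy as np
from PIL import Image
from clip_onnx import clip_onnx

model, preprocess = clip.load("ViT-L/14", device="cpu", jit=False)
image = preprocess(Image.open("example.jpg")).unsqueeze(0)   # dummy image input
text = clip.tokenize(["a diagram", "a dog", "a cat"])        # dummy text input

onnx_model = clip_onnx(model, visual_path="visual.onnx", textual_path="textual.onnx")
onnx_model.convert2onnx(image, text, verbose=True)
# Swap in TensorrtExecutionProvider for the tensorrt row.
onnx_model.start_sessions(providers=["CUDAExecutionProvider"])

image_features = onnx_model.encode_image(image.detach().cpu().numpy().astype(np.float32))
```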
TRANSFORMS | TIME (PNG file, 2162 × 762) | TIME (JPG file, 640 × 425) | Comments |
---|---|---|---|
original_clip | 27.4 ms ± 94.8 µs | 4.39 ms ± 15 µs | |
our_clip_implementation | 27.4 ms ± 49.8 µs | 4.4 ms ± 16.8 µs | |
opencv_based | 4.8 ms ± 194 µs | 1.08 ms ± 3.02 µs | |
script_based | 11.8 ms ± 51.2 µs | 2.26 ms ± 21.1 µs | |
rgb_conversion | 18.4 ms ± 28.4 µs | 4.47 ms ± 13 µs | |
grey_conversion | 12.7 ms ± 15.5 µs | 3 ms ± 60.1 µs | |
read_from_cv | 672 µs ± 143 µs | 652 µs ± 70.4 µs | |
Model Name | Text-to-image score (single-label) | Text-to-image score (double-label) | Text-to-image score (triple-label) | Image-to-text score | Image-to-Image score |
---|---|---|---|---|---|
Vit-B/32 | 92.5 | 78.75 | 46.7 | 91 | good |
Vit-L/14 | 97.5 | 82.5 | 52.3 | 91 | good |
fast/Vit-B/32 | 97.5 | 72.5 | 48 | 88 | good |
fast/Vit-L/14 | 90 | 81.25 | 52.3 | 88 | good |
openclip/Vit-L/14 | 97.5 | 82.5 | 52.3 | 91 | good |
opencv/Vit-L-14 | 90 | 81.25 | 52.3 | 88 | good |
onnx/ViT-L/14 | 97.5 | 82.5 | 52.3 | 91 | good |
INFO:marqo.s2_inference.s2_inference:The client gives 1 documents to vectorise
INFO:marqo.s2_inference.clip_utils:It takes about 0.005s to load all images. The average time for each image is 0.005s
INFO:marqo.s2_inference.clip_utils:It takes about 0.005s to preprocess all images. The average time for each image is 0.005s
INFO:marqo.s2_inference.clip_utils:It take about 0.011s to encode all images. The average time for each image is 0.011s
INFO:marqo.s2_inference.clip_utils:It takes 0.049s to convert the output with float32 to ndarray from cuda
INFO:marqo.s2_inference.s2_inference:It take about 0.071s to vectorise all documents. The average time for each document is 0.071s
INFO:marqo.s2_inference.s2_inference:The client gives 1 documents to vectorise
INFO:marqo.s2_inference.clip_utils:It takes about 0.005s to load all images. The average time for each image is 0.005s
INFO:marqo.s2_inference.clip_utils:It takes about 0.005s to preprocess all images. The average time for each image is 0.005s
INFO:marqo.s2_inference.clip_utils:It take about 0.012s to encode all images. The average time for each image is 0.012s
INFO:marqo.s2_inference.clip_utils:It takes 0.004s to convert the output with float16 to ndarray from cuda
INFO:marqo.s2_inference.s2_inference:It take about 0.026s to vectorise all documents. The average time for each document is 0.026s
Difference between the float32 and float16 outputs: np.abs(np_a - np_b).sum() = 0.13
INFO:marqo.s2_inference.s2_inference:The client gives 1 documents to vectorise
INFO:marqo.s2_inference.clip_utils:It takes about 0.005s to load all images. The average time for each image is 0.005s
INFO:marqo.s2_inference.clip_utils:It takes about 0.005s to preprocess all images. The average time for each image is 0.005s
INFO:marqo.s2_inference.clip_utils:It take about 0.011s to encode all images. The average time for each image is 0.011s
INFO:marqo.s2_inference.clip_utils:It takes 0.051s to convert the output with float16 to ndarray from cuda
INFO:marqo.s2_inference.s2_inference:It take about 0.072s to vectorise all documents. The average time for each document is 0.072s
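The log blocks above contrast converting the CUDA output to a numpy array in float32 versus float16; a small sketch of that comparison follows (the tensor here is a stand-in for the CLIP embedding, so only the shape of the check matches the logs):

```python
# Compare the float32 and float16 conversion paths for a GPU embedding.
import numpy as np
import torch

output = torch.randn(1, 768, device="cuda")           # stand-in for the CLIP output

np_a = output.detach().cpu().numpy()                  # float32 path (slower in the logs)
np_b = output.detach().half().cpu().numpy()           # float16 path (cast on GPU first)
print(np.abs(np_a - np_b.astype(np.float32)).sum())   # small; ~0.13 in the logs above
```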
We test the time to add a document to the index under different client batch sizes.
CBS = client_batch_size; indexing times are in ms per document.
Model Name | Image Indexing Time (CBS = 50) | Text Indexing Time (CBS = 50) | Image Indexing Time (CBS = 10) | Text Indexing Time (CBS = 10) | Image Indexing Time (CBS = 1) | Text Indexing Time (CBS = 1) |
---|---|---|---|---|---|---|
Vit-B/32 * | 64 | 41 | 66 | 64 | 117 | 171 |
Vit-L/14 | 335 | 55 | 345 | 61 | 672 | 128 |
fast/Vit-B/32 ** | 36 | 22 | 44 | 27 | 80 | 80 |
fast/Vit-L/14 | 410 | 41 | 420 | 48 | 500 | 95 |
openclip/Vit-L/14 | 295 | 52 | 306 | 63 | 360 | 105 |
opencv/Vit-L-14 | 280 | 49 | 285 | 66 | 347 | 105 |
onnx/ViT-L/14 | 426 | 41 | 636 | 58 | 488 | 91 |
Model Name | Text-to-image score (single-label) | Text-to-image score (double-label) | Text-to-image score (triple-label) | Image-to-text score | Image-to-Image score |
---|---|---|---|---|---|
Vit-B/32 | 92.5 | 78.75 | 46.7 | 91 | good |
Vit-L/14 | 97.5 | 82.5 | 52.3 | 91 | good |
fast/Vit-B/32 | 97.5 | 72.5 | 48 | 88 | good |
fast/Vit-L/14 | 90 | 81.25 | 52.3 | 88 | good |
openclip/Vit-L/14 | 97.5 | 82.5 | 52.3 | 91 | good |
opencv/Vit-L-14 | 90 | 81.25 | 52.3 | 88 | good |
onnx/ViT-L/14 | 97.5 | 82.5 | 52.3 | 91 | good |
*ViT-B/32 and ViT-L/14 are the OpenAI implementations of CLIP.
**fast means the model uses OpenCV preprocessing and an ONNX model for inference.
Fastclip, with OpenCV preprocessing and an ONNX model, can reduce the preprocessing time of ViT-B/32 without losing retrieval performance.
However, the ONNX model actually increases the inference time for ViT-L/14.
OpenCV preprocessing affects the scores slightly, but the results are still acceptable.
This section compares different image preprocessing methods.
TRANSFORMS | TIME (ms) | PROCESSED DIFF (mean) | ENCODE DIFF (mean) |
---|---|---|---|
original_clip | 14.6 | 0.0 | 0.0 |
our_clip_implementation | 14.7 | 0.0 | 0.0 |
opencv_based | 4.67 | 1.22 | 0.19 |
script_based | 8.07 | 0.037 | 0.0526 |
rgb_conversion | 12.1 | 0.031 | 0.0475 |
grey_conversion | 5.33 | 0.053 | 0.121 |
read_from_cv | 0.940 | 1.22 | 0.19 |
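For reference, a minimal sketch of what an OpenCV-based CLIP preprocessor can look like next to the original torchvision pipeline; this is an assumption about the opencv_based transform, not the exact benchmark code, and the interpolation and crop details may differ.

```python
# OpenCV-based CLIP preprocessing sketch: read with cv2, resize the short side
# to 224, centre-crop, then normalise with CLIP's mean/std.
import cv2
import numpy as np
import torch

MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def opencv_preprocess(path: str, n_px: int = 224) -> torch.Tensor:
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    h, w = img.shape[:2]
    scale = n_px / min(h, w)
    img = cv2.resize(img, (round(w * scale), round(h * scale)),
                     interpolation=cv2.INTER_CUBIC)
    h, w = img.shape[:2]
    top, left = (h - n_px) // 2, (w - n_px) // 2
    img = img[top:top + n_px, left:left + n_px]
    img = (img.astype(np.float32) / 255.0 - MEAN) / STD
    return torch.from_numpy(img).permute(2, 0, 1)      # CHW tensor for the model
```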
Models | Time cost | Comments | Links | Difference |
---|---|---|---|---|
ViT-B/32 | 7.76 ms ± 127 µs | N/A | N/A | N/A |
onnx/ViT-B/32 | 4.16 ms ± 152 µs | Using clip_onnx package | link | 9e-6 |
open_clip/ViT-B-32/openai | 8.05 ms ± 104 µs | N/A | N/A | N/A |
Pytorch Dynamic Quantization | N/A | Does not support GPU (CPU only) | link | N/A |
Neural Magic | N/A | Does not support GPU (CPU only) | link | N/A |
DeepSpeed | N/A | Could not get it to work on Windows | link | N/A |
Optimized onnx | 4.12 ms ± 152 µs | No difference from the plain onnx model | link | 9e-6 |
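As a footnote on the dynamic-quantization row, this is the kind of call that was evaluated; it only targets CPU inference, which is why it is marked N/A for the GPU comparison (a sketch, not the exact code used):

```python
# PyTorch dynamic quantization sketch (CPU only): quantize the Linear layers
# of a CLIP model to int8.
import clip
import torch

model, _ = clip.load("ViT-B/32", device="cpu", jit=False)
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```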