This repository contains a collection of links to my repositories showcasing implementations of Computer Vision models in Python. It features several basic models using Convolutional Neural Networks (CNNs). CNNs are a type of neural network designed to process grid-structured data like images and are particularly effective at recognizing visual patterns by capturing local features through convolutions.
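As a minimal sketch of the idea, the convolution below slides a small kernel over an image and responds where a local pattern (here, a vertical edge) appears. The image, kernel, and `conv2d` helper are illustrative examples, not code from the linked repositories:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (no padding, stride 1): each output pixel
    is a weighted sum of a local neighbourhood of the input."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge (Sobel-like) filter responds strongly where pixel
# intensity changes left-to-right: a "local feature".
image = np.zeros((5, 5))
image[:, 2:] = 1.0                     # left half dark, right half bright
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
response = conv2d(image, sobel_x)
print(response)  # strong responses along the edge, zero elsewhere
```

A CNN learns many such kernels from data instead of hand-designing them, stacking convolution layers so deeper layers combine local features into more abstract patterns.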
Additionally, it includes models applying Transfer Learning, a technique that reuses pre-trained models on large datasets, allowing for high performance with lower computational costs. This technique leverages the knowledge acquired by models trained on millions of images or videos for specific new tasks, enhancing efficiency and accuracy.
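A minimal transfer-learning sketch in PyTorch: the backbone below is a tiny stand-in for a real pretrained network (in practice you would load, for example, a `torchvision` model with pretrained weights). Its parameters are frozen so that only the new classification head is trained for the new task; all names and sizes here are illustrative:

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained feature extractor (in practice, a model
# trained on a large dataset such as ImageNet would be loaded here).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Transfer learning: freeze the backbone so its learned features are
# reused as-is, and train only a new head for the target task.
for p in backbone.parameters():
    p.requires_grad = False

num_classes = 5  # hypothetical number of classes in the new task
model = nn.Sequential(backbone, nn.Linear(8, num_classes))

trainable = [p for p in model.parameters() if p.requires_grad]
x = torch.randn(2, 3, 32, 32)
logits = model(x)
print(logits.shape)   # torch.Size([2, 5])
print(len(trainable)) # only the new head's weight and bias remain trainable
```

Because gradients are computed only for the small head, training is far cheaper than optimizing the full network from scratch.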
Furthermore, several notebooks are included where Vision Transformer (ViT) models are fine-tuned. ViTs split an image into patches and use self-attention to relate every patch to every other patch; they can match or outperform state-of-the-art convolutional networks while requiring fewer computational resources for pre-training. These models excel at capturing long-range dependencies in the input, giving them a better understanding of the global structure of an image.
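A minimal, hypothetical ViT sketch in PyTorch showing the two ingredients described above: patch embedding and self-attention over all patches. The sizes and the `TinyViT` class are illustrative, not the fine-tuned models in the notebooks:

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Minimal Vision Transformer sketch: split the image into patches,
    embed them, and let self-attention relate every patch to every other
    patch (capturing long-range dependencies)."""
    def __init__(self, img_size=32, patch=8, dim=64, heads=4, depth=2, num_classes=10):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Patch embedding implemented as a strided convolution
        self.to_patches = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        p = self.to_patches(x).flatten(2).transpose(1, 2)  # (B, n_patches, dim)
        cls = self.cls.expand(p.size(0), -1, -1)
        z = self.encoder(torch.cat([cls, p], dim=1) + self.pos)
        return self.head(z[:, 0])  # classify from the [CLS] token

model = TinyViT()
logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])
```

Fine-tuning a real ViT follows the same shape: load pretrained weights, optionally freeze early blocks, and train the classification head on the new dataset.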
Computer Vision is a field of artificial intelligence that enables machines to interpret and understand the visual world through images and videos. It employs algorithms and deep learning models to perform tasks such as object recognition, action detection, and image segmentation. Today, it is a crucial technology in various applications, from autonomous vehicles to medical diagnostics and surveillance systems.
The following are the Computer Vision models I have implemented to date:
- Image Classification: This task involves assigning a label or class to a whole image. The input consists of pixel values that make up an image, whether in grayscale or RGB, and the goal is to predict the class to which the image belongs. Key use cases include medical image classification, categorizing photos on social media, and detecting products in inventories.
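To illustrate the output side of this task, the sketch below converts hypothetical model scores (logits) into class probabilities with a softmax and picks the most likely label. The class names and scores are made up for illustration:

```python
import numpy as np

def softmax(logits):
    """Convert raw model scores into a probability distribution."""
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical scores for three classes
classes = ["cat", "dog", "bird"]
logits = np.array([2.0, 0.5, -1.0])
probs = softmax(logits)
predicted = classes[int(np.argmax(probs))]
print(predicted)  # cat
```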
- Object Detection: Object detection models identify and locate instances of objects, such as cars, people, buildings, animals, etc., in images or videos. They return bounding box coordinates along with class labels and confidence scores for each detected object. Current use cases include security surveillance, autonomous driving, and augmented reality.
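Two building blocks of this task can be sketched directly: intersection-over-union (IoU) measures how much two bounding boxes overlap, and non-maximum suppression (NMS) removes duplicate detections of the same object. The boxes, scores, and threshold below are illustrative:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Non-maximum suppression: keep the highest-scoring box and drop
    overlapping lower-scoring detections of the same object."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(int(i))
        order = order[1:][[iou(boxes[i], boxes[j]) < thresh for j in order[1:]]]
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: the near-duplicate box 1 is suppressed
```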
- Image Segmentation: Image segmentation classifies each pixel in an image into a specific category or instance of a category. This task is divided into three types:
  - Semantic Segmentation: Assigns a class label to each pixel, without distinguishing between different instances of the same class.
  - Instance Segmentation: Labels each pixel and differentiates between individual instances of the same class, producing masks that outline each object with class labels and confidence scores.
  - Panoptic Segmentation: A combination of semantic and instance segmentation, where all pixels in the image are classified, providing detailed segmentation of complex scenes.

  Important use cases of image segmentation include segmenting organs in medical images, identifying specific areas in satellite images, and object segmentation for robotics and autonomous vehicles.
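The per-pixel nature of segmentation can be sketched with NumPy: take hypothetical class scores for every pixel, assign each pixel its highest-scoring class, and evaluate a mask with per-class IoU, a standard segmentation metric. All values below are synthetic:

```python
import numpy as np

# Hypothetical per-pixel class scores from a segmentation model:
# shape (num_classes, H, W); here 3 classes on a 4x4 image.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 4, 4))

# Semantic segmentation output: one class label per pixel.
pred_mask = np.argmax(logits, axis=0)
print(pred_mask.shape)  # (4, 4)

def mask_iou(pred, target, cls):
    """Per-class IoU between a predicted and a ground-truth mask."""
    p, t = pred == cls, target == cls
    union = np.logical_or(p, t).sum()
    return np.logical_and(p, t).sum() / union if union else 1.0

target = pred_mask.copy()  # a perfect prediction, for illustration
print(mask_iou(pred_mask, target, cls=0))  # 1.0
```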
- Video Classification: This task involves assigning a label or class to a video. Models process video frames and generate the probability of each class being represented. Important use cases are activity detection in security, multimedia content classification, and sports analysis.
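A simple late-fusion baseline for this task can be sketched as follows: score each frame independently, then average the per-frame probabilities over time to classify the whole clip. The frame scores below are synthetic, not from a real model:

```python
import numpy as np

def softmax(x, axis=-1):
    """Row-wise softmax, numerically stabilized."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical per-frame class scores for a clip of 8 frames and
# 4 classes (e.g. from an image backbone applied frame by frame).
rng = np.random.default_rng(1)
frame_logits = rng.normal(size=(8, 4))

# Late fusion: average the per-frame probabilities over time, then
# pick the most likely class for the whole video.
video_probs = softmax(frame_logits, axis=1).mean(axis=0)
predicted_class = int(np.argmax(video_probs))
print(video_probs.shape)  # (4,)
```

Dedicated video models (e.g. 3D CNNs or video transformers) instead model motion across frames jointly, but this averaging baseline captures the shape of the task.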
- Image Captioning: This multimodal task combines Computer Vision and Natural Language Processing (NLP) to generate textual descriptions of images. These models are useful for describing images on social media platforms, improving accessibility for visually impaired individuals, and indexing images for search engines.
Contributions to this repository are welcome. If you have any questions or suggestions, please do not hesitate to contact me.