Google Summer Of Code
Spend your summer doing something exciting and valuable for the open-source community, and join Google Summer of Code. Read more about how the program works on this page.
OpenVINO Toolkit has been a mentoring organization since 2022!
Please subscribe to this discussion and check it regularly for important announcements.
We require one pull request sent to our OpenVINO repository from each potential GSoC contributor before accepting participation for GSoC. We would like to see your coding style and whether you know how to code and how to use git and GitHub. To fulfill this requirement, please:
- Visit the OpenVINO Good First Issues board or Anomalib Good First Issues.
- Select one of the unassigned tickets ("Contributors Needed" column) and ask for the assignment.
- Discuss the solution with the OpenVINO developers.
- Implement it according to the OpenVINO contribution guide or Anomalib contribution guide.
- If you encounter any issues, talk to our developers on Discord.
- Create a new pull request with your work.
- Wait for the review and eventual merge.
Please note that the above task is mandatory. However, we reserve the right to review and merge only selected PRs. If we close your PR or do not merge it, this does not change your chances of being accepted for GSoC. Due to the expected large number of requests, the review process can be delayed, so please be patient.
If you're unfamiliar with git and GitHub, check out this blog. The blog is about contributing to the OpenVINO core project, but the workflow is the same for all projects.
Your application should consist of the following parts:
- About you
- Your full name
- Your university/current enrollment
- The timezone you live in
- Short bio
- Your experience in programming (especially C++ and Python)
- Your experience in ML and DL
- About the project
- What is your choice?
- Why did you choose this specific idea?
- How much time do you plan to invest in the project?
- Provide an abstract of the solution
- Provide a detailed timeline of how you want to implement the project (include the main points you want to cover and dates)
- General questions
- How do you know OpenVINO?
- What do you know about OpenVINO?
- Have you already contributed to the OpenVINO project? (please include links)
- How could you apply it to your professional development?
- Describe any other career development plan you have for the summer in addition to GSoC.
- Why should we pick you?
- Tasks
- Link to your pull request (for the prerequisite task – the top part of this document), even if it is already merged or closed
Proposal examples can be found here and here. Please get in touch with us early to discuss your application with the mentor.
Short description: A neural language model can run locally, without an internet connection. You will write your own cross-platform desktop chatbot application using OpenVINO and Electron (or a similar framework). The chatbot may be general-purpose or tailored to your needs (subject to the NLP model).
Expected outcomes:
- Desktop Chat-Bot application works without the internet
- Project uses an NLP model
- Project uses OpenVINO in Electron environment
Skills required/preferred: JavaScript, Electron
Mentors: Nikolai Vishniakov, Alicja Miłoszewska
Size of project: 175 hours
Difficulty: Medium
Short description: OpenVINO already supports a WASM build, but hundreds of thousands of CPU plugin functional test cases cannot yet run on it because many native issues remain unresolved, such as multi-threading, local file access, dynamic libraries, and memory sharing. We invite contributors to take part in this activity and enable more WASM tests of the OpenVINO CPU plugin.
Expected outcomes:
- Learn to build and apply OpenVINO WASM libraries with a C/C++ project
- Enable more than 80% of the CPU plugin C++ functional tests on the WASM build
Skills required/preferred: C++, JS, WASM and Emscripten
Mentors: River Li, Xuejun Zhai
Size of project: 350 hours
Difficulty: Medium to hard
Short description: OpenVINO uses conditional compilation to reduce the size of package binaries, and this has achieved a considerable binary size reduction. However, conditional compilation has a limitation: it is device-dependent. That is, a conditional compilation package generated on one CPU platform cannot always run on a different CPU platform. We encourage contributors to research a solution for device-agnostic conditional compilation (a slight increase in binary size is permitted), so that a conditional compilation package generated on one CPU platform can also run on other Intel CPU platforms.
Expected outcomes:
- Learn conditional compilation and contribute as many improvements as possible
- Enable device-agnostic conditional compilation with a binary size increase of less than 5%
Skills required/preferred: C++
Mentors: River Li, Xuejun Zhai
Size of project: 175 hours
Difficulty: Medium
Short description: Quantization, a widely adopted technique for reducing model size and accelerating model inference, often leads to a slight decrease in accuracy compared to the original floating-point model. However, in some cases, quantization can introduce significant accuracy degradation. Identifying the root cause of these accuracy drops can be a difficult task, and is often even more challenging than fixing it. In this project, we will develop an analytic tool for analyzing quantization errors in models quantized with NNCF. We will provide a user-friendly interface for inspecting quantization errors, calculating statistics and metrics such as MSE and SQNR to identify the layers with the most significant errors, and visualizing them together with the model in Netron. We will also create a tutorial with recommendations for optimizing the quantization process to improve model accuracy based on the collected analytics.
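To make the metrics mentioned above concrete, here is a minimal sketch of computing MSE and SQNR per layer and ranking layers by error. It is not the NNCF API: the function names, the fake_quantize stand-in, and the dictionary layout of per-layer outputs are all assumptions made for illustration, and plain Python lists are used instead of tensors.

```python
import math


def mse(reference, quantized):
    """Mean squared error between two equal-length activation vectors."""
    return sum((r - q) ** 2 for r, q in zip(reference, quantized)) / len(reference)


def sqnr_db(reference, quantized):
    """Signal-to-quantization-noise ratio in decibels (higher means less error)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - q) ** 2 for r, q in zip(reference, quantized))
    if noise == 0:
        return math.inf  # outputs are identical
    if signal == 0:
        return -math.inf  # all-zero reference with non-zero error
    return 10.0 * math.log10(signal / noise)


def fake_quantize(values, num_bits=8):
    """Symmetric uniform quantize-dequantize; a hypothetical stand-in for a real quantized layer."""
    levels = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]


def rank_layers_by_error(fp_outputs, q_outputs):
    """Return layer names sorted from lowest SQNR (largest error) to highest."""
    scores = {name: sqnr_db(fp_outputs[name], q_outputs[name]) for name in fp_outputs}
    return sorted(scores, key=scores.get)


# Hypothetical per-layer outputs of the floating-point model.
fp = {"conv1": [0.1, -0.5, 0.9, 0.3], "fc": [2.0, -1.0, 0.5, 0.25]}
# "fc" is quantized much more aggressively, so it should rank as the worst layer.
q = {"conv1": fake_quantize(fp["conv1"], 8), "fc": fake_quantize(fp["fc"], 3)}
worst_first = rank_layers_by_error(fp, q)
print(worst_first)
```

In a real tool the two dictionaries would be filled by running the same input through the original and the quantized model and collecting intermediate outputs. How that collection is done is outside this sketch.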
Expected outcomes:
- Pull request with the implementation of the quantization analytics tool in NNCF.
- A tutorial with recommendations for optimizing the quantization process to improve model accuracy based on analyzing quantization errors.
Skills required/preferred: DL basics, understanding of ML model optimization, Python programming
Mentors: Alexander Suslov, Andrey Churkin
Size of project: 350 hours
Difficulty: Medium
Short description: The goal of this project is to accelerate PyTorch-based frameworks like PyTorch Lightning and ComfyUI by leveraging the torch.compile OpenVINO backend. PyTorch Lightning provides a structured and modular framework for developing deep learning models, particularly tailored for tasks such as image classification, natural language processing, and reinforcement learning. ComfyUI is a popular Stable Diffusion WebUI framework for image generation workflows. In this project, you will create optimized PyTorch Lightning modules and ComfyUI Stable Diffusion models by utilizing the torch.compile OpenVINO backend for inference. Optimized models should demonstrate performance improvements on Intel CPUs and GPUs over native PyTorch execution.
Expected outcomes:
- For PyTorch Lightning, the custom Lightning modules should be created based on existing transfer learning samples with AutoEncoder, Resnet50 and BERT models. These modules should be accelerated using torch.compile OpenVINO backend to improve the inference performance. (Existing examples: https://lightning.ai/docs/pytorch/stable/advanced/transfer_learning.html)
- For ComfyUI, the workflows for text to image, image to image, and inpainting should be accelerated with torch.compile OpenVINO backend. (ComfyUI examples: https://comfyanonymous.github.io/ComfyUI_examples/)
- The performance improvements over native PyTorch on inference should be demonstrated using Intel CPUs and GPUs.
- The accuracy of each implemented module should be confirmed. The inference output of each model should match the output of the native PyTorch implementation.
- Each implemented module or workflow should support configurations like setting OpenVINO device name and enabling model caching.
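The configuration and accuracy requirements above can be sketched as follows. This is an illustration under assumptions, not a reference implementation: the helper names are hypothetical, and the option keys ("device", "model_caching", "cache_dir") reflect the OpenVINO torch.compile documentation as I understand it, so check them against the OpenVINO version you use. The PyTorch imports are deferred so that the option helper can be exercised without PyTorch installed.

```python
def build_openvino_options(device="CPU", model_caching=False, cache_dir=None):
    """Collect backend options for torch.compile (key names are an assumption, see above)."""
    options = {"device": device, "model_caching": model_caching}
    if model_caching and cache_dir is not None:
        options["cache_dir"] = cache_dir
    return options


def compile_for_openvino(model, **option_kwargs):
    """Wrap a torch.nn.Module with the OpenVINO torch.compile backend."""
    # Deferred imports: only needed when a model is actually compiled.
    import torch
    import openvino.torch  # noqa: F401  (importing registers the "openvino" backend)

    options = build_openvino_options(**option_kwargs)
    return torch.compile(model, backend="openvino", options=options)


def max_abs_diff(reference, candidate):
    """Largest element-wise difference between two flat output sequences."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Example configuration: run on the GPU and cache compiled models on disk.
gpu_options = build_openvino_options("GPU", model_caching=True, cache_dir="./model_cache")
print(gpu_options)
```

For the accuracy check, the flattened outputs of the native and compiled models could be passed to max_abs_diff and compared against a tolerance chosen per model. The tolerance itself is a project decision that the text above does not specify.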
Skills required/preferred: Python, C++, PyTorch. Good to have: OpenVINO, PyTorch Lightning, Stable Diffusion, torch.compile feature
Mentors: Mustafa Cavus, Yamini Nimmagadda
Size of project: 350 hours
Difficulty: Medium to hard
Short description: The goal of this project is to enhance the compatibility and performance of popular deep learning models with the OpenVINO backend in torch.compile. Specifically, the focus will be on enabling four diverse models: Omni3D, AudioCraft, LLaVA and Code Llama. The project involves identifying and handling unsupported operations within these models, implementing necessary operations, and ensuring accuracy, all while optimizing performance through thorough testing and benchmarking. A contributing guide is available to facilitate collaboration.
Expected outcomes:
- At least 3 models among: Omni3D, AudioCraft, LLaVA and Code Llama, are functional with torch.compile OpenVINO backend.
- Develop a suite of unit test cases to validate the correct implementation and functioning of supported and newly implemented operations.
- Verify the accuracy of the models with the OpenVINO backend, comparing results with the Stock PyTorch version.
- Assess and benchmark the performance of the models with the OpenVINO backend to ensure optimization and efficiency gains.
Skills required/preferred: Python, C++, PyTorch. Good to have: OpenVINO, torch.compile feature
Mentors: Ravi Panchumarthy, Surya Pemmaraju
Size of project: 350 hours
Difficulty: Medium to hard
Short description: Flax/JAX is a new solution for training models that provides much faster training than TensorFlow and PyTorch, so we should expect an increase in the number of Flax/JAX models. OpenVINO currently supports PyTorch, TensorFlow, ONNX, and PDPD models, but it lacks native support for JAX models. Not all JAX models can be exported to the TensorFlow SavedModel format, so there is a need for native, direct support for JAX models without an intermediate format. The implemented functionality should include logic for parsing traced JAX objects and translators for converting basic JAX operations into OpenVINO opset decompositions. The functionality should rely on and inherit from the common FE API so that JAX models can be converted using ovc.convert_model. The feature should be extendable in the future by others (the OV team and the open-source community) to support new JAX operations and models. The goal is to implement a prototype with basic functionality to support fundamental models (ResNet, BERT) trained with JAX.
As an alternative, you can consider adding support for the MindSpore framework (https://github.com/mindspore-ai/mindspore) and its MindIR format. MindSpore is a new open-source deep-learning framework that is gaining popularity in the Papers with Code trends.
Expected outcomes:
- Showcases of OpenVINO supporting Flax/JAX (MindSpore) models (ResNet, BERT)
- Merge the prototype into the master branch
- (optional) Support models in PyNative mode for dynamic computation
Skills required/preferred: C/C++, Python
Mentors: Roman Kazantsev, Maxim Vafin, Andrei Kochin
Size of project: 175 hours
Difficulty: Hard
Short description: The goal of this project is to provide a unified suite for OpenVINO Vision-Language model performance benchmarking for both uni-modal and joint-modal tasks. The benchmarking suite should include the following functions:
- Be able to load and convert the specific models from Hugging Face: both PyTorch and OpenVINO IR models should be supported, and ideally compressed (INT8) models as well
- Provide an accuracy checker for the model accuracy check
- Performance benchmarking scripts that work for unimodal/joint-modal tasks
Resources:
- Models:
- Dataset:
- VQA: https://huggingface.co/tasks/visual-question-answering
- STS: https://huggingface.co/tasks/sentence-similarity
- Dataset: Public Multimodal Dataset (PMD)
- OpenVINO example for the CLIP model: https://docs.openvino.ai/2023.3/notebooks/228-clip-zero-shot-convert-with-output.html
Expected outcomes:
- A GitHub repo as the code base
- The repo should include the scripts for the above functionalities and work for at least two of the models, probably CLIP and FLAVA
- Works for at least one task for each of the following modalities
- Vision tasks: zero-shot image classification for COCO dataset
- Language tasks: need to identify which one would be the most appropriate, example task such as sentence similarity. Please refer to GLUE benchmark: https://gluebenchmark.com/
- Joint-modal vision and language tasks: need to identify which one would be the most appropriate, for example, visual question answering
- A blog that provides an overview and step-by-step demo
Skills required/preferred: Good understanding of transformer models and vision-LLM models. Experience working with Hugging Face LLM models is preferred.
Mentors: Junwen Wu, Helena Kloosterman
Size of project: 350 hours
Difficulty: Medium
Short description: Anomalib currently only contains visual anomaly detection algorithms that operate in the image and video domains. However, the principles of anomaly detection can be applied to other domains as well, such as audio. The goal of this project is to add support for anomaly detection in 1D (time-series) data, such as audio signals. The following components would need to be added or modified to achieve time-series anomaly detection support:
- PyTorch Lightning compatible dataset adapters for reading 1D data
- At least 1 fully functional time-series anomaly detection model
- Metrics and visualization utilities for qualitative and quantitative evaluation of the model’s performance.
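To give a feel for what anomaly detection on 1D data means, here is a deliberately simple statistical baseline: a rolling z-score detector. It is a swapped-in illustration, not one of the deep learning models this project would add to Anomalib, and the function name, window size, and threshold are assumptions.

```python
import math
import statistics


def rolling_zscore_anomalies(signal, window=20, threshold=6.0):
    """Flag indices whose value lies more than `threshold` standard deviations
    away from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(signal)):
        history = signal[i - window:i]
        mean = statistics.fmean(history)
        std = statistics.pstdev(history)
        if std == 0:
            continue  # flat history: the z-score is undefined, so skip
        if abs(signal[i] - mean) / std > threshold:
            flagged.append(i)
    return flagged


# Synthetic signal: a sine wave with one injected spike at index 150.
signal = [math.sin(2 * math.pi * i / 50) for i in range(200)]
signal[150] += 5.0
flagged = rolling_zscore_anomalies(signal, window=20, threshold=6.0)
print(flagged)
```

A smooth signal can already drift a few standard deviations away from its trailing window near peaks, which is why the threshold here is set well above the textbook value of 3. A learned model would replace this hand-tuned rule.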
Expected outcomes: Time-series data adapters, 1D anomaly detection model, metrics and visualization utilities
Skills required/preferred: Basic ML knowledge, Signal processing basics, Python
Mentors: Dick Ameln, Samet Akcay, Ashwin Vaidya
Size of project: 350 hours
Difficulty: Medium to hard depending on the steps the participant works on
Short description: Anomalib is a Python library designed to facilitate visual anomaly detection research, where the task is to identify anomalous, or abnormal images in a dataset. While Anomalib’s models only need normal images during training, some examples of anomalous images are still needed in the validation and testing stages for threshold selection and model evaluation. In real-world use-cases however, anomalous samples may not be available at training time, preventing accurate computation of these metrics.
The purpose of this project is to create a synthetic anomaly generation module within Anomalib that will allow researchers to generate a wide variety of synthetic anomalies for the purpose of selecting appropriate thresholds and evaluating anomaly detection models.
The project will involve the development of algorithms that can introduce anomalies into normal datasets in a controllable and scalable manner. These synthetic anomalies may range from simple, like noise injection, to complex, like generating contextually out-of-place objects using advanced generative models such as diffusion models.
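The simplest end of that range, noise injection, can be sketched as below. This is an illustrative assumption about what such a generator might look like, not Anomalib's API: the function name and signature are hypothetical, and the image is a plain nested list of floats in [0, 1] rather than a tensor.

```python
import random


def inject_noise_patch(image, top, left, height, width, sigma=0.3, seed=None):
    """Return (anomalous_image, mask) with Gaussian noise added inside one rectangle.

    The input image is not modified. The mask marks the altered pixels with 1,
    which gives a ground-truth label for evaluating a detector.
    """
    rng = random.Random(seed)
    rows, cols = len(image), len(image[0])
    noisy = [row[:] for row in image]
    mask = [[0] * cols for _ in range(rows)]
    # Clip the rectangle to the image bounds.
    for r in range(max(top, 0), min(top + height, rows)):
        for c in range(max(left, 0), min(left + width, cols)):
            value = noisy[r][c] + rng.gauss(0.0, sigma)
            noisy[r][c] = min(1.0, max(0.0, value))  # keep the pixel range valid
            mask[r][c] = 1
    return noisy, mask


# A uniform grey 8x8 "normal" image with a 2x4 noisy patch injected.
normal = [[0.5] * 8 for _ in range(8)]
anomalous, mask = inject_noise_patch(normal, top=2, left=3, height=2, width=4, seed=0)
print(sum(map(sum, mask)), "pixels altered")
```

Returning the mask alongside the image is the design choice that matters here: it is what makes the synthetic samples usable for threshold selection and pixel-level metrics.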
Technical details:
- Develop algorithms for generating different types of synthetic anomalies.
- Implement a user-friendly API within anomalib for generating and injecting synthetic anomalies into datasets.
- Ensure the module can be used with existing and future models within the Anomalib framework.
Expected outcomes:
- A synthetic anomaly generation module fully integrated into the anomalib library.
- A guide and examples on using the synthetic anomaly generation module.
- A benchmarking report showcasing the effectiveness of different types of synthetic anomalies.
Optional Outcomes:
- A potential research paper that introduces the module and its application in anomaly detection research.
Skills required/preferred:
- Proficiency in Python and experience with deep learning libraries such as PyTorch Lightning.
- Solid understanding of machine/deep learning and basic understanding of generative models.
- Understanding of anomaly detection principles and experience with anomalib is a plus.
Mentors: Samet Akcay, Dick Ameln, Ashwin Vaidya
Size of project: 350 hours
Difficulty: Hard
Short description: We are currently preparing a community plugin for OpenVINO inference that takes specially prepared IR models known to correspond to llama.cpp-supported architectures, parses them for weights and other GGML-required parameters, and defers actual inference to the llama.cpp/GGML executors. The initial version of the plugin will be limited to standard GGML inference without HW-specific acceleration. Since the llama.cpp/GGML executors in their current state already support HW-accelerated inference as well (via CUDA and, more recently, SYCL), the natural extension of this approach is to make the community plugin accept options for selecting the actual GGML backend. This would enable a user flow in which developers execute certain IR-format models with llama.cpp/GGML instead of the regular OpenVINO plugins while still using the familiar OpenVINO APIs such as ov::Model and ov::InferRequest, because dispatching the workloads to llama.cpp/GGML would be abstracted behind the OpenVINO plugin system and the implementation of the future plugin.
Expected outcomes: The community plugin should provide an API and implementation to enable the selection of the actual GGML backend performing inference. The new functionality should be sufficiently covered by unit tests. The build process of the plugin should be adjusted if necessary, as well as CI checks in the plugin’s GitHub repository.
Skills required/preferred: C/C++, GTest, OpenVINO API, GitHub Actions
Mentors: Vasily Shamporov, Alexander Kozlov
Size of project: 175 hours
Difficulty: Medium
Short description: The goal of this project is to evaluate and compare the performance of PyTorch's torch.compile OpenVINO backend against stock PyTorch using TorchBench, a collection of open-source benchmarks for evaluating PyTorch performance. The benchmarking will be performed on various EC2 CPU instances in an automated manner, leveraging either the EC2 API or Amazon SageMaker. The benchmarking script and methodology should be reproducible, allowing the community to rerun and customize them.
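The core of such a comparison is a repeatable timing loop plus a cost calculation. The sketch below shows one possible shape for it, using only the standard library. It does not use TorchBench or any AWS API, the function names are hypothetical, the workloads are stand-in lambdas, and the hourly price is a placeholder rather than a real EC2 price.

```python
import statistics
import time


def measure_latency(fn, warmup=3, runs=20):
    """Return the median wall-clock latency of fn() in seconds."""
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-time costs such as compilation
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(baseline_fn, candidate_fn, hourly_price_usd, **timing_kwargs):
    """Compare two workloads on the same machine and estimate the candidate's cost."""
    base = measure_latency(baseline_fn, **timing_kwargs)
    cand = measure_latency(candidate_fn, **timing_kwargs)
    return {
        "baseline_s": base,
        "candidate_s": cand,
        "speedup": base / cand if cand > 0 else float("inf"),
        # Instance cost of 1000 candidate runs at the given hourly price.
        "candidate_usd_per_1k_runs": cand * 1000 * hourly_price_usd / 3600,
    }


# Stand-in workloads; in the project these would be stock and compiled model calls.
result = compare(
    lambda: sum(range(20000)),
    lambda: sum(range(1000)),
    hourly_price_usd=0.17,  # placeholder value, not an actual instance price
    runs=5,
)
print(result)
```

Using the median rather than the mean keeps a single slow run, which is common on shared cloud instances, from dominating the result.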
Expected outcomes:
- A GitHub repo with automated benchmarking scripts for evaluating performance on various EC2 CPU instances. The benchmarking script should include running the TorchBench benchmarks, collecting performance metrics, and comparing the results for the torch.compile OpenVINO backend versus Stock PyTorch.
- A comprehensive report containing performance metrics, cost metrics, and test outcomes, highlighting key insights.
- A comparison report of the performance and cost-effectiveness of the OpenVINO backend versus the stock backend on various EC2 instances, to help identify the optimal EC2 instance for running PyTorch workloads with the OpenVINO backend.
Skills required/preferred: Python, PyTorch, OpenVINO, Docker, EC2 API or Amazon SageMaker
Mentors: Ravi Panchumarthy, Surya Pemmaraju
Size of project: 350 hours
Difficulty: Medium
Short description: People use messengers daily not just for communication, but also for reading news and gathering information on a variety of topics by subscribing to channels. For any popular messenger, implement a desktop AI assistant for AI PC that can read messages from a specified time interval and use Retrieval-Augmented Generation (RAG) to enhance a local large language model (LLM) with this private data, providing useful information such as a daily digest. Users should be able to interact with the OpenVINO Messenger AI Assistant to ask questions related to any discussions extracted from the messenger.
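The retrieval half of RAG can be illustrated without any model at all. The sketch below filters messages by time interval, ranks them against the question with a bag-of-words cosine similarity, and builds a prompt for the LLM. It is a simplification under assumptions: a real assistant would likely use an embedding model instead of word counts, and all function names and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter
from datetime import datetime


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(messages, question, start, end, top_k=2):
    """Return up to top_k message texts from [start, end] most similar to the question."""
    query = Counter(tokenize(question))
    in_window = [text for ts, text in messages if start <= ts <= end]
    scored = [(cosine(query, Counter(tokenize(text))), text) for text in in_window]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for score, text in scored[:top_k] if score > 0]


def build_prompt(question, context):
    """Combine the retrieved messages and the question into one LLM prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only these messages:\n{joined}\n\nQuestion: {question}"


# Hypothetical channel history as (timestamp, text) pairs.
messages = [
    (datetime(2024, 3, 1, 9), "OpenVINO 2024.0 release adds new GenAI samples"),
    (datetime(2024, 3, 1, 12), "Lunch menu: pasta and salad"),
    (datetime(2024, 2, 1, 9), "OpenVINO release planning meeting notes"),
]
question = "What is new in the OpenVINO release?"
context = retrieve(messages, question, datetime(2024, 3, 1), datetime(2024, 3, 2))
print(build_prompt(question, context))
```

The prompt string would then be passed to the local LLM. Keeping retrieval and generation as separate steps makes it possible to test the retrieval logic without loading a model.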
Expected outcomes:
- A standalone desktop application capable of retrieving messages from popular messaging platforms, such as by using API access.
- The project incorporates OpenVINO, utilizing a local large language model (LLM) and the Retrieval-Augmented Generation (RAG) technique, running on the AI PC integrated GPU.
- The application features a user interface that allows interaction with the local LLM to generate valuable output.
Skills required/preferred: Python or C++, LLMs, RAG, UI/Qt
Mentors: Ria Cheruvu, Dmitriy Pastushenkov
Size of project: 350 hours
Difficulty: Medium
The projects below have already been implemented and are here just for reference. Don't select any project below for your application.
Short description: In most cases, DL model optimization requires the presence of real data that the user should provide for the optimization method (e.g. quantization or pruning). However, it was noticed that such methods can perform quite well on synthetic data which are not even relevant to the use case. In this task, we will create a synthetic GAN or VAE-based DL model that is capable of generating synthetic images based on the text hints provided by the user. This data will be used to evaluate the 8-bit post-training quantization method on a wide range of Computer Vision models.
Expected outcomes: DL model that generates synthetic data based on the text input.
Skills required/preferred: DL model training, understanding of GAN, VAE architectures
Mentors: Mansi Sharma
Size of project: 350 hours
Difficulty: Medium
Year: 2022
Implemented by: ThanosM97
The result: Code | Article 1 | Article 2 | Technical report
Short description: ARM CPUs support has been added to Inference Engine via the dedicated ARM CPU plugin. ARM processors are widely used in Android smartphones, so we want to develop an Android demo application that demonstrates plugin possibilities on this platform. The demo should be written in Java and use Java wrappers to reach Inference Engine public API. We suggest reviewing the functionality of OMZ object detection demo and propagating its core functionality to the Android demo.
Expected outcomes: Android demo application with object detection functionality. Any model (or several models) from the list supported by the plugin could be used.
Skills required/preferred: Practical experience with developing Java apps for Android, at least a basic understanding of computer vision and deep learning technologies, enough to run a network and make it run at real-time speed
Mentors: Junwen Wu
Size of project: 350 hours
Difficulty: Medium
Year: 2022
Implemented by: IRONICBo
Short description: The world is 3D, but in AI development, we often work on flat 2D displays with flat data visualization plots and charts. In machine learning, many tasks may be much better understood if we provide a 3D or 4D (space and time) perspective. In this project, we will address this by providing a beginner-friendly Jupyter 3D engine for machine learning visualization along with WebGL integration. Our main goal is not only to make 3D or 4D datasets easier to visualize, but also to make machine learning easier to understand in a more humanistic way.
Expected outcomes: Working 3D support for Jupyter Notebooks running OpenVINO (AI inference). For example, visualizing 3D body pose and characters, visualizing 2D-3D mapping, and also providing a clean interface to set these up without a 3D engine or graphics programming background.
Skills required/preferred: Understanding of graphics pipelines (GPU programming is a plus), Software Engineer background and code releasing, Python, C++
Mentors: Raymond Lo
Size of project: 350 hours
Difficulty: Medium to hard
Year: 2022
Implemented by: spencergotowork
Short description: DL model inference performance has been a hot topic in recent years, with the aim of bringing AI into real-world applications. OpenVINO is a well-known DL inference solution that delivers best-in-class performance on Intel architectures. In this task, we will provide a Jupyter notebook/tutorial where we showcase inference performance on a set of popular DL models that come from the PyTorch Image Models project on GitHub (timm). The tutorial will provide details on installing OpenVINO, converting models to OpenVINO Intermediate Representation, and properly benchmarking under various settings.
Expected outcomes: Pull-request with the tutorial into PyTorch Image Models.
Skills required/preferred: DL basics, understanding of ML model optimization
Mentors: Alexander Kozlov, Liubov Talamanova
Size of project: 175 hours
Difficulty: Easy
Year: 2023
Implemented by: sawradip
Short description: Spark NLP is an open-source NLP library in production that offers state-of-the-art transformers at scale by extending Apache Spark natively. It supports Python and R as well as the JVM ecosystem (Java, Scala, and Kotlin). It is available on PyPI, Conda, and Maven and ships with many NLP features, pre-trained models, and pipelines. Currently, CPU optimization is realized via Intel Optimized TensorFlow. The goal of this project is to add support for OpenVINO in Spark NLP, such that users can create and deploy optimized NLP models on Intel hardware through OpenVINO. The project will involve the following tasks:
- Explore OpenVINO for Java solution and use JNI to load and infer OpenVINO IR models in Java. Identify additional ops and configs or additional engineering work that is needed for this JNI implementation.
- Benchmarking on some representative pre-trained models.
This project would benefit the Spark-NLP developers to take advantage of OpenVINO's optimization capabilities on Intel hardware. It would also benefit Intel and the OpenVINO community by expanding the reach of OpenVINO to a wider range of use cases and developers.
Expected outcomes:
- Pull request to the Spark-NLP repo for OpenVINO integration
- Documentation and sample scripts to demonstrate how to use OpenVINO with Spark-NLP
- Summary on model/operator coverages for Spark-NLP OpenVINO integration (optional)
Skills required/preferred: Java knowledge, basic knowledge of OpenVINO framework, basic knowledge of Spark. Experience with NLP deep learning models, such as BERT etc.
Mentors: Junwen Wu, Ravi Panchumarthy
Size of project: 350 hours
Difficulty: Medium
Year: 2023
Implemented by: rajatkrishna
The result: Code | Article 1 | Article 2 | Article 3
Short description: Automatic industrial meter reading could help utility providers (such as gas or electricity providers) and manufacturers monitor the real-time status data from industrial meters and enable further diagnosis of possible anomalies without periodic trips to each physical location to read a meter. We have provided code examples of automatic meter reading using the PaddlePaddle DL framework for analog and digital meters. To extend the automatic meter reading solutions, we’d like to see solutions using the TensorFlow or PyTorch DL frameworks and supporting multi-task input with a GUI. The solution pipeline could follow our code examples in the OpenVINO Notebooks repository.
Expected outcomes: In the first stage, we expect a demo repository including testing datasets, demo code, and test results for meter reading with OpenVINO, using DL models from the TensorFlow or PyTorch frameworks. The result should cover contributions to Open Model Zoo demos and can be represented as a notebook as well. We can follow the workflow and reuse the functions in 203-meter-reader and 405-paddle-ocr-webcam, and combine the new work with these notebooks.
In the second stage, this demo can be refined to support multiple camera inputs with a GUI, making it more like a real solution for the industry. In this case, OpenVINO features such as the async API and performance hints can help improve multi-task workload performance. With a GUI, we can demonstrate a live demo with some web/IP cameras for a better user experience. Tools like Flask, HTML, or JavaScript can help us quickly create an interactive user interface.
Skills required/preferred: Python programming, knowledge of popular frameworks for DL (PyTorch and/or TensorFlow), Flask, HTML, JavaScript
Mentors: Ethan Yang, Zhuo Wu
Size of project: 350 hours
Difficulty: Medium
Year: 2023
Implemented by: ashish-2005
The result: Report | Code | Article
Short description: Detecting small defects in high-resolution images can be a challenging task for anomaly detection models. Anomalib currently has a tiling mechanism in place to improve the detection capabilities for such datasets, which involves dividing the input images into a grid of tiles which are then processed separately by the model. A limitation of the tiling mechanism is that a single model is trained on all tile locations combined, leaving the approach ineffective for locally-aware models that require a fixed position and orientation of the objects in the images. For such models, an ensemble approach would be required. The idea is to train separate models for each of the tile locations and combine the predictions of the models in the post-processing stage. This project involves adding such an ensemble approach to tiling to the Anomalib library.
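The ensemble idea described above (one model per tile location, predictions merged afterwards) can be sketched as follows. This is an illustration under assumptions rather than the Anomalib implementation: the function names are hypothetical, tiles do not overlap, images are nested lists, and each "model" is any callable that maps a tile to an anomaly map of the same shape.

```python
def split_into_tiles(image, tile_h, tile_w):
    """Split an image into a non-overlapping grid, keyed by (tile_row, tile_col)."""
    rows, cols = len(image), len(image[0])
    if rows % tile_h or cols % tile_w:
        raise ValueError("image size must be divisible by the tile size")
    tiles = {}
    for r in range(0, rows, tile_h):
        for c in range(0, cols, tile_w):
            tiles[(r // tile_h, c // tile_w)] = [row[c:c + tile_w] for row in image[r:r + tile_h]]
    return tiles


def merge_tiles(tiles, tile_h, tile_w):
    """Stitch per-tile maps back into one full-size map."""
    n_rows = max(r for r, _ in tiles) + 1
    n_cols = max(c for _, c in tiles) + 1
    merged = [[0.0] * (n_cols * tile_w) for _ in range(n_rows * tile_h)]
    for (tr, tc), tile in tiles.items():
        for i, row in enumerate(tile):
            merged[tr * tile_h + i][tc * tile_w:(tc + 1) * tile_w] = row
    return merged


def predict_with_ensemble(image, models, tile_h, tile_w):
    """Run the model assigned to each tile location, then merge the predictions."""
    tiles = split_into_tiles(image, tile_h, tile_w)
    predictions = {loc: models[loc](tile) for loc, tile in tiles.items()}
    return merge_tiles(predictions, tile_h, tile_w)


# 4x4 image split into four 2x2 tiles, each with its own stand-in "model".
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
identity = lambda tile: [row[:] for row in tile]
zeros = lambda tile: [[0.0] * len(row) for row in tile]
models = {(0, 0): identity, (0, 1): zeros, (1, 0): identity, (1, 1): identity}
print(predict_with_ensemble(image, models, 2, 2))
```

Because each location has its own model, a model only ever sees the part of the object that appears at its position, which is what locally-aware methods need.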
Expected outcomes: Data-to-prediction pipeline incorporating ensemble-based tiling
Skills required/preferred: ML basics, Python
Mentors: Dick Ameln, Samet Akcay
Size of project: 175 hours
Difficulty: Medium
Year: 2023
Implemented by: blaz-r
The result: Report | Code | Article
Short description: This project proposes novel evaluation metrics for anomaly segmentation in computer vision, taking into account pixel-level and spatial information. The aim is to provide a more comprehensive evaluation of anomaly segmentation algorithms, aiding researchers and practitioners in selecting and fine-tuning models. The first proposed metric is the False Positive Blob Relative Volume (FP-BRV), which accounts for the visual nuisance of false positive pixels, complementing the Per-Region Overlap (PRO). The proposed metric will be evaluated on popular anomaly segmentation public datasets and visually validated. Milestones: prototype implementation, testing and validation on public datasets, production implementation, optimization/unit testing/documentation, and research paper writing. See "Section 5 Detailed project proposal" in the PDF for details and a timeline.
Expected outcomes: A new metric for segmentation prediction
Skills required/preferred: ML basics, Python
Mentors: Samet Akcay, Dick Ameln
Size of project: 175 hours
Difficulty: Medium
Year: 2023
Implemented by: jpcbertoldo
The result: Report | Demo | Article | Paper
Short description: Automatic1111 is a powerful web user interface based on the Gradio library and specifically designed for Stable Diffusion. It is the most popular open-source Stable Diffusion WebUI on GitHub, with 119K+ stars, and supports many features such as text-to-image, image-to-image, inpainting, LoRA models, and custom models from model hubs like civitai.com and Hugging Face. OpenVINO support for Automatic1111 enables Stable Diffusion to run on Intel CPUs and GPUs; this solution is currently provided through a custom script. Implementing OpenVINO support as an Automatic1111 extension will provide an easier way to use OpenVINO. This project will also aim to support more Automatic1111 features with OpenVINO. Task description:
- Develop a built-in extension for Automatic1111 SD WebUI based on the existing OpenVINO custom script leveraging Diffusers library.
- Support some of the new features like Hires upscalers, new samplers, tiling, face restoration etc. Develop test scripts to evaluate these features.
- Evaluate with different Stable Diffusion variants (V1.5, V2.1, XL, LCM etc.) on Intel CPUs and GPUs.
- Optional: Evaluate compatibility with other extensions like ControlNet.
Expected outcomes:
- Raise a PR with all the contributions to the OpenVINO fork and eventually to the mainstream Automatic1111 repo.
- Documentation with clear description of features in the extension and demo videos.
- Medium/OpenVINO blogs.
Skills required/preferred: Python, good understanding of Stable diffusion architectures, experience with Hugging Face and Diffusers libraries, experience with PyTorch (OpenVINO is a plus), Git.
Mentors: Anna Likholat, Mustafa Cavus
Size of project: 350 hours
Difficulty: Medium to hard
Year: 2024
Implemented by: mengbingrock
The result: Code | Report | Article
Short description: There is a new OpenVINO NPM package, so you can now work with a neural model directly from your Node.js application. We propose reworking existing samples from the Python API to the Node.js API, or even implementing new examples using the OpenVINO JS API.
Expected outcomes:
- Several reworked samples from documentation
- Project uses a neural model to solve the task
- Project uses OpenVINO in Node.js environment
Skills required/preferred: JavaScript, Node.js, Python
Mentors: Nikolai Vishniakov, Alicja Miłoszewska
Size of project: 90 hours
Difficulty: Easy to medium
Year: 2024
Implemented by: qxprakash
The result: Code | Report | Article
Short description: Currently, openvino.genai has a C++ image generation pipeline with Stable Diffusion and LCM, but only Text-to-Image is supported. This project would add Image-to-Image and Image-to-Text generation support with a cross-platform C++ GUI.
Expected outcomes:
- Enable Image-to-Image generation with ControlNet conditioning (Canny edge or OpenPose), based on the Python OpenVINO notebook and the C++ Text-to-Image pipeline.
- Enable Image-to-Text generation pipeline, based on the Python CLIP image classification OpenVINO notebook.
Skills required/preferred: C++ programming, knowledge of PyTorch and Qt Creator
Mentors: Su Yang, Fiona Zhao
Size of project: 175 hours
Difficulty: Medium
Year: 2024
Implemented by: chux0519
Short description: The goal of OpenVINO adapters is to provide a lightweight layer of Python code that makes it easy to switch from an already used framework (PyTorch) to OpenVINO for inference, and to showcase the potential performance benefits of using OpenVINO. The change on the user side should look like import torch --> import openvino_adapters.torch and... that's it! (Maybe with some minor changes, like excluding cuda() calls.) The project includes the whole development cycle: creating a POC, productizing the solution, adding tests, making an installable package, and creating documentation.
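The drop-in idea can be illustrated with a minimal sketch. This is not the openvino_adapters API: the class name InferenceAdapter and the stand-in backend are hypothetical, and a plain Python callable takes the place of a compiled OpenVINO model so the example runs without any framework installed. It shows one way an adapter could keep PyTorch-style call sites working, including turning cuda() into a no-op.

```python
class InferenceAdapter:
    """Hypothetical wrapper that mimics a few PyTorch module methods."""

    def __init__(self, backend):
        # `backend` is any callable standing in for a compiled inference model.
        self._backend = backend

    def cuda(self, *args, **kwargs):
        # Device placement is left to the backend in this sketch, so existing
        # `model.cuda()` calls become harmless no-ops.
        return self

    def to(self, *args, **kwargs):
        # Same reasoning as cuda(): accept and ignore device/dtype moves.
        return self

    def eval(self):
        # An inference-only backend has no training mode to switch off.
        return self

    def __call__(self, *inputs):
        # Forward the call unchanged to the backend.
        return self._backend(*inputs)


def fake_compiled_model(values):
    """Stand-in for real inference: doubles every input value."""
    return [2 * v for v in values]


# Existing user code can keep its PyTorch-style chaining.
model = InferenceAdapter(fake_compiled_model).cuda().eval()
print(model([1, 2, 3]))  # [2, 4, 6]
```

Returning self from the ignored methods is a design choice that preserves method chaining; a real adapter would also have to translate tensors and model definitions, which this sketch leaves out.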
Expected outcomes:
- openvino_adapters package that might become a full-fledged open-source solution
- Learning OpenVINO and PyTorch frameworks and creating translations between different APIs
- Creating the first adapter for PyTorch
Skills required/preferred: Python, PyTorch knowledge
Mentors: Jan Iwaszkiewicz, Przemyslaw Wysocki, Anastasia Kuporosova
Size of project: 175 hours
Difficulty: Medium to hard
Year: 2024
Implemented by: LucaTamSapienza
The result: Code | Report | Article
Short description: The goal of this project is to implement a set of optimizations inside the OpenVINO runtime that target GenAI models (e.g. text generation, diffusers). The optimizations should improve latency and throughput, shorten model compilation time, and lower memory consumption.
Expected outcomes:
- Better GenAI workloads adoption in OpenVINO on ARM devices
- Better compilation and latency/throughput performance on GenAI models
- Lower memory requirements to run GenAI applications
Skills required/preferred: Mac device with M1 / M2 / M3 chip is a must, C++
Mentors: Alexandr Voron, Dmitry Gorokhov
Size of project: 350 hours
Difficulty: Medium
Year: 2024
Implemented by: mory91
The result: Code | Report | Article
14. AnomalyGPT: Integrating Vision Language Models (VLMs) in Anomalib for zero- and few-shot anomaly detection
Short description: The goal of this project is to integrate the capabilities of Visual Language Models (VLMs) within the anomalib framework. Anomalib is a deep learning library designed for anomaly detection research. VLMs could be used for anomaly detection purposes by prompting the VLM to determine if an image contains an anomaly or not. An advantage of this approach over classical anomaly detection techniques is that a VLM can explain its decision in natural language, which benefits the interpretability of the predictions.
The first part of the project consists of creating a coupling between Anomalib and OpenAI’s ChatGPT API. The student will use the ChatGPT UI to create a custom GPT that is instructed to detect anomalies in images using ChatGPT’s internal VLM. On the Anomalib side, the student will create a model wrapper that interacts with the GPT through API calls so that the GPT’s predictions can be used within the Anomalib framework. Design choices will need to be made around parsing the responses of the GPT and presenting the predictions to the user. The student will conduct a series of experiments to investigate how the performance of the GPT compares to other Anomalib models on structural and logical anomaly detection datasets such as MVTec AD and MVTec LOCO.
Optional extensions of the project include replacing the ChatGPT model with a locally deployed open-source VLM such as LLaVA, and implementing a simple UI that allows a discourse-style interaction with the model.
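One of the design choices mentioned above, parsing the model's reply, could look roughly like the sketch below. It assumes the prompt asks the VLM to answer with a JSON object containing "anomalous" and "explanation" keys, and falls back to a leading yes/no in free text. That reply format, the VlmPrediction class, and the parse_vlm_reply function are assumptions made for illustration and are not part of Anomalib or the ChatGPT API.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class VlmPrediction:
    is_anomalous: Optional[bool]  # None when the reply could not be interpreted
    explanation: str


def parse_vlm_reply(reply: str) -> VlmPrediction:
    """Turn a raw text reply into a structured prediction (hypothetical format)."""
    text = reply.strip()

    # Preferred path: a JSON object somewhere in the reply.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict) and isinstance(data.get("anomalous"), bool):
            return VlmPrediction(data["anomalous"], str(data.get("explanation", "")))

    # Fallback: free text that starts with "yes" or "no".
    first_word = text.split(maxsplit=1)[0].lower().strip(".,:!") if text else ""
    if first_word in ("yes", "no"):
        return VlmPrediction(first_word == "yes", text)

    # Anything else is reported as undecided rather than guessed.
    return VlmPrediction(None, text)


print(parse_vlm_reply('Sure! {"anomalous": true, "explanation": "scratch on the surface"}'))
print(parse_vlm_reply("No, the object looks normal."))
```

Returning None for uninterpretable replies keeps parsing failures visible to the caller instead of silently counting them as normal images.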
Expected outcomes:
- Integrate a custom GPT capable of zero-/few-shot anomaly detection into Anomalib using the ChatGPT API.
- Design and update the prediction entity of Anomalib to return the predicted description from the model.
- Conduct experiments to rate the performance of the GPT on different datasets.
Optional:
- Replace the GPT API with an open-source VLM such as LLaVA.
- Design a UI to interact with the model and show the predictions, using open-source tools such as Hugging Face or Gradio.
Skills required/preferred:
- Proficient in Python and familiar with deep learning frameworks like PyTorch Lightning.
- Understanding of anomaly detection principles and experience with anomalib is a plus.
- Knowledge of basic machine/deep learning and computer vision would be useful.
Mentors: Samet Akcay, Dick Ameln, Ashwin Vaidya
Size of project: 350 hours
Difficulty: Medium to hard
Year: 2024
Implemented by: Bepitic
The result: Code | Report | Article
Short description: Many people spend a significant portion of their lives sitting in front of a PC, which can lead to eye fatigue. However, using a webcam it is possible to monitor and manage the amount of time a user spends in front of a computer screen.
The implemented application uses a webcam to detect the user's gaze and thereby determine the duration of screen time. If the user exceeds a certain limit, the application sends an alert suggesting a break. This feature helps to prevent excessive screen time and promotes healthier computer usage habits.
The application also includes a user interface that allows users to set intervals for microbreaks.
In addition, the application provides reports on the amount of time spent looking at the display.
To optimize energy efficiency and performance, the application is to be designed to utilize the Neural Processing Unit (NPU) on an AI PC. This ensures that the application runs smoothly and efficiently, minimizing its impact on the computer's overall performance.
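The screen-time logic described above can be sketched independently of the gaze model. The example below assumes that some detector already yields one boolean per frame (looking at the screen or not). The class name, its parameters, and the rule that a long enough look-away counts as a break are illustrative assumptions, not the project's actual design.

```python
class ScreenTimeTracker:
    """Hypothetical accumulator for per-frame gaze results."""

    def __init__(self, break_after_s: float, reset_after_away_s: float):
        self.break_after_s = break_after_s            # alert threshold
        self.reset_after_away_s = reset_after_away_s  # look-away time that counts as a break
        self.screen_time_s = 0.0        # screen time since the last break
        self.away_time_s = 0.0          # length of the current look-away
        self.total_screen_time_s = 0.0  # running total, e.g. for reports

    def update(self, looking_at_screen: bool, dt_s: float) -> bool:
        """Feed one frame's result; return True while a break alert is due."""
        if looking_at_screen:
            self.screen_time_s += dt_s
            self.total_screen_time_s += dt_s
            self.away_time_s = 0.0
        else:
            self.away_time_s += dt_s
            if self.away_time_s >= self.reset_after_away_s:
                self.screen_time_s = 0.0  # the user has taken a break
        return self.screen_time_s >= self.break_after_s


# Simulated gaze results at one frame per second.
tracker = ScreenTimeTracker(break_after_s=3.0, reset_after_away_s=2.0)
for looking in (True, True, True, False, False):
    alert = tracker.update(looking, dt_s=1.0)
    print(looking, alert)
```

Keeping the timing logic separate from inference means it can be tested with simulated frames, while the gaze model itself runs on whichever device is available.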
Expected outcomes:
- A standalone desktop application, which is capable of detecting the user's gaze using a webcam and estimating the amount of time a user spends in front of a computer screen.
- The application also provides a user interface for configuration and reporting.
- To use AI responsibly, the application must ask the user for consent before tracking eye gaze and confirm that the data is kept locally on the PC.
- The application utilizes the Neural Processing Unit (NPU) of an AI PC to perform the inference of the corresponding AI model locally. This eliminates the need for an internet connection, making the application more convenient and accessible.
- The application showcases accuracy and power consumption levels, making it suitable for daily usage.
Skills required/preferred: Python, Deep Learning, Computer Vision
Mentors: Dmitriy Pastushenkov, Zhuo Wu
Size of project: 175 hours
Difficulty: Medium
Year: 2024
Implemented by: inbasperu
The result: Code | Report | Article
Short description: The project aims to enhance the runtime performance of OpenVINO on RISC-V devices to fully leverage their power efficiency for running DL/AI workloads. This will be achieved through three key optimization strategies: adopting or improving third-party libraries with RISC-V optimized primitives, porting existing x86/ARM optimized kernels to the RISC-V ISA, and implementing device-specific transformation passes tailored for RISC-V backend optimizations.
Expected outcomes:
- Adoption of optimized RISC-V kernels from third-party libraries for more operations in OpenVINO.
- Porting of critical x86/ARM optimizations to RISC-V ISA.
- Implementation of device-specific transformation passes for RISC-V backend requirements.
- Demonstrated improvement in OpenVINO runtime performance on RISC-V CPU devices through benchmarking against determined workloads.
Skills required/preferred: C++, Deep Learning
Mentors: Alexandra Sidorova, Dmitry Gorokhov
Size of project: 175 hours
Difficulty: Medium to hard
Year: 2024
Implemented by: BHbean
Contribution guidelines can be found here.
- Open OpenVINO discussions tab
- Start a new discussion by pushing the green button (if you cannot see the button, it means you're not logged in)
- Select a "Google Summer of Code" category and add the "gsoc" label
- Ask your question (please be aware everything you post there is publicly available)
Please get in touch with us early to discuss your application with the mentor. Mentors will do their best to reply to all contributors, but due to the large contributor interest this year, they may not be able to respond to all inquiries.
© Copyright 2018-2024, OpenVINO team