LLM Instruction-following pipeline with OpenVINO

LLM stands for “Large Language Model”, a type of artificial intelligence model designed to understand and generate human-like text based on the input it receives. LLMs are trained on large text datasets to learn patterns, grammar, and semantic relationships, which allows them to generate coherent and contextually relevant responses. One core capability of LLMs is following natural language instructions. Instruction-following models can generate text in response to prompts and are often used for tasks such as writing assistance, chatbots, and content generation.

In this tutorial, we show how to run an instruction-following text generation pipeline using popular LLMs and OpenVINO. We will use pre-trained models from the Hugging Face Transformers library. To simplify the user experience, the Hugging Face Optimum Intel library is used to convert the models to OpenVINO™ IR format.
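As a minimal sketch of what this looks like in code (assuming the `optimum-intel` package with its OpenVINO backend is installed; the model ID, prompt, and generation settings below are illustrative choices, not the notebook's exact configuration), a Hugging Face model can be exported to OpenVINO IR and used for generation like this:

```python
# Minimal sketch, assuming optimum-intel (OpenVINO backend) and transformers
# are installed. The model ID and prompt are illustrative only; the notebook
# lets you pick one of the models listed below.
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # example choice

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Explain what OpenVINO is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")

# short greedy generation, kept small for illustration
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Once exported, the IR model can be saved with `model.save_pretrained(...)` and reloaded later without repeating the conversion step.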

The tutorial supports several models; you can select one of the provided options to compare the quality of open-source LLM solutions (see the model-selection sketch after the list below).

The available options are:

  • tiny-llama-1b-chat - This is the chat model fine-tuned on top of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T. The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens, adopting the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged into many open-source projects built upon Llama. Besides, TinyLlama is compact with only 1.1B parameters, which allows it to serve applications with restricted computation and memory budgets. More details about the model can be found in the model card.
  • phi-2 - Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source consisting of various NLP synthetic texts and filtered websites (for safety and educational value). When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showcased nearly state-of-the-art performance among models with fewer than 13 billion parameters. More details about the model can be found in the model card.
  • dolly-v2-3b - Dolly 2.0 is an instruction-following large language model trained on the Databricks machine-learning platform and licensed for commercial use. It is based on Pythia and is trained on ~15k instruction/response fine-tuning records generated by Databricks employees in various capability domains, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. Dolly 2.0 works by processing natural language instructions and generating responses that follow the given instructions. It can be used for a wide range of applications, including closed question-answering, summarization, and generation. More details about the model can be found in the model card.
  • red-pajama-3b-instruct - A 2.8B parameter pre-trained language model based on the GPT-NeoX architecture. The model was fine-tuned for few-shot applications on the data of GPT-JT, excluding tasks that overlap with the HELM core scenarios. More details about the model can be found in the model card.
  • mistral-7b - The Mistral-7B-v0.2 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. You can find more details about the model in the model card, paper, and release blog post.
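To make the selection above concrete, a hypothetical mapping from option names to Hugging Face checkpoints might look like the sketch below; the exact IDs used by the notebook's configuration may differ.

```python
# Hypothetical mapping from the options above to Hugging Face checkpoints;
# the exact IDs used by the notebook may differ.
SUPPORTED_MODELS = {
    "tiny-llama-1b-chat": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "phi-2": "microsoft/phi-2",
    "dolly-v2-3b": "databricks/dolly-v2-3b",
    "red-pajama-3b-instruct": "togethercomputer/RedPajama-INCITE-Instruct-3B-v1",
    "mistral-7b": "mistralai/Mistral-7B-v0.1",  # or the v0.2 checkpoint referenced above
}

model_id = SUPPORTED_MODELS["phi-2"]  # pick any key to try a different model
```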

Notebook Contents

The tutorial walks through downloading a selected model, converting it to OpenVINO IR, and running an instruction-following text generation pipeline with it.

The image below illustrates example user instructions and model answers.

phi2-example.png

Installation Instructions

This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, please refer to the Installation Guide.
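As a rough setup sketch (the exact dependencies and version pins live in the notebook itself, so treat the package specs below as placeholders), the requirements can be installed directly from a notebook cell:

```python
# Rough environment-setup sketch for a Jupyter notebook cell; exact version
# pins are defined in the notebook, so these specs are placeholders only.
%pip install -q "openvino>=2023.2.0" "optimum[openvino]" transformers torch
```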