Introduction to Neural Networks: Part 2

Author: Archit Vasan (avasan@anl.gov), including and adapting materials and discussions over time by Varuni Sastri, Carlo Graziani, Taylor Childers, Venkat Vishwanath, Jay Alammar and Kevin Gimpel.

This tutorial continues last week's discussion with Carlo Graziani on large language models (LLMs), in which he introduced sequential data modeling, tokenization methods, and embeddings. Here, we will attempt to demystify aspects of the Transformer model architecture.

We will refer to this notebook:

https://github.com/argonne-lcf/ai-science-training-series/blob/architvasan/05_llm_part2/LLM_part02.ipynb

The discussion will include:

  • positional encodings,
  • attention mechanisms,
  • output layers,
  • and training loops.
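As a preview of two of these topics, here is a minimal NumPy sketch (not the notebook's implementation) of sinusoidal positional encodings and scaled dot-product attention, the pieces that let a Transformer inject word order and mix information across sequence positions:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings (sin on even dims, cos on odd dims)."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]       # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V and the attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # each row sums to 1
    return weights @ V, weights

x = np.random.default_rng(0).normal(size=(6, 16))    # 6 tokens, d_model = 16
x = x + positional_encoding(6, 16)                   # add position information
out, attn = scaled_dot_product_attention(x, x, x)    # self-attention
print(out.shape, attn.shape)                         # (6, 16) (6, 6)
```

The attention weights form a row-stochastic matrix: each output token is a convex combination of the value vectors, weighted by query-key similarity.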

This will hopefully also provide the necessary background for next week's discussion of distributed training of LLMs.

We will first run "text-generation" with the popular GPT-2 model and the Hugging Face pipeline. Then we will code the elements of a simple LLM from scratch and train it ourselves.
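The Hugging Face portion looks roughly like the sketch below (assuming the transformers package is installed and the proxy settings from the setup section allow the GPT-2 weights to download; the prompt and generation parameters are illustrative):

```python
from transformers import pipeline, set_seed

# Build a text-generation pipeline backed by GPT-2
# (downloads the weights on first use).
generator = pipeline("text-generation", model="gpt2")
set_seed(42)  # make sampling reproducible

# Draw two sampled continuations of a prompt.
outputs = generator(
    "Language models can",
    max_new_tokens=20,        # length of each continuation
    num_return_sequences=2,   # number of samples to draw
)
for o in outputs:
    print(o["generated_text"])
```

Each element of `outputs` is a dict whose `generated_text` field contains the prompt followed by the sampled continuation.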

Next week, we'll learn about how to train more complicated LLMs using distributed resources.

Environment Setup

  1. If you are using ALCF, first log in. From a terminal run the following command:
ssh username@polaris.alcf.anl.gov
  2. Although we already cloned the repo before, you'll want the updated version. To be reminded of the instructions for syncing your fork, click here.

  3. We will be downloading data in our Jupyter notebook, which runs on hardware that by default has no Internet access. From the terminal on Polaris, edit the ~/.bash_profile file to have these proxy settings:

export HTTP_PROXY="http://proxy-01.pub.alcf.anl.gov:3128"
export HTTPS_PROXY="http://proxy-01.pub.alcf.anl.gov:3128"
export http_proxy="http://proxy-01.pub.alcf.anl.gov:3128"
export https_proxy="http://proxy-01.pub.alcf.anl.gov:3128"
export ftp_proxy="http://proxy-01.pub.alcf.anl.gov:3128"
export no_proxy="admin,polaris-adminvm-01,localhost,*.cm.polaris.alcf.anl.gov,polaris-*,*.polaris.alcf.anl.gov,*.alcf.anl.gov"
  4. Now that we have the updated notebooks, we can open them. If you are using ALCF JupyterHub or Google Colab, you can be reminded of the steps here.

  5. Reminder: Change the notebook's kernel to datascience/conda-2023-01-10 (you may need to change kernel each time you open a notebook for the first time):

    1. select Kernel in the menu bar
    2. select Change kernel...
    3. select datascience/conda-2023-01-10 from the drop-down menu

Exciting example:

Here is an image of GenSLM, described earlier by Arvind Ramanathan: a language model that captures genomic information in a single model. It was shown to model the evolution of SARS-CoV-2 without expensive experiments.

References:

Here are some recommendations for further reading and additional code for review.