Author: Archit Vasan (avasan@anl.gov), incorporating and adapting materials and discussions contributed over time by Varuni Sastri, Carlo Graziani, Taylor Childers, Venkat Vishwanath, Jay Alammar, and Kevin Gimpel.
This tutorial continues last week's discussion of large language models (LLMs) with Carlo Graziani, in which he introduced sequential data modeling, tokenization methods, and embeddings. Here, we will attempt to demystify aspects of the Transformer model architecture.
We will refer to this notebook:
The discussion will include:
- positional encodings,
- attention mechanisms (a minimal sketch follows this list),
- output layers,
- and training loops.
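To set expectations before we open the notebook, here is a minimal sketch of scaled dot-product attention in PyTorch. The function name, tensor shapes, and toy inputs are illustrative assumptions, not the notebook's exact code:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k) tensors of queries, keys, and values.
    d_k = q.size(-1)
    # Similarity between every query and every key, scaled by sqrt(d_k).
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    # Normalize the scores into attention weights over the sequence.
    weights = torch.softmax(scores, dim=-1)
    # Each output position is a weighted sum of the value vectors.
    return weights @ v

# Toy example: batch of 1, sequence of 4 tokens, 8-dimensional vectors.
q = torch.randn(1, 4, 8)
k = torch.randn(1, 4, 8)
v = torch.randn(1, 4, 8)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([1, 4, 8])
```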
This will hopefully also provide the necessary background for next week's discussion of distributed training of LLMs.
We will first perform "text-generation" with the popular GPT-2 model using the Hugging Face pipeline. Then we will code the model elements of a simple LLM from scratch and train it ourselves.
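For reference, a minimal "text-generation" call with the Hugging Face pipeline looks roughly like the sketch below; the prompt and generation settings are placeholder assumptions, and the notebook walks through the full example:

```python
from transformers import pipeline

# Build a text-generation pipeline backed by the pretrained GPT-2 model.
generator = pipeline("text-generation", model="gpt2")

# Placeholder prompt; the notebook uses its own examples.
result = generator("The Transformer architecture is",
                   max_new_tokens=20, num_return_sequences=1)
print(result[0]["generated_text"])
```

Note that loading the pretrained model downloads weights from the Hugging Face Hub, which is why the proxy settings described below are needed on Polaris.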
Next week, we'll learn about how to train more complicated LLMs using distributed resources.
- If you are using ALCF, first log in. From a terminal run the following command:
ssh username@polaris.alcf.anl.gov
- Although we already cloned the repo, you'll want the updated version. To be reminded of the instructions for syncing your fork, click here.
- We will be downloading data in our Jupyter notebook, which runs on hardware that by default has no Internet access. From the terminal on Polaris, edit the ~/.bash_profile file to have these proxy settings:
export HTTP_PROXY="http://proxy-01.pub.alcf.anl.gov:3128"
export HTTPS_PROXY="http://proxy-01.pub.alcf.anl.gov:3128"
export http_proxy="http://proxy-01.pub.alcf.anl.gov:3128"
export https_proxy="http://proxy-01.pub.alcf.anl.gov:3128"
export ftp_proxy="http://proxy-01.pub.alcf.anl.gov:3128"
export no_proxy="admin,polaris-adminvm-01,localhost,*.cm.polaris.alcf.anl.gov,polaris-*,*.polaris.alcf.anl.gov,*.alcf.anl.gov"
- Now that we have the updated notebooks, we can open them. If you are using ALCF JupyterHub or Google Colab, you can be reminded of the steps here.
- Reminder: Change the notebook's kernel to datascience/conda-2023-01-10 (you may need to change kernel each time you open a notebook for the first time):
  - select Kernel in the menu bar
  - select Change kernel...
  - select datascience/conda-2023-01-10 from the drop-down menu
Here is an image of GenSLM, described earlier by Arvind Ramanathan. GenSLM is a language model that captures genomic information in a single model, and it was shown to model the evolution of SARS-CoV-2 without expensive experiments.
Here are some recommendations for further reading and additional code for review.
- "The Illustrated Transformer" by Jay Alammar
- "Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)"
- "The Illustrated GPT-2 (Visualizing Transformer Language Models)"
- "LLM Tutorial Workshop (Argonne National Laboratory)"
- "LLM Tutorial Workshop Part 2 (Argonne National Laboratory)"