Add first draft of Accelerate example (#178)

* Add first draft of Accelerate example * Update links * Update links and fix minimum version of accelerate * Update links
comet-ml · Jun 18, 2024 · 8ab9e78 · 8ab9e78
1 parent 49b61e4
commit 8ab9e78
Showing 1 changed file with 338 additions and 0 deletions.
diff --git a/integrations/model-training/accelerate/notebooks/Comet_and_Accelerate.ipynb b/integrations/model-training/accelerate/notebooks/Comet_and_Accelerate.ipynb
@@ -0,0 +1,338 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "PWVljpddz_vN"
+   },
+   "source": [
+    "<img src=\"https://cdn.comet.ml/img/notebook_logo.png\">"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "A0-thQauBRRL"
+   },
+   "source": [
+    "[Comet](https://www.comet.com/site/products/ml-experiment-tracking/??utm_source=accelerate&utm_medium=colab&utm_campaign=comet_examples) is an MLOps Platform that is designed to help Data Scientists and Teams build better models faster! Comet provides tooling to track, Explain, Manage, and Monitor your models in a single place! It works with Jupyter Notebooks and Scripts and most importantly it's 100% free to get started!\n",
+    "\n",
+    "[Hugging Face Accelerate](https://github.com/huggingface/accelerate) was created for PyTorch users who like to write the training loop of PyTorch models but are reluctant to write and maintain the boilerplate code needed to use multi-GPUs/TPU/fp16.\n",
+    "\n",
+    "Instrument Accelerate with Comet to start managing experiments, create dataset versions and track hyperparameters for faster and easier reproducibility and collaboration.\n",
+    "\n",
+    "[Find more information about our integration with Accelerate](https://www.comet.ml/docs/v2/integrations/ml-frameworks/transformers/?utm_source=accelerate&utm_medium=colab&utm_campaign=comet_examples)\n",
+    "\n",
+    "Curious about how Comet can help you build better models, faster? Find out more about [Comet](https://www.comet.com/site/products/ml-experiment-tracking/?utm_source=accelerate&utm_medium=colab&utm_campaign=comet_examples) and our [other integrations](https://www.comet.com/docs/v2/integrations/overview/?utm_source=accelerate&utm_medium=colab&utm_campaign=comet_examples)\n",
+    "\n",
+    "Get a preview for what's to come. Check out a completed experiment created from this notebook [here](https://www.comet.com/examples/comet-example-accelerate-notebook/0373dd068a484105b16c4053407f1bb2/?utm_source=accelerate&utm_medium=colab&utm_campaign=comet_examples).\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "V2UZtdWitSLf"
+   },
+   "source": [
+    "# Install Dependencies"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "vIQsPNvatQIU"
+   },
+   "outputs": [],
+   "source": [
+    "%pip install \"comet_ml>=3.31.5\" torch torchvision tqdm \"accelerate>=0.17.0\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "lpCFdN33tday"
+   },
+   "source": [
+    "# Initialize Comet"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "kGyz_i-dtfk4"
+   },
+   "outputs": [],
+   "source": [
+    "import comet_ml\n",
+    "\n",
+    "comet_ml.init()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "zMVITKxktj7H"
+   },
+   "source": [
+    "# Import Dependencies"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "cuRPreREp8y0"
+   },
+   "outputs": [],
+   "source": [
+    "from accelerate import Accelerator\n",
+    "from torch.autograd import Variable\n",
+    "from tqdm import tqdm\n",
+    "import torch\n",
+    "import torch.nn as nn\n",
+    "import torch.nn.functional as F\n",
+    "import torchvision.datasets as datasets\n",
+    "import torchvision.transforms as transforms"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "a15YYRcKp9xV"
+   },
+   "source": [
+    "# Define Parameters"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "rZ9biJqMtMq0"
+   },
+   "outputs": [],
+   "source": [
+    "hyper_params = {\"batch_size\": 100, \"num_epochs\": 3, \"learning_rate\": 0.01}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "htcACg9grtTe"
+   },
+   "source": [
+    "# Load Data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "crCfZX9Zrufn"
+   },
+   "outputs": [],
+   "source": [
+    "# MNIST Dataset\n",
+    "train_dataset = datasets.MNIST(\n",
+    "    root=\"./data/\", train=True, transform=transforms.ToTensor(), download=True\n",
+    ")\n",
+    "\n",
+    "test_dataset = datasets.MNIST(\n",
+    "    root=\"./data/\", train=False, transform=transforms.ToTensor()\n",
+    ")\n",
+    "\n",
+    "# Data Loader (Input Pipeline)\n",
+    "train_loader = torch.utils.data.DataLoader(\n",
+    "    dataset=train_dataset, batch_size=hyper_params[\"batch_size\"], shuffle=True\n",
+    ")\n",
+    "\n",
+    "test_loader = torch.utils.data.DataLoader(\n",
+    "    dataset=test_dataset, batch_size=hyper_params[\"batch_size\"], shuffle=False\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "vqJaYzGvryUz"
+   },
+   "source": [
+    "# Define Model and Optimizer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "pMRQESULrzXg"
+   },
+   "outputs": [],
+   "source": [
+    "accelerator = Accelerator(log_with=\"comet_ml\")\n",
+    "accelerator.init_trackers(\n",
+    "    project_name=\"comet-example-accelerate-notebook\", config=hyper_params\n",
+    ")\n",
+    "device = accelerator.device\n",
+    "\n",
+    "\n",
+    "class Net(nn.Module):\n",
+    "    def __init__(self):\n",
+    "        super(Net, self).__init__()\n",
+    "        self.conv1 = nn.Conv2d(1, 32, 3, 1)\n",
+    "        self.conv2 = nn.Conv2d(32, 64, 3, 1)\n",
+    "        self.dropout1 = nn.Dropout(0.25)\n",
+    "        self.dropout2 = nn.Dropout(0.5)\n",
+    "        self.fc1 = nn.Linear(9216, 128)\n",
+    "        self.fc2 = nn.Linear(128, 10)\n",
+    "\n",
+    "    def forward(self, x):\n",
+    "        x = self.conv1(x)\n",
+    "        x = F.relu(x)\n",
+    "        x = self.conv2(x)\n",
+    "        x = F.relu(x)\n",
+    "        x = F.max_pool2d(x, 2)\n",
+    "        x = self.dropout1(x)\n",
+    "        x = torch.flatten(x, 1)\n",
+    "        x = self.fc1(x)\n",
+    "        x = F.relu(x)\n",
+    "        x = self.dropout2(x)\n",
+    "        x = self.fc2(x)\n",
+    "\n",
+    "        return x\n",
+    "\n",
+    "\n",
+    "model = Net().to(device)\n",
+    "\n",
+    "# Loss and Optimizer\n",
+    "criterion = nn.CrossEntropyLoss()\n",
+    "optimizer = torch.optim.Adam(model.parameters(), lr=hyper_params[\"learning_rate\"])\n",
+    "\n",
+    "model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "7hgzjMnYqZPQ"
+   },
+   "source": [
+    "# Train a Model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "zmb3DAsXqa4K"
+   },
+   "outputs": [],
+   "source": [
+    "def train(model, optimizer, criterion, dataloader, epoch):\n",
+    "    model.train()\n",
+    "    total_loss = 0\n",
+    "    correct = 0\n",
+    "    for batch_idx, (images, labels) in enumerate(\n",
+    "        tqdm(dataloader, total=len(dataloader))\n",
+    "    ):\n",
+    "        optimizer.zero_grad()\n",
+    "        images = images.to(device)\n",
+    "        labels = labels.to(device)\n",
+    "\n",
+    "        outputs = model(images)\n",
+    "\n",
+    "        loss = criterion(outputs, labels)\n",
+    "        pred = outputs.argmax(\n",
+    "            dim=1, keepdim=True\n",
+    "        )  # get the index of the max log-probability\n",
+    "\n",
+    "        accelerator.backward(loss)\n",
+    "        optimizer.step()\n",
+    "\n",
+    "        # Compute train accuracy\n",
+    "        batch_correct = pred.eq(labels.view_as(pred)).sum().item()\n",
+    "        batch_total = labels.size(0)\n",
+    "\n",
+    "        total_loss += loss.item()\n",
+    "        correct += batch_correct\n",
+    "\n",
+    "        # Log batch_accuracy to Comet; step is each batch\n",
+    "        accelerator.log({\"batch_accuracy\": batch_correct / batch_total})\n",
+    "\n",
+    "    total_loss /= len(dataloader.dataset)\n",
+    "    correct /= len(dataloader.dataset)\n",
+    "\n",
+    "    # Log data at an epoch level\n",
+    "    accelerator.get_tracker(\"comet_ml\").tracker.log_metrics(\n",
+    "        {\"accuracy\": correct, \"loss\": total_loss}, epoch=epoch\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "kpNDbdvQuPZw"
+   },
+   "outputs": [],
+   "source": [
+    "# Train the Model\n",
+    "print(\"Running Model Training\")\n",
+    "\n",
+    "max_epochs = hyper_params[\"num_epochs\"]\n",
+    "for epoch in range(max_epochs + 1):\n",
+    "    print(\"Epoch: {}/{}\".format(epoch, max_epochs))\n",
+    "    train(model, optimizer, criterion, train_loader, epoch)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "F__x_H052a1E"
+   },
+   "source": [
+    "# End Experiment "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "YsTMrwd_2cmo"
+   },
+   "outputs": [],
+   "source": [
+    "accelerator.end_training()"
+   ]
+  }
+ ],
+ "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "collapsed_sections": [],
+   "name": "Comet and Pytorch.ipynb",
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.1"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}