diff --git a/entity_recognition/entity_recognition_training.ipynb b/entity_recognition/entity_recognition_training.ipynb index 0883942..c48a055 100644 --- a/entity_recognition/entity_recognition_training.ipynb +++ b/entity_recognition/entity_recognition_training.ipynb @@ -21,7 +21,7 @@ "id": "a6676421", "metadata": {}, "source": [ - "This notebook demonstrates how to train a NLP model for entity recognition and use it to produce out-of-sample predicted probabilities for each token. These are required inputs to find label issues in token classification datasets with cleanlab. The specific token classification task we consider here is Named Entity Recognition with the [CoNLL-2003 dataset](https://deepai.org/dataset/conll-2003-english), and we train a Transformer network from [HuggingFace's transformers library](https://github.com/huggingface/transformers). This notebook demonstrates how to produce the `pred_probs`, using them to find label issues is demonstrated in cleanlab's [Token Classification Tutorial](https://docs.cleanlab.ai/stable/tutorials/token_classification.html). \n", + "This notebook demonstrates how to train an NLP model for entity recognition and use it to produce out-of-sample predicted probabilities for each token. These are required inputs to find label issues in token classification datasets with cleanlab. The specific token classification task we consider here is Named Entity Recognition with the [CoNLL-2003 dataset](https://deepai.org/dataset/conll-2003-english), and we train a Transformer network from [HuggingFace's transformers library](https://github.com/huggingface/transformers). This notebook demonstrates how to produce the `pred_probs`; using them to find label issues is demonstrated in cleanlab's [Token Classification Tutorial](https://docs.cleanlab.ai/stable/tutorials/token_classification.html). 
Note that running this notebook requires the helper .py files in the entity_recognition/ parent folder. If you are running in Colab or locally, make sure you've copied these .py files to your environment as well. \n", "\n", "**Overview of what we'll do in this notebook:** \n", "- Read and process text datasets with per-token labels in the CoNLL format. \n",