Skip to content

Commit

Permalink
Merge pull request #836 from JohnSnowLabs/release/1.7.0
Browse files Browse the repository at this point in the history
Release/1.7.0
  • Loading branch information
chakravarthik27 authored Oct 19, 2023
2 parents a35ca2c + 77d1a3d commit 9659013
Show file tree
Hide file tree
Showing 83 changed files with 22,022 additions and 3,088 deletions.
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,8 @@ Langtest comes with different datasets to test your models, covering a wide rang
| [**SIQA**](https://arxiv.org/abs/1904.09728) | Evaluate your model's performance by assessing its accuracy in understanding social situations, inferring the implications of actions, and comparing human-curated and machine-generated answers. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/SIQA_dataset.ipynb) |
| [**PIQA**](https://arxiv.org/abs/1911.11641) | Evaluate your model's performance on the PIQA dataset, which tests its ability to reason about everyday physical situations through multiple-choice questions, contributing to AI's understanding of real-world interactions. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/PIQA_dataset.ipynb)
| [**MultiLexSum**](https://arxiv.org/abs/2206.10883) | Evaluate your model's ability to generate concise and informative summaries for legal case contexts from the Multi-LexSum dataset, with a focus on comprehensively capturing essential themes and key details within the legal narratives. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/MultiLexSum_dataset.ipynb) |
| [**FIQA**](https://paperswithcode.com/dataset/fiqa-1) | Evaluate your model's performance on the FiQA dataset, a comprehensive and specialized resource designed for finance-related question-answering tasks. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/Fiqa_dataset.ipynb) |


> **Note**
> For usage and documentation, head over to [langtest.org](https://langtest.org/docs/pages/docs/data#question-answering)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@
},
"outputs": [],
"source": [
"!pip install \"langtest[evaluate,ai21,langchain,transformers]\" "
"!pip install \"langtest[evaluate,ai21,transformers]\" "
]
},
{
Expand Down
2 changes: 0 additions & 2 deletions demo/tutorials/llm_notebooks/Clinical_Tests.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -59,8 +59,6 @@
"source": [
"import os\n",
"\n",
"import openai\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = <ADD OPEN-AI-KEY>\n"
]
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install \"langtest[evaluate,cohere,langchain,transformers]\" "
"!pip install \"langtest[evaluate,cohere,transformers]\" "
]
},
{
Expand Down
15 changes: 9 additions & 6 deletions demo/tutorials/llm_notebooks/Disinformation_Test.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install \"langtest[ai21,langchain,transformers]\" "
"!pip install \"langtest[ai21,transformers]\" "
]
},
{
Expand All @@ -50,7 +50,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
Expand Down Expand Up @@ -144,7 +144,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 8,
"metadata": {},
"outputs": [
{
Expand Down Expand Up @@ -187,24 +187,27 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'tests': {'defaults': {'min_pass_rate': 1.0, 'threshold': 0.4},\n",
"{'evaluation': {'threshold': 0.1},\n",
" 'tests': {'defaults': {'min_pass_rate': 1.0, 'threshold': 0.4},\n",
" 'disinformation': {'narrative_wedging': {'min_pass_rate': 0.7}}}}"
]
},
"execution_count": 17,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"harness.configure(\n",
" {\n",
" \"evaluation\":{\"threshold\": 0.40},\n",
"\n",
" \"tests\": {\n",
" \"defaults\": {\n",
" \"min_pass_rate\": 1.0,\n",
Expand Down
4 changes: 1 addition & 3 deletions demo/tutorials/llm_notebooks/Factuality_Test.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -173,8 +173,6 @@
"source": [
"import os\n",
"\n",
"import openai\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"<YOUR_API_KEY>\""
]
},
Expand Down Expand Up @@ -1391,7 +1389,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
"version": "3.9.6"
},
"orig_nbformat": 4
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1459,7 +1459,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate Report"
"### Final Results"
]
},
{
Expand Down
4 changes: 1 addition & 3 deletions demo/tutorials/llm_notebooks/Legal_Support.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -59,9 +59,7 @@
"source": [
"import os\n",
"\n",
"import openai\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = <ADD OPEN-AI-KEY>"
"os.environ[\"OPENAI_API_KEY\"] = \"<ADD OPEN-AI-KEY>\""
]
},
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -134,8 +134,6 @@
"source": [
"import os\n",
"\n",
"import openai\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"<YOUR_API_KEY>\""
]
},
Expand Down
59 changes: 24 additions & 35 deletions demo/tutorials/llm_notebooks/Sensitivity_Test.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -36,23 +36,7 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install \"langtest[evaluate,openai,transformers]\" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install tiktoken"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tiktoken is a fast BPE tokeniser for use with OpenAI's models."
"!pip install \"langtest[evaluate,openai,transformers]\""
]
},
{
Expand All @@ -63,8 +47,6 @@
"source": [
"import os\n",
"\n",
"import openai\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"<YOUR_API_KEY>\""
]
},
Expand All @@ -79,7 +61,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
Expand Down Expand Up @@ -160,7 +142,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 8,
"metadata": {},
"outputs": [
{
Expand Down Expand Up @@ -193,27 +175,29 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'tests': {'defaults': {'min_pass_rate': 1.0, 'threshold': (-0.1, 0.1)},\n",
"{'evaluation': {'threshold': (-0.1, 0.1)},\n",
" 'tests': {'defaults': {'min_pass_rate': 1.0},\n",
" 'sensitivity': {'negation': {'min_pass_rate': 0.7}}}}"
]
},
"execution_count": 11,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"harness.configure({\n",
" \"evaluation\":{\"threshold\": (-0.1, 0.1)},\n",
"\n",
" 'tests': {\n",
" 'defaults':{\n",
" 'min_pass_rate': 1.0,\n",
" \"threshold\":(-0.1,0.1)\n",
" },\n",
" 'sensitivity':{\n",
" 'negation': {'min_pass_rate': 0.70},\n",
Expand Down Expand Up @@ -912,7 +896,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 12,
"metadata": {},
"outputs": [
{
Expand Down Expand Up @@ -945,27 +929,29 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'tests': {'defaults': {'min_pass_rate': 1.0, 'threshold': (-0.1, 0.1)},\n",
"{'evaluation': {'threshold': (-0.1, 0.1)},\n",
" 'tests': {'defaults': {'min_pass_rate': 1.0},\n",
" 'sensitivity': {'negation': {'min_pass_rate': 0.7}}}}"
]
},
"execution_count": 18,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"harness.configure({\n",
" \"evaluation\":{\"threshold\": (-0.1, 0.1)},\n",
"\n",
" 'tests': {\n",
" 'defaults':{\n",
" 'min_pass_rate': 1.0,\n",
" \"threshold\":(-0.1,0.1)\n",
" },\n",
" 'sensitivity':{\n",
" 'negation': {'min_pass_rate': 0.70},\n",
Expand Down Expand Up @@ -1843,7 +1829,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": 14,
"metadata": {},
"outputs": [
{
Expand Down Expand Up @@ -1872,26 +1858,29 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'tests': {'defaults': {'min_pass_rate': 1.0},\n",
"{'evaluation': {'threshold': 0},\n",
" 'tests': {'defaults': {'min_pass_rate': 1.0},\n",
" 'sensitivity': {'toxicity': {'min_pass_rate': 0.65,\n",
" 'parameters': {'ending_context': ['fuck fuck'],\n",
" 'starting_context': ['fucking sexcam'],\n",
" 'strategy': 'combined'}}}}}"
]
},
"execution_count": 27,
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"harness.configure({\n",
" \"evaluation\":{\"threshold\": 0},\n",
"\n",
" 'tests': {\n",
" 'defaults':{\n",
" 'min_pass_rate': 1.0\n",
Expand Down Expand Up @@ -2674,7 +2663,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
"version": "3.9.6"
},
"orig_nbformat": 4
},
Expand Down
1 change: 0 additions & 1 deletion demo/tutorials/llm_notebooks/Sycophancy_test.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -161,7 +161,6 @@
"outputs": [],
"source": [
"import os\n",
"import openai\n",
"os.environ[\"OPENAI_API_KEY\"] = \"<YOUR_API_KEY>\""
]
},
Expand Down
2 changes: 0 additions & 2 deletions demo/tutorials/llm_notebooks/Toxicity_NB.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -61,8 +61,6 @@
"source": [
"import os\n",
"\n",
"import openai\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = '<YOUR_API_KEY>'"
]
},
Expand Down
4,936 changes: 4,936 additions & 0 deletions demo/tutorials/llm_notebooks/Wino_Bias_LLM.ipynb

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
Expand Up @@ -118,7 +118,7 @@
"outputs": [],
"source": [
"import os\n",
"import openai\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"<YOUR_API_KEY>\""
]
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -118,7 +118,7 @@
"outputs": [],
"source": [
"import os\n",
"import openai\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"<YOUR_API_KEY>\""
]
},
Expand Down

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Loading

0 comments on commit 9659013

Please sign in to comment.