Commit
adding helper to bootstrap the generation of the meta.yaml file
1 parent 8f04af5, commit d3eea1b
Showing 4 changed files with 110 additions and 168 deletions.
154 changes: 0 additions & 154 deletions
data/tabular/blood_brain_barrier_martins_et_al/meta.yaml
This file was deleted.
@@ -0,0 +1,42 @@
# Meta YAML Generator

## Overview

The Meta YAML Generator is a tool designed to automatically create a `meta.yaml` file for chemical datasets using Large Language Models (LLMs). It analyzes the structure of a given DataFrame and generates a comprehensive metadata file, including advanced sampling methods and template formats.

The default model is `gpt4o`. To use it, set the `OPENAI_API_KEY` environment variable.

## `generate_meta_yaml`

::: chemnlp.data.meta_yaml_generator.generate_meta_yaml
    handler: python
    options:
      show_root_heading: true
      show_source: false

## Usage Example

```python
import pandas as pd
from chemnlp.data.meta_yaml_generator import generate_meta_yaml

# Load your dataset
df = pd.read_csv("your_dataset.csv")

# Generate meta.yaml
meta_yaml = generate_meta_yaml(
    df,
    dataset_name="Polymer Properties Dataset",
    description="A dataset of polymer properties including glass transition temperatures and densities",
    output_path="path/to/save/meta.yaml"
)

# The meta_yaml variable now contains the dictionary representation of the meta.yaml
# If an output_path was provided, the meta.yaml file has been saved to that location
```

You can also use it as a command-line tool:

```bash
python -m chemnlp.data.meta_yaml_generator path/to/your_dataset.csv --dataset_name "Polymer Properties Dataset" --description "A dataset of polymer properties including glass transition temperatures and densities" --output_path "path/to/save/meta.yaml"
```
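
Because the default model requires `OPENAI_API_KEY`, it can help to fail early with a clear message before calling the generator. The following is a minimal sketch; the helper name `require_openai_key` is hypothetical and not part of `chemnlp`, and it assumes only that the key is read from the environment as described in the overview.

```python
import os


def require_openai_key(env=None):
    """Return the OpenAI API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration only; not part of chemnlp.
    """
    # Default to the real process environment; a plain dict can be passed in tests.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before generating meta.yaml"
        )
    return key
```

Calling `require_openai_key()` right before `generate_meta_yaml(...)` surfaces a missing key immediately instead of partway through a request.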
@@ -0,0 +1,31 @@
bibtex:
- "@article{martins2023,\nauthor = {Martins, John and Doe, Jane and Smith, Alice},\ntitle = {Study on Blood-Brain Barrier Penetration of Various Drugs},\njournal = {Journal of Pharmacology},\nvolume = {12},\nnumber = {3},\npages = {123-134},\nyear = {2023},\ndoi = {10.1234/jpharm.2023.56789}}"
description: Describing the ability of different drugs to penetrate the blood-brain barrier.
identifiers:
- description: Simplified Molecular Input Line Entry System
  id: SMILES
  type: SMILES
- description: Name of the compound
  id: compound_name
  names:
  - noun: compound name
  type: Other
license: CC BY 4.0
links:
- description: corresponding publication
  url: https://example.com/publication
- description: data source
  url: https://example.com/data_source
name: blood_brain_barrier_martins_et_al
num_points: 2030
targets:
- description: Indicates whether the compound can penetrate the blood-brain barrier (1 for yes, 0 for no)
  id: penetrate_BBB
  names:
  - noun: blood-brain barrier penetration
  type: integer
templates:
- The compound {compound_name__names__noun} with SMILES {SMILES#} can {#penetrate|not penetrate!} the blood-brain barrier.
- The compound {compound_name__names__noun} with SMILES {SMILES#} is in the {split#} set.
- "Question: Which of the following compounds can penetrate the blood-brain barrier?\nOptions: {%multiple_choice_enum%4%aA1}\n{compound_name%}\nAnswer: {%multiple_choice_result}"
- The compound with SMILES {SMILES#} can penetrate the blood-brain barrier:<EOI>{penetrate_BBB#}
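
Since a file like the one above is generated by an LLM, a quick structural check before committing it can catch missing fields or column ids that do not exist in the data. The sketch below operates on the dictionary that `generate_meta_yaml` returns. The function name `check_meta` is hypothetical, and the list of required keys is taken from the example above rather than from any official `chemnlp` schema, so treat both as assumptions.

```python
# Top-level keys seen in the example meta.yaml above (an assumption, not a schema).
REQUIRED_KEYS = (
    "name",
    "description",
    "targets",
    "identifiers",
    "license",
    "links",
    "num_points",
    "bibtex",
)


def check_meta(meta, columns):
    """Return a list of human-readable problems found in a meta dictionary.

    meta    -- dict representation of meta.yaml
    columns -- column names of the DataFrame the metadata describes
    """
    problems = []
    # Report every expected top-level key that is absent.
    for key in REQUIRED_KEYS:
        if key not in meta:
            problems.append(f"missing top-level key: {key}")
    # Every target and identifier id should correspond to a DataFrame column.
    for section in ("targets", "identifiers"):
        for entry in meta.get(section, []):
            col = entry.get("id")
            if col not in columns:
                problems.append(f"{section} id {col!r} is not a DataFrame column")
    return problems
```

With a DataFrame `df`, this could be called as `check_meta(meta_yaml, list(df.columns))`; an empty list means none of these particular checks failed, not that the metadata is correct.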