LLM Anthropomorphization

LING-472 / ANLY-521 Final Project | Georgetown University | Spring 2023

Developers:

Background

Presentation Slides

This project studies the anthropomorphization of large language models and AI by classifying individual sentences as either anthropomorphic or not.

The project includes a rudimentary rule-based baseline model and a fine-tuned BERT-based model. In a full package, the modeling code would be incorporated into main.py; because fine-tuning requires additional compute, the repo instead includes the notebook we ran in Google Colab. The fine-tuned model is hosted on Hugging Face, where individual sentences can be checked.
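To illustrate what a rudimentary rule-based baseline can look like, here is a minimal sketch that flags a sentence when it mentions an AI-like subject together with a human-like cue (a mental or agentive verb, or a reflexive pronoun). The cue lists and the function name baseline_label are hypothetical and chosen for illustration only; they are not the rules implemented in this repo's labeler. The output uses the same convention as the demo below, where 1 corresponds to LABEL_1 (anthropomorphic).

```python
import re

# Hypothetical cue lists for illustration only; NOT the project's actual rules.
AI_SUBJECTS = {"ai", "model", "chatbot", "llm", "llms", "gpt"}
HUMAN_VERBS = {"thinks", "believes", "wants", "feels", "knows",
               "understands", "teaches", "learns", "decides"}
REFLEXIVES = {"itself", "himself", "herself"}


def baseline_label(sentence: str) -> int:
    """Return 1 (anthropomorphic) or 0 (not) using simple keyword co-occurrence."""
    # Lowercase and keep only alphabetic tokens, so punctuation is ignored.
    tokens = re.findall(r"[a-z]+", sentence.lower())
    has_ai_subject = any(tok in AI_SUBJECTS for tok in tokens)
    has_human_cue = any(tok in HUMAN_VERBS or tok in REFLEXIVES for tok in tokens)
    # Both an AI-like subject and a human-like cue must appear in the sentence.
    return int(has_ai_subject and has_human_cue)


print(baseline_label("Where AI unexpectedly teaches itself a new skill"))  # → 1
```

Keyword rules like these are cheap and transparent but miss context (negation, quotation, figurative use), which is the usual motivation for comparing them against a fine-tuned model.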

You can download the fine-tuned llm_anthro_detection model from Hugging Face.

Demo of final anthropomorphization detection model

(hosted on Hugging Face)

The input sentence "Where AI unexpectedly teaches itself a new skill" was assigned LABEL_1 (anthropomorphic).
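If you download the model and run it locally, the raw output uses generic label names such as LABEL_1. The sketch below turns that output into a readable verdict. It assumes the classifier is a callable that returns a list of {"label", "score"} dicts, which is the shape returned by a Hugging Face transformers text-classification pipeline for a single string. It also assumes LABEL_0 means "not anthropomorphic"; only the LABEL_1 meaning is stated above. The helper name describe is hypothetical, and the model's full Hub ID is not given in this README, so substitute it yourself.

```python
from typing import Callable, Dict, List

# LABEL_1 = anthropomorphic (from the demo); LABEL_0 = not anthropomorphic (assumed).
LABEL_NAMES = {"LABEL_0": "not anthropomorphic", "LABEL_1": "anthropomorphic"}

# Any callable mapping a sentence to [{"label": ..., "score": ...}, ...].
Classifier = Callable[[str], List[Dict[str, object]]]


def describe(sentence: str, classifier: Classifier) -> str:
    """Classify one sentence and return a human-readable label with its score."""
    top = classifier(sentence)[0]  # one dict for a single input string
    label = str(top["label"])
    name = LABEL_NAMES.get(label, label)  # fall back to the raw label if unknown
    return f"{name} ({float(top['score']):.2f})"


# Hypothetical usage with transformers (requires the package and network access):
#   from transformers import pipeline
#   clf = pipeline("text-classification", model="<full-hub-id-of-llm_anthro_detection>")
#   print(describe("Where AI unexpectedly teaches itself a new skill", clf))
```

Passing the classifier in as an argument keeps the mapping logic testable without downloading the model.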

Development

Set up & Installation

To create a conda environment with the required packages and activate it:

conda env create --name llm --file environment.yml
conda activate llm

To update environment.yml after adding packages to the environment:

conda env export --no-builds > environment.yml

To install the project:

pip install .

Formatting

Code is formatted with black (for example, run black . from the repository root).

Data

For information about the data we used and how to retrieve it yourself, see the scraper directory. The example paths below assume your data directory sits at the same level as the LLM-anthropomorphization repository (that is, both share the same parent directory). If your data is stored elsewhere, adjust the path accordingly.

Running the Project

cd labeler/bin
python main.py --data <data> --process <baseline|model> [--export]

[required] -d / --data: the location of your data files
[required] -p / --process: whether to run the baseline labeler or the model labeler. For the model labeler, the baseline is still run first.
[optional] --export: write the baseline output to CSV
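The options above can be mirrored with Python's argparse, as in the minimal sketch below. This is a hypothetical reconstruction from the option descriptions, not the actual contents of main.py: in particular, no short form for --export is documented, so none is defined, and the helper planned_steps is an illustrative name that encodes the rule that the baseline always runs before the model labeler.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Label sentences as anthropomorphic or not."
    )
    parser.add_argument("-d", "--data", required=True,
                        help="location of your data files")
    parser.add_argument("-p", "--process", required=True,
                        choices=["baseline", "model"],
                        help="which labeler to run")
    parser.add_argument("--export", action="store_true",
                        help="write the baseline output to CSV")
    return parser


def planned_steps(process: str) -> List[str]:
    """The baseline always runs; the model labeler runs after it when requested."""
    return ["baseline"] if process == "baseline" else ["baseline", "model"]


args = build_parser().parse_args(["-d", "../../../llm-data/labeled-data/", "-p", "model"])
print(planned_steps(args.process), args.export)  # → ['baseline', 'model'] False
```

Restricting --process with choices lets argparse reject typos such as "basline" before any labeling starts.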

For example, the following command runs the baseline labeler against the labeled data in our private repository. Because --export is omitted, it does not write the baseline-labeled output to CSV.

python main.py -d ../../../llm-data/labeled-data/ -p baseline
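Because the command is run from labeler/bin, the relative path climbs three levels: out of bin, out of labeler, and out of the repository itself, which is why the data directory must sit next to the repository. The sketch below checks where the example path resolves; the checkout location /home/user/LLM-anthropomorphization is a hypothetical example, and only the relative layout matters.

```python
import posixpath


def resolve_from(cwd: str, relative: str) -> str:
    """Normalize `relative` as if it were used from the directory `cwd`."""
    return posixpath.normpath(posixpath.join(cwd, relative))


# Hypothetical checkout location; only the relative structure matters.
repo_bin = "/home/user/LLM-anthropomorphization/labeler/bin"
print(resolve_from(repo_bin, "../../../llm-data/labeled-data/"))
# → /home/user/llm-data/labeled-data
```

If your data lives somewhere else, pass an absolute path to --data instead of adjusting the number of ../ segments.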

Architecture

Editable link to the architecture diagram: https://drive.google.com/file/d/1tKZ5nE8fUW5WPYzKtyIoeKjDIkxWm7Mk/view?usp=sharing

Repository contents:

images
labeler (labeling code; the entry point is labeler/bin/main.py)
scraper (data retrieval; see Data above)
.gitignore
BERT-model-colab.ipynb (the Google Colab notebook used for the BERT-based model)
LICENSE.md
README.md
environment.yml (conda environment specification)
pytest.ini
setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM Anthropomorphization

Background

Presentation Slides

Demo of final anthropomorphization detection model

Development

Set up & Installation

Formatting

Data

Running the Project

Architecture

About

Releases

Packages

Contributors 3

Languages

License

cngish98/LLM-anthropomorphization

Folders and files

Latest commit

History

Repository files navigation

LLM Anthropomorphization

Background

Presentation Slides

Demo of final anthropomorphization detection model

Development

Set up & Installation

Formatting

Data

Running the Project

Architecture

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages