
Final project for LING-472/ANLY-521 Computational Linguistics Advanced Python at Georgetown University (S'23) | Jessica Cusi, Caroline Gish, Cindy Li


cngish98/LLM-anthropomorphization


LLM Anthropomorphization

LING-472 / ANLY-521 Final Project | Georgetown University | Spring 2023

Developers: Jessica Cusi, Caroline Gish, Cindy Li

Background

This project studies the anthropomorphization of large language models and AI by classifying individual sentences as either anthropomorphic or not anthropomorphic.

The project includes two classifiers: a rudimentary rule-based baseline and a fine-tuned BERT-based model. In a full package, the modeling code would be incorporated into main.py. Because fine-tuning requires additional compute, the notebook we ran in Google Colab is included in the repo instead. The fine-tuned model is hosted on Hugging Face, where you can use it to check individual sentences.
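The baseline's actual rules are in the repository code and are not described in this README. As a rough illustration of what a rule-based baseline for this task can look like, the sketch below flags a sentence when an AI-related subject is followed closely by an agentive or mental-state verb. The word lists, the `window` parameter, and the function name `baseline_label` are all hypothetical; the output uses 1 for anthropomorphic, matching the LABEL_1 convention of the fine-tuned model.

```python
import re

# Hypothetical word lists for illustration; the project's real baseline rules may differ.
AI_SUBJECTS = {"ai", "model", "chatbot", "llm", "gpt", "bert", "system", "algorithm"}
AGENTIVE_VERBS = {
    "think", "thinks", "believe", "believes", "want", "wants",
    "feel", "feels", "know", "knows", "understand", "understands",
    "decide", "decides", "teach", "teaches",
}

# Lowercase alphabetic tokens only; digits and punctuation are ignored.
TOKEN_RE = re.compile(r"[a-z]+")


def baseline_label(sentence: str, window: int = 3) -> int:
    """Return 1 (anthropomorphic) if an agentive verb appears within `window`
    tokens after an AI-related subject, otherwise 0."""
    tokens = TOKEN_RE.findall(sentence.lower())
    for i, tok in enumerate(tokens):
        if tok in AI_SUBJECTS:
            following = tokens[i + 1 : i + 1 + window]
            if any(t in AGENTIVE_VERBS for t in following):
                return 1
    return 0
```

A fixed window keeps the rule cheap and transparent, at the cost of missing longer-distance constructions that a fine-tuned model can pick up.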

You can download the fine-tuned llm_anthro_detection model from Hugging Face.

Demo of final anthropomorphization detection model (hosted on Hugging Face)


LABEL_1 (anthropomorphic) was assigned to the input sentence "Where AI unexpectedly teaches itself a new skill".
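The hosted model returns raw labels such as LABEL_1. A minimal sketch for mapping them to readable names, assuming a binary model in which LABEL_0 means "not anthropomorphic" (only the LABEL_1 meaning is confirmed by the demo). `MODEL_ID` is a hypothetical placeholder for the model's Hugging Face repository ID, and `load_hf_classifier` and `readable_labels` are illustrative names. The classifier is passed in as a callable so the mapping can be tested without downloading the model; loading it uses the `transformers` `pipeline("text-classification", ...)` API, which requires that library and network access.

```python
from typing import Callable, Dict, List

# LABEL_1 = anthropomorphic (from the demo); LABEL_0 is an assumption for a binary model.
LABEL_NAMES: Dict[str, str] = {
    "LABEL_0": "not anthropomorphic",
    "LABEL_1": "anthropomorphic",
}

# Hypothetical placeholder: replace with the actual Hugging Face repository ID.
MODEL_ID = "<hf-user>/llm_anthro_detection"

Classifier = Callable[[List[str]], List[dict]]


def load_hf_classifier(model_id: str = MODEL_ID) -> Classifier:
    """Load a text-classification pipeline (needs `transformers` and network access)."""
    from transformers import pipeline  # imported lazily so the mapping works without it

    return pipeline("text-classification", model=model_id)


def readable_labels(sentences: List[str], classifier: Classifier) -> List[str]:
    """Run the classifier and map each raw LABEL_n prediction to a readable name."""
    return [LABEL_NAMES[pred["label"]] for pred in classifier(sentences)]
```

With `transformers` installed and the real repository ID filled in, `readable_labels(["Where AI unexpectedly teaches itself a new skill"], load_hf_classifier())` should return `["anthropomorphic"]` if the hosted model reproduces the demo result.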

Development

Set up & Installation

To create a conda environment named llm with the required packages and activate it:

conda env create --name llm --file environment.yml
conda activate llm

To record newly added packages in environment.yml:

conda env export --no-builds > environment.yml

To install the project:

pip install .

Formatting

Code is formatted with black.

Data

For information about the data we used and how to retrieve it yourself, see scraper. The example paths assume your data directory is at the same level as the LLM-anthropomorphization directory. If your data is stored elsewhere, adjust the path accordingly.
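A minimal sketch of that layout assumption: the directory `llm-data` (the name used in the example command under Running the Project) is resolved as a sibling of the repository root, and a missing directory produces a clear error. The helper names `default_data_dir` and `check_data_dir` are hypothetical and not part of the project.

```python
from pathlib import Path


def default_data_dir(repo_root: Path, data_dir_name: str = "llm-data") -> Path:
    """Return a data directory at the same level as the repository root."""
    # From labeler/bin, "../../../llm-data" points to this same location.
    return repo_root.resolve().parent / data_dir_name


def check_data_dir(path: Path) -> Path:
    """Raise a helpful error if the data directory does not exist."""
    if not path.is_dir():
        raise FileNotFoundError(
            f"Data directory not found: {path}. Adjust the path to where your data is stored."
        )
    return path
```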

Running the Project

cd labeler/bin
python main.py --data <data> --process <baseline,model> --export
  • [required] --data (-d): the location of your data files
  • [required] --process (-p): whether to run the baseline labeler or the model labeler. The model labeler still runs the baseline first.
  • [optional] --export: write the baseline output to CSV
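One way the interface described above could be parsed with `argparse`. This is a sketch, not the project's main.py: the `-d` and `-p` short flags come from the example command, while the `choices` list, the help strings, and the `steps_to_run` helper are assumptions based on the option descriptions.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Label sentences as anthropomorphic or not.")
    parser.add_argument("-d", "--data", required=True, help="location of the data files")
    parser.add_argument(
        "-p", "--process", required=True, choices=["baseline", "model"],
        help="labeler to run; 'model' runs the baseline first",
    )
    parser.add_argument("--export", action="store_true", help="write baseline output to CSV")
    return parser


def steps_to_run(process: str) -> List[str]:
    """The baseline always runs; the model labeler runs after it when requested."""
    return ["baseline"] if process == "baseline" else ["baseline", "model"]
```

Restricting `--process` with `choices` makes argparse reject typos such as `-p basline` with a usage message instead of failing later.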

For example, this command runs the baseline labeler against the labeled data in our private repository. Because --export is omitted, it does not write the baseline-labeled output to CSV.

python main.py -d ../../../llm-data/labeled-data/ -p baseline
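When `--export` is passed, the baseline output is written to CSV. The file layout is not documented here, so the sketch below assumes a simple two-column `sentence,label` format; the column names and the `export_labels` helper are hypothetical. Using `csv.writer` rather than joining strings keeps sentences that contain commas or quotes intact.

```python
import csv
from typing import Iterable, Tuple


def export_labels(rows: Iterable[Tuple[str, int]], out_path: str) -> int:
    """Write (sentence, label) pairs to a CSV file with a header row; return the row count."""
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["sentence", "label"])  # hypothetical column names
        for sentence, label in rows:
            writer.writerow([sentence, label])
            count += 1
    return count
```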

Architecture

(Figure: architecture diagram)

Editable version of the diagram: https://drive.google.com/file/d/1tKZ5nE8fUW5WPYzKtyIoeKjDIkxWm7Mk/view?usp=sharing
