
Exploring Political Bias in LLMs Through Debate: A Multi-Agent Framework

This project investigates how fine-tuned large language models (LLMs) with distinct left-leaning and right-leaning political biases perform in structured debates on politically sensitive topics. Using models such as Llama 3.2 and Phi 1.5, we analyze their adherence to predefined biases, their reasoning capabilities, and their persuasiveness within a multi-agent framework, scored with automated evaluation metrics.

A comprehensive overview of the methodologies, results, and models used in this project, as well as the final paper, is available here. This work was conducted as part of the final project for the course CSCI 5541: Natural Language Processing. We extend our gratitude to Professor Dongyeop Kang and the teaching assistants for their invaluable guidance and support throughout the project.

Structure

Finetuning Data

In this project, we required left-leaning, right-leaning, and neutral LLM agents. To achieve this, we generated separate datasets to fine-tune each type of model. We used the BABE dataset, which consists of sentences from left- and right-leaning news organizations with varying levels of bias. The sentences were partitioned into left-leaning and right-leaning subsets, and each model was fine-tuned on its respective subset. The resulting dataset partitions are available in the src/data directory.
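As a rough illustration of the partitioning step, the sketch below splits a BABE-style table on an outlet-leaning column. The column names, file names, and the presence of a "center" label are assumptions for illustration, not taken from this repository's code; consult the files in src/data for the actual format.

```python
# Minimal sketch: partition a BABE-style CSV into leaning-specific
# fine-tuning sets. "babe.csv", the "type" column, and the output
# file names are hypothetical placeholders.
import pandas as pd

babe = pd.read_csv("babe.csv")  # hypothetical local export of BABE

# Assumes a column recording each source outlet's political leaning.
for leaning in ("left", "right", "center"):
    subset = babe[babe["type"] == leaning]["text"]
    subset.to_csv(f"src/data/{leaning}.csv", index=False)
```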

LLMs

For this project, we fine-tuned left-, right-, and center-leaning versions of two open-source LLMs: Llama 3.2 and Phi 1.5. These models, developed by Meta and Microsoft respectively, were selected for their similarity to other state-of-the-art models, providing a representative basis for our experiments.
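A minimal causal-LM fine-tuning sketch for one leaning-specific model is shown below. The model ID, data path, and hyperparameters are illustrative assumptions; the exact fine-tuning recipe used in the paper may differ.

```python
# Minimal sketch: fine-tune one base model on one leaning-specific
# partition. "src/data/left.csv" is the hypothetical partition from
# the previous sketch; hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "microsoft/phi-1_5"  # or a Llama 3.2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

data = load_dataset("csv", data_files="src/data/left.csv")["train"]
data = data.map(lambda rows: tokenizer(rows["text"], truncation=True,
                                       max_length=256), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="phi-1_5-left",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=data,
    # mlm=False gives standard next-token (causal LM) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```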

Debate Arena

We designed a custom debate-arena object tailored for this task. The framework is compatible with any Hugging Face text-generation model. The code for the debate arena, along with example usage, is available in src/debate_arena.ipynb. While the framework was used for political-style debates in this project, it is adaptable to any topic. At instantiation, users can configure the response length, the number of debate turns, and the temperature of model responses.
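To give a sense of how such a loop fits together, here is a minimal sketch of a two-agent debate arena built on Hugging Face text-generation pipelines. The class name, prompt format, and checkpoint names are assumptions for illustration; the actual implementation lives in src/debate_arena.ipynb and may differ.

```python
# Minimal sketch of a two-agent debate loop over Hugging Face
# text-generation pipelines. Not the repository's actual class.
from transformers import pipeline

class DebateArena:
    def __init__(self, model_a, model_b, topic, turns=4,
                 max_new_tokens=128, temperature=0.7):
        # Any Hugging Face text-generation checkpoint or local path works.
        self.agents = [pipeline("text-generation", model=model_a),
                       pipeline("text-generation", model=model_b)]
        self.topic = topic
        self.turns = turns
        self.gen_kwargs = dict(max_new_tokens=max_new_tokens,
                               temperature=temperature, do_sample=True)

    def run(self):
        transcript = f"Debate topic: {self.topic}\n"
        for turn in range(self.turns):
            agent = self.agents[turn % 2]
            # return_full_text=False keeps only the new completion.
            reply = agent(transcript, return_full_text=False,
                          **self.gen_kwargs)[0]["generated_text"]
            transcript += f"\nSpeaker {turn % 2 + 1}: {reply.strip()}\n"
        return transcript

# Example usage with hypothetical fine-tuned checkpoints:
arena = DebateArena("phi-1_5-left", "phi-1_5-right",
                    topic="Should the minimum wage be raised?",
                    turns=6, max_new_tokens=100, temperature=0.8)
print(arena.run())
```

Alternating speakers over a shared transcript keeps each turn conditioned on the full debate so far, which is what lets the automated metrics score bias adherence and persuasiveness turn by turn.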
