CPSC-440-550-speaker-identification

CPSC 440 project on speaker identification in novels using large language models, by Richard Han, Charles Yan, and Mu-Chen Liu.

Abstract

Dialogue attribution is the task of associating each piece of dialogue with its correct speaker in a conversation or narrative text. It is an important task in fields such as natural language processing (NLP), literary analysis, and dialogue systems. Automating it can be challenging, especially in texts where multiple characters interact closely or where there are few clues to the speaker's identity. In recent years, large language models (LLMs) have gained significant attention for their ability to handle a wide range of natural language tasks with high proficiency. In this paper, we explore the capabilities of zero-shot LLMs in dialogue attribution through experimental analysis. The Mistral 7B Instruct model achieves the highest overall accuracy, while the Llama 2 Chat model struggles to follow the instructions in the prompt. We observe a positive correlation between dialogue attribution performance and the amount of context provided, particularly when the speaker is not explicitly stated in the dialogue. Additionally, the Mistral 7B Instruct model shows a performance plateau when identifying speakers whose identities are directly mentioned in the dialogue.
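The zero-shot setup with a variable amount of context can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the prompt wording, the function names `build_prompt` and `accuracy`, and the reading of "context" as a number of neighbouring paragraphs on each side of the quote are hypothetical, and may differ from what the notebook and scripts in this repository actually do.

```python
from typing import Sequence


def build_prompt(paragraphs: Sequence[str], quote_index: int,
                 context_size: int, candidates: Sequence[str]) -> str:
    """Assemble a zero-shot prompt for one quote.

    Hypothetical format: the quote is shown together with up to
    `context_size` paragraphs before and after it.
    """
    # Clamp the window so it never runs past the start or end of the text.
    start = max(0, quote_index - context_size)
    end = min(len(paragraphs), quote_index + context_size + 1)
    passage = "\n".join(paragraphs[start:end])
    return (
        "Passage:\n" + passage + "\n\n"
        "Quote: " + paragraphs[quote_index] + "\n"
        "Candidate speakers: " + ", ".join(candidates) + "\n"
        "Answer with the name of the speaker only."
    )


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that match the gold speaker.

    Matching ignores case and surrounding whitespace (an assumption).
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    matches = sum(p.strip().lower() == g.strip().lower()
                  for p, g in zip(predictions, gold))
    return matches / len(gold)
```

Under these assumptions, calling `build_prompt` with `context_size=0` shows the model only the quote itself, while larger values widen the window around it. The returned string would then be passed to an instruction-tuned model, and the model's replies scored with `accuracy`.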

Emma
PrideAndPrejudice
context0
context1
context16
context2
context4
context8
Emma_quotes.csv
Emmacontext1.csv
Emmacontext16.csv
Emmacontext2.csv
Emmacontext4.csv
Emmacontext8.csv
LLM_speaker_identification.ipynb
PrideAndPrejudice_context1.csv
PrideAndPrejudice_context16.csv
PrideAndPrejudice_context2.csv
PrideAndPrejudice_context4.csv
PrideAndPrejudice_context8.csv
PrideAndPrejudice_quotes.csv
README.md
analysis.py
cpsc440.pdf
dataset.py

rrhan0/CPSC-440-540-speaker-identification
