Repository for all content related to Karen Mazidi's Natural Language Processing course.

zaiquiriw/nlp-portfolio

Howdy!

The Sapir–Whorf hypothesis, also known as the linguistic relativity hypothesis, is the proposal that the particular language one speaks influences the way one thinks about reality. As Large Language Models become the way we interact with software, it will become more and more important to make sure that these models accurately reflect the purpose they are given. If these models are not carefully tuned, we may face an erasure of culture or persistent bias reflected in major pieces of software. The greater the scale, the greater the power usage and the reliance on large, monopolistic data centers as well. I am into NLP because I want to work on these issues by creating architectures for smaller, more interpretable models. I am extremely interested in improving the way we interface with technology, and I want to work on the problems with LLMs while I can.

Project 0:

I've written a summary of NLP, and what that means to me!

Project 1: Basic Python

Just a simple script that shows off some basic text processing in Python. If you would like to learn more, check out my summary.

Project 2: POS Tagging and NLTK

Here we have an interactive guessing game: the words are taken from a text, tokenized, and preprocessed, and the 50 most common nouns become the options for the game.
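
As a rough illustration of the noun-selection step, here is a minimal sketch using only the standard library. It assumes the text has already been tokenized and POS-tagged (for example by a tagger that emits Penn Treebank tags such as `NN` and `NNS`); the `top_nouns` helper and the hand-tagged `sample` are hypothetical and are not taken from the project's script.

```python
from collections import Counter

def top_nouns(tagged_tokens, n=50):
    """Return the n most frequent nouns from a list of (word, tag) pairs."""
    nouns = [
        word.lower()
        for word, tag in tagged_tokens
        # Penn Treebank noun tags all start with "NN"; skip punctuation and numbers.
        if tag.startswith("NN") and word.isalpha()
    ]
    return [word for word, _ in Counter(nouns).most_common(n)]

# Hand-tagged sample standing in for real tagger output (an assumption).
sample = [
    ("The", "DT"), ("heart", "NN"), ("pumps", "VBZ"), ("blood", "NN"),
    ("through", "IN"), ("the", "DT"), ("arteries", "NNS"), (".", "."),
    ("Blood", "NN"), ("returns", "VBZ"), ("to", "TO"), ("the", "DT"),
    ("heart", "NN"), (".", "."), ("The", "DT"), ("heart", "NN"),
    ("rests", "VBZ"), (".", "."),
]

print(top_nouns(sample, n=2))
```

In the game itself, `n` would be 50 and the resulting list would be the pool that the secret word is drawn from.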

Project 3: WordNet

This notebook plays with some of the functionality of WordNet, a database that links words based on semantic relationships. Read it here and download it here.

Project 4: N-Grams

Using NLTK, I wrote Python scripts that build n-grams from sample texts in several languages here, and then use them in a simple language model that identifies which of the analyzed languages (English, French, or Italian) a string of text is most likely written in. You can run the code here. If you are curious about n-grams, I've written an explanatory summary of what they are and their applications.
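
To give a feel for the idea, here is a minimal sketch of n-gram language identification in plain Python. It is not the project's code: it assumes character bigrams with add-one (Laplace) smoothing, and the tiny training sentences in `SAMPLES` are made up for illustration. A real model would be trained on much larger texts.

```python
import math
from collections import Counter

def char_ngrams(text, n):
    """Return all overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def train(text):
    """Count character unigrams and bigrams for one language sample."""
    return Counter(char_ngrams(text, 1)), Counter(char_ngrams(text, 2))

def log_prob(text, model, vocab_size):
    """Sum of log P(second char | first char) with add-one smoothing."""
    unigrams, bigrams = model
    return sum(
        math.log((bigrams[bg] + 1) / (unigrams[bg[0]] + vocab_size))
        for bg in char_ngrams(text, 2)
    )

def identify(text, models, vocab_size):
    """Pick the language whose model gives the text the highest score."""
    return max(models, key=lambda lang: log_prob(text, models[lang], vocab_size))

# Toy training data (an assumption, far too small for real use).
SAMPLES = {
    "English": "the quick brown fox jumps over the lazy dog and the cat sleeps on the mat",
    "French": "le renard brun saute par dessus le chien paresseux et le chat dort sur le tapis",
    "Italian": "la volpe marrone salta sopra il cane pigro e il gatto dorme sul tappeto",
}
MODELS = {lang: train(text) for lang, text in SAMPLES.items()}
# Smoothing vocabulary: every distinct character seen in any sample.
VOCAB_SIZE = len(set("".join(SAMPLES.values())))

print(identify("the dog sleeps", MODELS, VOCAB_SIZE))
print(identify("le chat dort", MODELS, VOCAB_SIZE))
```

Smoothing matters here because an unseen bigram would otherwise have probability zero and rule a language out entirely, no matter how well the rest of the string fits.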

Sentence Parsing

I've worked a little bit with sentence parses, along with drawing out some parses for my own understanding here. I must admit English wasn't my favorite subject, so I'm going to let PSGs, Dependency Parse Graphs, and SRL parses do the work for me.

Netscraping!

Read this summary to learn about scraping the web. You can access the scripts:

Text Classification!

It's easy to underestimate how far you can get by just counting the frequency of each word in a data set when you want to classify something about that data. The classic example is deciding whether an email is spam or not. But in a small experiment, I try to classify whether a line of text... sounds like Rick from Rick and Morty.
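
Here is a minimal sketch of the word-frequency idea, using the classic spam example. It assumes a multinomial Naive Bayes classifier with add-one smoothing, which is one common way to turn word counts into a classifier and not necessarily the method used in the experiment. The `EXAMPLES` data and the function names are made up for illustration.

```python
import math
from collections import Counter

def train_naive_bayes(examples):
    """Build per-label document counts, per-label word counts, and the vocabulary."""
    label_counts = Counter()
    word_counts = {}
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return label_counts, word_counts, vocab

def classify(text, model):
    """Return the label with the highest log prior plus smoothed log likelihood."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, doc_count in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(doc_count / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data (an assumption, not the Rick and Morty dataset).
EXAMPLES = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting notes for the project", "ham"),
    ("lunch with the project team", "ham"),
]
MODEL = train_naive_bayes(EXAMPLES)

print(classify("claim your free prize", MODEL))
print(classify("project meeting at lunch", MODEL))
```

The same structure would apply to the Rick experiment by swapping the labels for "Rick" and "not Rick" and feeding in lines of dialogue.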

Is Attention Explanation?

Maybe?

A Chatbot

This is a project I would love to revisit. If you would like to view my initial attempt at a chatbot, I have a report here. If you want to look at any scripts, they are in project folder 7 of the repo!

Text Classification 2

This is just me attempting to use Keras on the Rick and Morty dataset. Turns out deep learning can't improve a bad dataset. Check it out if you want!

My Resume

While I'm still working on my portfolio, I'll be linking my resume here!
