U.S. Officials Travel Analysis

A textual analysis of the travel patterns of the U.S. President and Secretary of State using TF-IDF, NMF, PCA, and t-SNE. I found this dataset online while preparing a final project for my data science class. The data looked interesting in their own right, and I wanted to practice some of my data analysis skills :).

I scraped the raw dataset from the U.S. Historian website using BeautifulSoup, then cleaned it; the cleaned version is the tidy dataset. I primarily used scikit-learn for the analysis, and seaborn and plotly for the visualizations.
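For readers unfamiliar with TF-IDF, the weighting it computes can be sketched in plain Python. This is an illustration only, not the code in `src/models/fit_models.py`. It follows the smoothed IDF formula and L2 normalization that scikit-learn's `TfidfVectorizer` uses by default, but tokenizes with a simple whitespace split, so its output will not match scikit-learn exactly. The function name `tfidf` and the sample remarks are made up for the example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    # Naive tokenization (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)  # raw term counts
        weights = {t: c * idf[t] for t, c in counts.items()}
        # Scale each document vector to unit Euclidean length.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({t: w / norm for t, w in weights.items()})
    return rows


# Hypothetical remarks, not taken from the dataset.
remarks = [
    "attended nato summit meeting",
    "state visit met with president",
    "attended economic summit",
]

rows = tfidf(remarks)
for row in rows:
    # Show each document's terms from highest to lowest weight.
    print(sorted(row, key=row.get, reverse=True))
```

Terms that appear in only one remark (such as "nato") receive a higher weight than terms shared across remarks (such as "attended"), which is what makes TF-IDF useful for surfacing the distinctive vocabulary of each trip description.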

A project report with interpretations of the analysis can be found here.

Project Organization

.
├── LICENSE
├── README.md
├── data
│   ├── README.md                 # Data dictionaries
│   ├── travel_processed.csv      # A tidy dataset with a processed textual component
│   ├── travel_raw.csv            # A raw scraped dataset
│   └── travel_tidy.csv           # A tidy dataset with an unprocessed textual component
├── external
│   ├── LICENSE                   # License for ctfidf.py
│   ├── README.md                 # Credit for ctfidf.py
│   └── ctfidf.py                 # A class for calculating class-based TF-IDF
├── models
│   ├── kmeans.joblib             # K-Means saved model
│   ├── nmf.joblib                # NMF saved model
│   ├── pca.joblib                # PCA saved model
│   ├── tfidf_features.joblib     # TF-IDF matrix
│   ├── tfidf_vectorizer.joblib   # TF-IDF scikit object
│   └── tsne.joblib               # t-SNE saved model
├── notebooks
│   ├── visualize_clusters.ipynb  # Visualizing PCA and t-SNE embeddings
│   ├── visualize_exp.ipynb       # Exploratory analyses and visualizations
│   └── visualize_tfidf.ipynb     # Visualizing TF-IDF results
└── src
    ├── data
    │   ├── preprocess.py         # Script for cleaning the textual portion of the data
    │   ├── scrape.py             # Script for scraping the raw data
    │   └── tidy.py               # Script for cleaning up the dates and separating locales
    └── models
        └── fit_models.py         # Script for fitting the models

evdkv/pres-travel-analysis
