This repository contains data science projects from my Prodigy Infotech internship, including data visualization, cleaning and EDA on the Titanic dataset, a decision tree classifier for the Bank Marketing dataset, and Twitter sentiment analysis.

KeerthanaPalanikumar/Prodigy-Infotech

Internship ReadMe: Prodigy Infotech Data Science Tasks

Overview:-

This repository contains the tasks I completed during my internship at Prodigy Infotech. Each task covers a different aspect of data science: data visualization, data cleaning, exploratory data analysis, machine learning, and sentiment analysis. The tasks draw on four public datasets: World Bank population data, the Kaggle Titanic dataset, the UCI Bank Marketing dataset, and a Kaggle Twitter sentiment dataset.

Task-01: Data Visualization

Objective:-

Create a bar chart or histogram to visualize the distribution of a categorical or continuous variable, such as the distribution of ages or genders in a population.

Dataset:-

World Bank Population Data: https://data.worldbank.org/indicator/SP.POP.TOTL

Description:-

  • Loaded the population data from the World Bank.
  • Processed the data to extract the relevant categorical or continuous variable.
  • Created a bar chart (for a categorical variable) or a histogram (for a continuous variable) to visualize the distribution.
  • Used Python libraries such as pandas for data manipulation and matplotlib/seaborn for visualization.
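
A minimal sketch of the steps above, not the exact script in this repository. It assumes the data is in a wide layout with one column per year. The rows in `sample` are made-up placeholders rather than real World Bank figures, and the function name `plot_population_histogram` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder rows in a wide layout (one column per year).
# These are NOT real World Bank values.
sample = pd.DataFrame({
    "Country Name": ["A", "B", "C", "D", "E", "F"],
    "2020": [1.2e6, 5.4e6, 8.9e6, 2.1e7, 4.7e7, None],
})

def plot_population_histogram(df, year, bins=5):
    """Plot a histogram of one year's population column (hypothetical helper)."""
    values = df[year].dropna()  # skip rows with no value for that year
    fig, ax = plt.subplots(figsize=(6, 4))
    counts, edges, _ = ax.hist(values, bins=bins, edgecolor="black")
    ax.set_xlabel(f"Total population ({year})")
    ax.set_ylabel("Number of countries")
    ax.set_title("Distribution of country populations")
    return ax, counts

ax, counts = plot_population_histogram(sample, "2020")
```

For a categorical variable, the same idea applies with `value_counts()` and a bar chart instead of `hist`.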

Task-02: Data Cleaning and Exploratory Data Analysis (EDA)

Objective:-

Perform data cleaning and exploratory data analysis on a dataset to explore relationships between variables and identify patterns and trends.

Dataset:-

Titanic Dataset from Kaggle: https://www.kaggle.com/c/titanic/data

Description:-

  • Loaded the Titanic dataset.
  • Cleaned the data by handling missing values, encoding categorical variables, and normalizing numerical variables.
  • Conducted EDA to explore the relationships between variables and identify patterns and trends.
  • Visualized the data using various plots (e.g., scatter plots, box plots, heatmaps).
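
The cleaning steps listed above could look roughly like the sketch below. The column names follow the Kaggle Titanic file, but the rows in `raw` are made-up placeholders. The specific choices here (median imputation for `Age`, mode imputation for `Embarked`, min-max scaling) are assumptions for illustration and may differ from what the notebook actually does.

```python
import pandas as pd

# Made-up placeholder rows using Titanic-style column names.
raw = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 3, 2, 1, 3],
    "Sex":      ["male", "female", "female", "male", "female", "male"],
    "Age":      [22.0, 38.0, None, 35.0, 54.0, None],
    "Fare":     [7.25, 71.28, 7.92, 13.0, 51.86, 8.05],
    "Embarked": ["S", "C", "S", None, "S", "Q"],
})

def clean_titanic(df):
    """Impute, encode, and scale a Titanic-style frame (hypothetical helper)."""
    out = df.copy()
    # Handle missing values: median for numeric, most frequent for categorical.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Encode categorical variables.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out = pd.get_dummies(out, columns=["Embarked"], dtype=int)
    # Min-max normalize the numeric columns to the range [0, 1].
    for col in ["Age", "Fare"]:
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo)
    return out

cleaned = clean_titanic(raw)
# Pairwise correlations, the usual input for an EDA heatmap.
corr = cleaned.corr()
```

Passing `corr` to `seaborn.heatmap` is one way to get the heatmap mentioned in the list above.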

Task-03: Decision Tree Classifier

Objective:-

Build a decision tree classifier to predict whether a customer will purchase a product or service based on their demographic and behavioral data.

Dataset:-

Bank Marketing Dataset from UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Bank+Marketing

Description:-

  • Loaded the Bank Marketing dataset.
  • Preprocessed the data by encoding categorical variables and splitting the data into training and test sets.
  • Built a decision tree classifier using scikit-learn.
  • Evaluated the classifier's performance using metrics such as accuracy, precision, recall, and F1-score.
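
The workflow above can be sketched with scikit-learn as follows. This is an illustration on synthetic data, not the repository's code: the frame `data` is randomly generated, its columns are only a small subset loosely modeled on the Bank Marketing fields, and the rule that produces `y` is invented so the tree has a pattern to learn. The hyperparameters (`max_depth=4`, `test_size=0.25`) are also assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data; NOT the real Bank Marketing dataset.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "marital": rng.choice(["single", "married", "divorced"], size=n),
    "housing": rng.choice(["yes", "no"], size=n),
    "duration": rng.integers(30, 900, size=n),
})
# Invented rule so there is a learnable signal: long calls -> "yes".
data["y"] = np.where(data["duration"] > 400, "yes", "no")

# Encode categorical features and the yes/no target.
X = pd.get_dummies(data.drop(columns="y"), columns=["marital", "housing"], dtype=int)
y = (data["y"] == "yes").astype(int)

# Hold out a stratified test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

clf = DecisionTreeClassifier(max_depth=4, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary", zero_division=0
)
```

On the real dataset the classes are likely to be imbalanced, so precision, recall, and F1-score are usually more informative than accuracy alone.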

Task-04: Sentiment Analysis

Objective:-

Analyze and visualize sentiment patterns in social media data to understand public opinion and attitudes towards specific topics or brands.

Dataset:-

Twitter Entity Sentiment Analysis Dataset from Kaggle: https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis

Description:-

  • Loaded the Twitter sentiment analysis dataset.
  • Preprocessed the data by cleaning text, tokenizing, and vectorizing.
  • Analyzed sentiment patterns using natural language processing techniques.
  • Visualized the sentiment distribution and identified key trends.

Requirements

To run the scripts and reproduce the results, the following Python libraries are required:

  • pandas
  • NumPy
  • Matplotlib
  • seaborn
  • scikit-learn
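
A small, optional check for whether these libraries are available in the current environment. The helper `missing_packages` and the `REQUIRED` mapping are hypothetical and not part of this repository. The one detail worth noting is that scikit-learn is installed under the name `scikit-learn` but imported as `sklearn`.

```python
from importlib.util import find_spec

# Install name -> import name. Only scikit-learn differs between the two.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}

def missing_packages(required=REQUIRED):
    """Return the install names of packages that cannot be imported."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]
```

Calling `missing_packages()` returns an empty list when everything is installed; any names it returns can be passed to `pip install`.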

Conclusion

This repository showcases my data science skills through various tasks involving data visualization, cleaning, exploratory analysis, machine learning, and sentiment analysis. Each task demonstrates my ability to work with different datasets and apply appropriate techniques to extract meaningful insights.
