This repository contains the tasks completed during my internship at Prodigy Infotech. Each task demonstrates a different aspect of data science, including data visualization, data cleaning, exploratory data analysis, machine learning, and sentiment analysis. The tasks use various datasets to showcase different techniques and methods commonly used in data science projects.
Create a bar chart or histogram to visualize the distribution of a categorical or continuous variable, such as the distribution of ages or genders in a population.
World Bank Population Data: https://data.worldbank.org/indicator/SP.POP.TOTL
- Loaded the population data from the World Bank.
- Processed the data to extract the relevant categorical or continuous variable.
- Created a bar chart or histogram to visualize the distribution.
- Used Python libraries such as pandas for data manipulation and matplotlib/seaborn for visualization.
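The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact script from the task: it assumes the World Bank CSV export has one column per year (for example `"2020"`), and the function name `plot_population_histogram` is hypothetical. Populations span several orders of magnitude, so the sketch plots log10 values to keep the histogram readable.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_population_histogram(df: pd.DataFrame, year: str, bins: int = 20):
    """Plot a histogram of log10(total population) for one year column.

    Assumes `df` is in wide format with one column per year (an assumption
    about the World Bank CSV layout, not something stated in this README).
    """
    # Coerce to numeric and drop rows that have no usable value for that year.
    population = pd.to_numeric(df[year], errors="coerce").dropna()
    population = population[population > 0]

    fig, ax = plt.subplots(figsize=(8, 5))
    ax.hist(np.log10(population), bins=bins, edgecolor="black")
    ax.set_xlabel(f"log10(total population), {year}")
    ax.set_ylabel("Number of countries/regions")
    ax.set_title(f"Distribution of total population in {year}")
    return fig, ax
```

A possible call, assuming the downloaded file is saved as `population.csv` (a placeholder name) and that the export begins with a few metadata rows that need skipping: `plot_population_histogram(pd.read_csv("population.csv", skiprows=4), "2020")`, followed by `plt.show()`.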
Perform data cleaning and exploratory data analysis on a dataset to explore relationships between variables and identify patterns and trends.
Titanic Dataset from Kaggle: https://www.kaggle.com/c/titanic/data
- Loaded the Titanic dataset.
- Cleaned the data by handling missing values, encoding categorical variables, and normalizing numerical variables.
- Conducted EDA to explore the relationships between variables and identify patterns and trends.
- Visualized the data using various plots (e.g., scatter plots, box plots, heatmaps).
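A minimal sketch of the cleaning steps listed above. It assumes the standard Kaggle column names (`Survived`, `Pclass`, `Sex`, `Age`, `Fare`, `Embarked`), and the specific choices here (median imputation, one-hot encoding, min-max scaling) are illustrative and not necessarily the ones used in the task. The function names are hypothetical.

```python
import pandas as pd


def clean_titanic(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the Titanic data (illustrative choices only)."""
    out = df.copy()

    # Handle missing values: median for Age, most frequent value for Embarked.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])

    # Encode categorical variables: binary map for Sex, one-hot for Embarked.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out = pd.get_dummies(out, columns=["Embarked"], prefix="Embarked")

    # Min-max normalize numerical variables to the [0, 1] range.
    # Note: a constant column would divide by zero here.
    for col in ["Age", "Fare"]:
        col_min, col_max = out[col].min(), out[col].max()
        out[col] = (out[col] - col_min) / (col_max - col_min)
    return out


def correlation_matrix(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Pairwise Pearson correlations for the chosen numeric columns."""
    return df[cols].corr()
```

The correlation matrix can then be passed to `seaborn.heatmap` for the heatmap mentioned above, for example `sns.heatmap(correlation_matrix(cleaned, ["Survived", "Pclass", "Sex", "Age", "Fare"]), annot=True)`.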
Build a decision tree classifier to predict whether a customer will purchase a product or service based on their demographic and behavioral data.
Bank Marketing Dataset from UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Bank+Marketing
- Loaded the Bank Marketing dataset.
- Preprocessed the data by encoding categorical variables and splitting the data into training and test sets.
- Built a decision tree classifier using scikit-learn.
- Evaluated the classifier's performance using metrics such as accuracy, precision, recall, and F1-score.
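The workflow above might look roughly like this with scikit-learn. It is a sketch under assumptions: the target column is taken to be `y` with `"yes"`/`"no"` values, and the hyperparameters (`max_depth=5`, a 25% test split) are placeholders, not the settings used in the task. The function name `train_and_evaluate` is hypothetical.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_and_evaluate(df: pd.DataFrame, target: str = "y",
                       test_size: float = 0.25, random_state: int = 42):
    """Train a decision tree and return it with its test-set metrics."""
    # One-hot encode categorical features; numeric columns pass through unchanged.
    X = pd.get_dummies(df.drop(columns=[target]), dtype=int)
    # Assumed label encoding: "yes" -> 1, anything else -> 0.
    y = (df[target] == "yes").astype(int)

    # Stratify so both splits keep the original class balance.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )

    model = DecisionTreeClassifier(max_depth=5, random_state=random_state)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }
    return model, metrics
```

If the CSV from the UCI archive is semicolon-delimited (an assumption worth checking against the downloaded file), it would be loaded with `pd.read_csv(path, sep=";")` before calling `train_and_evaluate`.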
Analyze and visualize sentiment patterns in social media data to understand public opinion and attitudes towards specific topics or brands.
Twitter Entity Sentiment Analysis Dataset from Kaggle: https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis
- Loaded the Twitter sentiment analysis dataset.
- Preprocessed the data by cleaning text, tokenizing, and vectorizing.
- Analyzed sentiment patterns using natural language processing techniques.
- Visualized the sentiment distribution and identified key trends.
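The preprocessing and analysis steps above can be sketched as follows. This is an illustrative outline only: the cleaning rules, the use of TF-IDF, and the column name `sentiment` are assumptions, and the helper names are hypothetical. Tokenization is left to `TfidfVectorizer`'s default tokenizer.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer


def clean_text(text) -> str:
    """Lowercase a tweet and strip URLs, mentions, hashtags, and non-letters."""
    text = str(text).lower()
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)       # @mentions and #hashtags
    text = re.sub(r"[^a-z\s]", " ", text)      # digits, punctuation, emoji
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


def vectorize(texts, max_features: int = 5000):
    """Tokenize and vectorize cleaned texts into a sparse TF-IDF matrix."""
    vectorizer = TfidfVectorizer(max_features=max_features, stop_words="english")
    matrix = vectorizer.fit_transform(texts)
    return vectorizer, matrix


def sentiment_distribution(df: pd.DataFrame, label_col: str = "sentiment") -> pd.Series:
    """Share of tweets per sentiment label, largest first."""
    return df[label_col].value_counts(normalize=True)
```

The resulting distribution can be plotted directly, for example with `sentiment_distribution(df).plot(kind="bar")`.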
To run the scripts and reproduce the results, the following Python libraries are required:
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
This repository showcases my data science skills across data visualization, data cleaning, exploratory analysis, machine learning, and sentiment analysis. Each task applies techniques suited to its dataset to extract meaningful insights.