The Art of Analyzing Big Data - The Data Scientist’s Toolbox.
- Get a free API key at https://www.kaggle.com/ (My account -> Create New API Token). See the tutorial at https://github.com/Kaggle/kaggle-api#api-credentials.
- Add your API token by changing this line:
api_token = {"username":"","key":""}
- Run each ipynb file using Colab (https://colab.research.google.com/) or Jupyter.
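The token step above can be sketched as follows. This is a minimal sketch, not the notebooks' own code: the helper name `write_kaggle_token` and its `config_dir` parameter are hypothetical, and it assumes the Kaggle client reads its credentials from `~/.kaggle/kaggle.json`, as described in the linked tutorial.

```python
import json
import os
from pathlib import Path

# Fill in the values from your own Kaggle account.
api_token = {"username": "", "key": ""}


def write_kaggle_token(token, config_dir=None):
    """Write the token to <config_dir>/kaggle.json (hypothetical helper).

    config_dir defaults to ~/.kaggle, where the Kaggle client looks for it.
    """
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    token_path = config_dir / "kaggle.json"
    token_path.write_text(json.dumps(token))
    # Restrict the file to its owner; the Kaggle client warns otherwise.
    os.chmod(token_path, 0o600)
    return token_path
```

Calling `write_kaggle_token(api_token)` once at the top of a notebook should be enough before any dataset download is attempted.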
Task: Select a reviews dataset and create a sentiment classifier which utilizes word embeddings. Evaluate this classifier. Try to improve your classifier by adding additional features.
Dataset: Coursera's Course Reviews Dataset
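One common way to use word embeddings for this task is to average the vectors of the words in a review and feed the result to a classifier. The sketch below shows only that feature step. The two-dimensional embedding table is a hand-written toy, not a trained model, and the function names are hypothetical; in practice the table would come from a pretrained model such as Word2Vec. The token count is appended as an example of an additional feature.

```python
PUNCT = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCT).lower() for word in text.split())
    return [tok for tok in tokens if tok]


def average_embedding(tokens, embeddings, dim):
    """Average the vectors of the known tokens; zeros if none are known."""
    vectors = [embeddings[tok] for tok in tokens if tok in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def review_features(text, embeddings, dim):
    """Averaged embedding plus one extra feature: the token count."""
    tokens = tokenize(text)
    return average_embedding(tokens, embeddings, dim) + [float(len(tokens))]


# Toy embedding table, for illustration only.
toy_embeddings = {
    "great": [1.0, 0.0],
    "boring": [-1.0, 0.0],
    "course": [0.0, 1.0],
}

features = review_features("Great course!", toy_embeddings, dim=2)
```

The resulting feature vectors, together with the review ratings as labels, can then be passed to any standard classifier and evaluated on a held-out split.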
Task: Select two books and construct networks of people and locations, i.e., each link is between a person and a location. Visualize the networks using Cytoscape or Gephi.
Dataset: Dickens
Book Name: Little Dorrit
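A possible shape for the network-building step is sketched below. It assumes the person and location mentions have already been extracted sentence by sentence (for example with a named-entity recognizer such as SpaCy) and links a person to a location whenever both occur in the same sentence. The function names and the sample input are illustrative and hand-written, not extracted from the book. The CSV uses `Source`, `Target`, and `Weight` column names, which Gephi's spreadsheet import is assumed to recognize; Cytoscape lets you map the columns at import time.

```python
import csv
import io
from collections import Counter


def build_edges(sentences):
    """Count person-location co-occurrences.

    sentences: iterable of (persons, locations) pairs, one per sentence.
    Returns a Counter keyed by (person, location).
    """
    weights = Counter()
    for persons, locations in sentences:
        for person in set(persons):
            for location in set(locations):
                weights[(person, location)] += 1
    return weights


def edges_to_csv(weights):
    """Serialize the weighted edges as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (person, location), weight in sorted(weights.items()):
        writer.writerow([person, location, weight])
    return buffer.getvalue()


# Hand-written sample input, for illustration only.
sample = [
    (["Amy Dorrit"], ["Marshalsea"]),
    (["Amy Dorrit", "Arthur Clennam"], ["Marshalsea"]),
    ([], ["London"]),
]
edge_csv = edges_to_csv(build_edges(sample))
```

Sentence-level co-occurrence is only one choice of link definition; paragraph or chapter windows would produce denser networks.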
Task: Select a country statistic from the World Development Indicators dataset (please note that the dataset contains several files, such as Indicators.csv). Then create a choropleth map displaying the selected statistic, and create a short animation that shows how the statistic changed over time.
Dataset: World Development Indicators
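Before drawing anything, the indicator rows need to be grouped into one "frame" per year, so that each frame can be rendered as a choropleth and the frames played as an animation. The sketch below does that grouping with the standard library. It assumes Indicators.csv has `CountryCode`, `IndicatorCode`, `Year`, and `Value` columns; check the header of your copy. The function name is hypothetical and the sample values are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def frames_by_year(csv_file, indicator_code):
    """Return {year: {country_code: value}} for one indicator, sorted by year."""
    frames = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        if row["IndicatorCode"] == indicator_code:
            frames[int(row["Year"])][row["CountryCode"]] = float(row["Value"])
    return dict(sorted(frames.items()))


# Invented sample rows, for illustration only.
sample_csv = io.StringIO(
    "CountryCode,IndicatorCode,Year,Value\n"
    "AAA,SP.POP.TOTL,1961,110\n"
    "AAA,SP.POP.TOTL,1960,100\n"
    "BBB,SP.POP.TOTL,1960,200\n"
    "AAA,OTHER.CODE,1960,5\n"
)
frames = frames_by_year(sample_csv, "SP.POP.TOTL")
```

With the real file, pass an open file handle instead of the in-memory sample; each value of `frames` then maps country codes to the numbers that color one map.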
Task: Select a short video with at least 3 persons and create a new movie from this video with a face tracker (each person’s face needs to be tracked by a rectangle of a different color). See, for example, the video in: https://github.com/ageitgey/face_recognition
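Keeping the same color on the same person requires matching the boxes detected in each frame to the boxes from the previous frame. The sketch below uses a simple greedy nearest-centroid match, which is one possible approach and not the method used by the linked project. It assumes boxes arrive as `(top, right, bottom, left)` tuples, the order returned by `face_recognition.face_locations`; the class name, the color palette, and the `max_distance` threshold are hypothetical choices.

```python
# Palette of distinct colors; it repeats if more than four tracks appear.
COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]


def centroid(box):
    """Center (x, y) of a (top, right, bottom, left) box."""
    top, right, bottom, left = box
    return ((left + right) / 2, (top + bottom) / 2)


class ColorTracker:
    """Greedy nearest-centroid tracker that gives each track a fixed color."""

    def __init__(self, max_distance=80.0):
        self.max_distance = max_distance
        self.tracks = {}  # track id -> last seen centroid
        self.next_id = 0

    def update(self, boxes):
        """Return [(box, track_id, color)] for the boxes of one frame."""
        results = []
        unused = dict(self.tracks)  # tracks not yet matched in this frame
        for box in boxes:
            cx, cy = centroid(box)
            best_id, best_dist = None, self.max_distance
            for track_id, (px, py) in unused.items():
                dist = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5
                if dist <= best_dist:
                    best_id, best_dist = track_id, dist
            if best_id is None:
                # No nearby track: start a new one.
                best_id = self.next_id
                self.next_id += 1
            else:
                del unused[best_id]
            self.tracks[best_id] = (cx, cy)
            results.append((box, best_id, COLORS[best_id % len(COLORS)]))
        return results
```

In a full pipeline, each frame's detections would be passed to `update` and the returned color used to draw the rectangle before the frame is written to the output video. Greedy matching can swap identities when faces cross; comparing face encodings would be more robust.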
Task: Select a collection of connected images. Create a graph of the links among the images or the objects in the images. Use graph algorithms to discover interesting insights regarding the images.
Dataset: simpsons-characters
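As an illustration of the graph step, the sketch below links two labels whenever they appear in the same image, then computes degree centrality and connected components in plain Python. The per-image labels are assumed to come from an earlier detection or classification step; the image names and labels here are hand-written placeholders, and the function names are hypothetical. Libraries from the toolbox such as Networkx provide the same algorithms ready-made.

```python
from collections import defaultdict
from itertools import combinations


def co_appearance_graph(image_labels):
    """Build {label: set of labels seen in the same image}."""
    graph = defaultdict(set)
    for labels in image_labels.values():
        for label in labels:
            graph[label]  # touch the key so isolated labels stay as nodes
        for a, b in combinations(sorted(set(labels)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Fraction of the other nodes each node is linked to."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def connected_components(graph):
    """List of node sets, found by depth-first search."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


# Placeholder labels, for illustration only.
images = {"img1": ["homer", "marge"], "img2": ["homer", "bart"], "img3": ["moe"]}
graph = co_appearance_graph(images)
centrality = degree_centrality(graph)
components = connected_components(graph)
```

High-centrality labels indicate objects that co-occur with many others, while separate components point to groups of images that share no objects.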
- Advanced SQL
- Pandas
- Matplotlib
- TuriCreate
- Seaborn
- Altair
- NetworkX
- iGraph
- TuriCreate
- Cytoscape
- Gephi
- D3
- SpaCy
- Topic Modeling
- Word-Embedding (Word2Vec, BERT)
- Sentiment Analysis
- MongoDB
- KeplerGL
- AWS
- Microsoft Azure
- Google Cloud Platform
- Spark
- Hadoop
- Dask and MLlib