This is the beginning of my journey to become an awesome data scientist.
I have no degree in computer science and I'm not a "formal" software engineer. I do, however, have a bachelor's degree in Physics, so my math background is pretty solid.
I already know Python (thanks to an MIT course on edX) but I have A LOT to improve, and I plan to learn Go too.
This is what I have figured out so far for my development:
- Course 1 - Neural networks and deep learning
- Course 2 - Improving deep neural networks
- Course 3 - Structuring machine learning projects
- Course 4 - Convolutional neural networks
- Course 5 - Sequence models
Summary here
- Chapter 1 - The Machine Learning Pipeline
- Chapter 2 - Fancy Tricks with Simple Numbers
- Chapter 3 - Text Data: Flattening, Filtering, and Chunking
- Chapter 4 - The Effects of Feature Scaling: From Bag-of-Words to Tf-Idf
- Chapter 5 - Categorical Variables: Counting Eggs in the Age of Robotic Chickens
- Chapter 6 - Dimensionality Reduction: Squashing the Data Pancake with PCA
- Chapter 7 - Nonlinear Featurization via K-Means Model Stacking
- Chapter 8 - Automating the Featurizer: Image Feature Extraction and Deep Learning
- Chapter 9 - Back to the Feature: Building an Academic Paper Recommender
There's a repo for that
- Some descriptive analytics for the PGA index file
- Descriptive analytics for siva files in PGA according to some criteria
- Download siva files
- Examine siva files
- Use gitbase to query siva files
Following the O'Reilly book
- Chapter 01 - Introduction
- Chapter 02 - A Crash Course in Python
- Chapter 03 - Visualizing Data
- Chapter 04 - Linear Algebra
- Chapter 05 - Statistics
- Chapter 06 - Probability
- Chapter 07 - Hypothesis and Inference
- Chapter 08 - Gradient Descent
- Chapter 09 - Getting Data
- Chapter 10 - Working with Data
- Chapter 11 - Machine Learning
- Chapter 12 - k-Nearest Neighbors
- Chapter 13 - Naive Bayes
- Chapter 14 - Simple Linear Regression
- Chapter 15 - Multiple Regression
- Chapter 16 - Logistic Regression
- Chapter 17 - Decision Trees
- Chapter 18 - Neural Networks
- Chapter 19 - Clustering
- Chapter 20 - Natural Language Processing
- Chapter 21 - Network Analysis
- Chapter 22 - Recommender Systems
- Chapter 23 - Databases and SQL
- Chapter 24 - MapReduce
- Chapter 25 - Go Forth and Do Data Science
Following "Introduction to Code As Data & Machine Learning On Code"
- Getting started with Babelfish
- Analyzing Git Repositories
- Getting started with gitbase & gitbase web
- MLonCode Pre-trained Models
- Training MLonCode Models
There's a repo for that
- Acquire data
- Analyze by describing data
- Analyze by pivoting features
- Analyze by visualizing data
- Wrangle data
- Model, predict and solve
- Logistic Regression
- KNN or k-Nearest Neighbors
- Support Vector Machines
- Naive Bayes classifier
- Decision Tree
- Random Forest
- Perceptron
- Artificial neural network
- RVM or Relevance Vector Machine
Following the book
- To be developed
Following the O'Reilly book
- To be developed
Following the book by Caleb Doxsey
- Chapter 01 - Getting Started
- Chapter 02 - Types
- Chapter 03 - Variables
- Chapter 04 - Control Structures
- Chapter 05 - Arrays, Slices and Maps
- Chapter 06 - Functions
- Chapter 07 - Structs and Interfaces
- Chapter 08 - Packages
- Chapter 09 - Testing
- Chapter 10 - Concurrency
- Chapter 11 - Next Steps
Concepts that I have no clue about and have to study/practice
- reflog
- git bisect
- binaries
- packfile
- namespace
- xpath
- testing
- SDK
- debug
- protobuf
- rpc