Unsupervised Machine Learning
The purpose of this project is to apply unsupervised machine learning models to discover groupings, trends, or other information that could help us pitch cryptocurrencies to investors. A report that includes the trending cryptocurrencies in the market, along with classification plots, is provided at the end of the analysis.
The dataset is preprocessed with the following steps before performing PCA:
- Keep all the cryptocurrencies that are being traded.
- Drop the IsTrading column.
- Remove rows that have at least one null value.
- Filter the crypto_df DataFrame so it only has rows where coins have been mined.
- Create a new DataFrame that holds only the cryptocurrency names and use the crypto_df DataFrame index as the index for this new DataFrame.
- Remove the CoinName column from the crypto_df DataFrame, since it is not used by the clustering algorithm.
- Use the get_dummies() method to create dummy (one-hot) variables for the two text features, Algorithm and ProofType, and store the resulting data in a new DataFrame named X.
- Use the StandardScaler fit_transform() method to standardize the features in the X DataFrame.
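The preprocessing steps above can be sketched with pandas and scikit-learn as follows. This is a minimal illustration, not the project's notebook: the function name `preprocess`, the toy rows, and the column names `TotalCoinsMined` and `TotalCoinSupply` are assumptions (only IsTrading, CoinName, Algorithm and ProofType are named in this write-up).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(crypto_df: pd.DataFrame):
    """Apply the preprocessing steps listed above; return (X_scaled, names_df)."""
    # Keep only coins that are currently trading, then drop the flag column.
    df = crypto_df.loc[crypto_df["IsTrading"]].drop(columns="IsTrading")
    # Remove rows that have at least one null value.
    df = df.dropna()
    # Keep only coins that have been mined (assumed column name: TotalCoinsMined).
    df = df[df["TotalCoinsMined"] > 0]
    # Keep the coin names in their own DataFrame, sharing df's index.
    names_df = df[["CoinName"]].copy()
    df = df.drop(columns="CoinName")
    # One-hot encode the two text features into dummy columns.
    X = pd.get_dummies(df, columns=["Algorithm", "ProofType"], dtype=float)
    # Standardize each feature to zero mean and unit variance.
    X_scaled = StandardScaler().fit_transform(X)
    return X_scaled, names_df


# Hypothetical toy rows; TotalCoinsMined and TotalCoinSupply are assumed column names.
toy = pd.DataFrame(
    {
        "CoinName": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"],
        "Algorithm": ["Scrypt", "SHA-256", "Scrypt", "X11", "SHA-256"],
        "IsTrading": [True, True, False, True, True],
        "ProofType": ["PoW", "PoW", "PoS", "PoW/PoS", "PoW"],
        "TotalCoinsMined": [4.1e7, 1.8e7, 1.0e6, 0.0, 2.5e6],
        "TotalCoinSupply": [8.4e7, 2.1e7, 5.0e6, 1.0e6, None],
    },
    index=["ALP", "BET", "GAM", "DEL", "EPS"],
)
X_scaled, names_df = preprocess(toy)
print(X_scaled.shape)
print(names_df)
```

In the toy data, GAM is dropped for not trading, EPS for its missing supply value, and DEL for having no mined coins, leaving two rows.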
After the preprocessing and transformation steps above, the standardized features form the array shown in Figure-1.
Figure-1
PCA with three principal components was then applied to the array in Figure-1, and the result was converted to a DataFrame, shown in Figure-2.
Figure-2
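The PCA step above can be sketched with scikit-learn's `PCA` class. Random numbers stand in for the standardized array, and the tickers and the column labels `PC 1` to `PC 3` are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Stand-in for the standardized array from Figure-1 (random data, illustration only).
rng = np.random.default_rng(0)
X_scaled = rng.standard_normal((50, 8))
coin_index = [f"COIN{i}" for i in range(50)]  # hypothetical tickers

# Reduce the standardized features to three principal components.
pca = PCA(n_components=3)
components = pca.fit_transform(X_scaled)

# Wrap the result in a DataFrame that reuses the coin index.
pcs_df = pd.DataFrame(components, columns=["PC 1", "PC 2", "PC 3"], index=coin_index)
print(pcs_df.head())
# Share of the total variance captured by each component (sorted, largest first).
print(pca.explained_variance_ratio_)
```

Reusing the coin index here is what later allows the components to be joined back to the other coin data.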
Next, an elbow curve was used to find the number of clusters (the value of k for K-Means) that best fits this dataset. The sharp bend at k=4 in Figure-3 indicates that the algorithm should use 4 clusters.
Figure-3
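The elbow curve above plots K-Means inertia (the sum of squared distances from each point to its nearest centroid) against k. A minimal sketch of computing those values follows; the data is a hypothetical stand-in made of four well-separated 3D blobs, so the bend should appear at k=4, and the `n_init`/`random_state` settings are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical stand-in for the PCA output: four well-separated blobs in 3D.
centers = np.array([[0, 0, 0], [6, 0, 0], [0, 6, 0], [0, 0, 6]])
pcs = np.vstack([c + rng.normal(scale=0.5, size=(30, 3)) for c in centers])

# Fit K-Means for k = 1..10 and record the inertia of each fit.
ks = list(range(1, 11))
inertia = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(pcs)
    inertia.append(km.inertia_)

for k, value in zip(ks, inertia):
    print(k, round(value, 1))
```

In the notebook, these (k, inertia) pairs would then be drawn as a line plot (the project lists hvPlot), and the elbow read off visually.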
After fitting the K-Means model with k=4, a new DataFrame was created by joining the PCA and crypto DataFrames, and a column named Class holding each coin's cluster label was added, as shown in Figure-4.
Figure-4
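The clustering and join step above might look like the sketch below. All three input DataFrames are hypothetical stand-ins; joining the coin-name DataFrame from the preprocessing step is an assumption about how the names reach the final table.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
index = [f"COIN{i}" for i in range(40)]  # hypothetical tickers
# Hypothetical stand-ins for the PCA DataFrame, the preprocessed crypto data,
# and the coin-name DataFrame, all sharing the same index.
pcs_df = pd.DataFrame(rng.standard_normal((40, 3)), columns=["PC 1", "PC 2", "PC 3"], index=index)
crypto_df = pd.DataFrame(
    {
        "Algorithm": rng.choice(["Scrypt", "SHA-256", "X11"], size=40),
        "ProofType": rng.choice(["PoW", "PoS"], size=40),
    },
    index=index,
)
names_df = pd.DataFrame({"CoinName": [f"Coin {i}" for i in range(40)]}, index=index)

# Fit K-Means with the k chosen from the elbow curve and predict a cluster per coin.
model = KMeans(n_clusters=4, n_init=10, random_state=0)
predictions = model.fit_predict(pcs_df)

# Align the crypto data and principal components on the shared index,
# attach the coin names, and store the cluster label in a Class column.
clustered_df = pd.concat([crypto_df, pcs_df], axis=1).join(names_df)
clustered_df["Class"] = predictions
print(clustered_df.head())
```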
The 3D scatter plot in Figure-5 shows how the classes are populated and distributed across the three principal components.
Figure-5
The table in Figure-6 lists the coins that are currently tradable, a subset of the DataFrame in Figure-4.
Figure-6
Finally, the plot in Figure-7 shows the distribution of each cluster in 2D.
Figure-7
In conclusion, an ML algorithm can help us understand whether different cryptocurrencies share similarities and behave similarly in the market. The resulting visuals can help investors make better decisions in this trending market.
- Data Source: crypto_data.csv
- Software/Languages: Jupyter Notebook, Google Colab, Python.
- Libraries: Scikit-learn, Pandas, Plotly, hvPlot.