
Unsupervised machine learning models for the analysis of cryptocurrencies.


aktugchelekche/Cryptocurrencies


Cryptocurrencies

Unsupervised Machine Learning.

Overview of the Analysis

The purpose of this project is to apply unsupervised machine learning models to discover groupings, trends, or other information that could help us pitch cryptocurrencies to investors. A report that includes trending cryptocurrencies in the market, as well as classification plots, will be provided at the end of the analysis.

Analysis

Preprocessing the Data for PCA

The dataset is preprocessed with the following steps before performing PCA:

  • Keep all the cryptocurrencies that are being traded.
  • Drop the IsTrading column.
  • Remove rows that have at least one null value.
  • Filter the crypto_df DataFrame so it only has rows where coins have been mined.
  • Create a new DataFrame that holds only the cryptocurrency names and use the crypto_df DataFrame index as the index for this new DataFrame.
  • Remove the CoinName column from the crypto_df DataFrame, because it is not used by the clustering algorithm.
  • Use the get_dummies() method to create dummy variables for the two text features, Algorithm and ProofType, and store the resulting data in a new DataFrame named X.
  • Use the StandardScaler fit_transform() method to standardize the features from the X DataFrame.
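The preprocessing steps above can be sketched with pandas and scikit-learn as follows. This is a minimal illustration, not the project's notebook: it runs on a small hypothetical DataFrame instead of crypto_data.csv, and it assumes the "coins have been mined" filter uses a column named TotalCoinsMined (with TotalCoinSupply as a second numeric column). Those column names, the coin names and all values are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for crypto_data.csv. IsTrading, CoinName, Algorithm and
# ProofType come from the README; TotalCoinsMined and TotalCoinSupply are assumed.
crypto_df = pd.DataFrame(
    {
        "CoinName": ["CoinA", "CoinB", "CoinC", "CoinD", "CoinE", "CoinF"],
        "Algorithm": ["Scrypt", "SHA-256", "Scrypt", "X11", "SHA-256", "X11"],
        "IsTrading": [True, True, False, True, True, True],
        "ProofType": ["PoW", "PoW", "PoS", "PoW/PoS", "PoW", "PoS"],
        "TotalCoinsMined": [4.2e7, 0.0, 1.0e6, np.nan, 1.8e7, 9.0e8],
        "TotalCoinSupply": [8.4e7, 2.1e7, 5.0e6, 1.0e7, 2.1e7, 1.0e9],
    },
    index=["A", "B", "C", "D", "E", "F"],
)

# Keep only coins that are being traded, then drop the IsTrading column.
crypto_df = crypto_df[crypto_df["IsTrading"]].drop(columns="IsTrading")

# Remove rows that have at least one null value.
crypto_df = crypto_df.dropna()

# Keep only coins that have actually been mined (assumed column name).
crypto_df = crypto_df[crypto_df["TotalCoinsMined"] > 0]

# Hold the coin names in their own DataFrame, sharing crypto_df's index.
coin_names_df = crypto_df[["CoinName"]].copy()

# CoinName is not used by the clustering algorithm.
crypto_df = crypto_df.drop(columns="CoinName")

# One-hot encode the two text features as float dummy columns.
X = pd.get_dummies(crypto_df, columns=["Algorithm", "ProofType"], dtype=float)

# Standardize every feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)
print(X.columns.tolist())
print(X_scaled.shape)
```

With the real file, the stand-in DataFrame would be replaced by loading crypto_data.csv with pd.read_csv; which column serves as the index is also an assumption here.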

After completing the data preprocessing and transformation steps above, the standardized features form the array shown in Figure-1.

[Figure-1: Standardized feature array]

Reducing Data Dimensions Using PCA

The PCA algorithm was applied to the array in Figure-1 with three principal components, and the result was converted to a DataFrame, as shown in Figure-2.
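A minimal sketch of this step with scikit-learn's PCA class is shown below. Random standardized data stands in for the Figure-1 array, and the column names "PC 1" to "PC 3" and the coin index labels are hypothetical choices for this example, not necessarily the ones used in Figure-2.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the Figure-1 array: 50 coins x 10 standardized features.
rng = np.random.default_rng(0)
index = [f"coin_{i}" for i in range(50)]
X_scaled = StandardScaler().fit_transform(rng.normal(size=(50, 10)))

# Project the standardized features onto three principal components.
pca = PCA(n_components=3)
components = pca.fit_transform(X_scaled)

# Wrap the result in a DataFrame that keeps the coins' index.
pcs_df = pd.DataFrame(components, columns=["PC 1", "PC 2", "PC 3"], index=index)
print(pcs_df.head())

# Share of the total variance captured by each component (largest first).
print(pca.explained_variance_ratio_)
```

Keeping the original index on pcs_df is what later allows it to be joined with the other DataFrames row by row.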

[Figure-2: DataFrame of the three principal components]

Clustering Cryptocurrencies Using K-means

In this part, an elbow curve is used to find the number of clusters, k, that best fits this dataset for the K-means model. The sharp bend at k=4 in Figure-3 indicates that the algorithm should use 4 clusters.
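The values behind an elbow curve can be computed by fitting K-means for a range of k and recording the inertia, that is, the sum of squared distances from each point to its assigned cluster center. This sketch assumes a range of k = 1 to 10 and uses synthetic data with four well-separated groups, so the bend at k=4 here comes from the toy data, not from the cryptocurrency dataset; n_init and random_state are illustrative settings.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for the PCA DataFrame: four tight groups in 3D.
rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0], [8, 0, 0], [0, 8, 0], [0, 0, 8]])
points = np.vstack([c + rng.normal(scale=0.5, size=(25, 3)) for c in centers])
pcs_df = pd.DataFrame(points, columns=["PC 1", "PC 2", "PC 3"])

# Fit K-means for each candidate k and record its inertia.
k_values = list(range(1, 11))
inertia = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(pcs_df)
    inertia.append(km.inertia_)

elbow_df = pd.DataFrame({"k": k_values, "inertia": inertia})
print(elbow_df)
```

Plotting elbow_df with k on the x-axis and inertia on the y-axis (for example with hvPlot or Plotly) gives a curve like Figure-3: inertia falls steeply until the true number of groups and flattens afterwards.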

[Figure-3: Elbow curve]

Results

New DataFrame with Predicted Clusters and Cryptocurrency Features

After fitting the K-means model with k=4, a new DataFrame was created by joining the PCA and crypto DataFrames and adding a column named Class that holds the predicted clusters, as shown in Figure-4.
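A sketch of fitting the final model and building the joined DataFrame follows. It assumes the two DataFrames share the same index and are joined with pd.concat along the columns; both inputs are small hypothetical stand-ins filled with random values, and n_init and random_state are illustrative settings.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
index = [f"coin_{i}" for i in range(40)]

# Hypothetical stand-ins for the preprocessed crypto features and the PCA output.
crypto_df = pd.DataFrame(
    {
        "Algorithm": rng.choice(["Scrypt", "SHA-256", "X11"], size=40),
        "ProofType": rng.choice(["PoW", "PoS"], size=40),
    },
    index=index,
)
pcs_df = pd.DataFrame(
    rng.normal(size=(40, 3)), columns=["PC 1", "PC 2", "PC 3"], index=index
)

# Fit K-means with k=4 (from the elbow curve) and predict a cluster per coin.
model = KMeans(n_clusters=4, n_init=10, random_state=0)
predictions = model.fit_predict(pcs_df)

# Join the two DataFrames on their shared index, then add the predicted clusters.
clustered_df = pd.concat([crypto_df, pcs_df], axis=1)
clustered_df["Class"] = predictions
print(clustered_df.head())
```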

[Figure-4: Joined PCA and crypto DataFrame with the Class column]

3D-Scatter with Clusters

The 3D scatter plot in Figure-5 helps us see how the classes are populated and distributed across the three principal components.

[Figure-5: 3D scatter plot of the clusters]

Tradable Cryptocurrencies

The table in Figure-6 contains the coins that are currently tradable; it is a subset of the DataFrame in Figure-4.

[Figure-6: Table of tradable cryptocurrencies]

Scatter Plot for Tradable Cryptocurrencies

Finally, the plot in Figure-7 shows the distribution of each cluster in 2D.

[Figure-7: 2D scatter plot of the clusters]

In conclusion, an ML algorithm can help us understand whether there are similarities among different cryptocurrencies and whether they show similar behavior in the market. With these results, we can provide visuals that help investors make better decisions in this trending market.

Resources

  • Data Source: crypto_data.csv
  • Software/Languages: Jupyter Notebook (Google Colab), Python.
  • Libraries: Scikit-learn, Pandas, Plotly, hvPlot.
