
Data Curation & Preparation


📌 Requirements

  • Both historical and up-to-date data are available
  • The data comes from a credible and reliable source
  • The data contains variables relevant to the analysis

📑 More About Data Sources

  • Financial Data - Yahoo Finance / yfinance

    Retrieved with yfinance, a Python library that fetches current and historical market price data from Yahoo Finance.

    BTC-SearchTrend.csv (Partial)

    | COLUMN | DESCRIPTION | TYPE |
    | --- | --- | --- |
    | DATE | date in YYYY-MM-DD format | DatetimeIndex |
    | OPEN | the first price of the day | float64 |
    | CLOSE | the last price of the day | float64 |
    | HIGH | the highest price on that day | float64 |
    | LOW | the lowest price on that day | float64 |
    | VOLUME | the trading volume on that day | int64 |
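
As a rough sketch of how price data in the layout above could be pulled with yfinance: the ticker `BTC-USD`, the helper names `tidy_prices` and `fetch_btc_prices`, and the use of `Ticker.history` are assumptions for illustration, not necessarily what this project ran.

```python
import pandas as pd

PRICE_COLUMNS = ["OPEN", "CLOSE", "HIGH", "LOW", "VOLUME"]


def tidy_prices(raw: pd.DataFrame) -> pd.DataFrame:
    """Reshape a yfinance-style frame into the DATE/OPEN/CLOSE/HIGH/LOW/VOLUME layout."""
    df = raw.copy()
    # yfinance uses capitalised names ("Open", "Close", ...); the table uses upper case.
    df.columns = [str(col).upper() for col in df.columns]
    # Drop any timezone and time-of-day so the index holds plain calendar dates.
    df.index = pd.DatetimeIndex(df.index).tz_localize(None).normalize()
    df.index.name = "DATE"
    # Keep only the columns documented in the table.
    return df[PRICE_COLUMNS]


def fetch_btc_prices(start: str, end: str) -> pd.DataFrame:
    """Download daily prices (needs network access and the yfinance package)."""
    # Imported lazily so tidy_prices() can be used without yfinance installed.
    import yfinance as yf

    # "BTC-USD" is an assumed ticker symbol.
    raw = yf.Ticker("BTC-USD").history(start=start, end=end, interval="1d")
    return tidy_prices(raw)
```

Calling `fetch_btc_prices("2017-01-01", "2021-12-31")` would then return one row per day; the date range here is only a guess based on the trend file name.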

  • Search Trends Data - Google Trends / pytrends

    pytrends is an unofficial Python API for Google Trends that provides methods for downloading trend reports.

    trend-2017_2021.csv

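    | COLUMN | DESCRIPTION | TYPE |
    | --- | --- | --- |
    | dtime | timestamp in YYYY-MM-DD HH:MM:SS format | datetime64[ns] |
    | bitcoin | search percentile index (Google Trends interest score, 0-100) for the term 'bitcoin' | int64 |
    | cryptocurrency | search percentile index (Google Trends interest score, 0-100) for the term 'cryptocurrency' | int64 |
    | isPartial | true if the data point is still incomplete (partial) for that timestamp | int64 |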


🔬 Data Preparation

trend-2017_2021.csv

  • To conform to time-series data formats, we split the dtime column into separate date and time columns

  • Computed the daily average search percentile by grouping the hourly entries by date and taking their mean

    • .groupby(['Date']) = performs the equivalent of SQL’s GROUP BY operation
    • .mean() = aggregate function that computes the average of each group
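
The two steps above can be sketched in pandas as follows. The sample rows are made up for illustration, and the `Date` and `Time` column names are assumptions chosen to match the `.groupby(['Date'])` call mentioned above.

```python
import pandas as pd

# Hypothetical hourly sample in the trend-2017_2021.csv layout.
hourly = pd.DataFrame(
    {
        "dtime": pd.to_datetime(
            [
                "2021-01-01 00:00:00",
                "2021-01-01 01:00:00",
                "2021-01-02 00:00:00",
                "2021-01-02 01:00:00",
            ]
        ),
        "bitcoin": [40, 60, 70, 90],
        "cryptocurrency": [10, 20, 30, 50],
        "isPartial": [0, 0, 0, 0],
    }
)

# Split dtime into a calendar date (midnight timestamp) and a time of day.
hourly["Date"] = hourly["dtime"].dt.normalize()
hourly["Time"] = hourly["dtime"].dt.time

# Group the hourly rows by date and average the two search-term columns.
daily = hourly.groupby(["Date"])[["bitcoin", "cryptocurrency"]].mean()
print(daily)
```

Note that `.mean()` returns floating-point values, so an explicit rounding and cast would be needed if integer columns are wanted downstream.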

BTC-SearchTrend.csv

  • Produced by an inner join of the price data with the daily-averaged trend-2017_2021.csv data on the DATE value, so only dates present in both datasets are kept

    • .merge() = joins two datasets on a shared key

    | COLUMN | DESCRIPTION | TYPE |
    | --- | --- | --- |
    | DATE | date in YYYY-MM-DD format | DatetimeIndex |
    | OPEN | the first price of the day | float64 |
    | CLOSE | the last price of the day | float64 |
    | HIGH | the highest price on that day | float64 |
    | LOW | the lowest price on that day | float64 |
    | VOLUME | the trading volume on that day | int64 |
    | BITCOIN | daily mean search percentile index for the term 'bitcoin' | int64 |
    | CRYPTOCURRENCY | daily mean search percentile index for the term 'cryptocurrency' | int64 |
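
A minimal sketch of the inner join on DATE, using small made-up frames in place of the real price and trend data. The frame names `prices` and `trend` are hypothetical, and the values are not taken from the dataset.

```python
import pandas as pd

# Hypothetical daily price rows (stand-in for the yfinance data).
prices = pd.DataFrame(
    {
        "DATE": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
        "OPEN": [100.0, 110.0, 120.0],
        "CLOSE": [110.0, 120.0, 115.0],
        "HIGH": [112.0, 125.0, 121.0],
        "LOW": [99.0, 108.0, 113.0],
        "VOLUME": [1000, 1500, 1200],
    }
)

# Hypothetical daily mean search values (stand-in for the aggregated trend data).
trend = pd.DataFrame(
    {
        "DATE": pd.to_datetime(["2021-01-02", "2021-01-03", "2021-01-04"]),
        "BITCOIN": [80, 75, 60],
        "CRYPTOCURRENCY": [40, 35, 30],
    }
)

# Inner join: keep only the dates that appear in both frames.
merged = prices.merge(trend, on="DATE", how="inner").set_index("DATE")
print(merged)
```

With `how="inner"`, 2021-01-01 (prices only) and 2021-01-04 (trend only) are dropped, which is why the merged file can cover fewer days than either input.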

BTC-SearchTrend dataset preview

Dataframe preview of BTC-SearchTrend.csv