Power BI + Python + Machine Learning Forecasting

My director asked one of my colleagues and I to try building a machine learning model to predict revenue, using either R or Python. The key idea was to spend time investing in research for building a scalable model that could be used for potential future engagements with clients to add greater value. This was a great challenge not only putting together concepts I taught myself in an online Python machine learning course, but also think strategically about our work and what we could help customers unlock with their information for better decision making.

Live Demo

Project Details

Power BI + Python + Machine Learning Forecasting

Details

In order to show clients the art of what was possible with machine learning and analysis, my colleague and I had to find datasets that would be funneled into the model to predict revenue, and we also needed raw revenue numbers to test our models against. I used Python as the base for this project, and found a handful of free economic sources to predict Walmart revenue data available via press releases. I first tested out all API and web scraping connections with Jupyter notebooks to ensure that data was being collected accurately per month, and tested merging all datasets to overlap on the same months. From there I branched into linear and KNN regression models and tested each for model strength and to display how accurate they were predicting Walmart's revenue based on various economic factors. The key challenge at the end was to combine my Python code with Power BI and it felt great merging my code within this environment to make refreshing easy, plus being able to quickly view how each model performed. I also included an exploratory tab to view the merged dataset and how the various factors stack up.

Economic datasets gathered with Python:

GDP from US Bureau of Economic Analysis
Price index from US Bureau of Economic Analysis
Gas prices from US Energy Information Administration
Annual commercial carbon-dioxide emissions from US Energy Information Administration
Population from US Census Bureau

Files included for view in this project:

Power BI + Python + Machine Learning Forecasting.pdf
01 - Revenue - Linear Regression.ipynb: Jupyter notebook containing linear regression ML code
02 - Revenue - KNN Regression.ipynb: Jupyter notebook containing KNN regression ML code

By the Numbers

2 months of development time
2 colleagues collaborated with
3 report pages
6 data sources
6 queries connected to data sources

Tools Used

Excel
Power BI
- DAX
- Power Query
Python

Data Engineering Pipeline

Data Model

What I learned

Below are some code snippets I'm proud of from this project:

Web scraper for getting GDP data from US Bureau of Economic Analysis

# Initial variables
dates = []
values = []

# Getting data from Bureau of Economic Analysis API, 1 year at a time
url = "https://apps.bea.gov/api/data"
parameters = {
    "UserID": "90974B6B-A7A5-4CBF-A6F9-E553BEEB0B25",
    "method": "getdata",
    "datasetname": "GDPByIndustry",
    "ResultFormat": "JSON",
    "Frequency": "Q",
    "Industry": "ALL",
    "TableID": "1",
    "Year": "ALL"
}

response = requests.get(url=url, params=parameters)
response.raise_for_status()
data = response.json()["BEAAPI"]["Results"][0]["Data"]

for d in data:
    if d["IndustrYDescription"] == "Gross domestic product":
        quarter = d["Quarter"]
        month = 0

        if quarter == "I":
            month = 1
        elif quarter == "II":
            month = 4
        elif quarter == "III":
            month = 7
        elif quarter == "IV":
            month = 10

        for x in range(3):
            date = str(month) + "/1/" + d["Year"]
            dates.append(date)
            values.append(float(d["DataValue"]))
            month += 1

# Make GDP dictionary, assign to a DataFrame, convert to csv
data_dict = {
    "Date": dates,
    "GDP_Value": values
}

df_gdp = pd.DataFrame(data_dict)

Setting up training and testing sets

x = df.drop(['Revenue_Value', "Date"], axis=1).values
y = df['Revenue_Value'].values

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=101)

Plotting predictions vs actuals

g = sns.jointplot(x=y_test, y=predictions, data=None, kind='reg', joint_kws={'line_kws':{'color':'orange'}})
g.set_axis_labels('Actual', 'Predictions', fontsize=12)

Useful Resources

Running Python in Power BI - Handy guide from Microsoft regarding getting Python set up within Power BI, and to use as a data source
Python Visualizations in Power BI - Another Microsoft guide to run Python scripts on report pages to create visualizations

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Power BI + Python + Machine Learning Forecasting

Live Demo

Project Details

Details

By the Numbers

Tools Used

Data Engineering Pipeline

Data Model

What I learned

Useful Resources

Files

README.md

Latest commit

History

README.md

File metadata and controls

Power BI + Python + Machine Learning Forecasting

Live Demo

Project Details

Details

By the Numbers

Tools Used

Data Engineering Pipeline

Data Model

What I learned

Useful Resources