
Standard Bank

Standard Bank Data Science Virtual Experience Programme

Four tasks make up this virtual experience programme.

  • 1: SQL for Data Scientists
  • 2: Data Science With Python
  • 3: Preparing to Present
  • 4: Putting It All Together

♦ Task 1 - SQL for Data Scientists

A relationship with relational databases

Here is the background information

Standard Bank is embracing the digital transformation wave and intends to use new and exciting technologies to give their customers a complete set of services from the convenience of their mobile devices.

As Africa’s biggest lender by assets, the bank aims to improve the current process by which potential borrowers apply for a home loan. At present, loan officers process home loan applications manually. This takes 2 to 3 days, after which the applicant is told whether or not they have been granted the loan for the requested amount.

To improve the process, Standard Bank wants to use machine learning to assess the creditworthiness of an applicant, implementing a model that predicts whether a potential borrower will default on their loan, so that applicants receive a response immediately after completing their application.

You will be required to follow the data science lifecycle to fulfil the objective. The data science lifecycle includes the following:

  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
  • Evaluation
  • Deployment

CRISP-DM

Here is the task

The beginning of any data science project involves understanding the business and the data. Business understanding focuses on setting the business objectives, assessing the project, defining the success criteria and planning the project. Understanding the business is essential to any data science project. The next phase is Data Understanding: to understand the data, we have to identify, collect and analyze data from one or more data sources, and verify its quality, in order to accomplish the business goals. Many of these tasks can be done in SQL. SQL is a crucial skill for a data scientist, as you will be required to work with structured data stored in relational databases.

To complete this task, answer the multiple-choice quiz. The quiz covers daily tasks that data scientists complete using SQL. Start the quiz by clicking 'Click here to start the task' in section 5 below. Please note that there are 5 multiple-choice questions to complete in this task. Please be patient as each question loads.
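As an illustration of the data-understanding checks described above (row counts, missing values, class balance), here is a minimal sketch that runs a few SQL queries from Python's built-in `sqlite3` module against an in-memory database. The table `loan_applications` and its columns `applicant_income`, `loan_amount` and `loan_status` are hypothetical stand-ins, not the programme's actual schema, and the rows are toy values.

```python
import sqlite3

# Hypothetical schema and toy rows, for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE loan_applications ("
    "applicant_id INTEGER PRIMARY KEY, "
    "applicant_income REAL, "
    "loan_amount REAL, "
    "loan_status TEXT)"
)
cur.executemany(
    "INSERT INTO loan_applications VALUES (?, ?, ?, ?)",
    [
        (1, 5000.0, 120.0, "Y"),
        (2, 3000.0, None, "N"),   # missing loan amount
        (3, None, 66.0, "Y"),     # missing income
        (4, 2500.0, 90.0, "Y"),
    ],
)

# Data quality: total rows and number of missing values per column.
row_count, missing_income, missing_amount = cur.execute(
    "SELECT COUNT(*), "
    "SUM(CASE WHEN applicant_income IS NULL THEN 1 ELSE 0 END), "
    "SUM(CASE WHEN loan_amount IS NULL THEN 1 ELSE 0 END) "
    "FROM loan_applications"
).fetchone()

# Class balance of the (hypothetical) target column.
status_counts = dict(
    cur.execute(
        "SELECT loan_status, COUNT(*) FROM loan_applications "
        "GROUP BY loan_status ORDER BY loan_status"
    ).fetchall()
)

print(row_count, missing_income, missing_amount)  # 4 1 1
print(status_counts)  # {'N': 1, 'Y': 3}
conn.close()
```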

♦ Task 2 - Data Science With Python

AutoML vs bespoke

Here is the background information

Now that you understand the Cross-Industry Standard Process for Data Mining (CRISP-DM) and have an idea of the business needs, it is time to understand the data, prepare it for modelling and, of course, train a model. As a data scientist, it is important to know the tools available for your daily tasks and when to use each one. The available tools now include automated techniques, but automated is not always better. Your manager has been approached by a cloud service provider that promises automated machine learning is the way to go. He wants you to investigate whether that is indeed true.

Here is the task

In this task, you will make use of automated machine learning as well as traditional machine learning. The manager of the home loans department, who has provided us with sample data, wants to know a few things about the data. The questions about the data can be found in the notebook provided in the Resources section below.

The manager is also interested in understanding what machine learning really is, through this particular use case.

We will use Python and its extensive collection of libraries to derive valuable insights from the data, prepare the data and train machine learning models, both the old-fashioned way and in newer, automated ways. We have provided two datasets as CSV files, along with a Jupyter Notebook that has been annotated to assist you.
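To make "the old-fashioned way" concrete, here is a minimal sketch of a bespoke model: logistic regression trained from scratch with batch gradient descent, using only the Python standard library. It is not the notebook's approach, which is not reproduced here; the two features (income in thousands and loan-to-income ratio), their values and the default labels are synthetic and hypothetical. Features are standardized first so that gradient descent converges quickly.

```python
import math
from statistics import mean, pstdev

# Synthetic toy data (hypothetical, NOT the programme's dataset):
# each row is (income in thousands, loan-to-income ratio); label 1 = defaulted.
X_train = [
    (8.0, 0.20), (7.5, 0.30), (6.0, 0.25), (9.0, 0.10),
    (2.0, 0.90), (2.5, 0.80), (3.0, 0.95), (1.5, 0.70),
]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_scaler(X):
    """Return per-column (mean, population std) computed on the training data."""
    return [(mean(col), pstdev(col)) for col in zip(*X)]


def scale(X, params):
    """Standardize each row with the training means and standard deviations."""
    return [[(v - m) / s for v, (m, s) in zip(row, params)] for row in X]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(X, y, lr=0.5, epochs=1000):
    """Batch gradient descent on the mean log loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X, threshold=0.5):
    """Predict 1 (default) when the estimated probability reaches the threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold) for xi in X]


params = fit_scaler(X_train)
w, b = fit_logistic_regression(scale(X_train, params), y_train)
train_preds = predict(w, b, scale(X_train, params))
train_accuracy = sum(p == t for p, t in zip(train_preds, y_train)) / len(y_train)

# Score two new, hypothetical applicants using the same scaling.
new_preds = predict(w, b, scale([(8.5, 0.15), (1.75, 0.80)], params))
print(train_preds, train_accuracy, new_preds)
```

Training accuracy only shows that the model fits its own toy data; a real comparison with AutoML should use held-out data. In the notebook you would more likely use a library implementation such as scikit-learn's `LogisticRegression` rather than hand-written gradient descent.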

♦ Task 3 - Preparing to Present

Back to understanding the business

Here is the background information

As a data scientist, you will be required to present your findings. This tests just how well you understood the business problem and business objectives. The manager of the home loans department, like many other project sponsors you might have to present to, has a limited technical background. Her interest is business-oriented, so you should discuss your results in terms of the business problem with minimal technical details.

In this task, you will prepare to present not only to the manager of the home loans department but also to your manager/team.

Here is the task

To cover the above considerations, we recommend you structure the presentation in a similar manner to the template provided in Resources.

Other considerations include the following:

  • Keep it simple by using business terms
  • Visuals make things easier to understand
  • Define any terms that might be too technical before (or right after) you use/mention them
  • Try to tell a story

Alongside the template, we recommend some more specific things:

  • Include the data science life cycle. Define and explain how it works as well as how you used it.
  • The project overview shows the manager that you understood the business and its goal. For the hypothesis, we recommend saying something like “With ML, we can do (insert what you have done) such that the (insert business objective) is met”.
  • The process overview is a high-level view of how the problem becomes a solution. It shows, through visuals or bullet points, how the business problem is solved using machine learning. A prototype, demo, high-level architecture or wireframes showing how the end user will engage with the solution usually work best.
  • In the data section, it is always a good idea to discuss high-level information about the data you were working with. This includes the size of the data and the data types.
  • In the analysis, show only two or three interesting insights (remember that visuals make things easier to understand).
  • In the modelling section, share what you tried (remember not to get too technical; use the model you trained, and mention that AutoML was also used).
  • For the evaluation, draw comparisons between the two approaches, but remember that this might be new to the home loans department manager.
  • The recommendations are an opportunity to impress your team lead by giving your input on whether to go with AutoML or bespoke ML.
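When comparing the bespoke and AutoML models, it helps to compute the same metrics for both on the same held-out data and then translate them into business terms: for example, recall is the share of applicants who actually defaulted that the model flagged. The sketch below uses made-up labels and predictions for two hypothetical models, `model_a` and `model_b`; the numbers are illustrative only and are not results from the programme.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives and true negatives (1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarise(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Of the applicants flagged as likely to default, how many actually did?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the applicants who actually defaulted, how many were flagged?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up held-out labels and predictions, for illustration only.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
model_a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
model_b = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]

results = {"model_a": summarise(y_true, model_a), "model_b": summarise(y_true, model_b)}
for name, metrics in results.items():
    print(name, metrics)
```

Here `model_b` never wrongly flags a good applicant (precision 1.0) but catches only one of four defaulters (recall 0.25), while `model_a` has higher accuracy (0.8 versus 0.7). Which trade-off is better is a business question about the cost of a missed default versus a wrongly declined applicant, which is exactly the framing the home loans manager needs.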

Note that the sections in the template can be rearranged to tell your story, and that not all of the analysis needs to be included.

♦ Task 4 - Putting It All Together

Presenting your insights to a non-technical audience

Here is the background information

As a data scientist, you will be required to present your findings. This tests just how well you understood the business problem and business objectives. Don’t forget that the manager of the home loans department has a limited technical background and is more focused on the business aspect. As you present, make sure to focus your discussion on the business aspect and not so much on the technical details.

In this task, you will prepare to present your findings. You will be presenting to the manager of the home loans department and also your manager and team.

Here is the task

Film a 5- to 10-minute video presentation outlining your findings from the previous task. Upload it below once done. Remember not to be too technical!