1.4 CRISP-DM

Notes

CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining, is an open standard process model that describes common approaches used by data mining experts. It is the most widely used analytics model. It was conceived in 1996 and became a European Union project under the ESPRIT funding initiative in 1997. The project was led by five companies: Integral Solutions Ltd (ISL), Teradata, Daimler AG, NCR Corporation and OHRA, an insurance company. The process is organized into six phases:

Business understanding: An important question is whether we need ML for the project at all. The goal of the project has to be measurable.
Data understanding: Analyze the available data sources and decide whether more data is required.
Data preparation: Clean the data and remove noise using pipelines. The data should be converted to a tabular format so that it can be fed into an ML model.
Modeling: Train different models and choose the best one. Based on the results of this step, decide whether new features need to be added or data issues need to be fixed.
Evaluation: Measure how well the model performs and whether it solves the business problem.
Deployment: Roll the model out to production for all users. Evaluation and deployment often happen together; this is known as online evaluation.
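
The data preparation, modeling and evaluation phases above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the toy records, the `TARGET_ACCURACY` value and all function names are hypothetical and not part of the course material, and a real project would typically use a library such as scikit-learn instead of hand-written models.

```python
# Hypothetical toy data: each record is one customer; "churn" is the target.
RAW_RECORDS = [
    {"monthly_charge": 20.0, "churn": 0},
    {"monthly_charge": 25.0, "churn": 0},
    {"monthly_charge": None, "churn": 1},  # noisy record with a missing value
    {"monthly_charge": 80.0, "churn": 1},
    {"monthly_charge": 90.0, "churn": 1},
    {"monthly_charge": 30.0, "churn": 0},
    {"monthly_charge": 85.0, "churn": 1},
    {"monthly_charge": 22.0, "churn": 0},
]

# Business understanding: the goal must be measurable (assumed value).
TARGET_ACCURACY = 0.8


def prepare(records):
    """Data preparation: drop incomplete records and build a table (X, y)."""
    clean = [r for r in records if r["monthly_charge"] is not None]
    X = [[r["monthly_charge"]] for r in clean]
    y = [r["churn"] for r in clean]
    return X, y


def train_majority(X, y):
    """Baseline model: always predict the most frequent class."""
    majority = 1 if sum(y) * 2 >= len(y) else 0
    return lambda row: majority


def train_threshold(X, y):
    """Simple model: predict 1 above the midpoint between the two class means."""
    pos = [row[0] for row, label in zip(X, y) if label == 1]
    neg = [row[0] for row, label in zip(X, y) if label == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda row: 1 if row[0] > threshold else 0


def accuracy(model, X, y):
    """Evaluation: fraction of correct predictions."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


X, y = prepare(RAW_RECORDS)

# Hold out the last two clean rows for validation.
X_train, y_train = X[:5], y[:5]
X_val, y_val = X[5:], y[5:]

# Modeling: train different models, then choose the best one on validation data.
candidates = {
    "majority": train_majority(X_train, y_train),
    "threshold": train_threshold(X_train, y_train),
}
scores = {name: accuracy(m, X_val, y_val) for name, m in candidates.items()}
best_name = max(scores, key=scores.get)

# Evaluation: does the best model meet the measurable business goal?
goal_met = scores[best_name] >= TARGET_ACCURACY
print(best_name, scores[best_name], goal_met)  # prints: threshold 1.0 True
```

If `goal_met` were false, the process would loop back to an earlier phase, for example to add features or fix data issues, rather than move on to deployment.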

It is important to consider how maintainable the project is.

In general, ML projects require many iterations.

Iteration:

Start simple
Learn from the feedback
Improve
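
The iteration steps above could be sketched as a loop that tries candidates from simplest to most complex and stops as soon as the measurable goal is reached. Everything here is an assumption made for illustration: the `iterate` helper, the toy validation rows and the three rule-based "models" are hypothetical.

```python
# Hypothetical validation rows: ((feature_1, feature_2), label).
VALIDATION = [
    ((1, 0), 0),
    ((2, 1), 1),
    ((8, 0), 1),
    ((9, 1), 1),
    ((3, 0), 0),
    ((1, 1), 1),
]


def evaluate(model):
    """Feedback: accuracy of a model on the validation rows."""
    correct = sum(model(features) == label for features, label in VALIDATION)
    return correct / len(VALIDATION)


def iterate(candidates, target):
    """Try candidates in order of complexity; stop once the target is met."""
    history = []
    for round_no, (name, model) in enumerate(candidates, start=1):
        score = evaluate(model)
        history.append((round_no, name, score))
        if score >= target:
            break  # goal reached, no need for a more complex model
    return history


# Start simple, then improve step by step.
candidates = [
    ("constant", lambda f: 0),
    ("one feature", lambda f: 1 if f[0] > 5 else 0),
    ("two features", lambda f: 1 if f[0] > 5 or f[1] == 1 else 0),
]

history = iterate(candidates, target=0.9)
for round_no, name, score in history:
    print(f"round {round_no}: {name} -> accuracy {score:.2f}")
```

Keeping the history of every round makes it easy to see whether each added piece of complexity actually improved the result.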

⚠️	The notes are written by the community. If you see an error here, please create a PR with a fix.

Notes from Peter Ernicke

Navigation

Machine Learning Zoomcamp course
Lesson 1: Introduction to Machine Learning
Previous: Supervised Machine Learning
Next: Model Selection Process
