CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining, is an open standard process model that describes common approaches used by data mining experts. It is the most widely used analytics model. It was conceived in 1996 and became a European Union project under the ESPRIT funding initiative in 1997. The project was led by five companies: Integral Solutions Ltd (ISL), Teradata, Daimler AG, NCR Corporation, and OHRA, an insurance company.
The process consists of six steps:
- Business understanding: An important question is whether we need ML for the project at all. The goal of the project has to be measurable.
- Data understanding: Analyze available data sources, and decide if more data is required.
- Data preparation: Clean the data and remove noise using pipelines. The data should be converted to a tabular format so that it can be fed into an ML model.
- Modeling: Train different models and choose the best one. Depending on the results of this step, we may need to go back and add new features or fix data issues.
- Evaluation: Measure how well the model performs and whether it solves the business problem.
- Deployment: Roll the model out to production for all users. Evaluation and deployment often happen together (online evaluation).
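The data preparation, modeling, and evaluation steps above can be sketched in a few lines. This is only a toy illustration, not part of CRISP-DM itself: the data, the rule behind the labels, and the helper names (`accuracy`, `fit_majority`, `fit_threshold`) are all made up for the example, and a real project would normally use an ML library instead of hand-written models.

```python
# Hypothetical toy data: one integer feature in 0..99; the label is 1 when it exceeds 60.
values = [(i * 37) % 100 for i in range(200)]
rows = [(v, int(v > 60)) for v in values]

# Data preparation: the rows are already tabular, so we only split them
# into a training part and a held-out validation part.
train, val = rows[:150], rows[150:]

def accuracy(predict, data):
    """Fraction of rows whose label is predicted correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)

def fit_majority(data):
    """Baseline model: always predict the most common training label."""
    ones = sum(y for _, y in data)
    label = int(ones * 2 >= len(data))
    return lambda x: label

def fit_threshold(data):
    """Pick the cut-off on the feature that maximizes training accuracy."""
    candidates = sorted(x for x, _ in data)
    best = max(candidates, key=lambda t: accuracy(lambda x: int(x > t), data))
    return lambda x: int(x > best)

# Modeling: train different models.
models = {"majority": fit_majority(train), "threshold": fit_threshold(train)}

# Evaluation: compare the models on data they have not seen and keep the best one.
scores = {name: accuracy(model, val) for name, model in models.items()}
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Comparing every candidate against a simple baseline on held-out data makes it easy to see whether a more complex model is actually worth it.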
It is important to consider how maintainable the project is.
In general, ML projects require many iterations.
Iteration:
- Start simple
- Learn from the feedback
- Improve
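As a rough sketch of this loop, the example below starts with the simplest possible model, uses the validation error as feedback, and keeps a more complex model only if the error goes down. The data and the helpers (`mae`, `fit_mean`, `fit_line`) are hypothetical and exist only for illustration.

```python
from statistics import mean

# Hypothetical toy data following y = 2x + 1, split into train and validation.
points = [(x, 2 * x + 1) for x in range(10)]
train, val = points[::2], points[1::2]

def mae(predict, data):
    """Mean absolute error: the feedback signal used in every iteration."""
    return mean(abs(predict(x) - y) for x, y in data)

def fit_mean(data):
    """Iteration 1 (start simple): always predict the average target."""
    avg = mean(y for _, y in data)
    return lambda x: avg

def fit_line(data):
    """Iteration 2 (improve): least-squares line through the training points."""
    mx = mean(x for x, _ in data)
    my = mean(y for _, y in data)
    slope = (sum((x - mx) * (y - my) for x, y in data)
             / sum((x - mx) ** 2 for x, _ in data))
    return lambda x: my + slope * (x - mx)

history = []
best_error = float("inf")
for name, fit in [("mean", fit_mean), ("line", fit_line)]:
    error = mae(fit(train), val)   # learn from the feedback
    improved = error < best_error
    if improved:                   # keep only the changes that help
        best_error = error
    history.append((name, error, improved))
print(history)
```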
The notes are written by the community. If you see an error here, please create a PR with a fix.