CRISP-DM: A Data Scientist’s Secret Weapon
5 reasons why CRISP-DM should be in your Data Science toolbox
If you’ve ever been a new data scientist, you might have been in this situation:
You are meeting with people from the ‘business side’ and they are introducing you to a problem that they think can be solved with data. They ask if you can build a model for them and what it might look like.
Or maybe they don’t care what the model looks like, but they would also like you to build a web app. How long will that take?
Or maybe they are used to the game: they know that timelines in data science are not straightforward, so they just send you off with a dataset and a vague instruction like “we think this dataset might be interesting, can you pull some insights from it?”
Nine times out of ten, in these initial discovery meetings, my manager used to pump the brakes and pull up his favorite slide (which he might as well have laminated and kept in his pocket).
It was the CRISP-DM diagram.
Enter: CRISP-DM
It’s not a new idea: according to the Wikipedia article on CRISP-DM, the Cross Industry Standard Process for Data Mining (CRISP-DM) was developed between 1996 and 1999 by a consortium of companies with funding from the European Union. It has seen more than two decades of use across diverse industries, so you can rest assured it’s a well-tested tool.
Specifically, it’s a tool that breaks the data mining process into a framework of six well-defined phases:
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
(Read my very brief intro to CRISP-DM here.)
If you don’t have a manager watching your back and saving you from one of the biggest pitfalls of data science projects (unrealistic expectations), then I would like to share some reasons to adopt this secret weapon.
1. CRISP-DM is an easy-to-use, step-by-step framework
If you use the CRISP-DM diagram as your guide, you will always know where you are and where you will be going next. If you are in the beginning stages of a project, you can look at the diagram and know that you need to be focused on understanding the datasets and the business problem, not on modeling, evaluation, and deployment.
Like a map, the diagram paints a clear picture for both the data professional and the business stakeholders of which step of the project we are on and where we are going next.
2. The CRISP-DM framework is flexible
The beauty of CRISP-DM is that it is not intended to be a strictly linear workflow. You don’t have to march from one phase straight to the next; you can always return to a previous phase and iterate between phases as needed.
For example, if you have moved from Business Understanding → Data Understanding → Data Preparation, and you are now in the Modeling phase, chances are that you will run into questions that require you to go back to stakeholders and clarify the business problem, or talk to subject matter experts to understand more about the data.
This is the iterative nature of a data project and it is to be expected. Because the diagram shows the interconnected arrows between phases, this back-and-forth process becomes the expectation for both the data professional and the business stakeholders.
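To make the back-and-forth concrete, here is a minimal Python sketch that records the phases a project passes through and labels each move as a forward step or a revisit. The `Phase` enum, the `classify_moves` helper, and the example history are all made up for illustration; they are not part of CRISP-DM itself, and the sketch assumes the history never lists the same phase twice in a row.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six CRISP-DM phases, numbered in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def classify_moves(history):
    """Label each move between consecutive phases as 'forward' or 'revisit'.

    Assumes no phase appears twice in a row in the history.
    """
    moves = []
    for previous, current in zip(history, history[1:]):
        kind = "forward" if current > previous else "revisit"
        moves.append((previous.name, current.name, kind))
    return moves


# Hypothetical project history: modeling raises a question about the data,
# so the team goes back to Data Understanding before continuing.
history = [
    Phase.BUSINESS_UNDERSTANDING,
    Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.EVALUATION,
]

moves = classify_moves(history)
revisits = [move for move in moves if move[2] == "revisit"]
print(f"{len(moves)} moves, {len(revisits)} revisit(s)")
```

In this made-up history the single revisit is the step from Modeling back to Data Understanding, which is exactly the kind of move the diagram tells stakeholders to expect.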
3. CRISP-DM can be applied throughout the whole data science project lifecycle
No matter how mature your organization is, how complicated your data science project is, or how big your dataset is, CRISP-DM does not discriminate.
It is a very widely applicable tool no matter where in the project lifecycle you are.
Are you picking up a project that someone else started? No problem.
Are you working on a brand new pilot project? Perfect!
Are you iterating on a model that’s been in production for a year? It’s all fair game :)
4. CRISP-DM can fit within your current project management methodology
Whether you are using Agile, Scrum, Kanban, or some other project management (PM) methodology, you can still use CRISP-DM. You can create tasks, sprints, etc. aligned with the CRISP-DM phase you are focused on. Combining PM methods with CRISP-DM also gives you a way to measure how much time is spent on each phase and how many times you iterate through the phases on a project. By using PM tools (e.g., Jira), you can gather these metrics and analyze your data science process over time.
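As a rough illustration of the kind of metrics this makes possible, the Python sketch below totals the hours logged per phase and counts how many separate times each phase was visited. Everything in it is an assumption for the sake of the example: the `Task` record, its `phase` and `hours` fields, and the sample log are invented, and a real export from your PM tool will look different.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Task:
    """One finished task, tagged with the CRISP-DM phase it belonged to."""
    phase: str
    hours: float


def summarize(tasks):
    """Return (hours per phase, number of separate visits per phase).

    Tasks are assumed to be in chronological order; a new visit starts
    whenever the phase differs from that of the previous task.
    """
    hours = defaultdict(float)
    visits = defaultdict(int)
    previous_phase = None
    for task in tasks:
        hours[task.phase] += task.hours
        if task.phase != previous_phase:
            visits[task.phase] += 1
        previous_phase = task.phase
    return dict(hours), dict(visits)


# Made-up task log, for illustration only.
tasks = [
    Task("Business Understanding", 6),
    Task("Data Understanding", 10),
    Task("Data Understanding", 4),
    Task("Data Preparation", 12),
    Task("Modeling", 8),
    Task("Data Understanding", 3),
    Task("Data Preparation", 5),
    Task("Modeling", 6),
]

hours, visits = summarize(tasks)
for phase in hours:
    print(f"{phase}: {hours[phase]:.1f} h over {visits[phase]} visit(s)")
```

Tracked across several projects, numbers like these can show, for instance, whether Data Understanding keeps getting revisited, which is a useful talking point with stakeholders.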
5. CRISP-DM provides a systematic approach to communicating about your project
This probably should have been the number one reason to use CRISP-DM. When you have a standardized workflow and can clearly show stakeholders and managers where you are in a data science project, everyone benefits. The business side can understand, at a high level, what part of the problem you are working on or struggling with.
Have you been stuck in the data understanding phase for a while? Is there a lack of documentation or a need for additional subject matter experts to weigh in?
Have you reached the evaluation phase and aren’t getting the desired results? Maybe the whole team needs to revisit the data understanding phase and determine if the organization has enough of the right data to achieve success.
I have used CRISP-DM as an outline for presentations to communicate project overviews and status updates. The consistency benefits everyone: it’s easy for me to create the deck and it’s easy for stakeholders to consume and ask meaningful questions.
When everyone has the same process in mind, it enhances communication and collaboration between the data workers and the decision-makers.
Conclusion
TL;DR
Here are some parting thoughts about how and why to make CRISP-DM your secret weapon:
- Use it to communicate effectively about where you are in the data science life cycle.
- Organize your slide deck around the CRISP-DM process to craft clear and concise presentations.
- Use it to help prioritize the mission-critical steps of Business Understanding and Data Understanding.
- Upskill your stakeholders in the framework so they understand where you are in the process and why you might be circling back to a previous phase.
- Begin thinking about your tasks in terms of CRISP-DM in order to prioritize and stay on track.
- Take notes organized by CRISP-DM phase; when you iterate through the phases again, you can revisit your notes for that phase.
I hope this convinces you to add CRISP-DM to your toolbox or, if you are already familiar with the process, inspires you to revisit it! Please leave a comment and let me know: Do you use CRISP-DM? Have you found problems with the process? Have you adapted it in a unique way? Do you want to learn more?
Originally published at https://datalabnotes.com on December 5, 2022.