Various process models and frameworks, such as CRISP-DM, TDSP, the Domino Data Lab lifecycle, and Data Driven Scrum, describe how to execute a data science project. While useful, these models do not explicitly explain how to communicate with stakeholders about what they care about most: which deliverables they will get over the project lifecycle.
In pre-project phases, the data science team (or the team lead) should work with the stakeholders to define, refine, and prioritize both end-product and interim deliverables. Next, the data science development team should work out the dependencies among these deliverables and determine what is and is not feasible. The team should then organize the deliverables into a flow diagram that shows the likely overall flow of project deliverables. Even though the deliverables might change (especially if the team is using an agile framework), providing an initial set of expectations is an important step toward effective communication between the stakeholders and the data science development team.
For example, consider a customer churn prediction project for the customer retention department in a company with Platinum, Gold, Silver, and Free customer subscriptions. The flow diagram might look like…
You might be saying: “But wait! Business Understanding and Data Preparation aren’t explicitly called out.”
That’s the point.
Understanding and preparing data sets is not of direct value to the stakeholders. So instead of calling these technical phases out, focus on defining, delivering, and communicating byproducts of these processes that the stakeholders would value.
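As a minimal sketch of the dependency step described above, the deliverables can be written down as a small dependency graph and sorted so that each one appears after everything it relies on. The deliverable names below are hypothetical, invented only to illustrate the churn example, and `delivery_order` is an illustrative helper rather than part of any framework named in this post. The sketch uses Python's standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical stakeholder-facing deliverables for the churn example.
# Each key maps to the set of deliverables it depends on.
deliverables = {
    "Churn insights report": set(),
    "Churn model for Platinum customers": {"Churn insights report"},
    "Limited A/B test of retention offers": {"Churn model for Platinum customers"},
    "Go/no-go review with stakeholders": {"Limited A/B test of retention offers"},
}


def delivery_order(deps):
    """Return deliverables in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = delivery_order(deliverables)
for step, name in enumerate(order, start=1):
    print(f"{step}. {name}")
```

The resulting order is only a starting point for the roadmap conversation; when deliverables or dependencies change, the graph can be edited and re-sorted.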
Phase I: Exploration
Focuses on Business Understanding and Data Understanding from CRISP-DM
The stakeholders explain what they wish they knew that might be hidden in the data set. The data scientists investigate these requests and also look for interesting insights on their own. These findings are then shared with the stakeholders. This exploratory iteration is repeated until the data scientists have a strong enough understanding of the data and the business problem. At the end of this phase, the team should collectively decide whether to continue, and agree to stop the project if it does not seem promising.
Phase II: MVP
Focuses on Data Preparation, Modeling, and Evaluation from CRISP-DM. MVP = Minimum Viable Product
This phase’s objective is to test 1) whether the data scientists can sufficiently model the question, and 2) whether the business finds enough value in the model output. To accomplish this, projects typically loop through a series of steps multiple times, with each iteration doing an analysis, using the analysis, and then assessing whether the analysis was useful (similar to SKI’s create, observe, analyze steps). For example, the team might:
- Train and test a scaled-down model that focuses on the most important subset area of stakeholder interest
- Design an experiment by setting up treatment and control groups
- Score the new data sets and deliver the results as simply as possible, such as by manually loading a file into a database
- Conduct A/B testing, ideally in the wild on a limited set of real customers and systems
- Collect and analyze the results
If the data scientists can demonstrate the effectiveness of their model and the stakeholders agree to use the output to inform business decisions, then proceed to the Future Phases. Alternatively, the team might loop back for additional iterations, or postpone or cancel further analysis if the project no longer looks feasible or worthwhile.
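The experiment steps in this phase (set up groups, run a limited test, analyze the results) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a prescribed method: customers are split at random, and churn rates are compared with a two-sided two-proportion z-test, one common choice among several. The function names are hypothetical and the churn counts are made-up numbers used only to exercise the code.

```python
import math
import random


def assign_groups(customer_ids, treatment_share=0.5, seed=42):
    """Randomly split customers into a treatment group and a control group."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * treatment_share)
    return shuffled[:cut], shuffled[cut:]


def two_proportion_z_test(churned_a, n_a, churned_b, n_b):
    """Two-sided z-test for a difference between two churn rates."""
    p_a, p_b = churned_a / n_a, churned_b / n_b
    pooled = (churned_a + churned_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("churn rates are all 0 or all 1; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative only: 1,000 hypothetical customer IDs and invented churn counts.
treatment, control = assign_groups(range(1000))
z, p_value = two_proportion_z_test(
    churned_a=40, n_a=len(treatment),  # treatment group received the retention offer
    churned_b=65, n_b=len(control),    # control group did not
)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A negative `z` here means the treatment group churned less than the control group. Whether the difference is large enough to act on remains a business decision for the stakeholders, which is the second question this phase is meant to answer.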
Future Phases
At the start of a project, avoid calling this Phase III because, in reality, once the core model is shown to be of value, the project likely opens into numerous paths and numerous phases. As you exit Phase II, define the next project phase. Common options, in no particular order, are:
- Automate the system (Deployment from CRISP-DM)
- Automate reporting and visualization (Deployment and Evaluation)
- Improve the existing model or extend the model to broader use cases (which might kick off a whole new cycle of analysis)
Key Items to Communicate to the Stakeholders
It is important that the stakeholders understand that the diagram is a simplification of the real process. For example, they should understand that:
- The diagram is not static. Rather, project progression will shift based on both the lessons learned from the data and on changing stakeholder needs.
- Each iteration might spawn new analyses (or desired features). For simplicity, only a few key potential iterations were discussed.
- Checkpoints are built-in with each deliverable. For simplicity, only two major ones are called out.
Define upfront your best guess as to how the project may progress, communicate this visually and in conversation, and update the project roadmap according to changing business needs and lessons learned along the way.