The Domino Data Science Life Cycle is a modern life cycle approach. Domino Data Lab, a Silicon Valley vendor that provides a data science platform, crafted its data science project life cycle framework in a 2017 whitepaper. The paper wraps its life cycle around goals, challenges, diagnoses, system recommendations, and role definitions.
We’ll focus on the core life cycle outlined by Domino which:
- integrates agile principles with data science
- acknowledge the multiple team roles of a data science project
- extends the core data and modeling phases to first focus on the business problem and to finish with deployment or even operations
Data Science Life Cycle Training
There are several life cycles for data science. Knowing which one to use and how to integrate it with a collaboration framework can be challenging. Master your overall data science projects with our consulting services.
Overall Life Cycle Principles
Domino’s data science life cycle is founded on three guiding principles:
- “Expect and embrace iteration” but “prevent iterations from meaningfully delaying projects, or distracting them from the goal at hand”
- “Enable compounding collaboration” by creating components that are reusable in other projects
- “Anticipate auditability needs” and “preserve all relevant artifacts associated with the development and deployment of a model”
Six Stages of the Methodology
The core life cycle itself splits a project into six iterative stages that mirror those of CRISP-DM.
Domino Data Lab’s Life Cycle Methodology. See the last page of their whitepaper for a full-scale version.
I: Ideation
The initial phase puts the “problem first, not data first” by defining the underlying business problem and conducting business analysis activities such as current state process mapping, project ROI analysis, and upfront documentation.
It also incorporates common agile practices including developing a stakeholder-driven backlog and creating deliverable mockups. IT and engineering are looped in early and models might be baselined with synthetic data. The phase ends with a project kick-off.
Ideation mirrors the business understanding phase from CRISP-DM.
II: Data Acquisition and Exploration
Data science teams should identify data sources with help from stakeholders who can provide leads based on their intuition. Decisions are made to capture data or buy data from vendors. Exploratory data analysis is conducted, and the data is prepared for both the current project modeling and as re-usable components for future projects.
This phase incorporates many elements from the data understanding and data preparation phases of CRISP-DM.
III: Research and Development
Similar to the core modeling phase of CRISP-DM or any other data science process, this phase iterates through hypothesis generations, experimentation, and insight delivery.
True to agile principles, Domino recommends starting with simple models, setting a cadence for insight deliveries, tracking business KPIs, and establishing standard hardware and software configurations.
IV: Validation
This phase focuses on both business and technical validations and loosely mirrors the evaluation phase from CRISP-DM. True to its principle to “enable compounding collaboration”, Domino stresses the importance of ensuring reproducibility of results, automated validation checks, and documentation. The main goal of this phase is to “ultimately receiving sign-off from stakeholders”.
V: Delivery
This is when models become products. Deployment, A/B testing, test infrastructure, and user acceptance testing, similar to those of any software project, are in this phase. Domino recommends additional considerations such as preserving links between deliverable artifacts, flagging dependencies, and developing a monitoring and training plan.
The deployment phase of CRISP-DM is split between this phase and the last one.
VI: Monitoring
Given models’ non-deterministic nature, Domino recommends monitoring techniques that extend beyond standard software monitoring practices. For example, consider using control groups in production models so that you can continually monitor model performance and value creation to the organization. Moreover, automatic monitoring of acceptable output ranges can help identify model issues before they become too pervasive.
Evaluation and Comparison
Domino Life Cycle vs. CRISP-DM
Presented in a practical 25-page whitepaper as a series of coordinated best practices, Domino’s data science life cycle is significantly easier to read than CRISP-DM’s small-print 76 page guide.
Domino’s model is not as prescriptive as CRISP-DM in defining a technical methodology that spells out each individual step but rather is more informative to guide a team toward better performance.
It incorporates a team-based approach that overcomes one of the major shortcomings of CRISP-DM that implicitly assumes the project is executed by an individual or small team.
Moreover, its more modern view (2017) provides several guidelines that were not conceptualized in CRISP-DM (1999). Most notably, it leverages several agile practices such as short iterative deliveries, close stakeholder management, and a product backlog that are commonplace today.
Domino Life Cycle vs. TDSP
Perhaps the most well-known modern data science life cycle is Microsoft’s Team Data Science Process.
Compared to Microsoft, Domino does not provide the sets of re-usable templates and various artifacts that are on Microsoft’s Git Hub pages. However, this may not be a major factor as most teams would have to customize Microsoft’s artifacts anyway.
More importantly, Domino takes a more comprehensive approach in dedicating the end of its life cycle toward operations focus which is mentioned but not extensively detailed in Microsoft’s life cycle.
Domino’s life cycle should not be viewed as mutually exclusive with CRISP-DM or TDSP. Rather its “best practices” approach with “a la carte” elements could augment these or other methodologies as opposed to replace them.
Final Thoughts
Ad hoc teams or other teams with broken project management processes could use Domino’s approach as a good starting point to conceptualize a comprehensive modern project management methodology that effectively integrates data science, software engineering, and agile approaches.
However, the guide provides several straightforward recommendations that any team could benefit from.
And whether you adopt a Domino-inspired life cycle or another life cycle, be sure to an agile collaboration approach so that you can effectively manage the overall data science process.
Learn More
Lessons from 20 Data Science Team: I interviewed Mac Steele from Domino’s product team who authored much of the whitepaper. Learn his insights in this post.
Data Science Methodology Guide: There are a lot options for crafting a data science process. Explore this guide as a jumping point to understand various options and how to combine them.