Data Science Process Choices

Data Science Process Frameworks

Most IT project management sites focus on software engineering. To our knowledge, there is no comprehensive guide focused on data science.

Drawing on interviews, our industry experience, and third-party sources, we have developed a project management guide dedicated to data science. Its goal is to give practitioners a better understanding of the project management methodologies that data science teams use and how these might apply to their own teams.

The core content describes the most common approaches for data science, evaluates their fit, and provides recommendations for their use.

For each approach: Description, Strengths, Challenges, Best For

Traditional Approaches

Ad Hoc
Description:
• Just do it!
Strengths:
• Quick to get started
Challenges:
• Not scalable
• High re-work risk
• Difficult for teams
Best For:
• Small, one-off projects done by one person (Rarely a fit for data science)

Waterfall
Description:
• Set your plan up-front in detail, lock it in, and follow your plan
Strengths:
• Simple, well-organized, and easily understood
• Matches traditional corporate culture
Challenges:
• Inflexible
• Not suitable for data discovery processes
• Delayed testing phases increase risk
• Heavy documentation
Best For:
• When requirements and technology are known and aren’t likely to change (Rarely a fit for data science)

CRISP-DM
Description:
• Break data science process into six iterative phases
Strengths:
• Natural process for data science
• Easy to use
• “De facto” process w/ long track record
Challenges:
• Does not prescribe teamwork processes
• No update since 1990s
• Phased approach like waterfall
Best For:
• Individuals or small teams
• Teams looking for an established practice
• Use with agile processes

Agile Approaches

Scrum
Description:
• Develop potentially shippable increments during short, iterative cycles
• Empower teams
Strengths:
• Adaptive
• Strong customer feedback loop
• Builds sense of team ownership
Challenges:
• Challenges cultural norms
• Adhering to sprint time-boxing
• Challenging to implement
Best For:
• Agile teams who need discipline provided by fixed time cycles
• Radical innovation cultures

Kanban
Description:
• Visualize workflow
• Decrease cycle times and work in progress
• Implement small, continuous changes
Strengths:
• Very flexible
• Easy to use
• Improves coordination
Challenges:
• Does not prescribe customer interaction
• Kanban columns tricky for data science
Best For:
• Teams transitioning to agile
• Process-oriented teams who don’t need many prescribed practices

Hybrid Approaches

Bimodal: Waterfall-agile
Description:
• Combine best practices from waterfall and agile
Strengths:
• Can be tailored to specific team needs
Challenges:
• Often poorly implemented
• Negative reputation
Best For:
• Specific situations like highly-regulated projects that require some waterfall elements

Research & Development
Description:
• Treat data science as “research”
• Once problem is understood, then transition to “development”
Strengths:
• Does not try to force a methodology onto data science
• Comfortable for data scientists
Challenges:
• Difficult to monitor
• High trust and discipline required
• Could suffer from being too ad hoc
Best For:
• Mature teams who don’t need heavy oversight
• Research-focused teams needing freedom

Emerging Approaches

TDSP
Description:
• Combine CRISP-DM and Scrum practices and tailor to data science
Strengths:
• Comprehensive open-source documentation that defines processes, templates, and team roles
Challenges:
• Not yet publicly vetted with success track record
Best For:
• Medium-large projects whose teams seek a well-defined process to follow

Domino Data Lab
Description:
• Combine CRISP-DM and general agile practices and tailor to data science
Strengths:
• Similar to TDSP but with less role and template definition
Challenges:
• Not yet publicly vetted with success track record
Best For:
• Medium-large projects whose teams seek a process flow without full definition

Training and Certification

Data Science Process Alliance

Combining data science process research with industry-leading agile practices, the Data Science Process Alliance is the leading data science process membership, training and certification organization.