The field of data science has matured greatly in the past decade. And yet, teams often struggle to apply an appropriate data science methodology and team-based collaboration framework. Consider the following three issues:
Ad hoc processes focus on delivering a specific implementation without concern for broader impact or repeatable processes. In short, you can just “wing it”.
This approach may work well for one-off, smaller, and low-impact projects. Think of a toy side project or an academic exercise.
Yet the appropriate use cases for ad hoc processes in the real world are becoming less frequent. Unfortunately, many people still just resort to ad hoc processes.
Approach | Description | Strengths | Challenges | Best For… |
---|---|---|---|---|
Ad Hoc | Just do it! | • Quick to start | • Not scalable • High rework risk • Difficult for teams | • Small one-off project by one person |
A data science life cycle (also known as a data science methodology) describes the step-by-step approach you take to deliver a project. Data scientists intuitively understand these steps, even if they have not explicitly studied the various methodologies. Documenting the steps can help increase repeatability and prevent you from forgetting one. This is increasingly important in a world of distributed teams that extend beyond data science to areas such as legal or business.
There are dozens of different defined data science methodologies. This guide explores the most well-known.
Approach | Description | Strengths | Challenges | Best For… |
---|---|---|---|---|
Waterfall | Plan your work. Work your plan | • Easily understood • Matches traditional corporate culture | • Inflexible • Delays testing • Documentation heavy | • Avoid for data science |
KDD | 5 Phases from Selection to Evaluation | • Decent explanation of core data mining technical project | • Outdated • Ignores teams • Many of the same shortcomings as Waterfall • Ignores business understanding & deployment | • “Toy” projects with a well-defined scope that don’t need to be productized |
SEMMA | 5 Phases from Sample to Assess | • Decent explanation of core data mining technical project | • Outdated • Ignores teams • Many of the same shortcomings as Waterfall • Ignores business understanding & deployment | • “Toy” projects with a well-defined scope that don’t need to be productized |
CRISP-DM | 6 Phases from Business Understanding to Deployment | • Well-known • More comprehensive than KDD, SEMMA • Defined guide | • Outdated • Ignores teams • Many same shortcomings as Waterfall | • Teams looking for an established practice |
TDSP | Combines CRISP-DM and Scrum practices | • Comprehensive open-source documentation • Includes Agile concepts • Strong team focus | | • Teams looking to “modernize” CRISP-DM |
Domino | Combines CRISP-DM and Agile practices | • Visual roadmap with clear flow and decision points • Includes practical tips | • More of a concept as opposed to a fully vetted approach | • Teams looking to “modernize” CRISP-DM |
Others | Lesser-known life cycles | • Each includes a novel viewpoint | • Not well-known or vetted | • Good “food for thought” |
Agility has taken over the software engineering world. Yet it gets mixed reviews in data science.
However, Agile and data science should go hand-in-hand. Don’t focus too much on the specific approach. Instead, start with the fundamental principles your team aspires to. From there, build a framework on top of them that defines how you can sustain team collaboration while staying flexible enough to shift the project’s focus.
Here are three Agile frameworks that you can consider. Kanban is borrowed from manufacturing, Scrum comes from software, and Data Driven Scrum was designed specifically for data science.
Approach | Description | Strengths | Challenges | Best For… |
---|---|---|---|---|
Kanban | Visualize flow. Minimize work-in-progress. | • Simple • Combines well with other frameworks • Maximizes throughput • Minimizes waste | • Least definitive • Lots of ambiguity | • Starting with a solid core set of principles and building a framework on top of it |
Scrum | Well-known Agile approach focused on fixed-length iterations | • Quick, incremental value focus • Well-defined feedback loop • Strong team focus | • Time-boxing can be restrictive • Often poorly implemented • Management might get in the way | • Agile teams who need discipline provided by fixed time cycles • Radical innovation cultures |
Data Driven Scrum | Agile framework specifically designed for data science teams | • Most of the same benefits as Scrum and Kanban • Caters to experimentation • Relaxes Scrum pain points | • Not as vetted as Scrum • Adds challenges of managing concurrent iterations | • Teams with strong experimental culture • Data science teams that struggled with Scrum |
The reality is that you can mix and match various approaches to design a comprehensive methodology that best suits your team, projects, and organizational needs.
This guide highlights two such hybrid approaches — each serving different use cases.
Approach | Description | Strengths | Challenges | Best For… |
---|---|---|---|---|
Waterfall-Agile | Attempts to combine best of Agile and waterfall | • Allows for some flexibility while catering to broader constraints | • “Best of both worlds” can water down the advantages from either | • Highly-regulated projects that require rigid administrative processes |
Research & Development | Combines open-research phases followed by structured development | • Gives flexibility for open-ended research • Adds structure when needed to coordinate deliverables | • Difficult to monitor • Can suffer from ad hoc chaos • High trust and discipline required | • Mature teams who don’t need heavy oversight • Research-focused teams needing freedom |