Team Data Science Process (TDSP)

What is TDSP?

If you combine Scrum and CRISP-DM, you would get something that looks like Microsoft’s Team Data Science Process. Launched in 2016, TDSP is “an agile, iterative data science methodology to deliver predictive analytics solutions and intelligent applications efficiently.” (Microsoft, 2020 ).

This is a modern data science life cycle that incorporates the team aspect of executing projects.

“TDSP helps improve team collaboration and learning. It contains a distillation of the best practices and structures from Microsoft and others in the industry that facilitate the successful implementation of data science initiatives.”Microsoft, 2020

Combining modern software and agile practices with the data science lifecycle, TDSP is comprehensive with four major components:

  • A data science lifecycle definition
  • A standardized project structure
  • Recommended infrastructure and resources
  • Recommended tools and utilities

It comes as no surprise that these two often leverage Microsoft Azure; however, a team could use other tech stacks and still adhere to TDSP.

TDSP Training

If you are interested in learning how to use TDSP (and other frameworks) to deliver data science projects, explore the individual and corporate training options at the Data Science Process Alliance.

TDSP Life Cycle

Microsoft Team Data Science Process
Team Data Science Lifecycle (Based on Microsoft, 2020)

Although the lifecycle graphic looks quite different,  TDSP’s project lifecycle is like CRISP-DM and includes five iterative stages:

  1. Business Understanding: define objectives and identify data sources
  2. Data Acquisition and Understanding: ingest data and determine if it can answer the presenting question (effectively combines Data Understanding and Data Cleaning from CRISP-DM)
  3. Modeling: feature engineering and model training (combines Modeling and Evaluation)
  4. Deployment: deploy into a production environment
  5. Customer Acceptance: customer validation if the system meets business needs (a phase not explicitly covered by CRISP-DM)

Team Definition

TDSP addresses the weakness of CRISP-DM’s lack of team definition by defining six roles:

  • Solution architect
  • Project manager
  • Data engineer
  • Data scientist
  • Application developer
  • Project lead

Microsoft defines the relevant tasks and artifacts for many of the team roles during each phase of the project life cycle.

Standardized Resources

Microsoft also provides standardized project documents such as project charters and data reports, infrastructure and resources for data science projects, and tools and utilities for project execution. The use of some of these artifacts is also mapped to the five phases.

Evaluation

Pros

  • Agile: Emphasizes the need for incremental deliverables.
  • Familiar: The product backlog, features, user stories, bugs, Git versioning, and sprint planning are familiar to those used to common software practices.
  • Data Science Native: TDSP acknowledges that data science and software engineering are different, and is built for data science teams working on production-bound projects.
  • Flexible: TDSP can be implemented as it is defined or in conjunction with other approaches such as CRISP-DM.
  • Thorough: Because of its rich team focus and detailed documentation, TDSP is arguably the most mature CRISP-derived project management approach. It is conceptually similar to Domino Data Lab’s Lifecycle but is more detailed.
  • Free Templates: Go to Microsoft Azure’s GitHub repository to get started.

Cons

  • Fixed Sprints: TDSP leverages fixed-length planning sprints which many data scientists struggle with.
  • Some Inconsistencies: Not all of Microsoft’s documentation is consistent.

Bottom Line

TDSP is a good option for data science teams who aspire to deliver production-level data science products. It may not be appropriate for one-team data scientists or for projects without a production goal.

Learn More

55 Min Video from Microsoft Machine Learning and Data 2016 Science Conference

<Previous: Emerging Approaches | Next: Domino Data Lab’s Lifecycle Methodology >

References

Become a Data Science Leader

Master the skills and gain the confidence to deliver data science projects and to lead data teams. Grow with the Data Science Process Alliance’s consulting and certification programs.