Team Data Science Process (TDSP)

What is TDSP?

If you combine Scrum and CRISP-DM, you would get something that looks like Microsoft’s Team Data Science Process. Launched in 2016, TDSP is “an agile, iterative data science methodology to deliver predictive analytics solutions and intelligent applications efficiently.” (Microsoft, 2020 ).

Microsoft explains that “TDSP helps improve team collaboration and learning. It contains a distillation of the best practices and structures from Microsoft and others in the industry that facilitate the successful implementation of data science initiatives.”

Combining modern software and agile practices with the data science lifecycle, TDSP is comprehensive with four major components:

  • A data science lifecycle definition
  • A standardized project structure
  • Recommended infrastructure and resources
  • Recommended tools and utilities

It comes as no surprise that these two often leverage Microsoft Azure; however, a team could use other tech stacks and still adhere to TDSP.

Project Lifecycle

Microsoft Team Data Science Process (TDSP)
Microsoft Team Data Science Lifecycle (Microsoft, 2020)

Although the lifecycle graphic looks quite different,  TDSP’s project lifecycle is like CRISP-DM and includes five iterative stages:

  1. Business Understanding: define objectives and identify data sources
  2. Data Acquisition and Understanding: ingest data and determine if it can answer the presenting question (effectively combines Data Understanding and Data Cleaning from CRISP-DM)
  3. Modeling: feature engineering and model training (combines Modeling and Evaluation)
  4. Deployment: deploy into a production environment
  5. Customer Acceptance: customer validation if the system meets business needs (a phase not explicitly covered by CRISP-DM)

Team Definition

TDSP addresses the weakness of CRISP-DM’s lack of team definition by defining six roles:

  • Solution architect
  • Project manager
  • Data engineer
  • Data scientist
  • Application developer
  • Project lead

Microsoft defines the relevant tasks and artifacts for many of these roles during each phase of the project lifecycle (Microsoft, 2020):

TDSP - Data Science Teams
Team Roles mapped to each lifecycle phase (Microsoft, 2020). See larger version on Microsoft site

Standardized Resources

Microsoft also provides standardized project documents such as project charters and data reports, infrastructure and resources for data science projects, and tools and utilities for project execution. The use of some of these artifacts is also mapped to the five phases.

Evaluation

Pros

  • Agile: Emphasizes the need for incremental deliverables.
  • Familiar: The product backlog, features, user stories, bugs, Git versioning, and sprint planning are familiar to those used to common software practices.
  • Data Science Native: TDSP is designed specifically for data science teams working on production-bound projects.
  • Flexible: TDSP can be implemented as it is defined or in conjunction with other approaches such as CRISP-DM.
  • Thorough: Because of its rich team focus and detailed documentation, TDSP is arguably the most mature CRISP-derived project management approach. It is conceptually similar to Domino Data Lab’s Lifecycle but is more detailed.
  • Free Templates: Go to Microsoft Azure’s GitHub repository to get started.

Cons

  • Fixed Sprints: TDSP leverages fixed-length planning sprints which many data scientists struggle with.
  • Some Inconsistencies: Not all of Microsoft’s documentation is consistent. Looks like someone needs to version control it!

Bottom Line

TDSP is a good option for data science teams who aspire to deliver production-level data science products. It may not be appropriate for one-team data scientists or for projects without a production goal.

<Previous: Emerging Approaches | Next: Domino Data Lab’s Lifecycle Methodology >

References