Data Driven Scrum
Well-known agile frameworks often fail to accommodate the unique needs of data science projects.
However, Data Driven Scrum™ (DDS) is an agile framework specifically designed for data science teams. DDS provides a continuous flow framework for agile data science by integrating the structure of Scrum with the continuous flow of Kanban.
What is Data Driven Scrum?
This data-driven agile approach combines many agile principles with the data science life cycle. Specifically, it leverages many of the same underlying practices as Scrum and Kanban and applies them to support data science teams.
DDS can be viewed as a specific instantiation of Scrum with two notable exceptions:
- The most important exception is that the Scrum Guide requires all iterations (sprints) to be of equal length in time. However, iterations in DDS vary in duration to allow a logical increment of work to be done in one iteration (rather than defining the amount of work that can be done in a specific unit of time).
- The other notable exception is that retrospectives and item reviews are not done at the end of every iteration, but rather, on a frequency the team deems appropriate.
DDS also adheres to the Kanban principles (e.g., there is a Kanban board, teams need to limit WIP, and work items flow across the board). However, the framework provides more structure than Kanban defines, such as explicit iterations as well as defined roles and meetings. A clearly defined process that leverages agile best practices enables teams to implement the process in a consistent, repeatable manner.
The DDS Guide is the definitive guide to DDS. However, below we’ll explore the key concepts of the DDS framework.
DDS Training for Data Science Teams
If you are interested in learning how to use Data Driven Scrum (or Scrum or Kanban) to deliver data science projects, explore individual training and corporate consulting options through the Data Science Process Alliance.
Key Tenets of Data Driven Scrum
- Agile is Iterative Experimentation
Agile is intended to be a sequence of iterative experimentation and adaptation cycles.
- Iterations are Capacity-Based
Teams work iteratively on a given set of items until they are done (no inflexible deadlines).
- Focus on Create, Observe, Analyze
Each iteration always follows three core steps: Create something, observe its performance, and analyze the results.
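The three core steps can be sketched in code. This is an illustrative sketch only, not part of the DDS Guide: the function names, the stubbed model score, and the success threshold are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Experiment:
    """A backlog item framed as a testable hypothesis."""
    hypothesis: str
    results: dict = field(default_factory=dict)

def create(exp):
    # "Create" step: build the artifact (here, a stubbed model score)
    return {"model_score": 0.72}

def observe(artifact):
    # "Observe" step: record how the artifact behaved
    return {"observed_score": artifact["model_score"]}

def analyze(exp, observations):
    # "Analyze" step: record the results and decide whether the hypothesis held
    exp.results = observations
    return observations["observed_score"] > 0.5

def run_iteration(exp):
    """One DDS iteration: create, observe, then analyze."""
    return analyze(exp, observe(create(exp)))
```

The point of the structure is that every iteration ends with an analysis step, whose output feeds the next experiment rather than a fixed sprint plan.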
- Easily Integrate with Scrum
DDS’s interfaces can be seamlessly integrated within a traditional Scrum-based organization.
How Data Driven Scrum Differs from Traditional Scrum
- Functional Iterations
DDS iterations vary in length (as compared to traditional Scrum sprints, which have fixed durations). This allows an iteration to be shorter or longer than average when that makes sense (e.g., an iteration might be shorter than normal because the team could learn from a quick, small experiment).
- Uncertain Task Duration
Unlike traditional Scrum (which requires accurate task estimations to know what can fit into a sprint), DDS naturally accommodates tasks that are difficult to estimate (and task estimation is often difficult within a data science context).
- Collective Analysis
The entire team focuses on creating, observing, and then analyzing a hypothesis, analysis, or feature (in traditional Scrum, this analysis is often done by the product owner outside of the codified process).
- Iteration-Independent Meetings
Retrospectives and item reviews are not done at the end of every iteration (as is done in traditional Scrum), but rather on a calendar-based frequency the team deems appropriate.
Similarities with Traditional Scrum
- Similar Roles
Just like traditional Scrum, each DDS team is a group of up to about ten people, one of whom is the product owner, and one of whom is the process expert.
- Similar Events
Just as in traditional Scrum, there is a daily stand-up, as well as Iteration and Retrospective Reviews.
- Similar Process to create and prioritize Items
Just like traditional Scrum, items are created, prioritized and viewed on a task board.
An Example Project
A Data Science team was working on a project to analyze a large data set of customer survey responses for a client. The initial requirements for the project were very high level. Specifically, the team had a goal of “helping the management team understand the customer surveys and what drives customer satisfaction.” Hence, the team had to refine their goals (requirements) as they incrementally understood the data and what might be possible in terms of actionable insights generated via data analytics.
To do the analysis, the team was required to leverage many typical data science techniques, such as descriptive statistics, machine learning algorithms and geographic information analysis.
The team comprised a product owner, a Process Expert, and four DDS team members. The Process Expert and the product owner worked part-time on the team. As expected, during the project, the Process Expert helped the team adhere to the DDS framework.
The DDS team worked collectively to determine:
- What specifically needed to be done during an iteration?
- What data should specifically be observed and analyzed?
- What would be required to collect and analyze the information generated from that iteration?
When grooming and prioritizing, the team estimated how much effort was required to run a specific experiment (i.e., perform one cycle of create, observe and then analyze). This estimation was done at a high level (with high, medium and low estimates). Then, during product backlog selection, the team collectively reviewed their product backlog items to come up with a specific experiment to run.
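A coarse sketch of this kind of backlog is shown below. The item names, priorities, and effort labels are hypothetical examples, not the team's actual backlog; the point is that DDS uses high/medium/low effort estimates to inform discussion, not to fill a fixed-length sprint.

```python
# Hypothetical product backlog with coarse (high / medium / low) effort labels.
backlog = [
    {"item": "satisfaction by age",     "priority": 1, "effort": "medium"},
    {"item": "satisfaction by loyalty", "priority": 2, "effort": "low"},
    {"item": "geographic satisfaction", "priority": 3, "effort": "high"},
]

def select_next_experiment(backlog):
    """Pick the highest-priority item; effort informs the discussion, not a deadline."""
    return min(backlog, key=lambda item: item["priority"])
```

Selection is driven by priority; the effort label simply tells the team roughly how large the resulting create-observe-analyze cycle will be.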
An example item on the team’s product backlog was to explore customer satisfaction by age. This task was broken down to explore age via overall customer satisfaction, as well as satisfaction by geography (e.g., per each state in the United States). The team determined that the item required four tasks on the board:
- Two related to data munging
- One to calculate customer satisfaction across different loyalty levels by age
- One to explore customer satisfaction by age from a geographic basis.
This experiment (item) was prioritized as important because the team hypothesized that age might be an important characteristic of customer satisfaction.
Furthermore, based on previous experiments (iterations), loyalty level had been deemed potentially interesting. Once it was clear how the team would create, observe, and analyze their experiment, the team began the iteration.
During this iteration (and all other iterations), the team’s board had four columns: “to do”, “in progress”, “validate”, and “done”. The team used these columns because they believed every task should be explicitly validated before being marked done. Each day, the team held their daily standup to identify issues and roadblocks. Note that, due to a variety of logistical issues, this was not always a face-to-face meeting. This specific iteration took 1.5 days.
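The board described above can be sketched as a simple data structure. This is an illustrative sketch: the WIP limit value and the task names are assumed examples (DDS adopts Kanban's limit-WIP principle, but the article does not state the team's actual limit).

```python
# Board with the team's four columns; tasks flow left to right.
board = {col: [] for col in ("to do", "in progress", "validate", "done")}

WIP_LIMITS = {"in progress": 3}  # assumed example limit, per Kanban practice

def move(board, task, src, dst):
    """Move a task between columns, honoring any WIP limit on the destination."""
    limit = WIP_LIMITS.get(dst)
    if limit is not None and len(board[dst]) >= limit:
        raise ValueError(f"WIP limit reached for '{dst}'")
    board[src].remove(task)
    board[dst].append(task)
```

Requiring every task to pass through "validate" before "done" encodes the team's decision that validation applies to all tasks, not just some.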
Iteration Review Meeting
The team had agreed to hold an iteration review meeting every two weeks. At the next review meeting after the iteration was completed, the team discussed their findings and reached consensus on possible next experiments, which were then added to the product backlog. The team also discussed the results of another iteration, which had taken seven days to complete.
Results from Retrospective
For this team, retrospectives occurred monthly. The team collectively agreed that, to make clear whether a task was focused on create, observe, or analyze, task types would be explicitly color-coded in future iterations.