Data Driven Scrum
Data Driven Scrum™ (DDS) is an agile framework specifically designed for data science teams. In short, DDS aims to improve a data science team’s collaboration and communication.
The Data Science Process Alliance created Data Driven Scrum to address the fact that other well-known agile approaches (such as Scrum and Kanban) often fail to accommodate the unique needs of data science projects.
Achieving Agility Via 3 Key Concepts
There are three key concepts that we believe enable a team to gain the benefits of agility within a data science project. This agility helps to ensure that a team focuses on their highest priority task while also enabling future tasks to be reprioritized as needed. These three agile concepts are:
- Agile is intended to be a sequence of iterative experimentation and adaptation cycles.
- The goal of each cycle should be to have an idea or experiment in mind, to build it, to observe the results, and then to analyze those observations to create the next idea or experiment.
- Going from an initial idea, through implementation, and the analysis of the results should be the basis for an iteration. The completion of the empirical process should mark the end of that iteration (not a predetermined number of elapsed hours).
Using DDS: A High Level Flow of Work
First, DDS teams brainstorm possible questions to answer (or experiments to perform).
Then, the team prioritizes those questions, picking the highest priority item to work on as a team. This includes identifying the data to use and the models that need to be created.
Next, the team collectively interprets the results of the work.
Based on the results, the team deploys the results and prioritizes future work.
Similar to Scrum, there are three key roles in DDS:
- Product Owner: The Product Owner in DDS is the empowered central point of product leadership (“voice of the client”) – the person who decides on the Product Increments, prioritizes which features and functionality to build, the order in which to build them, and what aspects of them to observe and analyze. In short, the Product Owner owns the Backlog and prioritizes its Items, ensuring that each Item is clearly defined, and that the upcoming work and priorities of the team are visible and transparent.
- Process Expert: The Process Expert acts as a coach, facilitator, and impediment remover. The Process Expert also helps the team understand and embrace the DDS values, principles, and practices to aid the organization in obtaining exceptional results from applying DDS.
- DDS Team Members: Similar to Scrum, each DDS team is a group of typically three to nine people. The DDS team is comprised of a cross-functional collection of DDS Team Members (ex. Data Scientists, Software Engineers, …) that have all the skills needed to create artifacts (ex. models) to answer the questions / experiments (i.e., to design, build, test and deploy the desired product). Both the Product Owner and the Process Expert are part of the DDS Team and may contribute to creating, observing and analyzing throughout an iteration. The team self-organizes to determine the best way to accomplish the goal defined by the Product Owner.
DDS has four artifacts that describe the work to be done:
- Item: An Item may take a variety of forms such as “user stories”, “experiments”, or “testable hypotheses” as popularized by XP and Lean. In data science, Items are typically questions that the team needs to answer or hypotheses to evaluate.
- Backlog: The Backlog is a prioritized list of Items (i.e., work to be prioritized).
- Item Breakdown Board: The Item Breakdown Board (IBB) is the place where each Item (in the Backlog) is broken down into tasks. Items on the backlog are broken down into their component tasks prior to being worked on by the team, and each Item has its own IBB. This work to create the IBB is done during backlog refinement. For each item, there should be at least one:
  - Create task
  - Observe task
  - Analyze task
- Task Board: The Task Board is a visual representation of the Item(s) currently in progress. For work on an Item to be started (i.e., being worked on by the team), the tasks for that item are moved from the IBB to the Task Board. These tasks are displayed on the Task Board, typically in the ‘to do’ column. The Task Board has several additional columns (at a minimum, ‘to do’, ‘in progress’, ‘done’) and each task flows across the board, thus visually showing work being done within the team. The team strives to complete the tasks on the Task Board as soon as possible.
The current status of these items is always visually represented on the Task Board, and the iteration is completed when all the tasks for that item are in the ‘done’ column. Note that the Product Owner must agree that the tasks in the ‘done’ column are actually done (a simple way to achieve this is to add another column, ‘confirmed done’, to the Task Board). As with Kanban, to facilitate task throughput, each team defines a maximum number of tasks within a single column, which is known as that column’s Work-In-Progress limit.
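The Item Breakdown Board and Task Board rules above can be sketched in a few lines of Python. This is only an illustration, not part of the DDS definition: the names `Task` and `TaskBoard`, the method names, and the choice to raise an error when a Work-In-Progress limit is hit are all assumptions made for the example.

```python
from dataclasses import dataclass, field

COLUMNS = ("to do", "in progress", "done", "confirmed done")
REQUIRED_KINDS = {"create", "observe", "analyze"}


@dataclass
class Task:
    name: str
    kind: str  # "create", "observe", or "analyze"
    column: str = "to do"


@dataclass
class TaskBoard:
    # Hypothetical per-column WIP limits; columns without an entry are unlimited.
    wip_limits: dict = field(default_factory=dict)
    tasks: list = field(default_factory=list)

    def start_item(self, ibb_tasks):
        """Move an Item's tasks from its IBB into the board's 'to do' column."""
        missing = REQUIRED_KINDS - {t.kind for t in ibb_tasks}
        if missing:
            raise ValueError(f"IBB is missing task kinds: {sorted(missing)}")
        self.tasks.extend(ibb_tasks)

    def move(self, task, column):
        """Move a task to another column, respecting that column's WIP limit."""
        if column not in COLUMNS:
            raise ValueError(f"unknown column: {column}")
        limit = self.wip_limits.get(column)
        in_column = sum(1 for t in self.tasks if t.column == column)
        if limit is not None and in_column >= limit:
            raise RuntimeError(f"WIP limit reached for '{column}'")
        task.column = column

    def iteration_complete(self):
        """True once the Product Owner has confirmed every task as done."""
        return bool(self.tasks) and all(
            t.column == "confirmed done" for t in self.tasks
        )


board = TaskBoard(wip_limits={"in progress": 2})
tasks = [
    Task("train baseline model", "create"),
    Task("score model on test data", "observe"),
    Task("review error patterns", "analyze"),
]
board.start_item(tasks)
board.move(tasks[0], "in progress")
board.move(tasks[1], "in progress")
# Moving a third task into "in progress" would now raise RuntimeError,
# because the assumed limit for that column is 2.
```

Rejecting the move outright is just one way to model a limit; a real team might instead treat a full column as a prompt to swarm on the tasks already in it.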
DDS activities define the work to be done by the team. These activities are not time-boxed, but rather, are focused on defining and then executing iterations. Below is an explanation of these concepts:
- Backlog Refinement: In addition to the DDS Team working on one or more iterations, the team also spends time evaluating the Backlog Items so they can be prioritized. This evaluation includes:
  - A relative estimate of the value of the Item when completed
  - A relative estimate of the effort required to complete the Item
  - A relative estimate of the probability of success to create the Item
As part of the refinement process, the team agrees on the relative unit of measure to use for these estimates. An estimate could be a T-shirt size (large, medium, small) or a number representing the relative value, effort and probability of success of the Item. Note that the effort estimate is used to help prioritize backlog items, not to define what is part of an iteration (e.g., if two items deliver the same value but one is deemed a “small” effort and one is a “large” effort, the team might select the smaller-effort item).
While the product owner owns the prioritization process, the other members of the team typically budget 5% to 10% of their total capacity to assist the product owner with backlog refinement (e.g., breaking an item into two smaller, but still useful, items, clarifying or simplifying an item, providing effort estimations, etc.).
- Prioritization of the Backlog: The team explores the Items in their Backlog by providing high level estimates of: (1) the value of the work, (2) the amount of work (team effort), and (3) the probability of success of that work. The Product Owner, with input from the stakeholders and the other team members, is responsible for maintaining the Backlog, which evolves and changes throughout the project. The Product Owner uses this information to prioritize and select the Items to be answered (i.e., what to do during the iteration).
- Iterations: An Iteration is a collection of one or more backlog items. The goal of each Iteration should be to allow a logical chunk of work to be released in a coherent fashion.
Every iteration includes the work to create an artifact that answers a question (ex. a model), the work to observe that artifact (ex. how the model performs on test data) and the team’s analysis of those observations. The information gained from the iteration must have value derived either from the artifact that is created, or the analysis of the task completed.
- Iteration Duration: Each iteration is capability-based (not time-boxed calendar events). Furthermore, each iteration should aim to be a minimally viable set of work that can deliver value and allows the given lean hypothesis to be tested, and should not last more than one month, but can be as short as the team wants (e.g., one day).
An iteration completes when the work required to answer the question has finished (i.e., not a specific date). Each iteration enables teams to quickly answer a question (validate or reject a lean hypothesis), hence iterations facilitate agility. Learning from the current iterations helps prioritize future iterations.
Note that, since the iteration is capability-based and is the minimally viable set of items that can deliver value, answering multiple questions in a single iteration is generally only desirable when the associated hypotheses or observable data overlap. However, the observe task within an iteration might take some time (e.g., data collection of a model or artifact). In this situation, it often makes sense for the next iteration to start (i.e., there can be multiple iterations executing in parallel).
- Product Increments: A high level goal for the team to achieve in a fixed amount of time (ex. 3 months) using multiple iterations is known as a Product Increment. Increments help teams prioritize iterations within the increment and set expectations with clients.
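To make the Backlog Refinement and Prioritization estimates described above concrete, here is a minimal Python sketch that ranks Items by their three relative estimates. DDS does not prescribe a scoring formula, so everything here is an assumption for illustration: the `SIZE` mapping from T-shirt sizes to numbers, the `BacklogItem` class, the heuristic of value times probability of success divided by effort, and the made-up example questions.

```python
from dataclasses import dataclass

# Hypothetical mapping from T-shirt sizes to relative numbers.
SIZE = {"small": 1, "medium": 2, "large": 3}


@dataclass
class BacklogItem:
    question: str
    value: str        # relative value: "small", "medium", or "large"
    effort: str       # relative effort, on the same scale
    p_success: float  # rough probability of success, between 0 and 1

    def score(self):
        # Assumed heuristic: expected value per unit of effort.
        return SIZE[self.value] * self.p_success / SIZE[self.effort]


def prioritize(backlog):
    """Return the Items ordered from highest to lowest score."""
    return sorted(backlog, key=lambda item: item.score(), reverse=True)


# Made-up Items for illustration only.
backlog = [
    BacklogItem("Does adding weather data improve the forecast?", "large", "large", 0.5),
    BacklogItem("Is churn higher for monthly plans?", "large", "small", 0.5),
    BacklogItem("Can customer records be deduplicated automatically?", "medium", "medium", 0.8),
]
for item in prioritize(backlog):
    print(f"{item.score():.2f}  {item.question}")
```

With these inputs, the two “large” value Items differ only in effort, and the small-effort one ranks first. A score like this is only an input to the conversation: the Product Owner still makes the final prioritization decision.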
There are four regularly occurring events (i.e., the events occur on a calendar basis, not based on the completion of an iteration). These events, which are facilitated by the DDS Process Expert, help the team stay coordinated. In short, these events help to plan iterations via backlog item selection, to review iteration results via iteration reviews (and learn for future iterations), to reflect on how to improve a team’s process via retrospectives, and to understand potential roadblocks in the iteration via daily meetings.
These events are described in more detail below:
- Backlog Item Selection: occurs when the team has capacity to start a new iteration (e.g., when a previous iteration has completed, or when the in-progress iteration does not require full-time focus, usually during the “observe” phase). Teams may have multiple iterations in progress simultaneously but should prioritize finishing an in-progress iteration over starting a new one whenever practical. The team reviews the prioritized Backlog Items (that have been updated via refinement) and selects the top Item(s) that will now be the team’s focus.
- Daily Meeting: occurs each workday, when the team meets for a 15-minute inspect-and-adapt activity. An important goal of this meeting is to help a self-organizing team better manage the flow of its work (ex. helping a team member get past an issue). Just as with Scrum Standups, a common approach for conducting this meeting is for team members to share with each other what they did yesterday, what they are planning to do today, and what obstacles they need to overcome.
- Iteration Review: occurs on a regular and repeating basis and is scheduled by the product owner. Reviews might be weekly and are calendar based to account for the fact that there might be several iterations per week, and there would be diminishing returns if iteration reviews occurred on a daily (or more frequently) basis. They would also be logistically difficult to schedule if they were needed on an ad hoc basis. The review is intended to foster conversation about completed functionality and the observations and analysis that the team has generated regarding the performance of the completed iteration(s). Participants include the team, stakeholders, customers, and anyone else interested in the outcome of the project.
A successful review results in bidirectional information flow. The people who aren’t on the team get to sync up on the project effort, the observed product performance, and the team’s analysis of that performance. At the same time, in addition to getting feedback on the currently delivered iteration, the team can get suggestions from the other attendees for potential features, metrics and experiments for future iterations. Furthermore, during this meeting, the group discusses the prioritization of the backlog items (since, for example, the insights gained might suggest a change in item priority or the creation of new items). At the end of the review, the tasks on the board relating to the discussed and now completed item(s) are archived.
- Retrospective: occurs at regular intervals (ex. once a month) and is a time to inspect and adapt the process. In the spirit of continuous improvement, the team comes together to discuss what is and is not working with the current process and associated technical practices. The goal is to help a good DDS team become great. At the end of a retrospective, the team should have identified and committed to a practical number of process improvement actions that will be undertaken by the team going forward.
Conceptual Flow of a DDS Project
The following diagram shows the flow of work during a DDS project
Data Driven Scrum Comparison
How DDS is Different from Traditional Scrum
- Functional (Capability-Based) Iterations
The most important difference between DDS and Scrum is that the Scrum Guide requires all iterations (sprints) to be of equal length in time. However, iterations in DDS vary in duration, so as to allow a logical chunk of work to be done in one iteration (rather than defining the amount of work that can be done in a specific unit of time). In other words, DDS iterations have unknown and varying lengths (as compared to traditional Scrum sprints, which have fixed-time durations) and can be shorter or longer than an “average” iteration (e.g., an iteration might be shorter than normal due to being able to learn from a quick / short experiment).
- Uncertain Task Duration
Since DDS iterations are capability-based, DDS teams are not forced to estimate what can be completed in 1 or 2 weeks. Unlike traditional Scrum (which needs estimations that are accurate enough to know what can fit into a sprint), DDS naturally accommodates tasks that are difficult to estimate, which is common within a data science context. Hence, DDS does not require the team to generate accurate detailed task estimations. However, high-level “T-Shirt” level-of-effort estimates are typically needed for prioritizing the potential tasks to be done.
- Collective Analysis
In many Scrum implementations, observing, analyzing and reacting to feedback is solely the responsibility of the Product Owner, and this part of the Product Owner’s job largely falls outside of the codified process. Yet collecting and analyzing well-chosen data and drawing appropriate conclusions is a crucial part of the process. By building the observing and analyzing steps directly into the core DDS workflow, DDS helps teams make better data-driven decisions. Specifically, the entire team focuses on creating, observing and then analyzing a hypothesis, analysis or feature.
- Decoupling meetings from an iteration
Since an iteration could be very short (ex. one day for a specific exploratory analysis), meetings (such as a retrospective to improve the team’s process) should be based on a calendar-based frequency the team deems appropriate (i.e., not linked to the end of each iteration as is done in traditional Scrum). In short, retrospectives and item reviews are not done at the end of every iteration, but rather, on a frequency the team deems appropriate.
- Overlapping Iterations are much more common in DDS
An observe task might take time (ex. data collection). Since DDS has capability-based iterations, it is easy to start the next iteration while waiting (and to pause that next iteration when the observe task has completed).
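The overlapping-iteration behavior can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not a DDS rule set: the `Iteration` class, its `waiting` flag (true while an observe task only needs elapsed time, such as data collection), and the `next_focus` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Iteration:
    item: str
    phase: str             # "create", "observe", "analyze", or "done"
    waiting: bool = False  # True while an observe task only needs elapsed time


def next_focus(iterations):
    """Pick the iteration to work on now, preferring the oldest one.

    `iterations` is assumed to be ordered oldest first. Returns None when
    every unfinished iteration is waiting, meaning the team has capacity
    to select a new Backlog Item.
    """
    for iteration in iterations:
        if iteration.phase != "done" and not iteration.waiting:
            return iteration
    return None


first = Iteration("forecast accuracy", "observe", waiting=True)
second = Iteration("churn by plan type", "create")

# While the first iteration is collecting data, the team works on the second.
assert next_focus([first, second]) is second

# Once the observations are in, the older iteration takes priority again
# and the second one is effectively paused.
first.phase, first.waiting = "analyze", False
assert next_focus([first, second]) is first
```

Preferring the oldest unfinished iteration reflects the guidance that teams should finish in-progress work before starting something new whenever practical.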
How DDS builds on Traditional Scrum
Just like traditional Scrum, each DDS team is a group of up to about ten people, one of whom is the product owner, and one of whom is the process expert.
Just as in traditional Scrum, there is a daily stand-up, as well as Iteration Reviews and Retrospectives.
- Process to create and prioritize Items
Just like traditional Scrum, items are created, prioritized and viewed on a task board.
- Focus on Agility
Both DDS and Scrum focus on enabling a team to be agile via small iterations.
How DDS Builds on Kanban
While there is no one official Kanban guide, teams that use Kanban typically follow two key Kanban principles. DDS adheres to these key Kanban principles:
- Visualize workflow: See the work on a board, with cards representing the work to be done and in progress, via the use of board columns (such as “to do”, “doing” and “done”).
- Limit work in progress (WIP): Set a limit on how much work can be in progress at one time in each column (in other words, how many tasks can be in each column at a given time). This ensures that cards are moving smoothly across the board as and when the team are ready for them.