Therefore, to effectively implement data driven agile projects, the Data Science Process Alliance created an alternative framework called DDS (Data Driven Scrum) that is designed with data science in mind.
Data Driven Agile Challenges with Existing Approaches
To be data driven means to let the data inform (or drive) the outcomes. Meanwhile, agility means a lot of different things to a lot of people. But broadly, agility is a set of principles that guides a team to rapidly iterate toward a solution.
Unfortunately, combining these concepts for a “data driven agile” approach is challenging and existing frameworks might not get you there.
In Scrum, the product owner constantly updates a wish list of potential product features called the product backlog. The development team works to deliver top-priority items from the backlog in short, iterative fixed-length time periods called sprints. Meanwhile, the scrum master facilitates the overall process as a servant leader. At the end of each sprint, the work output should be in a usable form. The team demonstrates this at sprint review and conducts a retrospective meeting to improve processes.
One key challenge of using a sprint-based framework within a data science context is that task estimation is unreliable. In other words, if the team cannot accurately estimate task duration (e.g., how long a specific exploratory analysis will take), then the concept of a sprint, and of what can get done within a sprint, is problematic.
Another key challenge is that Scrum’s fixed-length sprints can be problematic. Even if a team could estimate how long a specific analysis might take, a fixed-length sprint might force the team to define an iteration that includes unrelated work items. It might also delay the feedback from an exploratory analysis, feedback that could help prioritize new work. In short, a sprint does not allow smaller (or longer) logical chunks of work to be completed and analyzed in a coherent fashion.
Kanban is a lightweight set of principles that focuses on making work visible to help facilitate communication and collaboration. Kanban’s central focus is a highly visual Kanban board, which represents each life cycle phase as a column. The team works to ensure that too many items do not pile up in a single column, per their work-in-progress (WIP) limits. WIP limits help identify and resolve bottlenecks, minimize the adverse impacts of task switching, and reduce cycle times.
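To make the WIP-limit idea concrete, here is a minimal Python sketch of a board that refuses to accept an item into a column that is already at its limit. The class name `KanbanBoard`, its methods, and the column names in the usage note are hypothetical and chosen only for illustration; Kanban itself does not prescribe any of them.

```python
from dataclasses import dataclass, field


@dataclass
class KanbanBoard:
    """Toy board (illustrative only): one list of items per column."""

    wip_limits: dict[str, int]  # column name -> maximum items allowed
    columns: dict[str, list[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Start every column that has a limit as an empty list.
        for name in self.wip_limits:
            self.columns.setdefault(name, [])

    def can_pull(self, column: str) -> bool:
        """True if the column is still below its WIP limit."""
        return len(self.columns[column]) < self.wip_limits[column]

    def add(self, column: str, item: str) -> bool:
        """Add an item to a column; refuse if the column is full."""
        if not self.can_pull(column):
            return False
        self.columns[column].append(item)
        return True

    def move(self, item: str, src: str, dst: str) -> bool:
        """Move an item between columns; refuse if dst is at its limit."""
        if item not in self.columns[src] or not self.can_pull(dst):
            return False
        self.columns[src].remove(item)
        self.columns[dst].append(item)
        return True
```

For example, with a limit of one item in a "doing" column, a second `move` into that column returns `False` until the first item moves on, which is how a bottleneck becomes visible on the board.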
Despite these benefits, Kanban also has its challenges. In general, these include a lack of organizational support and culture, a lack of training, and the misunderstanding of key concepts. Specifically, Kanban defines neither project roles nor process specifics. The freedom Kanban provides (such as letting teams define their own process for prioritizing tasks) can be part of the challenge in implementing it. While this lack of process structure can be a strength (since it allows teams to implement Kanban within existing organizational practices), it can also mean that every team implements Kanban differently. In other words, a team that wants to use Kanban needs to figure out its own processes and artifacts.
Data Driven Scrum
Key Tenets of the New Framework
As a starting point, there are three key concepts that we believe should drive a lean agile data science project. In following these tenets, teams will focus on their highest priority tasks (i.e., trying to ensure that the time spent during an iteration goes towards work that was actually required to run the given experiment / iteration) while enabling tasks to be reprioritized as needed.
These three tenets are:
- Agile is intended to be a sequence of iterative experimentation and adaptation cycles.
- The goal of such cycles should be to have an idea or experiment in mind, to build it, then to observe the results, and then to analyze those observations to create the next idea or experiment.
- Going from an initial idea, through implementation, to the analysis of the results should be the basis for an iteration. The completion of the empirical process should mark the end of an iteration (not a predetermined number of elapsed hours).
With these tenets in mind, DDS defines and adheres to the following three principles:
- Allow capability-based iterations – sometimes it makes sense for an iteration to last one day, and other times for an iteration to last three weeks (e.g., due to how long it takes to acquire / clean data or how long an exploratory analysis takes). The goal should be to allow logical chunks of work to be released in a coherent fashion.
- Decouple meetings from an iteration – since an iteration could be very short (e.g., one day for a specific exploratory analysis), meetings (such as a retrospective to improve the team’s process) should be based on a logical time-based window, not linked to each iteration.
- Only require high-level item estimation – in many situations, defining an explicit timeline for an exploratory analysis is difficult, so one should not need to generate accurate detailed task estimations in order to use the framework. However, high-level “T-Shirt” level-of-effort estimates can be helpful for prioritizing the potential tasks to be done.
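As one illustration of how coarse “T-Shirt” estimates might feed prioritization, the Python sketch below ranks backlog items by a rough value-per-effort ratio. Everything here is an assumption made for the example: the `EFFORT` weights, the 1 to 5 value scale, and the ratio itself are not part of DDS, which only asks for high-level estimates.

```python
from dataclasses import dataclass

# Hypothetical relative effort per T-shirt size; the numbers are illustrative.
EFFORT = {"S": 1, "M": 2, "L": 4, "XL": 8}


@dataclass(frozen=True)
class BacklogItem:
    title: str
    value: int  # rough expected value on an assumed 1 (low) to 5 (high) scale
    size: str   # T-shirt effort estimate: "S", "M", "L", or "XL"


def prioritize(items: list[BacklogItem]) -> list[BacklogItem]:
    """Order items by value per unit of rough effort, highest first."""
    return sorted(items, key=lambda i: i.value / EFFORT[i.size], reverse=True)
```

A small, moderately valuable analysis can outrank a large, highly valuable one under this scheme, which matches the intent of pulling the next most worthwhile item without needing a detailed task estimate.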
The DDS Framework
Data Driven Scrum supports lean, iterative, exploratory data science analysis, and acknowledges that iterations will vary in length due to the phase of the project (e.g., collecting data vs creating a machine learning analysis).
DDS defines an agile lean process framework that leverages key concepts of both Scrum and Kanban, but differently than Scrumban. Scrumban is more of Kanban within a Scrum framework and hence implements Scrum sprints, which, as previously noted, introduce several challenges for the project team.
In short, DDS teams use a Kanban-like visual board and focus on working on a specific item or collection of items during an iteration, which is task-based, not time-boxed. Thus, an iteration more closely aligns with the lean concept of pulling tasks, in a prioritized manner, when the team has capacity. Each iteration can be viewed as validating or rejecting a specific lean hypothesis.
Specifically, an iteration is defined by the following three steps:
- Create: A thing or set of things that will be created and put into use, with a hypothesis about what will happen.
- Observe: A set of observable outcomes of that use that will be measured (and any work that is needed to facilitate that measurement).
- Analyze: Analyze those observables and create a plan for the next iteration.
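The three steps above can be sketched as a small Python data structure in which an iteration is complete once the analysis has produced a plan, not when a fixed amount of time has elapsed. The `Iteration` class and its field names are hypothetical and are not taken from the DDS definition.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Iteration:
    """Toy model of one create / observe / analyze cycle (illustrative only)."""

    hypothesis: str
    created: bool = False
    observations: list[float] = field(default_factory=list)
    next_plan: Optional[str] = None

    def create(self) -> None:
        # Create: the thing has been built and put into use.
        self.created = True

    def observe(self, measurement: float) -> None:
        # Observe: record a measured outcome of that use.
        if not self.created:
            raise RuntimeError("nothing has been created to observe yet")
        self.observations.append(measurement)

    def analyze(self, plan: str) -> None:
        # Analyze: turn the observations into a plan for the next iteration.
        if not self.observations:
            raise RuntimeError("there are no observations to analyze")
        self.next_plan = plan

    @property
    def is_complete(self) -> bool:
        # Completion depends on finishing the analysis, not on elapsed time.
        return self.next_plan is not None
```

Note that nothing in the sketch tracks a sprint length: the same structure describes a one-day exploratory analysis and a three-week data acquisition effort.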
Finally, DDS has a well-defined set of roles, artifacts, and meetings/events, which are similar to Scrum (but different with respect to meeting timing) and are explained in detail at the DDS web site.
Data Science Process Alliance: As the creator of Data Driven Scrum, the DSPA includes in-depth training of this framework in its:
- What is agile data science?
- Is agile a fit for data science?
- 10 ways to manage data science projects (Part II – Agile)
Other Agile Frameworks: