The Need for a New Agile Framework
Scrum Challenges: One key challenge of using a sprint-based framework in a data science context is that task estimation is unreliable. If the team cannot accurately estimate task duration (e.g., how long a specific exploratory analysis will take), then the concept of a sprint, and of what can get done within a sprint, becomes problematic. Another key challenge is that Scrum’s fixed-length sprints can force unnatural groupings of work: even if a team could estimate how long a specific analysis might take, a fixed-length sprint might force the team to pad an iteration with unrelated work items, and it can delay the feedback from an exploratory analysis that could help prioritize new work. In short, a sprint does not allow smaller (or longer) logical chunks of work to be completed and analyzed in a coherent fashion.
Kanban Challenges: Despite its benefits, Kanban also presents challenges. In general, these include a lack of organizational support and culture, a lack of training, and the misunderstanding of key concepts. More specifically, Kanban does not define project roles or any process specifics, and the freedom this provides can itself be part of the challenge in implementing Kanban. While this lack of process structure can be a strength (since it allows teams to implement Kanban within existing organizational practices), it also means that every team could implement Kanban in a different way. In other words, a team that wants to use Kanban needs to figure out its own processes and artifacts.
TDSP: While potentially useful for some data science projects, newer frameworks such as TDSP leverage the concept of Scrum sprints; as such, Scrum’s sprint challenges apply to TDSP as well. In addition, since TDSP also follows a CRISP-DM-like approach, these types of frameworks focus on providing an overall set of guidelines for what should be done during a data science project, but not a framework for how to work through the different tasks (e.g., when a team should “loop back”). Thus, TDSP and similar approaches do not address how a team should iterate through a series of analyses (experiments) to better understand the data and provide actionable insight to their client.
Key Tenets of the New Framework
As a starting point, there are three key concepts that should drive a lean agile data science project:
1. Agile is intended to be a sequence of iterative experimentation and adaptation cycles.
2. The goal of such cycles should be to have an idea or experiment in mind, to build it, to observe the results, and then to analyze those observations to create the next idea or experiment.
3. Going from an initial idea, through implementation, to the analysis of the results should be the basis for an iteration. The completion of this empirical process should mark the end of an iteration (not a predetermined number of elapsed hours).
In following these tenets, teams will focus on their highest priority tasks (i.e., trying to ensure that the time spent during an iteration goes towards work that was actually required to run the given experiment / iteration) while enabling tasks to be reprioritized as needed.
With these tenets in mind, a lean agile framework should have the following principles:
- Allow capability-based iterations – it might sometimes make sense to have an iteration that lasts one day, and other times, an iteration that lasts three weeks (e.g., due to how long it takes to acquire/clean data or to complete an exploratory analysis). The goal should be to allow logical chunks of work to be released in a coherent fashion.
- Decouple meetings from iterations – since an iteration could be very short (e.g., one day for a specific exploratory analysis), meetings (such as a retrospective to improve the team’s process) should be based on a logical time-based window, not linked to each iteration.
- Only require high-level item estimation – in many situations, defining an explicit timeline for an exploratory analysis is difficult, so teams should not need to generate accurate, detailed task estimates in order to use the framework. However, high-level “T-shirt” level-of-effort estimates can be helpful for prioritizing the potential tasks to be done.
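To make the T-shirt estimation principle concrete, the following is a minimal sketch of prioritizing backlog items by coarse effort rather than detailed hour estimates. The size scale, the value scores, and the value-per-effort ranking are illustrative assumptions, not part of any framework definition.

```python
# Hypothetical sketch: prioritize backlog items using coarse "T-shirt"
# effort estimates instead of detailed task-level hour estimates.
# The size-to-effort mapping and scoring rule are illustrative assumptions.

TSHIRT_EFFORT = {"S": 1, "M": 3, "L": 8, "XL": 20}  # relative effort units

def prioritize(backlog):
    """Order items by expected value per unit of (rough) effort."""
    return sorted(
        backlog,
        key=lambda item: item["value"] / TSHIRT_EFFORT[item["size"]],
        reverse=True,
    )

backlog = [
    {"task": "clean sensor data", "size": "L", "value": 5},
    {"task": "baseline model", "size": "M", "value": 6},
    {"task": "quick EDA of new feed", "size": "S", "value": 3},
]

for item in prioritize(backlog):
    print(item["task"])
```

The point of the sketch is that a rough ordering is often all that is needed to decide what to pull next; precise duration estimates are not required.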
The DDS Framework
DDS (Data Driven Scrum) is a new agile framework that was designed with data science in mind. Specifically, DDS supports lean, iterative, exploratory data science analysis, while acknowledging that iterations will vary in length depending on the phase of the project (e.g., collecting data vs. creating a machine learning model).
DDS defines an agile lean process framework that leverages key concepts of both Scrum and Kanban, but does so differently than Scrumban (which is more Kanban within a Scrum framework; because Scrumban implements Scrum sprints, it introduces the sprint challenges previously noted for the project team).
In short, DDS teams use a visual board and focus on working on a specific item or collection of items during an iteration, which is task-based, not time-boxed. Thus, an iteration more closely aligns with the lean concept of pulling tasks, in a prioritized manner, when the team has capacity. Each iteration may be viewed as validating or rejecting a specific lean hypothesis. Specifically, an iteration is defined by the following three steps:
- Create: A thing (or set of things) is created and put into use, with a hypothesis about what will happen.
- Observe: A set of observable outcomes of that use is measured (along with any work needed to facilitate that measurement).
- Analyze: Those observables are analyzed to create a plan for the next iteration.
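The Create–Observe–Analyze cycle above can be modeled as a small workflow sketch. The function names and the example experiment are hypothetical illustrations; DDS itself does not prescribe any code structure.

```python
# Hypothetical sketch of one DDS iteration as a Create -> Observe -> Analyze
# cycle. The names and the example experiment are illustrative assumptions;
# DDS does not prescribe any particular implementation.

def run_iteration(hypothesis, create, observe, analyze):
    """Run one capability-based iteration and return the plan for the next."""
    artifact = create()                              # Create: build the thing to test
    observations = observe(artifact)                 # Observe: measure its use
    next_plan = analyze(hypothesis, observations)    # Analyze: plan the next step
    return next_plan

# Example: testing whether a simple threshold separates high/low usage.
plan = run_iteration(
    hypothesis="a threshold of 10 separates high/low usage",
    create=lambda: 10,                                   # the candidate threshold
    observe=lambda t: [x > t for x in [4, 12, 9, 15]],   # apply it to sample data
    analyze=lambda h, obs: "refine threshold" if any(obs) else "reject hypothesis",
)
print(plan)
```

Note that the iteration ends when the analysis completes and the next plan exists, not when a clock runs out, which is the key difference from a time-boxed sprint.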
Finally, DDS has a well-defined set of roles, artifacts, and meetings/events, which are similar to Scrum’s (but differ with respect to meeting timing) and are explained in detail on the DDS web site (www.datadrivenscrum.com).