Many data scientists and data science project sponsors want to use an agile data science approach to help generate actionable insight from the data. But what does this actually mean? How can (or should) a team (or individual) “do” agile data science?
Put differently, what foundational goals should an agile data science approach achieve?
One approach is to use an existing agile framework that has been used in software development. For example, a team might use Kanban or Scrum as their framework to achieve agility. While using some sort of agile framework certainly makes sense, it is helpful to take a step back, before selecting an agile framework, and think a bit about what agility means within a data science context.
In short, it helps if the team (or team leader) first thinks through the goals of the project and how an agile data science approach could help achieve them. Doing so helps identify an appropriate agile data science framework, or at least the key characteristics of the desired framework.
With this in mind, below are three key foundational goals for any agile data science effort.
- Use iterations via defined experiments: The iteration is a foundational element of many agile frameworks, and an iterative approach helps achieve agility, but it is not obvious what an iteration is within a data science project. By framing an iteration as an experiment, the team (and the client) can better understand the exploratory nature of data science: an iteration is more about finding value in the data than about implementing a well-defined set of capabilities.
- Keep the experiments as small as possible: Each experiment should yield insight, even if that insight is only that a certain variable does not help generate actionable insight. The insight from one experiment should then be used to define and prioritize future experiments. With this in mind, each experiment should be as short as possible, so that “wasted” work is minimized and the most interesting experiments, based on the results of past experiments, are prioritized appropriately.
- Get feedback on the results of the experiments to help prioritize future experiments: One of the keys to prioritizing future experiments is to discuss the results of the experiments with the user/client. In this way, discussions on what might be useful actionable insight can help drive the definition and prioritization of future experiments.
In other words, when trying to achieve agility within a data science project, one should create, observe and then analyze each experiment (iteration), and the results of these experiments should be discussed with the extended team to help define and prioritize new experiments.
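For teams that track this loop in code or a notebook, the cycle of running a small experiment, recording what was learned, and re-prioritizing after feedback can be sketched as below. This is only an illustrative sketch, not part of any agile framework: the `Experiment` and `Backlog` classes, their fields, and the churn-related questions are all hypothetical names and made-up examples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Experiment:
    """One iteration, framed as a small experiment (hypothetical structure)."""
    question: str
    priority: int                  # lower number = run sooner
    est_days: float                # rough size estimate; kept small on purpose
    finding: Optional[str] = None  # filled in once the results are analyzed


@dataclass
class Backlog:
    """Pending and completed experiments (hypothetical structure)."""
    pending: List[Experiment] = field(default_factory=list)
    done: List[Experiment] = field(default_factory=list)

    def add(self, exp: Experiment) -> None:
        self.pending.append(exp)

    def next_experiment(self) -> Experiment:
        # Highest priority first; ties go to the smaller experiment.
        self.pending.sort(key=lambda e: (e.priority, e.est_days))
        return self.pending.pop(0)

    def record(self, exp: Experiment, finding: str) -> None:
        # Even a negative result counts as insight and is kept.
        exp.finding = finding
        self.done.append(exp)

    def reprioritize(self, feedback: Dict[str, int]) -> None:
        # feedback maps a question to its new priority, as agreed
        # when discussing results with the user/client.
        for exp in self.pending:
            if exp.question in feedback:
                exp.priority = feedback[exp.question]


# Made-up example questions, for illustration only.
backlog = Backlog()
backlog.add(Experiment("Does tenure predict churn?", priority=2, est_days=2))
backlog.add(Experiment("Does ticket volume predict churn?", priority=1, est_days=3))
backlog.add(Experiment("Does region predict churn?", priority=2, est_days=1))

first = backlog.next_experiment()
backlog.record(first, "Ticket volume shows no useful signal")

# After reviewing the result with the client, tenure moves up.
backlog.reprioritize({"Does tenure predict churn?": 1})
second = backlog.next_experiment()
```

The design choice worth noting is that re-prioritization happens between experiments rather than on a fixed schedule, which is what keeping experiments small makes possible.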
Note that these three goals do not dictate the use of one specific agile framework. A team could use, for example, Scrum or Kanban to achieve these foundational goals. Of course, if one were to use Scrum, the team would need to fit one or more experiments within a sprint, which is often difficult because it is uncertain how long an experiment will take. On the other hand, if the team uses Kanban, it needs to define an appropriate set of policies to ensure effective communication across the extended team. One such set of policies has been defined within the structured Kanban iteration (SKI) framework.