Is Agile a Fit for Data Science?

As explained in the previous post, much of the debate on agile’s potential fit for data science focuses on the use of a specific framework (such as Scrum) and its associated processes and artifacts, such as story pointing, burndown charts, or sprint lengths. Unfortunately, this buries the argument in details and ignores agile for what it is: a philosophy that is defined by four values and twelve principles.

Ten Factors to Consider When Assessing Agile’s Fit

So to assess agile’s fit for data science, first review the Agile Manifesto and ask whether data science should subscribe to these values and principles. Given the breadth of data science problems and environments, there is no definitive yes/no answer. Therefore, to help determine agile’s fit for a specific data science project and team, consider the following five People Factors and five Project Factors that are inspired by the Manifesto:

People Factors

| Factor | Would Benefit from Agile | Might not Benefit |
|---|---|---|
| Stakeholder engagement | Stakeholders are open to providing frequent feedback throughout the project. | Stakeholders are not very accessible. |
| Management culture | High-trust environment where management encourages teams to self-organize. | More traditional hierarchical “command and control” management style. |
| Team member culture | Team members aspire to self-manage and are open to shifting direction as required. | Team members prefer to follow a set plan (prescriptive approach) or are opposed to following processes (ad hoc). |
| Team cross-functionality | The project team is cross-functional with (most of) the skill sets needed to develop the solution. | The project team relies heavily on external dependencies to complete the solution. |
| Team size | Small (roughly 9 max). Projects requiring more team members can be divided into sub-teams with dedicated staff to manage inter-team coordination. | Larger teams. One could argue that very small teams (3 or fewer) are also less likely to benefit from agile given the reduced challenges in coordination and communication. |



Project Factors

| Factor | Would Benefit from Agile | Might not Benefit |
|---|---|---|
| Delivery scheduling | Stakeholders would benefit from a series of incremental partial deliverables. | Stakeholders only benefit from the delivery of a fully functioning system. |
| Business requirements | Requirements are ambiguous upfront and/or are likely to change prior to project closure. | The requirements are well-defined upfront; minimal changes may occur. |
| Solution space complexity | The solution is complex and has not been tried before by the team. It is discovered throughout the project, often through research and proofs of concept. | The problem has already been solved in a similar context and/or is relatively straightforward. The solution can be (nearly) fully designed prior to any implementation effort. |
| Documentation | Documentation is intended to be sufficient but not too extensive. | Management, regulatory agencies, corporate policies, or stakeholders dictate detailed documentation. |
| Contract engagement | A project without legal formalities or with a flexible contract (like time and materials). | A contract-bound relationship that defines a fixed term upfront, particularly between parties without a high-trust relationship. |



Some factors are strongly influenced by the nature of data science, while others depend more on the surrounding environment. Specifically, the People Factors and the last two Project Factors (documentation and contract engagement) are driven more by the organizational environment and external influences than by data science. On the other hand, data science projects tend to support the case for agile in the first three Project Factors:

  • Data science project stakeholders often benefit from a series of partial deliverables. Example: 1st deliverable – descriptive statistics and insights; 2nd – a model on a subset of data; 3rd – a fully functioning model; 4th – an application that runs the model and delivers output daily.
  • Business requirements are likely to change, arguably more often than in software contexts. Why? Business stakeholders can generally envision software solutions more clearly than data science solutions, because they often know what is possible with software but not with data science. Thus, their concept of what is needed will likely shift as they discover what is (and is not) possible.
  • The solution space of many data science problems is very complex, typically more so than in software. Software engineers generally know in advance whether a system can be built, but even experienced data scientists are often presented with a problem that they don’t know they can solve until they try. Experimentation and proofs of concept are often built into the process to uncover possible solution options.

Two Contrasting Examples

Consider a corporate data science team tasked with answering, “How likely is each customer to churn next year?” There is probably an engaged stakeholder (in this example, the customer retention department) who would benefit from interim deliverables (perhaps insights from descriptive statistics in the first week and a partially functioning model for a subset of customers in the second week) and who might change requirements (“Can you focus on just profitable customers who are likely to churn in the next three months instead?”). The solution will need to be discovered through experimentation on various combinations of feature inputs and algorithms. Agile is built for such circumstances.

In a contrasting example, academic research projects are less likely to benefit from agile. A typical project might focus on a single final deliverable (like a publication), might not have an engaged stakeholder (journal review boards won’t look at a paper in progress), and might not need daily collaboration among team members (if there are any). Such academic data science projects could still benefit from certain aspects of agile but perhaps not from an overall agile framework.

Bottom Line

So, is agile a fit for data science? Um, well, it depends. Agile’s fit for data science is highly dependent on both People Factors and Project Factors, most of which are likely shaped by circumstances not directly related to data science itself. However, agile is at least a partial fit for most data science projects given its ability to satisfy stakeholders early, to handle changing business requirements, and to help uncover the solution space through a series of short experiments.