As explained in the previous post, much of the debate on agile’s potential fit for data science focuses on the use of a specific framework (such as Scrum) and its associated processes and artifacts, such as story pointing, burn down charts, or sprint lengths. Unfortunately, this drowns the argument in details and ignores agile for what it is: a philosophy that is defined by four values and twelve principles.
Ten Factors to Consider if Agile Is a Fit
So to assess agile’s fit for data science, first review the Agile Manifesto and ask whether data science should subscribe to these values and principles. Given the breadth of data science problems and environments, there is no definitive yes/no answer. Therefore, to help determine agile’s fit for a specific data science project and team, consider the following five People Factors and five Project Factors that are inspired by the Manifesto:
|People Factor||Would Benefit from Agile||Might not Benefit|
|Stakeholder engagement||Stakeholders are open to providing frequent feedback throughout the project.||Stakeholders are not very accessible.|
|Management culture||High-trust environment where management encourages teams to self-organize.||More traditional hierarchical “command and control” management style.|
|Team members culture||Team members aspire to self-manage and are open to shifting direction as required.||Team members prefer to follow a set plan (prescriptive approach) or are opposed to following processes (ad hoc).|
|Team cross-functionality||The project team is cross-functional with (most of) the skillsets needed to develop the solution.||The project team heavily relies on external dependencies in order to complete the solution.|
|Team size||Small (roughly 9 max). Projects requiring more team members can be divided into sub-teams with dedicated staff to manage inter-team coordination.||Larger teams. One could argue very small teams (3 or fewer) are also less likely to benefit from agile given the reduced challenges in coordination and communication.|
|Project Factor||Would Benefit from Agile||Might not Benefit|
|Delivery scheduling||Stakeholders would benefit from a series of incremental partial deliverables.||Stakeholders only benefit from the delivery of a fully functioning system.|
|Business requirements||Requirements are ambiguous upfront and/or are likely to change prior to project closure.||The requirements are well-defined upfront; minimal changes may occur.|
|Solution space complexity||The solution is complex and has not been tried before by the team. It is discovered throughout the project, often through research and proofs of concept.||The problem has already been solved in a similar context and/or is relatively straightforward. The solution can be (nearly) fully designed prior to any implementation effort.|
|Documentation||Documentation is intended to be sufficient but not too extensive.||Management, regulatory agencies, corporate policies, or stakeholders dictate detailed documentation.|
|Contract engagement||A project without legal formalities or with a flexible contract (like time and materials).||A contract-bound relationship that defines a fixed term upfront, particularly between parties without a high trust relationship.|
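To make the two tables easier to apply to a specific project, the ten factors can be written down as a simple checklist and tallied. The following is only a minimal sketch under stated assumptions: the factor names mirror the tables above, but the function `summarize_fit`, the three answer labels, and the example answers are hypothetical illustrations, not a scoring method defined by the Agile Manifesto or by this post. The tally is meant to prompt discussion, not to produce a yes/no verdict.

```python
# Factor names taken from the two tables above.
PEOPLE_FACTORS = [
    "Stakeholder engagement",
    "Management culture",
    "Team members culture",
    "Team cross-functionality",
    "Team size",
]
PROJECT_FACTORS = [
    "Delivery scheduling",
    "Business requirements",
    "Solution space complexity",
    "Documentation",
    "Contract engagement",
]

# Hypothetical answer labels: which table column a factor leans toward.
VALID_ANSWERS = ("agile", "not_agile", "unclear")


def summarize_fit(answers):
    """Count, per factor group, how many factors lean toward each column.

    Factors missing from `answers` are counted as "unclear".
    """
    summary = {}
    groups = (("people", PEOPLE_FACTORS), ("project", PROJECT_FACTORS))
    for group_name, factors in groups:
        counts = {label: 0 for label in VALID_ANSWERS}
        for factor in factors:
            answer = answers.get(factor, "unclear")
            if answer not in VALID_ANSWERS:
                raise ValueError(f"Unknown answer {answer!r} for {factor!r}")
            counts[answer] += 1
        summary[group_name] = counts
    return summary


# Hypothetical assessment of an imaginary project (illustrative values only).
example_answers = {
    "Stakeholder engagement": "agile",
    "Delivery scheduling": "agile",
    "Business requirements": "agile",
    "Solution space complexity": "agile",
    "Documentation": "not_agile",
}

for group, counts in summarize_fit(example_answers).items():
    print(group, counts)
```

Keeping the People and Project tallies separate is a deliberate choice in this sketch, since the two groups tend to be driven by different things (organizational environment versus the nature of the work).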
Some factors are strongly influenced by data science, while others depend more on environmental factors that are not directly related to data science. Specifically, the People Factors and the last two Project Factors are shaped more by the organizational environment and external influences than by data science itself. On the other hand, data science projects tend to support the case for agile in the first three Project Factors:
- Data science project stakeholders often benefit from a series of partial deliverables. Example: 1st deliverable – descriptive statistics and insights; 2nd – a model on a subset of data; 3rd – a fully functioning model; 4th – an application that runs the model and delivers output daily.
- Business requirements are likely to change, arguably more often than in software contexts. Why? Business stakeholders can generally envision software solutions more clearly than data science solutions because they often know what is possible with software but not with data science. Thus, their concept of what is needed will likely shift as they discover what is (and is not) possible.
- The solution space of many data science problems is very complex, typically more so than in software. Software engineers generally know in advance whether a system can be built, but even experienced data scientists are often presented with problems that they don’t know whether they can solve until they try. Experimentation and proofs of concept are often built into the process to uncover possible solutions.
Two Contrasting Examples
Consider a corporate data science team tasked to answer, “How likely is each customer to churn next year?” There is probably an engaged stakeholder (in this example, the customer retention department) who would benefit from interim deliverables (perhaps insights from descriptive statistics in the first week and a partially functioning model for a subset of customers in the second week) and who might change requirements (“Can you focus on just profitable customers who are likely to churn in the next three months instead?”). The solution will need to be discovered through experimentation on various combinations of feature inputs and algorithms. Agile is built for such circumstances.
In a contrasting example, academic research projects are less likely to benefit from agile. A typical project might focus on a single final deliverable (like a publication), might not have an engaged stakeholder (journal review boards won’t look at a paper-in-progress), and might not need daily collaboration of team members (if there are any). Such academic data science projects could still benefit from certain aspects of agile but perhaps not an overall agile framework.
So, is agile a fit for data science? Um, well, it depends. Agile’s fit for data science is highly dependent on both People Factors and Project Factors—most of which are likely strongly influenced by factors not directly related to data science itself. However, agile is at least a partial fit for most data science projects given its ability to satisfy stakeholders early, to handle changing business requirements, and to help uncover the solution space through a series of short experiments.