Conceptually, Agile and data science are a great match. Most notably, both emphasize the same underlying idea: you build something, learn from what you delivered, and then improve it based on what you learned.
And yet the results of implementing an Agile data science framework are mixed. Usually, this is because teams struggle to understand how to apply Agile properly.
To help teams more effectively deliver results, this post explores:
- What is Agility?
- Is Agile a fit for data science?
- Why is Agile data science so hard?
- How to implement Agile Data Science
- Learn More
What is Agility?
The Movement Toward Agile
Traditional product development follows rigid pre-planned processes inspired by manufacturing. However, during the 1980s, in response to the inflexibility of these traditional Waterfall processes, some companies began to embrace speed, flexibility, and overlapping processes over rigid, linear, and distinct project phases (Takeuchi & Nonaka, 1986).
Since the 1990s, organizations – largely from the software industry – have formalized such approaches that focus on rapid incremental delivery and continuous customer feedback rather than linear plans that feature extensive documentation and detailed upfront planning.
This Agile movement picked up steam after being codified into the Agile Manifesto in 2001. Software continued to lead adoption, and now Agile practices are standard for software development and are being adopted across other industries.
In the past several years, data science has also adopted an Agile approach. In our 2021 data science methodology poll, 30% of respondents named Kanban or Scrum as the methodology they use most often for data science projects.
What is Agile?
Contrary to a common misconception, Agile is not a set methodology. Rather, Agile is a philosophy focused on flexibility and adaptation.
You aren’t Agile just because you “do” a common Agile framework like Scrum. Rather, you become Agile if you deliver frequently, solicit feedback, and adapt your plans based on this feedback. More specifically, you are Agile if you adhere to the Agile Manifesto and to its Twelve Principles.
What Does Agile Data Science Mean?
Simply put, Agile Data Science merges Agile philosophies with data science practices.
It does not shoehorn data science into practices that compromise the natural data science life cycle.
Rather, Agile data science respects data science for what it is: a highly exploratory process centered around scientific experimentation.
Although Agile’s underlying philosophy is the same for data science as in other industries, there are some important nuances for what this means in practice for data science.
Is Agile a Fit for Data Science?
Yes (or at least for most situations, yes).
By its nature, data science is an ambiguous, non-linear process that tends to lack clear up-front understanding and requirements. Agile is built for these situations.
Benefits of Agile Data Science
Here are some specific benefits that Agility in data science can help achieve:
- More Relevant Deliverables: By defining requirements just before development (as opposed to all upfront in a project), the features are more likely to meet the most current needs. Indeed, the stakeholders’ initial requests often do not map to their needs. Agile practices help you discover the true needs earlier.
- Quicker Delivery of Customer Value: By delivering incremental product features such as exploratory data reports, Tableau dashboards, or Minimal Viable Models, users gain value before the project’s end.
- Real Feedback: By soliciting feedback on the functional product, the data scientists can more accurately assess whether their deliverables work “in the wild”. Meanwhile, the product manager can assess whether the deliverables provide the intended business value.
- Cut Losses Early: No matter what you do, some data science projects simply fail. The sooner you get feedback that you’re headed to failure, the sooner you can pivot to related objectives or kill off the project.
- Improved Communication: Agile focuses on individuals, collaboration, and clear communication. As data science teams scale and become more diverse, the benefits of effective communication also increase — both within the team and with the stakeholders.
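The "deliver, get feedback, then continue, pivot, or stop" cycle behind these benefits can be illustrated with a minimal sketch. Everything here is a hypothetical illustration rather than part of any Agile framework: the `Increment` and `Decision` names, the 0-to-1 `value_score` for stakeholder feedback, and the `min_value` and `patience` thresholds are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    PIVOT = "pivot"
    STOP = "stop"


@dataclass
class Increment:
    name: str           # e.g. "exploratory data report" (hypothetical deliverable)
    value_score: float  # assumed stakeholder feedback: 0.0 (no value) to 1.0 (high value)


def review(history, min_value=0.3, patience=2):
    """Decide the next step after feedback on the latest increment.

    Assumed rules, for illustration only:
    - STOP once the last `patience` increments all scored below `min_value`
    - PIVOT when only the latest increment scored below `min_value`
    - CONTINUE otherwise
    """
    recent = history[-patience:]
    if len(recent) == patience and all(i.value_score < min_value for i in recent):
        return Decision.STOP
    if history and history[-1].value_score < min_value:
        return Decision.PIVOT
    return Decision.CONTINUE


def demo():
    """Deliver three increments one at a time and review after each."""
    planned = [
        Increment("exploratory data report", 0.6),
        Increment("dashboard", 0.2),
        Increment("minimal viable model", 0.1),
    ]
    history, decisions = [], []
    for increment in planned:
        history.append(increment)          # deliver the increment
        decisions.append(review(history))  # learn from feedback, then decide
    return decisions


if __name__ == "__main__":
    for decision in demo():
        print(decision.value)
```

In practice the feedback would come from conversations with stakeholders rather than a single number, but the structure is the same: each small delivery creates a decision point, so a project that is heading toward failure is noticed after a few increments instead of at the end.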
When Agility is Less Important
In some situations, agility might be less important.
For example, I asked management consultant Daniel Mezick whether Agile works for data science. He rephrased the question and said a better one to ask is: “Are you trying to deliver continuously or very frequently?” If yes, then Agile makes sense for your project. If not, then you could still benefit from certain aspects of Agile but not necessarily from the entirety of an Agile framework.
Thus, some data science projects might not benefit as much from Agile. For example, because academic research usually intends to produce a single output such as a publication, Agility is not as important for academic data science projects.
Moreover, consider highly regulated projects such as pharmaceutical research. Specifically, a project manager of data-intensive projects at a large pharmaceutical company understood Agile’s benefits but believed they were not practical for FDA regulatory compliance. Therefore, his team used an Agile-Waterfall hybrid approach.
More broadly, the post “Is Agile a Fit for Data Science?” provides 10 factors to more thoroughly assess whether Agility is appropriate for a data science project. Three factors argue for Agility in data science, but seven factors suggest that Agile’s fit for data science depends on the use case and environment.
Agile Data Science Challenges
Although the concept of agile data science is fundamentally the same as in other fields, the path to achieve this agility is different. Indeed, it is full of potholes and you probably don’t have a GPS to tell you where to go.
- Misunderstanding: Agile often gets a bad rap, particularly in some data science circles where people don’t understand its value and believe some of the 5 Myths of Agile Data Science.
- Lack of Data Science-specific Frameworks: The most well-known agile frameworks are software-specific or at least stem from software environments. Applying such approaches might inhibit the exploratory nature of data science.
- Less Straightforward: Although Agile strives for simplicity, its flexible approaches are not as intuitive as a well-laid-out plan. Indeed, stakeholders and management might insist on hard timelines that can derail the entire effort.
- Longer Time Horizons: Agile practices emphasize getting functional products out quickly. Indeed, Scrum calls for potentially releasable increments in pre-defined cadences that do not exceed a month. Yet, data science research often requires longer time horizons that are difficult to know up-front. If you want to annoy a data scientist…ask them how long it will be before they produce a model with a pre-defined accuracy using data they don’t even have access to yet.
How to Implement Agile Data Science
In short, combine the natural data science life cycle with an Agile collaboration framework.
This is easier said than done, but these five steps can help:
- Start with a commitment toward Agility.
- Communicate its benefits.
- Design (or select) an Agile collaboration framework that works for your specific circumstances. Read the following section of this post to review three common frameworks you can use.
- Implement this framework. Consider which elements are easiest or most important to implement and start the transition there.
- Adjust the process. Agility isn’t just about improving your product. Strong Agile teams also focus on improving their processes.
Recommended Agile Approaches for Data Science
Scrum
Scrum is the most common software collaboration framework, so much so that many practitioners mistakenly equate Scrum with Agile.
Scrum has a lot of great enablers for agile data science. But it comes with a lot of challenges, and teams generally struggle to implement it effectively.
Kanban
Kanban is a very simple set of principles that works well in many environments, including data science.
However, Kanban is the least prescriptive of these three frameworks, so teams typically need to add processes of their own. Even so, many teams report positive results from Kanban.
Data Driven Scrum
Data Driven Scrum is a new Agile collaboration framework designed specifically for data science projects.
It attempts to combine the best of Scrum and Kanban from data science’s perspective. As the newest framework (that we know of), the jury is still out on its utility.
Learn More
Consider reading these blog posts:
- 5 Agile Data Science Myths (internal post)
- What is Agile Data Science? (internal post)
- Is Agile a Fit for Data Science? (internal post)
- Using Agile development techniques for data science projects (O’Reilly Podcast with James Akred)
- Why Data Science Doesn’t Respond Well To Agile Methodologies (LinkedIn post by Jeffrey Humphries)