Agile Data Science
Do you think data science should be agile?
When framing agility in the context of delivering usable insights frequently, iterating on these insights, and validating the outcomes, I think all of us would say “yes”.
Yet, how do we achieve this? Even more basic, what does agile data science even mean?
This article covers these high-level fundamental questions. If you want to dive deeper into a specific agile data science methodology, jump to one of the three covered at the end: Scrum, Kanban, or Data Driven Scrum.
What is Agility?
There’s a lot of confusion about what it actually means to be Agile. No, it doesn’t mean “doing” Scrum, or even adopting software practices that are “in vogue” such as story point estimation or user stories.
Rather, think simpler. Agility is about being flexible and adapting your plan based on feedback from incremental deliverables.
More importantly, Agile is a philosophy, not a specific set of practices. You can achieve agility through a myriad of practices if your mindset adheres to the Agile Manifesto and its Twelve Principles.
Where does Agile come from?
Traditional product development follows rigid pre-planned processes inspired by manufacturing. However, during the 1980s, in response to the inflexibility of these traditional Waterfall processes, some companies began to embrace speed, flexibility, and overlapping processes over rigid, linear, and distinct project phases (Takeuchi & Nonaka, 1986).
Since the 1990s, organizations – largely from the software industry – have formalized such approaches that focus on rapid incremental delivery and continuous customer feedback rather than linear plans that feature extensive documentation and detailed upfront planning.
Agile picked up steam after being codified into the Agile Manifesto in 2001. Software continued to lead adoption, and Agile practices are now standard in software development and are being adopted across other industries.
Agile has become so important that the Project Management Institute, which focuses primarily on traditional project management, now also offers an Agile Certified Practitioner certification and added significant Agile portions to the Project Management Body of Knowledge (PMBOK Guide) in its September 2017 release.
Is Agile used in Data Science?
Even though data science and software engineering are different fields, many organizations treat them the same. Thus, it is no surprise that many organizations are also pushing for Agile adoption in data science. However, the results are mixed, often because teams try to force Agile software practices onto data science.
How many data science teams use Agile? The exact number is not known, but a 2017 study found that between 25% and 50% of data science teams use an Agile approach and that this percentage will likely increase in the future.
What does Agile Data Science mean?
Simply put, it merges Agile philosophies with data science practices.
It does not shoehorn data science into practices that compromise the natural data science life cycle.
Rather, Agile data science respects data science for what it is: a highly exploratory process centered around scientific experimentation.
Although Agile’s underlying philosophy is the same for data science as in other industries, there are some important nuances for what this means in practice for data science. Here are some specific attributes and practices of effective data science teams.
6 Tips: Agile for Data Science Teams
- Have Fully-Functional Teams: Staff the data science team with all the skillsets needed to deliver value. As discussed in the 8 Key Data Science Roles post, this typically includes data engineers, data scientists, business analysts, and a product person.
- Allow Teams to Self-Manage: Upper-level management should not dictate how the team should function. Rather, they should provide direction and an environment for them to succeed. Encourage and trust the team to self-organize. The team should maintain a sustainable pace, frequently inspect its processes, and continually improve.
- Start Simple and Iterate Quickly: The primary output of a data science team is insights. Initial insights can start with static reports or analyses from data exploration. Then build up to interactive dashboards, Minimal Viable Models, and finally fully-functional productized intelligent systems.
- Measure, Measure, Measure: Solicit feedback regularly — both from stakeholders (via demos) and from the data itself (by closely monitoring model performance). See 10 Data Science Project Metrics to learn more.
- Collaborate: Long gone are the days of the lone wolf data scientist hiding in a corner. Rather, data science is a team sport. Agile data science teams collaborate and frequently communicate among themselves and with the broader stakeholder team.
- Have Flexible Plans: One great match between data science and Agile is that they both emphasize empirical learning whereby you deploy something, measure it, learn from it, and adjust your plans accordingly.
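As one way to picture the measure-and-adjust loop from the tips above, here is a minimal Python sketch. It assumes the team logs one model score per iteration. The function name `needs_review`, the window of three iterations, the 0.02 tolerance, and the sample accuracy values are all hypothetical choices for illustration, not a prescribed method.

```python
from statistics import mean


def needs_review(scores, window=3, tolerance=0.02):
    """Flag a plan review when recent model scores drop below the earlier baseline.

    scores: one score per iteration, oldest first (hypothetical log).
    window: how many of the latest iterations count as "recent" (assumed value).
    tolerance: how large a drop is acceptable before reviewing (assumed value).
    """
    if len(scores) <= window:
        # Not enough history to compare recent scores against a baseline.
        return False
    baseline = mean(scores[:-window])
    recent = mean(scores[-window:])
    return baseline - recent > tolerance


# Made-up weekly accuracy values, for illustration only.
weekly_accuracy = [0.91, 0.90, 0.92, 0.88, 0.86, 0.85]
print(needs_review(weekly_accuracy))
```

A check like this could run after each iteration, so that a drop in performance prompts the team to revisit its plan at the next demo instead of at the end of the project.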
Is Agile for Data Science?
Yes (or at least for most situations, yes).
By its nature, data science is an ambiguous, non-linear process that tends to lack clear up-front understanding and requirements. Agile is built for these situations.
Moreover, in a discussion on the data science process, John Akred explains that data science needs “management techniques that accommodate and foster and help the non-linear processes succeed instead of attempt to force them into linearity”.
Benefits of Agile Data Science
Here are some specific benefits that Agility in data science can help achieve:
- More Relevant Deliverables: By defining requirements just before development (as opposed to all upfront in a project), the features are more likely to meet the most current needs. Indeed, the stakeholders’ initial requests often do not map to their needs. Agile practices help you discover the true needs earlier.
- Quicker Delivery of Customer Value: By delivering incremental product features such as exploratory data reports, Tableau dashboards, or Minimal Viable Models, users gain value before the project’s end.
- Real Feedback: By soliciting feedback on the functional product, the data scientists can more accurately assess whether their deliverables work “in the wild”. Meanwhile, the product manager can assess whether the deliverables provide the intended business value.
- Cut Losses Early: No matter what you do, some data science projects simply fail. The sooner you get feedback that you’re headed to failure, the sooner you can pivot to related objectives or kill off the project.
- Improved Communication: Agile focuses on individuals, collaboration, and clear communication. As data science teams scale and become more diverse, the benefits of effective communication also increase — both within the team and with the stakeholders.
When Agile Data Science Makes Less Sense
There are some counterexamples where agility might be less important.
For example, I asked management consultant Daniel Mezick whether Agile works for data science. He rephrased the question and said a better one to ask is: “Are you trying to deliver continuously or very frequently?” If yes, then Agile makes sense for your project. If not, then you could still benefit from certain aspects of Agile but not necessarily from the entirety of an Agile framework. By that test, because academic research usually aims to produce a single output such as a publication, agility is not as important for academic data science projects.
Moreover, consider highly regulated projects such as pharmaceutical research. Specifically, a project manager of data-intensive projects at a large pharmaceutical company understood Agile’s benefits but believed they were not practical for FDA regulatory compliance. Therefore, his team used an Agile-Waterfall hybrid approach.
More broadly, the Is Agile a Fit for Data Science? post provides 10 factors to more thoroughly assess whether Agility is appropriate for a data science project. Three factors argue for Agility in data science but seven factors suggest that Agile’s fit for Data Science is specific to the use case and environment.
Why is Agile in Data Science so Hard?
Although the concept of agile data science is fundamentally the same as in other fields, the path to achieve this agility is different. Indeed, it is full of potholes and you probably don’t have a GPS to tell you where to go.
Agile Data Science Challenges
- Misunderstanding: Agile often gets a bad rap, particularly in some data science circles that don’t understand its value and believe some of the 5 Myths of Agile Data Science.
- Lack of Data Science-specific Frameworks: The most well-known agile frameworks are software-specific or at least stem from software environments. Applying such approaches might inhibit the exploratory nature of data science.
- Less Straightforward: Although Agile strives for simplicity, its flexible approaches are not as intuitive as a well-laid-out plan. Indeed, stakeholders and management might insist on hard timelines that can derail the entire effort.
- Longer Time Horizons: Agile practices emphasize getting functional products out quickly. Indeed, Scrum calls for potentially releasable increments in pre-defined cadences that do not exceed a month. Yet, data science research often requires longer time horizons that are difficult to know up-front. If you want to annoy a data scientist…ask them how long it will be before they produce a model with a pre-defined accuracy using data they don’t even have access to yet.
How Do You Achieve Data Science Agility?
In short, combine the natural data science life cycle with an Agile collaboration framework.
This is easier said than done, but these five steps can help:
- Start with a commitment toward Agility.
- Communicate its benefits.
- Design (or select) an Agile collaboration framework that works for your specific circumstances.
- Implement this framework. Consider which elements are easiest or most important to implement and start the transition there.
- Help your organization through the change management process.
For the third step, consider the following three collaboration frameworks.
Recommended Agile Methodologies for Data Science
Scrum is the most common software collaboration framework, so much so that many practitioners falsely equate Scrum with Agile.
Scrum has a lot of great enablers for agile data science. But it comes with a lot of challenges, and teams generally struggle to implement it effectively.
Kanban is a very simple set of principles that work well for a lot of environments including data science.
However, Kanban is the least defined of these three frameworks, so teams need to add processes of their own. Yet, many teams report positive results from Kanban.
Data Driven Scrum is a new agile collaboration framework specific to data science projects.
It attempts to combine the best of Scrum and Kanban from data science’s perspective. As the newest framework (that we know of), the jury is still out on its utility.
Data Science Process Alliance is an organization Jeff and I are helping to launch.