Three key concepts should guide an agile data science effort: use iterations, keep each iteration as small as possible, and get feedback on each iteration.
In other words, while there are several alternative data science workflow frameworks (sometimes known as data science life cycle frameworks), teams achieve agility by executing a data science project as follows:
- Using iterations: An iteration is commonly understood as a foundational element of many agile frameworks, and an iterative approach helps to achieve agility. In data science, an iteration can be thought of as an experiment.
- Keeping the iteration as small as possible: Each iteration should yield insight, even if this insight is that a certain variable is not helpful in trying to generate actionable insight. This insight should be used to help define and prioritize future tasks.
- Getting feedback on each iteration: One of the keys to prioritizing future work is to discuss the results of each iteration with the user or client. These discussions about what might be useful, actionable insight help drive the definition and prioritization of future experiments.
Agile and Data Science
Many believe that teams should use an agile approach for data science. For example, Gartner insists that “work that is more exploratory, is less known and demands quick results […] demands agile”. By its nature, data science is an ambiguous, non-linear process that tends to lack clear up-front understanding and requirements. In a discussion on the data science process, John Akred explains that data science needs “management techniques that accommodate and foster and help the non-linear processes succeed instead of attempt to force them into linearity”.
Taking a slightly different perspective, management consultant Daniel Mezick suggests that a better question than whether agile works for data science is: “Are you trying to deliver continuously or very frequently?” If yes, then agile makes sense for your project. If not, then you could still benefit from certain aspects of agile but not necessarily from the entirety of an agile framework. Pressman & Maxim pose the question about agility for data science slightly differently. They posit: “No one is against agility. The real question is: What is the best way to achieve it?”
So, while one recent study found that between 25% and 50% of data science teams currently use an agile approach, this percentage will likely increase, as many have noted the importance of agility when doing a data science project (e.g., Vamsi Nellutla, Darío Martínez and Victor Borda).
The 2001 Agile Manifesto (below) provides the foundation for agile practitioners. It does not define what should be done, just how it should be done. This contrasts with an approach like CRISP-DM, which focuses on the steps and phases of a project but not on how the team should execute it.
“We are uncovering better ways of developing
software by doing it and helping others do it.
Through this work we have come to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on the left more.”
Approaches to Achieve Agile Data Science
Agile Data Science Benefits
- More Relevant Features: By defining requirements just before development, the features are more likely to meet the most current needs.
- Quicker Delivery of Customer Value: By delivering incremental product features, users gain value before the project’s completion.
- More Realistic Feedback: By soliciting feedback on the functional product, the agile team can more accurately assess whether its deliverables are of value and adjust future deliverables accordingly.
- Cut Losses from Building Wrong Features: If stakeholders provide feedback that a product feature is no longer useful, agile teams can learn this sooner, cut their losses, and divert efforts elsewhere.
- Cut Losses from Infeasible Features: Likewise, if data scientists are tasked with upfront discovery and analysis instead of developing an entire model, they are more likely to realize early when they are working toward a dead end that is not technically feasible.
- Improved Communication: Most agile approaches promote close coordination and communication among team members and with stakeholders.
Agile Data Science Challenges
- Less straightforward: Agile is less straightforward than waterfall, which can lead to process confusion and poor implementation. Without defined cost estimates and timelines, teams might struggle to justify agile projects to executive sponsors who want to know how much they need to invest and when the product will be delivered.
- Organizational communication: Communication can be challenging with the broader organization, which may be unaccustomed to, unaccepting of, or confused by agile. Not surprisingly, 70% of agile practitioners in the State of Scrum Report reported tension between their teams and the rest of the organization.
- “Made for software”: A common complaint from interviewees is that most existing agile approaches were designed by the software industry for software projects. While often considered similar to software engineering, data science has its own unique challenges that existing agile approaches may not address. Applied to data science, existing approaches might inhibit its exploratory nature, lack the rigor to deal with the messiness of big data, rely on software engineering testing techniques that are not suitable for data science, and fail to help evaluate whether the results are “good enough” to make a difference.
- Perceived planning issues: Poor project planning was another common complaint in the interviews we conducted, as several practitioners believe that agile skips over planning and that a more rigid process for requirements gathering and definition is needed.
- Not regulatory friendly: A specific complaint from a project manager of data-intensive projects at Eli Lilly, a large pharmaceutical company, is that agile testing practices are not practical for FDA regulatory compliance.
A Brief History of Agile
During the 1980s, in response to the shortcomings of Waterfall, some companies began to embrace speed, flexibility, and overlapping processes over rigid, linear, and distinct project phases (Takeuchi & Nonaka, 1986).
Since the 1990s, organizations, largely from the software industry, have formalized such approaches, which focus on rapid incremental delivery and continuous customer feedback rather than linear plans that feature extensive documentation and detailed upfront planning. Not every agile approach follows all the principles equally, but approaches that generally follow the manifesto and principles are considered to be agile (Mezick, 2017).
Agile is widely adopted at technology companies and has become so important that the Project Management Institute, which focuses primarily on traditional project management, now also offers an Agile Certified Practitioner certification and added significant agile portions to the Project Management Body of Knowledge in its September 2017 release.