5 Agile Data Science Myths

One of the most common questions of data science project management is some variation of: “Is agile a fit for data science?”

Unfortunately, like many questions in data science, this question is itself often misunderstood. Many (if not most) blog posts, online forums, and conversations debate it by evaluating tactical process details, such as ideal sprint length or story point estimation, without first understanding agile as a set of values and principles.

So we’ll start by debunking some of the common misconceptions of agile for data science. This will help establish a common understanding of agile from which we can debate “Is agile a fit for data science?” in future posts.

Myth 1: Agile = Scrum

What is agile, anyway? By far the most common reference in answering this question is the Agile Manifesto, written in 2001 by 17 self-described “organizational anarchists”. Teams that follow the manifesto's values and its accompanying 12 principles are generally considered agile.

Scrum is arguably the most popular agile framework. Because of its popularity, and because it is often the only exposure some people have to agile, Scrum is often mistaken as synonymous with agile. However, Scrum is just one implementation of agile. Other approaches, such as eXtreme Programming (XP), Feature-Driven Development (FDD), Dynamic Systems Development Method (DSDM), Crystal, Lean, Adaptive Software Development (ASD), and Kanban, are also agile when implemented to follow the Agile Manifesto and its 12 principles.

Myth 2: Agile is just something you “do”

Another misconception is that agile is something you “do”. Although agile’s values and principles are general and don’t prescribe specific practices, much of the argument over “Is agile a fit for data science?” centers on specific practices that are defined as part of an agile framework such as Scrum. This focus on “doing” agile should be secondary to “being” agile.

Rather, answering “Is agile a fit for data science?” (or for any domain, for that matter) should first focus on whether your team should embrace the values and principles of agile. If the answer is “yes”, then you can debate which practices and framework you want to “do” to live up to agile. These may or may not include common agile artifacts such as fixed-length sprints, burn-down charts, or story points.

Myth 3: Agile is only for Software Development

It’s understandable that many data scientists (among others) object that agile’s philosophy is only for software development. After all, software development teams have been the overwhelming driving force behind agile adoption. Even the Agile Manifesto itself is titled “Manifesto for Agile Software Development”.

However, agile’s adoption has increasingly spread beyond software development, and ironically, many of agile’s roots don’t come from IT (Harvard Business Review, 2016). Non-software implementations of agile include National Public Radio in programming creation, John Deere for new machinery, Team Wikispeed for electric cars, Saab for fighter jets, C.H. Robinson for human resources management, and Silicon Valley Data Science for train timetable predictions (Harvard Business Review, 2016) (wikispeed.org) (Akred, 2015).

Myth 4: Agile Data Science Practices = Agile Software Practices

Similar to Myth 3, agile practices are often viewed as practices built only for software teams. Indeed, many agile frameworks, like XP and ASD, specifically incorporate software engineering practices into their definition. And software teams who use agile frameworks like Scrum or Kanban often add software-specific practices to develop an overarching software development methodology.

Like software engineering teams, data science teams can tailor agile practices to their needs. Microsoft’s Team Data Science Process and the Domino Data Science Life Cycle are two examples that integrate common agile practices (resembling Scrum) with traditional data-centric methodologies (CRISP-DM). As a third example, Russell Jurney, in Agile Data Science 2.0, presents an agile development methodology that integrates processes from data science, web application development, and big data technologies. Such data science-centric agile methodologies will become increasingly common as the field of data science project management matures.

Myth 5: Compliance to an Agile Methodology is the Goal

Many data scientists complain that they work in environments where adhering to agile practices often seems to be the goal, and that their progress is measured through “irrelevant” artifacts such as story points, burn-down charts, or number of stories completed. While such metrics might help a team understand how it is performing in relation to its given methodology, strict adherence to metrics or processes is generally counterproductive.

To debunk the misconception that “methodology compliance” is the goal, just look at the first value statement of the Agile Manifesto (“Individuals and interactions over processes and tools”) and its first principle (“Our highest priority is to satisfy the customer…”).

So “Is agile a fit for data science?” With a clearer understanding of agile for what it is—a set of principles and values—we’re at least a step closer to answering that question.

Learn More

Agility specific to data science is a key topic that we explore in:

And for a book on the topic, check out Russell Jurney’s Agile Data Science 2.0.
