It is ironic that data science, a field built on rigorous scientific methods, has been slow to adopt the rigor of project management. Instead, project management in data science tends to be ad hoc and is often put on the back burner, treated as secondary to technologies and algorithms.
Fortunately, this is changing as organizations are increasingly applying project management practices to data science. Why? Here are five reasons.
“There are companies that get project management and those that do not. Companies that ‘get it’ and manage data science as a project will get results. Companies that do not will be disappointed and call data science a ‘hype.’” (Carol Choksy, Indiana University)
The Rise of Remote Teams
The first reason is the impact that the sudden rise of remote teams is having on projects of all kinds. While some find remote work a blessing, many find it a curse: “over 70 percent of employers report struggles with shifting to remote work” (npr.org). Remote work has been blamed for disrupted work-life balance, reduced focus on the task at hand, lower-quality interpersonal interactions, and lower overall productivity (npr.org, bloomberg.com).
As Jeff shared in his last post, these challenges can at least be mitigated through processes that facilitate communication, account for varied work environments, encourage motivation, and reaffirm team culture. Newly remote data science teams are rethinking how they work together, and in doing so they are discovering new processes. Even as many of these teams return to the office, some of these practices will likely stick.
Data Science is a Team Sport
Yet the trend toward adopting project management processes in data science was underway well before COVID-19 disrupted on-site work, in part because of the growing realization that “Data science is a team sport”.
Intuitively, we might get by with ad hoc processes if only one or two people are working on a project. However, even this type of “small project” typically involves additional people, such as data engineers, architects, product owners, process masters, and/or subject matter experts. They help ensure that the insights are actionable, that management approves the work, and that the team knows how best to leverage the data that may be available.
In short, data science teams continue to grow, so the process they use matters more and more. This helps explain why teams are adopting repeatable practices so they can scale and meet the communication and coordination challenges of larger, more diverse teams.
Data Science is Growing Up
I’ve often heard that data science today is managed the way software was 20, 30, or even 40 years ago. For example, Daniel Mezick, President of New Technology Solutions, explained to me that in its early days, software development was misunderstood. The common metaphor was manufacturing: “Do A. Then B. Then C.” It took decades for people to realize that software development is more empirical and needs project processes that are more fluid than manufacturing processes or traditional waterfall practices.
Fast-forward to today, and data science is still often misunderstood as just software that should be managed as such. Do software practices apply to data science? The answer is complicated (visit this dedicated page for a fuller answer). In short, data science can learn from the software field, but simply following software practices often creates more problems than it solves.
Fortunately, people are beginning to recognize that data science is a distinct field, and that applying some agile software practices while respecting the inherently experimental nature of data science will likely yield the best results. Organizations are documenting and evangelizing data-science-specific project management processes such as IBM’s ASUM (2015), Microsoft’s Team Data Science Process (2016), Domino Data Lab’s methodology (2017), and Data Driven Scrum (2019).
Production Output Becoming the Norm
Sure, analyses conducted in Jupyter notebooks or RStudio, with a Tableau viz or Shiny app on top, are still needed. Increasingly, though, such work is no longer the project itself but one part of a broader project to put the model into production.
This is where data science can lean most heavily on lessons from software engineering, as the two fields converge to build production software systems powered by data science models. Data science teams are adopting repeatable practices such as version control, model and code reviews, continuous integration, and incremental deployments as they realize these practices help them build production-grade data science solutions.
Ethics and Regulations
As artificial intelligence becomes more integrated into our daily lives, questions about ethical practices are moving to the forefront of conversations in data science. Governments have responded with new data privacy and use regulations such as Europe’s General Data Protection Regulation (May 2018) and the California Consumer Privacy Act (Jan 2020).
Whether out of fear of making headlines for illegal data use or out of genuine concern for the common good, industry leaders are realizing that ethical and regulatory considerations need to be built into data science projects. Ad hoc project management processes are less likely to integrate these considerations, and when they do, it may not happen until late in the project, which opens the door to significant rework.
So what Project Practices should we use?
There is no one-size-fits-all project management approach for data science. In the end, it is up to you to determine how best to manage your next data science project. Here are a couple of possible next steps:
You could start researching various project management process approaches for data science (our approaches overview page is a good starting point).
You could also take a team process training course such as the Data Science Team Lead course.