A common theme in our research is that the state of data science project management today is immature, much like software project management 20 or 30 years ago. Back then, software projects were mostly managed through predictive Waterfall methodologies, largely because these were the established best practices borrowed from other industries. However, software engineers faced repeated project challenges and failures, which led companies to realize that software engineering was a distinct discipline that needed its own project management methodologies, separate from those of industries like construction and manufacturing. This laid the foundation for the agile movement which, as a more natural fit for software project management, has led to increased project success rates.
Fast forward to today, and data science is a nascent field. Without clearly understanding data science’s unique attributes, companies often bucket its projects with those of its closest known cousin, software engineering. The adoption of practices such as Scrum and Kanban is a positive step along the path of data science’s maturation, but it only partially solves the fundamental issues of managing data science projects. As happened with software engineering decades ago, the field’s focus will shift as it matures, moving beyond just the technology and algorithms to incorporate more holistic project management approaches that specifically address the unique aspects of data science.
Signs of this positive direction are clear. An academic literature review found a strong uptick, starting in 2014, in research addressing “the methodological aspects of big data projects” (Saltz & Shamshurin, 2016). Except for the CRISP-DM and KDD process-related information, nearly all sources in our research that intersect both data science and project management are from 2013 or later. While we prefer more recent information, this skew is more a consequence of the lack of sources prior to 2013 and a sign of the rapid proliferation of data science project management information.
Moreover, Microsoft’s Team Data Science Process (2016), Domino Data Lab’s Data Science Lifecycle (2017), and the Data Science Process Alliance’s Data Driven Scrum (2019) contribute to the maturation of the data science field. The development of such approaches shows that leading technology companies recognize both the need for a better project management approach for data science and the marketing value of being seen as a leader in data science project management. Other companies will likely follow suit in the coming years.
Contributions from corporations, academia, and data scientists will fill the current dearth of information on data science project management and open rich communities similar to those that already exist for technical data science topics. By developing and applying more holistic team-based project management approaches, data scientists, project managers, and organizations will more effectively convert data science investments in time, talent, and technology into tremendous value.
Got suggestions for how to further support this movement? We’d be happy to hear from you.