Similarities of Data Science and Research Efforts
In many ways, a data science project looks like a research project, in that both require significant effort exploring a problem that typically doesn’t have a known answer. For example, in data science, it’s often not clear where there is “value in the data”, which is similar to the open questions faced in a research project.
Given this similarity, it is no surprise that many think one should (or at least could) manage a data science project just like a research project. Indeed, many senior project managers who have a research background might be tempted to assume that research teams and data science teams can be managed in a similar manner.
The Challenges of Managing Data Science Projects like Research Projects
However, data science projects differ from research projects in some key ways. For example, a data science project typically needs to report on status and get feedback from stakeholders in a very different manner than a research project does. A data science project does focus on exploration and discovery (such as “finding insight in the data”), which resembles a research effort’s ambiguous requirements. Even so, a data science effort still needs some way of tracking progress and reporting status to a client at regular intervals, typically far more frequent than what is needed during a research project.
These differences often cause challenges for anyone who tries to manage a data science project as a research effort. Consequently, applying a project management methodology designed for research to a data science project will likely fall short of expectations. Below we elaborate on some of the key differences between research projects and data science projects.
Generally, a research effort explores one or more research questions over a long period, often measured in months or even years, before those questions are understood. A data science project, however, is a journey that uses a combination of proofs of concept and trial-and-error efforts to map the problem to the solution space, and the client must receive iterative updates to ensure the analysis is on a path toward actionable insight. A project management environment that does not understand and respect this need to keep the client engaged might let the team “wander” into an area that will not lead to actionable insight, or might leave the client frustrated by a lack of awareness of progress to date, which could ultimately lead to the client not funding the data science effort.
A research effort often explores an area that the researchers find interesting, and a researcher might let the project identify and then pursue an interesting tangent. Within a data science context, however, the effort must focus on actionable insight for a client. In addition, even though most data science outcomes leave room for improvement, the client does not have a perfectionist mindset and will often settle for a “very good” result rather than a perfect one.
Much of the data science process involves tasks such as data cleaning, exploratory data analysis, and experimentation that have unknown scope and complexity. These tasks can be difficult to estimate, so the project might feel like a research effort, but clients still need to estimate how much a data science project will cost and how long it will take. So, while an exact budget or timeline is often not possible, a client often needs some sort of timeline and budget.
Planning for Post-Deployment
Often, people on a research project do not worry about how to use the research product in a production context. However, a data science team might have to take this into account.