What is a Data Science R&D Approach?
The data science process can be viewed as a research endeavor that transitions into an engineering project.
As such, some organizations combine traditional research methodologies with modern agile development approaches. Google Brain and DemandJump are two companies that split data science into these two general buckets.
Data Science R&D Approach at Google Brain
Who is Google Brain? It’s a deep learning research group at Google. The group does not subscribe to a rigid project management approach but loosely divides work into unstructured research time and semi-structured development sprints. Ryan Poplin, a Machine Learning Technical Lead who conducts genomics research, explains that most of their work is research-oriented and does not fit well into standard project management approaches (Poplin, 2017).
How are teams structured? At Google Brain, project teams are very fluid and loosely defined. Individuals generally have broad freedom to change teams and tend to prioritize their own work based on their interests and the broader team needs. Teams, consisting of 6 to 8 scientists and engineers, meet on a quarterly basis to determine if their projects should continue; projects that do not receive votes are decommissioned. Project managers tend to be hands-off from the daily research and instead focus externally to collaborate with stakeholders. Because so much of their teams’ success depends on the quality and availability of data, the project managers devote much of their time to procuring data sets that meet the researchers’ needs. Much of this work is just tracked in spreadsheets (Poplin, 2017).
Research and “Sprint”: Occasionally, team members need to collaborate closely to produce a deliverable such as a proof of principle, which Poplin describes as “a smallish project to prove a concept”. To complete the deliverable, the team comes together in an intensive, output-focused two-week sprint; however, their concept of a sprint bears only some resemblance to Scrum’s definition of a sprint. Team members collaborate closely and hold daily standups to plan their work. The project manager takes a more hands-on approach to:
- track work items
- record bugs
- manage a burn-down chart
- interpret issues
- help them execute work as a team
- hold team members accountable
Poplin says without the project manager, they would not be able to execute effectively as they would otherwise likely ignore project responsibilities and “just go back to the research” (Poplin, 2017).
Data Science R&D Approach at DemandJump
Who is DemandJump? A similar hybrid approach that uses unstructured research and structured development cycles is employed at DemandJump, an Indianapolis startup that offers an artificial intelligence marketing platform. Tyler Foxworthy, Scientific Advisor at DemandJump, sees data science as distinct from engineering and believes that the two disciplines should be managed differently. He explains, “For any type of problem that is unknown, you need to have two batches of time – the research and then figure out how to productionize it.”
Research: He compares his work leading a data science team to that of a thesis advisor – he helps set up a problem for his team and provides guidance but otherwise allows them to conduct their own research during largely unstructured time. Foxworthy believes that “you can’t put a time box on open problems because you can’t schedule insights.” Rather, “it’s better to scope specific time for research.”
Development: Eventually, when the underlying problem becomes well-defined and solvable, it’s time for the development phase. At that point, Foxworthy “gets the project manager and engineering involved because you should be able to scope the time.”
As Nick explains in the data science vs software engineering post, data science is generally a research endeavor that needs flexible processes. However, full-scale data science products need to be deployed, which is more of a software engineering effort. Therefore, it makes sense to use two different project management approaches for these two different project phases.
So is data science R&D the way to manage projects? Well…it’s complicated.
Research phases are difficult to monitor and control. They require discipline from team members to focus on producing value, and trust from management to give them that freedom. This hands-off project management approach during research could fall victim to the risks of low-maturity processes.
Ryan Poplin admitted that the Google Brain approach isn’t for everyone but works well for them as highly motivated researchers whose work is usually individually-focused (Poplin, 2017).
Additionally, dividing the overall project into phases (research and then development) is counter to the agile practice of providing “vertical slices” of value frequently from the start of a project; rather it somewhat mirrors the phased approach of waterfall whereby value delivery is deferred until later project phases.
In summary, these approaches can be effective but are perhaps best reserved for mature team environments whose work is primarily research-focused.
Data Science Process Alliance: Given the requests we’ve had for training, Jeff and I have helped launch the DSPA which can help you learn how to better deliver projects.
Other Process Alternatives: This page is part of the learning center dedicated to exploring other general data science workflows and processes. Some related processes to explore include:
- Agile for Data Science – how to deliver quickly
- Agile-Waterfall Hybrids – which attempt to combine Agile and Waterfall
- Up-and-coming Emerging Approaches
- CRISP-DM – the most popular framework for data science