The previous post in this four-part series investigated two common agile approaches to data science:
- Scrum: a time-boxed approach with well-defined team roles and meetings; focused on delivering product increments in batches
- Kanban: a more fluid and less prescriptive approach focused on continuous delivery and reducing cycle times
And the initial post focused on traditional approaches:
- Ad hoc: minimal planning and focused on execution; often results in poor coordination and rework, and should only be considered for small, one-off projects
- Waterfall: roughly the opposite of ad hoc, with very heavy upfront planning; suited to stable projects and generally not appropriate for data science
- CRISP-DM: a natural data science workflow; lacks key project management considerations, particularly with regard to team management
“These two ways of thinking of the process need to be in place and they are not competing or contradictory but are complementary. So when you are working with a wicked problem, you need to be open to a kind of process that is more designerly or fluid and you need to iterate but you also need a process that organizes people and time and puts structure on things.”
-Erik Stolterman, Indiana University
What is it?
Agile approaches emphasize rapid deliveries and responding to change while prescriptive approaches like waterfall focus on thorough upfront planning and following the plan. Can you do both? Yes…at least in theory. Agile-waterfall hybrids (also sometimes referred to as bimodal) are methodologies that take parallel paths to delivering a project: one path that focuses on extensive upfront planning and project controls and another that flexes to the needs of the project.
What are some example implementations?
One project manager I interviewed described an ideal data science project management approach as something “like Scrum with a waterfall wrapper around it” to ensure that the planning stays one step in front of the more fluid work of the data scientists.
Another manager at Eli Lilly explained that his data teams prefer agile development, but that two-week development sprints may be followed by months of waterfall-style documentation and testing to ensure FDA and internal company compliance.
What are the pros?
- Meets specific use cases: Agility is key to data science. But when agility can get you in trouble, you might need some rigid, prescriptive practices.
- A stepping stone: Agile transformations are challenging on many fronts. Adding some agile practices to an otherwise prescriptive approach can ease the transition.
What are the cons?
- Might not yield benefits: Attempting to be everything (agile and prescriptive) can lead to being bad at both.
- Might lead to process confusion: Balancing the competing needs of agile and prescriptive approaches is challenging.
- May not be respected: People who subscribe to the agile camp of thought often view waterfall as the antithesis of agile and label bimodal approaches as “frAgile”.
The bottom line
Generally reserve such approaches for specific use cases in which constraints like organizational policies or external regulations override the freedom to be agile.
Research and Development
“For any type of problem that is unknown, you need to have two batches of time – the research and then figure out how to productionize it.”
-Tyler Foxworthy, (former) Chief Scientist at DemandJump
“Data science problems stop when they can be turned into engineering problems.”
What is it?
Broadly speaking, research and development is the general set of approaches for investigating, prototyping, and eventually producing new, innovative products.
A data science project can likewise be viewed as a research endeavor whose output transitions into an engineering project. A research and development project approach therefore divides the overall project into two broad pieces: a data science research phase followed by an engineering phase. Each phase is managed using a different approach — typically a loosely structured or even ad hoc approach for the data science phase and an agile approach for the engineering phase.
What are some example implementations?
Two interesting use cases are at Google Brain and DemandJump. Both use a largely ad hoc approach for the data science phase and agile for the development phase.
Ryan Poplin, Machine Learning Lead at Google Brain, described his environment as loosely structured research teams that operate in two distinct phases. During the “Research” phase, the team members largely work independently with a project manager who is mostly focused on procuring data sets. Occasionally, they transition into a “Development” phase to produce a “proof of principle.” These two-week sprints feature close collaboration, daily standups, bug trackers, burn-down charts, and a hands-on project manager.
DemandJump, an AI marketing startup in Indianapolis, followed a similar approach. Tyler Foxworthy, the Chief Scientist, kept a very loose project structure and acted like a thesis advisor to his data scientists, helping to set up the problem and providing guidance. Once the underlying problem became well-defined, he kicked the work over to a project manager and the software engineering team for development.
What are the pros?
- Fits the data science life cycle: Data science work needs a looser structure than deployment / engineering work, so manage each phase accordingly.
- Fits research backgrounds: Many data scientists come from research backgrounds and will find this approach natural.
What are the cons?
- Phase transitions: Projects often fail at hard life cycle transition points. Coordination with engineering, IT, and the business is needed at the start of the project — not as an afterthought.
- Ad hoc data science: The initial data science phase could fall victim to the shortcomings of ad hoc processes.
The bottom line
These are interesting options that work best for mature teams implementing research-intensive data science projects.
Agile-waterfall and research and development are general approaches that can be applied to data science. Are there other hybrid approaches that are specifically designed for data science? Read the final post to learn the answer.