Ad hoc processes might work for smaller, one-off projects but are becoming less sustainable as data science matures into a team sport. Meanwhile, Waterfall is the classic, highly structured project management approach that dates back to antiquity and was common in software 10 to 20 years ago. CRISP-DM was defined in the late 1990s to meet the need for a process specific to data mining. Both Waterfall and CRISP-DM could be applied to data science.
Waterfall, the traditional software development life cycle (SDLC), and predictive project management approaches are highly structured, with extensive up-front planning, horizontally layered development phases, and a commitment to follow the plan. Given the uncertain and cyclical nature of most data science projects, it is hard to find a scenario in which a waterfall approach would be appropriate; yet many data science projects revert to one.
The Cross-Industry Standard Process for Data Mining (CRISP-DM) is the first widely accepted process methodology for data mining. It remains the most popular Knowledge Discovery in Databases (KDD) methodology today and provides a natural overarching framework for data science. However, it lacks team-based processes, and its phased approach with heavy documentation is somewhat reminiscent of waterfall.