During the past month, we conducted a poll to see which project management frameworks teams used to help execute their data science projects. Based on our survey of 109 respondents, CRISP-DM was the most commonly used data science process framework (it was used by about half the respondents). This was followed by Scrum, Kanban and “my […]
How do you manage data science projects? Is it software? Is it research? Or maybe, simply magic? This four-part post is an overview of 10 ways projects are or could be managed. To start, we’ll explore ad hoc project management, waterfall, and CRISP-DM.
While there is no standard process for a team to use when working on a data science project, CRISP-DM (CRoss-Industry Standard Process for Data Mining) is one framework that is often considered for data science projects. Perhaps because of this, many websites describe the 6 phases of a CRISP-DM project, and […]
Ironically, data science teams that are so intensely focused on model measurement often don’t measure their own project performance, which is problematic because… But wait! Data scientists measure all sorts of metrics. Of course, they closely monitor data science metrics and KPIs such as RMSE, F1 scores, or correlation coefficients. Such metrics are critical […]
Similarities of Data Science and Research Efforts
In many ways, a data science project looks like a research project, in that both require significant effort exploring a problem that typically doesn’t have a known answer. For example, in data science, it’s often not clear where there is “value in the data”, which is similar to […]
Similarities of Data Science and Software Engineering Projects
In many ways, data science looks like software engineering. Both require significant coding to address an underlying business problem or opportunity, which typically requires frequent stakeholder interaction. Furthermore, when a production data science model is required, just as for traditional software systems, there is a requirement to include […]
What is CRISP-DM?
The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a process model with six phases that naturally describes the data science life cycle. It’s like a set of guardrails to help you plan, organize, and implement your data science project.
Business understanding – What does the business need?
Data understanding – What data do we have / need? Is […]
What is Waterfall?
Waterfall, also referred to as the classic life cycle or traditional project management, originated in manufacturing and construction and was applied to software engineering projects starting in the 1960s. A waterfall project flows through defined phases such as those shown in the diagram to the right. Some waterfall models include variations of these […]
Waterfall is the classic, highly structured project management approach that dates back to antiquity and was common in software 10 to 20 years ago. CRISP-DM was defined in the late 1990s to meet the need for a process specific to data mining. Both approaches could be applied to data science. Waterfall, traditional software development life cycle (SDLC), and predictive […]