What is SEMMA? Data is used by businesses to achieve a competitive advantage, improve performance, and deliver more useful services to customers. The data we collect about our surroundings serves as the foundation for hypotheses and models of the world we live in. Ultimately, data is accumulated to help build knowledge. That means the […]
KDD and Data Mining: What Is the KDD Process? Dating back to 1989, the term Knowledge Discovery in Databases (KDD) refers to the overall process of collecting data and methodically refining it. The KDD process aims to purge the ‘noise’ (useless, tangential outliers) while establishing a phased approach to deriving patterns and trends that add important […]
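A phased approach like this can be sketched as a small pipeline in which each stage refines the output of the previous one. This is a minimal illustration, not a definitive KDD implementation: the five stage names (selection, preprocessing, transformation, data mining, interpretation/evaluation) follow a commonly cited breakdown of KDD rather than this excerpt, and the sample data, the function names, the modified z-score outlier rule, and its 3.5 cutoff are all assumptions chosen for illustration.

```python
from statistics import median


def select(records, field):
    """Selection: keep only the attribute of interest and drop missing values."""
    return [r[field] for r in records if r.get(field) is not None]


def preprocess(values, threshold=3.5):
    """Preprocessing: purge 'noise' with a modified z-score (median/MAD) rule.

    The 3.5 cutoff is a common rule of thumb, used here as an assumption.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:  # no spread, so nothing can be flagged as an outlier
        return list(values)
    return [v for v in values if 0.6745 * abs(v - med) / mad <= threshold]


def transform(values):
    """Transformation: min-max scale values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mine(values):
    """Data mining: fit a least-squares slope of the values against their index."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = sum(values) / n
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den if den else 0.0


def interpret(slope, tolerance=0.01):
    """Interpretation/evaluation: turn the mined slope into a plain statement."""
    if slope > tolerance:
        return "upward trend"
    if slope < -tolerance:
        return "downward trend"
    return "no clear trend"


# Hypothetical daily sales with one missing entry and one data-entry error (500).
records = [{"sales": v} for v in [10, 12, None, 11, 13, 500, 14, 15, 16]]
cleaned = preprocess(select(records, "sales"))
print(cleaned)                              # [10, 12, 11, 13, 14, 15, 16]
print(interpret(mine(transform(cleaned))))  # upward trend
```

The median/MAD rule is used instead of a plain mean and standard-deviation z-score because the outlier itself inflates the standard deviation: with only eight values, no point can reach |z| = 3, so the 500 would slip through.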
Over the past few months, we conducted a poll to see which project management frameworks teams use to execute their data science projects. Of our 109 respondents, nearly half most commonly use CRISP-DM, followed by Scrum, Kanban, and “My Own”. See results below. A quick review of the […]
How do you manage data science projects? Is it software? Is it research? Or maybe, simply magic? This four-part post is an overview of 10 ways projects are or could be managed. To start, we’ll explore ad hoc project management, Waterfall, and CRISP-DM.
Although there is no standard process for teams working on data science projects, CRISP-DM (CRoss-Industry Standard Process for Data Mining) is one framework that is often considered. Perhaps because of this, many websites describe the six phases of a CRISP-DM project, and […]
Ironically, data science teams that focus so intensely on measuring their models often don’t measure their own project performance, which is problematic because… But wait! Data scientists measure all sorts of metrics. Of course, they closely monitor data science metrics and KPIs such as RMSE, F1 scores, and correlation coefficients. Such metrics are critical […]
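The model metrics named here are standard formulas. As a minimal, dependency-free sketch (in practice a library such as scikit-learn, which provides `mean_squared_error` and `f1_score`, would usually be used), they can be computed as follows; the function names and sample values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error for a regression model."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def f1_score(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:  # no true positives: precision or recall is zero (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pearson_r(xs, ys):
    """Pearson correlation coefficient; assumes neither sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


print(rmse([2, 4, 6], [3, 3, 7]))                             # 1.0
print(round(f1_score([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]), 3))   # 0.667
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 3))        # 1.0
```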
What is CRISP-DM? The CRoss-Industry Standard Process for Data Mining (CRISP-DM) is a six-phase process model that naturally describes the data science life cycle. It’s like a set of guardrails to help you plan, organize, and implement your data science (or machine learning) project. Business understanding: What does the business need? Data understanding: What data do we have […]
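Because CRISP-DM phases can loop back on one another, the model can be sketched as a small state machine. In this hedged sketch the six phase names are the standard CRISP-DM phases, but the allowed moves in `TRANSITIONS` are an assumption modeled on the arrows in the widely reproduced CRISP-DM diagram, and `Phase`, `TRANSITIONS`, and `is_valid_path` are hypothetical names.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = "Business understanding"
    DATA_UNDERSTANDING = "Data understanding"
    DATA_PREPARATION = "Data preparation"
    MODELING = "Modeling"
    EVALUATION = "Evaluation"
    DEPLOYMENT = "Deployment"


# Assumed allowed moves, modeled on the arrows in the common CRISP-DM diagram.
TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),  # ends one pass; a new cycle would restart the model
}


def is_valid_path(path):
    """Return True if a phase history starts at business understanding
    and only makes moves allowed by TRANSITIONS."""
    if not path or path[0] is not Phase.BUSINESS_UNDERSTANDING:
        return False
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


P = Phase
# Hypothetical project that iterates on modeling, then revisits the business goal.
project = [P.BUSINESS_UNDERSTANDING, P.DATA_UNDERSTANDING, P.DATA_PREPARATION,
           P.MODELING, P.DATA_PREPARATION, P.MODELING, P.EVALUATION,
           P.BUSINESS_UNDERSTANDING, P.DATA_UNDERSTANDING, P.DATA_PREPARATION,
           P.MODELING, P.EVALUATION, P.DEPLOYMENT]
print(is_valid_path(project))                                  # True
print(is_valid_path([P.BUSINESS_UNDERSTANDING, P.MODELING]))   # False
```

Encoding the arrows as data rather than as control flow makes it easy to check a team's actual phase history against the framework and to spot where a phase was skipped.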
What is Waterfall? Waterfall, also referred to as the classic life cycle or traditional project management, originated in manufacturing and construction and was applied to software engineering projects starting in the 1960s. A waterfall project flows sequentially through defined phases, such as those shown in the diagram to the right. Some waterfall models include variations of these […]
Ad Hoc: ad hoc (adv.), “for the particular end or case at hand without consideration of wider application” (Merriam-Webster Dictionary). High Reliance on Ad Hoc Processes: Without established methodologies for managing data science projects, teams often resort to ad hoc practices that are not repeatable, sustainable, or organized. Such teams suffer from low […]
Ad hoc processes might work for smaller, one-off projects, but they are becoming less sustainable as data science matures into a team sport. Meanwhile, Waterfall is the classic, highly structured project management approach that dates back to antiquity and was common in software 10 to 20 years ago. Realizing the need for a process specific to data […]