What is SEMMA? Businesses use data to gain a competitive advantage, improve performance, and deliver more useful services to customers. The data we collect about our surroundings serves as the foundation for hypotheses and models of the world we live in. Ultimately, data is accumulated to help build knowledge. That means the […]

KDD and Data Mining

What Is the KDD Process? Dating back to 1989, Knowledge Discovery in Databases (KDD) refers to the overall process of collecting data and methodically refining it. The KDD process aims to purge the ‘noise’ (useless, tangential outliers) while establishing a phased approach to derive patterns and trends that add important […]


CRISP-DM for Data Science Teams: 5 Actions to Consider

While there is no single standard process for a team to use when working on a data science project, CRISP-DM (CRoss-Industry Standard Process for Data Mining) is one framework that is often considered for data science projects. Perhaps because of this, there are many websites describing the six phases of a CRISP-DM project, and […]


10 Data Science Project Metrics

Measuring Data Science Project Performance

Ironically, data science teams that are intensely focused on model measurement often don’t measure their own project performance, which is problematic because… …But wait! Data scientists measure all sorts of metrics. Of course, they will closely monitor data science metrics and KPIs such as RMSE, F1 scores, or correlation coefficients. Such metrics are critical […]
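The model-level metrics the excerpt mentions (RMSE, F1 scores) are simple to compute directly. A minimal pure-Python sketch, using hypothetical example labels and predictions:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large prediction errors."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def f1(y_true, y_pred):
    """F1 score: harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical regression and classification outputs
print(rmse([3.0, 5.0, 2.0], [2.5, 5.0, 3.0]))
print(f1([1, 0, 1, 1], [1, 0, 0, 1]))  # → 0.8
```

The point of the post stands, though: these numbers measure the model, not whether the project itself is on time, on budget, or delivering business value.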


CRISP-DM Life Cycle

What is CRISP-DM? The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a six-phase process model that naturally describes the data science life cycle. It’s like a set of guardrails to help you plan, organize, and implement your data science (or machine learning) project. Business understanding – What does the business need? Data understanding – What data do we have […]
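The phases above form an ordered but iterative cycle (the standard CRISP-DM model loops from Deployment back to Business Understanding). A minimal Python sketch; the phase names come from CRISP-DM itself, while the `next_phase` helper is hypothetical:

```python
from enum import Enum

class CrispDmPhase(Enum):
    """The six phases of the CRISP-DM process model, in order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6

def next_phase(phase: CrispDmPhase) -> CrispDmPhase:
    """Advance to the next phase; after Deployment, wrap back to
    Business Understanding, since CRISP-DM is cyclical, not strictly linear."""
    return CrispDmPhase(phase.value % 6 + 1)

print([p.name for p in CrispDmPhase])
print(next_phase(CrispDmPhase.DEPLOYMENT).name)  # → BUSINESS_UNDERSTANDING
```

In practice teams also move backward between adjacent phases (e.g. Evaluation back to Business Understanding), which a simple linear successor function does not capture.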


Managing Data Science Projects with Waterfall

What is Waterfall? Waterfall, also referred to as the classic life cycle or traditional project management, originated in manufacturing and construction and was applied to software engineering projects starting in the 1960s. A waterfall project flows through defined phases, as shown in the diagram to the right. Some waterfall models include variations of these […]

Ad Hoc

ad hoc (adv.) – “for the particular end or case at hand without consideration of wider application” –Merriam-Webster Dictionary High Reliance on Ad Hoc Processes Without established methodologies for managing data science projects, teams often resort to ad hoc practices that are not repeatable, sustainable, or organized. Such teams suffer from low […]

Traditional Approaches

Ad hoc processes might work for smaller, one-off projects but become less sustainable as data science matures into a team sport. Meanwhile, Waterfall is the classic, highly structured project management approach that originated in manufacturing and construction and was common in software 10-20 years ago. Realizing the need for a process specific to data […]