What is a Machine Learning Life Cycle?

A machine learning life cycle describes the steps a team (or person) should use to create a predictive machine learning model.

Hence, a machine learning life cycle is a key part of most data science projects. In fact, for many people, the difference between a machine learning life cycle and a data science life cycle is not clear.

So, in this post, I’ll explore the machine learning life cycle and discuss how it relates to the data science life cycle.


1. What is a Life Cycle

A life cycle is used to explain the steps (or phases) of a project. In short, a team that uses a life cycle will have a consistent vocabulary to describe the work that needs to be done.

While machine learning engineers and data scientists can typically describe the steps within a project, they might not use the same words, or even define the same number of phases. A consistent vocabulary helps the team ensure they do not “miss a step”. You might think that experienced team members would never skip a step, but it happens easily. For example, I have often seen a team under deadline pressure finish one model and then move directly to building a different model, without really exploring how well the first model performs. This could be due to very tight schedules, or to the team’s desire to explore many models and “play with the data”.

There is another benefit to using a life cycle, beyond having a consistent vocabulary and ensuring the team does not miss a step. Non-technical people, such as a product owner or a senior manager, can better understand the work required and how far along the project is towards completion.

In summary, a life cycle framework will:

  • Standardize the process and vocabulary
  • Help guide the team’s work
  • Allow others to understand how a problem is being approached
  • Encourage the team to be more thorough, increasing the value of the work

2. An Example Machine Learning Life Cycle

There are many published machine learning life cycles, as well as some machine learning life cycles that are really data science life cycles.

But one of the most popular frameworks is a simple machine learning life cycle known as OSEMN. OSEMN was defined in 2010 by Hilary Mason and Chris Wiggins. OSEMN, which stands for Obtain, Scrub, Explore, Model, iNterpret, has 5 phases:

1. Obtain Data

This phase focuses on gathering data from relevant sources. It is also the phase where the team should be thinking of challenges such as how to automate data collection (if needed).

2. Scrub Data

Scrubbing the data, sometimes known as “munging the data”, is required because the data obtained in step 1 is typically “messy”. For example, the data might have missing values. This is often the most time-consuming phase of a machine learning project.
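As a minimal sketch of one scrubbing task, the snippet below fills missing values in a column with the mean of the observed values. Mean imputation is only one of several possible strategies and is chosen here for brevity; the function name, the column names, and the sample records are all hypothetical.

```python
from statistics import mean


def fill_missing_with_mean(rows, column):
    """Return new rows where None in `column` is replaced by the column mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values")
    column_mean = mean(observed)
    # Build new dicts so the raw data is left untouched.
    return [
        {**row, column: column_mean if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical records; None marks a missing value.
raw_rows = [
    {"age": 30, "income": 50000},
    {"age": None, "income": 62000},
    {"age": 40, "income": 58000},
]
clean_rows = fill_missing_with_mean(raw_rows, "age")
```

Keeping the raw rows unchanged makes it easier to revisit the scrubbing decision later, for example if the team decides that dropping incomplete rows is a better fit.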

3. Explore Data

Exploratory analysis is useful to get a basic understanding of the data. For example, histograms and scatter plots can easily show distributions of the data across various attributes.
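To illustrate the idea behind a histogram without relying on a plotting library, here is a rough sketch that counts values per bin and prints a text bar for each bin. The `histogram` helper and the sample ages are assumptions made for this example; in practice a team would more likely use a plotting tool.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


# Hypothetical attribute values.
ages = [23, 27, 31, 35, 36, 38, 42, 55]
width = 10
age_bins = histogram(ages, width)

# Print one row of '#' marks per bin as a crude text histogram.
for bin_start, count in age_bins.items():
    print(f"{bin_start}-{bin_start + width - 1}: {'#' * count}")
```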

4. Model Data

Building a predictive model is typically what people think about when they envision a machine learning project. Note that sometimes the team just needs to build a “good enough” model, not the best model possible.
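One way to judge whether a model is “good enough” is to compare it with a trivial baseline. The sketch below fits a one-feature least-squares line and compares its mean absolute error with a baseline that always predicts the mean. The data and helper names are hypothetical, and the errors are computed on the training data only to keep the example short; a real project would evaluate on held-out data.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical data: years of experience vs. salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [40, 45, 50, 55, 60]

slope, intercept = fit_line(years, salary)
model_error = mean_absolute_error(salary, [slope * x + intercept for x in years])
# Baseline: always predict the average salary.
baseline_error = mean_absolute_error(salary, [mean(salary)] * len(salary))
```

If the model does not clearly beat the baseline, more modeling effort (or better data) is probably needed before moving on.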

5. Interpret Results

No model is perfect, so people need to understand the predictive power of the model. In addition, this is the phase where the team needs to explore potential bias in the model.
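As one simple, non-exhaustive way to start exploring potential bias, the sketch below compares prediction accuracy across groups. The group labels, the records, and the `accuracy_by_group` helper are hypothetical, and a large gap is only a signal to investigate further, not proof of bias.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: accuracy} for records of (group, actual, predicted)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, actual, predicted in records:
        total[group] += 1
        correct[group] += int(actual == predicted)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical model outputs: (group, actual label, predicted label).
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 1),
    ("B", 1, 0), ("B", 0, 0), ("B", 1, 0), ("B", 0, 1),
]
group_accuracy = accuracy_by_group(records)
# Difference between the best-served and worst-served group.
accuracy_gap = max(group_accuracy.values()) - min(group_accuracy.values())
```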

3. The Data Science Life Cycle

CRISP-DM is the most commonly known and used data science life cycle. So, let’s explore CRISP-DM.

CRISP-DM has 6 phases:

  1. Business understanding – What does the business need?
  2. Data understanding – What data do we have / need? Is it clean?
  3. Data preparation – How do we organize the data for modeling?
  4. Modeling – What modeling techniques should we apply?
  5. Evaluation – Which model best meets the business objectives?
  6. Deployment – How do stakeholders access the results?

4. Comparing a Data Science and Machine Learning Life Cycle

At a high level, we can see that the data science life cycle includes all the concepts covered in the machine learning life cycle. However, the data science life cycle includes additional areas of focus. Specifically, the data science life cycle covers the end-to-end steps of a project (such as understanding the business context at the start of the project and thinking about deployment towards the end of the project).

For example, in comparing CRISP-DM to OSEMN, the CRISP-DM phases that map to OSEMN include Data Preparation, Modeling, and Evaluation. But CRISP-DM has additional phases, such as Business Understanding at the start of the life cycle and Deployment at the end.

One final note is that there are no clear and consistent definitions – so teams might extend the machine learning life cycle to include all the steps in the data science life cycle. If a team only has a machine learning life cycle (and not a data science life cycle), taking this end-to-end view is definitely something that will be helpful for the team.
