Machine learning model operations is a multidisciplinary field that is gaining traction as organizations realize that much of the work begins after a model is deployed. In fact, maintaining a model often requires more effort than developing and deploying it.
Machine learning operations (also known as Machine Learning Model Operations, ML Ops, or Data Science Ops) is a set of emerging practices that support the end-to-end machine learning life cycle, from design ideation through ongoing maintenance.
To help us navigate through this exciting world, I sat down with Luigi from MLinProduction.com.
So, with some pointers from Luigi, let’s dive into:
- What is machine learning operations?
- Why is maintaining a model important?
- What else do you need to maintain a machine learning system?
- Why is this a hot topic today?
- When should you start planning operations?
- How is machine learning operations evolving?
- Where can you learn more?
Listen to the 25-minute interview above.
What is machine learning operations?
Machine learning operations (ML Ops) is an emerging field that sits at the intersection of development, IT operations, and machine learning. It aims to foster cross-functional collaboration by breaking down the silos between teams.
Machine learning operations is more than just a single tool. Rather, it spans a wide set of practices, systems, and responsibilities that data scientists, data engineers, cloud engineers, IT operations, and business stakeholders use to develop, scale, deploy, and maintain machine learning solutions.
More specifically, Luigi thinks of machine learning operations “as everything from planning and building your original pipelines and delivery mechanisms for machine models all the way through seeing those through working in production, QA-ing any issues, and finding any issues proactively before they spoil your system and integrating any changes for the future development of the machine learning pipeline.”
Why is maintaining a model important?
What does this quote from an ancient Greek philosopher have to do with maintaining machine learning models?
Source: Wikimedia Commons
Well, as Luigi explains: “The most predictive a model will be is when you get to production […because…] the world will change and the relationship between the input and the output will deteriorate over time.”
For example, imagine you are predicting the number of people who will buy a ticket for a cruise ship. If you developed your model in early 2020 based on data from 2019 … well, the model probably isn’t very effective in 2021, because the COVID-19 pandemic upended travel and broke the patterns the model learned.
Even less dramatic use cases (think a movie recommendation engine or the price to charge a ride-share user) require regular inspection of the input data and the model’s performance as consumer preferences and market conditions constantly evolve.
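One common way to make "regular inspection of the input data" concrete is to compare a feature’s distribution in production with its distribution in the training data. The sketch below uses the Population Stability Index (PSI), a widely used drift score; the choice of PSI, the function name `population_stability_index`, the ten quantile bins, and the 0.2 alert threshold are all illustrative assumptions, not a method Luigi prescribes.

```python
import math
import random
from bisect import bisect_right


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample (`expected`, e.g. training data)
    and a production sample (`actual`) of one numeric feature.

    Bin edges are quantiles of the reference sample, so each bin holds
    roughly 1/n_bins of the training values. PSI = sum((a - e) * ln(a / e)).
    """
    ref = sorted(expected)
    # Interior cut points at the reference quantiles (an assumed binning choice).
    edges = [ref[int(len(ref) * i / n_bins)] for i in range(1, n_bins)]

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    e_shares = bin_shares(expected)
    a_shares = bin_shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))


DRIFT_THRESHOLD = 0.2  # hypothetical rule-of-thumb cutoff; tune per feature

if __name__ == "__main__":
    rng = random.Random(0)
    train = [rng.gauss(0, 1) for _ in range(5000)]
    this_week = [rng.gauss(1, 1) for _ in range(5000)]  # simulated shift
    psi = population_stability_index(train, this_week)
    print(f"PSI = {psi:.3f}, drift alert: {psi > DRIFT_THRESHOLD}")
```

Quantile bins are used so the reference distribution is spread evenly across bins; in practice you would run a check like this per feature on a schedule and review any feature that crosses the threshold.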
What else do you need to maintain a machine learning system?
A lot, actually.
The core of model maintenance rests on properly monitoring and maintaining the input data and retraining the model when needed. Knowing when and how to do this is a significant task in and of itself, and it is the piece most specific to maintaining machine learning systems.
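Deciding *when* to retrain usually means combining several signals. The sketch below is one hypothetical policy, assuming you already track an input-drift score, accuracy on recently labeled production data, and the date the current model was trained; the `ModelHealth` record, the `should_retrain` function, and every threshold are illustrative assumptions rather than a standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ModelHealth:
    drift_score: float      # e.g. a PSI-style input-drift score (hypothetical)
    recent_accuracy: float  # accuracy on recently labeled production data
    trained_on: date        # date the current model was trained


def should_retrain(health, today, max_drift=0.2, min_accuracy=0.85, max_age_days=90):
    """Return the list of reasons to retrain (empty if none apply).

    All thresholds are hypothetical defaults chosen for illustration.
    """
    reasons = []
    if health.drift_score > max_drift:
        reasons.append("input drift")
    if health.recent_accuracy < min_accuracy:
        reasons.append("performance drop")
    if (today - health.trained_on).days > max_age_days:
        reasons.append("model age")
    return reasons


if __name__ == "__main__":
    cruise_model = ModelHealth(drift_score=0.35, recent_accuracy=0.62,
                               trained_on=date(2020, 1, 15))
    print(should_retrain(cruise_model, today=date(2021, 1, 15)))
```

Returning reasons instead of a bare boolean makes the decision auditable: whoever triggers the retraining job can see whether it was driven by drift, degraded performance, or simply the age of the model.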
However, this is only part of the picture. In fact, a 2015 paper from Google ("Hidden Technical Debt in Machine Learning Systems") illustrates that the machine learning code is only a small portion of the overall infrastructure needed to maintain a machine learning system.
Image Source: Google Whitepaper
As such, much of what is already established in the more mature field of software operations applies. After all, “Machine learning systems at the end of the day are software systems. So a lot of the operational practices that people are trying to implement in machine learning today are really derived in some way on good software operations practices.” (Luigi interview).
Why is the field a hot topic today?
Well, it depends on your perspective. Luigi points out that companies like Google or Facebook have understood the importance of maintaining a production-based machine learning system for years. For them, ML Ops isn’t really new.
However, as machine learning has gone mainstream, more organizations have adopted it, and interest in machine learning operations is now following the same path. Luigi clarifies that “It was first how do we even create a model, then it was how do we get the model into the real world … and now it’s how we make sure the model continues to operate well…it’s a natural progression of the development life cycle of the model and most companies just weren’t at this place a few years ago… but now there are more companies at this stage of the game making sure the models operate post-deployment.”
When should you start planning operations?
Data science projects fail at alarming rates. One of the leading causes of failure stems from a hard handoff from the data scientists to another team that must maintain the model and system.
Don’t blindly throw the model over the wall! Disaster may result!
By bringing operational considerations into the project life cycle earlier, you are more likely to avoid this pitfall. Luigi agrees. He puts it more bluntly: “Thinking about the deployment at the end of the development of the model is a terrible idea…you need to think about it at the very beginning of the project.”
Facilitate conversations about deployment and operations throughout the project. Disaster could be averted!
How is the field of machine learning operations evolving?
Luigi again points out the similarities with software. “I don’t think you’d be very wide of the mark by looking at how software operations has developed, and then figuring that you’d sort of be in the ballpark with machine learning operations”. Specific areas he sees evolving in the field include:
- How do you better handle model change management?
- How do you automate the monitoring of the model performance?
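To give a flavor of what automated performance monitoring can look like, here is a minimal sketch that tracks accuracy over a rolling window of recent predictions and raises a flag when it drops below a floor. It assumes ground-truth labels eventually arrive for production predictions; the class name `RollingAccuracyMonitor`, the choice of accuracy as the metric, and the window and threshold values are hypothetical.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the most recent `window` labeled predictions and
    flag when it falls below `min_accuracy` (both values are hypothetical)."""

    def __init__(self, window=500, min_accuracy=0.85):
        self.window = window
        self.min_accuracy = min_accuracy
        # deque with maxlen silently discards the oldest outcome when full.
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong

    def record(self, prediction, actual):
        """Log one prediction once its ground-truth label is known."""
        self.outcomes.append(1 if prediction == actual else 0)

    @property
    def accuracy(self):
        """Accuracy over the current window, or None before any data arrives."""
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def needs_attention(self):
        # Only alert on a full window so a few early errors don't trigger it.
        return len(self.outcomes) == self.window and self.accuracy < self.min_accuracy


if __name__ == "__main__":
    monitor = RollingAccuracyMonitor(window=100, min_accuracy=0.9)
    for _ in range(100):
        monitor.record(prediction=1, actual=1)
    print(monitor.needs_attention())  # False
    for _ in range(20):
        monitor.record(prediction=1, actual=0)
    print(monitor.accuracy, monitor.needs_attention())  # 0.8 True
```

In a real system, `needs_attention()` would feed an alerting channel or a retraining job rather than a `print`, and the metric would match the business problem (for example, error in dollars for a pricing model).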
I’ll add two more observations on evolving ML Ops trends:
- Machine learning systems are more frequently being recognized and treated as full-fledged products. This product life cycle encompasses the project life cycle plus the operations and looks to maximize value throughout the entire system (as opposed to delivering a model, closing the project, and leaving it be). Learn more with this post about Data Science Product Managers.
- Society is demanding more accountability from artificial intelligence. Thus, ethical protocols and legal and regulatory compliance are becoming increasingly important to operating a model. Ask yourself 10 Ethics Questions in Data Science.
Where can you learn more?
- MLinProduction.com – Luigi’s blog offers best practices and tactical advice on how to build and maintain real-world machine learning systems.
- ml-ops.org – This intuitive site explores many of the conceptual and high-level aspects of machine learning operations.
- Why Machine Learning Models Hate “Change” – This Towards Data Science post provides a short and simple overview of model drift, one of the key elements to look out for in operational models.
- Practitioner’s Guide to ML Ops – Google’s detailed 37-page whitepaper outlines the machine learning operations life cycle and its core capabilities.
- What is the Data Science Life Cycle? – Our post walks through the steps of developing and deploying a data science model.
- What is the ML Life Cycle? – A similar post to the one above, focused on the machine learning life cycle.
- Data Science vs Software Engineering – Another one of our posts. It helps clarify the differences and similarities between these fields.