Machine Learning Project Management is of growing importance
If you lead machine learning projects, you have probably noticed (or soon will) that a repeatable agile process framework would be extremely helpful in ensuring the team generates useful predictive models.
Furthermore, as the team executing ML projects grows, it becomes even more important to manage those projects in a repeatable fashion.
5 Machine learning project management questions
1. What is the difference between Data Science, Machine Learning, and Artificial Intelligence projects?
Many of the blog posts on this site focus on data science project management. So the first obvious question is: how does machine learning project management relate to data science project management?
Many people use the terms AI, Data Science, and ML interchangeably, and very clearly, machine learning projects require one (or more) data scientists. But some people do try to distinguish between these three key concepts. Below is my view on these terms:
- Data science is a term that describes the end-to-end process of generating actionable knowledge from data (understanding the business problem and data sources, extracting / cleaning data, exploratory analysis, modeling, communicating the insights to stakeholders).
- Machine learning, on the other hand, typically focuses on the modeling aspect of the problem. In other words, the algorithms to build predictive models from the data. With this perspective, machine learning can be viewed as a key step in most data science projects.
- AI/Artificial Intelligence is a term that many people would agree is related to machine learning, but note that artificial intelligence can include other capabilities beyond machine learning (e.g., remote sensors).
In short, machine learning is viewed as part of data science as well as a key component of AI.
Note that when people talk about machine learning projects, they often are implicitly thinking of the entire project life cycle required to generate a predictive model. Hence, from my perspective, data science projects and machine learning projects are often very similar in nature, in that both require understanding the business context, collecting / munging data, and the creation of predictive models via machine learning.
2. Why are Machine Learning Projects Difficult to Manage?
Many people have noted that ML projects are prone to failure. Examples of why projects fail include:
- Poor project management: the team might not be following a repeatable process for how they collaborate and communicate with stakeholders, in part due to the challenge of managing ML projects. This is noted by Ian Xiao.
- Predicting how long tasks will take to complete is difficult. This is part of the reason managing ML projects is a challenge! This “scoping difficulty” is due to the exploratory nature of many tasks, and is discussed by George Bezerra.
- It can be challenging to set clearly defined goals and expectations, because the value/quality of a potential predictive model is often not clear prior to a project. This challenge is discussed in more depth by Lukas Biewald.
In short, it can be challenging for a team to know, in advance, what’s hard and what’s easy. In addition, machine learning tasks are prone to ‘fail’ in unexpected ways (e.g., it may not be clear whether the training data is representative of the actual situation, or whether it is biased).
This helps to explain why there are many challenges in executing a machine learning project, and indeed, why many ML projects do not deliver the insights expected.
3. Why not use a Software Development Project Management Approach?
You might be thinking, ML projects are software projects (since code is typically part of the ML project), so teams should use a software development approach. You would not be alone – many people have this thought! However, expanding on the previous question, some key differences between software and ML projects include:
- Project Feasibility (i.e., risk in the project delivering the requested insight)
- Knowing it works (i.e., how to know the project system is working accurately)
- Progress Tracking (i.e., how far along is the team in completing the project)
- Task Estimation (i.e., knowing how long a task will take to complete)
For more information on these questions, see the data science vs software engineering post.
Beyond these basic differences, machine learning projects can also involve a lot of upfront work (e.g., data cleaning). This is different from software projects. Hence, agile processes that work for software teams (e.g., Scrum) might not be well suited to a machine learning project.
This helps to explain the challenge of using Scrum (which is the most popular software development process framework) for machine learning projects. For example, in “A Doomed Marriage of Machine Learning and Agile”, Ian Xiao explores the challenges of using Scrum for ML projects. One key challenge is that some tasks take longer than others and some are difficult to estimate, whereas Scrum sprints always have the same fixed duration.
In short, while there are some aspects of ML projects that are similar to software development projects (e.g., each creates a “code base”), there are many differences, and these differences mean that a different process might be required for an ML project, as compared to a software development effort.
4. How might I use an agile framework for ML projects?
While using Scrum might not be appropriate, some, such as this YellowRoad blog post, suggest that teams should still use an agile framework for ML projects. The argument is that an agile framework is best able to handle the uncertain outcomes that often occur within an ML project, and that it can help a team focus on continuous process improvement (e.g., via retrospectives).
In thinking about how a machine learning project could use an agile approach, it is useful to take a step back and define the key concepts that should drive a lean agile machine learning project. Below are three key concepts:
- Agile is intended to be a sequence of iterative experimentation and adaptation cycles.
- The goal of each cycle is to explore a hypothesis (or experiment): build it, observe the ML in action, and then analyze those observations to create the next idea or experiment.
- Going from an initial idea, through implementation, to the analysis of the results should be the basis for an iteration. The completion of this empirical process should mark the end of an iteration (not a predetermined number of elapsed hours).
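To make the cycle described above concrete, here is a minimal Python sketch of an iteration loop that ends each iteration when the experiment has been built, observed, and analyzed, rather than when a clock runs out. Everything in it is hypothetical and for illustration only: the names `Experiment`, `IterationResult`, and `run_iterations`, the toy "tree depth" model, and the scoring rule are assumptions, not part of any framework mentioned in this post.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Experiment:
    """One hypothesis to test (hypothetical structure, for illustration)."""
    hypothesis: str
    build: Callable[[], int]        # builds the thing under test
    observe: Callable[[int], int]   # observes it in action and returns a score


@dataclass
class IterationResult:
    hypothesis: str
    score: int


def run_iterations(
    first: Experiment,
    analyze: Callable[[List[IterationResult]], Optional[Experiment]],
    max_iterations: int = 10,
) -> List[IterationResult]:
    """Run build -> observe -> analyze cycles until analyze() proposes nothing new."""
    results: List[IterationResult] = []
    current: Optional[Experiment] = first
    while current is not None and len(results) < max_iterations:
        model = current.build()                  # build the idea
        score = current.observe(model)           # observe it in action
        results.append(IterationResult(current.hypothesis, score))
        current = analyze(results)               # analysis creates the next experiment
    return results


# Toy usage: the "model" is just a tree depth, and the made-up score peaks at depth 4.
def make_experiment(depth: int) -> Experiment:
    return Experiment(
        hypothesis=f"tree depth {depth}",
        build=lambda: depth,
        observe=lambda model: 100 - 10 * abs(model - 4),
    )


def analyze(results: List[IterationResult]) -> Optional[Experiment]:
    if results[-1].score >= 95:
        return None                              # good enough: stop iterating
    return make_experiment(2 + len(results))     # otherwise try the next depth


history = run_iterations(make_experiment(2), analyze)
for result in history:
    print(result.hypothesis, result.score)
```

Note that nothing in the loop measures elapsed time: an iteration is complete when its result has been analyzed, which is the point of the third concept above.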
Data Driven Scrum™ is a new framework, specifically defined for data science and machine learning projects. Data Driven Scrum (DDS) is a variation of Scrum that, in general, is better suited than Scrum to address some of the key challenges encountered when executing a machine learning project. DDS helps to ensure that teams can achieve the three concepts described above without encountering the fixed sprint issue in Scrum.
In short, DDS helps enable teams to focus on their highest priority tasks (i.e., trying to ensure that the time spent during an iteration goes towards work that was actually required to run the given experiment / iteration) while enabling tasks to be reprioritized as needed. The key concepts that help a team achieve an agile approach while using DDS include:
- Functional Iterations: DDS iterations may have an unknown or varying duration. This is helpful for tasks that might take a long time (e.g., data cleaning), while still enabling rapid iteration where possible (e.g., model exploration).
- Flexible Task Estimation: Since iterations are not time-boxed (but rather capability-based), teams do not need detailed task duration estimates, which are often difficult to produce within an ML context.
- Iteration Independent Meetings: Retrospectives and iteration reviews are important, but rather than being held at the end of every iteration, they are held at a calendar-based frequency the team deems appropriate.
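As a rough illustration of how these three concepts fit together, the Python sketch below lays out a few iterations of different lengths alongside retrospectives held on a fixed calendar cadence. It is only a sketch under stated assumptions: the iteration names, the durations, the 7-day cadence, and the helper functions `iteration_end_days` and `retrospective_days` are all made up for this example and are not taken from the DDS definition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Iteration:
    name: str
    duration_days: int  # hypothetical; in practice known only once the iteration is done


def iteration_end_days(iterations: List[Iteration]) -> List[int]:
    """Return the project day on which each iteration finishes."""
    day = 0
    ends = []
    for iteration in iterations:
        day += iteration.duration_days  # iterations are not time-boxed
        ends.append(day)
    return ends


def retrospective_days(total_days: int, every_days: int) -> List[int]:
    """Return the days on which calendar-based retrospectives fall."""
    return list(range(every_days, total_days + 1, every_days))


# Made-up example: one long data cleaning iteration, then several short ones.
iterations = [
    Iteration("data cleaning", 12),
    Iteration("baseline model", 3),
    Iteration("feature experiment", 2),
    Iteration("model comparison", 4),
]

ends = iteration_end_days(iterations)
retros = retrospective_days(total_days=ends[-1], every_days=7)

print("iterations end on days:", ends)
print("retrospectives on days:", retros)
```

In this made-up schedule the iteration boundaries and the retrospective dates are computed independently, so a long data cleaning iteration does not delay the team's regular reviews.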
5. Where can I find more info?
Check out other blog posts focusing on more general data science project management, such as:
- More on Scrum, Kanban
- Blog post on data science process