Ironically, data science teams that are so intensely focused on model measurement often don’t measure their own project performance, which is problematic because…
…But wait! Data scientists measure all sorts of metrics.
Of course, data scientists closely monitor metrics such as RMSE, F1 score, or correlation coefficients. Such metrics are critical for answering “How well is the model performing?” but by themselves cannot answer questions such as “How well is my project progressing?” or “Will my project make an impact?”
…So how do you measure data science project performance?
The answer is not straightforward. Just as different model accuracy metrics measure different types of models, different project management metrics can help measure different teams, projects, and environments.
9 Metric Groups
Traditional Project Management Metrics
Traditional project managers compare time, cost, and scope performance relative to a baseline plan. On the one hand, because data science projects tend to evolve without adherence to an initial detailed plan, such variance metrics are not generally useful. On the other hand, data science project teams are often required (via contract or management decree) to hit deadlines or to adhere to resource or budget plans. As such, metrics like on-time milestone completion rate and actual vs estimated budget might still be required.
Agile Metrics
Popular agile metrics like story point velocity and percentage of committed stories completed only help if the team uses story points and works within time-boxed iterative frameworks such as Scrum. However, such practices may only be practical for the productization / engineering phases of your project.
Cycle times, as emphasized in Kanban approaches, are generally more meaningful as they measure how quickly a team turns around value. However, cycle times of individual tasks are easily manipulated (e.g. by changing the granularity of task definitions) and difficult to baseline consistently (e.g. EDA on data set XYZ might naturally be significantly more difficult than on data set ABC). Therefore, you’ll either need to closely monitor and standardize task definitions or search for more definitive cycle times such as:
- Time from project request to kick-off to measure bandwidth to intake new requests
- Time from project kick-off to delivery of the minimal viable product to measure how quickly a team can deliver initial value
- Demo frequency or rate of meaningful insights to measure how frequently results are delivered to stakeholders
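Such project-level cycle times fall out of a simple project log. A minimal sketch, assuming a hypothetical log of milestone dates (the field and function names here are illustrative, not from any standard tool):

```python
from datetime import date

# Hypothetical project log; field names are illustrative.
projects = [
    {"requested": date(2023, 1, 5), "kickoff": date(2023, 1, 20), "mvp_delivered": date(2023, 3, 1)},
    {"requested": date(2023, 2, 1), "kickoff": date(2023, 2, 10), "mvp_delivered": date(2023, 4, 15)},
]

def avg_days(records, start_key, end_key):
    """Average elapsed days between two milestones across projects."""
    spans = [(r[end_key] - r[start_key]).days for r in records]
    return sum(spans) / len(spans)

# Time from request to kick-off: proxy for intake bandwidth.
intake_time = avg_days(projects, "requested", "kickoff")
# Time from kick-off to MVP: proxy for speed of initial value delivery.
mvp_time = avg_days(projects, "kickoff", "mvp_delivered")
```

Because these spans start and end at unambiguous project milestones, they are harder to game than per-task cycle times.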
Financial Metrics
Value measured in financial terms is often the most important set of metrics for any project at a for-profit organization. Incremental revenue earned, incremental profit earned, or incremental costs reduced are among the simplest metrics, while payback period measures the time needed for the benefits to pay for the project costs. More advanced metrics, most notably Net Present Value (NPV) and Return on Investment (ROI), measure the value of the project’s cash inflows relative to its costs by taking into account the timing of the cash flows and the time value of money. Generally, several of these metrics combined with budget information paint the most comprehensive picture of financial impact.
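The standard formulas behind these metrics are short enough to sketch directly. The dollar figures below are purely illustrative:

```python
def payback_period(initial_cost, annual_benefit):
    """Years until cumulative benefits cover the project cost (simple, undiscounted)."""
    return initial_cost / annual_benefit

def roi(total_benefit, total_cost):
    """Return on investment as a fraction of cost."""
    return (total_benefit - total_cost) / total_cost

def npv(rate, cash_flows):
    """Net present value: discount each cash flow by (1 + rate)^t.
    cash_flows[0] is the (typically negative) up-front cost at time zero."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

# Illustrative example: a $100k project returning $60k/year for 3 years.
print(payback_period(100_000, 60_000))  # ~1.67 years
print(roi(180_000, 100_000))            # 0.8, i.e. 80%
print(npv(0.10, [-100_000, 60_000, 60_000, 60_000]))  # ~49211.12 at a 10% discount rate
```

Note how NPV and the simple payback period can disagree: discounting shrinks later benefits, which is exactly why NPV is considered the more rigorous metric.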
Impact to Organizational Goals
While financial metrics tend to be somewhat standardized, organizational goals vary drastically. These goals can sometimes be measured in financial terms, but often it’s best to measure project impact with the same metrics that stakeholders use for their own projects. For example, a data science project manager at a non-profit that aspires to reduce childhood obesity might tie project goals to organizational goals such as childhood obesity rate or exercise minutes per child. Such metrics are natural to measure because these are often the same target variables that data scientists strive to influence.
Artifact Creation
Value is often derived from projects in ways that are not directly related to the stakeholders’ goals. For example, teams might create new data sets or a deployment application during a project. Such artifacts are valuable in themselves because they can be re-used (perhaps with modifications) in other projects. Thus, the number (or value) of artifacts created can help measure whether the team is building effective underlying infrastructure to support future projects, while the number (or value) of artifacts re-used can measure whether the team is taking advantage of the fruits of previous projects.
Competencies Gained
Similarly, data science project team members often need to dedicate significant time to learning new technologies and algorithms. Thus, the number of competencies gained can indicate whether the team developed relevant skillsets by executing a project.
Stakeholder Satisfaction
Stakeholder satisfaction is of utmost importance, especially for agile teams (the first Agile Principle states that “Our highest priority is to satisfy the customer…”). Net Promoter Score is one such metric, touted by marketing departments (hbr.org), which can be calculated from surveys of customers who engage directly with the end data science product or with the data science team. However, data science output often works behind the scenes and/or serves internal teams who may shy away from providing honest, critical feedback. As such, stakeholder satisfaction might have to be a soft measurement based on the intuition of the project or product manager, or derived from proxies such as stakeholder use of the end product or the number of actions initiated because of the project.
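For teams that do survey their stakeholders, the NPS arithmetic is simple: the percentage of promoters (ratings of 9–10) minus the percentage of detractors (0–6). A sketch with made-up survey responses:

```python
def net_promoter_score(ratings):
    """NPS from 0-10 survey ratings: % promoters (9-10) minus % detractors (0-6).
    Ranges from -100 (all detractors) to +100 (all promoters)."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

# Illustrative stakeholder survey responses.
print(net_promoter_score([10, 9, 8, 7, 6, 10, 3, 9]))  # 25.0
```

Ratings of 7–8 (“passives”) count toward the denominator but neither group, which is why a sea of lukewarm 7s yields an NPS of zero.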
Software Specific Metrics
End-to-end data science projects have software deliverables that can be measured by software metrics. Examples include defect count, defect resolution time, frequency of tech reviews, latency (for real-time applications), or automated test coverage.
Model Performance Metrics
And we circle back to the start… Yes, technical model performance is a key group of metrics that can drive much of the project strategy. For example: Is the model performing sufficiently better than baseline? If so, perhaps offer it to stakeholders and begin testing its performance in controlled experiments. Alternatively: Has model performance flat-lined? This might indicate that you should halt further model development (because the results are about as good as they will get for now) or search for different techniques or data sets that might drastically shift model development work.
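One way to make the “has it flat-lined?” call less subjective is a simple heuristic over the history of scores from successive modeling iterations. This is a sketch of one possible rule, not a standard technique; the function name and thresholds are illustrative:

```python
def has_flatlined(scores, window=3, min_gain=0.005):
    """True if the best score among the last `window` iterations improved on
    the best earlier score by less than `min_gain` (assumes higher = better)."""
    if len(scores) <= window:
        return False  # not enough history to judge
    recent_best = max(scores[-window:])
    earlier_best = max(scores[:-window])
    return recent_best - earlier_best < min_gain

# Illustrative F1 scores across modeling iterations.
f1_history = [0.61, 0.68, 0.72, 0.721, 0.722, 0.7215]
print(has_flatlined(f1_history))  # True: recent iterations barely moved the needle
```

The window and minimum-gain thresholds would need tuning to your metric’s scale, but even a crude rule like this gives stakeholder reviews a concrete trigger for the “keep tuning vs change approach” conversation.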
The End Result
| Metric Group | Key Question | Example Metrics |
|---|---|---|
| Traditional metrics | How are we performing relative to plan? | Time, budget, and scope variance to plan |
| Agile metrics | How frequently are we providing value? | Velocity metrics |
| Financial metrics | Are we creating organizational financial value? | Revenue and cost metrics, payback period, ROI, NPV |
| Organizational goals | Is my project impacting organizational goals? | Varies widely |
| Artifact creation | Are we creating re-useable artifacts? | Number / value of artifacts created |
| Competencies gained | Are team members gaining valuable skillsets? | Number / value of competencies gained |
| Stakeholder satisfaction | Are my project stakeholders satisfied? | Net promoter score; “gut feel” assessment |
| Software metrics | What is the quality of the overall system being developed? | Defect count, defect resolution rate, latency, test coverage |
| Model performance | How are the models performing? | RMSE, F1, recall, precision, ROC, p-value |
Expanding measurement beyond just model performance enables you to more holistically evaluate the progress of a project and its potential impact. By taking measurements frequently (perhaps at each stakeholder review session), you can uncover potential issues and shift the project’s direction accordingly. And if the metrics paint a very grim picture, you might be able to cut your losses early and start another, more promising project with newfound wisdom. Moreover, evaluating groups of projects at the program or portfolio level can help inform your overall data science organizational strategy.
And at the very least, you’ll make Peter Drucker happy.