To deliver useful data science projects, it is important to effectively manage the data science team. But what does it mean to manage a data science team? How is it different from managing other teams? These are the questions I’ll explore in this post.
Way back in 2018, an article in the Harvard Business Review noted four key concepts for managing a data science team (Build trust and be candid, Connect the work to the business, Design great teams, and When to specialize). However, the article didn’t go into any real specifics. While nobody would dispute that designing a great team is helpful, it’s not clear what that means. For example, should the team be centralized, or distributed across the business groups that could leverage data science insights? Furthermore, should managers recruit specialists for specific roles, or generalists (“full stack data scientists”)?
Our understanding of how to manage a data science team has certainly progressed since 2018, and there are now many aspects to consider when managing one.
While it is nice to build a team from scratch, that is often not practical because a team is already in place. So, the focus is often on understanding how the current team works and which refinements might be appropriate. Either way, below I discuss 5 key aspects of managing data science teams, with links for further reading.
1. Skills Needed within the Team
The skills required within a team might vary by project but, in general, include the following, typically defined by role:
- Data Scientist – finds and interprets rich data sources, merges data sources, creates visualizations, and uses machine learning to build models that aid in creating actionable insight from the data.
- Machine Learning Engineer – designs, develops, and codes large data (science) analytics applications to support scientific or enterprise/business processes.
- Data Engineer – makes the appropriate data accessible and available for data science efforts.
- Data Architect – designs and maintains the data storage needed for the predictive systems.
- Product Owner – the person who decides which features and functionality to build, the order in which to build them, and what aspects of them to observe and analyze.
Note that, in addition to understanding the skills needed by the team, one also has to explore how to structure the team. Specifically, should each team member have a specific role, or should each person be more of a generalist who can do a wider variety of tasks? In other words, on some teams, roles are clearly defined for each person (e.g., a data scientist or an ML engineer). On other teams, however, a person will have different roles on different projects, and possibly more than one role on a single project.
For a deeper dive into team roles, read our post on 8 Key Data Science Team Roles.
2. Data Science Team Structure
Beyond the skills / roles needed within the data science team, larger organizations must choose among several alternative organizational structures. This is not an issue for smaller organizations, where typically one team works across the organization. Example data science team structures include:
- Centralized / Center of Excellence – contains nearly all the organization’s data scientists in a single organizational structure. This group might have multiple teams with multiple managers. This structure typically enables better team resource management, as well as easier sharing of best practices across projects.
- Decentralized – Business Unit–Specific Data Science Teams. These teams are often more responsive to a business need, have a better understanding of the business context and have deeper domain knowledge.
- Centralized Data Science Consultants – provides a hybrid approach, where decentralized teams are augmented and helped by “consultants” from a more central group.
To explore the pros and cons of each of these structures (as well as other hybrid alternatives), check out our centralized vs decentralized post.
3. Data Science Team Collaboration
As noted by Rama Ramakrishnan of MIT, “Hiring talented data scientists is one thing; harnessing their capabilities for the benefit of the organization is another.” In other words, just having the right skills and a good structure for organizing the team is not enough.
Data science managers and leaders also need to think through how the team will coordinate and communicate, both within the team and with other teams (such as an IT team and stakeholders). To help improve coordination and communication, teams should use an agile coordination framework.
Perhaps the most well-known agile coordination framework is Scrum. Scrum is used extensively and successfully for software development projects. However, it is often not appropriate to use Scrum for data science. In short, data science is different from software development, and hence a different approach might be more useful. One such new agile framework that works well for data science is Data Driven Scrum.
Independent of which framework a data science team uses, an agile framework enables a team to be flexible and adapt their plans based on feedback from their incremental deliverables. Below are three key characteristics of an Agile Data Science Team:
- Start Simple and Iterate Quickly: The primary output of a data science team is insights. Initial insights can start with static reports or analyses from data exploration.
- Collaborate: Data science is a team sport. Agile data science teams collaborate and frequently communicate among themselves and with the broader stakeholder team.
- Have Flexible Plans: One great match between data science and Agile is that they both emphasize empirical learning whereby you deploy something, measure it, learn from it, and adjust your plans accordingly.
Read our post on agile data science teams to dig deeper into agility and agile data science teams.
4. Leading the Data Science Team
Leading the data science team is, of course, a key aspect of managing the data science team, and in many ways, is similar to leading any other team.
When someone leads a data science team, that person should set the tone for each of the team’s projects. This includes:
- Making sure the team is working on the highest priority projects.
- Ensuring the team works hard (but at a sustainable pace).
- Being an effective “people manager” (e.g., ensuring everyone on the team is growing).
These aspects of leading a data science team are similar to leading other teams. However, data science projects have characteristics that introduce their own challenges, so someone who is used to leading other types of projects needs to understand and consider them. Note that one does not need to become the technical expert to be a great data science leader.
One example of a challenge that needs to be managed is ensuring ethical accountability. Any team leader would (hopefully) want the team to be ethical, but data science teams face specific ethical risks. For example, the risk of deploying biased predictive models is one that most fields do not have to ponder. This is broadly part of achieving Responsible AI.
Rama Ramakrishnan of MIT also observed that “leaders need to guide data teams by clearly identifying problems and setting metrics to gauge success”. The fifth and final aspect expands on that concept to explore metrics in general.
5. Measure the Impact
It can be difficult to define metrics when data science teams are working on exploratory, research-type projects. However, having metrics is key to being able to understand and share how effectively a data science team is working.
Below is a range of potential metrics that could be used:
- Traditional Project metrics: How is the team performing relative to plan (e.g., time, budget, and scope variance to plan)?
- Agile Project metrics: How frequently is the team providing value (e.g., velocity, cycle times)?
- Financial metrics: Is the team creating organizational financial value (revenue, payback period, ROI, NPV)?
- Stakeholder satisfaction: What is the satisfaction of the project stakeholders (net promoter score, surveys)?
- Traditional Software metrics: What is the quality of the overall system (defect count, defect resolution rate)?
- Model performance: How are the models performing (RMSE, F1, recall, precision, ROC, p-value)?
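To make a few of these metrics concrete, here is a minimal Python sketch of how some of the financial metrics (ROI, NPV, payback period) and model performance metrics (precision, recall, F1, RMSE) could be computed. It is only an illustration under stated assumptions: the function names, the cash flows, the 10% discount rate, and the labels are all hypothetical and not taken from any real project, and a real team would more likely rely on an established library than on hand-rolled functions.

```python
import math


def roi(total_benefit, total_cost):
    """Return on investment as a fraction: (benefit - cost) / cost."""
    return (total_benefit - total_cost) / total_cost


def npv(rate, cash_flows):
    """Net present value; cash_flows[0] is assumed to occur now (period 0)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def payback_period(cash_flows):
    """First period where cumulative cash flow is non-negative, else None."""
    cumulative = 0.0
    for period, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return period
    return None


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical project: 100,000 invested up front, then three years of benefit.
cash_flows = [-100_000, 40_000, 50_000, 60_000]
print("ROI:", roi(sum(cash_flows[1:]), -cash_flows[0]))
print("NPV at 10%:", round(npv(0.10, cash_flows), 2))
print("Payback period (years):", payback_period(cash_flows))

# Hypothetical classifier output and regression predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
print("Precision, recall, F1:", precision_recall_f1(y_true, y_pred))
print("RMSE:", round(rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]), 3))
```

Which of these numbers matters most depends on the audience: stakeholders tend to care about the financial figures, while the team itself uses the model performance figures to decide what to iterate on next.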
Read our post on metrics to explore these and other concepts in more depth.