It’s an understatement that great leadership is challenging and rare. And leading data science teams has unique challenges:
- Stakeholders might get disillusioned by your team’s inability to deliver magic
- The battle to recruit and retain data science talent is fierce
- Data science’s ethical dilemmas are particularly perplexing
- There is not an agreed-upon process for managing data science teams
- Few people possess a solid mix of the technical chops and softer leadership skillsets
- …and the list goes on
Source: wellquo.com
While this topic might seem most relevant to management and executives, remember that leadership is not a title. So I encourage everyone reading this post to assess these points and to upskill their leadership capabilities.
So whether you’re a data science manager, a student, a tenured individual data scientist, or a business person branching out into this nascent field, I wish you the best on your journey to becoming a stronger and more fulfilled leader. Here are eight tips to get you started or to keep you going…
8 Tips to Leading Data Science Teams
1. Start With Why
How often do we jump right into “what” needs to be done and “how” to do it without understanding the deeper and more critical question of “why” we should do it in the first place? Yet, as Simon Sinek explains, a clear and meaningful “why” drives action, not the “what” or the “how”.
Simon explaining the importance of Why
As such, great leaders inspire their teams with a meaningful purpose to rally around. And when kicking off a new project, the leader should dive deeper and make sure any project they undertake has a “project why” that is consistent with the team’s motivating purpose.
This takes effort. But the investment in a clear “why” pays dividends for your team in many ways: higher productivity, better staff retention, and ultimately clearer analyses and results.
2. Engage Stakeholders
At the end of the day, teams need to deliver value to a set of stakeholders. The most effective leaders will:
- Identify their stakeholders (which usually extends beyond just the obvious project requester)
- Listen to their requests
- Identify their needs (which are often different from their requests…see the meme below)
- …And learn how to best engage stakeholders throughout the data science life cycle
Effective leaders identify the need
Don’t leave stakeholders in the dark or mistake their “requests” for their “needs”. Rather, dig deeper. Actively engage them and uncover their needs by leveraging agile principles such as satisfying the customer “through early and continuous delivery” and having “business people and developers […] work together”. Which leads us to the next point…
3. Implement Effective Processes
This does not necessarily mean implementing a specific framework such as Scrum. Rather, it means that you:
- Educate your team on the “why” behind good processes
- Discover an effective process that fits the team and its work’s unique needs
- Foster a culture of continuous process improvement
Sounds obvious, right? And yet, in Jeff’s surveys, when asked about their process, about 80% of data scientists say they “just kind of do it”. That leads to missed opportunities and can increase risk, reduce productivity, and degrade the quality of insights generated. Data scientists seem to acknowledge this issue because, in the same surveys, 85% of respondents said they would benefit from a more defined data science process.
Which project process should you implement? Jeff provides some general guidance in his process post. But the specific answer is highly dependent on your team and the type of problems you are solving.
Of course, selecting a team process is just half the battle. The other half is making sure your team is properly trained in using the process within a data science context.
4. Build the Right Data Science Team
Like any good team, a data science team needs to have the right people to get the job done. And just like your team process, your team composition is dependent on the organizational structure, the company culture, and the type of problem you’re trying to solve.
If you’re new to the field, avoid the common misconception that a fully functional data science team just has a bunch of data scientists. Rather, it has all the needed roles to deliver a solution. In some circumstances, this might indeed be heavily data scientist-focused. But you probably need a diverse set of roles including business analysts, data architects, data engineers, machine learning engineers, a project manager, a product manager, and of course data scientists.
To lead a data science team, you need to understand these required roles, how to attract and retain the right talent, and how to further develop the individual team members. Technical skillsets are obviously key. Equally important are the softer skills that team members need to become effective contributors.
5. Build a Data Science-Specific Culture
On the surface, this is another no-brainer. And yet, data science teams are often misunderstood as software teams. While these fields indeed overlap significantly, the data scientist has a distinct mindset from that of a typical software engineer. Just a few differences:
Area | Data Scientists | Software Engineers |
---|---|---|
Drive | Discovery and exploration | Implementing a solution |
Ambiguity | “That’s fine. It’s my job to sort through the noise.” | “I need clear requirements before starting.” |
Key skill sets | Math, stats, and some coding | Building production systems |
As such, managing data scientists as software engineers will likely leave them feeling misunderstood, and shoehorning their projects into a software project mold will likely lead to frustrating, unproductive planning exercises that siphon time and energy away from the team. Rather, build a culture where data scientists can be at their best.
6. Focus on the Long Term
It’s easy to focus on generating an interesting machine learning model. That’s what data science is all about, right? Well…while the model is a key and necessary part of the overall data science process, the model by itself is usually not sufficient to deliver value.
A production system has much more than just ML code
Rather, to deliver sustainable value, predictive models should usually be put into stable systems that stakeholders can access. Or as stated more boldly by Luigi:
“No machine learning model is valuable unless it’s deployed into production”
Luigi from MLinProduction.com
To ensure your team’s work delivers ongoing value, you’ll have to balance what might seem like a never-ending firehose of stakeholder requests against the need to dedicate time to building production systems. These systems should check incoming data, provide alerts if data is missing or out of acceptable ranges, and deliver accuracy metrics that allow the data scientists to monitor and tune the models when needed.
The stakeholders might not understand this value, but it’s your responsibility to educate them. Additionally, allocate development time for building full-fledged production systems (or work with the team that is responsible for this). When needed, push back and say no to new incoming requests to allow time to clean up any unnecessarily accrued technical debt. You’ll thank yourself later.
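To make the monitoring idea in this section a bit more concrete, here is a minimal sketch of the kind of checks such a system might run: flagging missing or out-of-range inputs and computing a simple accuracy metric. The field names, ranges, and function names (`RangeRule`, `check_record`, `accuracy`) are hypothetical and chosen only for illustration. A real system would likely rely on a dedicated validation and monitoring stack.

```python
from dataclasses import dataclass


@dataclass
class RangeRule:
    """Acceptable range for one numeric input field (hypothetical)."""
    low: float
    high: float


def check_record(record: dict, rules: dict) -> list:
    """Return alert messages for missing or out-of-range values."""
    alerts = []
    for field, rule in rules.items():
        value = record.get(field)
        if value is None:
            alerts.append(f"{field}: missing")
        elif not rule.low <= value <= rule.high:
            alerts.append(f"{field}: {value} outside [{rule.low}, {rule.high}]")
    return alerts


def accuracy(predictions, actuals) -> float:
    """Share of predictions that match the observed outcomes."""
    if not predictions or len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must be non-empty and equal length")
    matches = sum(p == a for p, a in zip(predictions, actuals))
    return matches / len(predictions)


# Hypothetical rules and data, for illustration only.
RULES = {"age": RangeRule(0, 120), "income": RangeRule(0, 1_000_000)}

incoming = [
    {"age": 34, "income": 52_000},
    {"age": None, "income": 48_000},
    {"age": 29, "income": -5},
]

# Print one alert line per problem found in the incoming records.
for i, record in enumerate(incoming):
    for alert in check_record(record, RULES):
        print(f"record {i}: {alert}")

print(f"accuracy: {accuracy([1, 0, 1, 1], [1, 0, 0, 1]):.2f}")
```

Even a small check like this gives the team something to show stakeholders when explaining why production work deserves time on the roadmap.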
7. Integrate Ethics into Everything
Do you know if all of your team’s practices and your projects are ethical? Well, that’s a tough question to digest!
Indeed, business and research ethics is always a tricky subject. And in new and rapidly evolving fields, this subject gets even messier. Jeff outlines a good set of considerations in his 10 Data Science Ethics Questions post. With data misuse and unfair model output cropping up so often in the news, it’s not surprising that this has been the most viewed post on our site in the last six months.
As a start, ensure that your team’s practices and project outcomes are compliant with industry-relevant laws. But go beyond this to protect people’s privacy, minimize or remove unfairly biased results against certain population segments, be keenly aware of how your work impacts the broader society, and mitigate any potential adverse outcomes. Your assessments could literally be a matter of life and death.
Tough ethical trade-offs. Screenshot from moralmachine.net
8. Know Where to Learn More
So what else do you need to do to lead data science teams? There’s no exhaustive list, but perhaps the best advice I can give is to know where you can dive deeper into the above-mentioned topics and into the numerous issues I didn’t mention. Here are a few places to turn:
45-Minute Conference Session: Jeff gave a talk on managing data science teams at the 2019 Open Data Science Conference. He hits on a lot of this post’s topics (or rather I hit on a lot of his talk’s topics).
Jeff talking at the Open Data Science Conference
The rest of this website: This post is part of the Team Management series which includes posts where you can:
- Learn about the 8 Key Roles for Data Science Team
- Understand the difference between Data Science and Software Engineering
- Assess 10 Ethical Questions for data science
- Know Why you (probably) Need a Product Manager
- Explore how to apply CRISP-DM for Teams
- Get 5 Tips for Remote Data Science Teams
- Review Lessons from 20 Data Science Teams
- Know the pros and cons of Centralized vs De-centralized Teams
- Ensure you get the difference between Data Science and Software Engineering Teams