
How to Lead Data Science Teams

It’s an understatement to say that great leadership is challenging and rare. And leading data science teams brings unique challenges:

  • Stakeholders might get disillusioned by your team’s inability to deliver magic
  • The battle to recruit and retain data science talent is fierce
  • Data science’s ethical dilemmas are particularly perplexing
  • There is no agreed-upon process for managing data science teams
  • Few people possess a solid mix of the technical chops and softer leadership skillsets
  • …and the list goes on

leadership is not about titles
Source: wellquo.com

While this topic might seem most relevant to management and executives, remember that leadership is not a title. Rather, I encourage everyone reading this post to assess these points and to upskill their leadership capabilities. 

So whether you’re a data science manager, a student, a tenured individual data scientist, or a business person branching out into this nascent field, I wish you the best on your journey to becoming a stronger and more fulfilled leader. Here are eight tips to get you started or to keep you going…

8 Tips for Leading Data Science Teams

1. Start With Why

How often do we jump right into “what” needs to be done and “how” to do it without understanding the deeper and more critical question of “why” we should do it in the first place? Yet, as Simon Sinek explains, a clear and meaningful “why” drives action, not the “what” or the “how”.

Simon explaining the importance of Why

As such, great leaders inspire their teams with a meaningful purpose to rally around. And when kicking off a new project, the leader should dive deeper and make sure any project they undertake has a “project why” that is consistent with the team’s motivating purpose.

This takes effort. But the investment in a clear “why” pays dividends for your team in many ways: higher productivity, better staff retention, and ultimately clearer analyses and results.

2. Engage Stakeholders

At the end of the day, teams need to deliver value to a set of stakeholders. The most effective leaders will:

  1. Identify their stakeholders (which usually extends beyond just the obvious project requester)
  2. Listen to their requests
  3. Identify their needs (which are often different from their requests…see the meme below)
  4. …And learn how to best engage stakeholders throughout the data science life cycle
What the customer really needed
Effective leaders identify the need

Don’t leave stakeholders in the dark or mistake their “requests” for their “needs”. Rather, dig deeper. Actively engage them and uncover their needs by leveraging agile principles such as satisfying the customer “through early and continuous delivery” and having “business people and developers […] work together”. Which leads us to the next point…

3. Implement Effective Processes

This does not necessarily mean implementing a specific framework such as Scrum. Rather, as the leader, you should:

  • Educate your team on the “why” behind good processes
  • Discover an effective process that fits the team and its work’s unique needs
  • Foster a culture of continuous process improvement

Sounds obvious, right? And yet, in Jeff’s surveys, when asked about their process, about 80% of data scientists say they “just kind of do it”. Working this way can increase risk, reduce productivity, and degrade the quality of the insights generated. Data scientists seem to acknowledge this issue: in the same surveys, 85% of respondents said they would benefit from a more defined process.

Which project process should you implement? Jeff provides some general guidance in his 3 Steps to Define an Effective Data Science Process post. But the specific answer is highly dependent on your team and the type of problems you are solving.

Of course, selecting a team process is just half the battle. The other half is making sure your team is properly trained in using the process within a data science context.

If needed, the Data Science Process Alliance can help you define an appropriate team process and then help train your team.

4. Build the Right Data Science Team

Like any good team, a data science team needs to have the right people to get the job done. And just like your team process, your team composition is dependent on the organizational structure, the company culture, and the type of problem you’re trying to solve.

If you’re new to the field, avoid the common misconception that a fully functional data science team is just a bunch of data scientists. Rather, it has all the roles needed to deliver a solution. In some circumstances, this might indeed be heavily data scientist-focused. But you will probably need a diverse set of roles including business analysts, data architects, data engineers, machine learning engineers, a project manager, a product manager, and of course data scientists.

To lead a data science team, you need to understand these required roles, how to attract and retain the right talent, and how to further develop the individual team members. Technical skillsets are obviously key. Equally important are the softer skills that team members need to become effective contributors.

5. Build a Data Science-Specific Culture

On the surface, this is another no-brainer. And yet, data science teams are often misunderstood as software teams. While these fields indeed overlap significantly, the data scientist has a distinct mindset from that of a typical software engineer. Just a few differences:

Area | Data Scientists | Software Engineers
Drive | Discovery and exploration | Implementing a solution
Ambiguity | “That’s fine. It’s my job to sort through the noise.” | “I need clear requirements before starting.”
Key skill sets | Math, stats, and some coding | Building production systems

A few differences between data scientists and software engineers

As such, managing data scientists as software engineers will likely leave them feeling misunderstood, and shoehorning their projects into software project processes will likely lead to frustrating, unproductive planning exercises that siphon time and energy away from the team. Rather, build a culture where data scientists can be at their best.

6. Focus on the Long Term

It’s easy to focus on generating an interesting machine learning model. That’s what data science is all about, right? Well…while the model is a key and necessary part of the overall data science process, the model by itself is usually not sufficient to deliver value.

machine learning code
A production system has much more than just ML code

Rather, to deliver lasting value, predictive models usually need to be put into stable, maintainable systems that the stakeholder can access. Or as stated more boldly by Luigi:

“No machine learning model is valuable unless it’s deployed into production”

Luigi from MLinProduction.com

To ensure your team’s work delivers ongoing value, you’ll have to balance what might seem like a never-ending firehose of stakeholder requests with the need to dedicate time to building production systems. Such systems check incoming data, provide alerts if data is missing or out of acceptable ranges, and deliver accuracy metrics that allow the data scientists to monitor and tune the models when needed.
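To make the monitoring ideas above a bit more concrete, here is a minimal sketch in Python of the three checks mentioned: missing data, out-of-range data, and an accuracy metric. The `RangeRule` format, the field names, and the thresholds are assumptions made up for illustration, not part of any particular tool, and a real production system would likely route the alerts to a monitoring service rather than return them as strings.

```python
from dataclasses import dataclass


@dataclass
class RangeRule:
    """Acceptable range for one numeric field (hypothetical rule format)."""
    field: str
    minimum: float
    maximum: float


def check_record(record: dict, rules: list) -> list:
    """Return alert messages for missing or out-of-range values in one record."""
    alerts = []
    for rule in rules:
        value = record.get(rule.field)
        if value is None:
            # Field is absent or explicitly empty
            alerts.append(f"{rule.field}: missing")
        elif not (rule.minimum <= value <= rule.maximum):
            alerts.append(
                f"{rule.field}: {value} outside [{rule.minimum}, {rule.maximum}]"
            )
    return alerts


def accuracy(predictions: list, actuals: list) -> float:
    """Share of predictions that match the observed outcomes."""
    if not predictions or len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must be non-empty and equal length")
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(predictions)


# Illustrative rules and data (assumed values, not from a real project)
rules = [RangeRule("age", 0, 120), RangeRule("income", 0, 1_000_000)]
alerts = check_record({"age": 150, "income": None}, rules)
score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

Tracking a metric like `score` over time, alongside the count of `alerts`, is one simple way to tell whether a deployed model needs attention.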

The stakeholders might not understand this value, but it’s your responsibility to educate them. Additionally, allocate development time for building full-fledged production systems (or work with the team that is responsible for this). When needed, push back and say no to new incoming requests to allow time to clean up any unnecessarily accrued technical debt. You’ll thank yourself later.

7. Integrate Ethics into Everything

Do you know if all of your team’s practices and your projects are ethical? Well, that’s a tough question to digest!

Indeed, business and research ethics is always a tricky subject. And in new and rapidly evolving fields, this subject gets even messier. Jeff outlines a good set of considerations in his 10 Data Science Ethics Questions post. With data misuse and unfair model output cropping up so often in the news, it’s not surprising that this has been the most viewed post on our site in the last six months.

As a start, ensure that your team’s practices and project outcomes are compliant with industry-relevant laws. But go beyond this to protect people’s privacy, minimize or remove unfairly biased results against certain population segments, be keenly aware of how your work impacts the broader society, and mitigate any potential adverse outcomes. Your assessments could literally mean life and death.
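As one small, hedged illustration of checking for biased results across population segments, the Python sketch below compares how often a model produces a positive prediction for each segment. The segment labels and predictions are invented for the example, and a gap in these rates is only a prompt for further investigation, not proof of unfairness by itself.

```python
from collections import defaultdict


def positive_rate_by_segment(segments, predictions):
    """Return the share of positive (1) predictions for each segment."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for segment, prediction in zip(segments, predictions):
        totals[segment] += 1
        positives[segment] += prediction
    return {segment: positives[segment] / totals[segment] for segment in totals}


def largest_gap(rates):
    """Difference between the highest and lowest segment rates."""
    return max(rates.values()) - min(rates.values())


# Invented example data: segment label and binary model output per person
segments = ["A", "A", "A", "A", "B", "B", "B", "B"]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = positive_rate_by_segment(segments, predictions)
gap = largest_gap(rates)
```

What counts as an acceptable `gap` depends on the project, the applicable laws, and the stakeholders involved, so treat any threshold as a team decision rather than a technical constant.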

ethical trade-offs
Tough ethical trade-offs. Screenshot from moralmachine.net

8. Know Where to Learn More

So what else do you need to do to lead data science teams? There’s no exhaustive list but perhaps the best advice I can give is to know where you can dive deeper into the above-mentioned topics and into the numerous issues I didn’t mention. Here are a few places to turn:

Data Science Process Alliance: Given the requests we’ve had for training, Jeff and I are kickstarting the Data Science Process Alliance which provides individual training and corporate consulting to better lead data science projects and teams.

45 Minute Conference Session: Jeff gave a speech on managing data science teams at the 2019 Open Data Science Conference. He hits on a lot of this post’s topics (or rather I hit on a lot of his speech’s topics).

Jeff talking at the Open Data Science Conference

The rest of this website: This post is part of the Team Management series.

Parting Thoughts

The path to effective data science leadership is long and arduous. But the rewards are worth it. Best of luck on your journey to lead data science teams. Feel free to contact Jeff and me to let us know how your journey is going.
