Based on my personal experience, below are the 8 key data science team roles to think about when building and leading a data science team.
These are not in any specific order, as their importance might vary from one project to another, or from one organization to another. Furthermore, not every role needs to be filled by a different person. For a smaller project, for example, one person might fill the roles of both data scientist and data engineer.
However, it is helpful to think through tasks in terms of these roles. Knowing which role will complete each task helps the team decide whom to add if it needs one or more people (i.e., where the time is being spent and what “type of person” will best help the team).
A table showing each role and its general level of expertise
Data Scientist
Data Scientists find and interpret rich data sources, merge data sources, create visualizations, and use machine learning to build models that aid in creating actionable insight from the data. They know the end-to-end process of data exploration and can present and communicate data insights and findings to a range of team members. In short, they apply the scientific discovery process, including hypothesis testing, to obtain actionable knowledge related to a scientific or business problem.
If you lead a data science team, you need to understand that data scientists might get frustrated if they are managed like software engineers. It’s key to understand the difference between data scientists and software engineers and to manage data scientists in ways that don’t alienate them or drive them to look for a different role.
Data Engineer
Data engineers make the appropriate data accessible and available for data science efforts. They design, develop, and code data-focused applications that capture data, as well as clean the data. This role also helps to ensure consistency of datasets (e.g., meaning of attributes across datasets).
Data Science Architect
Data science architects design and maintain the architecture of data science applications and facilities. In other words, this role creates and manages the relevant data models, data storage systems, and process workflows. In conjunction with the Data Engineer, they manage and merge large amounts of data and their related sources.
Data Science Developer
Data Science Developers design, develop, and code large data (science) analytics applications to support scientific or enterprise/business processes. This role enables models to be deployed (i.e., used in production) and requires some expertise in data science, as well as knowledge of how to effectively develop software applications. Sometimes this role is known as a machine learning engineer. Either way, they help bridge the worlds of data science and software development.
Data Science Product Owner
As explained in Nick’s post, 10 Reasons why you (probably) need a Data Science Product Manager, the product person is the central point of product leadership – the person who decides which features and functionality to build, the order in which to build them, and what aspects of them to observe and analyze. The product owner is responsible for prioritizing what work gets done, ensuring that each work item is clearly defined from a business context, and that the upcoming work and priorities of the team are visible and transparent.
In addition, the product owner must agree that the tasks in the done column are actually done. In short, the product owner represents all the stakeholders for the project. While a product owner is often the product manager, it is possible to separate these roles: the product manager has a more strategic focus on the product’s vision, company objectives, and the market, whereas the product owner is more tactical and directly involved with the day-to-day data science team, translating the product manager’s strategy into actionable tasks.
Data/Business Analyst
Data/Business Analysts analyze a large variety of data to extract information about system, service, or organization performance and present it in a usable, actionable form. They help shape a problem for the data scientist to explore. Note the difference between a data analyst and a data scientist.
Process Master
The process master (known as a Scrum Master in Scrum teams) acts as a coach, facilitator, and impediment remover, and helps everyone involved understand and embrace the project values, principles, and practices so that the organization can obtain exceptional results.
Subject Matter Expert
Subject matter experts are people with extensive knowledge of how to apply the analytics within a specific organizational context. This role is accountable for ensuring that the desired insights are actionable.
Learn More
- Academic paper (external): The ambiguity of data science team roles and the need for a data science workforce framework
- This post is part of the Data Science Team series which includes posts like: