Based on my personal experience, below are the eight key roles to think about when building and leading a data science team. These are not in any specific order, since their importance varies from one project or organization to another. Furthermore, the roles do not all need to be filled by different people. On a smaller project, for example, one person might fill the roles of both data scientist and data engineer.
However, it is helpful to think through tasks in terms of these roles. Knowing which role completes each task shows where the team's time is being spent and, if the team needs to add one or more people, what “type of person” will best help the team.
If you want a quick answer to what these roles are, below is a table that summarizes my thoughts.
Read on if you're interested in my thoughts behind these roles.
Data Scientist
Data Scientists find and interpret rich data sources, merge data sources, create visualizations, and use machine learning to build models that aid in creating actionable insight from the data. They know the end-to-end process of data exploration and can present and communicate data insights and findings to a range of team members. In short, they apply the scientific discovery process, including hypothesis testing, to obtain actionable knowledge related to a scientific or business problem.
Data Engineer
Data engineers make the appropriate data accessible and available for data science efforts. They design, develop, and code data-focused applications that capture and clean data. This role also helps to ensure consistency across datasets (e.g., that an attribute means the same thing in each dataset).
Data Science Architect
Data science architects design and maintain the architecture of data science applications and facilities. In other words, this role creates and manages the relevant data models, data storage systems, and process workflows. In conjunction with the Data Engineer, they manage and merge large amounts of data and their related sources.
Data Science Developer
Data Science Developers design, develop, and code large data (science) analytics applications to support scientific or enterprise/business processes. This role enables models to be deployed (i.e., used in production) and requires some expertise in data science, as well as knowledge of how to effectively develop software applications. Sometimes this role is known as an ML engineer.
Product Owner
The product owner is the central point of product leadership: the person who decides which features and functionality to build, the order in which to build them, and what aspects of them to observe and analyze. The product owner is responsible for prioritizing what work gets done, ensuring that each work item is clearly defined from a business context, and making the team's upcoming work and priorities visible and transparent. In addition, the product owner must agree that the tasks in the done column are actually done. In short, the product owner represents all the stakeholders for the project. While the product owner is often also the product manager, it is possible to separate these roles. In that case, the product manager has a more strategic focus on the product’s vision, company objectives, and the market, while the product owner is more tactical and is directly involved in the day-to-day work of the data science team, translating the product manager’s strategy into actionable tasks.
Data/Business Analyst
Data/Business Analysts analyze a large variety of data to extract information about system, service, or organization performance and present it in a usable, actionable form. They help shape the problem for the data scientist to explore.
Process Master
The process master acts as a coach, facilitator, and impediment remover, and helps everyone involved understand and embrace the project values, principles, and practices so that the organization can obtain exceptional results.
Subject Matter Expert
Subject matter experts are people with extensive knowledge of how to apply the analytics within a specific organizational context. This role is accountable for ensuring that the desired insights are actionable.
Clearly, larger data science projects are a team effort that requires effective coordination among various roles with diverse skillsets. If you would like even more information on this topic, review this paper. In addition, many of these thoughts were captured in a recent talk I gave at an Open Data Science Conference (where I discussed a framework to help guide data science project managers). The folks running ODSC put my talk on YouTube, so if you’re curious, take a look.