This post will explore managing analytics projects by focusing on three questions:
- What is the difference between a data scientist and a data analyst?
- What is the difference between analytics projects and data science projects?
- What project management / project coordination framework might be helpful to manage analytics projects?
1. What is the Difference between a Data Scientist and a Data Analyst?
Data scientists and data analysts are distinct roles, and a project often requires both. For greater detail, read the data scientist vs data analyst post, but at a high level:
Data Analysts
In short, data analysts help to determine:
- What business problem should be addressed
- What data is needed
- What is possible
- How to present the findings to the client
In other words, this role helps to better shape a problem for the data scientist to explore.
Data Scientists
Meanwhile, data scientists:
- Find and interpret rich data sources
- Merge data sources
- Create visualizations
- Use machine learning to build models that aid in creating actionable insight from the data.
They know the end-to-end process of data exploration and can present and communicate data insights and findings to a range of team members. In short, they apply the scientific discovery process, including hypothesis testing, to obtain actionable knowledge related to a scientific or business problem.
For more information on roles within a data science team, see our blog post on 8 roles in a data science team.
2. What is the Difference between Analytics and Data Science Projects?
Next, we can explore the difference between analytics projects and data science projects.
It is important to note that there is no widely accepted explanation of the difference, or even agreement on whether a difference exists. In fact, many people use the two phrases interchangeably.
For example, a paper on Managing Business Analytics Projects by Viaene and Van den Bunder, published in MIT Sloan Management Review in 2011, notes that “Business analytics projects are often characterized by uncertain or changing requirements — and a high implementation risk. So it takes a special breed of project manager to execute and deliver them”. This seems very similar to how many would describe data science projects.
However, others view data analytics as being more focused, where the project typically has specific questions that were asked by a stakeholder. In addition, a data analytics project typically explores structured data. Data science projects, on the other hand, are typically more open-ended: it is often not clear which question to answer, what might be useful, or what could be predicted via a machine learning model.
Hence, the data science project team is often expected to identify interesting questions that might help an organization (“find value in the data”). Data science projects often leverage both structured and unstructured data. The Venn diagram below helps to show how a data science effort compares to a data analytics effort, in terms of the skills required for the project.
It’s interesting to note that many blog posts on this topic are from universities (Emma’s is from the University of Wisconsin). This is because universities offer both analytics programs and data science programs, and prospective students want to understand the difference between those offerings (or whether there is one).
As the Venn diagram shows, a data analytics project might include many of the same activities as a data science project. Furthermore, in the real world, this distinction often becomes blurred.
Data Analytics vs Data Science Projects
The Streaming Video Provider Example
If we consider a company like Netflix, it’s easy to imagine that they have lots of structured data collected from customers, such as each person’s viewing history. A data analytics project might be focused on exploring the popularity of their different programs. This insight could then be presented to stakeholders, showing which programs are not popular, and hence, providing a recommendation (and justification) on which programs should be canceled.
A data science project, on the other hand, might build a machine learning algorithm that improves the recommendations to better suggest other programs a customer might enjoy.
But, in the real world, this situation might become more nuanced. It might be that a machine learning model could help answer the question relating to canceling shows. A full data science project might explore new data sources (such as social media natural language posts) that could help understand the show’s popularity and the future value of those viewers.
There could also be a middle scenario, where a project exploring which shows to cancel uses a machine learning predictive model to estimate the future value of people who watch a show (i.e., whether they become high-value customers in the future). In this situation, one could view the project as partway between an analytics project and a data science project.
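As a rough illustration of the analytics end of this example, the sketch below ranks programs by how many distinct customers watched them. The viewing log, the program names, and the `rank_programs_by_viewers` helper are all hypothetical and made up for this post; a real project would work from the provider's actual data and would likely weigh more than raw viewer counts.

```python
# Hypothetical viewing log: one (customer_id, program) pair per viewing session.
viewing_history = [
    ("c1", "Show A"),
    ("c2", "Show A"),
    ("c3", "Show A"),
    ("c1", "Show B"),
    ("c2", "Show B"),
    ("c1", "Show C"),
]


def rank_programs_by_viewers(history):
    """Return (program, distinct_viewer_count) pairs, least popular first."""
    viewers = {}
    for customer_id, program in history:
        # A set ensures each customer is counted once per program.
        viewers.setdefault(program, set()).add(customer_id)
    counts = {program: len(ids) for program, ids in viewers.items()}
    # Sort by viewer count, then by name so that ties are ordered predictably.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


ranking = rank_programs_by_viewers(viewing_history)
for program, n_viewers in ranking:
    print(f"{program}: {n_viewers} distinct viewers")
```

The programs at the top of this ranking would be the candidates to discuss with stakeholders. Estimating the future value of each show's viewers, as in the middle scenario, would require a predictive model on top of a summary like this one.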
A Retail Store Example
In a different example, consider a retail organization analyzing sales. The data would be structured transaction data (sales date, product, customer info, etc.). This analysis could be used, for example, to determine which products to discontinue or to support inventory management. This might be considered a data analytics project, where the goal is to understand trends in sales (perhaps by location or customer segment).
On the other hand, a data science project might explore what people often buy at the same time (e.g., diapers and beer), which can help decide where products are placed in a retail store or how products are advertised.
However, the analytics project could be expanded to use additional data sources (e.g., social media mentions of the product, weather during the year) to help better predict the sales (and hence, the inventory required for each product).
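To make the "bought at the same time" idea more concrete, here is a minimal sketch that counts how often each pair of products appears in the same transaction. The baskets and the `count_item_pairs` helper are invented for illustration, and simple pair counting is only a starting point; a fuller market basket analysis would typically also consider how often each item is bought on its own.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each basket is the set of products in one sale.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
]


def count_item_pairs(baskets):
    """Count how many baskets contain each unordered pair of products."""
    pair_counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a single canonical (a, b) ordering.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return pair_counts


pair_counts = count_item_pairs(baskets)
for (item_a, item_b), n_baskets in pair_counts.most_common():
    print(f"{item_a} + {item_b}: {n_baskets} baskets")
```

Pairs that co-occur frequently are candidates for joint placement or joint advertising, although the counts alone do not show whether one purchase drives the other.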
Insights from these Examples
As these two examples help to show, at the extremes one might be able to clearly describe the difference between an analytics project and a data science project. However, in many real-world situations, a data-focused project has some aspects of both.
3. What Project Management Framework might help manage Analytics Projects?
So, now that we have a better understanding of data analytics projects, and how they relate to data science projects, we can better assess which project management frameworks are a good fit for data analytics projects.
When managing a data analytics project, one key aspect to focus on is ensuring that the team can effectively collaborate and communicate (internally and with external stakeholders). This can be facilitated by using a framework that supports the key characteristics of an analytical project, such as the need to do exploratory analysis.
Furthermore, because there is often no clear line between a data analytics project and a data science project, the key insights shared on our website about data science project management are also relevant and useful for managing data analytics projects. In other words, even if a project is a ‘traditional analytics project’, I would use a framework that can support future project / client needs, where more data science-type efforts might be required.
Learn More
Jump in and learn more through these articles:
- Data Science Workflows
- Coordination Frameworks
- Agile Data Science
- Traditional Data Science Approaches