Data science projects are challenging. To increase your odds of success, start them by asking several key data science project questions.
As Ben Franklin said, “an ounce of preparation saves a pound of headaches in your data science projects”.* To help you prepare for a project and evaluate whether to proceed, be sure you can answer the following set of questions.
Given the nuances of each project, this list is not exhaustive, but it serves as a baseline that applies to sizeable, stakeholder-driven data science projects in general.
10 Data Science Project Questions
1. What is the business requesting?
Got a fuzzy concept of what is being requested? So do a lot of data science project teams at the start. And unless you aggressively pin down exactly what the business is asking for before the project launches, your entire project might fall short.
As Domino Data Lab observed, “We’ve seen large organizations hire 30+ PhDs without clear business alignment upfront. They then emerge from a six week research hole only to realize they had misunderstood the target variable, rendering the analysis irrelevant.”
2. What does the business need?
Data science’s hype leads some stakeholders to fall for its mysterious allure, assuming that “data science” is the answer when traditional analyses might fulfill at least their initial needs. Even if data science is the answer, stakeholders are notorious for not truly understanding what they need.
As Henry Ford supposedly stated, “If I had asked people what they wanted, they would have said faster horses.” Data science’s ambiguous nature further exacerbates this misunderstanding of underlying needs. So don’t take the business requests at face value, and don’t automatically build a faster horse. Dive deeper.
3. Who are all of the stakeholders and what are their individual needs?
Your project’s impact likely extends beyond just the requester. Proactive stakeholder identification and management reduces the risk of overlooking key stakeholders whose input should be considered and who might even stop the project dead in its tracks if their individual needs are ignored.
It also opens the door to further value creation across the organization. For example, a customer churn prediction project likely starts with the customer service or retention department, but you’ll learn a lot along the way that can provide guidance to other groups such as product, strategy, or marketing.
4. Do the stakeholders have clear expectations?
Make sure your stakeholders don’t think of this as just another software project. Rather, educate them on how data science differs from software development. Emphasize that the data science journey progresses through experimentation, which might not consistently lead to tangible progress.
Also set expectations for touchpoints (e.g., stand-ups, regularly scheduled review sessions, ad hoc review sessions, status reports). A highly visible project roadmap, even a rough one, typically helps set these expectations, but don’t go overboard with a comprehensive, detailed plan from start to finish. The plan will need to change, probably sooner than you’d expect.
5. What is the simplest solution that adds value to the stakeholders?
Borrow a concept from agile: start small and deliver something of value as quickly as possible. This could be insights based on descriptive statistics, an analysis that establishes a baseline, or a mockup dashboard that helps define the end deliverable.
An early, interim deliverable gives stakeholders the opportunity to provide feedback. If it’s deemed valuable, stakeholders get something of value quickly, and you know you’re on the right path.
Even a “failed” deliverable adds value by prompting you to abandon, early, a path that might be technically impractical or not valued by the business. It’s much better to “fail fast” and learn early in the project lifecycle than to discover the problems after the entire solution has been attempted.
Moreover, a simple solution might surprise you by solving the problem well enough on its own.
6. What is the value of this project? How will it be measured?
A clear understanding of project value helps prioritize projects. The intent is not to produce some magical, precise ROI calculation, but rather enough information to help determine whether the project should commence relative to other priorities. Defining the value metric also helps the data scientists focus on maximizing (or minimizing) the metrics that matter most.
So before you start, explore how you will measure project success.
7. Why do this project?
Facts, figures, and a clear understanding of all the stakeholders and their needs only go so far. As Simon Sinek explains in Start with Why, the neocortex, the only part of the brain capable of processing the “what” information, does not control behavior. Focusing only on the “what” does not inspire the data science team and project stakeholders to deliver.
Rather, a clear and common vision of the project’s impact and its “why” will do more to secure executive sponsorship and to motivate the data science team’s commitment and focus.
To effectively lead a data science team (or any team for that matter), define an inspiring but realistic project mission statement / war cry / purpose and clearly communicate it.
8. What are the risks?
Risk identification is a fundamental process in any sizeable project. Data science projects are no exception, and they have some unique characteristics that should be explored.
For example, as public scrutiny of algorithmic decision-making increases, you should assess the potential for a model’s output to produce discriminatory results (or at least the perception of them). Brainstorm “What could go wrong?” from various perspectives, such as technical, market, societal, legal, and security. Start with a list of ethics questions.
For each type of potential issue (e.g., bias in the model, ethical use of data), make it clear who on the team is accountable for ensuring that the team thinks through that issue during the project.
9. What people and resources are needed?
Start with the obvious. Who do you need to develop the solution? Ask them roughly how much time they’ll need to complete the project. Tie down a decent estimate for the initial deliverable and a general “t-shirt” sizing for the broader project scope.
Then, think broadly. What data sources will you need? If they exist internally, where are they? If they exist externally, can you purchase them? Alternatively, can you start collecting the data? What security / firewall requests will you need? Computing resources? Systems integrations?
Bring IT, the business, and the data science project team together upfront to avoid a disjointed approach in which these potentially critical requests surface late in the project and block progress.
10. What other questions should be answered?
The meta-question. While the other nine questions are generally necessary for an effective project launch, they are not sufficient. Ask yourself, your team, and your stakeholders some variant of: “What other key questions do we need to answer before committing to this proposed project?”
*OK, so Ben actually said “an ounce of prevention is worth a pound of cure,” but I’d like to think that if he were alive today, he would be an outstanding data scientist who would say something like this.