
10 Questions to Ask Before Starting a Data Science Project


Data science, like almost any other meaningful endeavor, begins with a mission to solve a stated or implied need. Yet, how often do we commit to a project and dive straight into data analysis without actually understanding how we can truly add value?

Rather, as Ben Franklin said, “an ounce of preparation saves a pound of headaches in your data science projects”.* To help you prepare for a project and evaluate whether to proceed, be sure you can answer the following set of questions. Given the nuances of each project, this is not an exhaustive list but serves as a baseline that would apply across all sizeable, stakeholder-driven data science projects.

What is the business requesting?

Got a fuzzy concept of what is requested? So do a lot of data science project teams at the start. Unless you aggressively pin down the exact request before the project launches, your entire project might fall short. As Domino Data Lab observed, “We’ve seen large organizations hire 30+ PhDs without clear business alignment upfront. They then emerge from a six week research hole only to realize they had misunderstood the target variable, rendering the analysis irrelevant.”

What does the business need?

Data science’s hype leads some stakeholders to fall for its mysterious allure, assuming that “data science” is the answer when traditional analyses might fulfill at least their initial needs. Even if data science is the answer, stakeholders are notorious for not truly understanding what they need. As Henry Ford supposedly stated, “If I had asked people what they wanted, they would have said faster horses.” Data science’s ambiguous nature further exacerbates this misunderstanding of underlying needs. So don’t take the business requests at face value. Don’t just build a faster horse. Dive deeper.

Who are all of the stakeholders and what are their individual needs?

Your project’s impact likely extends beyond just the requester. Proactive stakeholder identification and management mitigates the risk of overlooking key stakeholders whose input should be considered and who might even stop the project dead in its tracks if their individual needs are ignored. It also opens the door for further value creation across the organization. For example, a customer churn prediction project likely starts with the customer service or retention department, but you’ll learn a lot along the way that can provide guidance to other groups such as product, strategy, or marketing.

Do the stakeholders have clear expectations?

Be sure your stakeholders don’t falsely assume this is a software project. Educate them that the data science journey progresses through experimentation that might not consistently lead to tangible progress. Moreover, set expectations for touch-points (e.g. stand-ups, regularly scheduled review sessions, ad hoc review sessions, status reports). Some semblance of a highly visible project plan typically helps set these expectations but don’t go overboard with a comprehensive detailed plan from start to finish. The plan will need to change—probably sooner than you’d expect.

What is the simplest solution that adds value to the stakeholders?

Start small and deliver something of value as quickly as possible. This could be insights based on descriptive statistics, an analysis that establishes the baseline, or a mockup dashboard that helps define the end deliverable. An early, interim deliverable gives stakeholders the opportunity to provide feedback. If it’s deemed valuable, stakeholders get something of value quickly, and you know you’re on the right path. Even a “failed” deliverable adds value to the project by guiding you to short-circuit a path that might be technically impractical or not valued by the business. It’s much better to “fail fast” and learn early in the project lifecycle than to discover the problems after the entire solution has been attempted. Moreover, a simple solution might surprise you and sufficiently solve the problem.
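
As one illustration of “an analysis that establishes the baseline,” here is a minimal sketch in Python. It assumes a churn-style project with binary labels; the label values and the function name are made up for the example. It simply asks how often you would be right by always predicting the most common outcome, which is the number any later model has to beat.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Return the most common label and the accuracy of always predicting it."""
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_label, count = Counter(labels).most_common(1)[0]
    return most_common_label, count / len(labels)


# Hypothetical churn labels (illustrative only): 1 = churned, 0 = stayed
labels = [0, 0, 0, 1, 0, 1, 0, 0, 0, 0]

label, accuracy = majority_baseline_accuracy(labels)
print(f"Always predicting {label} is right {accuracy:.0%} of the time")
```

If a proposed model barely beats this number, that is worth knowing before anyone invests in the full solution.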

What is the value of this project? How will it be measured?

A clear understanding of project value helps prioritize projects. The intent is not to provide some magical and precise ROI calculation but rather enough information to help determine whether the project should commence, relative to other priorities. Moreover, defining the value metric helps the data scientists focus on maximizing or minimizing the target variable(s) that matter most.

Why do this project?

Facts, figures, and a clear understanding of all the stakeholders and their needs only go so far. As explained by Simon Sinek in “Start with Why,” the neocortex, the only part of the brain capable of processing the “what” information, does not control behavior. As such, focusing only on the “what” does not inspire the data science team and project stakeholders to deliver. Rather, a clear and common vision of the project’s impact and its “why” will better motivate both executive sponsorship and the data science development team’s commitment and focus. Define an inspiring but realistic project mission statement / war cry / purpose and clearly communicate it.

What are the risks?

Risk identification is a fundamental process in any sizeable project. Data science projects are no exception and have some unique characteristics that should be explored. For example, as public scrutiny of algorithmic decision-making increases, the potentially discriminatory results of a model’s output (or at least the perception thereof) should be assessed. Brainstorm “What could go wrong?” from various perspectives, such as technical, market, societal, legal, and security. For each type of potential issue (e.g., bias in the model, ethical use of data), make clear who on the team is accountable for ensuring the team thinks it through during the project.
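
To make the bias risk a little more concrete, here is a rough sketch in Python of one very simple check: comparing how often a model predicts the positive outcome for each group. The group names, predictions, and function name are all invented for illustration, and a gap in these rates is only a prompt for further investigation, not proof of discrimination or a complete fairness assessment.

```python
from collections import defaultdict


def positive_rate_by_group(groups, predictions):
    """Share of positive (1) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, prediction in zip(groups, predictions):
        totals[group] += 1
        positives[group] += int(prediction == 1)
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical data (illustrative only): group membership and model output
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = positive_rate_by_group(groups, predictions)
gap = max(rates.values()) - min(rates.values())
print(rates, f"gap between groups: {gap:.2f}")
```

Whoever is accountable for bias on the team can decide which groups to compare and how large a gap warrants a closer look.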

What people and resources are needed?

Start with the obvious. Who do you need to develop the solution? Ask them roughly how much time they’ll need to complete the project. Tie down a decent estimate for the initial deliverable and have a general “t-shirt” sizing for the broader project scope. Then, think broadly. What data sources will you need? If they exist internally, where are they? If they exist externally, can you purchase them? Alternatively, can you start collecting the data? What security / firewall requests will you need? Computing resources? Systems integrations? Bring IT, the business, and the data science project team together upfront to avoid a disjointed approach where these potentially critical requests are discovered later in the project and block progress.

What other questions should be answered?

The meta-question. While the other nine are generally necessary before an effective project launch, they are not sufficient. Ask yourself, your team, and your stakeholders some variant of: “What other key questions do we need to answer before committing to this proposed project?”

*OK, so Ben actually stated “an ounce of prevention is worth a pound of cure,” but I’d like to think that if he were alive today, he would be an outstanding data scientist who would say something like this.
