The Data Science User Story

To drive effective data science outcomes, data scientists and stakeholders need to jointly understand the deliverable requests and their value. And yet, communication breaks down whenever data science team members and stakeholders interpret the project scope differently. To help bridge this gap, teams can write many of the project deliverables as data science user stories. These concise statements help everyone in a project to understand who will use the data science deliverable, what they will get, and why this is of value.

To put this into practice, we’ll cover:

  • 5 common communication gaps
  • Data science user stories
  • Refining data science user stories
  • Data science hypothesis stories
  • Recap and where to learn more

5 Common Communication Gaps

Anyone who has worked on, or been a stakeholder in, data science projects has likely encountered most (or all) of these issues as the team tries to reach a shared business understanding.

1. AI just to do AI: The stakeholder vaguely assumes artificial intelligence is the solution, without a clear justification.

2. Unclear Need: The customer doesn’t know what they need. They generally have a fuzzy idea that some aspect of their business or life could be improved but they don’t understand how.

3. Old Requirements: The project requirements were defined long in advance. But things have changed. Yet, the team proceeds to deliver per these outdated needs.

4. Poor Documentation: The business request is poorly documented. As information is passed verbally between project members, the result deviates from its original intent.

5. Jargon: The stakeholders speak in business jargon. Data scientists speak in algorithms. Neither party understands the other.

Bridging the Gap: Data Science User Stories

Admittedly, there’s more going on in the above communication gaps than data science user stories can solve. However, your data science team can mitigate at least some of these issues by defining individual project deliverables in the user story format.

Background: What is a User Story?

Popular among agile software teams, the user story provides clarity to both the requester and the development team. It represents a request in a clear and simple format. Most commonly, it is a three-line statement:

As a <user>
I would like <a deliverable>
So that I can <accomplish something>

The lines specify:

  1. The User (who): Identify the individual or group who will interact with the data science solution. This helps the team to tailor their solution according to the target persona.
  2. The Deliverable (what): Clearly articulate the deliverable such as an output file, an API, a model, or a dashboard. This provides context to help understand what should be delivered.
  3. The Outcome (why): Describe the desired outcome or benefit the identified user expects from the specific deliverable. Ideally, it is specific, measurable, and aligned to a broader strategic objective. It should answer the question: What can the user do with the deliverable that they were previously unable to accomplish?

Example: Our Story to Understand Stories

Naturally, the best way to explain data science user stories is through a story. In this story, you are an AI Product Manager, and you would like to launch a successful new product that helps solve many of the Retention Department’s needs. Previous conversations have been fuzzy. So this time, you introduce this concept of a user story. You and the Retention Director meet to define three data science user stories.

As a Retention Campaign Manager
I would like a list of users who are likely to churn
So that I know who should get a targeted retention offer

As a Call Center Agent who is attempting to retain a pre-identified at-risk customer
I would like to know the reason why that customer is likely to churn
So that I can pitch the appropriate retention offer that is most likely to retain that customer

As the Retention Director
I would like a dashboard that tracks the performance of the retention campaign
So that I can understand the rates at which we are retaining at-risk customers against the status quo

In short – No jargon. Nothing complicated. Just plain and simple text that everyone in the project team can understand.
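
For teams that track stories in a script or notebook rather than on a board, the three-line format can be held in a small data structure. The following is only a sketch: the UserStory class and its field names are hypothetical and not part of any agile tool. It shows the who, what, and why of the first story above rendered back into the three-line statement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    """Hypothetical container for one user story: who, what, and why."""
    user: str         # who will interact with the deliverable
    deliverable: str  # what is delivered
    outcome: str      # why it is of value to the user

    def render(self) -> str:
        """Return the story as a three-line statement."""
        return (
            f"As a {self.user}\n"
            f"I would like {self.deliverable}\n"
            f"So that {self.outcome}"
        )


# The first story from the example above
story = UserStory(
    user="Retention Campaign Manager",
    deliverable="a list of users who are likely to churn",
    outcome="I know who should get a targeted retention offer",
)
print(story.render())
```

Keeping the three parts as separate fields makes it easy to spot a story that is missing its "why".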

Refining the Data Science User Story

Our three prior stories are not yet actionable. There are a lot of key details missing. Instead of scoping everything, you focus on just the first story because it unlocks the capabilities of the following two.

Second Iteration of the First Data Science User Story

So you facilitate a follow-up conversation with the Retention Director and a data scientist. The data scientist asks:

  • The model won’t be perfect. So of those it predicts will churn, some would actually stay. How good do the estimates have to be for those who the model predicts to churn?
  • What’s the minimum number of predictions you need in your list for this campaign to be worth implementing?
  • Do you need these predictions updated frequently? At what cadence?

Note that the data scientist avoids the terms precision and recall in the first two questions. Rather, he does a good job of using common language.
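
For readers who do want the jargon behind those two questions, here is a minimal sketch of how precision and recall could be computed. The function name and the eight-customer toy data are hypothetical and only illustrate the arithmetic. Precision answers "of those predicted to churn, how many actually churn?", while recall answers "of those who actually churn, how many did the list catch?"

```python
def precision_and_recall(predicted_churn, actually_churned):
    """Compute (precision, recall) from two equal-length lists of booleans."""
    pairs = list(zip(predicted_churn, actually_churned))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    # Guard against division by zero when nothing is predicted or nothing churned
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical toy data for eight customers (not real results)
predicted = [True, True, True, True, False, False, False, False]
actual = [True, True, True, False, True, False, False, False]

precision, recall = precision_and_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.75
```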

Based on the feedback, you refine the story to be more specific:

As the Retention Director
I would like to know in real-time the customers with at least a 50% chance of churning* in the next 90 days
So that I know who should get a targeted retention offer

*a person has churned if they voluntarily unsubscribe from all of the company’s services and do not re-activate for at least 72 hours

Third Iteration of the First Data Science User Story

It looks clearer. But your team doesn’t understand what exactly it will deliver. The broader team asks: “Are we building an API? Is it a spreadsheet? Scribbles on a napkin? Does the list somehow go into their customer relationship management system?”

Acceptance Criteria

To answer these questions, you and the Retention Director meet with a MarTech analyst and a DevOps engineer. Together you further refine the story. To avoid adding a lot of clauses that look messy, you break out the details into a set of acceptance criteria that accompany the user story. These acceptance criteria serve as a checklist that the deliverable needs to adhere to.

During the conversation, DevOps points out that the request would require an API and a lot of surrounding infrastructure to ship scores in real time. MarTech also expressed concerns that they are fully dedicated to another project and would not be able to provide integrations into the CRM.

Data Science MVP

To overcome this issue, you introduce the concept of a Data Science Minimum Viable Product (MVP). Through this, you explain that your team will focus on an initial end-to-end deliverable that still provides value to Retention. However, this MVP lacks several non-critical but useful features such as automated integrations and real-time predictions. You explain that these could come later based on feedback from the MVP. After some conversation, the Retention Director agrees to the smaller scope of the MVP definition. Specifically, the MVP delivers a daily .csv instead of a real-time system. Using the same user story format, you define the MVP as follows:

MVP User Story:

As a Retention Campaign Manager

I would like a daily list of at least 100 customers who have at least a 50% chance of churning in the next 90 days

So that I know who should get a targeted retention offer

Acceptance Criteria:

  • Scoring criteria:
    • all active customers are scored by the model
    • add all active customers to the list
    • churn definition – a person has churned if they voluntarily unsubscribe from all of the company’s services and do not re-activate for at least 72 hours
  • File generation
    • .csv format
    • generated every morning (including holidays and weekends) before 8am NYC time
    • file is generated if at least one customer is identified to be “at risk”
    • file is uploaded to the “Retention –> Churn Prediction” folder in Amazon S3
    • file name should be retention_list_YY_MM_DD whereby YY represents two-digit year, MM represents two-digit month, and DD represents the two-digit day
  • File format has three columns:
    • userID – a 15-digit integer that cannot start with a 0
    • churn_probability_90day – a real number to 3-decimal points, ranging from 0 to 1, inclusive
    • churn_flag – a Boolean field set to True if and only if the value of churn_probability_90day is greater than or equal to the probability threshold (this threshold is initially .500 but might vary in the future). Else the field is False
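
Acceptance criteria this specific can be turned almost directly into code and checks. The sketch below is one possible reading of the file criteria above, not a definitive implementation: the function names are hypothetical, the flag is assumed to be computed on the rounded 3-decimal probability, and the scheduling and Amazon S3 upload steps are left out.

```python
import csv
import io
from datetime import date
from typing import Dict, Optional

HEADER = ["userID", "churn_probability_90day", "churn_flag"]
THRESHOLD = 0.5  # initial probability threshold; might vary in the future


def file_name(run_date: date) -> str:
    """Build retention_list_YY_MM_DD with two-digit year, month, and day."""
    return run_date.strftime("retention_list_%y_%m_%d")


def is_valid_user_id(user_id: int) -> bool:
    """A 15-digit integer that cannot start with a 0."""
    return 10**14 <= user_id < 10**15


def build_csv(scores: Dict[int, float], threshold: float = THRESHOLD) -> Optional[str]:
    """Return the file content, or None when no customer is at risk."""
    rows = []
    for user_id, probability in scores.items():
        if not is_valid_user_id(user_id):
            raise ValueError(f"invalid userID: {user_id}")
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability out of range: {probability}")
        rounded = round(probability, 3)
        # Assumption: the flag compares the rounded value to the threshold
        rows.append((user_id, f"{rounded:.3f}", rounded >= threshold))
    if not any(flag for _, _, flag in rows):
        return None  # the file is only generated if someone is "at risk"
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical scores for two active customers
example = build_csv({100000000000001: 0.8123, 100000000000002: 0.2})
print(file_name(date(2024, 3, 5)))
print(example)
```

Because every active customer is written to the file, the churn_flag column is what separates the at-risk customers from the rest.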

The Hypothesis Story

With a solid definition of the MVP, you run back to the full data science team. The data scientists push back, saying that they don’t know whether they can even deliver the MVP model. So they ask for an opportunity to first assess whether they can produce a proof-of-concept model. This model is not intended for use on actual customers; rather, it is a quick, low-risk first step.

They prefer to use a different format to define their work statement. Enter the hypothesis story.

Background: What is a Hypothesis Story?

The hypothesis story is another type of short statement that describes work. However, it is worded differently to better align with the experimental nature of data science. Some Scrum teams use this format occasionally. Most Data Driven Scrum teams use this format often. There are many different formats. The most basic format is:

<State the question>
<What will be done with the answer>

Using this format, you come to:

What is the likelihood of each active customer churning in the next 90 days?
We would like to build a proof-of-concept model to assess whether we can successfully predict churn for the Retention Department

Defining Our First Hypothesis Story

You and the data science team then decide to refine the initial hypothesis story. After some back-and-forth, you agree to proceed with the following as the initial body of work in the project. This doesn’t deliver the full MVP scope but the proof-of-concept model is a step in that direction.

Hypothesis Story

User: As a Data Science Team

Need: We want to predict customer churn accurately

Outcome: So that we can implement effective retention strategies and reduce customer attrition

H₀: There is no significant correlation between customer churn and demographic factors such as age, gender, and location.

H₁: There is a significant correlation between customer churn and demographic factors such as age, gender, and location.

Why Do this Test:

By conducting the above hypothesis tests, the data science team can gain insights into the factors that drive customer churn. The findings will enable the development of effective retention strategies tailored to different customer segments, engagement levels, and customer support interactions. Ultimately, this predictive approach will help reduce customer attrition, improve customer satisfaction, and drive business growth.

Acceptance Criteria:

  1. Attempt to develop a statistically significant model that uses demographic factors and customer attributes to predict customer churn.
  2. This investigation is complete when the team presents findings internally and documents them in the team proof-of-concept wiki page.
  3. Timebox the analysis and documentation to up to 80 hours of collective effort.
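
To make H₀ and H₁ concrete, here is a minimal sketch of one way the team might test a single demographic factor against churn. The story does not name a statistical test, so the Pearson chi-square test of independence used here is a swapped-in choice. The counts are hypothetical, and the 5% significance level is an assumption (3.841 is the standard chi-square critical value for one degree of freedom at that level).

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if churn were independent of the factor
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts (not real results)
# rows: age group (under 35, 35 and over); columns: (churned, stayed)
observed_counts = [[30, 70], [10, 90]]

CRITICAL_VALUE_DF1_ALPHA_05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05

statistic = chi_square_statistic(observed_counts)
reject_h0 = statistic > CRITICAL_VALUE_DF1_ALPHA_05
print(f"chi-square={statistic:.2f} reject H0: {reject_h0}")  # chi-square=12.50 reject H0: True
```

With these made-up counts the team would reject H₀ for age group. A real investigation would repeat this for each factor and account for testing several factors at once.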

The team then breaks this body of work into sub-tasks so that they have smaller, concrete tasks that they can coordinate using a Kanban board.

Conclusion

Recap

Effective communication between data science teams and stakeholders is crucial for project success. But there are often many gaps in this communication. An effective approach to help bridge these gaps is by defining project scope in clear and concise stories. The data science user story tends to work better for user-facing deliverables while the hypothesis story tends to work better for coordinating experiments internal to the data science team. However, you could use these formats somewhat interchangeably.

Moreover, there are plenty of formats that a team could use to define its work. Sticking to a pre-defined story template isn’t what is important. Rather, define the units of work in whatever manner best supports conversation and a shared understanding of the deliverable and its value proposition.

Regardless of the formats and scoping exercises you use, be sure that prior to the start of the project, you:

  1. Clearly define the overall project purpose
  2. Define the initial usable end-to-end deliverable as a data science MVP
  3. Define initial data science user stories and hypothesis stories to describe the work that rolls up to the MVP

Learn More

  • Managing all of this is a skilled art for the Data Science Product Manager. To better understand these concepts and to empower yourself as a product manager, enroll in a Data Science Team Lead Plus course.
  • Mike Cohn is a thought leader for user stories. Watch his 1-hour conference talk on writing user stories.
  • Stories are a part of the broader umbrella concept of Agile. To learn about what Agile means for data science, read the Agile data science post.
  • This post mentions several practical tips in starting a project. Consider asking 10 questions before starting a data science project.