Traditional software approaches favor developing software layer-by-layer (horizontal slicing) while software agilists strive to deliver software by thin end-to-end value streams (vertical slicing).
…but what makes sense for data science?
Consider a churn project…
Imagine that you are tasked with proactively minimizing customer churn at a telecom company. The retention department has requested the following three deliverables:
- The likelihood of each customer voluntarily disconnecting service
- The likelihood of each customer being forced to disconnect due to non-payment
- The likelihood of each at-risk customer accepting each of five retention package offers
…We could slice work horizontally
Considering CRISP-DM’s six phases and the traditional software mindset, we could:
- Develop a comprehensive project plan based on an in-depth business understanding of all three deliverables
- Collect and analyze (nearly) all relevant data
- Clean, integrate, and format (nearly) all relevant data
- Develop the best possible version(s) of each of the three models within generous time constraints
- Evaluate the results of all three models
- Build an application with an automated scoring pipeline for the entire system
Visually, our work would span sequentially across each layered phase until we deliver one large release. There might be some backtracking, but generally we do not proceed to the next phase until the layer below it is complete.
The problems with horizontal slicing
Except for the sixth slice (deployment), each horizontal slice produces intermediate work that does not directly provide stakeholder value. It’s like baking an entire cake layer-by-layer, starting with the base layer and moving up to whipped cream and sprinkles. This creates numerous issues:
- The stakeholder must wait until the entire system is deployed before realizing value.
- Meaningful stakeholder feedback is hard to obtain, which increases the risk that the entire system might need significant refactoring to accommodate “end-of-project” stakeholder feedback.
- Model evaluation is deferred, which limits the data scientists’ ability to understand whether they’re on the right path in earlier project phases.
- By focusing on work like “set up a database” at the start of an effort, the data scientists are naturally more focused on technical tasks during the initial phases, as opposed to delivering business value and getting feedback from their sponsors.
…Or we could slice vertically
Alternatively, taking the agile mindset, we could first bake a small version of the cake (good enough for a slice) that has a small portion of all layers:
- Develop a high-level project roadmap for the entire system with detailed next steps for only the initial model for the voluntary churn use case (first deliverable)
- Collect and analyze enough data for only the voluntary churn model
- Clean, integrate, and format the most promising data for only the voluntary churn model
- Develop a basic, sufficient model for only the voluntary churn model
- Evaluate the results of only the voluntary churn model
- Based on stakeholder needs, the next step could be to:
  - Develop an automated system for only the voluntary churn model, or deliver the scored file for voluntary churn manually (accruing technical debt for the sake of moving on to something of higher stakeholder value)
  - Improve the basic churn model, possibly with new data sources (go back to phases 2, 3, or 4 for voluntary churn)
  - Develop a basic model for the non-pay disconnects
Visually, our workflow cuts vertically up each of the layered phases. Generally, we proceed to the next deliverable once the stakeholder says the current deliverable is sufficient. We will have numerous small releases and will likely re-visit and improve previous deliverables as the project progresses.
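To make the first slice concrete, here is a minimal sketch of one thin pass through the phases for the voluntary churn deliverable. Everything in it is an assumption made for illustration: the customer records are synthetic and deterministic, the contract types and churn rates are made up, and the "model" is just the churn rate per contract type, chosen as a stand-in for "basic, sufficient" rather than as a recommended technique. Function names such as `fit_segment_rates` and `write_scored_file` are hypothetical.

```python
import csv
import io
from collections import defaultdict

CONTRACTS = ["month-to-month", "one-year", "two-year"]


def make_customers(n=300):
    """Stand-in for data collection: synthetic, deterministic records."""
    rows = []
    for i in range(n):
        contract = CONTRACTS[i % 3]
        k = i // 3
        # Made-up churn patterns: 40%, 15%, and 5% by contract type.
        churned = {
            "month-to-month": k % 5 < 2,
            "one-year": k % 20 < 3,
            "two-year": k % 20 < 1,
        }[contract]
        rows.append({"customer_id": i, "contract": contract, "churned": int(churned)})
    return rows


def fit_segment_rates(train):
    """Modeling: churn rate per contract type, a deliberately basic model."""
    totals, churns = defaultdict(int), defaultdict(int)
    for row in train:
        totals[row["contract"]] += 1
        churns[row["contract"]] += row["churned"]
    return {c: churns[c] / totals[c] for c in totals}


def brier_score(rows, predict):
    """Evaluation: mean squared error of predicted likelihoods (lower is better)."""
    return sum((predict(r) - r["churned"]) ** 2 for r in rows) / len(rows)


def write_scored_file(rows, predict):
    """Manual delivery: a scored CSV instead of an automated pipeline."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["customer_id", "churn_likelihood"])
    for r in rows:
        writer.writerow([r["customer_id"], f"{predict(r):.2f}"])
    return buffer.getvalue()


customers = make_customers()
train, holdout = customers[:240], customers[240:]

rates = fit_segment_rates(train)
overall_rate = sum(r["churned"] for r in train) / len(train)

# Compare the basic model against an even simpler "same rate for everyone" baseline.
model_brier = brier_score(holdout, lambda r: rates[r["contract"]])
baseline_brier = brier_score(holdout, lambda r: overall_rate)
scored_csv = write_scored_file(holdout, lambda r: rates[r["contract"]])

print(f"model Brier {model_brier:.3f} vs. baseline {baseline_brier:.3f}")
```

The point of the sketch is its shape rather than the model: every phase is touched once, the result is something a stakeholder can react to, and any single function can be replaced in a later iteration without redoing the others.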
The benefits of vertical slicing
By vertically slicing, we can achieve agility. We reduce the feedback loop cycle time and…
- The stakeholder gets value sooner.
- Stakeholder feedback allows for changes earlier in the project that could short-circuit non-value-added work or uncover other value streams to add to future project phases.
- The data scientists can evaluate a basic model earlier, which likewise could short-circuit work that might lead to a technical dead-end.
- The development team is more aligned to business value.
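A rough back-of-the-envelope comparison shows how much sooner value can arrive. The sketch below assumes, purely for illustration, that each of the six CRISP-DM phases takes one week per deliverable and that the effort is the same whether a phase is done for one deliverable or for all three. Real projects will not be this uniform.

```python
# Hypothetical assumption: every phase costs one week per deliverable.
PHASES = [
    "business understanding",
    "data understanding",
    "data preparation",
    "modeling",
    "evaluation",
    "deployment",
]
DELIVERABLES = ["voluntary churn", "non-pay disconnect", "offer acceptance"]
WEEKS_PER_PHASE_PER_DELIVERABLE = 1


def horizontal_release_weeks(deliverables, phases, weeks_each):
    """Each phase is finished for all deliverables before the next phase starts,
    so every deliverable is released together at the very end."""
    total = len(phases) * len(deliverables) * weeks_each
    return {name: total for name in deliverables}


def vertical_release_weeks(deliverables, phases, weeks_each):
    """Each deliverable passes through all phases before the next one starts,
    so releases are spread across the project."""
    releases = {}
    elapsed = 0
    for name in deliverables:
        elapsed += len(phases) * weeks_each
        releases[name] = elapsed
    return releases


horizontal = horizontal_release_weeks(DELIVERABLES, PHASES, WEEKS_PER_PHASE_PER_DELIVERABLE)
vertical = vertical_release_weeks(DELIVERABLES, PHASES, WEEKS_PER_PHASE_PER_DELIVERABLE)

for name in DELIVERABLES:
    print(f"{name}: horizontal week {horizontal[name]}, vertical week {vertical[name]}")
```

Under these assumptions the total effort is identical (18 weeks either way). What changes is when the first deliverable, and the first real stakeholder feedback, arrives: week 6 instead of week 18.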
But how do we slice vertically in real life?
The above analysis admittedly glosses over the nuances and challenges of vertical slicing in data science. We spend much of our one-on-one mentoring time with our Team Lead students going over specific practices, but here are some guidelines for real projects:
- Evaluate whether to vertically slice: There are exceptions where a traditional or mixed approach might be preferred (e.g., technical proofs of concept without business stakeholders, academic research, or highly regulated environments that require a series of approvals).
- If an extensive layer (such as bulk data collection or the setup of an underlying architecture) is more efficiently executed in full (i.e. a horizontal slice), consider a mixed approach of a base horizontal layer (often developed in a “sprint 0”) with vertical slices built on top.
- Start with Why: If you’re going to vertically slice, explain its value to motivate the team towards its use. This is particularly critical for team members who might come from a more structured background and are culturally opposed to agile approaches.
- Focus on sufficiency and speed-to-delivery: The “best” model can be developed iteratively after you’ve validated that you’re on the right path.
- Prioritize (largely) based on the stakeholder: Ask them to break up their request into small deliverables, each of which provides value. Then ask them to prioritize these deliverables. Technical dependencies will likely dictate some of the ordering, but when possible, work through the requests as sequential vertical slices in the stakeholder’s priority order.
- Encourage the stakeholder to provide feedback with each deliverable: They can validate whether you’re heading in the right direction or provide feedback to pivot to a new one.
- To shorten the feedback loop, split the business value into vertical slices that are as thin as possible: Taking the churn example, perhaps the voluntary churn model (first deliverable) can be sliced further to first examine only customers who churn for a specific reason (such as price-value or poor signal quality).
- To reduce inter-team coordination overhead and hand-offs, staff the project team with the skillsets to deliver the entire vertical slice (from the business understanding to the deployment).
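The prioritization guideline above (follow the stakeholder's ranking, but respect technical dependencies) can be sketched as a priority-aware topological sort. This is one possible way to order a backlog, not a prescribed method. The function name `order_slices`, the stakeholder ranks, and the assumed dependency (the offer-acceptance model needing the voluntary churn model to identify at-risk customers) are all hypothetical.

```python
import heapq


def order_slices(priority, depends_on):
    """Return slices in stakeholder-priority order, never scheduling a slice
    before the slices it technically depends on.

    priority: dict of slice name -> stakeholder rank (1 = most wanted).
    depends_on: dict of slice name -> collection of slices that must come first.
    """
    remaining = {name: set(depends_on.get(name, ())) for name in priority}
    # Slices with no unmet dependencies can start right away.
    ready = [(rank, name) for name, rank in priority.items() if not remaining[name]]
    heapq.heapify(ready)

    ordered = []
    while ready:
        _, name = heapq.heappop(ready)  # most wanted slice that is unblocked
        ordered.append(name)
        for other, deps in remaining.items():
            if name in deps:
                deps.remove(name)
                if not deps:  # last blocker cleared, so the slice becomes ready
                    heapq.heappush(ready, (priority[other], other))

    if len(ordered) != len(priority):
        raise ValueError("circular or unknown dependency between slices")
    return ordered


# Hypothetical stakeholder ranking: offers are wanted most, but (by assumption)
# they need the voluntary churn model to identify at-risk customers first.
priority = {"offer acceptance": 1, "voluntary churn": 2, "non-pay disconnect": 3}
depends_on = {"offer acceptance": {"voluntary churn"}}

plan = order_slices(priority, depends_on)
print(plan)
```

In this example the stakeholder's top request is blocked, so the slice it depends on is pulled forward and the top request follows immediately after, ahead of the lower-ranked, unblocked slice.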
So, should you vertically slice?
Generally yes, but it depends on the specifics of each project and team. For most stakeholder-focused projects and corporate teams (such as the churn example in this post), vertical slicing can help align your work to business value, shorten the feedback loop so that you can validate whether you’re on the right path (and pivot if not), and provide stakeholders with value earlier.
Where can you learn more?
The importance of vertically slicing data science deliverables is a key concept that is unfortunately not explored enough. Here are a few places to learn more:
- Data Science Team Lead course: We cover this in detail.
- Data science product management post: Provides context on how product managers can vertically slice features.
- Data science roadmap post: Vertical slicing should be incorporated into roadmaps.
- CRISP-DM page: Covers much of the same material as this post but provides context around the fuller classic data science life cycle.