The following is an interview by Jeff Saltz with Jelle de Jong.
Jelle, a certified DSPA Team Lead, is an independent consultant. He currently works with Lely, an agricultural business based in the Netherlands.
Jeff: Can you provide some background on your consulting business?
Jelle: I’ve been in the quantitative modeling industry for 15+ years. Some of my engagements are short, around 6 months, but others are much longer. I’ve been working with Lely for the past 18 months, and prior to that, I worked at organizations such as T-Mobile, EY, and ABN AMRO Bank.
Jeff: Can you describe your data science work at Lely?
Jelle: Sure. Lely is an international business in the agricultural sector, based in the Netherlands. Lely offers solutions for almost all activities on the dairy farm – from milking and feeding to cleaning. In short, Lely provides solutions for automation of tasks on the farm through robotics and farm management systems that help manage the dairy farm smartly.
And yes, there really is a need for data science within the dairy industry. Lely leverages data science for challenges such as optimizing the behavior of an autonomous milking robot, tailoring the milking robot to individual cows, and advising farmers on how best to make day-to-day tactical decisions.
Jeff: You are Data Science Team Lead certified. How has the course been helpful to you?
Jelle: For me, it was really important to start thinking more conceptually about how to organize the work of delivering data-driven solutions, especially about how data science differs from software engineering. In particular, the Data Driven Scrum (DDS) approach helps mitigate some of the issues of using mainstream agile methodologies for data science.
Jeff: How has the course made a difference in how your team works together?
Jelle: Clearer communication about data science processes and less muddled thinking, which makes it easier to get these ideas across. The concrete ideas help create transparency towards stakeholders: where are we in a project, what steps need to be taken, and what can we expect. They also help organize and coordinate within and across teams, so we can do data science work more effectively.
In terms of being successful with data science, I think that having proper agile processes in place can really make the difference, maybe even more so than the application of advanced modeling methods, as these ideas will help in building the right solutions – solving the business problem – and aligning with the rest of the organization.
Jeff: Taking a step back, why not just use CRISP-DM to manage a project?
Jelle: When using task-oriented workflows, such as CRISP-DM, there is a great deal of focus on what to do next, such as data preparation, modeling, or model evaluation. However, in my projects, there is typically a lot of going back and forth between different CRISP-type phases. In other words, it’s not really practical for my projects to be done in a waterfall-like CRISP-DM phased approach. Rather, we need to build up our knowledge, and go through the phases, incrementally.
So, a phase is really never done. Because of this, we need a way to determine when to go back to a previous phase and when to go forward. In fact, we really have to decide how much of a phase to complete before moving to the next one. This is where collaboration frameworks are helpful. I would note that some folks call these outcome-driven workflows, since they focus on the incremental delivery of value through a phased approach.
Jeff: Do you now use Scrum, Data Driven Scrum or a different outcome-driven workflow?
Jelle: Before becoming DSTL certified, we were using Scrum within our organization. For data science, Scrum’s fixed time-boxed sprints and estimation are not an ideal fit.
Data Driven Scrum, which is an adaptation of Scrum, fixes the awkward fit of Scrum to data science – especially Scrum’s fixed-length iterations. I also like that DDS provides a data-driven solution life cycle focus. In short, the course introduced me to DDS, which can eliminate these Scrum issues while keeping the benefits of Scrum, including smooth collaboration with other teams that might be using Scrum, such as software development teams.
Jeff: DDS and Scrum are both agile frameworks – what does agility mean to you in a data science context?
Jelle: What is the essence of agile? I think it provides adaptivity and iterative, incremental learning. It also provides the ability to pivot, which means we need the right focus on the iteration, or complete experimental cycle.
Jeff: How does DDS fit with CRISP-DM?
Jelle: I think DDS is in some sense complementary to a CRISP-DM-type workflow, in that DDS gives a higher-order organization to the work to be done: it helps with planning and road mapping and, in general, provides a communication framework. The phases of CRISP-DM, on the other hand, structure the work by providing guidance on which tasks are to be done, but from a fairly abstract perspective. The phases are not typically executed strictly as a simple workflow, where the team goes from one phase to the next. Rather, the team might stay in a phase for a while or even go back to a previous phase. But deciding whether to go forward, stay in the same phase, or go back is left for the team to figure out – there are no concrete steps for deciding what to do next.
However, when using DDS, each iteration, or phase, now has clear goals, and moving forward, or going back to a previous phase, is a formal decision moment based on what we learned in the current iteration. We now have a clear way to communicate where we are in a project or product lifecycle.
Jeff: How do you integrate the concept of scaling a model within your implementation of DDS?
Jelle: As part of each iteration, when we are thinking about what to create, we consider having an iteration focus on scaling up the solution, rather than just improving it. So, for example, we might scale up from a proof-of-concept to a pilot, or from a pilot to full deployment. Of course, an iteration might also go back to an enhanced proof-of-concept. Note that an iteration might also enhance an existing production model – which is directly related to the concept of MLOps.
So, if our validation of a PoC was very positive, we might decide to invest and move forward with a pilot.
Jeff: Can you discuss a bit more about how you think about scaling your solution?
Jelle: Sure – we think in terms of product scaling phases. One can think of these as an elaboration of the CRISP-DM deployment phase: the existing CRISP-DM phases focus on creating a proof-of-concept, and we then define additional phases that incrementally scale the solution.
After an initial lightweight project / product definition, or initiation phase, we aim for a proof-of-concept, which focuses on the typical CRISP-DM set of phases, such as data understanding, data collection, data preparation, exploratory analysis, modeling and model evaluation. Our model evaluation typically looks at offline data to see if the model is “good enough” with respect to the business goals. We try to understand if there is evidence to justify a pilot.
The pilot is the next phase from a scaling perspective. In this phase, we focus on testing the model with live data and integrating it with operational processes. To understand how the model compares with the status quo, we define and execute some sort of A/B testing. Our observe and analyze efforts focus on understanding if the pilot is delivering the value that was expected.
After the pilot, our next phase is the scale-up to full deployment. During this phase, we focus on scaling the infrastructure as well as fine-tuning, or optimizing, the model. We also set up production monitoring and troubleshooting tools and processes.
Of course, the last phase is maintenance, which is really continuous improvement, where we continue to improve, for example, doing additional A/B testing, as well as monitor the current solution.
Note that each of these phases might have several iterations, and the team collectively determines whether it should move to the next phase. In other words, should the project scale up the solution, or go back and do additional work from an earlier phase, such as a different PoC?
Note: for reference, we have included a summary of Jelle’s ‘scaling phases’.
Jeff: Thanks for taking the time to discuss your data science process and our Team Lead course. Maybe one last question – what was your favorite part of the course and what would you say to others considering taking the course?
Jelle: My favorite part was definitely the one-on-one conversations and discussions with the course instructor. It was great to discuss this topic with someone who thinks about these things from a scientific and methodological perspective, but also has hands-on experience managing data science work. The discussions in the videos between the course instructors also work well to get varied perspectives on the topics covered.
I would recommend taking the course because it gives you the conceptual frameworks you need to think constructively about how to better organize data science work within your organization, and it might introduce you to new frameworks or workflows that you could apply in your team. Additionally, reflecting on your current way of working through the course is very useful, as it can lead to new insights on where to improve.
To learn more, explore training via our Data Science Team Lead course.
Or jump in and learn more through these articles: