To help manage GenAI projects, I first describe the GenAI life cycle, then compare it with CRISP-DM, the most common life cycle used in traditional data science projects, and finally discuss key challenges that introduce uncertainty into GenAI projects.
The GenAI Life Cycle
Like other types of projects, a GenAI project starts with Problem Definition, pinpointing the challenge or opportunity that the GenAI application will tackle. Data Investigation follows, with the team selecting the data to augment and/or train the Large Language Model (an LLM is an advanced AI predictive model that processes and generates human-like text). Next is Data Preparation, which structures this data for optimal AI utilization. This is followed by Development, which is when the team builds the GenAI application. Evaluation then tests the reliability and user-friendliness of the AI-based application. Successful testing leads to Deployment, where the AI application is integrated into its operational environment. The cycle concludes with Monitoring and Improvement, where ongoing feedback refines the AI application, ensuring its relevance and performance in the real world.
Below is an explanation of each life cycle phase:
- Problem Definition: Define the problem, understand its business context, and set clear objectives for the GenAI solution to be developed. This includes determining the scope, potential impact, and desired outcomes of the GenAI application.
- Data Investigation: Investigate and source data that can be leveraged by Retrieval-Augmented Generation (RAG) to supplement the Large Language Model being used. RAG enables the LLM to access and use a wide array of up-to-date, external information, thus significantly enhancing the LLM’s ability to deliver detailed and relevant responses. This phase therefore assesses the data landscape, with attention to data availability, relevance, and quality.
- Data Preparation: This step involves cleaning, formatting, and structuring data to make it suitable for use with the chosen GenAI models and technologies. This often includes preparing the data by processing and embedding it into a vector store database.
- Development: Develop the agent using the appropriate LLM(s), with considerations for integrating RAG and using other AI techniques such as designing effective prompts (natural language instructions given to an LLM to guide it toward a desired output). This phase also includes, if necessary, the fine-tuning of a Large Language Model.
- Evaluation: Conduct rigorous testing of the agent to ensure its correctness, readability, performance, and reliability. Evaluate the agent against predefined criteria and objectives to ensure it meets the required standards and business needs.
- Deployment: Deploy the agent in the intended environment, which includes setting up the necessary infrastructure. This infrastructure setup should facilitate hosting, scaling, and managing the agent, ensuring its smooth operation and integration into existing systems.
- Monitoring and Improvement: Implement continuous monitoring of the deployed application to track its performance, user satisfaction, and operational efficiency. Regularly update and improve the agent based on performance data, user feedback, and evolving business needs.
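The Data Preparation and Development phases above can be sketched end to end in a few lines. The example below is a minimal illustration, not a production design: `embed` is a toy word-count stand-in for a real embedding model, `VectorStore` is a hypothetical in-memory class rather than a real vector database, and the sample documents are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real project would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory stand-in for a vector store database."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Data Preparation: embed each document once and store the vector.
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Retrieval: rank stored documents by similarity to the query.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    # Development: combine retrieved context with instructions for the LLM.
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}"
    )


store = VectorStore()
for doc in [
    "Refunds are issued within 14 days of a return request.",
    "Support is available by email on weekdays.",
    "Shipping to Canada takes 5 to 7 business days.",
]:
    store.add(doc)

question = "How long do refunds take?"
top_docs = store.search(question, k=1)
prompt = build_prompt(question, top_docs)
print(prompt)
```

The resulting `prompt` string is what would be sent to the LLM. Swapping the toy `embed` for a real embedding model and `VectorStore` for a real database changes the quality of retrieval, but not the overall shape of the pipeline.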
Reviewing the CRISP-DM Life Cycle
To compare the GenAI life cycle to the life cycle used for other, more traditional, data science projects, we can leverage CRISP-DM (Cross-Industry Standard Process for Data Mining), which is a well-known framework used for data science projects. In fact, despite being developed over two decades ago, CRISP-DM remains the most popular choice for data science teams. It is favored for its general applicability, common-sense approach, and adaptability to various project needs.
This methodology is divided into six distinct phases, each addressing a key aspect of the data science life cycle:
- Business Understanding: This initial phase focuses on understanding the project objectives and requirements from a business perspective. It involves defining the problem, identifying the necessary data, and formulating a preliminary plan to achieve the objectives.
- Data Understanding: This stage entails the collection and exploration of data. It involves tasks like data collection, initial data analysis, and identifying data quality issues.
- Data Preparation: Here, data is cleaned and transformed into a suitable format for modeling. This includes tasks like selecting and formatting data, as well as creating new data sets from multiple sources.
- Modeling: This phase is often considered the most exciting part of a data science project. It involves selecting modeling techniques, building and assessing various models, and iterating until a satisfactory model is developed.
- Evaluation: This phase focuses on evaluating how well the application meets the defined business objectives. It involves reviewing the results, the process, and determining the next steps, such as whether to proceed to deployment or initiate new projects.
- Deployment: The final phase involves planning and implementing the deployment of the application. This can range from generating a report to implementing a robust predictive model. It includes planning for monitoring and maintenance, producing final reports, and conducting project reviews.
Comparing GenAI and CRISP-DM Life Cycles
While CRISP-DM and the GenAI life cycle share similar foundational phases, the GenAI life cycle incorporates additional considerations that are specific to the development of AI systems, such as the use of LLMs and the use of RAG.
Below is a phase-by-phase comparison of the GenAI life cycle with CRISP-DM:
- Problem Definition vs. Business Understanding
  - Similarity: Both phases focus on understanding the problem and setting the objectives of the project.
  - Difference: Minimal; this first phase is very similar across both life cycles.
- Data Investigation vs. Data Understanding
  - Similarity: These stages involve identifying relevant data sources and assessing their quality and relevance.
  - Difference: In GenAI, Data Investigation focuses on whether the data is suitable for RAG, that is, for retrieval at query time to supplement the LLM, while CRISP-DM’s Data Understanding is more general, considering all aspects of data relevant to the problem.
- Data Preparation
  - Similarity: Both life cycles include a phase dedicated to cleaning and transforming data into a suitable format for analysis or processing.
  - Difference: GenAI’s Data Preparation specifically involves embedding data into vector store databases for use with AI models, which is a step beyond the traditional data preparation in CRISP-DM and tailored to AI’s needs.
- Development vs. Modeling
  - Similarity: Both involve selecting models or solutions to address the defined problem.
  - Difference: GenAI’s Development phase includes the creation of prompts and the potential fine-tuning of LLMs, which is a step specific to GenAI applications, whereas CRISP-DM’s Modeling is typically more statistically driven.
- Evaluation
  - Similarity: Evaluation in both methodologies is about testing the model’s performance against objectives.
  - Difference: GenAI’s Evaluation is likely to be more user-centric, focusing on the agent’s interaction with end-users, readability, and user-friendliness, while CRISP-DM might focus more on statistical evaluation metrics.
- Deployment
  - Similarity: The deployment phase in both methodologies is about integrating the solution into the user environment.
  - Difference: GenAI’s Deployment might involve more complex infrastructure setups due to the potential computational requirements.
- Monitoring and Improvement vs. CRISP-DM’s Deployment
  - Similarity: There is no direct equivalent in CRISP-DM for the Monitoring and Improvement phase, but CRISP-DM’s Deployment phase does include planning for monitoring and maintenance, acknowledging the need for ongoing support post-deployment.
  - Difference: GenAI places a specific and structured emphasis on continuous monitoring and iterative improvement post-deployment, which is essential for AI systems that learn and evolve over time.
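The phase-by-phase comparison above can also be captured as a small lookup table, which makes the one structural gap easy to see. This is only an illustrative sketch: the name `PHASE_MAP` and the two helper functions are hypothetical, and the mapping simply restates the comparison in this article.

```python
from typing import Optional

# GenAI phase -> closest CRISP-DM phase (None where CRISP-DM has no dedicated phase).
PHASE_MAP: dict[str, Optional[str]] = {
    "Problem Definition": "Business Understanding",
    "Data Investigation": "Data Understanding",
    "Data Preparation": "Data Preparation",
    "Development": "Modeling",
    "Evaluation": "Evaluation",
    "Deployment": "Deployment",
    # CRISP-DM only plans for monitoring inside its Deployment phase.
    "Monitoring and Improvement": None,
}


def genai_only_phases(phase_map: dict[str, Optional[str]]) -> list[str]:
    """GenAI phases with no dedicated CRISP-DM counterpart."""
    return [genai for genai, crisp in phase_map.items() if crisp is None]


def renamed_phases(phase_map: dict[str, Optional[str]]) -> list[tuple[str, str]]:
    """Phases that exist in both life cycles under different names."""
    return [(g, c) for g, c in phase_map.items() if c is not None and g != c]


print(genai_only_phases(PHASE_MAP))
print(renamed_phases(PHASE_MAP))
```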
Exploring GenAI Project Uncertainty
Below, I explore two key data science project risks that create significant uncertainty in project schedules and deliverables. These typical data science risks are then mapped to analogous risks for GenAI projects. For these challenges, one can see that the life cycle phases are similar (across data science and GenAI projects), but the actual challenges, and the tasks related to these challenges, are very different.
Model Accuracy Uncertainty
The uncertainty of model accuracy concerns whether the project will be able to generate a predictive model that is ‘good enough’. Due to this uncertainty, achieving an acceptable level of model accuracy presents a key challenge that often impacts project timelines and project success criteria. At a high level, across both traditional machine learning projects and GenAI projects, this ‘model accuracy’ challenge focuses on ensuring that the data provided to the model can inform the model such that the model predicts (i.e., generates) responses with precision.
While the tasks are very different for improving model accuracy for traditional ML and GenAI projects, the key life cycle phases are similar: data preparation, development/modeling, and evaluation.
Data Science / Machine Learning Model Accuracy
For traditional machine learning projects, the model accuracy risk is primarily focused on a model’s predictive accuracy. The primary sources of model accuracy uncertainty are data quality, data relevance, and the inherent predictability of the dataset.
Teams use techniques such as feature engineering to refine data attributes to boost model performance. Teams can also try to select more appropriate machine learning algorithms to improve model accuracy. However, sometimes the available data does not have the desired predictive insights, even after trying these approaches.
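As a rough illustration of how feature engineering can change model accuracy, the sketch below compares a simple threshold rule on a raw attribute with the same rule on an engineered ratio. Everything here is an assumption made for illustration: the loan records are made up, the threshold rule stands in for a real learning algorithm, and accuracy is measured on the same data used to pick the threshold, with no held-out test set.

```python
# Made-up records: (debt, income, defaulted)
RECORDS = [
    (20, 100, 0),
    (40, 50, 1),
    (60, 300, 0),
    (30, 40, 1),
    (80, 400, 0),
    (50, 60, 1),
    (25, 125, 0),
    (45, 50, 1),
]


def best_threshold_accuracy(values: list[float], labels: list[int]) -> float:
    """Try each observed value as a cut-off (predict 1 if value >= cut-off)
    and return the best accuracy found on this same data."""
    best = 0.0
    for cutoff in values:
        correct = sum((v >= cutoff) == bool(y) for v, y in zip(values, labels))
        best = max(best, correct / len(labels))
    return best


labels = [defaulted for _, _, defaulted in RECORDS]

# Raw attribute: debt on its own.
raw_debt = [float(debt) for debt, _, _ in RECORDS]
raw_acc = best_threshold_accuracy(raw_debt, labels)

# Engineered attribute: debt-to-income ratio.
debt_ratio = [debt / income for debt, income, _ in RECORDS]
ratio_acc = best_threshold_accuracy(debt_ratio, labels)

print(f"raw debt accuracy:   {raw_acc:.2f}")
print(f"debt ratio accuracy: {ratio_acc:.2f}")
```

On this toy data the engineered ratio separates the two classes while raw debt does not. On real data, no engineered feature is guaranteed to help, which is exactly the uncertainty described above.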
GenAI Model Accuracy
GenAI projects face a similar risk, but the focus is on the generation of contextually accurate output. In other words, the analogous risk for a GenAI project is determining if the GenAI project will be able to generate answers that are ‘good enough’.
In GenAI projects, the quality of the output hinges on carefully crafted input prompts that direct the model to produce the desired result. Hence, prompt engineering, the art of crafting inputs that guide the AI model to generate the desired output, is central to improving this type of model accuracy. Unlike traditional data processing, where the focus is on the data itself, prompt engineering shifts the emphasis to how we communicate with the model, making it a unique aspect of GenAI projects. Refining what data is available via retrieval-augmented generation can also improve model output, as can selecting the most suitable large language model (e.g., OpenAI’s GPT-3.5 vs. GPT-4).
Hence, just as more data and better data quality can improve a traditional predictive model, refining prompts, enhancing the underlying RAG data, and using a different model can lead to more accurate and reliable output from a GenAI system. However, sometimes the desired outputs can’t be generated, even after trying these approaches.
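One way to make ‘good enough’ concrete is to score each prompt variant against a small evaluation set. The sketch below is a minimal harness under stated assumptions: `fake_llm` is a hypothetical stub so the example runs offline (a real project would call an actual model), the evaluation set and prompts are made up, exact-match scoring is only one possible metric, and the `GOOD_ENOUGH` bar is an arbitrary example value.

```python
from typing import Callable

# Made-up evaluation set: (review text, expected label)
EVAL_SET = [
    ("I love this product", "positive"),
    ("This is terrible", "negative"),
    ("Works great", "positive"),
]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call, so the sketch runs offline.
    It answers tersely only when the prompt asks for exactly one word."""
    review = prompt.rsplit("Review:", 1)[-1].lower()
    label = "negative" if "terrible" in review else "positive"
    if "exactly one word" in prompt:
        return label
    return f"The review seems {label} overall."


def evaluate(template: str, eval_set: list[tuple[str, str]],
             llm: Callable[[str], str]) -> float:
    """Fraction of examples where the model output exactly matches the label."""
    hits = 0
    for text, expected in eval_set:
        output = llm(template.format(review=text))
        hits += output.strip().lower() == expected
    return hits / len(eval_set)


PROMPTS = {
    "vague": "What do you think of this? Review: {review}",
    "specific": (
        "Classify the sentiment. Reply with exactly one word, "
        "positive or negative. Review: {review}"
    ),
}

GOOD_ENOUGH = 0.9  # example acceptance bar, set by the project team

scores = {name: evaluate(tpl, EVAL_SET, fake_llm) for name, tpl in PROMPTS.items()}
acceptable = [name for name, score in scores.items() if score >= GOOD_ENOUGH]
print(scores, acceptable)
```

The same harness can be rerun after changing the prompt, the RAG data, or the underlying model, which gives the team a repeatable way to see whether an iteration actually moved the score.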
Data Preparation Uncertainty
The risk of ‘data preparation’ taking longer than expected is a common concern in data science projects because the process can be intricate and time-consuming. It involves cleaning, transforming, and encoding data, dealing with missing values, and ensuring that the dataset is well-suited for modeling. This phase can be unpredictable due to unforeseen issues with data quality, volume, and structure.
In GenAI projects, there is still a significant need for data preparation, such as formatting data for training a large language model or ensuring that the data is suitable for Retrieval-Augmented Generation, but the process may not carry the same level of risk as in traditional data science projects. This is because GenAI often leverages pre-trained models that require less traditional feature engineering and can handle more unstructured forms of data.
However, there are analogous risks in GenAI projects:
- Complexity of Data Embedding: GenAI systems often require data to be embedded into high-dimensional vector spaces, which can be complex and time-consuming, especially when dealing with large datasets.
- Training Data Quality: Preparing high-quality training data for GenAI can be challenging. The models often require large amounts of data, and ensuring that this data is diverse, unbiased, and representative can introduce delays.
- Fine-Tuning Challenges: While large pre-trained models are used, fine-tuning them on specific tasks or datasets can be a complex process, with the potential for overfitting, underfitting, or other issues that require additional iterations of data preparation and model training.
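To illustrate why embedding effort grows with dataset size, consider that documents are commonly split into overlapping chunks before being embedded, so the chunk size and overlap determine how many vectors must be computed and stored. The sketch below is one simple way to do this, assuming word-based chunks; the function name `chunk_words` and its default sizes are hypothetical choices, not a recommendation.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of up to `chunk_size` words, where each chunk
    repeats the last `overlap` words of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
print(f"{len(chunks)} chunks to embed")
```

Each chunk becomes one embedding to compute, so smaller chunks or larger overlaps increase the preparation workload, and changing either setting later means re-embedding the data.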
Key Insights
In both traditional data science and GenAI projects, the objective is to refine the data inputs to improve a model’s ability to produce accurate and useful outputs.
While one might view these life cycles as a linear, step-by-step waterfall approach, developing these solutions is inherently iterative, involving frequent loops through the project’s life cycle (i.e., revisiting previous stages for application enhancement). These iterations facilitate risk mitigation by allowing teams to refine their approach based on current project results.
In summary, while the nature of project uncertainty differs, GenAI projects are not immune to project risks in data preparation or development. In fact, GenAI projects have their own unique challenges that can affect the time required to move from data preparation to development to deployment. Just as in more traditional data science projects, these risks need to be managed proactively, via an iterative process that lets the team address uncertainties as they emerge.