Machine learning predictive models are at the heart of many data science projects, and they often deliver significant value. Unfortunately, there are also many examples of AI (machine learning algorithms) not working as desired. For example, a model can cause harm to subsegments of society, as well as to the organization deploying it. Responsible AI focuses on minimizing these undesirable results.
As data science professionals, we need to think through the potential issues that might arise during a project, including challenges such as a machine learning algorithm that introduces bias. This post covers:
- What is Responsible AI?
- Why is Responsible AI Important?
- Who is Responsible for Responsible AI?
- 5 Best Practices to Achieve Responsible AI
What is Responsible AI?
There is no single, commonly accepted definition of Responsible AI. So, let’s explore what Responsible AI is by first reviewing what AI (Artificial Intelligence) is.
Most data science projects use machine learning to develop predictive models. In a different blog post, I discussed the difference between AI, machine learning, and data science. However, people often use AI and machine learning interchangeably, using both terms to refer to the creation and use of predictive algorithms.
These models learn from previous observations of data in order to predict future observations. For example, a machine learning model can use previous emails, labeled as spam (or not spam), to predict whether future emails are spam.
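As a minimal sketch of this idea (the emails and labels below are made-up placeholders, and scikit-learn is just one possible library choice), a spam classifier could be trained on previously labeled emails like this:

```python
# Minimal sketch: learn from previously labeled emails to flag new ones as spam.
# The emails and labels are tiny, made-up placeholders for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

past_emails = ["Win a free prize now", "Meeting moved to 3pm",
               "Claim your reward today", "Here are the quarterly numbers"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

# Turn the text into word features, then fit a simple classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(past_emails, labels)

print(model.predict(["Claim your free prize"]))  # likely [1], i.e., flagged as spam
```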
However, the key aspect of using predictive models is creating, and then using, those models in a responsible way. At a conceptual level, Responsible AI means making sure the predictive models are fair and, where appropriate, transparent (i.e., an explanation can be provided for each prediction).
Two example definitions
These definitions might help to show the breadth of how people view Responsible AI. The first is from Accenture:
Responsible AI is the practice of designing, developing, and deploying AI with good intention to empower employees and businesses, and fairly impact customers and society—allowing companies to engender trust and scale AI with confidence.
The other is from Alexander Gillis:
Responsible AI is a governance framework that documents how a specific organization is addressing the challenges around artificial intelligence (AI) from both an ethical and legal point of view. Resolving ambiguity for where responsibility lies if something goes wrong is an important driver for responsible AI initiatives.
Note how the second definition mentions the use of a framework to help achieve Responsible AI. This is key. It is not enough for a project manager or data scientist to want an ethical data science project. It is also important to put a framework in place that helps raise the appropriate questions at the appropriate time.
Why is Responsible AI Important?
Before we explore how to achieve Responsible AI, let’s review why it is important to try to achieve it.
There are many examples of why it is important to evaluate how predictive models are built and used. For example, back in 2015, it was found that Google Photos labeled pictures of Black people as gorillas. Clearly, Google did not do this on purpose; in fact, Google said it was “genuinely sorry” and would work to fix the issue immediately. However, as of 2018, Google still had not fully fixed the issue. More recently, in 2021, Facebook had a similar issue, where an AI system applied a “primates” label to a video of a Black man.
In a different example, when the Apple credit card was first introduced in 2019, many couples who applied found that women received lower credit limits than their male spouses, even when both applicants (the husband and wife) shared the same income and credit score.
These examples show why it is important to try to ensure that AI models are used in an appropriate way. I did not highlight these examples to pick on Google, Facebook or Apple. Rather, I wanted to point out that even for companies that have many, many data scientists, and clearly do not want to introduce these issues, achieving Responsible AI is difficult.
Put another way, just stating “let’s do Responsible AI” is not enough. There needs to be a process in place to help minimize the risk when deploying a predictive model.
Who is Responsible for Responsible AI?
As a first step towards achieving Responsible AI, it might be helpful to think about who is responsible for Responsible AI.
First and foremost, the entire team should be part of the effort to help ensure an ethical AI model. However, from an accountability perspective, the data science project manager is the person responsible for Responsible AI. In an organization with multiple data science projects, it is the person in charge of these efforts (e.g., VP of Data Science, Chief Analytics Officer) who should be accountable for ensuring that the models developed are responsible and ethical.
5 Responsible AI Best Practices
When trying to create Responsible AI solutions, keep the following key best practices in mind.
1. Be Human Centered
Human Centered Data Science addresses the fact that bias and inequality may result from the automated collection, analysis, and distribution of very large datasets.
A human-centered approach can enhance each phase of the data science life cycle. In short, across each phase of the life cycle, the team should focus on the people who might be impacted by the predictive model.
So, the team should think about the model being created from the users’ perspective (not just the perspective of the organization building the model). For example, the team should think about (or engage with) a diverse set of potential users and use-case scenarios, and do this throughout the project, not just at the start or the end.
This enables the team to build a wide variety of user perspectives into the project. With these users in mind, the team can then consider how people would be affected by the model’s predictions and, equally important, the potential impact of incorrect predictions.
2. Minimize the Risk of Creating a Biased Model
It is important that the team understands and mitigates any potential bias in the data.
When a segment of a population is underrepresented in the data, the model might not accurately take that underrepresented segment into account. This can lead to the continued marginalization of that segment of the population. One way to think about this challenge is that a model trained on biased data can lead to the automation or perpetuation of that bias.
For example, if a dataset contains mostly images of white men, then a facial-recognition model trained with these examples may be less accurate for women or people with different skin tones. Hence, rather than just thinking about how much data is needed to train a model, the team should take into account data diversity. As noted by researchers at MIT, “we need to stop thinking that if you just collect a ton of raw data, that is going to get you somewhere. We need to be very careful about how we design datasets in the first place”.
Hence, a key step in the data science workflow should be a thorough evaluation of the available data. This evaluation includes how the data might mitigate or introduce potential biases. It is also a good idea for a domain expert to help review the training data. Doing this review during the exploratory analysis phase of the project is often appropriate. By doing this review, a domain expert may see biases the team has overlooked.
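As a minimal sketch of what such a review might start with (the file name, the demographic column names, and the 5% threshold are all hypothetical choices for illustration, not standards), a team could check how well each subgroup is represented in the training data:

```python
# Minimal sketch: check how well each demographic subgroup is represented
# in the training data. The file name, column names, and 5% threshold are
# hypothetical choices for illustration only.
import pandas as pd

df = pd.read_csv("training_data.csv")  # hypothetical training dataset

for column in ["gender", "skin_tone"]:
    shares = df[column].value_counts(normalize=True)
    print(f"\n{column} representation:")
    print(shares.round(3))
    # A very small share may signal that the model will underperform for that group.
    if (shares < 0.05).any():
        print(f"Warning: some {column} groups make up less than 5% of the data.")
```

Simple counts like these do not prove a model is fair, but they give the domain expert and the rest of the team something concrete to review during exploratory analysis.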
3. Understand the Importance of an Explainable AI Model
Some of the models that teams create are “black boxes”. A black box model can generate accurate predictions, but it can be difficult to explain why the model made a specific prediction.
The team should, where possible, work to make it easy for people to understand what drives the predicted results (e.g., based on the features and models chosen). In some situations, the team might have to choose between a more accurate but less explainable model and a slightly less accurate but more explainable one. A human-centered focus can help the team evaluate these alternatives.
Understanding the behavior of a model is commonly known as Explainable AI. It can help a team understand potential biases, and it helps characterize model accuracy, fairness, transparency and outcomes. Explainable AI is also helpful within an organization, where transparency can build trust and confidence in the model among others in the organization. This trust is important when evaluating whether a predictive model should be put into production.
Note that teams can work to explain their “black box” models. For example, Google focuses on achieving explainable AI by understanding the model’s predictions, rather than how the model generated those predictions. In other words, tools can help the team understand and interpret the predictions made by machine learning models and thus achieve explainable AI.
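As a minimal sketch of this model-agnostic style of explanation (using scikit-learn’s permutation importance on a public dataset purely for illustration; it is one of many possible tools, not the specific approach mentioned above), a team could inspect which features a “black box” model relies on most:

```python
# Minimal sketch: use permutation importance (a model-agnostic technique from
# scikit-learn) to see which features a "black box" model relies on most.
# The dataset and model here are stand-ins for illustration only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much the test score drops:
# large drops indicate features the model depends on most for its predictions.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"{X.columns[i]}: {result.importances_mean[i]:.3f}")
```

Looking at the drivers of the predictions this way does not open the black box itself, but it gives the team a way to check whether the model is leaning on features that would be hard to justify to users or regulators.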
4. Ask Questions
A team should identify potential challenges and help achieve Responsible AI by asking questions across the life cycle.
These questions range from basic management questions:
– Which laws and regulations might be applicable to our project?
– How are we achieving ethical accountability?
To data related questions, such as:
– Are an individual’s privacy and anonymity infringed by our aggregation and linking of data?
– How do we know that the data is ethically available for its intended use?
To model related questions, such as:
– Did the team explore the model’s potential biases?
– How transparent does the model need to be and how is that transparency achieved?
– Are misinterpretations of the results possible?
– What are likely misinterpretations of the results?
5. Learn More, Continue to Explore
Clearly, it takes a lot more than just reading any blog post for a team to achieve Responsible AI. Keep reading, keep learning.
Below are some additional resources: