When Can We Trust ML Models?

Introduction

The term epistemic trust describes trust in knowledge that has been produced or provided by experts. In this context, the very notion of trust exists because we cannot know the true “trustworthiness” of information in an uncertain world. With the recent increase in the complexity of Machine Learning (ML) models, demonstrating a level of “trustworthiness” has become a more challenging problem. To address these challenges, there is a growing trend toward Interpretable Machine Learning (IML). In this article, we will discuss the challenges of building ethical ML models and how IML can help, using the definitions and the predictive, descriptive, relevant (PDR) framework discussed below. Additionally, we will argue that a common framework and terminology are important for IML.

Efforts for Ethical ML Models

Previous versions of Google Translate translated the Turkish sentences “O bir hemşire” and “O bir mühendis,” which mean “She/He is a nurse” and “She/He is an engineer,” respectively, as “She is a nurse” and “He is an engineer.” While Turkish-English translations now appear to be fixed by showing both possible outputs, Turkish-Spanish and Turkish-Russian translations still contain the same gender bias (last checked online: 21 May 2021). Variable importance measures were introduced to observe which variables matter most when a classifier makes a prediction. In one set of experiments, a classifier trained to predict the likelihood of a person being arrested was shown to base its decisions on sensitive attributes such as race and gender. For problems such as these, organizations like the IEEE, the EU, and UNI Global Union have published guidelines for ethically aligned artificial intelligence systems. To comply with these requirements, and to be able to show that a model complies with them, IML is often used. Thus, we will discuss IML in the following section using the predictive, descriptive, and relevant (PDR) framework.
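Before moving on, here is a minimal sketch of what such a variable importance measure can look like in practice. It uses scikit-learn's permutation importance on a purely synthetic dataset; the feature names (including the sensitive attributes) and the data are illustrative assumptions, not the setup of the original experiments.

```python
# Sketch: permutation feature importance on an illustrative, synthetic dataset.
# The features (including the sensitive attributes "race" and "gender") are
# hypothetical stand-ins, not the data from the original study.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.integers(0, 2, n),   # "race" (binary, synthetic)
    rng.integers(0, 2, n),   # "gender" (binary, synthetic)
    rng.normal(size=n),      # "prior_incidents" (synthetic)
])
# Synthetic label that deliberately leaks the sensitive attributes,
# so the importance measure has something to reveal.
y = ((0.8 * X[:, 0] + 0.5 * X[:, 1] + X[:, 2]
      + rng.normal(scale=0.5, size=n)) > 1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

result = permutation_importance(clf, X_test, y_test, n_repeats=10, random_state=0)
for name, mean_drop in zip(["race", "gender", "prior_incidents"], result.importances_mean):
    print(f"{name}: mean accuracy drop when permuted = {mean_drop:.3f}")
```

A large accuracy drop for a sensitive attribute would signal that the classifier relies on it, which is the kind of finding reported in the experiments above.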

Interpretable Machine Learning

One could say that all stored data are interpretations, since they are obtained by extracting information with some type of sensor (e.g., an eye, a camera, a microphone). To overcome this kind of weak definition in our context, we will rely on more precise definitions, which we briefly describe here. IML is defined as the extraction of relevant information from an ML model concerning both the relationships present in the data and the relationships learned by the model. Relevancy, or relevant information, here refers to information that provides insight into a given domain and problem.

When a data science practitioner wants to develop an IML model, the first steps are defining the domain and the problem, then collecting data. For example, “accurately predicting the credit score of a person” might be the problem defined. These kinds of problems often have social aspects when they are deployed in industry; therefore, a more appropriate problem definition might be “accurately and fairly predicting the credit score of a person.” In this case, ethical or fairness requirements should be defined clearly. For this example, we can say that the model is not relevant if it is not fair, since an unfair answer lies outside our problem definition. After these definitions are in place, the practitioner collects data. If the data is not selected appropriately, the domain covered by the model might not generalize well to the target domain (i.e., there might be bias). In the next step, the practitioner designs the model.
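To make the fairness requirement above concrete, here is a rough sketch of one possible check: the demographic parity gap in approval rates of a hypothetical credit-approval classifier. The synthetic data, the protected group label, and the 0.1 threshold are all illustrative assumptions, not part of the original example.

```python
# Sketch: checking a demographic parity requirement for a hypothetical
# credit-approval model. All data, the group label, and the 0.1 threshold
# are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
group = rng.integers(0, 2, n)                    # synthetic protected attribute
income = rng.normal(50 + 5 * group, 10, n)       # income correlated with group
X = np.column_stack([income, rng.normal(size=n)])
y = (income + rng.normal(scale=5.0, size=n) > 55).astype(int)  # "creditworthy" label

model = LogisticRegression(max_iter=1000).fit(X, y)
approved = model.predict(X)

# Demographic parity gap: difference in approval rates between the two groups.
gap = abs(approved[group == 0].mean() - approved[group == 1].mean())
print(f"approval rate gap = {gap:.3f}")
print("fairness requirement satisfied" if gap < 0.1 else "fairness requirement violated")
```

Writing the requirement down as an explicit, testable check is one way to keep the model tied to the problem definition, i.e., to keep it relevant.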

The practitioner might face a trade-off between model complexity (e.g., a higher parameter count or black-box models) and interpretability. Fitting the data with a more complex model often results in a decrease in interpretability. The measure of how well the model fits the data is defined as predictive accuracy; this measure remains fixed after this step. In the next stage, the model is analyzed using IML methods to describe the relationships in the data and what the model has learned. How well an interpretation method describes what the model has learned is denoted as descriptive accuracy. The goal is to use interpretation methods to maximize the predictive accuracy, descriptive accuracy, and relevancy (PDR) of the extracted information. These steps are repeated until the values are satisfactory to the practitioner.
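To make this trade-off concrete, the small sketch below compares a directly interpretable model with a more complex one on a toy dataset. The dataset and both model choices are illustrative assumptions, not something prescribed by the PDR framework.

```python
# Sketch: complexity vs. interpretability trade-off on a toy dataset.
# The dataset and both models are illustrative choices.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Simple model: coefficients can be read directly (higher interpretability).
simple = LogisticRegression(max_iter=5000).fit(X_train, y_train)
# Complex model: often higher predictive accuracy, but no directly readable parameters.
complex_model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

print("logistic regression accuracy:", simple.score(X_test, y_test))
print("gradient boosting accuracy:  ", complex_model.score(X_test, y_test))
```

Any gap in test accuracy here corresponds to predictive accuracy in the PDR terminology, while how much of the complex model's behavior an interpretation method can recover corresponds to descriptive accuracy.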

Interpretation Methods

The possible interpretation methods vary depending on which stage the practitioner is working on. Methods used in the design stage are concerned with increasing both predictive and descriptive accuracy, while methods applied after the design stage concern only descriptive accuracy.

As an example, consider the following modular model design method. If a model can be divided into independently interpretable parts, that model is said to be modular. One study used a dataset of pneumonia patients to predict their mortality risk with a generalized additive model with pairwise interactions. Because some components of this model could be interpreted independently with visualization methods (e.g., curves and heat maps), the study yielded some interesting discoveries about what the model had learned from the data. One such discovery was that the model associated having asthma with a lower risk of mortality, which is the opposite of reality, since asthma is known to increase the risk of mortality. After this discovery, patients with asthma were cared for more intensively, resulting in a decreased risk of mortality for those patients. Had the IML method not been used, asthma patients could have been deprioritized, and their risk of mortality would have been higher.
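A minimal sketch of how such a modular model might be fit and inspected is shown below. It assumes the InterpretML package's ExplainableBoostingClassifier as a stand-in for a generalized additive model with pairwise interactions, and a small synthetic dataset in place of the pneumonia data; none of this reproduces the original study.

```python
# Sketch: a modular model (generalized additive model with pairwise interactions),
# using the InterpretML package's ExplainableBoostingClassifier (assumed installed)
# on a synthetic dataset standing in for the pneumonia data.
import numpy as np
import pandas as pd
from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show

rng = np.random.default_rng(0)
n = 2000
age = rng.normal(65, 15, n)
asthma = rng.integers(0, 2, n)
# Synthetic outcome in which asthma appears protective, mimicking the surprising
# association described above (purely illustrative).
risk = 0.03 * age - 0.8 * asthma + rng.normal(size=n)
y = (risk > np.median(risk)).astype(int)
X = pd.DataFrame({"age": age, "asthma": asthma})

ebm = ExplainableBoostingClassifier()
ebm.fit(X, y)

# Each term (a per-feature shape function or a pairwise interaction) can be
# visualized on its own, which is what makes the model modular.
show(ebm.explain_global())
```

Inspecting the per-feature curve for asthma in such a visualization is what would surface the counterintuitive "asthma lowers risk" pattern.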

Linear and logistic regression models can provide insights about features under certain assumptions about the problem domain. The Students for Fair Admissions v. Harvard lawsuit provides an example of inappropriate use of statistical feature importance interpretations. In that lawsuit, one side fit the admission data to a logistic regression model to analyze the admission process. They argued that being Asian-American and not low-income was negatively associated with the probability of admission. Later, they added another argument that subjective scores such as “personal ratings” were negatively impacting the admission probability of Asian-American students. In contrast, the university responded with a report showing that, after accounting for additional variables in the model, the race feature was no longer significant. Thus, both sides reached opposite conclusions from the same kind of model. IML cannot be used to verify whether the relations in the data are causal. Causal inference, a branch of statistics distinct from IML, focuses on such problems. The models both sides used could, at best, capture association, not causality.
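To illustrate how a coefficient's significance can change once additional variables are controlled for, here is a small sketch using statsmodels on synthetic data. The variable names and the data are purely illustrative and have no relation to the actual admissions data.

```python
# Sketch: the apparent effect of one feature can vanish once a correlated
# control variable is included. All data and variable names are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n).astype(float)         # illustrative group indicator
rating = 0.8 * group + rng.normal(size=n)            # a rating correlated with the group
admit_prob = 1 / (1 + np.exp(-(1.5 * rating - 1)))   # admission depends on rating only
admitted = rng.binomial(1, admit_prob)

# Model 1: group only; its coefficient looks significant.
m1 = sm.Logit(admitted, sm.add_constant(group)).fit(disp=0)
# Model 2: group plus the rating; the group coefficient loses significance.
m2 = sm.Logit(admitted, sm.add_constant(np.column_stack([group, rating]))).fit(disp=0)

print("group-only model p-value for group:", m1.pvalues[1])
print("full model p-value for group:      ", m2.pvalues[1])
```

Both fitted models are "correct" summaries of association in the data; which variables one chooses to control for drives the conclusion, which is exactly why such coefficients cannot settle a causal question on their own.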

Conclusion

In the previous section, we saw two examples showing that the trustworthiness of a model depends not only on predictive accuracy but also on relevancy and descriptive accuracy. The problem that remains is that it is unclear whether there is a proper way to measure relevancy and descriptive accuracy. These terms are currently most helpful for setting expectations for an ML model; in the future, however, we should also try to describe what a model requires in those terms. For now, they help us describe the destination; finding out how to get there is up to us.