Bias is the inability of a machine learning algorithm to capture the true relationship between the data and the target. Much like prejudice in society, it systematically favors some groups or outcomes and ignores others. Bias can be introduced at various phases of model development, including through insufficient data, inconsistent data collection, and poor data practices. This article discusses the common types of bias and the ways they can be mitigated. Here are the topics covered:
- Reason for bias
- Types of bias
- Mitigation techniques
Bias cannot be completely eliminated, but it can be reduced to a level at which bias and variance are balanced. Let’s start by understanding why bias appears in the first place.
Reason for bias
Bias is a phenomenon that occurs when a machine learning algorithm makes strong assumptions that are not consistent with the real-world problem it is used to solve. In effect, bias skews the result of an algorithm for or against a particular outcome. High bias prevents the model from capturing the true patterns in the data and leads to underfitting.
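To make the link between high bias and underfitting concrete, here is a minimal sketch using NumPy and synthetic data (all values below are made up for illustration). The true relationship is quadratic, but a straight line is fitted: the high-bias model cannot capture the pattern, no matter how much data it sees.

```python
import numpy as np

# High bias in miniature: the true pattern is quadratic, but we fit a line.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = x ** 2 + rng.normal(scale=0.3, size=x.size)  # true relationship: y = x^2

linear_fit = np.polyval(np.polyfit(x, y, deg=1), x)     # underfits (high bias)
quadratic_fit = np.polyval(np.polyfit(x, y, deg=2), x)  # matches the pattern

def mse(pred):
    return np.mean((y - pred) ** 2)

print(f"linear MSE:    {mse(linear_fit):.2f}")    # large: bias dominates
print(f"quadratic MSE: {mse(quadratic_fit):.2f}") # small: pattern captured
```

The linear fit's error stays large because the model class itself is wrong; gathering more data does not help, only a more expressive model does.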
Biases are usually unintended, but their existence can have a huge influence on machine learning systems, and the results can be disastrous, ranging from terrible customer experiences to fatal misdiagnoses. If the machine learning pipeline contains inherent biases, the algorithm will not only learn them but may also amplify them in its predictions. When creating a new machine learning model, it is essential to identify, assess, and eliminate any biases that may be influencing the predictions.
Types of bias
Bias may be a human problem, but bias amplification is a technical one: a mathematically explainable and controllable byproduct of how models are trained. Bias can arise at different stages of the machine learning process. The most commonly known types are listed below:
- Sample bias occurs when the data collected is not representative of the environment in which the program will be deployed. No algorithm can be trained on all the data in the universe; instead, it is trained on a carefully chosen subset.
- Exclusion bias occurs when certain features or records are excluded from the data set, usually during data processing. When the data is very large, say petabytes, choosing a small sample for training is often necessary, but doing so may accidentally exclude important features or subgroups, resulting in a biased sample. Exclusion bias can also arise from removing records that appear to be duplicates.
- Experimenter (or observer) bias occurs during data collection. The experimenter may record only some instances of the data and ignore others; the skipped instances might have helped the model, but it learns only from the recorded subset, which is biased by how it was observed. The result is a biased model.
- Measurement bias is the result of incorrect data recording. For example, an insurance company weighs customers for health insurance with a faulty scale, but the readings are recorded unnoticed. As a result, the model classifies customers into the wrong categories.
- Prejudice bias is the result of human cultural differences and stereotypes. When data shaped by these stereotypes is fed to the model, it reproduces the same stereotypes that exist in real life.
- Algorithm bias refers to design choices or parameters of an algorithm that cause it to produce unfair or subjective results, unfairly favoring one person or group over another. For example, suppose an algorithm that decides whether to approve credit card applications is fed data that includes the applicant’s gender. The algorithm could learn that women earn less than men on average and reject applications from women on that basis.
In general, bias is added to the model either implicitly (unconsciously) or explicitly (consciously), but either way the result is biased. Let’s see how to mitigate bias to get an unbiased result from the model.
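Whether a bias entered implicitly or explicitly, it can often be detected in the model's outputs. A common first check is the disparate impact ratio: the rate of favorable outcomes for the unprivileged group divided by that of the privileged group. The sketch below uses hypothetical decision data; the 0.8 threshold (the "four-fifths rule") is a widely used rule of thumb, not a universal standard.

```python
import numpy as np

# Disparate impact ratio on toy decisions: 1 = favorable outcome,
# group 0 = unprivileged, group 1 = privileged (all values hypothetical).
decisions = np.array([1, 0, 0, 1, 0, 1, 1, 1, 0, 1])
group     = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

rate_unpriv = decisions[group == 0].mean()  # 2/5 = 0.4
rate_priv   = decisions[group == 1].mean()  # 4/5 = 0.8
ratio = rate_unpriv / rate_priv
print(f"disparate impact ratio: {ratio:.2f}")  # 0.50, below the 0.8 rule of thumb
```

A ratio near 1.0 suggests parity; a value this far below 0.8 would usually prompt closer inspection of the data and model.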
Bias mitigation algorithms are categorized by where they intervene in the machine learning pipeline. Pre-processing algorithms modify the training data. In-processing algorithms modify the training procedure of the machine learning model. If you can modify neither the training data nor the learning algorithm, you must use post-processing algorithms, which adjust the model’s outputs.
Pre-processing bias mitigation
Pre-processing mitigation starts with the training data, which is used in the first phase of the AI development process and often introduces the underlying bias. Analyzing the performance of a model trained on this data can reveal disparate impact (e.g. a specific gender is more or less likely to be approved for car insurance), detrimental bias (e.g. a woman who has had an accident with her vehicle is still offered only low-budget insurance), or fairness issues (e.g. ensuring that decisions about clients do not depend on their gender).
Negative results are also likely when the teams responsible for building and implementing the technology lack diversity during the training data stage. How the data is used to train the model shapes the results: if the team eliminates a feature that is in fact important to the model, the result will be biased.
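One well-known pre-processing technique at this stage is reweighing (due to Kamiran and Calders): instead of changing features or labels, each training example gets a weight so that group membership and outcome become statistically independent in the weighted data. A minimal sketch with made-up toy data:

```python
import numpy as np

# Reweighing: w(group, label) = P(group) * P(label) / P(group, label),
# so that group and label are independent under the weighted distribution.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = unprivileged group
label = np.array([0, 0, 0, 1, 0, 1, 1, 1])  # 1 = favorable outcome

weights = np.empty(len(label))
for g in (0, 1):
    for y in (0, 1):
        mask = (group == g) & (label == y)
        # weight = (expected joint probability under independence) / (observed)
        weights[mask] = (group == g).mean() * (label == y).mean() / mask.mean()

# After reweighing, the weighted favorable rate is equal across groups:
for g in (0, 1):
    m = group == g
    rate = np.average(label[m], weights=weights[m])
    print(f"group {g}: weighted favorable rate = {rate:.2f}")  # 0.50 for both
```

A downstream learner that supports sample weights can then train on these weights without any change to the features or labels themselves.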
In-processing bias mitigation
In-processing methods provide a unique opportunity to increase fairness and reduce bias while a machine learning model is being trained. For example, suppose a bank attempts to estimate a customer’s “repayment capacity” before approving a loan. The system may base its prediction on sensitive variables such as race or gender, or on proxies correlated with them. This can be overcome using adversarial debiasing and the prejudice remover.
- Adversarial debiasing trains a classifier to maximize the accuracy of its predictions while simultaneously reducing an adversary’s ability to determine the protected attribute from those predictions. This leads to a fair classifier, since the predictions carry no information that discriminates between group members. Essentially, the adversary tries to “break the system”, and the classifier learns to resist the influence of the protected attribute in response.
- The prejudice remover adds a discrimination-aware regularization term to the learning objective, penalizing the model when its predictions depend on the protected attribute.
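As a rough illustration of the prejudice-remover idea, the sketch below adds a fairness penalty to a logistic regression loss. For simplicity the penalty here is the squared gap between the groups' mean predicted scores (a demographic-parity surrogate); the original prejudice remover of Kamishima et al. uses a mutual-information-based term instead. All data below is synthetic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_loss(w, X, y, group, eta=1.0):
    """Logistic log-loss plus a simplified fairness regularizer."""
    p = sigmoid(X @ w)
    log_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    gap = p[group == 0].mean() - p[group == 1].mean()
    return log_loss + eta * gap ** 2  # eta trades accuracy for fairness

# Toy usage with synthetic data:
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20)
group = np.tile([0, 1], 10)
w = np.zeros(3)
# With w = 0, every score is 0.5, so the loss is exactly log(2) and the
# fairness penalty is zero (no group gap yet).
print(regularized_loss(w, X, y, group))
```

During training, any gradient-based optimizer can minimize this objective; larger values of `eta` push the two groups' average scores closer together at some cost in accuracy.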
Post-processing bias mitigation
Post-processing mitigation is useful when the model has already been trained but you still want to mitigate bias in its predictions. This can be achieved using:
- Equalized odds post-processing solves a linear program to find the probabilities with which to change output labels so as to optimize the equalized odds criterion.
- Calibrated equalized odds computes the probabilities with which to change output labels, under an equalized odds objective, using the score outputs of a calibrated classifier.
- Reject option classification gives favorable outcomes to the unprivileged group and unfavorable outcomes to the privileged group within a confidence band around the decision boundary, where the model’s uncertainty is greatest.
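Of the three, reject option classification is the simplest to sketch. Assuming a model that outputs scores in [0, 1] and a binary group attribute (the scores and band width below are toy values), label flips are applied only inside the uncertainty band:

```python
import numpy as np

def reject_option(scores, group, threshold=0.5, margin=0.1):
    """Inside the band around the decision boundary, favor the
    unprivileged group (0) and disfavor the privileged group (1);
    outside the band, keep the model's own prediction."""
    labels = (scores >= threshold).astype(int)
    in_band = np.abs(scores - threshold) <= margin
    labels[in_band & (group == 0)] = 1  # unprivileged -> favorable
    labels[in_band & (group == 1)] = 0  # privileged -> unfavorable
    return labels

scores = np.array([0.45, 0.55, 0.9, 0.2, 0.52])
group  = np.array([0,    1,    1,   0,   0])
print(reject_option(scores, group))  # [1 0 1 0 1]
```

The confident predictions (0.9 and 0.2) are untouched; only the uncertain ones near 0.5 are reassigned, which is what limits the accuracy cost of this method.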
However, the more such interventions are applied, the more accuracy may suffer. For example, suppose an algorithm sorts candidate resumes: enforcing gender parity rather than ranking on relevant skills alone (sometimes called positive bias or affirmative action) may result in fewer qualified men being hired. This reduces the accuracy of the model but achieves the desired fairness goal.
Ultimately, there is no way to completely eliminate algorithmic bias, but it can be mitigated using the techniques mentioned in this article, which helps build more balanced machine learning systems.