Alongside it, there is a growing concern about the ethical problems caused by these new technologies, as we can see below:
Those problems are all very real and should be solved, but today I want to take a step back and discuss some concepts to explain why they occur. One of the biggest misconceptions in these situations is thinking that those algorithms are biased on purpose; in other words, some people think the developers purposefully built a biased model to oppress others and reflect their own agendas. This can happen, but in most situations it is not the case: these biases can appear without being explicitly programmed, as unintended consequences.
To explain how this happens, I will use the 2019 Child’s Play movie as a metaphor (watch out: spoilers ahead).
In this movie, there is a revolutionary new toy called Buddi, a high-tech doll powered by AI that learns from its surroundings and acts accordingly. That way, each toy behaves differently and adapts itself to the desires of its kid. Besides that, each doll has some built-in security features to prevent certain behaviors, such as cursing or committing violence.
The movie starts with the main character (Andy) receiving one of these dolls, which calls himself Chucky. But Chucky is not a normal doll: because of some sabotage during his production, he is not 100% functional and is missing the built-in security features. In one of the earliest scenes, Andy’s cat scratches him, making Andy angry with the cat. That same night, Chucky watches a horror movie with Andy and his friends and sees that the kids are laughing and seem happy with all the violence shown on screen. So, by combining the two things (the cat is bad because it hurt Andy, and violence is fun), that very night Chucky kills Andy’s cat. The film continues in this same vein, but I will not spoil the rest of the movie here.
The interesting part for us is that the bad action Chucky took (killing the cat) was caused by a good objective (making the kid happy) combined with a misinterpretation on the part of the “model” (killing is fun). Nothing in Chucky’s original programming explicitly told him to kill a cat; there were only rules like “make your child happy” and “learn what your child likes”, and that is where the unintended consequences appear. Check here for another famous thought experiment about how a simple AI that generates handwritten notes can lead to a catastrophic situation.
The Chucky example is obviously far-fetched for our current reality, but the same idea applies to our current ML models. A machine learning model receives historical data to learn from, and it does that learning by optimizing some metric, for example, reducing the number of prediction errors it makes. The problem is that if we naively collect the data, it will reflect our biased world, and the model will then reflect that bias in its predictions.
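To make this learning loop concrete, here is a minimal sketch in plain Python (all numbers are made up for illustration): a one-parameter model is fit to historical data by repeatedly nudging the parameter in whatever direction reduces its prediction error. The model never receives instructions, only data and a metric.

```python
# Minimal sketch: a one-parameter model "learns" by minimizing a metric
# (mean squared error) on historical data via gradient descent.
# All numbers here are invented for illustration.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]  # (input, observed output)

w = 0.0    # the model's single parameter; it starts knowing nothing
lr = 0.01  # learning rate: how big each correction step is

for step in range(2000):
    # Gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # move w in the direction that reduces the error

print(round(w, 2))  # -> 1.99: the model converged to the trend in the data
```

Whatever pattern sits in the data, biased or not, is exactly what the optimization will converge to; the metric rewards reproducing the data, not questioning it.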
To illustrate this principle, let’s suppose we build an AI tool that analyzes the performance of our employees and decides who should be promoted to a C-level position. This model will be trained on all the past data of employees from several companies: their skills and who, in the end, got promoted to a C-level position. The catch, however, is that we know that for historical reasons there are fewer women in C-level jobs than men; therefore, if we just train our model on this data, it will probably use gender as an important variable in its decisions, and there is our bias, appearing without us making any effort to implement it. And this story is actually not just a thought experiment: look here and here for some similar situations.
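The mechanics of this can be shown with a tiny, invented dataset. In the sketch below (names, scores, and outcomes are all hypothetical), the "training" is just estimating promotion rates from the biased history, which is essentially what a model optimizing for accuracy would do:

```python
# Toy sketch (made-up data): a naive model trained on biased promotion
# history ends up relying on gender, even though no one told it to.

# Historical records: (gender, skill_score, was_promoted)
history = [
    ("M", 7, 1), ("M", 5, 1), ("M", 6, 1), ("M", 4, 0),
    ("F", 8, 0), ("F", 7, 0), ("F", 9, 1), ("F", 6, 0),
]

def promotion_rate(gender):
    # "Training": estimate promotion probability per gender from the data
    outcomes = [p for g, s, p in history if g == gender]
    return sum(outcomes) / len(outcomes)

print(promotion_rate("M"))  # -> 0.75: men were promoted far more often...
print(promotion_rate("F"))  # -> 0.25: ...despite equal or higher skill scores
```

A model that minimizes its prediction error on this history will learn that "M" predicts promotion, and will carry that pattern into every future decision.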
Saying that models generate these biases unintentionally doesn’t mean, however, that there are no solutions or that we shouldn’t do anything about it. On the contrary, acknowledging the true complexity of the problem is the first step toward addressing it. On the technical side, the first solution that comes to mind is to simply remove some variables from the dataset; for example, in our previous example of the HR model, we could just remove the gender variable from the data, and then the model cannot be gender-biased, right? Unfortunately, that’s not the case, because ML models are especially good at using other variables to extract latent information (in other words, the model would be able to infer the gender of each person by looking at the rest of the variables), so the bias would still be there. Another alternative is to alter the metric to show the model how to learn in a “fair” way, but when you actually go into the math of doing that, the problem is much more complex than it appears at first glance, as explained here.
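This "proxy variable" leak can also be sketched in a few lines. In the invented example below, gender has been removed from the records, but a correlated feature (here, a hypothetical parental-leave flag) lets a trivial majority-vote rule recover it for most people:

```python
# Sketch (invented data): even after deleting the gender column, a
# correlated "proxy" feature lets us reconstruct it.

# Each record: (took_parental_leave, gender) -- gender is the column we
# "removed"; we keep it here only to check how well the proxy recovers it.
records = [
    (1, "F"), (1, "F"), (1, "F"), (0, "F"),
    (0, "M"), (0, "M"), (0, "M"), (1, "M"),
]

def infer_gender(took_leave):
    # Guess the majority gender among records sharing this proxy value
    matching = [g for leave, g in records if leave == took_leave]
    return max(set(matching), key=matching.count)

# Without ever seeing the gender column at prediction time, the proxy
# recovers it for 6 out of 8 people:
correct = sum(infer_gender(leave) == g for leave, g in records)
print(correct / len(records))  # -> 0.75: the "removed" variable leaks back in
```

A real model with dozens of features (job title, college, zip code, hobbies) has far more proxies to combine, so the reconstruction is usually much better than this toy case suggests.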
Another promising approach to reduce the unfairness and bias of a model is to hire a more diverse team to build it and to evaluate its performance, because with more perspectives in the development team it is easier to detect these problems and find ways to solve them. You can read here if you want more details on that.
As you can see, ethics in AI is a complex topic and I have just scratched the surface here, so you can start to think more about these things. There are a lot of materials online to learn more, but if you are interested, I really recommend Fast Ai’s Practical Data Ethics course as the first stop in this learning process.