I attended a workshop at the Maastricht European Centre on Privacy and Cybersecurity, where I met Prof. Alessandro Mantelero of Politecnico di Torino.
Prof. Mantelero is well known as a specialist in the protection of personal data, a topic that has become very important with the GDPR.
In his talk he discussed, among other things, the biases of algorithms and highlighted the need to avoid them. Using examples from the insurance sector (smart boxes) and from the field of criminology (predictive crime algorithms), Alessandro showed that algorithms can discriminate against certain groups of people.
How smart boxes can discriminate against groups of people
For instance, smart boxes installed in vehicles can identify certain journeys (based on GPS coordinates), neighborhoods and times of the day when accidents are more likely to happen. A higher likelihood of accidents could thus be detected for black people driving at night, most probably people coming back from a party where alcohol has been consumed. The smart box obviously doesn’t know about this alcohol consumption. Hence the correlation will only be drawn on the basis of the variables it collects: GPS coordinates and time of day, for example. The algorithm will “mix” these variables with other information available on the insured drivers (who happen to live in neighborhoods where other people of the same ethnicity are working) and conclude that black people driving at night in this area have a higher risk of causing an accident and should pay a higher premium. This biased correlation puts other black people who work night shifts at risk of discrimination although they have not consumed any alcohol.
The problem is causality
The problem in the example above is causality. The smart box doesn’t know that you have consumed alcohol. Alcohol is the real cause of the accident; ethnicity has nothing to do with it, but the algorithm can’t know that. The algorithm therefore uses other variables to predict accidents, and those variables, taken together, happen to discriminate against other people. This is the problem with Big Data. Whereas causality was at the heart of sociologists’ and statisticians’ work 20 years ago (and even before that), it is not something we try to understand today with Big Data technologies.
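To make the causality problem concrete, here is a minimal sketch in Python. It is not the algorithm of any real insurer: the population, the counts and the names (`Driver`, `make_drivers`, `accident_rate`) are all invented for illustration. Alcohol is the only thing that changes the accident risk, but the smart box only sees whether the trip took place at night.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Driver:
    night: bool     # visible to the smart box (time of day)
    drank: bool     # hidden from the smart box
    accident: bool  # observed outcome


def make_drivers():
    """Build a small synthetic population. All counts are made up."""
    # Accident risk depends only on alcohol: 20% if drank, 2% if sober.
    groups = [
        # (night, drank, number of drivers, number with an accident)
        (True, True, 100, 20),   # partygoers driving home
        (True, False, 100, 2),   # sober night-shift workers
        (False, False, 200, 4),  # sober daytime drivers
    ]
    drivers = []
    for night, drank, total, accidents in groups:
        for i in range(total):
            drivers.append(Driver(night, drank, i < accidents))
    return drivers


def accident_rate(drivers):
    """Share of drivers in the list who had an accident."""
    return sum(d.accident for d in drivers) / len(drivers)


drivers = make_drivers()
night = [d for d in drivers if d.night]
day = [d for d in drivers if not d.night]
sober_night = [d for d in night if not d.drank]

# What the smart box can see: night driving looks risky.
rate_night = accident_rate(night)
rate_day = accident_rate(day)

# What it cannot see: sober night drivers are no riskier than day drivers.
rate_sober_night = accident_rate(sober_night)

print(f"night: {rate_night:.2%}, day: {rate_day:.2%}, "
      f"sober at night: {rate_sober_night:.2%}")
```

In this toy population, a premium based on "drives at night" charges the sober night-shift workers for a risk that belongs entirely to the hidden variable.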
Is discrimination inherent to any kind of algorithm?
The question I asked myself is whether algorithms aren’t doomed to be discriminatory by nature.
Algorithms are in essence programmed to identify groups of people (segments, clusters) in order to have a more personal relationship with them. I think there is nothing wrong with that. Who wouldn’t like to be treated personally rather than like everyone else?
It starts to backfire when algorithms combine behavioral variables in a way that isolates well-identifiable people; in other words, when the combination of such variables isolates a cluster of people based on their traits rather than on their acts.
How to avoid algorithmic discrimination
Alessandro Mantelero suggested that algorithms be tested and their results scrutinized for discriminatory effects. This is wise advice. Drug manufacturers do it. Why wouldn’t producers of algorithms assess the effects of their products?
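What such a test could look like, in its simplest form, is sketched below in Python. This is my own illustration, not a method presented in the talk: the function names, the sample decisions and the `MAX_RATIO` threshold are all hypothetical. The idea is to compare how often the algorithm applies a higher premium to each group and to flag large gaps for human review.

```python
from collections import defaultdict

# Hypothetical tolerance: how much more often one group may be
# surcharged than another before the result is flagged for review.
MAX_RATIO = 1.25


def surcharge_rate_by_group(decisions):
    """decisions: iterable of (group_label, surcharged) pairs."""
    totals = defaultdict(int)
    surcharged = defaultdict(int)
    for group, flag in decisions:
        totals[group] += 1
        if flag:
            surcharged[group] += 1
    return {g: surcharged[g] / totals[g] for g in totals}


def disparity_ratio(rates):
    """Highest group rate divided by the lowest; 1.0 means equal rates."""
    lowest = min(rates.values())
    highest = max(rates.values())
    if lowest == 0:
        return float("inf") if highest > 0 else 1.0
    return highest / lowest


# Invented audit sample: 100 decisions per group.
decisions = (
    [("group_a", True)] * 30 + [("group_a", False)] * 70
    + [("group_b", True)] * 10 + [("group_b", False)] * 90
)

rates = surcharge_rate_by_group(decisions)
ratio = disparity_ratio(rates)
needs_review = ratio > MAX_RATIO

print(rates, f"ratio={ratio:.2f}", "review needed" if needs_review else "ok")
```

A high ratio does not prove discrimination by itself, but it tells the producer of the algorithm where to look, much like a side effect observed in a drug trial.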