On 15 June 2017 my friend Philippe Van Impe gave me the opportunity to organize the first meetup in his newly opened Big Data temple in Brussels: DigitYser. And what a success! The topic of this first meetup at DigitYser was “Big Data and ethics”, an important yet insufficiently covered topic in data science. The idea for this meetup popped up during my talk at the Data Innovation Summit, where people started tweeting about the importance of ethics, and within minutes we had the speakers for this upcoming meetup.
Philippe was kind enough to take care of the organization in early April and we ended up with some 175 registrations, a huge number given the “non-technical” topic. In the end some 75 people showed up, which was more than enough because we weren’t prepared to welcome more (we’ll have to buy some chairs for the upcoming meetups).
After a not-so-short introduction by Philippe, and a few words from me to launch the evening, we had the pleasure of listening to Prof. Michael Ekstrand (Boise State University, Idaho), whom I first met at the RecSys conference in 2016 at MIT in Boston, MA. Michael, who had arrived from the US just a few hours earlier, gave a very inspiring talk on the ethics of algorithms and the values behind them. Here are a few points I think were especially important in Michael’s talk:
- a system might be designed to serve the interests of the largest segment(s) of users, ignoring adverse effects on minorities. The latter, Michael argued, might get worse results. Particularly interesting was the reference to a study by Microsoft Research (Mehrotra et al. 2016) which looked at consumption patterns of web search vs. news across different groups of users (Regular Users vs. Frequent News Users). The results clearly show that different clusters co-exist, some of which might be negatively impacted by changes made to please the largest clusters.
- the ethics of a system depends on what it is designed for. If a system tries to maximize clicks, low-quality pages (clickbait) might be promoted, which poses an ethical problem (does a system serve the greater good if it promotes shitty content?)
- beware of badly trained recommendation systems: Michael gave the example of a restaurant-recommending feature on Twitter (that luckily didn’t go into production) where Mexican restaurants were ranked low. The recommender system had been trained on a public corpus and learned an association between “Mexican” and “illegal” that led it to associate anything Mexican with something bad. That’s not exactly the type of ethical data treatment you want to promote.
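
To make the first point a bit more tangible, here is a minimal sketch in Python of how an overall metric can improve while a small group of users ends up worse off. All numbers are made up for illustration and do not come from the Mehrotra et al. study; the segment names simply echo the two groups mentioned above, and the function name is my own.

```python
from collections import defaultdict


def satisfaction_by_segment(events):
    """Return (overall_rate, {segment: rate}) from (segment, satisfied) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for segment, satisfied in events:
        totals[segment] += 1
        hits[segment] += int(satisfied)
    overall = sum(hits.values()) / sum(totals.values())
    per_segment = {s: hits[s] / totals[s] for s in totals}
    return overall, per_segment


# Invented data: 90 sessions from the large segment, 10 from the small one.
before = (
    [("regular", True)] * 63 + [("regular", False)] * 27
    + [("frequent_news", True)] * 7 + [("frequent_news", False)] * 3
)
# Same traffic after a hypothetical change tuned for the large segment.
after = (
    [("regular", True)] * 72 + [("regular", False)] * 18
    + [("frequent_news", True)] * 4 + [("frequent_news", False)] * 6
)

overall_before, seg_before = satisfaction_by_segment(before)
overall_after, seg_after = satisfaction_by_segment(after)

print("overall:", overall_before, "->", overall_after)
for name in seg_before:
    print(name + ":", seg_before[name], "->", seg_after[name])
```

With these invented figures the overall satisfaction rate goes up, while the rate for the small segment drops sharply. Looking only at the aggregate would hide exactly the kind of adverse effect Michael warned about, which is why reporting metrics per segment is worth the extra effort.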