14 November 2016

Big Data: 4 types of filter bubbles

By Pierre-Nicolas Schwab, PhD in marketing, director of IntoTheMinds

The question of the existence of filter bubbles (also called cognitive bubbles) is central to the fields of Big Data and algorithm design. Among the many books on the subject, I was particularly influenced by Dominique Cardon’s book on algorithms and society. In this book, entitled “What are algorithms dreaming of?”, Cardon proposes a framework based on four types of web measurement, each resting on a specific type of data (see summary table below).


Position of the computation | Examples | Data | Population | Type of computation | Principle
On the side | Médiamétrie, Google Analytics, advertising | Views | Representative sample | Vote | Popularity
Above | Google PageRank, Digg, Wikipedia | Links | Selective vote, communities | Meritocratic rankings | Authority
In | Number of Facebook friends, retweets on Twitter, ratings | Likes | Social network, affinities, declarative data | Benchmark | Reputation
Below | Amazon recommendations, targeted advertising | Traces | Implicit feedback and behaviors | Machine learning | Prediction


Dominique Cardon’s framework of data typologies and usages

To extend the discussion I started in earlier articles on filter bubbles, I asked myself: “Can the framework proposed by Dominique Cardon be reused to distinguish different types of filter bubbles?” This is the question I try to answer in this article.


1st type of filter bubble: the one created by audience measurement

According to D. Cardon, “views” are the type of data used to compute audiences. These measurements originated with the media, which had to agree on some sort of third-party authority to ensure that market share calculations were neutral and fair.
In this model (which nowadays also applies to online measurement), the computation takes place “next” to the data, to reuse D. Cardon’s words. A measure of popularity is produced from the number of users’ views. The calculation is independent of the content, and the content itself depends only on its producer. In the case of the media, the producer is the journalist (and the editorial team). These people are the gatekeepers, and they follow certain rules.
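The popularity principle can be illustrated with a minimal Python sketch, assuming audience measurement boils down to counting views per outlet. The outlet names, the view counts and the functions `market_shares` and `rank_by_popularity` are hypothetical; real audience measurement (panels, sampling, weighting) is far more elaborate. Note that the computation never looks at what the outlets publish.

```python
from typing import Dict, List


def market_shares(views: Dict[str, int]) -> Dict[str, float]:
    """Share of total views for each outlet: a simple popularity 'vote'.

    Only view counts are used; the content of the outlets plays no role.
    """
    total = sum(views.values())
    if total == 0:
        raise ValueError("no views recorded")
    return {outlet: count / total for outlet, count in views.items()}


def rank_by_popularity(views: Dict[str, int]) -> List[str]:
    """Outlets ordered from most to least viewed."""
    return sorted(views, key=views.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical panel data (illustrative figures, not real audiences).
    panel = {"outlet_a": 500, "outlet_b": 300, "outlet_c": 200}
    print(market_shares(panel))       # {'outlet_a': 0.5, 'outlet_b': 0.3, 'outlet_c': 0.2}
    print(rank_by_popularity(panel))  # ['outlet_a', 'outlet_b', 'outlet_c']
```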
As far as end users are concerned, they are free to change their behavior if a content producer doesn’t meet their expectations. They can freely turn to other content that challenges their own views and thus avoid polarization. Academic research has explored IT tools designed to expose online users to opposing opinions. In one paper (An et al., 2012), for instance, US media outlets were classified according to their political leaning, and a plugin was designed to suggest challenging views to readers.
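A plugin of that kind could, in principle, work along the lines of the Python sketch below. It is not the method of An et al.; it simply assumes each outlet has already been given a leaning score between -1 (left) and +1 (right). The outlet names, the scores and the `suggest_challenging` function are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical leaning scores: -1.0 = strongly left, +1.0 = strongly right.
# These values are invented for illustration only.
LEANING: Dict[str, float] = {
    "outlet_left": -0.8,
    "outlet_center_left": -0.3,
    "outlet_center_right": 0.2,
    "outlet_right": 0.9,
}


def suggest_challenging(history: List[str], leaning: Dict[str, float],
                        k: int = 2) -> List[str]:
    """Suggest up to k outlets leaning the opposite way from the reader.

    The reader's position is the mean leaning of the outlets they read.
    Candidates on the other side of 0 are ordered from most moderate to
    most extreme, so the first suggestion is the least jarring contrary view.
    """
    position = mean(leaning[outlet] for outlet in history)
    if position == 0:
        return []  # no dominant side, nothing to counterbalance
    opposite = [o for o, score in leaning.items() if score * position < 0]
    opposite.sort(key=lambda o: abs(leaning[o]))
    return opposite[:k]


if __name__ == "__main__":
    print(suggest_challenging(["outlet_left", "outlet_center_left"], LEANING))
    # ['outlet_center_right', 'outlet_right']
```

Ordering candidates from the most moderate to the most extreme is a design choice of this sketch: it proposes the least jarring contrary view first.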
One can also question the bias of public media, which in theory should mirror society and its diversity. Although journalistic ethics (the supreme gatekeeper) should act as a barrier against journalists’ biases, nothing can prevent political “colouring”. Remember, for instance, that Italy’s RAI once allocated each of its channels to a political “family” (RAI 3, for example, was close to the Communists). Finally, the editor-in-chief also makes conscious choices that define an editorial line and shape the perception that readers, listeners or viewers eventually have.

2nd type of filter bubble: the one created by search engines

In this model, hyperlinks are used as a proxy for authority. A hyperlink to an external resource implicitly vouches for its quality, which makes ranking possible. The main problem is that users overwhelmingly consume the search results that appear on the first page.
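How links turn into authority can be shown with the textbook version of PageRank, computed by power iteration. This is a minimal sketch assuming a tiny hypothetical link graph and the commonly used damping factor of 0.85; Google’s production ranking combines PageRank with many other signals and is not public.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Classic PageRank by power iteration.

    links maps each page to the pages it links to. Each link acts as a vote
    of authority: a page's score is split equally among its outgoing links.
    Pages without outgoing links spread their score evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:  # dangling page: redistribute its score uniformly
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


if __name__ == "__main__":
    # Hypothetical web: "c" is linked to by both "a" and "b".
    web = {"a": ["c"], "b": ["c"], "c": ["a"]}
    scores = pagerank(web)
    print(sorted(scores, key=scores.get, reverse=True))  # ['c', 'a', 'b']
```

Page "c" ends up on top simply because two pages point to it; "b", which nobody links to, keeps only the baseline share.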
While it is understandable that many pages are not relevant (because they contain errors and deserve to be demoted), one may wonder how much the informational quality of the first 10 results really differs from that of the next 10. Did you know that only 8.5% of users visit the second page of results when they search for something on Google? Even fewer (1.1%) go on to the third page.
What you may not know, though, is that Google’s ranking algorithm, of which PageRank is only one ingredient, also relies on machine learning. When you click on a link and then go back to the results page, Google may demote that link, because in its logic the result is considered less relevant than one you didn’t come back from. One may therefore wonder whether a short article, easy to read but full of errors and biases, could be ranked above a long article that is very rich in information and detail but difficult to read. If that happens, it would be the start of a self-reinforcing loop promoting “easy” content over challenging and rich content. This may be how our societies surreptitiously slip into mediocrity, as I wrote in an article (in French) that earned me some strong comments.
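Google does not publish whether or how it uses such “back to results” signals, so the Python sketch below is only a toy model of the loop described above. It assumes a hypothetical update rule (satisfied clicks raise a result’s score, bounces lower it), a strong position bias (90% of clicks on the first result) and invented bounce rates; `simulate_feedback` and its parameters are illustrative names.

```python
from typing import Dict, List, Sequence


def simulate_feedback(initial_score: Dict[str, float],
                      bounce_rate: Dict[str, float],
                      rounds: int = 10,
                      position_ctr: Sequence[float] = (0.9, 0.1),
                      learning_rate: float = 0.5) -> List[List[str]]:
    """Toy click-feedback ranking loop (hypothetical, not Google's algorithm).

    Each round, results are sorted by score. The result at position i gets a
    share position_ctr[i] of the clicks. Clicks where the user stays count as
    positive feedback; clicks where the user bounces back to the results page
    count as negative feedback. Returns the ranking seen in each round.
    """
    score = dict(initial_score)
    rankings = []
    for _ in range(rounds):
        ranking = sorted(score, key=score.get, reverse=True)
        rankings.append(ranking)
        for position, doc in enumerate(ranking):
            clicks = position_ctr[position] if position < len(position_ctr) else 0.0
            stayed = clicks * (1.0 - bounce_rate[doc])
            bounced = clicks * bounce_rate[doc]
            score[doc] += learning_rate * (stayed - bounced)
    return rankings


if __name__ == "__main__":
    # "rich" starts with more authority but is harder to read, so more users bounce.
    history = simulate_feedback(
        initial_score={"rich": 1.0, "easy": 0.9},
        bounce_rate={"rich": 0.6, "easy": 0.2},
    )
    for round_no, ranking in enumerate(history, start=1):
        print(round_no, ranking)
```

With these assumptions, the rich article is first in round 1 only: its bounces push it below the easy article, which then collects most of the clicks, and therefore most of the positive feedback, in every later round.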

3rd type of filter bubble: the one created by social gatekeepers (= influencers)

Feedback mechanisms are also built into social sites (retweets on Twitter, likes on Facebook, pinned images on Pinterest, check-ins on Foursquare, …), and we naturally have to wonder how much influence some people may have on content propagation within a network. You won’t be surprised to hear that some people have become celebrities on social networks and are able to generate considerable buzz.
A study on the polarization of views on Twitter shows the influence of people located at the central nodes of the social network. In particular, this study shows that idea propagation is not a neutral process: those at the nodes filter out content that doesn’t fit their views, thereby exposing their followers to polarized ideas.
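The filtering effect can be reproduced in a toy propagation model. This Python sketch is not the study’s method: it assumes a hypothetical follower graph, items labelled with a single stance, and users who relay only the items matching their own stance. The `propagate` function and all names in it are illustrative.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def propagate(followers: Dict[str, List[str]],
              user_stance: Dict[str, Optional[str]],
              item_stances: List[str],
              source: str) -> Dict[str, Set[str]]:
    """Return, for each reached user, the set of item stances they saw.

    followers[u] lists the users who follow u. The source publishes every item.
    Any other user relays an item to their followers only if the item's stance
    matches their own; a user with stance None relays everything.
    """
    exposed: Dict[str, Set[str]] = {}
    for item in item_stances:
        queue = deque([source])
        reached = {source}
        while queue:  # breadth-first spread of one item
            user = queue.popleft()
            exposed.setdefault(user, set()).add(item)
            stance = user_stance.get(user)
            if user != source and stance is not None and stance != item:
                continue  # gatekeeper: the item stops here
            for follower in followers.get(user, []):
                if follower not in reached:
                    reached.add(follower)
                    queue.append(follower)
    return exposed


if __name__ == "__main__":
    # Hypothetical network: a media account followed by one hub,
    # which is itself followed by two ordinary users.
    followers = {"media": ["hub"], "hub": ["user1", "user2"]}
    stances = {"hub": "pro", "user1": None, "user2": None}
    result = propagate(followers, stances, ["pro", "con"], "media")
    for user, seen in sorted(result.items()):
        print(user, sorted(seen))
    # hub ['con', 'pro']
    # media ['con', 'pro']
    # user1 ['pro']
    # user2 ['pro']
```

The hub sees both sides, but its followers only ever see what the hub chose to relay.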
It is also interesting to look at how much weight these influencers carry in the diffusion of ideas. If you consider retweets as a modern form of word-of-mouth, research shows that only a minority of Twitter users manage to “raise their voice” above the noise and be heard. Twitter is therefore far from being a democratic medium where every vote (“tweet”) counts. It is a meritocracy where fame and excess can grant you a special status: being heard.
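To quantify how few voices are heard, one can measure how concentrated retweets are. This Python sketch assumes a hypothetical list of retweet counts per account and computes two common indicators: the share captured by the top x% of accounts and the Gini coefficient. The figures are invented, not taken from the research mentioned above.

```python
from typing import List


def top_share(counts: List[int], fraction: float) -> float:
    """Share of all retweets captured by the most-retweeted `fraction` of users."""
    if not counts or not 0 < fraction <= 1:
        raise ValueError("need at least one count and 0 < fraction <= 1")
    ordered = sorted(counts, reverse=True)
    k = max(1, round(len(ordered) * fraction))
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


def gini(counts: List[int]) -> float:
    """Gini coefficient of the retweet distribution.

    0 means retweets are spread evenly; the maximum, (n - 1) / n, is reached
    when a single user gets all of them.
    """
    ordered = sorted(counts)
    n, total = len(ordered), sum(ordered)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Hypothetical retweet counts for 10 accounts (invented figures).
    retweets = [900, 40, 20, 10, 10, 5, 5, 5, 3, 2]
    print(round(top_share(retweets, 0.1), 2))  # 0.9
    print(round(gini(retweets), 2))            # 0.84
```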

4th type of filter bubble: the one created by our own behavior

As Dominique Cardon puts it, algorithms “predict” the future by extending the slope of past behaviors. This principle is probably what inspired Eli Pariser when he coined the term “filter bubble”.
One example among many is Netflix, which recommends the next movie to watch based on those watched in the past (for more information, read this article on the RecSys 2016 conference). “Programming” someone’s future based on his/her past observed behaviors (which are sometimes far from virtuous and enriching) puts us at risk. Less “qualitative” behaviors (like watching a Jean-Claude Van Damme movie rather than an opera, for instance) will start a self-reinforcing loop. We’ll be served more “junk” content and, because we are only human beings with weaknesses, the algorithm will exploit them and we’ll slowly but surely slip into mediocrity. There is no gatekeeper. The algorithm maintains and reinforces our weaknesses. There is no external perspective to pull our head out of the water and challenge us: unless the programmer behind the algorithmic recipe deliberately built one in, it won’t be there. That’s the very definition of “algorithmic governance”: the algorithm applies pressure on us, puts us in a framework and eventually decides on our future behaviors.
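The “extend the slope of the past” principle can be sketched with a deliberately naive recommender. This is not Netflix’s algorithm: it assumes a hypothetical viewing history reduced to genres, a recommender that always proposes the most-watched genre, and a user who always accepts. The functions `recommend` and `run_loop` are illustrative names.

```python
from collections import Counter
from typing import List


def recommend(history: List[str]) -> str:
    """Recommend the genre watched most often so far (ties: first seen wins)."""
    return Counter(history).most_common(1)[0][0]


def run_loop(history: List[str], steps: int) -> List[str]:
    """Assume the user always watches what is recommended, for `steps` rounds."""
    history = list(history)  # do not modify the caller's list
    for _ in range(steps):
        history.append(recommend(history))
    return history


if __name__ == "__main__":
    start = ["action", "opera", "action"]
    final = run_loop(start, 5)
    print(final.count("action") / len(final))  # 0.875
```

Starting from two action films out of three, five rounds push action to 87.5% of the history, and the opera is never proposed again. In this sketch, an external perspective would only appear if the programmer added it explicitly, for instance by occasionally recommending something outside the dominant genre.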

This raises the question of Big Data and ethics (see, for instance, how Meetup.com tackles it).

Conclusion on the different types of filter bubbles: the role of the gatekeeper

The framework proposed by Dominique Cardon shows that each type of data (views, links, likes, traces) leads to a specific type of filter bubble. What differentiates them is not so much the type of data used as the type of gatekeeper: is there a gatekeeper? How much bargaining power does it have over the user? Are its intentions good or bad? Does it follow ethical rules or not?

These are the questions that must be asked to assess how dangerous an algorithmic model is. Let us review each of the four previous models in the light of gatekeeping.

In the first model (“on the side”), “views” are used by an independent authority, acknowledged by all players, to rank those players (media outlets, for instance). Market share is the indicator that drives action: the more market share you have, the greater your capacity to reach larger crowds and shape their opinion.

Besides the gatekeeper controlling the audience measurement, there are many internal gatekeepers (content producers, for instance) who may follow their own rules. Journalists, for instance, follow a code of conduct to avoid bias.

Finally, the end user is free to switch. Monopolies are no longer the rule, and consumers who want to can stop consuming content from any given player.

To sum up, in the first model gatekeeping is implemented at different levels, and the gatekeeping rules are known (how the audience is measured, which code of conduct journalists must follow).

In the second model (“above”), hyperlinks are used as a proxy for authority. Google’s monopolistic position has led to a drastic reduction in the number of gatekeepers. Google is the authority: it defines the rules and doesn’t make them public. The power the user had in the first model vanishes here. Once you use Google as a search engine, you accept its model (a black box) and give up part of your freedom to choose other metrics of authority. Only experienced users can regain some gatekeeping power by using a different browser and search engine (such as Tor and Qwant, for instance), which will expose them to different search results.

To sum up, in the second model power is concentrated in the hands of a single gatekeeper, which can therefore impose its views and change the rules of the game (i.e. the algorithmic recipe) without answering to anyone. Webmasters’ power is too fragmented to be of any help.

In the third model (“in”), the gatekeepers are the nodes of the social network. They are more numerous than in the second model. Still, enormous power is concentrated in a few hands. In the social media ecosystem, however, these gatekeepers are at least identified, and their power can be used to spread ideas and start virality (see, for instance, the famous YouTubers who sell their virtual celebrity to brands trying to reach a new audience).
In the fourth model (“below”), users become their own gatekeepers. Their behavior, with all its flaws and potentially bad habits, becomes the foundation of the future. The algorithm is THE authority: it has the power to decide, to propose and to strengthen behaviors that may even be harmful to the user. In another article I questioned the risk of addiction created by Netflix’s recommendation algorithms. Other potential algorithmic dangers include instructions pushed to Nespresso telemarketers to sell more coffee to someone who may already consume too much, or bank employees instructed by a machine to sell you products you may not really need.

Picture: Shutterstock
