Filter Bubbles, Privacy, and the Myth of the Privacy Setting


When discussing information literacy, we often ignore the role of pervasive online tracking. In this post, we will lay out the connections between accessing accurate information, tracking, and privacy. We will use Twitter as an explicit example. However, while Twitter provides a convenient example, the general principles we lay out here apply across the web.

Major online platforms "personalize" the content we see on them. Everything from Amazon's shopping recommendations to Facebook's News Feed to our timelines on Twitter is controlled by algorithms. This "personalization" uses information these companies have collected about us to present an experience designed to make us behave in ways that align with the company's interests. And we need to be clear on this: personalization is often sold as "showing people more relevant information," but that definition is incomplete. Personalization isn't done for the people using a product; it's done to further the needs of the company offering the product. To the extent that personalization shows people "more relevant information," this information furthers the goals of the company first and the needs of users second.

Personalization requires that companies collect, store, and analyze information about us. It also requires that we be compared against other people. This process begins with data collection about us -- what we read, what we click on, what we hover over, what we share, what we "like", sites we visit, our location, who we connect with, who we converse with, what we buy, what we search for, the devices we use, etc. This information is collected in many ways, but some of the more visible methods are cookies set by ad networks and social share icons. And of course, every social network (Facebook, Instagram, Twitter, Pinterest, Musical.ly, etc.) collects this information from you directly when you spend time on their sites.

The web, flipping us the bird

When you see social sharing icons, know that the site is flipping you the bird: your browsing information is being widely shared with these companies and other ad brokers.
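The cross-site flow described above can be sketched as a toy simulation. This is a hedged illustration, not any real ad network's code: the `Tracker` and `Browser` classes and all the site names are hypothetical, but the mechanism -- one third-party cookie plus the referring page -- is how a single widget provider can link your visits across unrelated sites.

```python
# Toy simulation of cross-site tracking via embedded share widgets.
# All names (Tracker, Browser, the example sites) are hypothetical.

class Tracker:
    """A third-party server whose share button is embedded on many sites."""

    def __init__(self):
        self.next_id = 0
        self.profiles = {}  # cookie_id -> list of pages visited

    def serve_widget(self, cookie_id, referring_page):
        # If the browser has no cookie for this tracker yet, assign one.
        if cookie_id is None:
            cookie_id = self.next_id
            self.next_id += 1
        # The referring page tells the tracker which site embedded the
        # widget, so one cookie links activity across unrelated sites.
        self.profiles.setdefault(cookie_id, []).append(referring_page)
        return cookie_id


class Browser:
    """A browser that loads pages and any third-party widgets they embed."""

    def __init__(self):
        self.cookies = {}  # tracker -> cookie_id

    def visit(self, page, embedded_trackers):
        for tracker in embedded_trackers:
            self.cookies[tracker] = tracker.serve_widget(
                self.cookies.get(tracker), page)


tracker = Tracker()
browser = Browser()
browser.visit("news-site.example/article", [tracker])
browser.visit("health-site.example/symptoms", [tracker])

# One tracker cookie now ties together visits to two unrelated sites.
print(tracker.profiles)
```

Notice that neither site shared anything with the other: the linkage happens entirely on the tracker's side, which is why blocking these third-party requests (as the tools listed later in this post do) is effective.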

This core information collected by sites can be combined with information from other sources, and many companies explicitly claim this right in their terms of service. For example, Voxer's terms use this language:

Information We May Receive From Third Parties. We may collect information about you from other Product users, such as when a friend provides friend details or contact information, or indicates a relationship with you. If you authorize the activity, Facebook may share with us certain approved data, which may include your profile information, your image and your list of friends, their profile information and their images.

By combining information from other sources, companies can build profiles that include our educational background, employment history, where we live, voting records, and criminal justice information from parking tickets to arrests to felonies, in addition to our browsing histories. With these datasets, companies can sort us into multiple demographic segments, which they can then use to compare us against people in other segments.

In very general terms, this is how targeted advertising, content recommendation, shopping recommendation, and other forms of personalization all work: collect a data set, then mine it for patterns and for the probability that those patterns are significant and meaningful. Computers make math cheap, so this process can be repeated and refined as needed.
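The collect-and-mine loop above can be sketched with a minimal "people like you" recommender. This is an illustrative toy, not any platform's real algorithm: it represents each user's behavior as a vector of interactions, finds the most similar other user by cosine similarity, and recommends what they engaged with. All the data and names here are made up.

```python
# Toy "people like you" recommendation sketch. Real systems are far more
# elaborate, but the shape is the same: compare behavior vectors, then
# surface what similar people engaged with.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two interaction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# Rows: users; columns: interactions with items A, B, C, D (1 = engaged).
interactions = {
    "you":    [1, 1, 0, 0],
    "user_2": [1, 1, 1, 0],
    "user_3": [0, 0, 1, 1],
}
items = ["A", "B", "C", "D"]

def recommend(target, interactions, items):
    # Find the other user whose behavior most resembles the target's...
    others = [u for u in interactions if u != target]
    nearest = max(others, key=lambda u: cosine_similarity(
        interactions[target], interactions[u]))
    # ...and recommend what they engaged with that the target has not.
    return [item for item, theirs, mine in
            zip(items, interactions[nearest], interactions[target])
            if theirs and not mine]

print(recommend("you", interactions, items))  # -> ['C']
```

Note what this toy makes visible: "you" will be shown item C because user_2 liked it, and will never be shown item D. The filter bubble discussed below is this suppression, scaled up.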

However, while the algorithms can churn nearly indefinitely, they need data and interaction to continue to have relevance. In this way, algorithms can be compared to the annoying office mate with pointless gossip and an incessant need to publicly overshare: they derive value from their audience.

And we are the audience.

Twitter's "Personalization and Data" settings provide a great example of how this works. As we stated earlier, while Twitter provides this example, they are not unique. The settings shown in the screenshot below highlight some of the data that is collected and how this information is used. The screenshot also highlights how, on social media, there is no such thing as a privacy setting. What they give us is a visibility setting -- while we have minimal control over what we might see, nothing is private from the company that offers the service.

Twitter's personalization settings

From looking at this page, we can see that Twitter can collect a broad range of information that has nothing to do with Twitter's core functionality and everything to do with building profiles about us. For example, why would Twitter need to know the other apps on our devices in order to let us share 140-character text snippets?

Twitter is also clear that regardless of what we see here, they will personalize information for us. If we use Twitter, we only have the option to play by their rules (to the extent that they enforce them, of course):

Twitter always uses some information, like where you signed up and your current location, to help show you more relevant content.

What this explanation leaves out, of course, is for whom the content is most relevant: the person reading it, or Twitter. Remember: their platform, their business, their needs.

But when we look at the options on this page, we also need to realize that the data they collect in the name of personalization is where our filter bubbles begin. A best-case definition of "relevant content" is "information they think we are most interested in." However, a key goal of many corporate social sites is to make it more difficult to leave. In design, dark patterns are used to get people to act against their best interest. Creating feeds of "relevant content" -- or more accurately, suppressing information according to the dictates of an algorithm -- can be understood as a dark information pattern. "Relevant content" might be what is most likely to keep us on a site, but it probably won't have much overlap with information that challenges our bias, breaks our assumptions, or broadens our world.

The fact that our personal information is used to narrow the information we encounter only adds insult to injury.

We can counter this, but it takes work. Some easier steps include:

  • Use ad blockers and JavaScript blockers (uBlock Origin and Privacy Badger are highly recommended as ad blockers; for JavaScript blockers, try ScriptSafe for Chrome and NoScript for Firefox).
  • Clear your browser cookies regularly.
  • When searching or doing other research, use Tor and/or a VPN.

These steps will help minimize the amount of data that companies can collect and use, but they don't eliminate the problem. The root of the problem lies in information asymmetry: companies know more about us than we know about them, and this gap increases over time. However, privacy and information literacy are directly related issues. The more we safeguard our personal information, the more freedom we have from filter bubbles.