Facebook data privacy scandal: A cheat sheet
Read about the saga of Facebook's failures in ensuring privacy for user data, including how it relates to Cambridge Analytica, the GDPR, the Brexit campaign, and the 2016 US presidential election.
A decade of apparent indifference to data privacy at Facebook has culminated in revelations that organizations harvested user data for targeted advertising, particularly political advertising, to apparent success. While the most well-known offender is Cambridge Analytica (the political consulting and strategic communication firm behind the pro-Brexit Leave.EU campaign, as well as Donald Trump's 2016 presidential campaign), other companies have likely used similar tactics to collect personal data of Facebook users.
TechRepublic's cheat sheet about the Facebook data privacy scandal covers the ongoing controversy surrounding the illicit use of profile information. This article will be updated as more information about this developing story comes to the forefront.
What is the Facebook data privacy scandal?
The Facebook data privacy scandal centers around the collection of personally identifiable information of "up to 87 million people" by the political consulting and strategic communication firm Cambridge Analytica. That company—and others—were able to gain access to personal data of Facebook users due to the confluence of a variety of factors, broadly including inadequate safeguards against companies engaging in data harvesting, little to no oversight of developers by Facebook, developer abuse of the Facebook API, and users agreeing to overly broad terms and conditions.
In the case of Cambridge Analytica, the company was able to harvest personally identifiable information through a personality quiz app called thisisyourdigitallife, based on the OCEAN personality model. Information gathered via this app is useful in building a "psychographic" profile of users (the OCEAN acronym stands for openness, conscientiousness, extraversion, agreeableness, and neuroticism). Adding the app to your Facebook account to take the quiz gives the creator of the app access to profile information and user history for the user taking the quiz, as well as all of the friends that user has on Facebook. This data includes all of the items that users and their friends have liked on Facebook.
Researchers associated with Cambridge Analytica claimed in a paper that it "can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender," with a model developed by the researchers that uses a combination of dimensionality reduction and logistic/linear regression to infer this information about users.
The model, according to the researchers, is effective due to the relationship of likes to a given attribute. However, most likes are not explicitly indicative of their attributes. The researchers note that "less than 5% of users labeled as gay were connected with explicitly gay groups," but that liking "Juicy Couture" and "Adam Lambert" are likes indicative of gay men, while "WWE" and "Being Confused After Waking Up From Naps" are likes indicative of straight men. Other such connections are peculiarly lateral, with "curly fries" being an indicator of high IQ, "sour candy" being an indicator of not smoking, and "Gene Wilder" being an indicator that the user's parents had not separated by age 21.
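The two-stage approach described above (reduce a user-by-like matrix to a few dimensions, then fit a regression on the reduced features) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy LIKES matrix, the LABELS attribute, and all function names are hypothetical, and power iteration stands in for whatever decomposition routine the researchers actually used. It is not their code or their data.

```python
import math

# Hypothetical toy data: rows are users, columns are pages, 1 means "liked".
LIKES = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
]
LABELS = [1, 1, 1, 0, 0, 0]  # hypothetical binary attribute, one per user


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center_columns(rows):
    """Subtract each column's mean so components capture variation."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def top_components(rows, k, iterations=200):
    """Return k leading directions of the data via power iteration."""
    data = [row[:] for row in rows]
    components = []
    for _component in range(k):
        # Fixed, non-uniform start vector keeps the sketch deterministic.
        v = [1.0 / (j + 1) for j in range(len(data[0]))]
        for _step in range(iterations):
            scores = [dot(row, v) for row in data]
            v = [sum(s * row[j] for s, row in zip(scores, data))
                 for j in range(len(v))]
            norm = math.sqrt(dot(v, v))
            v = [w / norm for w in v]
        components.append(v)
        # Deflate: remove what this component explains from every row.
        scores = [dot(row, v) for row in data]
        data = [[x - s * w for x, w in zip(row, v)]
                for s, row in zip(scores, data)]
    return components


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, lr=0.5, epochs=500):
    """Plain gradient descent on the average logistic loss."""
    weights, bias, n = [0.0] * len(features[0]), 0.0, len(features)
    for _epoch in range(epochs):
        errors = [sigmoid(dot(weights, f) + bias) - y
                  for f, y in zip(features, labels)]
        weights = [w - lr * sum(e * f[j] for e, f in zip(errors, features)) / n
                   for j, w in enumerate(weights)]
        bias -= lr * sum(errors) / n
    return weights, bias


centered = center_columns(LIKES)
components = top_components(centered, k=2)
reduced = [[dot(row, comp) for comp in components] for row in centered]
weights, bias = fit_logistic(reduced, LABELS)
probabilities = [sigmoid(dot(weights, f) + bias) for f in reduced]
predicted = [1 if p >= 0.5 else 0 for p in probabilities]
print(predicted)
```

The point of the reduction step is that no single like needs to be explicitly tied to the attribute: the regression works on combinations of likes, which is how seemingly unrelated pages can end up carrying predictive weight.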
What is the timeline of the Facebook data privacy scandal?
Facebook has more than a decade-long track record of incidents highlighting inadequate and insufficient measures to protect data privacy. While the severity of these individual cases varies, the sequence of repeated failures paints a larger picture of systemic problems.
In 2005, researchers at MIT created a script that downloaded publicly posted information of over 70,000 users from four schools. (Facebook only began to allow search engines to crawl profiles in September 2007.)
In 2007, activities that users engaged in on other websites were automatically added to Facebook user profiles as part of Beacon, one of Facebook's first attempts to monetize user profiles. As an example, Beacon indicated on the Facebook News Feed the titles of videos that users rented from Blockbuster Video, which was a violation of the Video Privacy Protection Act. A class action suit was filed, for which Facebook paid $9.5 million to a fund for privacy and security as part of a settlement agreement.
In 2011, following an FTC investigation, the company entered into a consent decree, promising to address concerns about how user data was tracked and shared. That investigation was prompted by an incident in December 2009 in which information thought private by users was being shared publicly, according to contemporaneous reporting by The New York Times.
In 2013, Facebook disclosed details of a bug that exposed the personal details of six million accounts over approximately a year. When a user downloaded their own Facebook history, that user would obtain in the same action not just their own address book, but also the email addresses and phone numbers of their friends that other people had stored in their address books. The data that Facebook exposed had not been given to Facebook by users to begin with; it had been vacuumed from the contact lists of other Facebook users who happen to know that person. This phenomenon has since been described as "shadow profiles."
The Cambridge Analytica portion of the data privacy scandal starts in February 2014. A spate of reviews on the Turkopticon website, a third-party review website for users of Amazon's Mechanical Turk, details a task requested by Aleksandr Kogan asking users to complete a survey in exchange for money. The survey required users to add the thisisyourdigitallife app to their Facebook account, which is in violation of Mechanical Turk's terms of service. One review quotes the request as requiring users to "provide our app access to your Facebook so we can download some of your data: some demographic data, your likes, your friends list, whether your friends know one another, and some of your private messages."
In December 2015, Facebook learned for the first time that the data set Kogan generated with the app was shared with Cambridge Analytica. Facebook founder and CEO Mark Zuckerberg claims "we immediately banned Kogan's app from our platform, and demanded that Kogan and Cambridge Analytica formally certify that they had deleted all improperly acquired data. They provided these certifications."
According to Cambridge Analytica, the company took legal action in August 2016 against GSR (Kogan) for licensing "illegally acquired data" to the company, with a settlement reached that November.
On March 17, 2018, an exposé was published by The Guardian and The New York Times, initially reporting that 50 million Facebook profiles were harvested by Cambridge Analytica; the figure was later revised to "up to 87 million" profiles. The exposé relies on information provided by Christopher Wylie, a former employee of SCL Elections and Global Science Research, the creator of the thisisyourdigitallife app. Wylie claimed that the data from that app was sold to Cambridge Analytica, which used the data to develop "psychographic" profiles of users, and target users with pro-Trump advertising, a claim that Cambridge Analytica denied.
On March 16, 2018, Facebook threatened to sue The Guardian over publication of the story, according to a tweet by Guardian reporter Carole Cadwalladr. Campbell Brown, a former CNN journalist who now works as head of news partnerships at Facebook, said it was "not our wisest move," adding "If it were me I would have probably not threatened to sue The Guardian." Similarly, Cambridge Analytica threatened to sue The Guardian for defamation.
On March 20, 2018, the FTC opened an investigation to determine if Facebook has violated the terms of the settlement from the 2011 investigation.
In April 2018, reports indicated that Facebook granted Zuckerberg and other high-ranking executives powers over controlling personal information on the platform that are not available to normal users. Messages from Zuckerberg sent to other users were remotely deleted from users' inboxes, which the company claimed was part of a corporate security measure following the 2014 Sony Pictures hack. Facebook subsequently announced plans to make available the "unsend" capability "to all users in several months," and that Zuckerberg will be unable to unsend messages until that feature rolls out.
On April 4, 2018, The Washington Post reported that Facebook announced "malicious actors" abused the search function to gather public profile information of "most of its 2 billion users worldwide."
In a CBS News/YouGov poll published on April 10, 2018, 61% of Americans said Congress should do more to regulate social media and tech companies. This sentiment was echoed in a CBS News interview with Box CEO Aaron Levie and YML CEO Ashish Toshniwal who called on Congress to regulate Facebook. According to Levie, "There are so many examples where we don't have modern ways of either regulating, controlling, or putting the right protections in place in the internet age. And this is a fundamental issue that, that we're gonna have to grapple with as an industry for the next decade."
On May 2, 2018, SCL Group, which owns Cambridge Analytica, was dissolved. In a press release, the company indicated that "the siege of media coverage has driven away virtually all of the Company's customers and suppliers."
On May 15, 2018, The New York Times reported that Cambridge Analytica is being investigated by the FBI and the Justice Department. A source indicated to CBS News that prosecutors are focusing on potential financial crimes.
On May 16, 2018, Christopher Wylie testified before the Senate Judiciary Committee. Among other things, Wylie noted that Cambridge Analytica, under the direction of Steve Bannon, sought to "exploit certain vulnerabilities in certain segments to send them information that will remove them from the public forum, and feed them conspiracies and they'll never see mainstream media." Wylie added that the company targeted people with "characteristics that would lead them to vote for the Democratic party, particularly African American voters."
What are the key companies involved in the Facebook data privacy scandal?
In addition to Facebook, these are the companies connected to this data privacy story.
SCL Group (formerly Strategic Communication Laboratories) is at the center of the privacy scandal, though it has operated primarily through subsidiaries. Nominally, SCL was a behavioral research/strategic communication company based in the UK. The company was dissolved on May 1, 2018.
Cambridge Analytica and SCL USA are offshoots of SCL Group, primarily operating in the US. Registration documentation indicates the pair formally came into existence in 2013. As with SCL Group, the pair were dissolved on May 1, 2018.
Global Science Research was a market research firm based in the UK from 2014 to 2017. It was the originator of the thisisyourdigitallife app. The personal data derived from the app (if not the app itself) was sold to Cambridge Analytica for use in campaign messaging.
Emerdata is the functional successor to SCL and Cambridge Analytica. It was founded in August 2017, with registration documents listing several people associated with SCL and Cambridge Analytica, as well as the same address as that of SCL Group's London headquarters.
AggregateIQ is a Canadian consulting and technology company founded in 2013. The company produced Ripon, the software platform for Cambridge Analytica's political campaign work, which leaked publicly after being discovered in an unprotected GitLab instance.
Cubeyou is a US-based data analytics firm that also operated surveys on Facebook, and worked with Cambridge University from 2013 to 2015. It was suspended from Facebook in April 2018 following a CNBC report.
The Internet Research Agency is a St. Petersburg-based organization with ties to Russian intelligence services. The organization engages in politically-charged manipulation across English-language social media, including Facebook.
Who are the key people involved in the Facebook data privacy scandal?
Nigel Oakes is the founder of SCL Group, the parent company of Cambridge Analytica. A report from Buzzfeed News unearthed a quote from 1992 in which Oakes stated, "We use the same techniques as Aristotle and Hitler. ... We appeal to people on an emotional level to get them to agree on a functional level."
Alexander Nix was the CEO of Cambridge Analytica, and a director of SCL Group. He was suspended following reports detailing a video in which Nix claimed the company "offered bribes to smear opponents as corrupt," and that it "campaigned secretly in elections... through front companies or using subcontractors."
Robert Mercer is a conservative activist, computer scientist, and a co-founder of Cambridge Analytica. A New York Times report indicates that Mercer invested $15 million in the company. His daughters Jennifer Mercer and Rebekah Anne Mercer serve as directors of Emerdata.
Christopher Wylie is the former director of research at Cambridge Analytica. He provided information to The Guardian for its exposé of the Facebook data privacy scandal. He has since testified before committees in the US and UK about Cambridge Analytica's involvement in this scandal.
Steve Bannon is a co-founder of Cambridge Analytica, as well as a founding member and former executive chairman of Breitbart News, an alt-right news outlet. Breitbart News has reportedly received funding from the Mercer family as far back as 2010. Bannon left Breitbart in January 2018. According to Christopher Wylie, Bannon is responsible for testing phrases such as "drain the swamp" at Cambridge Analytica, which were used extensively on Breitbart.
Aleksandr Kogan is a Senior Research Associate at Cambridge University and co-founder of Global Science Research, which created the data-harvesting thisisyourdigitallife app. He worked as a researcher and consultant for Facebook in 2013 and 2015. Kogan also received Russian government grants and is an associate professor at St. Petersburg State University, though he claims this is an honorary role.
Joseph Chancellor is a co-founder of Global Science Research, which created the data-harvesting thisisyourdigitallife app. Around November 2015, he was hired by Facebook as a "quantitative social psychologist."
Michal Kosinski, David Stillwell, and Thore Graepel are the researchers who developed the model to "psychometrically" analyze users based on their Facebook likes. At the time this model was published, Kosinski and Stillwell were affiliated with Cambridge University, while Graepel was affiliated with the Cambridge-based Microsoft Research.
Mark Zuckerberg is the founder and CEO of Facebook. He founded the website in 2004 from his dorm room at Harvard.
How have Facebook and Mark Zuckerberg responded to the data privacy scandal?
Each time Facebook finds itself embroiled in a privacy scandal, the general playbook seems to be the same: Mark Zuckerberg delivers an apology, with oft-recycled lines, such as "this was a big mistake," or "I know we can do better." Despite repeated controversies regarding Facebook's handling of personal data, it has continued to gain new users. This is by design—founding president Sean Parker indicated at an Axios conference in November 2017 that the first step of building Facebook features was "How do we consume as much of your time and conscious attention as possible?" Parker also likened the design of Facebook to "exploiting a vulnerability in human psychology."
On March 16, 2018, Facebook announced that SCL and Cambridge Analytica had been banned from the platform. The announcement indicated, correctly, that "Kogan gained access to this information in a legitimate way and through the proper channels that governed all developers on Facebook at that time," and that passing the information to a third party was against the platform policies.
The following day, the announcement was amended to state:
The claim that this is a data breach is completely false. Aleksandr Kogan requested and gained access to information from users who chose to sign up to his app, and everyone involved gave their consent. People knowingly provided their information, no systems were infiltrated, and no passwords or sensitive pieces of information were stolen or hacked.
On March 21, 2018, Mark Zuckerberg posted his first public statement about the issue, stating in part that:
"We have a responsibility to protect your data, and if we can't then we don't deserve to serve you. I've been working to understand exactly what happened and how to make sure this doesn't happen again."
On March 26, 2018, Facebook placed full-page ads stating: "This was a breach of trust, and I'm sorry we didn't do more at the time. We're now taking steps to ensure this doesn't happen again," in The New York Times, The Washington Post, and The Wall Street Journal, as well as The Observer, The Sunday Times, Mail on Sunday, Sunday Mirror, Sunday Express, and Sunday Telegraph in the UK.
In a blog post on April 4, 2018, Facebook announced a series of changes to data handling practices and API access capabilities. Foremost among these is a restriction on the Events API, which is no longer able to access the guest list or wall posts. Additionally, Facebook removed the ability to search for users by phone number or email address, and made changes to the account recovery process to fight scraping.
On April 10, 2018 and April 11, 2018, Mark Zuckerberg testified before Congress. Details about his testimony are in the next section of this article.
On April 10, 2018, Facebook announced the launch of its data abuse bug bounty program. While Facebook has an existing security bug bounty program, this is targeted specifically to prevent malicious users from engaging in data harvesting. There is no limit to how much Facebook could potentially pay in a bounty, though to date the highest amount the company has paid is $40,000 for a security bug.
On May 14, 2018, "around 200" apps were banned from Facebook as part of an investigation into whether companies have abused APIs to harvest personal information. The company declined to provide a list of offending apps.
On May 22, 2018, Mark Zuckerberg testified, briefly, before the European Parliament about the data privacy scandal and Cambridge Analytica. The format of the testimony has been the subject of derision, as all of the questions were posed to Zuckerberg before he answered. Guy Verhofstadt, an EU Parliament member representing Belgium, said, "I asked you six 'yes' and 'no' questions, and I got not a single answer."
What did Mark Zuckerberg say in his testimony to Congress?
In his Senate testimony on April 10, 2018, Zuckerberg reiterated his apology, stating that "We didn't take a broad enough view of our responsibility, and that was a big mistake. And it was my mistake. And I'm sorry. I started Facebook, I run it, and I'm responsible for what happens here," adding in a response to Sen. John Thune that "we try not to make the same mistake multiple times... in general, a lot of the mistakes are around how people connect to each other, just because of the nature of the service."
Sen. Amy Klobuchar asked if Facebook had determined whether Cambridge Analytica and the Internet Research Agency were targeting the same users. Zuckerberg replied, "We're investigating that now. We believe that it is entirely possible that there will be a connection there." According to NBC News, this was the first suggestion there is a link between the activities of Cambridge Analytica and the Russian disinformation campaign.
What is the 2016 US presidential election connection to the Facebook data privacy scandal?
In December 2015, The Guardian broke the story of Cambridge Analytica being contracted by Ted Cruz's campaign for the Republican Presidential Primary. Despite Cambridge Analytica CEO Alexander Nix's claim in an interview with TechRepublic that the company is "fundamentally politically agnostic and an apolitical organization," the primary financier of the Cruz campaign is Cambridge Analytica co-founder Robert Mercer—he donated $11 million to a pro-Cruz Super PAC. Following Cruz's withdrawal from the campaign in May 2016, the Mercer family began supporting Donald Trump.
In January 2016, Facebook COO Sheryl Sandberg told investors that the election was "a big deal in terms of ad spend," and that "Using Facebook and Instagram ads you can target by congressional district, you can target by interest, you can target by demographics or any combination of those."
In October 2017, Facebook announced changes to its advertising platform, requiring identity and location verification and prior authorization in order to run electoral advertising. In the wake of the fallout from the data privacy scandal, further restrictions were added in April 2018, making "issue ads" regarding topics of current interest similarly restricted.
In secretly recorded conversations by an undercover team from Channel 4 News, Cambridge Analytica's Nix claimed the firm was behind the "defeat crooked Hillary" advertising campaign, adding, "We just put information into the bloodstream of the internet and then watch it grow, give it a little push every now and again over time to watch it take shape," and that "this stuff infiltrates the online community, but with no branding, so it's unattributable, untrackable." The same exposé quotes Chief Data Officer Alex Tayler as saying, "When you think about the fact that Donald Trump lost the popular vote by 3 million votes but won the electoral college vote, that's down to the data and the research."
What is the Brexit tie-in to the Facebook data privacy scandal?
AggregateIQ was retained by the Vote Leave organization in the Brexit campaign, and both The Guardian and BBC claim that the Canadian company is connected to Cambridge Analytica and its parent organization SCL Group. UpGuard, the organization that found a public GitLab instance with code from AggregateIQ, has extensively detailed its connection to Cambridge Analytica and its involvement in Brexit campaigning.
Additionally, The Guardian quotes Wylie as saying the company "was set up as a Canadian entity for people who wanted to work on SCL projects who didn't want to move to London."
How is Facebook affected by the GDPR?
Like any organization providing services to users in European Union countries, Facebook is bound by the EU General Data Protection Regulation (GDPR). Due to the scrutiny Facebook is already facing regarding the Cambridge Analytica scandal, as well as the general nature of the social media giant's product being personal information, its strategy for GDPR compliance is similarly receiving a great deal of focus from users and other companies looking for a model of compliance.
While in theory the GDPR is only applicable to people residing in the EU, Facebook will require users to review their data privacy settings. According to a ZDNet article, Facebook users will be asked if they want to see advertising based on partner information—in practice, websites that feature Facebook's "Like" buttons. Users globally will be asked if they wish to continue sharing political, religious, and relationship information, while users in Europe and Canada will be given the option of switching automatic facial recognition on again.
Facebook members outside the US and Canada have heretofore been governed by the company's terms of service in Ireland. This has reportedly been changed prior to the start of GDPR enforcement, as this would seemingly make Facebook liable for damages for users internationally, due to Ireland's status as an EU member.
What are Facebook "shadow profiles?"
"Shadow profiles" are stores of information that Facebook has obtained about other people—who are not necessarily Facebook users. The existence of "shadow profiles" was discovered as a result of a bug in 2013. When a user downloaded their Facebook history, that user would obtain not just his or her address book, but also the email addresses and phone numbers of their friends that other people had stored in their address books.
Facebook described the issue in an email to the affected users. This is an excerpt of the email, according to security site Packet Storm:
When people upload their contact lists or address books to Facebook, we try to match that data with the contact information of other people on Facebook in order to generate friend recommendations. Because of the bug, the email addresses and phone numbers used to make friend recommendations and reduce the number of invitations we send were inadvertently stored in their account on Facebook, along with their uploaded contacts. As a result, if a person went to download an archive of their Facebook account through our Download Your Information (DYI) tool, which included their uploaded contacts, they may have been provided with additional email addresses or telephone numbers.
Because of the way that Facebook synthesizes data in order to attribute collected data to existing profiles, data of people who do not have Facebook accounts congeals into dossiers, which are popularly called "shadow profiles." It is unclear what other sources of input are added to said "shadow profiles," a term that Facebook does not use, according to Zuckerberg in his Senate testimony.
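To make the mechanism concrete, here is a minimal sketch of how contact details uploaded by several different people could be merged into one record about a person who never supplied that information. It is purely illustrative: Facebook's actual matching logic is not public, and the UPLOADS data, the REGISTERED_EMAILS set, and every function name below are hypothetical.

```python
# Hypothetical address-book uploads, keyed by the account that uploaded them.
UPLOADS = {
    "alice": [
        {"name": "Dana", "email": "dana@example.com", "phone": None},
        {"name": "Bob", "email": "bob@example.com", "phone": None},
    ],
    "bob": [
        {"name": "Dana R.", "email": "dana@example.com", "phone": "555-0100"},
    ],
    "carol": [
        {"name": "D. Reyes", "email": None, "phone": "555-0100"},
    ],
}

# Hypothetical set of email addresses that belong to registered accounts.
REGISTERED_EMAILS = {"alice@example.com", "bob@example.com", "carol@example.com"}


def build_dossiers(uploads):
    """Merge uploaded contacts that share an email address or phone number."""
    dossiers = []
    for uploader, contacts in uploads.items():
        for contact in contacts:
            entry = {
                "emails": {contact["email"]} if contact["email"] else set(),
                "phones": {contact["phone"]} if contact["phone"] else set(),
                "names": {contact["name"]},
                "sources": {uploader},
            }
            remaining = []
            for existing in dossiers:
                shares_email = existing["emails"] & entry["emails"]
                shares_phone = existing["phones"] & entry["phones"]
                if shares_email or shares_phone:
                    # Same person as far as the matcher can tell: fold it in.
                    for key in entry:
                        entry[key] |= existing[key]
                else:
                    remaining.append(existing)
            remaining.append(entry)
            dossiers = remaining
    return dossiers


def is_non_user(dossier, registered_emails):
    """A dossier with no registered email describes someone without an account."""
    return not (dossier["emails"] & registered_emails)


dossiers = build_dossiers(UPLOADS)
shadow = [d for d in dossiers if is_non_user(d, REGISTERED_EMAILS)]
for d in shadow:
    print(sorted(d["names"]), sorted(d["emails"]), sorted(d["phones"]), sorted(d["sources"]))
```

In this toy data, "Dana" never provided anything, yet three separate uploads combine into a single record holding an email address, a phone number, and three name variants, which is the essence of what the term describes.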
How can I change my Facebook privacy settings?
According to Facebook, in 2014 the company removed the ability for apps that friends use to collect information about an individual user. If you wish to disable third-party use of Facebook altogether—including Login With Facebook and apps that rely on Facebook profiles such as Tinder—this can be done in the Settings menu under Apps And Websites. The Apps, Websites And Games field has an Edit button—click that, and then click Turn Off.
Facebook has been proactively notifying users who had their data collected by Cambridge Analytica, though users can manually check to see if their data was shared by going to this Facebook Help page.
Facebook is also developing a Clear History button, which the company indicates will clear "their database record of you." TechRepublic's Dan Patterson noted on CBSN that "there aren't a lot of specifics on what that clearing of the database will do, and of course, as soon as you log back in and start creating data again, you set a new cookie and you start the process again."
To gain a better understanding of how Facebook handles user data, including what options can and cannot be modified by end users, it may be helpful to review Facebook's Terms of Service, as well as its Data Policy and Cookies Policy.