Google, Lawsuits, and the Importance of Good Documentation


This week, the Mississippi Attorney General sued Google, claiming that Google is mining student data. In this post, I'll share some general, personal thoughts, and some recommendations for Google.

To start, it's worth watching a statement from the press conference where the suit was announced - this video clip was shared by Anna Wolfe, a journalist who covered the event.

At 1:46 in the video, the AG describes the "tests" that were run. To be blunt, these don't sound like actual tests - they sound more like browsing and looking at the screen. Unless the student account they were using was relatively new, had never done any searches on the topic being "tested," had never browsed while logged in to any non-Google site that had ad tracking, and all testing browsers had their cache, cookies, and browsing history cleared, there is a range of benign explanations for behavior that looks like targeted ads. And that doesn't even take into account the difference between targeted ads based on past behavior, and content-based ads delivered because a page describes a specific subject.

Without additional detail from the Mississippi AG on how they tested for tracking, the current claims of tracking are less than persuasive.
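To make the conditions above concrete, here is a minimal sketch of how a test run could be written down as a checklist. This is purely illustrative: the class name, field names, and wording of the issues are my own assumptions, not anything the Mississippi AG or Google has published.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AdTestRun:
    """Hypothetical record of the conditions under which an ad 'test' was run."""
    account_is_new: bool
    prior_searches_on_topic: bool
    logged_in_on_tracked_sites: bool
    browser_state_cleared: bool  # cache, cookies, and browsing history


def confounds(run: AdTestRun) -> List[str]:
    """List the benign explanations that this run cannot rule out."""
    issues = []
    if not run.account_is_new:
        issues.append("account has prior history")
    if run.prior_searches_on_topic:
        issues.append("account previously searched the tested topic")
    if run.logged_in_on_tracked_sites:
        issues.append("account browsed ad-tracked non-Google sites while logged in")
    if not run.browser_state_cleared:
        issues.append("browser cache, cookies, or history not cleared")
    return issues


# A run that meets every condition leaves no listed confounds.
clean_run = AdTestRun(True, False, False, True)
print(confounds(clean_run))
```

An empty list here only means the listed confounds were controlled for; it says nothing about whether an ad was content-based or targeted.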

G Suite Terms, and (a Lack of) Clarity

An area where Google can improve is highlighted in the suit: Google's terms, and the way Google describes how educational data are handled, are not easily accessible or comprehensible (all the necessary disclaimers apply: I am not a lawyer, this is not legal advice, etc, etc). This commentary is limited to transparency and clarity. With that said, Google could blunt a lot of the claims and criticisms they receive with better documentation. The people who are doing this work at Google are smart and talented - they should be allowed to describe the details of their work more effectively.

Google has built a "Trust" page for G Suite, formerly known as Google Apps for Education. The opening paragraphs of text on this page highlight the confusing complexity of Google's terms.

Opening text from Trust page

In this opening text, Google links to five different policies that govern use of Google products in education:

However, this list of five different legal documents leaves out five additional documents that potentially govern use of G Suite in Education:

Of these five additional documents, two (the Data Processing Amendment and the Model Contract Clauses) are optional. However, these ten documents are not listed together in a single, coherent list anywhere on the Google site that I have found. The trust page also links to this list of Google services that are not included in G Suite/Google Apps for Education, but that can be enabled within G Suite. The list includes over 40 individual services, which are all covered by different sets of terms.

Moving down the "Trust" page, we see several different words or phrases used to refer to the Education Terms: "contracts," "G Suite Agreement," and "agreements." These all link to the same document, but the different names for the same document make it more difficult to follow than it needs to be.

Some simple things Google could do on the "Trust" page:

  • list out all applicable terms and policies, with a simple description of what is covered;
  • list out the order of precedence among the different documents that govern G Suite use. If there is a contradiction between any of these documents, identify which document is authoritative. As just one example, the Data Processing Agreement and the G Suite Agreement define key terms like "affiliate" in slightly different ways;
  • highlight what documents are optional;
  • create a simple template for districts (or state departments of ed, or universities) to document the agreements governing a particular G Suite/Google Apps implementation;
  • standardize language used when referring to different policies;
  • define the differences between the Education-specific contracts and the Consumer contracts;
  • in each of their legal terms, create IDs that allow for linking directly to a section of a document.
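As a rough illustration of the first three suggestions, here is a minimal sketch of a document inventory that records precedence and optional status, and reports which document wins when a term is defined twice. Everything here is an assumption for illustration: the precedence numbers are made up (Google has not published this ordering in one place that I have found), and the definitions are placeholders, not the actual legal text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class GoverningDocument:
    name: str
    precedence: int  # lower number wins; the values below are illustrative only
    optional: bool = False
    definitions: Dict[str, str] = field(default_factory=dict)


def conflicting_terms(docs: List[GoverningDocument]) -> List[str]:
    """Return terms that are defined with different wording in different documents."""
    seen: Dict[str, set] = {}
    for doc in docs:
        for term, text in doc.definitions.items():
            seen.setdefault(term, set()).add(text)
    return sorted(term for term, texts in seen.items() if len(texts) > 1)


def authoritative_definition(
    term: str, docs: List[GoverningDocument]
) -> Optional[Tuple[str, str]]:
    """Return (document name, definition) from the highest-precedence document."""
    defining = [doc for doc in docs if term in doc.definitions]
    if not defining:
        return None
    winner = min(defining, key=lambda doc: doc.precedence)
    return winner.name, winner.definitions[term]


# Placeholder wording and a hypothetical precedence order.
docs = [
    GoverningDocument("G Suite Agreement", precedence=2,
                      definitions={"affiliate": "placeholder wording A"}),
    GoverningDocument("Data Processing Agreement", precedence=1, optional=True,
                      definitions={"affiliate": "placeholder wording B"}),
]

print(conflicting_terms(docs))
print(authoritative_definition("affiliate", docs))
```

A district could fill in the same structure to document which agreements, optional or otherwise, govern its own implementation.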

While the above steps would be an improvement, creating standalone, education-specific terms that were fully independent of the consumer terms would add further clarity. From a product development perspective, this legal review would force an internal review to ensure that legal terms and technical implementation were in sync. To be clear, this is an enormous undertaking, but if Google did this, it would add some much-needed clarity. Practically speaking, Google could use this step to generate some solid PR as well. The PR messaging on this practically writes itself: "Google has always prided itself on being a leader in security, data privacy, and transparency. As our products evolve and improve, we are always making sure that our agreements evolve and improve as well."

G Suite and Advertising

Google has stated on multiple occasions that "There are no ads in the suite of G Suite core services." Here, it's worth noting that "core services" for education only includes Gmail, Google Calendar, Google Talk, Google Hangouts, Google Drive, Google Docs, Google Sheets, Google Slides, Google Forms, Google Sites, Google Contacts, and Google Vault. Other services - like Maps, Blogger, YouTube, History, and Custom Search - are not part of the core services, and are not covered under educational terms.

Ads text from Trust page

There are differences, however, between showing ads, targeting ads, and collecting data for use in profiles. Ads can be shown on the basis of the content of the page (i.e., read an article about canoeing, see an ad for canoes), and this requires no information about the person reading the page.

Targeted ads use information collected from or about a user to target them, or their general demographic, with specific ads. While targeted ads are annoying and intrusive, they at least provide visible evidence that personal data is being collected and organized into a profile.
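The distinction can be sketched in a few lines of code. This is a toy model under my own assumptions (the inventory, function names, and matching logic are all hypothetical, and real ad systems are far more complex), but it shows which input each approach depends on: the page, or the person.

```python
from typing import Dict, List, Optional

# Hypothetical ad inventory keyed by topic.
AD_INVENTORY: Dict[str, str] = {
    "canoeing": "Ad: canoes on sale",
    "camping": "Ad: tents and stoves",
}


def contextual_ad(page_text: str,
                  inventory: Dict[str, str] = AD_INVENTORY) -> Optional[str]:
    """Pick an ad from the words on the page; no user information is used."""
    words = page_text.lower().split()
    for topic, ad in inventory.items():
        if topic in words:
            return ad
    return None


def targeted_ad(profile_interests: List[str],
                inventory: Dict[str, str] = AD_INVENTORY) -> Optional[str]:
    """Pick an ad from a stored user profile; the page content is ignored."""
    for interest in profile_interests:
        if interest in inventory:
            return inventory[interest]
    return None


# The same ad can appear for two very different reasons.
print(contextual_ad("An article about canoeing on the river"))
print(targeted_ad(["canoeing"]))
```

Both calls print the same canoe ad, which is why looking at the screen alone can't tell you which mechanism produced it.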

On their "Trust" page, as pictured above, Google states that "Google does not use any user personal information (or any information associated with a Google Account) to target ads."

In Google's Educational Terms, they state that they collect the following information from users of their educational services:

  • device information, such as the hardware model, operating system version, unique device identifiers, and mobile network information including phone number of the user;
  • log information, including details of how a user used our service, device event information, and the user's Internet protocol (IP) address;
  • location information, as determined by various technologies including IP address, GPS, and other sensors;

While it is great that Google states that they don't use information collected from educational users to target ads, Google also needs to provide a technical explanation that demonstrates how they ensure that IP addresses collected from students, unique IDs that are tied to student devices, and student phone numbers are explicitly excluded from advertising activity. Also, Google should clearly define what they mean when they say "advertising purposes", as this phrase is vague enough to take on many different meanings, often showing more about the opinions of the reader than the practice of Google.

This technical explanation should also include how the prohibitions against advertising based on data collected in Google Apps can square with this definition of advertising pulled from the optional Data Processing Agreement:

"'Advertising' means online advertisements displayed by Google to End Users, excluding any advertisements Customer expressly chooses to have Google or any Google Affiliate display in connection with the Services under a separate agreement (for example, Google AdSense advertisements implemented by Customer on a website created by Customer using the "Google Sites" functionality within the Services)."

There are many ways that all of these statements can be true simultaneously, but without a technically sound explanation of how this is accomplished, Google is essentially asking people to trust them with no demonstration of how this is possible.

Conclusion

Google has been working in the educational space for years, and they have put a lot of thought into their products. However, real questions still exist about how these products work, and about how data collected from kids in these products is handled. Google has created copious documentation, but - ironically - that is part of the problem, as the sheer volume of what they have created contains contradictions and repetitions with slight degrees of variance that impede understanding. Having watched Google's terms evolve over the years, and having read the terms of many other products, these issues actually feel pretty normal to me. This doesn't mean that they don't need to be addressed, but I don't see malice in any of these shortcomings.

However, the concern is real, for Google and other EdTech companies: if your product supports learning today, it shouldn't support redlining and profiling tomorrow.