May 28, 2016

Librarians, IT Experts Respond to Adobe Spying Accusations

Adobe this week confirmed reports that it has been logging data on the reading activity of people who use the free Adobe Digital Editions service, and that the company has been transmitting those logs to its servers as unencrypted text files, raising privacy and security concerns. OverDrive, Baker & Taylor’s Axis 360 platform, and the 3M Cloud Library all use Adobe Digital Editions and Adobe digital rights management (DRM) to secure popular fiction and nonfiction ebook titles for downloadable lending.

Michael Bills, Baker & Taylor’s director of sales, digital products, acknowledged that many leading ebook platforms use Adobe DRM, and added that “privacy rights are of paramount concern to Baker & Taylor and its customers. Baker & Taylor is currently working with customers, and Adobe, on any questions or concerns.” OverDrive is also in contact with Adobe and monitoring the situation, according to David Burleigh, director of marketing and communication.

Several bloggers and journalists, including Nate Hoffelder, who broke the story at The-Digital-Reader.com, described Adobe’s activities as spying. However, to put the matter in perspective, many ebook conveniences that readers take for granted require servers to track and log basic information about a user’s reading activity.

“We need to be able to sync data across devices for the users. We also need to collect additional information on the [page] position of the book, the device, its ID, etc.” for syncing functions to work, explained Monique Sendze, associate director of information technology for Douglas County Libraries (DCL), CO, which pioneered the library-owned, library-managed ebook model by hosting ebooks on its own Adobe Content Server.

For example, in a common scenario, a patron might read an ebook on a tablet one evening and the next day continue reading that ebook on his or her smartphone. In order to sync content between those two devices, a remote server must have up-to-date records regarding which ebook is associated with a specific user ID, which device the patron most recently used for reading, and the location within the ebook where the reader stopped reading on that device the night before. Adobe Digital Editions, as well as any e-reader app with multi-device syncing capabilities turned on, will collect this type of information on any ebook that is opened within the app, including DRM-free ebooks.
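As a rough illustration of the bookkeeping described above, the sketch below models the records a sync server would need to keep. It is a simplified, hypothetical model and not Adobe’s implementation; the names `SyncServer`, `ReadingPosition`, `user_id`, `book_id`, `device_id`, and `percent_read` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ReadingPosition:
    """Last known position for one user in one ebook (hypothetical fields)."""
    device_id: str       # device that most recently reported a position
    percent_read: float  # how far into the book the reader stopped


class SyncServer:
    """In-memory stand-in for a remote sync service."""

    def __init__(self):
        # Keyed by (user_id, book_id), so the server knows who is reading what.
        self._positions = {}

    def report(self, user_id, book_id, device_id, percent_read):
        """Called by a device when the reader stops reading."""
        self._positions[(user_id, book_id)] = ReadingPosition(device_id, percent_read)

    def latest(self, user_id, book_id):
        """Called by another device to resume; returns None if nothing is stored."""
        return self._positions.get((user_id, book_id))


server = SyncServer()
server.report("patron-42", "moby-dick", "tablet", 37.5)  # evening, on the tablet
resumed = server.latest("patron-42", "moby-dick")         # next day, on the phone
print(resumed.device_id, resumed.percent_read)
```

Because the lookup key in this sketch contains the user ID and the book ID in readable form, whoever operates or observes the server can see which patron is reading which title.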

Plain View

Regardless, Adobe’s transmission of these logs as unencrypted, plain text makes all of this information extremely vulnerable. As Ars Technica explained, this absence of security would allow “anyone who can monitor network traffic (such as the National Security Agency, Internet service providers and cable companies, or others sharing a public Wi-Fi network) to follow along over readers’ shoulders.”

“My biggest concern is that this is being transmitted all in clear text, and hopefully Adobe will address this quickly,” Sendze said, adding that she was planning to run tests to see what information DCL’s Adobe Content Server is sending to Adobe’s licensing servers when book orders are fulfilled.

“Sending this information in plain text undermines decades of efforts by libraries and bookstores to protect the privacy of their patrons and customers,” Corynne McSherry of the Electronic Frontier Foundation (EFF) wrote in a blog post about the issue.

“Indeed, in 2011 EFF and a coalition of companies and public interest groups helped pass the Reader Privacy Act, which requires the government and civil litigants to demonstrate a compelling interest in obtaining reader records and show that the information contained in those records cannot be obtained by less intrusive means. But if readers are using Adobe’s software, it’s all too easy for folks to bypass those restrictions,” McSherry wrote.

On Tuesday, Adobe issued a statement to the public, describing Adobe Digital Editions as a tool for users “to view and manage ebooks and other digital publications across their preferred reading devices—whether they purchase or borrow them. All information collected from the user is collected solely for purposes such as license validation and to facilitate the implementation of different licensing models by publishers.”

The statement denied accusations that Adobe was collecting and transmitting data about every EPUB file on a user’s device, as initially reported by Hoffelder, who was then cited by numerous other outlets.

“This information is solely collected for the ebook currently being read by the user and not for any other ebook in the user’s library or read/available in any other reader,” Adobe wrote.

Data Collected

Eric Hellman, former OCLC executive and founder and developer of the ebook crowdfunding platform Unglue.it, acknowledged that some data logging is necessary for functions such as syncing, but told LJ he believes that those features “should be opt-in, even if [data logs are] encrypted.” He added, “Most ebook reading platforms leak privacy to some extent, mostly due to general ignorance about how to read a ‘privacy policy.’”

However, Hellman was dismissive of the notion that the logs were a necessary component of Adobe’s DRM system, stating in an email that “there is no need for Adobe to log this information to make the DRM work!”

Each time a user opens an EPUB file on a device with Adobe Digital Editions installed, the program logs the ebook’s title metadata, which is provided by publishers, along with the user ID, device ID, device IP address, certified app ID, distributor ID, and Adobe Content Server operator URL. It also logs the date that the ebook was purchased or downloaded, the duration for which the book has been read, and the percentage of the book that has been read.

Hellman pointed out that all of this information does not need to be collected and stored to make DRM work.

Adobe’s Tuesday mea culpa seemed to indicate that the company collects all of this information on all EPUB files opened with Adobe Digital Editions for the sake of expediency. This sweeping approach allows the company to check a file against every possible type of licensing arrangement that Adobe manages, from content that is region restricted, to titles that are limited to one device only, to ebooks that users or libraries pay for by percentage read. It doesn’t matter if the EPUB file is DRM-free, Creative Commons licensed, or public domain. Adobe will collect and store the information listed above. In an email to Ars Technica, an Adobe spokesperson said that the company was working on an update that would address the transmission of logs in plain text, but Adobe maintains that its collection of this data is covered under its user agreement. [UPDATE: On October 23, Adobe released a security patch that causes ADE 4 to transmit logs of this information in an encrypted format].

Privacy Concerns

For context, at the other end of the privacy-for-functionality tradeoff continuum, commercial ebook vendors such as Amazon.com and Barnes & Noble maintain detailed records about a user’s reading and ebook search histories to make personalized recommendations, and share with publishers aggregated data regarding average reading speeds, common points of disengagement by title, and popular passages that readers highlight, among other information.

Posting in response to Hoffelder’s original report, LJ infoDOCKET’s Gary Price noted that whenever a user checks out a library ebook on a Kindle device, Amazon gets access to their reading data as well (per the Kindle terms of use). He added that “the dedication and vigilance to user privacy that the public (both library users and non-users) appreciate from libraries is not the same in the digital world (for many reasons) and we need to do more.”

In an October 9 blog post, Hellman followed up on the case after analyzing the system further.

“It’s looking more like an incompetently-designed, half-finished synchronization system than a spy tool,” he wrote.

Librarian and coding contractor Andromeda Yelton, who analyzed the system with Hellman, notes in the comments below that it would be possible for Adobe to build a system that enables syncing, without compromising user privacy.

“It’s true that servers need to be involved if you want to be synchronizing reading data between devices. However, those servers do not ever have to know what you’re reading,” she writes. “The system can be designed so that the information is encrypted everywhere except on your device, and/or so that it’s impossible to go from the userID to the user’s actual name [or] other personal information. (If you have a password manager that lets you access all your secure, impossible-to-remember passwords across multiple devices, you already have technology built like this.) In short, we do not have to give up reader privacy to get synchronization.”

On his blog Pattern Recognition, LibraryBox developer Jason Griffey describes a syncing scenario in which “you could architect the sync engine to key off of a locally-hashed UserID + BookID that never left the device, and only transmit the hash and the location information in a standardized format. This would give you anonymous page syncing between devices without having to even worry about encryption of the traffic, as long as you used an appropriate hash function.”
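The hash-keyed approach Griffey describes can be sketched roughly as follows. This is an illustration under stated assumptions, not code from Griffey’s post or from any vendor: the function `sync_key`, the class `AnonymousSyncServer`, and the choice of SHA-256 are all hypothetical choices made for the example.

```python
import hashlib


def sync_key(user_id: str, book_id: str) -> str:
    """Derive an opaque key on the device; the raw IDs are never transmitted."""
    # The separator avoids collisions such as ("ab", "c") versus ("a", "bc").
    material = f"{user_id}\x00{book_id}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


class AnonymousSyncServer:
    """In-memory stand-in for a server that stores only hashes and locations."""

    def __init__(self):
        self._locations = {}

    def report(self, key: str, location: str) -> None:
        self._locations[key] = location

    def latest(self, key: str):
        return self._locations.get(key)


server = AnonymousSyncServer()

# Tablet: hash locally, then send only the hash and the location.
server.report(sync_key("patron-42", "moby-dick"), "chapter-12")

# Phone: the same user and book produce the same key, so the position is found.
print(server.latest(sync_key("patron-42", "moby-dick")))
```

In this sketch the server stores a 64-character hexadecimal digest in place of the user and book identifiers. A plain hash of guessable identifiers can still be reversed by trying candidate values, so a real design would likely mix in a secret known only to the user’s devices, which is one reading of Griffey’s caveat about choosing an appropriate hash function.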

In Hellman’s post, he added that Adobe’s total failure to protect user information during transmission may be a violation of privacy laws.

“The ADE4 privacy policy is NOT a magic incantation that makes everything it does legal. For example, all 50 states have privacy laws that cover library records. When ADE4 is used for library ebooks, the fact that it broadcasts a user’s reading behavior makes it legally suspect. Even if the stream were encrypted, it’s not clear that it would be legal.”

McSherry at EFF, comparing the issue to the public furor over Sony’s rootkit software, expressed hope that some good may yet come of Adobe’s snafu.

“There may be a silver lining to all of this,” she wrote. “Several years ago, music fans were shocked and dismayed to discover that copy-protection software on music from Sony artists was actually allowing Sony to monitor the fans’ listening habits, sending information home to Sony, and creating a massive security vulnerability. Sound familiar? That discovery led to a public relations meltdown for Sony, not to mention numerous lawsuits. When the dust had cleared, Sony’s DRM cost it millions in fees and settlements, and, of course, did nothing to inhibit infringement. For Sony, and many others in the music industry, the price of DRM finally became too high, and it has since been largely abandoned.”

About Matt Enis

Matt Enis (menis@mediasourceinc.com; @matthewenis on Twitter) is Associate Editor, Technology for Library Journal.

Comments

  1. Glad to see this getting detailed coverage here.

    I want to clarify a few technical things:

    1) It’s true that Adobe is not scanning your hard drive – the original reporting on that was incorrect. However, it’s false that the only information collected pertains to the book currently being read by the user. Nate Hoffelder and Galen Charlton have been looking into this further, and have both verified cases where ADE4 transmits data on books outside of the ADE library. This only happens with certain device/operating system combinations (which is why I didn’t see it when I ran tests), but it does happen. I don’t know the full technical details; you’d have to ask them.

    2) It’s true that servers need to be involved if you want to be synchronizing reading data between devices. *However*, those servers do not ever have to know what you’re reading. The system can be designed so that the information is encrypted everywhere except on your device, and/or so that it’s impossible to go from the userID to the user’s actual name or other personal information. (If you have a password manager that lets you access all your secure, impossible-to-remember passwords across multiple devices, you already have technology built like this.)

    In short, we do not have to give up reader privacy to get synchronization.

    • Thank you Andromeda. I was not familiar with that. I’ll quote part of your comment within the story if you don’t mind.

    • In particular, what I found is that if certain physical ereader devices are connected to a computer via a USB cable while ADE is running, ADE can recognize the device as a source of ebooks. In the ADE user interface, a new “devices” tab shows up listing the attached ereader.

      In that case, ADE 4.0 can send metadata about the ebooks on the device in the clear, even if none of the ebooks on that device were imported into ADE’s “library”.

      I verified (https://gist.github.com/gmcharlt/50707d56ebcb3162e195) this with an old Sony Reader, and I assume that other physical devices on Adobe’s supported list (https://blogs.adobe.com/digitalpublishing/supported-devices) would behave in the same way.

      Note that I am emphasizing “physical” here because I haven’t yet seen a case where an ebook *app* on a device (such as a generic Android tablet) was recognized by ADE.

  2. From the article: “All ebook providers serving libraries use Adobe servers and DRM to allow the circulation and protection of ebooks in EPUB and PDF file formats,” said Michael Bills, Baker & Taylor’s director of sales, digital products.

    Ummm, sorry but this is NOT true.

    BiblioBoard does not use Adobe. Not only that, but we don’t store any personal information about library patrons.

    • Good point Andrew. I’ll paraphrase part of that quote. Michael was referring to OverDrive, Axis 360, 3M and others that use Adobe DRM.

  3. As Andromeda noted above, it is absolutely not technically necessary to collect personal information in order to accomplish page sync across devices. I detailed two different ways that could be accomplished on my blog post on the subject (http://jasongriffey.net/wp/2014/10/08/adobe-digital-editions-and-infoleaks/): one involving local-to-device encryption and the other involving unique hashes of the BookID and UserID. Collecting user data as a method for accomplishing page sync may be the easy way for vendors to provide that service, but it is assuredly not necessary nor preferred.

    • Jason, you are absolutely spot on. The ironic thing is that I don’t think any of the Adobe based eBook platforms actually do a good job of “sync” — “page sync” is one part of a good UX but what about sync of favorites, ratings, comments, notes, bookmarks etc.?

    • Thank you Jason. I pulled a quote from your blog post and added a link.

  4. One thing to emphasize regarding ADE 4.0 is that, as Eric Hellman implied with his “half-finished” bit, it does not appear to actually be doing page sync at all. In particular, while I’ve seen it transmit page data to Adobe, I haven’t seen it *receive* page data originating from another copy of ADE registered to the same Adobe ID.
