At the end of last week I was at the Linked Open Data in Libraries, Archives, and Museums Summit in San Francisco. The summit happened to coincide with the launch of the web site Schema.org, which created such a stir that one of the breakout sessions was devoted to discussing it. What is it about? Let the web site tell you:
This site provides a collection of schemas, i.e., html tags, that webmasters can use to markup their pages in ways recognized by major search providers. Search engines including Bing, Google and Yahoo! rely on this markup to improve the display of search results, making it easier for people to find the right web pages.
Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure.
A shared markup vocabulary makes it easier for webmasters to decide on a markup schema and get the maximum benefit for their efforts. So, in the spirit of sitemaps.org, Bing, Google and Yahoo! have come together to provide a shared collection of schemas that webmasters can use.
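To make that concrete, here is a minimal sketch of what such on-page markup might look like and how a program could read the structured data back out of it. The page fragment, the book it describes, and the `extract_item` helper are my own illustration and not anything taken from the Schema.org site. The sketch assumes a single flat item whose properties are plain text, which is far less than a real microdata parser has to handle.

```python
from html.parser import HTMLParser

# Hypothetical page fragment using microdata attributes
# (itemscope, itemtype, itemprop) with a Schema.org type.
PAGE = """
<div itemscope itemtype="http://schema.org/Book">
  <h1 itemprop="name">Moby-Dick</h1>
  <p>Written by <span itemprop="author">Herman Melville</span>.</p>
</div>
"""


class MicrodataExtractor(HTMLParser):
    """Collects the type and text-valued properties of one flat item."""

    def __init__(self):
        super().__init__()
        self.item = {}
        self._current_prop = None  # property whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            # Record the declared type of the item, if any.
            self.item["type"] = attrs.get("itemtype")
        if "itemprop" in attrs:
            # Start collecting text for this property.
            self._current_prop = attrs["itemprop"]
            self.item[self._current_prop] = ""

    def handle_data(self, data):
        # Only text inside an element carrying itemprop is kept.
        if self._current_prop is not None:
            self.item[self._current_prop] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current_prop = None


def extract_item(html):
    """Return the item found in the HTML as a plain dictionary."""
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    return {
        key: value.strip() if isinstance(value, str) else value
        for key, value in parser.item.items()
    }


print(extract_item(PAGE))
```

The structure travels with the page, so whoever reads it no longer has to guess which string is the title and which is the author.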
This strikes me as a game changer in the linked data world. When the three biggest search engines tell you to do something in a particular way, who is going to do it differently? Some of the people at the LOD-LAM Summit seemed to agree, others disagreed, while still others weren’t sure what the effect would be, if any. But let’s look at history.
Standard Generalized Markup Language (SGML) was the first major foray into developing a machine-processable way to encode text documents. It was incredibly complicated and difficult to use. Its take-up was limited to the geekiest of geeks and to scholars desperate for a way to add semantics to text. Then, when the Web came along, some people realized it might be useful to develop a profile of SGML that would simplify structured markup and be useful in a Web context. Thus was born XML, which now serves as the plumbing for much of what happens on the Web. So where is SGML? Nowhere. No one even remembers that XML began life as an SGML profile.
So maybe that’s why I’m skeptical when Mike Bergman claims that this development will actually boost RDF instead of killing it. He sees RDF’s role “as a canonical means for expressing data of any form, and not necessarily as a data exchange format.” But there’s the rub. Who needs anything but a data exchange format? Keep your data internally however you want (as we do all the time). It’s only when you want to exchange it that you have to adhere to a standard. And increasingly that standard will be defined by Schema.org, and not by the RDF cognoscenti. Of that you can be certain. I can remember no precedent for people flocking to a more complicated way to achieve their ends when no additional benefit accrued.