At the end of last week I was at the Linked Open Data in Libraries, Archives, and Museums (LOD-LAM) Summit in San Francisco. It also happened to be when the web site Schema.org was launched. This created such a stir that one of the breakout sessions was devoted to discussing it. What is it about? Let the web site tell you:
This site provides a collection of schemas, i.e., html tags, that webmasters can use to markup their pages in ways recognized by major search providers. Search engines including Bing, Google and Yahoo! rely on this markup to improve the display of search results, making it easier for people to find the right web pages.
Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure.
A shared markup vocabulary makes it easier for webmasters to decide on a markup schema and get the maximum benefit for their efforts. So, in the spirit of sitemaps.org, Bing, Google and Yahoo! have come together to provide a shared collection of schemas that webmasters can use.
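To make the idea of recovering structured data from on-page markup a bit more concrete, here is a minimal sketch in Python using only the standard library. The page fragment, the MicrodataExtractor class, and the extract_items helper are all my own illustrative assumptions, not anything published by Schema.org or the search engines, and the sketch only handles flat, non-nested items whose values are plain text.

```python
from html.parser import HTMLParser

# Hypothetical page fragment using microdata attributes with a Schema.org type.
PAGE = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <span itemprop="director">James Cameron</span>
  <span itemprop="genre">Science fiction</span>
</div>
"""


class MicrodataExtractor(HTMLParser):
    """Collects flat microdata items; nested items are not handled."""

    def __init__(self):
        super().__init__()
        self.items = []            # one dict per itemscope element seen
        self._current_prop = None  # itemprop whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            # A new item starts here; itemtype names its vocabulary type.
            self.items.append({"type": attrs.get("itemtype"), "properties": {}})
        elif "itemprop" in attrs and self.items:
            self._current_prop = attrs["itemprop"]

    def handle_data(self, data):
        # Store the first non-blank text that follows an itemprop start tag.
        if self._current_prop and data.strip():
            self.items[-1]["properties"][self._current_prop] = data.strip()
            self._current_prop = None

    def handle_endtag(self, tag):
        self._current_prop = None


def extract_items(html):
    """Return the list of items found in an HTML string."""
    parser = MicrodataExtractor()
    parser.feed(html)
    return parser.items


if __name__ == "__main__":
    for item in extract_items(PAGE):
        print(item["type"], item["properties"])
```

A real consumer would also need to handle nested items and values carried in attributes such as href or content, which this sketch ignores.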
Schema.org strikes me as a game changer in the linked data world. When the three biggest search engines tell you to do something in a particular way, who is going to do it differently? Some of the people at the LOD-LAM Summit seemed to agree, others disagreed, while still others weren’t sure what the effect would be, if any. But let’s look at history.
Standard Generalized Markup Language (SGML) was the first major foray into developing a machine-processable way to encode text documents. It was incredibly complicated and difficult to use. The take-up of it was limited to scholars desperate for a way to add semantics to text and the geekiest of geeks. Then, when the Web came along, some people realized it might be useful to develop a profile of SGML that would simplify structured markup and be useful in a Web context. Thus was born XML, which now serves as the plumbing for much of what happens on the Web. So where is SGML? Nowhere. No one even remembers that XML began life as an SGML profile.
So maybe that’s why I’m skeptical when Mike Bergman claims that this development will actually boost RDF instead of kill it. He sees RDF’s role “as a canonical means for expressing data of any form, and not necessarily as a data exchange format.” But there’s the rub. Who needs anything but a data exchange format? Keep your data internally however you want (as we all do all the time). It’s only when you want to exchange it that you have to adhere to a standard. And increasingly that standard will be defined by Schema.org, and not the RDF cognoscenti. Of that you can be certain. I can remember no precedent for people flocking to a more complicated way to achieve their ends when no additional benefit accrued.

I think that “The take-up of it was limited to scholars desperate for a way to add semantics to text and the geekiest of geeks.” might apply to RDF, but SGML was also taken up by legal and technical publishers and the military who spent big $$$ where there were VERY clear business cases for it. I’m not sure that’s true of RDF, which will make it even easier to kill off with something like Schema.org.
Schema.org’s model is basically RDF with a few tweaks here and there, so I can’t see how you can simultaneously believe that everyone will adopt it and that RDF is doomed by being over-complex.
The major weakness of Microdata is that it does not define its context, and each page with Microdata becomes just another record (I’d mention MARC here, but I won’t). Fine for search engines, but rather uninteresting for any kind of persistent data format.
RDF defines its context (its subject), relations and content, that’s enough.
Exchange formats? I think I need my data in something that is also reliably queryable, interpretable and usable. Oddly, RDF — because it is a relatively stable and widely used technology — fits the bill. Incidentally, it can also be used as an exchange format because it defines its context — microdata doesn’t, and therefore can’t.
Roy,
Now that RDF libraries can consume microdata, do you feel the same way?
https://github.com/edsu/rdflib-microdata
Microdata as a technique for embedding semantically meaningful information is not inconsistent with the RDF data model nor with the linked data pattern, as these libraries demonstrate.
What this says to me is that the RDF folks should use microdata as an opportunity to underscore that RDF is not a format but a model.
Mike: Do I still feel that, all things being equal (or close enough), simpler wins? Of course I do. Will the vast majority of web folk care about RDF? I’m not convinced. Not when they can ignore it and none of the search engines (read traffic, dollars, notice) will care.
Now does this mean that others like yourself should abandon RDF? Not if it does useful work for you, of course not. But just don’t expect to be leading a big parade.
I agree with the two Mikes that schema.org is quite good news for the RDF community. Here’s our contribution to that effort: http://schema.rdfs.org/
Regarding your historic perspective, it’s worth pointing out XML was developed by SGML veterans, and wouldn’t have been possible without the 30 years of structured markup experience collected by the GML and SGML community.
It’s also worth noting that XML is disappearing from the web, being displaced by JSON and HTML5:
http://blog.programmableweb.com/2011/05/25/1-in-5-apis-say-bye-xml/