Why Microdata, Not RDF, Will Power the Semantic Web

By Roy Tennant on February 28, 2012

Twelve years ago I basically called the Resource Description Framework (RDF) “dead on arrival”. That was perhaps too harsh an assessment, but I had my reasons, and since then I haven’t had much cause to regret those words. Admittedly, there is a great deal more data available in RDF-encoded form now than there was then.

But I’m still waiting for the killer app. Or really any app at all. Show me something that solves a problem or fulfills a need I have that requires RDF to function. Go ahead. I’ll wait.

Oh, you’ve got nothing? Well then, keep reading.

While RDF is complex, and designed to be implemented as a stand-alone depiction of metadata, it does have an implementation that is designed for embedding in web pages: RDFa. On the other hand, microdata is relatively simple and solely designed to be embedded in web pages. While the metadata cognoscenti are in the RDF camp, Google, Microsoft, and Yahoo! have thrown their lot in with microdata by launching the Schema.org effort. Were I a betting man, I wouldn’t be backing RDF at this point.

Here are the key reasons why I think microdata will end up powering the semantic web:

There is a clear incentive to use microdata. Unlike RDF, there is a clear incentive today to use microdata in your web pages. Search engines such as Google are using this data to make searching better.
It will increasingly come along for free. Content management systems that many of us use, such as Drupal, are moving to support microdata. This will increasingly mean that even those who don’t know anything about microdata may still be exposing it on the web. This kind of frictionless implementation is what is needed to bring the kind of widespread use of semantics that would be required to create the “semantic web”.
The big guns are behind it. If you had to name the largest Internet companies it’s likely that Google, Yahoo! and Microsoft would be on the list. They have apparently ignored RDF and RDFa in favor of Schema.org.
There is no clear reason to use RDFa over microdata. What does RDFa buy you that microdata doesn’t? Heck if I know, but feel free to make me smarter with a comment below.
It’s simple. This is my favorite reason, actually. I’m a big fan of simple, especially when it gets the job done just as well or nearly as well as something more complex.
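To give a feel for how little machinery microdata needs, here is a minimal sketch of reading it back out of a page with nothing but the Python standard library. The sample markup is made up for illustration (it reuses a book title from the author bio), the class and function names are hypothetical, and the parser only handles a single, non-nested item. It is not an implementation of the full microdata extraction algorithm.

```python
from html.parser import HTMLParser

# Illustrative page fragment; the markup is invented for this sketch.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Book">
  <h1 itemprop="name">XML in Libraries</h1>
  <span itemprop="author">Roy Tennant</span>
  <meta itemprop="datePublished" content="2002">
</div>
"""


class FlatMicrodataParser(HTMLParser):
    """Collects itemprop values from one flat (non-nested) microdata item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._pending = None  # itemprop name still waiting for its text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "content" in attrs:
            # <meta itemprop=... content=...> carries its value in an attribute.
            self.properties[prop] = attrs["content"]
        else:
            # Otherwise the value is the element's text, seen in handle_data.
            self._pending = prop

    def handle_data(self, data):
        text = data.strip()
        if self._pending and text:
            self.properties[self._pending] = text
            self._pending = None


def extract_item(html):
    """Return (item type URL, {property name: value}) for the sample item."""
    parser = FlatMicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.item_type, parser.properties


if __name__ == "__main__":
    print(extract_item(SAMPLE_HTML))
```

A real consumer would also have to deal with nested items, repeated properties, and URL-valued attributes such as href and src, which this sketch deliberately ignores.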

I’m reminded of SGML vs. XML. For years SGML struggled to achieve widespread adoption. For every SGML aficionado there were a dozen others for whom it may have been useful who couldn’t or wouldn’t climb the learning curve. Then XML came along, a simpler and more tightly scoped standard that was the death knell for SGML. Sound familiar? It should.

Filed Under: Cataloging and Metadata, Digital Libraries, Information Technology, Roy Tennant: Digital Libraries, Standards

About Roy Tennant

Roy Tennant is a Senior Program Officer for OCLC Research. He is the owner of the Web4Lib and XML4Lib electronic discussions, and the creator and editor of Current Cites, a current awareness newsletter published every month since 1990. His books include "Technology in Libraries: Essays in Honor of Anne Grodzins Lipow" (2008), "Managing the Digital Library" (2004), "XML in Libraries" (2002), "Practical HTML: A Self-Paced Tutorial" (1996), and "Crossing the Internet Threshold: An Instructional Handbook" (1993). Roy wrote a monthly column on digital libraries for Library Journal for a decade and has written numerous articles in other professional journals. In 2003, he received the American Library Association's LITA/Library Hi Tech Award for Excellence in Communication for Continuing Education. Follow him on Twitter @rtennant.

Comments

Rob Sanderson says:

February 28, 2012 at 6:58 pm

1. Extensibility. RDF is extensible to cope with different models, whereas schema.org is not.
2. It’s not that complicated. If you can understand the simplest of relational models, you can just as easily understand RDF.
3. It exists. Wikipedia is in the process of incorporating dbpedia back into itself. Google Freebase holds billions of triples. The BBC uses it every day, as does the Library of Congress. Just go and look at the Linked Data Cloud.
4. The Big Guns are behind it. Like… the W3C, IBM, etc. This is about data and processing, not about SEO tricks.

— Rob
- eric says:
  
  February 29, 2012 at 1:59 pm
  
  1: In what sense are microdata not extensible? They’re fundamentally extensible.
  
  2: It doesn’t have to be very complicated to be infeasible by comparison with a format that people can use just by pulling down a style selector in their WYSIWYG content editor.
  
  3: Microdata exists. It’s existed since people started writing.
  
  4: Post-Gerstner, IBM will be behind anything it can sell, and will put money into anything it can defray development cost on. That is why they remain relevant.
  
  Microdata is not about “SEO tricks” — it’s about human usability. Microdata is both easy to create and inherently human-readable, human-discoverable. The killer data formats of the future will be ones that people and bots can discover without having to load it as a separate file or load a heavy schema.
Martin Haye says:

February 28, 2012 at 7:47 pm

You nailed it, Roy. At code4lib I felt kind of alone in disliking RDF; now I don’t have to feel that way. In complex environments, in my experience, the simple solution wins in the end. Microdata is far simpler, and the web is really complex.
Bruce says:

February 28, 2012 at 8:29 pm

Didn’t schema.org and/or Google recently announce support for RDFa?
- Jason Ronallo says:
  
  February 28, 2012 at 8:54 pm
  
  Bruce, the schema.org partners announced support for RDFa Lite, a simplified profile of RDFa.
  http://blog.schema.org/2011/11/using-rdfa-11-lite-with-schemaorg.html
  RDFa Lite is very similar to Microdata.
Gabriela says:

February 28, 2012 at 9:30 pm

How good is it to choose simplicity, and a language used by the principal search engines, over a more expressive and powerful one?
Ed Summers says:

February 29, 2012 at 12:46 am

I think that the crawling of metadata that is going on at Google, Yahoo, Microsoft and Facebook is actually one of the first widely deployed killer apps for the “semantic web” we have seen. Paste the URL for The Artist into your status update box on Facebook (http://www.imdb.com/title/tt1655442/). What happens? Facebook grabs the URL, parses RDFa in the page, and determines that the URL is a movie resource, with a particular title, description and thumbnail. The Facebook URL Linter lets you see how this metadata was extracted and from where: https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt1655442%2F I could be wrong, but I think this is what the Web is going to look like more and more. Not that RDFa or Microdata is going to win; but that semantics published along with our HTML is going to drive new functionality in the applications we use every day…and we probably won’t recognize it as the Semantic Web. Take a look at Facebook’s introduction of ‘verbs’ for typing how Web resources are related on the Web: https://developers.facebook.com/docs/opengraph/keyconcepts/#actions-objects if you want to see where things are headed.

As Manu Sporny pointed out recently, Google is actively crawling RDFa + Schema.org data in addition to microdata + schema.org. I think Mark is right: you are setting up a false dichotomy, and these are ultimately complementary technologies. RDF is a data model for talking about descriptions of Web resources, and how these resources are related. microdata is a syntax for expressing that data in HTML. RDFa is another way. RDF/XML is a way of expressing the same data in XML. There are formats for doing the same in JSON. I think pitting these technologies against each other is an easy way out of trying to understand how they work and where they are useful. For a really thoughtful talk on this topic check out Jeni Tennison’s recent keynote at XML Prague, which has video available.
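The distinction between a data model and its syntaxes can be made concrete with a small sketch. The Python below holds one set of statements as plain tuples and writes them out three ways: as N-Triples, as microdata, and as RDFa Lite. Everything here is an illustrative assumption: the example.org identifier is hypothetical, the helper names are invented, and the code assumes every property comes from the schema.org vocabulary and every value is plain text. It is a toy, not a conforming serializer for any of these formats.

```python
from html import escape

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"
BOOK = "http://example.org/book/1"  # hypothetical identifier

# One data model: (subject, predicate, object, object_is_uri)
TRIPLES = [
    (BOOK, RDF_TYPE, SCHEMA + "Book", True),
    (BOOK, SCHEMA + "name", "XML in Libraries", False),
    (BOOK, SCHEMA + "author", "Roy Tennant", False),
]


def to_ntriples(triples):
    """Write the statements as N-Triples, one statement per line."""
    lines = []
    for s, p, o, is_uri in triples:
        if is_uri:
            obj = f"<{o}>"
        else:
            # Minimal literal escaping, enough for the sample values only.
            obj = '"' + o.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def _split(triples):
    """Pull out the subject, its type, and (short property name, value) pairs."""
    subject = triples[0][0]
    item_type = next(o for _, p, o, _ in triples if p == RDF_TYPE)
    props = [(p[len(SCHEMA):], o) for _, p, o, _ in triples if p != RDF_TYPE]
    return subject, item_type, props


def to_microdata(triples):
    """Write the same statements as HTML microdata attributes."""
    subject, item_type, props = _split(triples)
    rows = [f'<div itemscope itemtype="{escape(item_type)}" itemid="{escape(subject)}">']
    rows += [f'  <span itemprop="{escape(n)}">{escape(v)}</span>' for n, v in props]
    rows.append("</div>")
    return "\n".join(rows)


def to_rdfa_lite(triples):
    """Write the same statements as RDFa Lite attributes."""
    subject, item_type, props = _split(triples)
    short_type = item_type[len(SCHEMA):]
    rows = [f'<div vocab="{SCHEMA}" typeof="{escape(short_type)}" resource="{escape(subject)}">']
    rows += [f'  <span property="{escape(n)}">{escape(v)}</span>' for n, v in props]
    rows.append("</div>")
    return "\n".join(rows)


if __name__ == "__main__":
    for render in (to_ntriples, to_microdata, to_rdfa_lite):
        print(render(TRIPLES), end="\n\n")
```

The two HTML outputs differ only in attribute names, which is roughly the sense in which microdata and RDFa Lite are described in this thread as very similar.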

I do agree that, at least for me, microdata is easier to compose than RDFa, and I think it will fit most HTML authors’ brains a bit easier than RDFa. As Rob points out, microdata is not as extensible as RDFa. But this was intentionally done, and may serve it well in some scenarios, where authors would prefer some guidance on what they can say instead of being able to say anything they like. RDFa1.1 may very well end up packaging things up so that it’s easier to digest. But some people may never be able to see past the “RDF” in “RDFa” because they’ve already made up their minds…maybe 12 years ago :-) But that’s OK, because they can happily use microdata, and little do they know, they are doing pretty much the same thing :-)

I think this is what they call a win-win…or something. The Web is a complex, complicated and beautiful ecosystem. Let’s not trivialize it with us vs them arguments just to get a blog post out. You’re bigger than that royt :-)
- Roy Tennant says:
  
  February 29, 2012 at 10:51 am
  
  Ed: Thanks for an awesome reply. I’m tempted to make you write my blog from here on out. But before that happens, I want to say something in my defense. My real purpose for this post wasn’t to say that RDF is dead, although that apparently came through louder and clearer than I intended, but that the idea that RDF is the sole path to the Semantic Web, as it has been depicted by many, is wrong.
  
  Rather, in the end I believe it will be microdata which will lead us to a (lowercase) semantic web, which arguably has forced the RDF community to accommodate it by coming up with a microdata-style version of itself. Sure, there will be a rich ecology going forward, but expecting everyone with useful data to do RDF is not a productive way forward. Providing simple syntaxes and tools so that everyone can get involved is.
- Ross Singer says:
  
  February 29, 2012 at 11:37 am
  
  I think the “idea that RDF was the sole path to the Semantic Web” mainly arose from the fact that, until schema.org came along to figure out how to standardize using microdata, there were simply no other alternatives for effectively making a web of data (microformats was never really a contender here, because it was still, effectively, document-based, not graph-based).
  
  Given that an RDFS mapping has been made for schema.org (http://schema.rdfs.org/), what we’re looking at, actually, is a massive infusion of data available now to the *RDF* community (although not necessarily the other way around).
  
  I’ve said elsewhere that I picture microdata/schema.org as the gateway drug to RDF. For the majority of people’s use cases, it might be perfectly sufficient. However, if they run into limitations (whether it be that the vocabularies aren’t sophisticated enough or simply that they have no desire to render their data solely as HTML), it’s easy then to transition to RDF (especially if the data is well-modeled in the first place).
  
  I realize it seems an easy analogy to compare microdata and RDF to XML and SGML, but it’s just as easy to make the comparison to HTML and XML, and the latter is actually probably a much better analogy.
  
  Like Ed mentioned, your killer app is here – between Facebook and Google, you’re seeing what the giant global graph can do, and it’s pretty breathtaking, in my opinion. To be sustainable, both companies are going to need as much structured data as they can get and as other players (esp. regular old developers) see the benefits of getting an API for free, you’re only going to see more and more sophisticated uses of this. And a rising tide lifts all boats, so you’re also going to start seeing a lot more RDF as people’s needs extend beyond HTML/microdata.
  
  I agree, this is win-win.
Roy Lachica says:

February 29, 2012 at 4:53 am

RDF is already in widespread use. (See http://www.webnodes.com/who-uses-semantic-tech-today). Microdata and Schema.org are only part of a bigger picture. That said, I agree that Microdata is probably going to be dominant when it comes to search engines and search result pages.

When it comes to killer apps for the Semantic Web it is just like trying to answer the question of what is the killer app for NoSQL databases or what is the killer app for Cloud Computing? It doesn’t make sense to answer that because they are at the lowest level in the technology stack. You can use it for whatever you want and they help to solve some generic issues such as scalability. They build on previous technologies and enable better scalable apps and services to be developed faster.

RDF is not yet mainstream when it comes to storing, sharing and integrating data. For RDF to be useful you have to have other RDF producers/consumers since RDF is all about making sense of data that is shared. As with the railroad system it is not very useful until businesses/vendors start to develop a market around it. We are still early in the adoption phase. To replace the databases out there takes many years. Facebook is not very useful when few people are using it. However when the uptake reaches a certain critical mass the growth is exponential. My impression is that RDF adoption shows no signs of slowing down.
celsowm says:

February 29, 2012 at 10:35 am

RDF and Annotations (RDFa, Microdata…) always “coexist”! Each “technology” has a different place in the “semantic stack”.
Dan Brickley says:

March 7, 2012 at 2:01 pm

Roy, this is a disappointingly tabloid-style treatment of a much more nuanced situation.

Just reading http://schema.org/docs/datamodel.html shows the fundamental affinity with RDF – “The data model used is very generic and derived from RDF Schema”. Not to mention the hours of dialog that went into improving RDFa 1.1, introducing a “Lite” flavour http://blog.schema.org/2011/11/using-rdfa-11-lite-with-schemaorg.html that builds upon Microdata’s simplicity, which itself was initially built upon a rejigged RDFa, see http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-May/019681.html

Microdata and RDFa are very close cousins, and schema.org’s usage (attaching a schema system based on RDF/S) brings them closer. There are many more interesting things you could explore around this deployment, instead of using it as a stick to hit RDF or the Semantic Web community with. The reality of standards work isn’t “X is dead”, or “Y killed Z”; it’s debate, dialog and incremental progress. Sorry if that doesn’t give you good headline fodder!

Meanwhile, … “E-books that require a particular device to read them are only now hitting the market, which means I’m marking them dead on arrival.”

Now you’re talking!
Danny Ayers says:

March 7, 2012 at 3:54 pm

Regarding microdata vs. RDF – what they said!

But one aspect that I think is worth flagging up is that the main product yielded by both microdata and RDF technologies is data about *things* in a Web-friendly form. It’s closer to the material contained in traditional databases. This is a paradigm shift from traditional Web documents, primarily designed for human comprehension. The paradigm shift is what the Semantic Web is all about.

It’s a paradigm shift that’s been happening steadily over the last twelve years with developments around RDF, and microdata is just another step in the direction of the Web as a general-purpose information store.

HTML-embedded data is something that’s easy to create and can provide fairly immediate benefits, like Google’s Rich Snippets. But this is a very superficial application of the technology. Where it starts to get interesting is when you start combining and remixing information from different sources.

This is already well-established in various specialized domains – notably biotechnology – but is only slowly creeping into the mainstream. There is already an abundance of data to draw on, check the Linked Open Data cloud.

Microdata offers an easy way in to Semantic Web technologies for builders of HTML-oriented applications. It’s likely to provide a huge amount more data on the Web. The work that’s been done around RDF in the past 12 years offers approaches to doing useful things once you have such data.

I can only reiterate what others have said regarding killer apps. But I can give you an example of what RDF did for me today: I was able to use live data from the Web to answer an incredibly arbitrary question (do popular musicians die at the age of 27 more than any other age). I ran a SPARQL query through a browser into dbPedia, copied the results into a local spreadsheet to generate a chart. Very soon after I mentioned this on G+, Kingsley Idehen demonstrated how I didn’t actually need the local spreadsheet. Because the information was available as linked data, it could be hooked into Google Spreadsheets directly. Ok, this was just for fun. But there are a lot of questions of a similar nature that might come under the umbrella of Business Intelligence.

http://dannyayers.com/2012/03/07/Debunking-the-27-Club-with-SPARQL
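For readers who want a feel for the workflow described in the comment above, here is a minimal sketch in Python. The query text is only an assumption about how such a question might be posed to DBpedia (it is not the query the commenter ran, and it is not executed here), the function names are hypothetical, and the date rows are synthetic placeholders rather than real results. The runnable part just shows the tallying step that the spreadsheet performed.

```python
from collections import Counter
from datetime import date

# Assumed shape of a query for musicians' birth and death dates.
# Shown for illustration only; this sketch never sends it anywhere.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?artist ?birth ?death WHERE {
  ?artist a dbo:MusicalArtist ;
          dbo:birthDate ?birth ;
          dbo:deathDate ?death .
}
LIMIT 1000
"""

# Synthetic placeholder rows (birth, death), NOT data from DBpedia.
PLACEHOLDER_ROWS = [
    ("1940-01-15", "1967-06-01"),
    ("1950-09-10", "1977-03-02"),
    ("1930-05-05", "2001-05-05"),
    ("1945-02-02", "1972-02-03"),
]


def age_at_death(birth_iso, death_iso):
    """Whole years between two ISO dates (YYYY-MM-DD)."""
    born = date.fromisoformat(birth_iso)
    died = date.fromisoformat(death_iso)
    # Subtract one year if the final birthday had not yet been reached.
    not_reached = (died.month, died.day) < (born.month, born.day)
    return died.year - born.year - not_reached


def tally_ages(rows):
    """Count how many rows fall on each age at death."""
    return Counter(age_at_death(b, d) for b, d in rows)


if __name__ == "__main__":
    for age, count in sorted(tally_ages(PLACEHOLDER_ROWS).items()):
        print(age, count)
```

With real query results in place of the placeholder rows, the resulting counts are what one would chart to see whether age 27 stands out.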
Juan Sequeda says:

March 14, 2012 at 10:09 am

The Semantic Web has gone mainstream, wanna bet?
http://semanticweb.com/the-semantic-web-has-gone-mainstream-wanna-bet_b27329
Mark Andrews says:

March 14, 2012 at 7:04 pm

I’m only interested in what Cycorp is up to.
