May 19, 2013

"Scholarly" HTML

In an attempt to free scholarship from the bindings of Adobe Acrobat, there is an ad hoc group working to define a “scholarly” version of HTML5. Why does there need to be a special version of HTML5, you ask? Good question. It is so that the data in an article can be marked up with semantic information, which allows software to understand and re-purpose that data.

For example, markup like this tells you that what is between the tags is the name of the creator of the piece according to the use of that term by the Dublin Core community:

<span property="http://purl.org/dc/terms/creator">Sam Adams</span>
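
To make the "re-purposed using software" part concrete, here is a minimal sketch in Python of how a program might pull such annotations out of a page. It assumes the annotation is carried in a property attribute, as in RDFa. The names PropertyExtractor and extract_properties are hypothetical, invented for this illustration, and are not part of any ScHTML tooling. A real application would more likely rely on a full RDFa parser.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property URI, text) pairs from elements that carry a
    `property` attribute. Hypothetical helper for illustration only."""

    def __init__(self):
        super().__init__()
        self.pairs = []       # finished (property, text) pairs
        self._current = None  # property URI of the element being read
        self._text = []       # text fragments seen inside that element
        self._depth = 0       # nesting depth of child tags inside it

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            # A child tag inside an annotated element: just track nesting.
            # Limitation: unclosed void tags such as <br> are not handled.
            self._depth += 1
            return
        prop = dict(attrs).get("property")
        if prop:
            self._current, self._text, self._depth = prop, [], 0

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        # The annotated element itself has closed: record its text.
        self.pairs.append((self._current, "".join(self._text).strip()))
        self._current = None


def extract_properties(html):
    """Return a list of (property URI, text) pairs found in `html`."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    sample = '<span property="http://purl.org/dc/terms/creator">Sam Adams</span>'
    for uri, text in extract_properties(sample):
        print(uri, "=>", text)
```

The point of the sketch is only that the program never has to guess which string is the author: the markup says so, using a term whose meaning is defined elsewhere.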

Here is an example of what a “full” ScHTML document might look like:

<html property="http://scholarly-html.org/schtml">
<body>

<span property="http://purl.org/dc/terms/title">Title</span>

<span property="http://purl.org/dc/terms/creator">Author</span>

<span property="http://purl.org/dc/terms/date" datatype="xsd:date">2011-03-12</span>
</body>
</html>

The group has the following principles:
  1. ScHTML is declarative about the information it contains and not imperative about the way the information is displayed or consumed by the scholar.
  2. ScHTML is scholarly because it addresses the every-day problems scholars have in conveying the outputs of their work and providing education. Fundamental to the standard is a community-led process of creating a broad range of tools for producing or consuming ScHTML for each specific requirement.
  3. ScHTML is a domain-specific application of the W3C HTML standard and tracks that standard so long as it supports the requirements of communicating knowledge in a declarative way within the scholarly community.
  4. ScHTML is not owned by anyone but is developed by the community through a democratic process. The community will be continually involved in developing vocabularies. ScHTML will define a very small core – the absolute minimum infrastructure (perhaps author and date) and how to define conventions.

It will be interesting to see where this group goes, and how rapidly it gets there.

About Roy Tennant

Roy Tennant is a Senior Program Officer for OCLC Research. He is the owner of the Web4Lib and XML4Lib electronic discussions, and the creator and editor of Current Cites, a current awareness newsletter published every month since 1990. His books include "Technology in Libraries: Essays in Honor of Anne Grodzins Lipow" (2008), "Managing the Digital Library" (2004), "XML in Libraries" (2002), "Practical HTML: A Self-Paced Tutorial" (1996), and "Crossing the Internet Threshold: An Instructional Handbook" (1993). Roy wrote a monthly column on digital libraries for Library Journal for a decade and has written numerous articles in other professional journals. In 2003, he received the American Library Association's LITA/Library Hi Tech Award for Excellence in Communication for Continuing Education. Follow him on Twitter @rtennant.

Comments

  1. Nick Ruest says:

    What happens when the black hat folks get a hold of the spec and exploit it?

    Viagra

  2. Not to sound like an old beard, but couldn’t this be done with LaTeX? http://www.latex-project.org/

    That is markup that leads to nonproprietary PDFs, or HTML, or PostScript for that matter. Scholarly HTML is a good idea, but couldn’t it be just another output from a more basic markup? After all, we are still stuck with the “bindings” of Microsoft Word.

  3. Peter Sefton says:

    @edward

    Scholarly HTML is a format – so of course you could create it with LaTeX. I look forward to seeing what the LaTeX community can contribute to this effort. Another back end format that is perfect for creating Scholarly HTML would be the NLM XML schema.

  4. Sheryl Stahl says:

    I’m just starting to learn about RDA – but isn’t that what that schema is trying to solve also?
