Sunday, October 23, 2005

CURIEs: A Compact URI Syntax / Semantic Links

It's increasingly common to see software using QNames to abbreviate URIs. But strictly speaking, QNames cannot represent all possible URIs since they were devised only to name XML elements. I recently proposed 'CURIEs' as a solution to this.

Introduction

If you use Wiki software or RDF, you'll be very familiar with using QNames to represent a full URL. For example, many Wikis support a feature where a prefix like isbn can be substituted for something like:

http://www.amazon.com/?isbn=
or:

http://www.barnesandnoble.com/?q=
When a Wiki author wants to make use of the substitution, they simply type something like this into their posting:

Go and buy T. V. Raman's [[isbn:0321154991][book on XForms]].
and the Wiki software will automatically generate:

Go and buy T. V. Raman's
<a href="http://www.amazon.com/?isbn=0321154991">book
on XForms</a>.
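
To make the substitution concrete, here is a rough sketch in Python. The language, the regular expression, and the names PREFIXES and expand_links are all my own choices for illustration; real Wiki engines will differ in the details. The only thing taken from the example above is the Amazon URL.

```python
import re
from html import escape

# Illustrative prefix table; only the Amazon URL comes from the example above.
PREFIXES = {"isbn": "http://www.amazon.com/?isbn="}

# Matches markup of the form [[prefix:reference][link text]].
LINK = re.compile(r"\[\[([^:\[\]]+):([^\[\]]*)\]\[([^\[\]]*)\]\]")


def expand_links(text, prefixes=PREFIXES):
    """Replace every [[prefix:reference][label]] with an HTML anchor."""

    def replace(match):
        prefix, reference, label = match.groups()
        base = prefixes.get(prefix)
        if base is None:
            # Unknown prefix: leave the markup exactly as the author typed it.
            return match.group(0)
        href = escape(base + reference, quote=True)
        return '<a href="{}">{}</a>'.format(href, escape(label))

    return LINK.sub(replace, text)


print(expand_links("Go and buy T. V. Raman's [[isbn:0321154991][book on XForms]]."))
```

Pointing every ISBN link at a different bookseller is then just a matter of changing one entry in the prefix table.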

The Problem

The problem is that isbn:0321154991 is not a valid QName. QNames were originally devised to provide a way of scoping XML element names with namespaces, so the part after the colon must itself be a legal XML name, and an XML name cannot begin with a digit.

Another example--and indeed the one that kicked off the whole thing when I was doing some metadata consultancy--involves the IPTC subject codes. The IPTC's taxonomy has a set of subject codes for news articles; to pick an example, the code 15002000 represents alpine skiing. The IPTC would like to represent these codes in their documents in a form that is compact, and that also makes it easy for news organisations to add their own codes. The obvious choice was to use QNames for this, since they allow different organisations to adopt their own namespaces to qualify the values. But as with our ISBN example, iptc:15002000 is not a valid QName.

Why these examples are not valid should become a little clearer if we look at the background to QNames. In ordinary XML (pre-namespaces) the following are all valid element names:

abcd
ab:de
ab::::cd:efg:h
They are valid because the colon is just another legitimate element name character. When namespaces were added to XML the authors were looking for a way to split a name up, and decided that the colon should be used to do this. These new element names--with a 'context' to precede the element name--were called 'qualified names', and the mechanism for 'expanding' them is simply to take the namespace represented by the part before the colon, and then append to it the part after the colon.

This is probably all easy-peasy stuff to you, but here is an example anyway; an element name like xforms:input denotes the input element within the namespace bound to the prefix xforms.
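
The rule can be checked mechanically. The sketch below is only an approximation: it uses an ASCII-only pattern for the two halves of a QName, whereas the real XML grammar also admits many non-ASCII letters. The names NCNAME, QNAME and is_qname are mine.

```python
import re

# ASCII-only approximation of an XML "no-colon name": it must start with a
# letter or underscore, and may continue with letters, digits, '.', '_' or '-'.
# (The real grammar allows many more characters; this is a simplification.)
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"

# A QName is an optional prefix plus colon, followed by a local part.
QNAME = re.compile("(?:" + NCNAME + ":)?" + NCNAME)


def is_qname(text):
    """Return True if text looks like a QName under the approximation above."""
    return QNAME.fullmatch(text) is not None


for candidate in ["xforms:input", "abcd", "ab:de", "ab::::cd:efg:h",
                  "isbn:0321154991", "iptc:15002000"]:
    print(candidate, is_qname(candidate))
```

Both isbn:0321154991 and iptc:15002000 fail because the part after the colon starts with a digit, and ab::::cd:efg:h fails because a QName allows at most one colon.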

The 'common convention' that has arisen is that this same syntax can be used to pass around a URI. So for example, if in RDF I want to use the Dublin Core creator property, then all I need do is this:

dc:creator
and provided that I have the dc namespace prefix defined as http://purl.org/dc/elements/1.1/, I have effectively represented the following URI:

http://purl.org/dc/elements/1.1/creator
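
In code, that expansion is nothing more than a lookup and a concatenation. This is a minimal sketch; the names NAMESPACES and expand are mine, and a real RDF toolkit would take its prefix declarations from the document rather than from a hard-coded table.

```python
# Prefix declarations; the dc mapping is the one given in the text.
NAMESPACES = {"dc": "http://purl.org/dc/elements/1.1/"}


def expand(qname, namespaces=NAMESPACES):
    """Expand prefix:local into a full URI by simple concatenation."""
    prefix, local = qname.split(":", 1)
    return namespaces[prefix] + local


print(expand("dc:creator"))
```
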
To recap, what we have done is used a technique that was devised specifically to scope XML elements, like this:

<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">
...
</dc:creator>
to represent a URI--in this case, this:

http://purl.org/dc/elements/1.1/creator
But as we've already observed, you cannot represent all possible URIs with QNames.

The Solution

We can resolve this problem pretty simply, by inventing a new datatype which allows the part after the colon to be anything. That's exactly what I have done with my Compact URI syntax (abbreviated to CURIE), making all of the following examples acceptable:

home:#start
joseki:
google:xforms or 'xml forms'
:#start
:
:xforms or 'xml forms'
Note that any valid QName remains a valid CURIE, but of course not all valid CURIEs are valid QNames.
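
Parsing a CURIE is correspondingly simple: split at the first colon and accept whatever follows. The sketch below illustrates that idea only and is not taken from the draft. The function name is mine, and treating a string with no colon at all as a reference with an empty prefix is my own assumption, made so that unprefixed QNames still parse.

```python
def parse_curie(curie):
    """Split a CURIE into (prefix, reference) at the first colon.

    Assumption: a string with no colon is treated as a reference with an
    empty prefix, so that unprefixed QNames still parse.
    """
    prefix, colon, reference = curie.partition(":")
    if not colon:
        return "", curie
    return prefix, reference


examples = [
    "home:#start",
    "joseki:",
    "google:xforms or 'xml forms'",
    ":#start",
    ":",
    ":xforms or 'xml forms'",
    "isbn:0321154991",
]
for text in examples:
    print(repr(text), "->", parse_curie(text))
```

Because only the first colon is significant, the reference is free to contain digits, spaces, quotes, or further colons.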

RDF/A, CURIEs and URIs

This is all pretty straightforward, and as I mentioned before, in relation to software like Wikis this is merely a codification of existing practice (I'm not saying that's a bad thing).

However, it all gets a little trickier when you put CURIEs into their intended context, which is a new syntax for RDF called RDF/A. I'll discuss it in another blog, but I'll just mention it now because a new draft has been produced as part of the ongoing process of incorporating RDF/A into XHTML 2. (A number of smart people from the RDF in XHTML Taskforce have been contributing to the thinking, and to this process of incorporation.)

Anyway, as a brief overview, the main idea of RDF/A is to allow XML documents to 'carry' their own metadata. There are many situations where you don't really need a separate document to carry your RDF/XML, since the main document carries much of the same data. This often happens with clumps of metadata like FoaF, but also often happens with RSS.

But whilst RDF often makes good use of QNames (and so too, RDF/A makes use of CURIEs), unfortunately when you move back to the XHTML side, you have to spell everything out as full URIs in your links. (One of the goals of RDF/A is, as Ben Adida memorably put it, to "bridge the Semantic and clickable webs".) So RDF/A allows the use of CURIEs to express ordinary inline links, but allows a processor to differentiate a CURIE from a URI by using square brackets:

<div about="">
This document is licensed under a
<a rel="[cc:license]" href="[lic:by-sa/2.0/]">
Creative Commons License
</a>
which, among other things, requires that you provide
attribution to the author,
<a rel="[dc:creator]" href="http://ben.adida.net">Ben Adida</a>.
</div>
This chunk of XHTML will generate the following triples:

<> cc:license lic:by-sa/2.0/
<> dc:creator <http://ben.adida.net>
As I discuss in the document, this is the part that I feel is most exciting about CURIEs, since now it means that navigable links can themselves be passed semantically. More on this and our work on a 'semantic web browser', in a future blog.
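
The square-bracket convention makes the processor's job easy to sketch. In the fragment below, the function name resolve is mine, and the cc and lic prefix mappings are placeholders on example.org, since the XHTML snippet above does not show its prefix declarations; only the dc mapping comes from earlier in this post.

```python
def resolve(value, prefixes):
    """Return a full URI for an attribute value that is a URI or a [CURIE]."""
    if value.startswith("[") and value.endswith("]"):
        # Bracketed: strip the brackets and expand the CURIE inside.
        prefix, _, reference = value[1:-1].partition(":")
        return prefixes[prefix] + reference
    # No brackets: the value is already an ordinary URI.
    return value


PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "cc": "http://example.org/cc#",          # placeholder, not the real namespace
    "lic": "http://example.org/licenses/",   # placeholder, not the real namespace
}

# The (rel, href) pairs from the two links in the XHTML above.
links = [
    ("[cc:license]", "[lic:by-sa/2.0/]"),
    ("[dc:creator]", "http://ben.adida.net"),
]
for rel, href in links:
    print("<>", resolve(rel, PREFIXES), resolve(href, PREFIXES))
```

An href without brackets passes through untouched, which is why existing links keep working unchanged.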



5 Comments:

Anonymous Julien Couvreur said...

The ability for a user to control how links should be "resolved" is pretty interesting. I had been pondering something like this in a post on Open-ended links and link re-writing.

Two things I don't get about CURIE references:
-is the browser supposed to resolve these references, or is the server software (wiki or other) supposed to do it?
-how do you support more than one parameter?

firstname.lastname@gmail.com

October 25, 2005 10:42 PM  
Blogger Mark Birbeck said...

Julien,

I read your article and it is spot-on. It's exactly what I have been thinking about. The only difference with CURIEs is that I'm taking a sub-set of what you could do, and defining it for a set of pretty well-defined use cases.

So this means that there are no parameters in this proposal, although I do agree that you need them. The idea of CURIEs is not really to do 'redirecting'.

As for where the resolving is done, I would say that it's application specific. You could do it on a server, and pass the transformed links to a browser as XHTML 1 (or even HTML 4). Or you could pass them through unchanged, and let the browser take care of it all.

Anyway, thanks for your comment, and once again, I liked your article.

Mark

October 26, 2005 2:05 PM  
Blogger cow said...

Hi,

I recommend OCLC for ISBN linking which is more neutral with regards to commerce. :)

October 30, 2005 5:13 PM  
Anonymous Steve Pomeroy said...

Perhaps I'm a bit confused here, but isn't someone who does:


Go and buy T. V. Raman's [[isbn:0321154991][book on XForms]].


attempting to use isbn: as an unofficial URI scheme, not a QName? At least for this example, this appears to be more of a place for URNs than a new syntax:

http://www.w3.org/TR/uri-clarification/

It seems like the wiki is not attempting to abbreviate the full Amazon ISBN links, but is instead attempting to come up with an internal representation of ISBN numbers that could then be rendered to URLs when served to web browsers. The Wikimaster could, if they wanted, point all their ISBN renderings at another website by changing the URL template.

November 01, 2005 3:48 PM  
Anonymous oo3 man said...

I liked your article.

thanks

November 22, 2005 12:51 AM  
