Monday, December 10, 2007

RDFa, @profile, and following your nose

One of the issues that people continually return to in relation to metadata in HTML (and with RDFa in particular) is how to know when it is safe to interpret some mark-up as 'additional' metadata. This is often termed 'following your nose', since the idea is that you should be able to obtain, from the document you are processing, the information you need to add an extra layer of interpretation to the mark-up.

(This shouldn't be confused with the other form of 'following your nose', practised by Gogol's Major Kovalyov, which is an altogether more fraught experience.)

For example, if we were processing an HTML document that contained the hCard microformat, we might have:

<span class="fn">Mark Birbeck</span>
The nose-related question being posed is: how do we know that it is legitimate to interpret the value called out by "fn" as a full name? Is it not possible that on some occasions an author has simply created a @class value of "fn" to help their CSS?

@profile

One way this question is answered in the world of microformats is to use the profile attribute to 'scope' these values; if the author has added the following to the top of their document:

<head profile="http://www.w3.org/2006/03/hcard">
then it is pretty clear that they want "fn" values to be interpreted as a 'full name', and it would be quite legitimate for a parser to extract this value and do something with it. The complete mark-up might look like this:

<html>
<head profile="http://www.w3.org/2006/03/hcard">
.
.
.
</head>
<body>
<span class="fn">Mark Birbeck</span>
</body>
</html>
This is a good solution for microformats, since there are quite a few of them. Each microformat has different rules for parsing, so indicating the presence of one or more microformats in the profile attribute works well, and is very much in the spirit of the original meaning of @profile in HTML.
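
The gating described above can be sketched in a few lines of Python. This is only an illustration built on the standard library's html.parser, not how any particular microformat parser works; the names ProfileGatedFn and full_names are invented for the example, and it assumes well-formed mark-up with no unclosed elements inside the "fn" element.

```python
from html.parser import HTMLParser

# Profile URI taken from the example mark-up above.
HCARD_PROFILE = "http://www.w3.org/2006/03/hcard"


class ProfileGatedFn(HTMLParser):
    """Hypothetical helper: remember class="fn" text and the declared profiles."""

    def __init__(self):
        super().__init__()
        self.profiles = set()   # URIs listed in <head profile="...">
        self.candidates = []    # text of each element whose class includes "fn"
        self._depth = 0         # greater than 0 while inside an "fn" element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            # @profile holds a whitespace-separated list of URIs.
            self.profiles.update((attrs.get("profile") or "").split())
        if self._depth:
            self._depth += 1
        elif "fn" in (attrs.get("class") or "").split():
            self._depth = 1
            self.candidates.append("")

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.candidates[-1] += data


def full_names(markup):
    """Return the "fn" values, but only if the author declared the hCard profile."""
    parser = ProfileGatedFn()
    parser.feed(markup)
    parser.close()
    if HCARD_PROFILE not in parser.profiles:
        return []  # "fn" may just be a CSS hook, so make no claim about it
    return [text.strip() for text in parser.candidates]
```

Given the complete mark-up shown earlier, full_names returns the one name; remove the profile attribute from the head element and it returns an empty list, even though the span is unchanged.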

But what of a generic solution like RDFa? Does the same situation arise?

RDFa is unambiguous

RDFa uses new attributes, as well as full identifiers for properties and values, so the problem of ambiguity does not arise. To make use of the FOAF property name, an author would declare a namespace prefix, and then use any values that are enabled by that prefix, as follows:

<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
.
.
.
<body>
<span property="foaf:name">Mark Birbeck</span>
</body>
</html>
This is about as good as it gets, since it makes use of existing, well understood XML techniques to define properties unambiguously. And since it is using the new property attribute, it is nigh on impossible that we would happen upon this mark-up in a situation where the author intended some non-RDFa meaning. In other words, RDFa provides a far better solution than the usual 'follow your nose' approach, which would have us load some kind of profile (as practised by microformats and GRDDL). RDFa requires no other indications in the document to help discern meaning, only a few simple, generic rules.
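
To make the 'few simple, generic rules' concrete, here is a rough Python sketch that expands a prefixed @property value into a full identifier using the namespace declarations in scope. It is deliberately far short of a real RDFa processor (no subjects, no @about, @rel or @content), the names PropertyExtractor and extract_properties are made up for the example, and it assumes well-formed XHTML with lowercase prefixes, since html.parser lowercases attribute names.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Hypothetical helper: collect (property identifier, text) pairs."""

    def __init__(self):
        super().__init__()
        self.scopes = [{}]  # stack of prefix maps; the top one is in scope
        self.open = []      # per open element: index into self.pairs, or None
        self.pairs = []     # [full property identifier, accumulated text]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Inherit the prefixes in scope, then add any declared on this element.
        prefixes = dict(self.scopes[-1])
        for name, value in attrs.items():
            if name.startswith("xmlns:") and value:
                prefixes[name[len("xmlns:"):]] = value
        self.scopes.append(prefixes)

        index = None
        prefix, colon, local = (attrs.get("property") or "").partition(":")
        if colon and prefix in prefixes:
            # Expand prefix:local into a full identifier.
            self.pairs.append([prefixes[prefix] + local, ""])
            index = len(self.pairs) - 1
        self.open.append(index)

    def handle_endtag(self, tag):
        if self.open:
            self.open.pop()
            self.scopes.pop()

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries a property.
        for index in self.open:
            if index is not None:
                self.pairs[index][1] += data


def extract_properties(markup):
    parser = PropertyExtractor()
    parser.feed(markup)
    parser.close()
    return [(iri, text.strip()) for iri, text in parser.pairs]
```

Run over the FOAF example, this yields the pair ("http://xmlns.com/foaf/0.1/name", "Mark Birbeck"). Run over the hCard example, it yields nothing at all, because class="fn" is simply not something these rules look at.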

Indicating the presence of metadata

However, there is a separate problem that is often raised in conjunction with RDFa, which is how to know whether to actually parse the document or not; if an author has not included any RDFa, why spend time processing it?

This is not a problem unique to RDFa, being constantly raised in relation to all sorts of document types, since there are of course many situations where it would be preferable to know something about the content of a document before opening it and processing it.

But that should alert us to something: just because it would be desirable in some situations to avoid the extra processing, it does not follow that a flag to indicate the presence of RDFa should be hard-coded into the syntax and made mandatory, as many are suggesting. In fact, some argue that without such an indicator, the whole structure of RDFa will collapse, which is simply not the case. Since RDFa is unambiguous, detecting its presence is merely about saving on processing time, and is therefore best seen as a use-case. And as a use-case it is no more or less important than other use-cases, such as those that might involve the processing of every single HTML document.

The confusion that has arisen around the use of @profile in RDFa is caused by merging two distinct issues: ambiguity of meaning, and 'detecting' the presence of RDFa.

Confusion

When using microformats, the question of ambiguity and the 'presence' of microformats are one and the same. This is necessarily the case because microformats make use of existing HTML attributes, such as class and abbr, but populate them with values that are indistinguishable from the ordinary use of such attributes. It is therefore imperative that a microformat parser (or indeed a GRDDL processor) does not process the values unless it is given some indicator that it is safe to do so.

RDFa, on the other hand, uses new attributes, so the presence or otherwise of RDFa is clear. And in the cases where existing HTML attributes are used (namely @rel and @rev), RDFa values are 'scoped' anyway, so the ambiguity question does not arise.

This means that if an RDFa processor were to process every document it found, it would be extremely unlikely to come to any 'conclusions' that were invalid. That doesn't mean that every processor should parse every document, but it does show that the question of 'presence' relates to optimisation rather than being a fundamental issue.
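
One way to picture 'presence' as an optimisation is a cheap textual pre-check that a crawler might run before handing a document to a full parser. The sketch below is an assumption-laden illustration: the function name might_contain_rdfa is invented, and the attribute list is simply a guess at the attribute names that RDFa introduces.

```python
import re

# Assumed list of attribute names introduced by RDFa; adjust as needed.
RDFA_ATTRIBUTE = re.compile(
    r"\b(?:about|property|resource|datatype|typeof)\s*=", re.IGNORECASE
)


def might_contain_rdfa(markup):
    """Cheap scan for RDFa-looking attributes, used only to decide whether to parse."""
    return RDFA_ATTRIBUTE.search(markup) is not None
```

A false positive here (say, the text "about =" appearing in ordinary prose) costs nothing but parsing time, because the parser itself will not find any RDFa. A document that uses only @rel or @rev with prefixed values would slip past this particular check, which is one reason to treat such a hint as a convenience rather than part of the syntax.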

Conclusion

RDFa does not have the same problem as microformats in relation to ambiguity. So whilst the use of @profile in microformats is a good way to resolve the ambiguity problem in that context, it is simply irrelevant for RDFa. That doesn't mean we might not wish to indicate when it is worth parsing a document for RDFa, and one way to do this may be to use @profile. But we have to be clear that ambiguity of terms and detecting the presence of content are two separate problems.

2 Comments:

Blogger ila nivas said...

A GRDDL transformation to extract RDF triples has to be included in the RDFa document in order to make RDFa useful for semantic web applications. The presence of a GRDDL transformation in @profile indicates that the document must be submitted to a GRDDL agent in order to get RDF triples. This is true for both RDFa and microformats. Therefore it is not necessary to make an explicit provision to indicate that a document has RDFa.

Agents that must know whether a document is an RDFa document can tell this from the DOCTYPE.

Not all documents on the WWW will adhere to the W3C XHTML+RDFa validity seal or include a GRDDL transformation, yet they may still include RDFa. Intelligent search agents that are aware of the new attributes defined in the RDFa specifications should therefore glean RDF triples from such documents to build a taxonomy. A search engine may be intelligent enough to submit the document to an RDFa agent in order to extract valid RDF triples, or it may simply take the document URI as the subject for the property.

December 13, 2007 2:53 PM

Blogger Mark Birbeck said...

Hi Ila,

GRDDL is just one way of extracting information, but there is nothing about RDFa that says it must be parsed with a GRDDL processor. In fact, there are about four or five parsers that I'm aware of, that have nothing to do with GRDDL.

Regards,

Mark

December 13, 2007 3:35 PM  
