- RDF made me feel stupid -- Maybe you won't have to
- Why have I been so interested in learning about RDF?
- Specific Problems to Solve with RDF
- semantic interoperability among various XML specifications
- representing disparate data formats and materials in the Scholar's Box
- extensibility of XML specifications
- Besides solving specific problems, RDF is getting some traction in a number of places. Hence it might be time to see what all the fuss is about.
- A quick note on my own background
- My current understanding of RDF
- RDF is promising technology in spite of all the confusing hype around it.
- RDF is not a monolithic topic. RDF can be used independently of the Semantic Web. RDF is not inherently tied to XML.
- The RDF triple concept is a simple, elegant, and seemingly powerful one at its heart.
- RDF/XML is obscure to the uninitiated and makes it easy to confuse the relationship between RDF and XML.
- RDF Tools help a lot to make RDF understandable -- and usable.
- RSS 1.0 is a good place to start with RDF.
- Non-hype filled assessments of RDF and the somewhat related Semantic Web are hard to find.
- Blending RDF vocabularies is probably easier than blending XML vocabularies but it's not magic either! Some human must do the mapping of meanings between vocabularies.
- Too much abstraction and confusion might kill off RDF.
- Comments and Feedback
RDF made me feel stupid -- Maybe you won't have to
Ever have some subject area that you think you should be able to understand but can't quite manage despite valiant attempts? You know it shouldn't be that hard, but, for some reason, it eludes you. You end up saying, "But I'm not stupid... OK, well, maybe I am -- no, it's the subject that's stupid!"
The SemanticWeb (SW), and the many terms rightly or unfairly associated with it, is exactly such a subject area for me. Last week, as I was preparing my talk about RSS for the "Technical Solutions Group" at the California Digital Library, I made my third serious attempt in the last several years to understand RDF -- one of the lower layers of the "semantic web layer cake". Even though I have been using RSS for several years, I've largely been ignoring one of the important flavors of RSS -- RSS 1.0 -- because of its use of RDF. I remember looking at plenty of RSS 1.0 files and being puzzled by exactly what they meant. I figured this was a good time to try to get to the bottom of what RDF is.
There have been a few eureka moments but no Damascus Road experience yet. Call me an honest seeker on the road to RDF/Semantic Web enlightenment. Although I haven't managed to figure out all the various pieces of the puzzle, I want to present some tentative conclusions and outline my current understanding of the topic. Hopefully, my write-up will help others have an easier time making sense of RDF and the semantic web.
Why have I been so interested in learning about RDF?
I've been doggedly pursuing RDF because I have suspected that it would be a very useful technology -- that it helps to solve some very basic problems I am tackling. Three specific problems come to mind:
Specific Problems to Solve with RDF
semantic interoperability among various XML specifications
We at the IU and the UCB Library have been studying how to translate materials encoded in the various XML specifications used in the library and educational technology worlds. The basic approach I've been taking is to sit down and write crosswalks -- often in XSLT -- to translate one specification into another. See "A Preliminary Crosswalk from METS to IMS-CP", which summarizes work by Rick Beaubien and myself. Now we are looking at the next steps to take.
--> For doing these kinds of crosswalks, I think you'd also need OWL, although that's currently a level beyond me. --Tom Hoffman
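Whatever the representation, a crosswalk boils down to a hand-authored table of correspondences applied mechanically. Here is a minimal sketch in Python rather than XSLT -- and the field names below are illustrative stand-ins, not the real METS or IMS-CP element names:

```python
# A human wrote this table; applying it is the easy, mechanical part.
# These field names are made up for illustration -- not actual METS/IMS-CP tags.
METS_TO_IMSCP = {
    "titleInfo": "general/title",
    "abstract": "general/description",
}

def crosswalk(record, table):
    # Fields with no mapping are dropped -- deciding what *should* map
    # is exactly the painstaking human work described above.
    return {table[f]: v for f, v in record.items() if f in table}
```

The point of the sketch: no matter the toolchain, someone had to sit down and write the table.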
I don't think that there is an alternative to the painstaking, laborious hand-generated crosswalks between specifications that we have been pursuing. If there is a better, even automagical, approach, we want to know about it! The rhetoric behind RDF seems to promise a better solution than what we have been pursuing. For example, the answer to the question "What is RDF?" is:
RDF -- the Resource Description Framework -- is a universal format for data on the Web. Using a simple relational model, it allows structured and semi-structured data to be mixed, exported and shared across different applications. RDF data describe all sorts of things, and where XML schemas just describe documents, RDF and OWL schemas ("ontologies") talk about the actual things. This gives greater re-use. Where XML provides interoperability within one application (e.g. bank statements) using a given schema, RDF provides interoperability across applications (eg import your bank statements into your calendar).
Such statements make RDF sound wonderful (especially at the expense of XML, I might add) -- but is RDF too wonderful to be true? The fact that key projects directed at solving semantic interoperability issues, such as SIMILE and HARMONY, use RDF gives credence to these assertions. So the question then becomes: "OK, RDF might help -- but exactly how does it help? With RDF, do we still need to map elements between specifications, or does RDF somehow magically take care of the mappings? If we still have to do the mapping ourselves, then why the hype of RDF over XML?" I could see in the abstract how RDF might be useful, but I couldn't see how RDF was going to make life gloriously easy. And does using RDF mean first recasting METS and IMS-CP (two XML specifications we are looking at) into RDF/XML?
representing disparate data formats and materials in the Scholar's Box
The architecture for the Scholar's Box needs to handle the blending of multiple formats of data from many disparate sources. RDF is supposed to make such blending easier. But is it easier only if we are trying to blend RDF data? Can one -- and if so, how does one -- retrofit XML data to fit into such a scheme? Increasing numbers of repositories are producing XML data (Amazon.com, the California Digital Library, many non-RSS 1.0 data feeds) -- and I want to blend them. Does RDF help me?
--> I think it depends on exactly where your data stops and your metadata starts. If you want to create a repository for metadata about these sources of XML data, that is, a central index that points to the other XML resources, RDF does that quite well out of the box. You absolutely can whip up some plain old XML that will do the same thing, but you'll be in effect recreating the functionality of RDF, which may be harder than you think and make you look stupid in a few years if RDF continues to grow. --TEH
extensibility of XML specifications
In RSS 2.0, METS, and IMS-CP, one can extend the metadata by adding elements in non-native XML namespaces. Is this mechanism of extensibility practically much worse than that of RDF (and, specifically, of RSS 1.0)? I'd love to see real examples demonstrating it one way or the other. If RDF-style extensibility is so much better, then how do we retrofit all this non-RDF XML to take advantage of it?
--> Practically speaking, this may have more to do with RDF-based applications' ability to ignore elements they don't understand without failing than anything else. --TEH
--> On the other hand, I keep trying to come up with a plain XML analogue to Brownsauce. Even when it doesn't have an RDF schema to draw from, it allows one to browse arbitrary RDF in a meaningful manner. It seems like you should be able to make a similar XML browser, but I can't think of one. Later... actually just a DOM browser like Mozilla's is probably the proper analogue. Never mind... --TEH
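Tom's point about ignoring elements an application doesn't understand can be illustrated without RDF at all. The sketch below (the `ex:` namespace and its element are invented for the example) reads an extended RSS 2.0-style item and simply sets aside any element whose namespace it doesn't know -- the behavior a forgiving consumer needs, whichever extensibility mechanism is in play:

```python
import xml.etree.ElementTree as ET

# A made-up item extended with Dublin Core plus a fictional namespace.
ITEM = """<item xmlns:dc="http://purl.org/dc/elements/1.1/"
              xmlns:ex="http://example.org/made-up-extension/">
  <title>A post</title>
  <dc:creator>Raymond Yee</dc:creator>
  <ex:mood>curious</ex:mood>
</item>"""

# Namespaces this consumer understands ("" means no namespace, i.e. core RSS).
KNOWN_NS = {"", "http://purl.org/dc/elements/1.1/"}

def read_item(xml):
    understood, ignored = {}, []
    for el in ET.fromstring(xml):
        # ElementTree spells namespaced tags as "{namespace-uri}localname".
        if el.tag.startswith("{"):
            ns, local = el.tag[1:].split("}", 1)
        else:
            ns, local = "", el.tag
        if ns in KNOWN_NS:
            understood[local] = el.text
        else:
            ignored.append(local)   # skip unknown extensions; don't fail
    return understood, ignored
```

Calling `read_item(ITEM)` keeps the title and creator and quietly shelves the unknown `mood` element.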
Besides solving specific problems, RDF is getting some traction in a number of places. Hence it might be time to see what all the fuss is about.
Though I've been working with RSS for years now, I never fully understood RSS 1.0, a major flavor of RSS, because of its RDF basis; all the other flavors of RSS seem straightforward by comparison. Nevertheless, the fact that a number of applications (RSS 1.0, Chandler, Adobe XMP) are using RDF indicates to me that RDF is on the verge of critical adoption and has moved beyond a lab experiment. So even if I'm skeptical, I want to understand it. (I've wanted to use those three applications for a while now.)
RDF and the semantic web seem to have hit the education world too. Terry Anderson's talk at Merlot2003, "Beyond Learning Objects: Towards the Educational Semantic Web", as reported by Greg Ritter, is the first detailed example I've seen of work in this direction. I now want to understand the connections between learning objects and the "Educational Semantic Web".
A quick note on my own background
I come at trying to understand RDF with a solid background in the XML family of technologies (XML, XSLT, XML schemas) and some specific applications of XML (RSS 0.9x, RSS 2.0, METS, IMS-CP, IMS-MD). However, I have little knowledge of artificial intelligence, knowledge representation, or functional languages. Your background is probably different -- some things that are clear to you might be unclear to me (and vice versa).
My current understanding of RDF
Here I outline a set of tentative conclusions. I don't attempt to provide a tutorial on RDF below, save in passing, though I do point to useful resources for understanding RDF.
RDF is promising technology in spite of all the confusing hype around it.
Though I don't think that RDF has yet proven itself, I can definitely see its potential. The PIE wiki nails this central point:
...the benefits of using RDF are nebulous and under-specified. The main ones [benefits] are: extensions have clear requirements [and] RDF can be more easily merged with other RDF
I've been frustrated that it's taken a long time to come to this very basic (and hardly earth-shattering) conclusion. The whole RDF scene is incredibly confusing -- and a lot of people are confused.
Why is RDF so hard to get? In RDF, What's It Good For?, Kendall Clark argues that RDF might be a victim of bad technology evangelism:
RDF is like my eccentric old uncle. I don't know him as well as I'd like, which is partly his fault, since his eccentricities can be off-putting. Of course they're what make him so interesting and are the reason I want to get to know him better in the first place. Yeah, RDF is just like that.
The Resource Description Framework is still among the most interesting of W3C technologies. But it's got persistent troubles, including having had its reputation beaten up unfairly as a result of the many and often nasty fights about RSS. But, just like my eccentric old uncle, RDF is not entirely blameless. In a previous XML-Deviant article ("Go Tell It On the Mountain") I argued that RDF's trouble might have something to do with it having been the victim of poor technical evangelism.
In some sense that's still true. Recently I googled for a comprehensive, up-to-date RDF tutorial, which proved as elusive as finding Uncle's dentures the morning after nickel beer night at the bingo hall. In fact, I was hard pressed to find an RDF tutorial which looked like it had been updated this year. And one which I did find simply listed 13 different ways to express the same basic set of assertions, which not only makes a terrible tutorial, but also exemplifies another of RDF's persistent troubles: its XML serialization.
RDF is not a monolithic topic. RDF can be used independently of the Semantic Web. RDF is not inherently tied to XML.
I found the way Mark Pilgrim's essay "Should Atom Use RDF?" lays out "four related but completely independent issues" extremely helpful. Let me quote the beginning of the essay:
Here are four related but completely independent issues:
The RDF model: statements are triples; use graphs not trees
The RDF/XML serialization: a popular syntax for expressing individual RDF documents
RDF tools: libraries and applications for working with RDF
The Semantic Web
The RDF conceptual model is overkill for specific applications, or is always overkill, or is simply the wrong model.
The RDF/XML serialization is wretchedly complex and breaks the "view-source" principle for RDF documents.
No RDF tools exist for my favorite language.
The Semantic Web is an unattainable pipe dream, or is too fluidly defined to ever come about, or something.
--> I think that RDF is <em>always</em> overkill for a single, clearly defined, isolated application. It is only in the interaction between different applications and distributed data sources that it becomes worthwhile. --Tom
The problem with discussing RDF (where that means, "I think this data format should be RDF") is that you can support any of these four RDF issues (model, syntax, tools, vision), in any combination, while vigorously arguing against the others.
I had conflated these issues, which made everything difficult to understand. Take the issues individually, and things won't be so confusing.
The RDF triple concept is a simple, elegant, and seemingly powerful one at its heart.
For some reason, it took me a long time to get the key concept behind RDF -- not because the concept is that difficult, but because a lot of other things obscure it.
Let me first try to explain it in my own words (though the following explanation might not be quite right):
An RDF document is just a series of statements about resources in subject-predicate-object (triple) form. Put another way, each is a statement of the form "a Resource R has a Property P with Value V" -- a triple (R, P, V), e.g., ("Raymond Yee", "has age of", 36).
RDF vocabularies give us ways to talk about such things as types of resources and terms for properties. For example, a genealogical vocabulary would define properties like "is mother of" and "is sister of".
Once we have these (R, P, V) triples around, we can add various logical propositions to the mix. For instance: if V > 30 in a triple with P = "has age of", then assert (R, "has trust status", "No"). In other words, a computer program should be able to deduce that Raymond Yee should not be trusted, since he is older than 30 and one must not trust anyone over 30.
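To make the triple idea concrete, here is a toy sketch in plain Python (no RDF library; the property names are just the ones from the example above): triples are tuples, a graph is a set of them, and the "don't trust anyone over 30" rule is a comprehension over the graph.

```python
# A "graph" is just a set of (subject, predicate, object) triples.
triples = {
    ("Raymond Yee", "has age of", 36),
    ("Raymond Yee", "is author of", "these notes"),
}

def apply_distrust_rule(graph):
    """Derive (R, "has trust status", "No") for any R whose
    "has age of" value exceeds 30."""
    inferred = {
        (s, "has trust status", "No")
        for (s, p, v) in graph
        if p == "has age of" and isinstance(v, int) and v > 30
    }
    return graph | inferred

enriched = apply_distrust_rule(triples)
```

Real RDF systems use URIs for subjects and predicates and far more general rule machinery, but the shape of the data -- and of the inference -- is exactly this.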
Tim Bray's "What is RDF?" was the first essay I read in my attempts to understand RDF. It's still very good. However, the triples idea was still unclear to me after reading the essay. (And I don't blame Tim Bray for that, since the idea is clearly in the essay.) So I would say to readers: follow up Bray's essay with something like Aaron Swartz's "RDF Primer Primer". The two complement each other.
RDF/XML is obscure to the uninitiated and makes it easy to confuse the relationship between RDF and XML.
RSS 1.0 was my first encounter with an RDF-based format. I looked at a sample RSS 1.0 document and really didn't understand how to work with it. (For instance, I did not know whether I could embed arbitrary XML fragments in other namespaces into an RSS 1.0 document.) Coming from a non-RDF XML background and seeing other XML documents did not prepare me for understanding RSS 1.0.
Since then, I've learned the following:
RDF, as a set of triples, can be written out ("serialized") in a number of popular ways (e.g., N-Triples and N3). It's helpful to look at these serializations first -- without paying any attention to XML.
RDF does not have to be written out in XML. However, the W3C has settled on the RDF/XML serialization as a standard way of writing RDF because XML solves a lot of infrastructural issues that other serializations might not.
When you encounter RDF/XML, don't try to understand RDF in terms of non-RDF XML. Without help, you might not be able to see the graphs (of RDF) in the tree structure of the XML, and you'll get confused. Learn the RDF/XML syntax instead. (One tutorial was helpful to me.)
XML serializations don't have to be as obscure as RDF/XML. Read Tim Bray's essay on his RPV serialization of RDF (and Kendall Clark's comment on RPV), where Bray argues (fairly convincingly) that the RDF/XML syntax did not have to be as obscure as it is. RPV is in XML and certainly makes seeing the RDF triples easier.
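To see why the serialization is separable from the model, consider this sketch: the same in-memory triples written out as N-Triples by a hand-rolled emitter. (This toy ignores literal escaping, datatypes, and blank nodes, all of which real N-Triples requires.)

```python
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# The graph exists independently of any serialization.
triples = [
    ("http://example.org/doc", DC_CREATOR, "Raymond Yee"),
]

def to_ntriples(triples):
    # N-Triples: one "<subj> <pred> obj ." statement per line.
    lines = []
    for s, p, o in triples:
        # URIs get angle brackets; everything else is treated as a literal.
        obj = f"<{o}>" if o.startswith("http://") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)
```

An N3 or RDF/XML writer over the same list of tuples would produce different text describing the identical graph -- which is the whole point.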
RDF Tools help a lot to make RDF understandable -- and usable.
The W3C RDF Validation Service helped me to see the underlying simplicity of RDF. I dropped the RSS 1.0 feed from my personal blog into the validator and then related the triples that emerged back to the actual RDF/XML document -- that helped a lot.
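That validator exercise -- RDF/XML in, triples out -- can be approximated in a few lines using the simplest "striped" reading of RDF/XML: each child of rdf:RDF is a resource, its rdf:about attribute is the subject URI, and its child elements are property arcs. This is a deliberately naive sketch (real RDF/XML has many abbreviated forms this doesn't handle), and the feed content below is made up:

```python
import xml.etree.ElementTree as ET

RSS10 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/blog">
    <title>An example feed</title>
    <link>http://example.org/blog</link>
  </channel>
</rdf:RDF>"""

RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"

def naive_triples(doc):
    # Striped pattern: resource element -> subject; each child element is
    # one property arc whose text content is a literal object.
    out = []
    for node in ET.fromstring(doc):
        subj = node.attrib.get(RDF_ABOUT)
        for prop in node:
            # ElementTree tags look like "{ns-uri}local"; joining the two
            # parts yields the predicate URI.
            pred = prop.tag[1:].replace("}", "")
            out.append((subj, pred, prop.text))
    return out
```

Running `naive_triples(RSS10)` yields two triples, each with the channel URI as subject -- the same flattening the validator performs on a real feed.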
RSS 1.0 is a good place to start with RDF.
Using RSS 1.0 as a place to study RDF is good since the conceptual model behind RSS is simple -- and if you understand other flavors of RSS (such as RSS 0.91 and 2.0) and use RSS (in blogging or aggregation), you will at least know what RSS is supposed to be about.
Non-hype filled assessments of RDF and the somewhat related Semantic Web are hard to find.
Proponents of promising technologies such as XML and RDF often damage the reputation of their technologies by over-selling what those technologies can do. If you believed all the hype, you would have first believed that XML was magic and that it was going to solve all our interoperability problems. Then along came the RDF folks, who said XML didn't solve all those problems but RDF will.
Consider, for example, the statement "XML is syntax, RDF is semantics" from the Semaview "At-a-Glance" Illustration Series: RDF and XML. I can see how this statement is true in the specific case of the RDF/XML serialization, where XML is being used to express RDF. As a general statement, however, it is nonsense: is an XML rendition of a DocBook document just syntax? (As Mark Butler says, "The following statements are nonsense: 'RDF is more semantic than XML', 'RDF allows us to reason concretely about the real world', 'The power of RDF is its semantic model'.")
Given the hype around the Semantic Web (and around RDF itself), I've been wondering how these technologies relate to other efforts. There is a need for a non-hype-filled assessment of RDF and its relationship to the lot of other stuff it gets associated with (fairly or unfairly) -- such as the semantic web in general -- to help establish the context and answer questions like: What is their relationship to AI? To knowledge representation? What philosophical presuppositions lie behind RDF and the Semantic Web?
My co-worker TomSchirmer pointed out how much RDF reminded him of Prolog. I could believe it -- but this connection is not commonly mentioned in discussions of RDF and the Semantic Web. It was, therefore, gratifying to read "An Introduction to Prolog and RDF" by Bijan Parsia, in which he makes the helpful point that the SW is AI (so why don't people just say so?!):
Many Semantic Web advocates have gone out of their way to disassociate their visions and projects from the Artificial Intelligence moniker. No surprise, since the AI label has been the kiss of, if not death, at least scorn, since Lisp machines were frozen out of the marketplace during the great "AI winter" of the mid-1980s. Lisp still suffers from its association with the AI label, though it does well by being connected with the actual technologies.
He goes on:
An aside -- one interesting phenomenon is that a lot of AI ends up, after fleeing the CS department, in Information and Library Sciences. And, of course, librarians, even the non-techie ones, are really into cataloging, searching, sharing, correlating, using metadata, intelligent agents... to wit, all the elements of the Semantic Web. AI folks don't end up in library departments because librarians are pushovers (as my overdue fines attest), but because there's a pretty good fit between what (some) AI-ers like to do, what the library folks want, and between what the librarians want and what the Semantic Web requires.
So the Semantic Web is an AI project, and we should be proud of that fact. Not only is it more honest, but it means that we can be clearer about what constitutes prior art, relevant research and literature, similar projects, and available technology. As I've written before, narrowness of understanding is a pernicious barrier to sensible progress. Reinventing the wheel isn't nearly as bad as having to continually reconceptualize it: "not thought here" generally causes more systematic problems than "not invented here".
Thanks for clearing up the confusion!
In a similar vein, one of the single most helpful resources for me has been Mark Butler's "Is the Semantic Web Hype?", to which I have already referred. The slides are terse but very insightful -- extremely helpful in sorting through the hype. We need more of these level-headed evaluations of the technologies -- and they shouldn't be so hard to find!
Blending RDF vocabularies is probably easier than blending XML vocabularies but it's not magic either! Some human must do the mapping of meanings between vocabularies.
As I wrote above, a major reason I'm looking into RDF is seeing whether RDF makes it easier to blend and interconvert various data and metadata formats. My current conclusion is that blending RDF data is easier than blending XML data but that the blended RDF data doesn't magically reconcile various vocabularies.
This is an area with major confusion, partly fueled by hype from RDF advocates (see the RDF FAQ). A thread on Sam Ruby's blog illustrates this confusion but also helps to bring some light into the topic. Let me trace parts of the conversation:
Mark Pilgrim kicks it off:
Yes, there are 100s of XML vocabularies -- wasn't that the point of XML, that anyone could make one? There are precious few mappings between them, although Tim among others is working on that. If we were all using RDF, there would be 100s of RDF vocabularies, and precious few mappings between them.
Mark Baker disagrees:
If there were hundreds of RDF vocabularies, you wouldn't need any mappings between them. That's the point of RDF.
FWIW, I misunderstood the question. Had I taken the time to read the whole thread, my answer would have been: "The mapping is on the Web, and itself an RDF graph".
"If we were all using RDF, there would be 100s of RDF vocabularies, and precious few mappings between them."
I recommend the RDF Primer : http://www.w3.org/TR/rdf-primer/
All RDF vocabularies automatically have significant mappings between them thanks to the model. That's the whole point of using RDF for exchange. XML lacks this.
Mark Pilgrim doesn't buy it:
If you use Dublin Core for predicates in your RDF statements, and I make up a new vocabulary that includes concepts like "created by" and "has a language of" but has its own set or URIs (is not Dublin Core), you'll have no idea what my statements mean without some sort of mapping. No amount of syntax can change that.
Yes, I know there are ways of creating this mapping that are machine-readable, but the mapping has to exist, and if I was dumb enough or evil enough to create my own replacement for Dublin Core in the first place, I'm probably not smart enough or generous enough to create the mapping to Dublin Core.
Why would I do that? Evilness (attempting lock-in, even if someone else creates a mapping I can use real-world clout to ensure that most people use my ontology and not yours, and besides, people running around creating mappings to my ontologies is a good way to ensure that they're not working on anything useful). Unwillingness to play nicely with others. Ego. Ignorance. Any number of reasons.
Similarly, I can use different URIs than you to express that same subjects or objects (what's the "correct" URI for a concept like "CSS"? "Web standards"? "Truth"?) and we'll need mappings to correlate them. This first hit me when I was creating my FOAF profile (by hand, in vi, but never mind that) and wanted to list my interests. If I'm interested in Zen and you're interested in Zen but we use different URIs to express that concept, who's right? (Answer: both of us.) So how does a client know it's the same concept? A mapping. And who creates that mapping, and how do clients learn about it when neither of us includes it in our FOAF profiles?
Tim, I apologise but felt obliged to respond to a comment that does nothing but reinforce certain preconceptions of RDF, as Dave's comment shows. Mark, didn't mean to be patronising, but I'm surprised at your statement - it doesn't make sense to compare RDF and XML vocabularies in this way. Every XML vocabulary is effectively created from scratch. Every RDF vocabulary shares the same basic information model, and they are thus already mapped at a useful level. You are right that it is possible to create vocabularies in a way that minimises interoperability. The argument that it is possible to do things badly can be applied to anything. The baseline for communication is much higher with RDF than plain XML. Your URI for Zen might be different than mine, but there might well be properties in common to our data through which a mapping may be automatically discovered. It's much easier to make mappings within a well-defined architecture than it is with anything-goes syntax.
Michael Bernstein elaborates on how such mappings could be discovered and applied automatically:
As I understand it, as long as someone, somewhere (and the world does not lack for obsessive-compulsive annotators) makes the assertion that both URIs refer to the same concept, then an RDF crawling engine can apply the mapping to the otherwise isolated vocabularies.
Such a (currently hypothetical, I believe) inferring RDF crawler can be a centralized resource or a desktop client application (or both). This raises problems that as far as I'm aware, no-one has addressed yet such as potential spam mappings, or deliberately corrupt ones, but aside from trust related issues, it's not difficult to imagine a Google-like RDF crawler that can answer the question: "what other URIs refer to this concept?", or even (though this might make some people cringe) "what are the highest relevance URIs for the string 'Zen'?" (perhaps first consulting a dictionary-like resource), and proceed from there.
Dan Brickley has a nice summary:
The thing about comparing 100 XML vocab versus 100 RDF vocabs, is that RDF was designed to allow such independently developed vocabularies to be mixed, without prior coordination.
As pointed out above, this won't automagically map everything, but it does create an environment where you can draw upon complementary vocabularies in a more fluid way, since the vocab creators don't get to dictate/predict the exact XML structures that their terms will be deployed in. This is good for decentralisation and unexpected re-use.
Outside of this thread, Mark Butler backs up this conclusion:
In the SW, someone has to provide a mapping to allow different vocabularies to interoperate.
Jon Udell, I think, concurs:
Exactly. Now, what the RDF advocates appear to be saying is that if extensions show up as sets of RDF triples, then the problem is solved. An aggregator that can consume job-related triples already "knows what to do with" vacation-related triples.
I'm with Patrick Logan here: you can't finesse the symbol grounding problem so easily. When I write an RDF query involving job-related and vacation-related RDF triples, I'll need to know which predicates exist in these vocabularies, what they are documented to mean, and how to construe operations that combine them.
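The thread's conclusion -- merging is mechanical, but the mapping is a human act -- can be sketched in a few lines. Dublin Core's dc:creator URI is real; the rival "created-by" vocabulary is invented for the example, and the "inference" is a single, simplified round of rdfs:subPropertyOf propagation, not a real reasoner:

```python
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
MY_CREATED_BY = "http://example.org/vocab/created-by"   # made-up rival term
SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

graph_a = {("http://example.org/doc1", DC_CREATOR, "Alice")}
graph_b = {("http://example.org/doc2", MY_CREATED_BY, "Bob")}

# Merging graphs is trivial: set union. But this mapping triple is the part
# no machine produces for free -- some human asserted the correspondence.
mapping = {(MY_CREATED_BY, SUBPROP, DC_CREATOR)}

def merge_and_infer(*graphs):
    g = set().union(*graphs)
    subprops = {(sub, sup) for (sub, p, sup) in g if p == SUBPROP}
    # One pass of subPropertyOf propagation: every created-by statement
    # also becomes a dc:creator statement.
    for (s, p, o) in list(g):
        for sub, sup in subprops:
            if p == sub:
                g.add((s, sup, o))
    return g

merged = merge_and_infer(graph_a, graph_b, mapping)
```

After the merge, a query against dc:creator finds both documents -- but only because someone wrote the one-line mapping.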
Too much abstraction and confusion might kill off RDF.
I've seen how hard it was for people to pick up XML. It's been hard for me to understand RDF. Granted, it doesn't need to be that hard -- and there is a real need for killer apps and good teaching materials. (This rant isn't it -- but my write-up might help somebody else with my background make sense of RDF.)
I like what Sean McGrath proposes... not everyone has to learn the most abstract stuff -- but then we'll have to make connections between the different levels of abstraction:
I think the answer lies in what I call semantic shadows. Let's say I am working with XML and want to have customers and products in my data model. I want to think in terms of plain vanilla customer tags and product tags - stuff specific to my problem. I don't want to have to think in a more abstract model than necessary to get my job done. At this point 94% of the IT people are happy. To keep the other 6% happy, we get them to create an up-translation from the domain-specific model of customers and products into whatever ontology is required for the purposes of the Semantic Web vision. Generate the semantic shadow files from the domain specific files. That way, we can have chocolate on both sides.
I think it's time for the Semantic Web proponents to stop trying to teach us all to think at their level of abstraction. We can't (or won't). Instead, the Semantic Web proponents should look at mapping transparently from the RSS 0.91, XFML 1.0 specifications that 94% of us are happy with, into the more abstract, generalized models that the other 6% need, for the applications they are all dying to take advantage of.
In RSS in particular, we have seen how attempting to make the more abstract model a core part of the specification (in RSS 1.0) has led to significant unrest among the natives. There is a lesson there to be learned.
The route to the Semantic Web lies in letting a thousand flowers bloom, not forcing us all to instantiate multicellular organisms based on a gene pool ontology.
--> I agree with all the above, except the last. If RDF was going to die, it would already be dead. But the role it fits is necessary and inevitable. If it dies it will have to be recreated in almost exactly the same form, with all the same annoying complexities and contradictions.
Comments and Feedback
Looking good! You might want to check out the ESW Wiki (they're working on RDF FAQs) and Dan Brickley's FOAF blog (may be a wiki too) - sorry, can't get links at the moment, my connection's playing up. DannyAyers
Why not WikiAsYouLearn?
It's good that you've got notes here, but you should integrate them with the ESW wiki.
That way you learn, we learn, we all learn.
-- 22.214.171.124 2004-07-03 04:32:40