- Why integrate Scholar's Box with NSDL?
- Technical Notes on How we might integrate
- Dean Krafft's response
- Demonstrating a search of NSDL via WebDAV
- Making sense of the Search results
Why integrate Scholar's Box with NSDL?
Well, the NSDL is supposed to be aggregating a huge amount of science, math, and engineering learning content, so users of the ScholarsBox who are interested in science materials will likely want to be able to gather, create, and share those materials. From a development point of view, the NSDL seems to be a great testbed for our work on interoperability. It should be an ideal environment for building the ScholarsBox because a key goal of the NSDL is to provide a rich infrastructure that glues together collections and services. Finally, there is a substantial amount of funding for NSDL, for example in the NationalScienceDigitalLibrary/ProgramSolicitation2004.
Technical Notes on How we might integrate
I've been studying the NationalScienceDigitalLibrary/TechnicalArchitecture to figure out how to integrate the ScholarsBox with NSDL. It's been a bit tricky to find the latest information. The session on
NSDL Services Interoperability and Web Services from the
NSDL Annual Meeting 2003 seems particularly helpful in getting the latest. Let's look at individual talks.
Core Integration Web Services by
Dean B. Krafft of Cornell. It's good to see agreement from the Core Integration people that the current CI infrastructure is heavy and that there is an interest in (a commitment to) moving towards a more Web-friendly access system:
-
"Open, service-friendly infrastructure"
-
"User access: multiple portals, browser extensions, standard web search" (the browser extensions can be a header for the ScholarsBox in many ways)
-
"Enable many forms of access and contribution - including ones we haven't thought of yet"
Dean Krafft gives an example of a RESTful approach to getting the OAI record of an NSDL record, given its OAI ID. For example,
Why It's Essential, a lesson plan on seasons, is indexed by NSDL with an OAI ID of oai:nsdl.org:dlese.org:oai:dlese.org:DLESE-000-000-004-326. You can thus get the
corresponding NSDL metadata record. The obvious immediate question is how to run a query to get the OAI ID in the first place, a question that seems to be in the process of being asked and answered ("What other queries should we support?"). Alternatives mentioned: search engine style, SQL, or XQuery. (Note that the ImsGlobal/DigitalRepositoriesSpec and the ECL implementation by EduSource use XQuery.)
Current conclusion: the NationalScienceDigitalLibrary/ProgramSolicitation2004 still looks promising. I just sent an email to Dean Krafft to ask him some questions:
-
any other services implemented via REST or SOAP that we can use so far? (We can prototype access via the ScholarsBox.)
-
in the formal materials for NSDL, there seems to be an implicit assumption that end-user access to the NSDL is always going to be mediated by "portals" that are created by institutions. The quote "Enable many forms of access and contribution - including ones we haven't thought of yet" seems to imply a move away from that stance. Is that right?
Dean Krafft's response
Dean Krafft kindly replied with a thoughtful and detailed email. With his permission, I quote an excerpt:
Here's a very quick summary of where things stand. In addition to the REST access to the OAI server, there is currently a (not very well documented) WebDAV access to the NSDL standard search (using Lucene, being done as a subcontract for Core Integration by the folks at UMass). I've included a simple PHP example script that does a search lookup. That would let you search the NSDL and have your Scholar's Box user be able to pull out interesting bibliographic records to include in their personal collection. We definitely have a SOAP/WSDL interface to search on the project plan - hopefully within the next 6 months.
We also have an Archive project, which is archiving snapshots of NSDL content. That keeps a permanent record of potentially ephemeral content sites. Given the OAI ID (same as you would use for the REST lookup of the metadata), you can get an HTML page of the archive. The URL is http://srb.npaci.edu/cgi-bin/nsdl-find.cgi?identifier=oai:nsdl.org:internetscout:oai:scout.wisc.edu:ScoutArchives-10433 (replace the identifier argument with what you want to get). Unfortunately, that's just a UI version. We are working on a SOAP/WSDL interface that will let you select among the monthly snapshots (and other stuff) - should be out in the next couple of months.
We're very actively working on a relationship store and architecture to support stuff like annotations, augmented metadata, and user formed collections. That stuff is at least 6 months off, but it might work very well together with the Scholar's Box once we've got it.
In terms of your own repository, have you taken a look at the Fedora Digital Repository work (http://www.fedora.info)? It might fit in to what you need (bias alert - the developers are just a couple of cubicles over).
While we've focused on institutional portals for particular communities in most of our descriptions, there is certainly no problem with a "personal portal" or application like Scholar's Box interacting directly with NSDL CI services (search, archive, eventually annotation and declaring relationships). You might want to take a look at a page that CI has put together with information for NSDL proposal writers: http://cinews.comm.nsdlib.org/cgi-bin/wiki.pl?For_Proposal_Writers
Demonstrating a search of NSDL via WebDAV
This led me to look into PythonLanguage/WebDavTechniques to see how we can use WebDAV to do a search. However, as Dean K. then pointed out to me in a subsequent email:
The PHP is pretty incidental to the WebDAV search - mostly you just need to submit the right HTTP request, which you should be able to read out of the code pretty easily. What comes back is a bunch of XML (the search result). I can give you an XSLT that translates it into our own "search results list" on the site, but it should be fairly self-explanatory.
and looking at an excerpt of the PHP code he sent me:
<?php
} else {
    $host = "search1.nsdl.org";
    $port = 8080;
    $path = "/searchserver";
    // $clientSID = rand(2000, 10000);
    $search = htmlspecialchars(stripslashes($search));
    $searchstring = '<ns0:searchrequest xmlns:ns2="NSDL_1.0:"' .
        ' xmlns:ns1="http://interlib.org/SDLIP/1.0#" xmlns:ns0="DAV:">' .
        '<ns1:SearchRequest><ns1:numDocs>20</ns1:numDocs><ns1:query><ns2:request>' .
        '<ns2:query>rankBy(avg(' . $search . '))</ns2:query><ns2:fields/>' .
        '<ns2:numberToSkip>0</ns2:numberToSkip></ns2:request></ns1:query>' .
        // '<ns1:clientSID>' . $clientSID . '</ns1:clientSID><ns1:stateTimeoutReq>0</ns1:stateTimeoutReq>' .
        '<ns1:clientSID>1008</ns1:clientSID><ns1:stateTimeoutReq>0</ns1:stateTimeoutReq>' .
        '</ns1:SearchRequest></ns0:searchrequest>';
    $fp = fsockopen($host, $port, $errno, $errstr, $timeout = 30);
    if (!$fp) {
        //error tell us
        echo "$errstr ($errno)\n";
    } else {
        //send the server request
        fputs($fp, "SEARCH $path HTTP/1.1\r\n");
        fputs($fp, "Content-Encoding: utf-8\r\n");
        fputs($fp, "Content-Type: text/xml\r\n");
        fputs($fp, "Content-Length: ".strlen($searchstring)."\r\n\r\n");
        fputs($fp, $searchstring . "\r\n\r\n");
        //Echo the header on through
        fgets($fp); // Substitute for initial header
        header("HTTP/1.1 200 OK");
        header(fgets($fp));
        header(fgets($fp));
        //loop through the response from the server
        while (!feof($fp)) {
            echo fgets($fp, 4096);
        }
        //close fp - we are done with it
        fclose($fp);
    }
}
?>
it seems that we might not even have to look too deeply into WebDAV, but can just mimic the PHP code in Python and get some XML back.
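Here is a minimal Python sketch of that idea. It assumes the request format in the PHP excerpt is all the server needs. The host, port, path, and XML body are copied from the excerpt; the function names build_search_body and search_nsdl are our own, and the server may no longer be reachable. Unlike the PHP, it lets http.client supply the Host and Content-Length headers and declares the charset in Content-Type rather than in Content-Encoding.

```python
import http.client
from xml.sax.saxutils import escape

# Host, port, and path are copied from the PHP excerpt; the server may no longer be live.
HOST = "search1.nsdl.org"
PORT = 8080
PATH = "/searchserver"

# Same XML body as the PHP script, with placeholders for the values it hard-codes.
REQUEST_TEMPLATE = (
    '<ns0:searchrequest xmlns:ns2="NSDL_1.0:"'
    ' xmlns:ns1="http://interlib.org/SDLIP/1.0#" xmlns:ns0="DAV:">'
    '<ns1:SearchRequest><ns1:numDocs>{num_docs}</ns1:numDocs><ns1:query><ns2:request>'
    '<ns2:query>rankBy(avg({terms}))</ns2:query><ns2:fields/>'
    '<ns2:numberToSkip>{skip}</ns2:numberToSkip></ns2:request></ns1:query>'
    '<ns1:clientSID>{client_sid}</ns1:clientSID><ns1:stateTimeoutReq>0</ns1:stateTimeoutReq>'
    '</ns1:SearchRequest></ns0:searchrequest>'
)


def build_search_body(terms, num_docs=20, skip=0, client_sid=1008):
    """Return the search request XML; defaults mirror the PHP excerpt."""
    # escape() handles &, <, and >, roughly what htmlspecialchars does in the PHP.
    return REQUEST_TEMPLATE.format(
        num_docs=num_docs, terms=escape(terms), skip=skip, client_sid=client_sid
    )


def search_nsdl(terms, timeout=30):
    """Send an HTTP SEARCH request and return (status, response text)."""
    body = build_search_body(terms).encode("utf-8")
    conn = http.client.HTTPConnection(HOST, PORT, timeout=timeout)
    try:
        # http.client accepts arbitrary method names such as SEARCH.
        conn.request("SEARCH", PATH, body=body,
                     headers={"Content-Type": "text/xml; charset=utf-8"})
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8", errors="replace")
    finally:
        conn.close()
```

Calling search_nsdl("season") would then return the HTTP status and the raw search result XML, provided the service still answers at that address.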
See
Search & Discovery Services for the NSDL (word doc) for documentation of SDS.
Making sense of the Search results
We're trying to understand the type of tags we get back from the NSDL metadata repository.
The
NSDL Metadata Primer : NSDL standard metadata indicates:
-
The NSDL Standards Working Group determined that the Dublin Core set of 15 basic elements, their associated element refinements plus the three IEEE elements recommended by the DC Education Working Group, will be the standard set used by the NSDL metadata repository (the "MR"). The MR also stores (and can make available for harvesting by others), any "native" metadata made available by collections or projects.
Given that statement, we expect DC and some DC-Ed extensions. We get such tags in the
sample record but not in the search results (from a search for "season"), where we get tags like:
<brandIconURL>http://content.nsdl.org/brands/dlese.org.gif</brandIconURL>
<subject..GEM>Earth science
~^ Geography
~^ Physical sciences</subject..GEM>
<format..IMT>text/html</format..IMT>
<brandWidth>54</brandWidth>
<language..RFC3066>en</language..RFC3066>
<rights..>Copyright 2001 National Geographic Society. All rights reserved.</rights..>
<title..>Why It's Essential</title..>
<relation.conformsTo.>Supports National Council for Geographic Education (NCGE) standard: Physical Systems:The physical processes that shape the patterns of Earth's surface</relation.conformsTo.>
<nsdlUniqueId>oai:nsdl.org:dlese.org:oai:dlese.org:DLESE-000-000-004-326</nsdlUniqueId>
<type..DCMIType>InteractiveResource</type..DCMIType>
<category>item</category>
<brandHeight>30</brandHeight>
<publisher..>National Geographic Society</publisher..>
<brandTitle>DLESE</brandTitle>
<description..>This lesson plan asks students to think about aspects of the changing seasons in their region such as temperature variations, seasonal household chores, changes in foods available at the market, and the length of the days. Students will discuss their experiences with and knowledge of the four seasons; look at pictures of the four seasons and compare those pictures to the seasons in their home region; plan and hold a party commemorating the four seasons; and write stories depicting themselves showing a visitor some of the things they like best about their favorite season.</description..>
There is no namespace qualification, and the tags resemble DC but are not identical to it. Is this tag set a transitional one?
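To experiment with these tags, here is a rough Python sketch that reads each tag name as element.refinement.scheme (so subject..GEM is the DC element "subject" with no refinement and the scheme "GEM") and treats "~^" as a separator between multiple values. Both readings are our guesses from the sample, not documented behavior. The record wrapper element and the function names are hypothetical, since the enclosing structure of the real response is not shown here.

```python
import xml.etree.ElementTree as ET

# A few fields from the search result above, wrapped in a hypothetical <record> element.
SAMPLE = """<record>
<subject..GEM>Earth science
~^ Geography
~^ Physical sciences</subject..GEM>
<format..IMT>text/html</format..IMT>
<title..>Why It's Essential</title..>
<relation.conformsTo.>Supports National Council for Geographic Education (NCGE) standard: Physical Systems:The physical processes that shape the patterns of Earth's surface</relation.conformsTo.>
<nsdlUniqueId>oai:nsdl.org:dlese.org:oai:dlese.org:DLESE-000-000-004-326</nsdlUniqueId>
</record>"""


def split_tag(tag):
    """Split 'element.refinement.scheme'; tags without dots are returned as-is."""
    if "." not in tag:
        return tag, None, None
    element, refinement, scheme = (tag.split(".", 2) + ["", ""])[:3]
    return element, refinement or None, scheme or None


def parse_record(xml_text):
    """Return a list of (element, refinement, scheme, values) tuples."""
    fields = []
    for child in ET.fromstring(xml_text):
        element, refinement, scheme = split_tag(child.tag)
        # Assumption: "~^" separates multiple values within one tag.
        values = [v.strip() for v in (child.text or "").split("~^")]
        fields.append((element, refinement, scheme, values))
    return fields
```

Running parse_record(SAMPLE) gives tuples that can be compared against the qualified DC elements we expected, while tags without dots, such as nsdlUniqueId and brandTitle, stand out as search-specific additions.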
Jon Phipps (one of the developers and formerly the Metadata Repository tech lead at the NSDL):
What you're seeing is the NSDL Search Service's Search Response XML format which is optimized for the NSDL search interface and is a legacy of the limitations imposed by our original search engine. Right at the moment, the best way to get useful metadata is to grab the contents of the <nsdlUniqueId> tag and perform an OAI-PMH GetRecord request: http://services.nsdl.org:8080/nsdloai/OAI?verb=GetRecord&identifier=oai:nsdl.org:dlese.org:oai:dlese.org:DLESE-000-000-004-326&metadataPrefix=nsdl_dc as you indicated above. This will retrieve a Qualified Dublin Core record and this is what we do in our own search results to display the unqualified DC metadata in More Info. We at the NSDL agree that this is, umm, suboptimal and as Dean indicated, we hope to make a number of improvements.
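The two steps Jon describes can be sketched in Python as follows: pull the nsdlUniqueId values out of the search response, then build an OAI-PMH GetRecord URL for each. The base URL and the nsdl_dc metadata prefix come from his reply. The function names are ours, and the regular expression is a shortcut we chose because we have not seen the full structure of the search response.

```python
import re
from urllib.parse import urlencode

# Base URL taken from Jon Phipps's reply; the service may have moved since.
OAI_BASE = "http://services.nsdl.org:8080/nsdloai/OAI"


def extract_nsdl_ids(search_xml):
    """Return every OAI ID found in <nsdlUniqueId> tags of a search response."""
    return re.findall(r"<nsdlUniqueId>\s*([^<\s]+)\s*</nsdlUniqueId>", search_xml)


def get_record_url(oai_id, metadata_prefix="nsdl_dc"):
    """Build the OAI-PMH GetRecord URL for one OAI ID."""
    query = urlencode(
        {"verb": "GetRecord", "identifier": oai_id, "metadataPrefix": metadata_prefix},
        safe=":",  # keep the colons in the OAI ID readable, as in the URL quoted above
    )
    return OAI_BASE + "?" + query
```

Fetching each URL (for example with urllib.request.urlopen) should then return the Qualified Dublin Core record, assuming the OAI server is still available at that address.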
