- David Walker's suggestions
- Trying out the OpenURL Referrer extension
- Reading about OpenURL
- Specific example to compare 0.1 to 1.0
- Learning about MARC 21, MARCXML, OpenURL 1.0, 0.1 etc
- MARC examples from MetaLib
- MODS versions of the MARCXML examples from MetaLib
- Paths for generalization of this work
- use of 773 for journals
- OpenURL 0.1 vs 1.0
- Correspondence with Walt Crawford on MARCXML and OpenURL
- A start at mapping MARC to OpenURL
- Examples of stuff coming out of 773$g
- MARCXML, MODS, and representation of serials metadata
- Our best effort at a mapping
- author handling
- title handling
- volume, number, page handling
- other fields: ignore?
- other bibliographic info that might be useful for citation but not necessarily for OpenURL (listed in CDL)
- Conclusions with respect to MARC and OpenURL
- Current Next Steps
As part of integrating the Scholar's Box with MetaLib and part of understanding the interrelationships among bibliographic metadata, we are trying to figure out how to construct an OpenURL from the MARC XML coming from the MetaLib X-Server. Here's a query I sent out to a MetaLib X-Server development list:
-
How have folks created OpenURLs from the metalib X-server search results? [....] Did folks create the OpenURL from the OAI MARC (or original MARC) output from X-Server? Is there a call to SFX that one can make to do that work?
David Walker's suggestions
David Walker from CSU San Marcos has done some work in this area. He has kindly shared some info, which I quote here with his permission:
-
The trick is that not all of the information in the MARC record is sufficiently parsed to make a direct OpenURL request to SFX. You can pretty easily get at the ISSN/ISBN, book or article title, journal title, and year, since those reside in separate, distinct fields.
Some of the most pertinent information for constructing a full-text link, however -- including the volume, issue, and start page of a journal article -- are usually included in a single field by most databases, and these then need to be parsed out to construct an OpenURL.
Trying out the OpenURL Referrer extension
I need to remind myself of the intricacies of OpenURL(s) and the MARC XML spec, so some of what I write here is geared to bringing myself up to speed again.
To that end, I have tried installing
Openly's OpenURL Referrer Firefox browser extension:
-
Perhaps your local library subscribes to an electronic database that carries the article you are interested in. OpenURL Referrer, a new extension for the Firefox web browser, adds a link to GoogleScholar's results page that points to your library's full-text copy of the article.
This extension is interesting to me because of its implementation of both the 0.1 and 1.0 versions of OpenURL -- and because it does so in the context of GoogleScholar, the intriguing new kid on the block when it comes to MetaSearch.
OpenURL Referrer problems
Thomas P. Ventimiglia, the author of the extension, has been extremely helpful in tracking down the issues. We've not gotten to the bottom of the problem(s) yet; some may be Firefox problems or problems with another extension (we don't know yet). Some of the behavior raises the matter of how Firefox extensions interact with each other, a topic little discussed in my limited view of things.
At any rate, the extension is working well enough for me to use to hook up to the UC E-links server and also look at the format of OpenURL 1.0.
Reading about OpenURL
In the meantime, I downloaded the extension and printed out some of its code to study.
I've also printed out
Ex Libris - OpenURL Syntax to formally study the OpenURL 0.1 syntax. Searches for good, simple info on OpenURL 0.1 and the putatively much more complicated 1.0 led me to Walt Crawford's
OpenURL - Brief Bibliography, which, in turn, points to a
2-page description of OpenURL. To help me understand OpenURL version 1.0, I plan to read
Z39.88-2004: The OpenURL Framework for Context-Sensitive Services: The Key/Encoded-Value (KEV) Format Implementation Guidelines.
Specific example to compare 0.1 to 1.0
In the tradition of Google vanity searches, I will use a
search for Yee and Beaubien on GoogleScholar as an example. With
Openly's OpenURL Referrer extension installed, I get the following OpenURLs:
Now I break down the respective OpenURLs to see how the key/value pairs compare between the two versions of OpenURL. The first table contains elements that have analogs in both versions:
| 0.1 key | 0.1 value | 1.0 key | 1.0 value |
| resolver | http://ucelinks.cdlib.org:8888/sfx_local | 1.0 resolver | http://sirsi-resolver.sirsi.net/ |
| sid | openly:openurlref | rfr_id | info:sid/openly.com:openurlref |
| genre | article | rft.genre | article |
| title | Library%20Hi%20Tech | rft.jtitle | Library%20Hi%20Tech |
| date | 2004 | rft.date | 2004 |
| atitle | A%20preliminary%20crosswalk%20from%20METS%20to%20IMS%20content%20packaging | rft.atitle | A%20preliminary%20crosswalk%20from%20METS%20to%20IMS%20content%20packaging |
| aulast | Yee | rft.aulast | Yee |
| auinit | R | rft.auinit | R |
There are also extra terms in the OpenURL 1.0:
| url_ver | Z39.88-2004 |
| rft_val_fmt | info:ofi/fmt:kev:mtx:journal |
| rfe_id | http%3A%2F%2Fscholar.google.com%2Fscholar%3Fhl%3Den%26lr%3D%26q%3Dyee%2Bbeaubien%26btnG%3DSearch |
| rft_id | http%3A%2F%2Fwww.ingenta.com%2Fisis%2Fsearching%2FExpand%2Fingenta%3Fpub%3Dinfobike%3A%2F%2Fmcb%2F238%2F2004%2F00000022%2F00000001%2Fart00008 |
| url_ctx_fmt | info:ofi/fmt:kev:mtx:ctx |
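The correspondence in the two tables above can be sketched in code. This is only a rough illustration: the names `KEY_MAP_01_TO_10`, `FIXED_10_PAIRS`, and `upgrade_query` are made up for this page, the lookup covers just the keys seen in this one example, and the 1.0 `rfr_id` is supplied by the caller because the table does not show a general rule for deriving it from the 0.1 `sid`.

```python
from urllib.parse import parse_qsl, quote, urlencode

# Key correspondences read off the comparison table above.
KEY_MAP_01_TO_10 = {
    "genre": "rft.genre",
    "title": "rft.jtitle",  # journal title in the journal metadata format
    "date": "rft.date",
    "atitle": "rft.atitle",
    "aulast": "rft.aulast",
    "auinit": "rft.auinit",
}

# Bookkeeping keys that appear only in the 1.0 form of this example.
FIXED_10_PAIRS = [
    ("url_ver", "Z39.88-2004"),
    ("url_ctx_fmt", "info:ofi/fmt:kev:mtx:ctx"),
    ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
]


def upgrade_query(query_01: str, rfr_id: str) -> str:
    """Translate an OpenURL 0.1 query string into a 1.0 KEV query string."""
    pairs = list(FIXED_10_PAIRS)
    pairs.append(("rfr_id", rfr_id))  # stands in for the 0.1 sid
    for key, value in parse_qsl(query_01):
        if key in KEY_MAP_01_TO_10:
            pairs.append((KEY_MAP_01_TO_10[key], value))
        # Keys with no known analog (including sid) are dropped in this sketch.
    # quote (not quote_plus) keeps spaces as %20, as in the table.
    return urlencode(pairs, quote_via=quote)
```

Calling `upgrade_query("genre=article&title=Library%20Hi%20Tech&aulast=Yee", "info:sid/openly.com:openurlref")` yields a query string that can be appended to a 1.0 resolver address after a `?`.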
While we're at it, we should take a look at the type of URLs returned by GoogleScholar:
-
http://scholar.google.com/url?sa=U&q=http://ieeexplore.ieee.org/iel6/22/24953/01127145.pdf
-
http://scholar.google.com/url?sa=U&q=http://dx.doi.org/10.1126%252Fscience.280.5360.109 -- one with a DOI, which gets an extra id=doi:10.1126%2Fscience.280.5360.109 in OpenURL v0.1 and an extra rft_id=info:doi/10.1126%2Fscience.280.5360.109 in OpenURL 1.0
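To make the DOI case concrete, here is a small sketch of pulling the DOI out of a GoogleScholar redirect link like the second one above and turning it into the extra 0.1 and 1.0 parameters. The function names are hypothetical, and the sketch assumes the target always sits in the `q` parameter and that DOI links always start with `http://dx.doi.org/`, which is all these two examples show.

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit

DOI_PREFIX = "http://dx.doi.org/"


def doi_from_scholar_link(link: str):
    """Return the DOI hidden in a GoogleScholar redirect link, or None."""
    target = parse_qs(urlsplit(link).query).get("q", [""])[0]
    if not target.startswith(DOI_PREFIX):
        return None
    # The DOI is percent-encoded twice in the link (%252F); parse_qs undid
    # one layer, unquote undoes the other.
    return unquote(target[len(DOI_PREFIX):])


def doi_params(doi: str) -> dict:
    """Build the extra OpenURL parameter for each version."""
    encoded = quote(doi, safe="")
    return {
        "0.1": "id=doi:" + encoded,
        "1.0": "rft_id=info:doi/" + encoded,
    }
```

A link that does not point at dx.doi.org, such as the first example, simply gives `None`.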
Now that I have seen working OpenURLs up close, it's good to take a look at the full range of possibilities:
| key | value | description |
| genre | bundles: | |
| | journal | a journal, volume of a journal, issue of a journal |
| | book | a book |
| | conference | a publication bundling proceedings of a conference |
| | individual items: | |
| | article | a journal article |
| | preprint | a preprint |
| | proceeding | a conference proceeding |
| | bookitem | an item that is part of a book |
| aulast | | A string with the first author's last name |
| aufirst | | A string with the first author's first name |
| auinit | | A string with the first author's first and middle initials |
| auinit1 | | A string with the first author's first initial |
| auinitm | | A string with the first author's middle initials |
| issn | | An ISSN number |
| eissn | | An electronic ISSN number |
| coden | | A CODEN |
| isbn | | An ISBN number |
| sici | | A SICI of a journal article, volume or issue. Compliant with ANSI/NISO Z39.56-1996 Version 2 (see http://sunsite.berkeley.edu/SICI/) |
| bici | | A BICI for a section of a book, to which an ISBN has been assigned. Compliant with http://www.niso.org/bici.html |
| title | | The title of a bundle (journal, book, conference) |
| stitle | | The abbreviated title of a bundle |
| atitle | | The title of an individual item (article, preprint, conference proceeding, part of a book) |
| volume | | The volume of a bundle |
| part | | The part of a bundle |
| issue | | The issue of a bundle |
| spage | | The start page of an individual item in a bundle |
| epage | | The end page of an individual item in a bundle |
| pages | | Pages covered by an individual item in a bundle. The format of this field is 'spage-epage' |
| artnum | | The number of an individual item, in cases where there are no pages available. |
| date | YYYY-MM-DD, YYYY-MM, or YYYY | The publication date of the item or bundle encoded in the "Complete date" variant of ISO8601 (see http://www.w3.org/TR/NOTE-datetime). This format is YYYY-MM-DD where YYYY is the four-digit year, MM is the month of the year between 01 (January) and 12 (December), and DD is the day of the month between 01 and 28 or 29 or 30 or 31, depending on length of the month and whether it is a leap year. |
| ssn | winter, spring, summer, or fall | The season of publication |
| quarter | 1, 2, 3, or 4 | The quarter of publication |
Learning about MARC 21, MARCXML, OpenURL 1.0, 0.1 etc
The next big goal of my work on crosswalking MARC XML to OpenURL is to produce a table that maps out how we are going to construct OpenURLs from the MetaLib MARC XML.
MARC examples from MetaLib
Let me include three examples and then pull out the salient details for translating them into OpenURLs:
-
"ATE battles soaring IC device complexity" (an article in a series)
-
Web of Science reference (which yields a URL but not much parsable bibliographic metadata)
-
Milosz's ABC (a book)
MODS versions of the MARCXML examples from MetaLib
Using the
MODS v3 to MARC21Slim transformation and applying it to the three examples:
| MARCXML | MODS |
| | |
| | |
| | |
Note from
"ATE battles soaring IC device complexity" the use of relatedItem->part->text in MODS to store the information in 773$g from MARC.
Paths for generalization of this work
Down the road, we want to figure out how to generalize our work in several directions:
-
how to get rid of the hardcoding of the OpenURL resolver (through, for example, tapping into DanChudnov and JeremyFrumkin's work on
Service Autodiscovery for Rapid Information Movement and
Appropriate Resolvers, Dynamically: Adding rel and title attributes to OpenURLs. A Prototype.)
-
how to figure out how much the MARC XML coming out of MetaLib is actually representative of MARC XML metadata (either from other MetaSearch platforms or from other MARC sources like Melvyl)
-
how to avoid handling, on a source-by-source basis, search results coming from the wide variety of sources we have hooked up to MetaLib
-
how to come up with APIs that are not MetaLib-specific, ones that can span various MetaSearch platforms
Before I dive into documenting the MARC examples coming from MetaLib, I want to make sure I have a reasonably solid understanding of the MarcSpec, MarcXmlSpec, OpenURL 0.1 and OpenURL 1.0. I know that I could whip something together to translate among the various metadata formats -- but since I'm interested in interoperability among bibliographic metadata, I'm taking time now to carefully explicate the various models.
Is there an XML representation of OpenURLs (for representing the data elements, or for embedding OpenURLs as bibliographic metadata)? I've seen DanChudnov pull one together. Is there an official representation?
What's the significance of the word "slim" in the various MARCXML schemas I see? Is the following an answer?
Cover Pages: Library of Congress Publishes MARC 21 XML Schema and Transformation Tools:
-
With the slim approach, schema-driven validation is only possible at the highest structural level. The Network Development and MARC Standards Office will therefore maintain downloadable tag, subfield, and value validation software on the web site that will enable users to build validation programs for their needs. Use of these standard validations represent another attempt to assure standardization of records to support effective record interchange.
To make sense of the various datafields and subfields in MARC 21, I'm consulting
MARC 21 Concise Format for Bibliographic Data. For example, 856 $u is the URL/URN:
Holdings, Location, Alternate Graphics, etc. Fields (841- 88X):
-
$u - Uniform Resource Identifier (R) The URI, which provides standard syntax for locating an object using existing Internet protocols. Field 856 is structured to allow for the creation of a URL from the concatenation of other separate 856 subfields. Subfield $u may be used instead of those separate subfields or in addition to them.
Subfield $u may be repeated only if both a URN or a URL or more than one URN are recorded.
Understanding MARC Bibliographic: Parts 7 to 10 includes
A Summary of Commonly Used MARC 21 Fields and
A List of Other Fields Often Seen in MARC Records.
use of 773 for journals
Linking Entry Fields (76X-78X):
-
Information concerning the host item for the constituent unit described in the record (vertical relationship). In the case of host items that are serial or multi-volume in nature, information in subfields $g and $q is necessary to point to the exact location of the component part within the bibliographic item.
OpenURL 0.1 vs 1.0
"OpenURL Standard goes to ballot" (from /usr/lib/info) is KarenCoyle's nice distillation of pros and cons:
-
Depending on your point of view, the following might be considered "negatives" of the new OpenURL standard:
-
The simplicity of the old standard is gone. This one is highly formalized and rather difficult to read. It takes a while to get used to the terminology (referent, referrer, referring entity...).
-
The registry and profiles are required for the functioning of the OpenURL. This adds overhead to the management of the standard itself.
-
The rules for profiles and registry entries are such that once they are established they cannot be modified. Any changes or additions become a new entry. This could lead to a proliferation of profiles or metadata formats.
-
You are no longer limited to the very simple metadata elements of OpenURL 0.1. In fact, it is now possible to exchange full MARC records using the "by-reference" function.
-
New metadata formats can be added fairly easily. The registry today has formats for patents and for dissertations, as examples.
-
The old OpenURL still works. The standard was designed to allow the old format to be continued, but it is frozen with its current set of metadata and cannot be added to.
Conceptual foundation for OpenURL 1.0:
Generalizing the OpenURL Framework beyond References to Scholarly Works: The Bison-Futé Model
Correspondence with Walt Crawford on MARCXML and OpenURL
I posed the question of whether there is
any MARCXML to OpenURL crosswalk on the
OpenURL Mailing list, to which WaltCrawford
replied:
-
I don't know about "definitive work," but the Eureka "mapping"--the MARC elements used to generate OpenURL 0.1 elements (and 1.0 elements, since there's no real difference at this point), is publicly available from our website. The current address is: http://www.rlg.org/openurl.html
Go to "Structure" section.
I followed up with Walt in private email with the following question:
-
Thanks for your response -- your "mapping" is very helpful to me!
Do you think that there is any interest (or value) in looking carefully at crosswalking MARC and the OpenURL elements? That is, we can do pragmatic mappings of various sorts, but can we expect to get better agreement on the mappings across the relevant communities?
Walt then answered in his email (which he kindly gave me permission to quote):
That's a tough question, and I'm not sure I would have a reasonable answer. I think this is inherently a one-way mapping: There would be very little point in translating OpenURL metadata back to MARC21, as far as I can see. (I may be missing something: That frequently happens.)
As to general agreement on the mappings, well, we've put ours out there in public, and it's based on my 25 years of experience with MARC (before doing the mapping, that is). I've pointed at least one would-be OpenURL source to that page. If I was new to the game, and I had a mapping available, I sure wouldn't reinvent that particular wheel. I can't imagine that RLG would object to having that portion of the page copied to a more general site (although I'd have to ask!) and be offered as a general model, since I don't believe I've seen any other general models for MARC=>OpenURL mapping.
My mapping is as complete as possible--and that may be because we didn't partner with anybody in doing our OpenURL implementation (except, that is, colleagues at California Digital Library and the University of Chicago to look over my spec and see whether it made sense). Thus, we weren't aware of any shortcuts we could take, so we didn't take any.
Most OpenURL sources except online catalogs--particularly article-level databases--almost certainly don't store data in MARC21. For them, the mapping is useless. I don't know whether OCLC has published their mapping for FirstSearch (I couldn't readily find it). Online catalog vendors tend to regard everything as proprietary information, although this might be an exception.
I'm not sure that I see "pragmatic mapping" as a problem. For that matter, the Eureka mapping was done with deep familiarity of the MARC formats, and was based on where the data should reside, rather than an analysis of actual databases: It's as much a theoretical mapping as a pragmatic mapping.
The big and, I believe, somewhat insoluble problem in MARC=>OpenURL mapping is the 773$g. Because the syntax for that field, which combines year, volume, issue, and pagination, is either undefined or ill-defined (MARC21 rarely specifies internal syntax for a textual subfield!), all mappings are inherently pragmatic. We've refined our algorithms somewhat as we discover nuances of unusual databases, but I've accepted that we will never get the mapping right in 100% of article-level records. What we can do and have done is encourage database providers to follow data entry practices that make extraction feasible. (Here, again, most database producers may not have this problem: They probably store the data in separate elements, where we store in MARC21.)
I then answered:
Thanks, Walt, for your very helpful answer. I'm certainly coming at this problem without much experience with MARC and how it is actually used -- so your long experience is what is needed, I think, to get at the relevant issues. All I've been working with so far is the MARCXML coming from Ex-Libris' MetaLib product. Hence I didn't know whether it is common to use MARC at all to hold article level metadata.
Thanks, also, for confirming what I had perceived to be the problematic nature of 773$g. (That makes me wonder: is this problem shared by MODS too?)
In terms of going from OpenURL to MARC21 -- it might not be that useful. I am interested in the issue of how to pull together bibliographic metadata from disparate sources (including places like Amazon.com and Google Scholar). I have been wondering about the use of MARCXML or MODS as a hub format and therefore the pragmatics of translating OpenURLs into MARCXML or MODS.
Can you find out whether it would be ok for me to quote your mapping on my wiki (with appropriate attribution, of course)? It's interesting to me that "Online catalog vendors tend to regard everything as proprietary information, although this might be an exception." Also, your email was so helpful to me -- and I think it might be helpful to others -- can I quote your email on my blog?
Walt then responded:
You can certainly quote my email on your blog (although it's just my *sense* that things like mappings tend to be regarded as proprietary, based possibly on my attempts to find out what "relevance" means to various vendors). You can certainly link to the Eureka page with the mapping, without permission: It's part of the open web. I'll send a quick note to the parties that would be involved to see about copying the section--I can't imagine it would be a difficulty, but it may take a while to get an answer. A caveat here: I don't know much about MARCXML, I know even less about MODS (but others here know more), and thus I may be in over my head at times. But we're all learning in different ways, I guess. Anyway, I'll ask about quoting the mapping on your wiki and get back to you as soon as possible. If you haven't heard from me in a week or two, bug me...
A start at mapping MARC to OpenURL
The following chart does not pull together everything we know about crosswalking MARC and OpenURL -- but is my own start at looking at the relationships. (I've drawn also from DavidWalker's XSLT.)
In the MetaLib MARC XML:
-
the YR element is a year (of publication?)
-
the SID element is for the ID of the source; SID$b gives the source name
-
title and subtitle come from 245$a and 245$b respectively
-
the edition comes from 250$a
-
the author is from 100$a
-
260$a gives the place of publication
-
260$b is the publisher
-
260$c is a date
-
300$a gives the extent (for example, the number of pages) (R)
-
the ISBN can be pulled from 020$a
-
020$b gives the "terms of availability" (often a price)
-
020$z lists cancelled/invalid ISBN
-
use of 773 to indicate journal information
-
773$a Main entry heading
-
773$t Title
-
773$g Relationship information -- might include dates, volume, etc -- source specific
-
773$h Physical description
-
773$x for ISSN
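As a rough sketch of how the fields listed above could be pulled out of a record, the following uses Python's standard library. It assumes the record is in the MARC21 slim namespace (`http://www.loc.gov/MARC21/slim`); the actual MetaLib X-Server output may be shaped differently. The names `FIELD_MAP` and `extract_fields` and the sample record values are invented for illustration, apart from the 773$g string, which is one of the MetaLib samples.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Friendly name -> (datafield tag, subfield code), following the list above.
FIELD_MAP = {
    "author": ("100", "a"),
    "title": ("245", "a"),
    "subtitle": ("245", "b"),
    "edition": ("250", "a"),
    "place": ("260", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "isbn": ("020", "a"),
    "host_title": ("773", "t"),
    "host_relationship": ("773", "g"),
    "issn": ("773", "x"),
}

# Made-up record for illustration only.
SAMPLE_RECORD = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">An example article</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="t">Example Journal</subfield>
    <subfield code="g">Feb 2005, v143 i2, p84(2)</subfield>
    <subfield code="x">1234-5679</subfield>
  </datafield>
</record>
"""


def extract_fields(record_xml: str) -> dict:
    """Return the first occurrence of each mapped subfield found in the record."""
    record = ET.fromstring(record_xml.strip())
    found = {}
    for name, (tag, code) in FIELD_MAP.items():
        path = f".//marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
        node = record.find(path, MARC_NS)
        if node is not None and node.text:
            found[name] = node.text.strip()
    return found
```

Only the first occurrence of each subfield is kept, which glosses over repeatable fields such as 020 and 300.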
I've not attempted to reconcile this mapping with WaltCrawford's yet -- or, for that matter, with the
Rules for constructing a MARC record from an OpenURL written by Mary Heath (?) of the CDL. I will also be working with TomSchirmer as he implements our mappings to document that work.
Examples of stuff coming out of 773$g
TomSchirmer has pulled together the following list of sample 773$g entries that come from MetaLib in an effort to figure out a good parsing strategy:
Feb 2005, v143 i2, p84(2) Feb 10, 2005, pNA Jan 27, 2005, pNA Jan 27, 2005, pNA Jan 24, 2005, pNA Jan 19, 2005, pNA Jan 14, 2005, pNA Jan 13, 2005, pNA Jan 6, 2005, pNA Feb 2005, v143 i2, p84(2) Jan 11, 2005, pA18(L), col 06 (6 col in) Jan 7, 2005, pA22(L), col 04 (4 col in) Jan 4, 2005, pA19, col 02 (19 col in) Dec 27, 2004, v76 i53, p19(1) Winter 2004, v7 i4, p50(1) Jan 20, 2005, pB2, col 03 (15 col in) Jan 15, 2005, pB14, col 01 (11 col in) Jan 15, 2005, pB14, col 01 (11 col in) Jan 9, 2005, pBU26, col 01 (28 col in) Jan 2, 2005, pAR30(L), col 03 (24 col in)
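One possible parsing strategy for samples like these is a handful of regular expressions, one per piece. This is a sketch tuned only to the patterns visible in the list above (a four-digit year, `v` plus digits, `i` plus digits, `p` plus an optional letter prefix and digits); the names are hypothetical and other MetaLib sources will surely need other rules.

```python
import re

YEAR_RE = re.compile(r"(?<!\d)(\d{4})(?!\d)")  # first stand-alone 4-digit run
VOLUME_RE = re.compile(r"\bv(\d+)")            # e.g. v143
ISSUE_RE = re.compile(r"\bi(\d+)")             # e.g. i2
SPAGE_RE = re.compile(r"\bp([A-Z]*\d+)")       # e.g. p84, pA18, pBU26 (not pNA)


def parse_773g(text: str) -> dict:
    """Pull out whatever OpenURL pieces can be recognized in a 773$g string."""
    patterns = {
        "date": YEAR_RE,
        "volume": VOLUME_RE,
        "issue": ISSUE_RE,
        "spage": SPAGE_RE,
    }
    found = {}
    for key, pattern in patterns.items():
        match = pattern.search(text)
        if match:
            found[key] = match.group(1)
    return found
```

For `"Feb 2005, v143 i2, p84(2)"` this gives a year, volume, issue, and start page; for `"Feb 10, 2005, pNA"` only the year survives, since `pNA` carries no page number. The parenthesized page count and the newspaper column information are ignored.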
MARCXML, MODS, and representation of serials metadata
With MARCXML, we have the 773$g problem. (That is, metadata such as volume, issue number, and page range are glommed together.) How is the possible conflation of volume, number, and year handled in MODS?
I'm working to understand how well journal citations are handled in MODS, as opposed to MARC XML -- in this case, MODS seems to be richer and MARC21/MARC XML
It might be useful to just convert MARCXML to MODS first, making the records easier to understand.
Presumably the human-friendly tags in MODS will make it easier for me to handle the bibliographic metadata. Moreover, there is presumably a lot of communal wisdom captured in the documented
MARC Mapping to MODS (Library of Congress) and
MODS to MARC Mapping (Library of Congress).
I was under the impression that the MARCXML model is richer than (a proper superset of) the MODS semantic model. That is, anything that is expressible in MODS is expressible in MARCXML -- but not the other way around. Perhaps I'm confusing richness with granularity -- since conceivably one can stick all sorts of stuff into fields. The following comment from
MODS to MARC Mapping (Library of Congress) is apropos:
-
This document is intended for use in first converting a MODS record to MARCXML and then to a MARC 21 (ISO 2709) record. Since MODS data is not as granular as MARC 21 data and some MARC 21 subfields are concatenated into one MODS field, some data may map to inappropriate MARC 21 fields or subfields. Consequently, some conversions will result in some loss of data identification.
To help me understand concretely the relationship between MODS and MARCXML, specifically with regard to the handling of serials metadata (and the 773 tags in MARC), I used the XSLT
MODS v3 to MARC21Slim transformation to transform the
sample MODS encoding for an "article in a serial" (Neil Brenner. "
The Urban Question: Reflections on Henri Lefebvre, Urban Theory and the Politics of Scale" International Journal of Urban and Regional Research, June 2000, vol. 24, no. 2, pp. 361-378 (27 pages in length)) to get a
MARCXML version. Notice the loss of the end page info in the MARCXML version. Is there no explicit slot in MARCXML for the end page, and therefore the end page is dropped in the mapping? Or is there an oversight in the translation? (I would think that the writer of the XSLT could have chosen to mix the end page into the 773$g slot.)
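To see why the MODS side is easier to work with here, consider a sketch that reads volume, issue, and start and end pages straight out of a parsed MODS part. The fragment below is hand-written for this page using the numbers from the Brenner citation, not copied from the Library of Congress sample, and it assumes the MODS v3 namespace and the `detail`/`extent` layout of a parsed part; `part_to_openurl` is a made-up name.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hand-written fragment using the citation's volume, issue, and pages.
SAMPLE_MODS = """
<mods xmlns="http://www.loc.gov/mods/v3">
  <relatedItem type="host">
    <part>
      <detail type="volume"><number>24</number></detail>
      <detail type="issue"><number>2</number></detail>
      <extent unit="page"><start>361</start><end>378</end></extent>
    </part>
  </relatedItem>
</mods>
"""


def part_to_openurl(mods_xml: str) -> dict:
    """Map a parsed MODS host part onto OpenURL 0.1 key names."""
    root = ET.fromstring(mods_xml.strip())
    part = root.find("mods:relatedItem[@type='host']/mods:part", MODS_NS)
    if part is None:
        return {}
    wanted = {
        "volume": "mods:detail[@type='volume']/mods:number",
        "issue": "mods:detail[@type='issue']/mods:number",
        "spage": "mods:extent[@unit='page']/mods:start",
        "epage": "mods:extent[@unit='page']/mods:end",
    }
    found = {}
    for key, path in wanted.items():
        node = part.find(path, MODS_NS)
        if node is not None and node.text:
            found[key] = node.text.strip()
    return found
```

When the part is stored only in textual form, none of these paths match and the string-parsing problem of 773$g comes right back.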
The following extract from the
MODS User Guidelines seems relevant here:
-
"part" contains data to enable detailed coding for generating citations about the location of a part within a host/parent item. It is only used when the related item type attribute is "host" and may be in either parsed or textual form. <part> is roughly equivalent to MARC 21 field 773, subfields $g (Relationship information) and $q (Enumeration and first page), but allows for additional parsing of data.
Our best effort at a mapping
The following MarcXml to OpenUrl mapping represents my synthesis of the work of TomSchirmer, DavidWalker, WaltCrawford (as written in
Setting up OpenURL for Eureka),
Mary Heath (I believe) of the CaliforniaDigitalLibrary, and
Ex Libris - OpenURL Syntax.
To borrow from WaltCrawford, we form an OpenURL of the following form:
-
http://{baseurl}?[sid={sid}][&genre={genre}][&sici={sici}][&isbn={isbn}][&issn={issn}] [&title={title}][&stitle={stitle}][&volume={volume}][&issue={issue}][&spage={spage}][&pages={pages}][&date={date}] [&aulast={aulast}][&aufirst={aufirst}][&auinit={auinit}][&auinit1={auinit1}][&auinitm={auinitm}] [&atitle={atitle}]
baseurl: hardwired (for the user's institution) or dynamically associated (see OpenUrl/RepurposabilityDemo)
sid: needs to be generated to give the resolver the appropriate sense of the generator of the OpenURL. (I'm not totally clear on how to use it yet)
genre: The possible values are one of: journal, book, article, bookitem, conference, proceeding, preprint
The RLG approach: '"article," "journal," "book," or "bookitem" if the record indicates one of those genres. Omitted otherwise. Based on MARC21 field 773 data and leader data.'
The CDL approach: " looks for the presence of a 773$t, not the genre, to determine how to handle an item. If a 773$t exists in the record, PIR treats the item as a piece of a larger work."
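The bracketed template above amounts to "emit each key only when there is a value for it." A minimal sketch, assuming the values have already been extracted into a plain dictionary and that the caller supplies a full resolver address (the `resolver.example.org` address below is a placeholder, and `KEY_ORDER` and `build_openurl` are names invented here):

```python
from urllib.parse import quote, urlencode

# Keys in the order they appear in the template above.
KEY_ORDER = [
    "sid", "genre", "sici", "isbn", "issn", "title", "stitle", "volume",
    "issue", "spage", "pages", "date", "aulast", "aufirst", "auinit",
    "auinit1", "auinitm", "atitle",
]


def build_openurl(baseurl: str, fields: dict) -> str:
    """Assemble an OpenURL 0.1 link, skipping keys with no value."""
    pairs = [(key, fields[key]) for key in KEY_ORDER if fields.get(key)]
    return baseurl + "?" + urlencode(pairs, quote_via=quote)


link = build_openurl(
    "http://resolver.example.org/sfx_local",
    {"genre": "article", "title": "Library Hi Tech", "date": "2004",
     "aulast": "Yee", "volume": ""},
)
```

Keeping the resolver address as an argument leaves room for swapping the hardwired value for a dynamically discovered one later.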
author handling
aulast The basic place to look is 100$a. Crawford mentions
700$a: "first author's last name, typically 100$a or 700$a up to but not including the first comma."
I guess that it's not too hard to look for the last name (look for the name before the comma) -- but it can be a real challenge to get at the first and middle names. There are many names that don't fit the last name/first name + middle initial model. For example, here are some examples of names from Melvyl:
-
Aguilar, Pedro de
Fernández de Andrada, Pedro
Rush, Bonnie.
Yang, Hai-ying,
Erra Pater
For the given names, the possible fields are:
-
aufirst
auinit
(first and middle initials) auinit1
auinitm
The first thing to note, as Crawford does, is the range of permitted combinations for the given names: "An OpenURL query may contain aufirst and auinitm; aufirst by itself; auinit, auinit1, or none of these. It will not contain any other combination."
One of the challenging issues with names is that the various name fields have to be parsed out. Crawford wrote that they are "parsed from 100$a or 700$a". Properly parsing names in all their varieties, whether within a single linguistic context or across many linguistic and social contexts, is a hard problem.
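Here is a deliberately naive sketch of the "up to the first comma" rule for aulast, with everything after the comma passed along as aufirst (one of the permitted combinations). The function name is made up, and the sketch makes no attempt at the hard cases; it is mainly useful for seeing where the rule breaks down on the Melvyl names above.

```python
def author_params(heading: str) -> dict:
    """Split a 100$a / 700$a heading at the first comma."""
    last, comma, rest = heading.partition(",")
    params = {"aulast": last.strip()}
    given = rest.strip(" ,.")  # drop trailing punctuation such as "," or "."
    if comma and given:
        params["aufirst"] = given
    return params
```

"Yang, Hai-ying," and "Rush, Bonnie." come out cleanly, while "Aguilar, Pedro de" ends up with "Pedro de" as a first name and "Erra Pater" is treated entirely as a last name.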
issn Crawford wrote: "does include hyphen. Taken from 773$x if present, 022$a otherwise."
isbn Crawford wrote: "does not include hyphens. Taken from 773$z if present or otherwise from 020$a." (if genre is book)
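The hyphen conventions Crawford describes (hyphen kept in the ISSN, hyphens removed from the ISBN) are easy to sketch. The function names are hypothetical, the sample numbers in the usage note are made up, and no check digits are verified; the handling of a trailing qualifier such as "(pbk.)" is my own assumption about what 020$a may contain.

```python
import re


def normalize_issn(raw: str):
    """Return the ISSN as NNNN-NNNN, or None if it does not have 8 characters."""
    chars = re.sub(r"[^0-9Xx]", "", raw).upper()
    if len(chars) != 8:
        return None
    return chars[:4] + "-" + chars[4:]


def normalize_isbn(raw: str):
    """Return the ISBN without hyphens, or None if the length looks wrong."""
    parts = raw.split()
    if not parts:
        return None
    # Keep only the first token so a qualifier like "(pbk.)" is ignored.
    chars = re.sub(r"[^0-9Xx]", "", parts[0]).upper()
    return chars if len(chars) in (10, 13) else None
```

For example, `normalize_issn("12345679")` gives `"1234-5679"` and `normalize_isbn("0-123-45678-9 (pbk.)")` gives `"0123456789"`.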
title handling
There are two major cases. For an article, the article title is in 245$a, and the journal title should be in 773$t. For books, the title is in 245$a.
title. Crawford wrote "taken from 773$t if present; 245$a otherwise".
atitle Crawford wrote "taken from 245$a if 773 is present."
stitle ("The abbreviated title of a bundle") Crawford wrote "abbreviated title, taken from 773$p if present, 210$a otherwise."
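Putting the title rules together with the CDL-style test on 773$t, a sketch of the two major cases might look like this. The input is assumed to be a dictionary keyed by strings like "245$a" (a shape invented for this example), and labeling everything without a 773$t as a book is a simplifying assumption of mine, not something either mapping states.

```python
def title_params(fields: dict) -> dict:
    """Choose title, atitle, stitle and a genre from 245, 773 and 210."""
    params = {}
    own_title = fields.get("245$a")
    host_title = fields.get("773$t")
    if host_title:
        # A 773$t marks the item as a piece of a larger work.
        params["genre"] = "article"
        params["title"] = host_title
        if own_title:
            params["atitle"] = own_title
    elif own_title:
        params["genre"] = "book"  # simplifying assumption for this sketch
        params["title"] = own_title
    short_title = fields.get("773$p") or fields.get("210$a")
    if short_title:
        params["stitle"] = short_title
    return params
```

A fuller version would also consult the leader, as the RLG approach does, before settling on a genre.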
volume, number, page handling
-
volume
-
RLG: "parsed from 773$g" Heath:
-
don't know what is volSuppl CDL: use 773$v (hmmm....not listed in http://www.loc.gov/marc/bibliographic/ecbdlink.html#mrcb773)
-
RLG: " parsed from 773$g." CDL: 773$g
-
epage
-
RLG: "parsed from 773$g and included only if pagination does not include a dash." CDL: puts stuff into 300$a
-
RLG: "parsed from 773$g and included only if pagination does not include a dash." CDL: puts stuff into 300$a
date
-
Crawford: "parsed from 773$g; if unable to parse or 773 does not exist, taken from 260$c, 261$d, 262$c, or 008."
Mary Heath: publication year: 773$d, also look in 260$c
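Crawford's fallback order for the date can be sketched as a simple chain: try 773$g first, then 260$c, 261$d, 262$c, and finally the fixed field 008, where character positions 07-10 hold "Date 1". The input shape (a dictionary keyed by strings like "773$g") and the name `pick_date` are assumptions made for this example, and only a four-digit year is extracted. Heath's suggestion of 773$d is not included.

```python
import re

# A 4-digit year not embedded in a longer digit run; matches inside "c2001." too.
YEAR_RE = re.compile(r"(?<!\d)(1[5-9]\d\d|20\d\d)(?!\d)")


def pick_date(fields: dict):
    """Return a publication year using Crawford's order of fallbacks."""
    for key in ("773$g", "260$c", "261$d", "262$c"):
        match = YEAR_RE.search(fields.get(key, ""))
        if match:
            return match.group(1)
    # 008 character positions 07-10 carry "Date 1".
    candidate = fields.get("008", "")[7:11]
    if len(candidate) == 4 and candidate.isdigit():
        return candidate
    return None
```

The year range in the pattern (1500 to 2099) is an arbitrary guard against picking up page numbers or other four-digit strings.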
sici
-
Crawford: "present only if record contains 024$a with 1st indicator "4"."
other fields: ignore?
-
artnum "The number of an individual item, in cases where there are no pages available."
bici "A BICI for a section of a book, to which an ISBN has been assigned. Compliant with http://www.niso.org/bici.html"
ssn (winter | spring | summer | fall)
quarter (1 | 2 | 3 | 4)
other bibliographic info that might be useful for citation but not necessarily for OpenURL (listed in CDL)
-
publisher 260$b
placeOfPublication 260$a
edition 250$a
Conclusions with respect to MARC and OpenURL
I am working right now to finish up my documentation of my work on crosswalking MARC XML to OpenURL. Not that I've figured out everything there is to know or to discover with respect to translating MARC XML to OpenURLs. Rather, I've learned enough and want to move on to other problems.
Let me just start with some conclusions and then work backwards to justify them:
-
There is no perfect mapping of MARC XML to the elements in OpenURL, certainly in practice and probably in theory. That's really not surprising. The specifications were not developed at the same time or for the same purposes. Since when have kindred specifications ever been totally interoperable? So immediately, there are theoretical bounds on how interoperable the specifications are. On top of that, there are the funny, real-life practical things people do with specifications. In particular, in MARC, it would seem that 773$g is a big challenge. As WaltCrawford wrote me in email (which he kindly permitted me to quote here):
-
The big and, I believe, somewhat insoluble problem in MARC=>OpenURL mapping is the 773$g. Because the syntax for that field, which combines year, volume, issue, and pagination, is either undefined or ill-defined (MARC21 rarely specifies internal syntax for a textual subfield!), all mappings are inherently pragmatic. We've refined our algorithms somewhat as we discover nuances of unusual databases, but I've accepted that we will never get the mapping right in 100% of article-level records. What we can do and have done is encourage database providers to follow data entry practices that make extraction feasible. (Here, again, most database producers may not have this problem: They probably store the data in separate elements, where we store in MARC21.)
-
TomSchirmer is going to implement essentially the mapping given on the RLG site (http://www.rlg.org/openurl.html#pointers). I will want to document how we might do things differently. (DavidWalker has provided some good insight into the matter, which I will write up here.)
-
I've decided that I will do a very simple scraping of the MARCXML metadata to get out a working OpenURL for now -- but I am pinning my hopes on working with others to get Ex-Libris to give us an OpenURL.
-
Converting MARCXML to MODS first provides a more human-friendly view of the data -- and also allows us to capture the huge amount of work that the Library of Congress has already done on interpreting MARC in terms of MODS. But it doesn't solve the 773$g problem.
I still want to provide the details behind these conclusions.
Current Next Steps
-
track down the issues with the OpenURL extension (in process)
-
finish reading and distilling the documentation around OpenURLs (in process)
-
look at MARC examples
-
ask some colleagues about what they know about MARC to OpenURL crosswalks
