- Purpose of this document
- Some places I find citations
- Common data representations of citations
- Styling
- NLM
- Information about citation styles
Purpose of this document
We want to find a unified way to handle bibliographic citations. Users should be able to move citations between different computation environments seamlessly and format them into the desired styles easily.
I'm trying to summarize where we are as a community in moving toward that goal.
Some places I find citations
-
URLs in web pages
-
OpenUrl coming from library systems
-
EndNoteProgram and related bibliographic data managers
-
RefWorks, for which the UcBerkeley campus has a site license.
-
OpenOfficeOrg, especially in its OpenOfficeOrg/BibliographicProject, managed by BruceDarcus
People will want to format the citations for different purposes.
A (seemingly) reasonable approach is to come up with a common data representation and then transform from that format into the various styles. I'm trying to figure out how far people have gotten with this approach and where we can go next.
From Bruce: actually, a better approach (seen in BiblioX) is to have an internal formatting data model to which one maps the different bib formats. It's impossible to ask everyone to standardize on only one data representation, in part because bibliographic metadata serves somewhat different purposes than the data model of the citations that get formatted with it.
Handling bibliographic citations is an important issue in the ScholarsBox development realm.
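To make Bruce's suggestion concrete for myself, here is a minimal Python sketch of an internal formatting model with one mapper per input format. The `Citation` class, its field names, and the two mapper functions are my own hypothetical names, not anything taken from BiblioX; the RIS tags and BibTeX field names are the common ones for journal articles, and the inputs are assumed to be already parsed into dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Citation:
    """Hypothetical internal model holding only what a formatter needs."""
    authors: List[str] = field(default_factory=list)
    title: str = ""
    container: str = ""  # journal title
    year: str = ""
    volume: str = ""
    pages: str = ""


def from_ris(tags: Dict[str, List[str]]) -> Citation:
    """Map parsed RIS tags (tag -> list of values) onto the internal model."""
    def first(key: str) -> str:
        return (tags.get(key) or [""])[0]

    start, end = first("SP"), first("EP")
    return Citation(
        authors=list(tags.get("AU", [])),
        title=first("TI") or first("T1"),
        container=first("JO") or first("JF"),
        year=first("PY")[:4],  # older exports write e.g. "1999///"
        volume=first("VL"),
        pages=f"{start}-{end}" if start and end else start,
    )


def from_bibtex(fields: Dict[str, str]) -> Citation:
    """Map parsed BibTeX fields (name -> value) onto the internal model."""
    authors = [a.strip() for a in fields.get("author", "").split(" and ") if a.strip()]
    return Citation(
        authors=authors,
        title=fields.get("title", ""),
        container=fields.get("journal", ""),
        year=fields.get("year", ""),
        volume=fields.get("volume", ""),
        pages=fields.get("pages", "").replace("--", "-"),
    )
```

The point of the design is that a style formatter only ever sees `Citation`, so supporting another input format means writing one more mapper rather than touching any style.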
Common data representations of citations
Journal sites that allow the download of a citation give a sense of which formats are perceived to be important. For an example article in
Science, we see the options for:
-
Endnote
-
Reference Manager (RIS format)
-
ProCite
-
BibTeX
-
Medlars
Looking at
PLoS Biology: Citation Formats (Lander AD (2004) A Calculus of Purpose. PLoS Biol 2(6): e164,
doi:10.1371/journal.pbio.0020164), we get the following formats:
-
BibTeX
-
RIS
-
Refer
-
RefWorks (which is particularly intriguing)
Note from SteveToub:
BibUtils, written in C, can transform between MODS, Endnote, RIS and BibTeX data formats.
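To get a feel for what such a conversion involves, here is a rough Python sketch that turns a single RIS journal record into a BibTeX entry. This is not how BibUtils works internally (it goes through MODS); the function names `parse_ris` and `ris_to_bibtex` are my own, and the sketch assumes a journal article and ignores character escaping.

```python
import re

# An RIS line is a two-character tag, two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Parse the first RIS record in `text` into {tag: [values]}."""
    record = {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue  # skip blank or continuation lines in this sketch
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # end-of-record marker
            break
        record.setdefault(tag, []).append(value)
    return record


def ris_to_bibtex(text, key):
    """Render one RIS journal record as an @article entry (assumption: TY is JOUR)."""
    r = parse_ris(text)

    def one(*tags):
        # First value of the first tag that is present, else "".
        return next((r[t][0] for t in tags if t in r), "")

    fields = {
        "author": " and ".join(r.get("AU", []) + r.get("A1", [])),
        "title": one("TI", "T1"),
        "journal": one("JO", "JF"),
        "year": one("PY", "Y1")[:4],
        "volume": one("VL"),
        "pages": "--".join(p for p in (one("SP"), one("EP")) if p),
    }
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    return f"@article{{{key},\n{body}\n}}"
```

Even this toy version shows where the work is: every format has its own idea of how to split names, dates, and page ranges.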
Styling
Naively, I would think that one should be able to create citations in a common XML format and then write XSLT stylesheets to transform them into the common citation styles. But the comment from Judith Bush of RedLightGreen gives me pause (
Tuna Breath: Updates):
-
Shouldn't a simple stylesheet be a trivial thing? I think so, but apparently the commercial citation formats have historically required much care and feeding. (Insert grumble about proprietary systems and the virtue of Open systems, standards, and sources.) I have to admit, just developing the style sheets for generating the citations themselves have been trick as library delimited and marked up data originates from rather different rules and principles than citations.
Note from Bruce: I've written an XSLT stylesheet to format MODS records, and it's not that hard, aside from handling examples like (Doe, 1999a, 1999b). It does require a good bibliographic format (which MODS is), properly coded, which is the problem that Judith seems to be pointing to. Consider, for example, that a MODS record sourced from the LoC may have a title coded like so: <title>Some title [additional stuff]</title>, have funky punctuation, etc. That additional stuff doesn't go in the citation (and arguably doesn't belong in the XML either). Note, though: the problem here has nothing at all to do with XML or XSLT; it's a problem of the MARC source data, which Endnote users also have to deal with if they use its Z39.50 support. Indeed, when I used that app, I spent almost as much time cleaning up the data sourced from MARC as I did entering new records from scratch!
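The two difficulties in Bruce's note can be sketched in a few lines of Python (not XSLT, and not his stylesheet). The function names are hypothetical, and the cleanup rule is only a guess at what "additional stuff" and "funky punctuation" look like in MARC-derived titles: a square-bracketed note and trailing separator characters.

```python
import re
from collections import defaultdict
from string import ascii_lowercase


def clean_title(title):
    """Drop square-bracketed cataloguing notes and trailing separator punctuation."""
    title = re.sub(r"\s*\[[^\]]*\]\s*", " ", title)
    return title.strip().rstrip(" /:;,")


def year_labels(refs):
    """Given (author, year) pairs in bibliography order, return year labels.

    Years shared by more than one work of the same author get a letter
    suffix (1999a, 1999b, ...); unique ones are left alone. Only the first
    26 duplicates are lettered in this sketch.
    """
    groups = defaultdict(list)
    for index, (author, year) in enumerate(refs):
        groups[(author, year)].append(index)
    labels = [year for _, year in refs]
    for (_, year), indexes in groups.items():
        if len(indexes) > 1:
            for letter, index in zip(ascii_lowercase, indexes):
                labels[index] = f"{year}{letter}"
    return labels


def cite(author, labels):
    """Build an in-text author-year citation covering several works."""
    return f"({author}, {', '.join(labels)})"
```

For a bibliography containing two 1999 works by Doe, `year_labels` yields `1999a` and `1999b` for them, and `cite("Doe", ...)` with those labels gives the (Doe, 1999a, 1999b) form. The suffixes depend on the whole bibliography rather than on a single record, which is what makes this awkward for a per-record stylesheet.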
BTW, BiblioX is a project to do XSLT-based formatting, and has an XML style spec language analogous to BibTeX .bst files, or the binary style files in Endnote.
RedLightGreen offers the following formats:
-
MLA
-
Chicago
-
Turabian
-
APA
Others of interest:
-
NLM
-
CBE:
CBE Citation Guide
I wonder how hard it is to get a repository of machine-actionable style files. Could one be bought from
Refworks Web Based Bibliographic Management Software?
NLM
Sample PubMed Central Citations - XML Tagged:
-
This version shows how the citations should be tagged according to NLM's Journal Publishing DTD using the <nlm-citation> element. An
See also:
<nlm-citation> seems to be used in the PublicLibraryOfScience journals. For example, if you look in the
XML version of
PLoS Biology: A Calculus of Purpose, you will see the first reference:
-
Alon U, Surette MG, Barkai N, Leibler S (1999) Robustness in bacterial chemotaxis. Nature 397: 168–171.
based on the following XML:
<ref id="pbio-0020164-Alon1">
  <nlm-citation citation-type="journal">
    <person-group person-group-type="author">
      <name>
        <surname>Alon</surname>
        <given-names>U</given-names>
      </name>
      <name>
        <surname>Surette</surname>
        <given-names>MG</given-names>
      </name>
      <name>
        <surname>Barkai</surname>
        <given-names>N</given-names>
      </name>
      <name>
        <surname>Leibler</surname>
        <given-names>S</given-names>
      </name>
    </person-group>
    <article-title>Robustness in bacterial chemotaxis</article-title>
    <source>Nature</source>
    <year>1999</year>
    <volume>397</volume>
    <fpage>168</fpage>
    <lpage>171</lpage>
  </nlm-citation>
</ref>
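As a check that the tagged form really carries everything the displayed reference needs, here is a small Python sketch that rebuilds the reference string from a `<ref>` element like the one above. The function name `format_nlm_journal` is mine, the punctuation pattern is simply copied from the displayed reference rather than from any published PLoS style definition, and only the journal citation type is handled.

```python
import xml.etree.ElementTree as ET


def format_nlm_journal(ref_xml):
    """Format a <ref> holding a journal <nlm-citation> as a one-line reference."""
    ref = ET.fromstring(ref_xml)
    cit = ref.find("nlm-citation")

    def text(tag):
        # Text of a direct child of <nlm-citation>, or "" if it is absent.
        return (cit.findtext(tag) or "").strip()

    # Authors appear as "Surname Initials", separated by commas.
    names = [
        f"{n.findtext('surname', '')} {n.findtext('given-names', '')}".strip()
        for n in cit.findall("person-group[@person-group-type='author']/name")
    ]

    # Page range uses an en dash (U+2013), as in the displayed reference.
    pages = text("fpage")
    if text("lpage"):
        pages += "\u2013" + text("lpage")

    return (
        f"{', '.join(names)} ({text('year')}) {text('article-title')}. "
        f"{text('source')} {text('volume')}: {pages}."
    )
```

Called on the `<ref>` element above, this should reproduce the reference exactly as it is displayed in the article.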
Wow, I did not know that biologists have been so active in the world of XML-marked-up citations. A little poking around led me to the following links (which I plan to follow up on at some point):
-
Journal Publishing DTD (used in the PLoS article)
-
The National Center for Biotechnology Information (NCBI), a center of the National Library of Medicine (NLM), created the Journal Publishing Document Type Definition (DTD) with the intent of providing a common format for the creation of journal content in XML.
-
The National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM) created the Journal Archiving and Interchange Document Type Definition (DTD) with the intent of providing a common format in which publishers and archives can exchange journal content.
