Library of Congress

Authorities & Vocabularies

Downloads

Bulk downloads are available from the main downloads page. There are compressed files for both RDF/XML and N-Triples serializations for each hosted authority and vocabulary.

Downloads for individual concepts or headings are possible by:

  1. Using a conventional, graphical web browser when viewing a concept or heading of interest. Links are provided to enable a download.
  2. Rendering the XHTML source for a given concept or heading through an RDFa-aware processing tool, such as the RDFa Distiller offered by the W3C. For example, the page for "Bahia grass" can be processed by the RDFa Distiller to extract its RDF.
  3. Using a non-graphical user agent via the URIs assigned to the Concepts and ConceptSchemes. Examples include command-line tools like cURL and wget. Alternatives include HTTP user agent libraries for specific programming languages, e.g., Perl: LWP::UserAgent, Java: HttpClient, Python: httplib.

The non-graphical approach is advantageous, and represents the main design principle for the Linked Data Service. It allows programmatic access by scripts and other software instead of requiring human interaction. Using this method, it is possible to use HTTP content negotiation over a single URI to obtain a concept or heading in any of several serializations. This is achieved through the HTTP request header called "Accept". When a request sets this header to the MIME type of interest, the server responds with the corresponding serialization. See the supported MIME types and serializations page for more information.

Some command-line usage examples include:

cURL

The first example will download the LC Subject Heading for "ActionScript (Computer program language)" in the JSON serialization. The second example will download the "creation" concept from the Preservation Events vocabulary, also as JSON.

  • curl -H 'Accept: application/json' http://id.loc.gov/authorities/subjects/sh00000011
  • curl -H 'Accept: application/json' http://id.loc.gov/vocabulary/preservationEvents/creation

wget

The first example will download the LC Subject Heading for "ActionScript (Computer program language)" in the RDF/XML serialization. The second example will download the "creation" concept from the Preservation Events vocabulary, also as RDF/XML.

  • wget -S -d 'http://id.loc.gov/authorities/subjects/sh00000011' \
    --header="Accept: application/rdf+xml"
  • wget -S -d 'http://id.loc.gov/vocabulary/preservationEvents/creation' \
    --header="Accept: application/rdf+xml"
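
The same content negotiation can be sketched with a language-level HTTP library. The following is a minimal illustration in Python using the standard urllib.request module, not an official client. The function names build_request and fetch are hypothetical, and the two MIME types are the ones used in the cURL and wget examples above.

```python
import urllib.request

# MIME types taken from the cURL and wget examples above.
JSON = "application/json"
RDF_XML = "application/rdf+xml"


def build_request(uri, mime_type):
    """Return a GET request whose Accept header names one serialization."""
    return urllib.request.Request(uri, headers={"Accept": mime_type})


def fetch(uri, mime_type, timeout=30):
    """Send the request and return the raw response body as bytes."""
    request = build_request(uri, mime_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling fetch("http://id.loc.gov/authorities/subjects/sh00000011", JSON) would request the same resource as the first cURL example. Only the Accept header changes between serializations; the URI stays the same.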

Data updates

An update feed is available for each dataset officially published in ID.LOC.GOV. The feeds are serialized in the Atom format, with each page including 100 links to resources ordered by descending modification date. The first page of the Atom update feed for a given dataset is accessible by appending "/feed/1" to the scheme URI.
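
As a rough sketch of how a script might work with these feeds, the Python below builds a feed URL from a scheme URI and pulls the title and link out of each Atom entry. The names feed_url and entry_links are hypothetical. The page parameter assumes that later pages follow the same "/feed/N" pattern, which this page states only for page 1, and the code assumes each entry carries a standard Atom title and link element.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def feed_url(scheme_uri, page=1):
    """Append "/feed/<page>" to a scheme URI.

    Only page 1 is documented above; other page numbers are an assumption.
    """
    return f"{scheme_uri.rstrip('/')}/feed/{page}"


def entry_links(atom_xml):
    """Return (title, href) pairs for each entry in an Atom document."""
    root = ET.fromstring(atom_xml)
    pairs = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href") if link is not None else None
        pairs.append((title, href))
    return pairs
```

If the scheme URI for the subject headings is taken to be http://id.loc.gov/authorities/subjects (an assumption based on the concept URIs used in the examples above), feed_url would produce http://id.loc.gov/authorities/subjects/feed/1.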

Extracting labels from HTTP response headers

It is possible to determine the preferred label for a given concept or heading of interest without downloading the entire RDF content. Requesting a concept URI with the HTTP HEAD method exposes a non-standard header called "X-PrefLabel", which is a URL-encoded representation of the preferred label.

For example, running cURL with the "-I" argument on the URI for "Bahia grass" performs an HTTP request using the HEAD method.

  • curl -I http://id.loc.gov/authorities/subjects/sh93007391

HTTP HEAD requests return only the HTTP response headers, without the body of the RDF or XHTML content. Among these headers is "X-PrefLabel: Bahia%20grass". An HTTP library in the programming language of your choice can read this header. URL-decoding the value of the X-PrefLabel header yields the string "Bahia grass".
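
A minimal Python sketch of this lookup, using only the standard library, might look like the following. The function names decode_pref_label and head_pref_label are hypothetical. The code assumes the header is named exactly as described above and returns None if the server does not send it.

```python
import urllib.request
from urllib.parse import unquote


def decode_pref_label(header_value):
    """URL-decode the value of an X-PrefLabel header."""
    return unquote(header_value)


def head_pref_label(uri, timeout=30):
    """Issue a HEAD request and return the decoded label, or None if absent."""
    request = urllib.request.Request(uri, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = response.headers.get("X-PrefLabel")
    return decode_pref_label(value) if value is not None else None
```

With the header value shown above, decode_pref_label("Bahia%20grass") returns "Bahia grass".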

Why use dereferenceable URIs?

Having a dereferenceable URI for these Concepts -- and the ConceptSchemes to which they belong -- greatly enhances the ability to provide web services for consumers of our standards, whether working in XML, or in another technology.

When building a web service, you need dereferenceable URIs to use the service. This URI might as well represent the data you are aiming to use. Although you could use non-dereferenceable URIs in conjunction with dereferenceable Handles or the like, doing so adds unnecessary overhead. Handles and similar mechanisms are not always needed for URIs under your own control. This holds true if you have the ability to rewrite URIs at the server level as data location changes over time, and if you build with the notion of content negotiation over RESTful URIs from the start.

Special Note about the Library of Congress Subject Headings

The Library of Congress Subject Headings was the first dataset included in the LC Linked Data Service. It was an almost verbatim re-release of the system and content once found at the popular prototype lcsh.info service. The primary exception is the form of the URIs.

Old:
http://lcsh.info/{identifier}
New:
http://id.loc.gov/authorities/{identifier}

If you have used the legacy lcsh.info metadata in an application, we advise updating to the new URIs, as we cannot guarantee a permanent redirect from old lcsh.info URIs to the new URIs at id.loc.gov.