Library of Congress

Authorities & Vocabularies


Available Datasets

The Linked Data Service provides access to commonly found standards and vocabularies promulgated by the Library of Congress. This includes data values and the controlled vocabularies that house them. Each included vocabulary is described below and can be searched individually.


Library of Congress Subject Headings

Library of Congress Subject Headings (LCSH) has been actively maintained since 1898 to catalog materials held at the Library of Congress. By virtue of cooperative cataloging, other libraries around the United States also use LCSH to provide subject access to their collections. In addition, LCSH is used internationally, often in translation. LCSH in this service includes all Library of Congress Subject Headings, free-floating subdivisions (topical and form), Genre/Form headings, Children's (AC) headings, and validation strings* for which authority records have been created. The content includes a few name headings (personal and corporate), such as William Shakespeare, Jesus Christ, and Harvard University, and geographic headings that are added to LCSH as they are needed to establish subdivisions, provide a pattern for subdivision practice, or provide reference structure for other terms. With the inclusion of validation strings, this content extends beyond the print edition of LCSH (the "red books").

*Validation strings: Some authority records are for headings that have been built by adding subdivisions. These records are the result of an ongoing project to programmatically create authority records for valid subject strings from subject heading strings found in bibliographic records. The authority records for these subject strings were created so the entire string could be machine-validated. The strings do not have broader, narrower, or related terms.
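As a rough illustration of what machine validation of an entire string might look like, the sketch below checks a complete subject string against a set of strings that have authority records. The double-hyphen separator follows the usual LCSH display convention. The sample strings, the set VALIDATED_STRINGS, and both function names are hypothetical and are not taken from the LCSH data.

```python
# Hypothetical sample of subject strings that have their own authority records.
VALIDATED_STRINGS = {
    "Horror films",
    "Horror films--History and criticism",
}


def split_heading(heading: str) -> list:
    """Split a subject string into its main heading and subdivisions."""
    return [part.strip() for part in heading.split("--")]


def is_validated(heading: str) -> bool:
    """Return True only if the entire string matches an authority record."""
    normalized = "--".join(split_heading(heading))
    return normalized in VALIDATED_STRINGS
```

The check is deliberately all-or-nothing: a string whose main heading is valid but whose subdivision is not in the sample is rejected, which mirrors the idea of validating the whole string rather than its parts.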

View Scheme

Library of Congress Name Authority File

The Library of Congress Name Authority File (NAF) provides authoritative data for names of persons, organizations, events, places, and titles. Its purpose is the identification of these entities and, through the use of such controlled vocabulary, to provide uniform access to bibliographic resources. Name descriptions also provide access to a controlled form of name through references from unused forms. For example, a search under "Snodgrass, Quintus Curtius, 1835-1910" will lead users to the authoritative name for Mark Twain, which is "Twain, Mark, 1835-1910." Names may also be used as subjects in bibliographic descriptions, so they may be combined with controlled values from subject heading schemes, such as LCSH.

Library of Congress Names includes over 8 million descriptions created over many decades and according to different cataloging policies. LC Names is officially called the NACO Authority File and is a cooperative effort in which participants follow a common set of standards and guidelines.
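The way references from unused forms lead to an authorized name can be sketched as a simple lookup. This is only an illustration under stated assumptions: the in-memory dictionary SEE_REFERENCES and the function resolve_name are hypothetical, and the single entry is the Mark Twain example from the description above.

```python
from typing import Optional

# Hypothetical in-memory index: unused (variant) form -> authorized form.
# The single entry mirrors the example given in the description.
SEE_REFERENCES = {
    "Snodgrass, Quintus Curtius, 1835-1910": "Twain, Mark, 1835-1910",
}
AUTHORIZED = set(SEE_REFERENCES.values())


def resolve_name(query: str) -> Optional[str]:
    """Return the authorized form for a query, or None if it is unknown."""
    query = query.strip()
    if query in AUTHORIZED:
        # The query is already the controlled form.
        return query
    # Otherwise follow the reference from the unused form, if one exists.
    return SEE_REFERENCES.get(query)
```

Searching under either form returns the same authorized string, which is what gives uniform access to the resources described under that name.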

View Scheme

Library of Congress Classification

The Library of Congress Classification (LCC) is a classification system that was first developed in the late nineteenth and early twentieth centuries to organize and arrange the book collections of the Library of Congress. Over the course of the twentieth century, the system was adopted for use by other libraries as well, especially large academic libraries in the United States. It is currently one of the most widely used library classification systems in the world.

The system divides all knowledge into twenty-one basic classes, each identified by a single letter of the alphabet. Most of these alphabetical classes are further divided into more specific subclasses, identified by two-letter, or occasionally three-letter, combinations. For example, class N, Art, has subclasses NA, Architecture; NB, Sculpture; ND, Painting; as well as several other subclasses. Each subclass includes a loosely hierarchical arrangement of the topics pertinent to the subclass, going from the general to the more specific. Individual topics are often broken down by specific places, time periods, or bibliographic forms (such as periodicals, biographies, etc.). Each topic (often referred to as a caption) is assigned a single number or a span of numbers. Whole numbers used in LCC may range from one to four digits in length, and may be further extended by the use of decimal numbers. Some subtopics appear in alphabetical, rather than hierarchical, lists and are represented by decimal numbers that combine a letter of the alphabet with a numeral, e.g. .B72 or .K535. Relationships among topics in LCC are shown not by the numbers that are assigned to them, but by indenting subtopics under the larger topics that they are a part of, much like an outline. In this respect, it is different from more strictly hierarchical classification systems, such as the Dewey Decimal Classification, where hierarchical relationships among topics are shown by numbers that can be continuously subdivided.
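The notation described above (one to three class letters, a whole number of one to four digits, an optional decimal extension, and an optional letter-plus-numeral decimal such as .B72) can be sketched as a regular expression. This is a minimal sketch, not an official LCC grammar: the names CLASS_NUMBER, ClassNumber, and parse_class_number are hypothetical, the sample numbers in the usage note are made up, and other elements that appear in full call numbers are not handled.

```python
import re
from typing import NamedTuple, Optional

CLASS_NUMBER = re.compile(
    r"""
    ^(?P<letters>[A-Z]{1,3})      # class or subclass: one to three letters
    \s*(?P<whole>\d{1,4})         # whole number: one to four digits
    (?P<decimal>\.\d+)?           # optional decimal extension
    \s*(?P<alpha>\.[A-Z]\d+)?     # optional letter-plus-numeral decimal, e.g. .B72
    $""",
    re.VERBOSE,
)


class ClassNumber(NamedTuple):
    letters: str
    whole: int
    decimal: Optional[str]
    alpha: Optional[str]


def parse_class_number(text: str) -> Optional[ClassNumber]:
    """Split a class number into its parts, or return None if it does not fit."""
    m = CLASS_NUMBER.match(text.strip())
    if m is None:
        return None
    return ClassNumber(m["letters"], int(m["whole"]), m["decimal"], m["alpha"])
```

For instance, parse_class_number("NA737.B72") separates the subclass NA, the whole number 737, and the alphabetical decimal .B72. Note that the parts say nothing about hierarchy, which in LCC is carried by the indentation of captions rather than by the notation.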

View Scheme

Library of Congress Children's Subject Headings

The Library of Congress Subject Headings Supplemental Vocabularies: Children’s Headings (LCSHAC) is a thesaurus which is used in conjunction with LCSH. It is not a self-contained vocabulary, but is instead designed to complement LCSH and provide tailored subject access to children and young adults when LCSH does not provide suitable terminology, form, or scope for children. LCSHAC records can be identified by the LCCN prefix "sj".

View Scheme

Library of Congress Genre/Form Terms

The Library of Congress Genre/Form Terms for Library and Archival Materials (LCGFT) is a thesaurus that describes what a work is versus what it is about. For instance, the subject heading Horror films, with appropriate subdivisions, would be assigned to a book about horror films. A cataloger assigning headings to the movie The Texas Chainsaw Massacre would also use Horror films, but it would be a genre/form term since the movie is a horror film, not a movie about horror films.

The thesaurus combines both genres and forms. Form is defined as a characteristic of works with a particular format and/or purpose. A "short" is a particular form, for example, as is "animation." Genre refers to categories of works that are characterized by similar plots, themes, settings, situations, and characters. Examples of genres are westerns and thrillers. In the term Horror films "horror" is the genre and "films" is the form.

LCGFT assumed its title in June 2010 and in May 2011 the LCCN prefix "gf" was implemented to identify Genre/form terms as part of the change to separate LCGFT from LCSH. The "gf" prefix is one way by which a record can be identified as a genre/form authority record. Further information about the LCGFT thesaurus and its relationship to the LCSH data set may be found in Library of Congress to Reissue Genre/Form Authority Records (Revised May 9, 2011) and in a FAQ on the topic.
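Because the LCCN prefix is one way to tell these records apart, a small routine can sort identifiers by prefix. The sketch below covers only the two prefixes named in these descriptions ("sj" for LCSHAC and "gf" for LCGFT). The function name vocabulary_for_lccn is hypothetical, the sample LCCNs in the usage note are made up, and every other prefix is simply reported as "other".

```python
import re

# Prefixes named in the descriptions; any other prefix is reported as "other".
PREFIX_TO_VOCABULARY = {
    "sj": "LCSHAC",
    "gf": "LCGFT",
}


def vocabulary_for_lccn(lccn: str) -> str:
    """Guess the vocabulary of an authority record from its LCCN prefix."""
    # Take the leading run of letters as the prefix, ignoring case and padding.
    match = re.match(r"[a-z]+", lccn.strip().lower())
    prefix = match.group(0) if match else ""
    return PREFIX_TO_VOCABULARY.get(prefix, "other")
```

With made-up identifiers, vocabulary_for_lccn("gf2011000001") yields "LCGFT" and vocabulary_for_lccn("sj 96000001") yields "LCSHAC". Since the prefix is described as only one way of identifying a genre/form record, a real application would likely combine it with other evidence in the record.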

View Scheme

Library of Congress Medium of Performance Thesaurus for Music

The Library of Congress Medium of Performance Thesaurus (LCMPT) is a stand-alone vocabulary that provides terminology to describe the instruments, voices, etc., used in the performance of musical works. The authorized terms are intended to be used in field 382 of MARC 21 bibliographic and authority records, and may be assigned in both AACR2 and RDA records.

The core terms in LCMPT are based chiefly on existing LC subject headings, but some additional terms that do not already appear in LCSH have also been included. Authorized terms and references in LCMPT generally consist of single words and phrases, but parenthetical qualifiers are occasionally employed to differentiate among homonyms. All terms and references are in the singular form and are lowercased unless they are proper nouns (e.g., flute; saxophone ensemble; but Irish harp).

The thesaurus has three broadest terms: ensemble, instrument, and performer. Each of the other terms is hierarchically subordinate to one or more of these terms and exhibits the class/class member relationship. Most of the authorized terms have Used For (UF) references for synonyms. Scope notes are also provided in many cases, and may describe the medium’s physical structure, the time period in which it was popular, and/or its geographic origin.

Further information about the LCMPT and its relationship to the LCSH data set may be found in Library of Congress Releases Tentative List of Medium of Performance Terms for Music.

View Scheme

Thesaurus for Graphic Materials

The Thesaurus for Graphic Materials (TGM) is a tool for indexing visual materials by subject and genre/format. The thesaurus includes more than 7,000 subject terms to index topics shown or reflected in pictures, and 650 genre/format terms to index types of photographs, prints, design drawings, ephemera, and other categories. New terms are added regularly. TGM is searchable through the Prints and Photographs Online Catalog (PPOC).

View Scheme

AFS Ethnographic Thesaurus

The AFS Ethnographic Thesaurus is a vocabulary that can be used to improve access to information about folklore, ethnomusicology, ethnology, and related fields. The American Folklore Society developed the Thesaurus in cooperation with the American Folklife Center of the Library of Congress, with support from a generous grant from the Scholarly Communications Program of the Andrew W. Mellon Foundation.

View Scheme

Cultural Heritage Organizations

The Code List for Cultural Heritage Organizations contains short alphabetic codes used to represent names of libraries and other kinds of organizations that need to be identified in the bibliographic environment. This code list is an essential reference tool for those dealing with MARC records, for systems reporting library holdings, for many interlibrary loan systems, and for those who may be organizing cooperative projects on a regional, national, or international scale. There are a number of data elements in the MARC formats that call for institutional identifiers, the chief ones being those that identify the organization assigning the record control number, the agency responsible for creating or modifying a record, and the agency holding a copy of the item. In particular, this list is a key to codes for holding institutions represented in the Library of Congress National Union Catalog (NUC) and other union list publications which contain holdings for reporting institutions.

This code list for cultural heritage organizations, which was begun in 1932 as part of a community project, includes new codes assigned on an ongoing basis. Over time, a small number of existing codes have been changed or made obsolete. In all cases, previously valid codes are given as references. The large number of codes can be attributed to continuing expansion of the use of standard identifiers, nationally by school libraries (particularly for statewide projects) and internationally as information is shared globally via the Internet.

While this list of organizations focuses on US institutions, with over 30,000 defined, it also includes codes for institutions in other countries that have requested them. However, MARC codes are not assigned for institutions in Canada, Germany, or the United Kingdom unless the institution is a branch of a US institution.

The list contains over 41,000 entries.

The code list, including a detailed explanation of the codes' history and structure, is part of a database where the codes may also be searched. See the MARC Code List for Organizations website.

View Scheme

MARC List of Relator Terms

Relator terms and their associated codes designate the relationship between a name and a bibliographic resource. The relator codes are three-character lowercase alphabetic strings that serve as identifiers. Either the term or the code may be used as controlled values.
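Since either the term or the code may serve as the controlled value, software often normalizes both to the code. The following is a minimal sketch of that idea: the four entries are a small sample of the list, and the names RELATORS, TERM_TO_CODE, and normalize_relator are hypothetical.

```python
import re
from typing import Optional

# Codes are three-character lowercase alphabetic strings.
RELATOR_CODE = re.compile(r"[a-z]{3}")

# Small illustrative sample; the full list is much longer.
RELATORS = {
    "aut": "Author",
    "edt": "Editor",
    "ill": "Illustrator",
    "trl": "Translator",
}
TERM_TO_CODE = {term.lower(): code for code, term in RELATORS.items()}


def normalize_relator(value: str) -> Optional[str]:
    """Accept a code or a term; return the code, or None if not in the sample."""
    value = value.strip()
    if RELATOR_CODE.fullmatch(value) and value in RELATORS:
        return value
    # Terms are matched case-insensitively.
    return TERM_TO_CODE.get(value.lower())
```

Both normalize_relator("trl") and normalize_relator("Translator") return the same identifier, so downstream code only has to handle one form.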

View Scheme

MARC List of Countries

The MARC Countries list identifies current national entities, states of the United States, provinces and territories of Canada and Australia, divisions of the United Kingdom, and internationally recognized dependencies. The list's codes are two- or three-character lowercase alphabetic strings that serve as identifiers. The MARC country codes are not the same as the ISO 3166 country codes, although the lists are entity-compatible so that a simple translation could relate codes for the same entity. The records for the codes contain references to the equivalent ISO 3166 codes.

The list contains over 350 discrete codes. This list is also searchable at: MARC Code List for Countries.
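The "simple translation" between MARC country codes and ISO 3166 codes can be pictured as a lookup table. The sketch below is an assumption-laden illustration: it holds only a handful of pairs, the names MARC_TO_ISO3166 and marc_to_iso are hypothetical, and a real table would be built from the equivalences recorded in the code records themselves.

```python
import re
from typing import Optional

# MARC country codes are two- or three-character lowercase alphabetic strings.
MARC_COUNTRY_CODE = re.compile(r"[a-z]{2,3}")

# Small illustrative sample of MARC-to-ISO 3166 alpha-2 pairs.
MARC_TO_ISO3166 = {
    "xxu": "US",  # United States
    "xxk": "GB",  # United Kingdom
    "gw": "DE",   # Germany
    "fr": "FR",   # France
    "ja": "JP",   # Japan
}


def marc_to_iso(code: str) -> Optional[str]:
    """Translate a MARC country code to ISO 3166, or None if not in the sample."""
    if not MARC_COUNTRY_CODE.fullmatch(code):
        raise ValueError(f"not a well-formed MARC country code: {code!r}")
    return MARC_TO_ISO3166.get(code)
```

The sample also shows why the two lists cannot be used interchangeably: "gw" and "ja" look like ISO codes but identify Germany and Japan, whose ISO 3166 codes are DE and JP.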

View Scheme

MARC List of Geographic Areas

The Geographic Areas list identifies separate countries, first-order political divisions of some countries, regions, geographic features, areas in outer space, and celestial bodies. The list's codes are one- to seven-character lowercase alphabetic strings that serve as identifiers.

The list contains over 550 discrete codes. This list is also available at: MARC Code List for Geographic Areas.

View Scheme

MARC List of Languages

The MARC List for Languages provides three-character lowercase alphabetic strings that serve as the identifiers of languages and language groups. The codes are usually based on the first three letters of the English form or, in some cases, vernacular form of the corresponding language name. The codes are varied where necessary to resolve conflicts and are not intended to be abbreviations of a language name. When the name of a language is changed in the list, the original code is generally retained.

The codes in this list are equivalent to those of ISO 639-2 (Bibliographic) codes and some codes from ISO 639-5, although the language name labels may differ. They are linked to the equivalent codes in ISO 639-2 and ISO 639-5 and the corresponding two-character codes in ISO 639-1.

The list contains over 480 discrete codes. It is also searchable at: MARC Code List for Languages.

View Scheme

ISO 639-1: Codes for the Representation of Names of Languages - Part 1: Two-letter codes for languages

ISO 639-1 is the first part of the ISO 639 international-standard language-code family. ISO 639-1 provides two-character lowercase alphabetic strings that serve as identifiers of languages. The list contains approximately 180 discrete codes. All ISO 639-1 languages also have ISO 639-2 three-character code representations. These codes are linked to codes for the same languages in ISO 639-2 and the MARC Language Codes.

View Scheme

ISO 639-2: Codes for the Representation of Names of Languages - Part 2: Alpha-3 Code for the Names of Languages

ISO 639-2 is part of the ISO 639 language code family, which also provides a two-character code set (ISO 639-1) for the representation of names of languages. ISO 639-2 contains codes for all languages contained in ISO 639-1 and several hundred additional languages. The ISO 639-2 (Bibliographic) codes were devised for use in bibliographic metadata, e.g., for libraries, information services, and publishers, while the ISO 639-2 (Terminology) codes target terminology, lexicography, and linguistic applications. The lists are the same except for 20 languages that have different Bibliographic and Terminology codes. The list contains over 500 discrete codes.

The ISO 639-2 (Bibliographic) codes are equivalent to the MARC Language Codes. The ISO 639-2 codes are linked to two-character codes for the same language in ISO 639-1, to the MARC Language Codes, and to equivalent codes for language groups in ISO 639-5.
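The links among the code sets can be illustrated with a small cross-reference table. This is a sketch over a five-language sample, not the full lists: the names LanguageCodes, SAMPLE, INDEX, and lookup are hypothetical. Three of the entries are among the languages whose Bibliographic and Terminology codes differ, and one (Ancient Greek, "grc") has no two-character ISO 639-1 code.

```python
from typing import NamedTuple, Optional


class LanguageCodes(NamedTuple):
    bibliographic: str      # ISO 639-2 (Bibliographic), same as the MARC code
    terminology: str        # ISO 639-2 (Terminology)
    alpha2: Optional[str]   # ISO 639-1 code, if the language has one


# Small illustrative sample of linked codes.
SAMPLE = [
    LanguageCodes("eng", "eng", "en"),
    LanguageCodes("fre", "fra", "fr"),
    LanguageCodes("ger", "deu", "de"),
    LanguageCodes("chi", "zho", "zh"),
    LanguageCodes("grc", "grc", None),
]

# Index every known code to its full entry so any form can be looked up.
INDEX = {}
for entry in SAMPLE:
    for code in (entry.bibliographic, entry.terminology, entry.alpha2):
        if code is not None:
            INDEX[code] = entry


def lookup(code: str) -> Optional[LanguageCodes]:
    """Return the linked codes for any of the three forms, or None."""
    return INDEX.get(code.strip().lower())
```

Looking up "fra", "fre", or "fr" returns the same entry, so a record carrying a Terminology or two-character code can be mapped to the Bibliographic code used in MARC.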

View Scheme

ISO 639-5 Codes for the Representation of Names of Languages - Part 5: Alpha-3 Code for Language Families and Groups

ISO 639-5 provides three-character lowercase alphabetic strings that serve as identifiers for the representation of names of living and extinct language families and language groups. The list contains over 100 discrete codes.

The codes on this list include all of the codes for language groups in the MARC Language Code scheme and over 40 additional groups. The codes are linked to their equivalent codes on the MARC Language Code list and ISO 639-2.

View Scheme

Extended Date/Time Format

The Extended Date/Time Format (EDTF) Datatypes Scheme collects three different datatypes, one for each EDTF level.

View Scheme

Standard Identifiers

Standard Identifier Scheme lists standard number or code systems and assigns a URI to each database or publication that defines or contains the identifiers. The purpose of these source codes is to enable the type of standard numbers or codes in resource descriptions to be indicated by URI.

View Scheme


Carriers

Carriers Scheme is derived from a controlled list of coded values representing carrier types principally used in RDA cataloging.

View Scheme

Content Types

Content Types Scheme is derived from a controlled list of coded values representing content types principally used in RDA cataloging.

View Scheme

Media Types

Media Types Scheme is derived from a controlled list of coded values representing media types principally used in RDA cataloging.

View Scheme