Minutes of the OCLC Enhance and Expert Community Sharing Session
ALA Midwinter Conference
Friday, 2017 January 20
10:30 a.m.-12:00 p.m.
Georgia World Congress Center, Atlanta, Georgia
The ALA Midwinter 2017 edition of Breaking Through: What’s New and Next from OCLC and the compilation of News From OCLC were distributed. Two additional items were mentioned: (1) OCLC’s introduction of Tipasa, the first cloud-based interlibrary loan management system that automates routine borrowing and lending functions for individual libraries, and (2) the announcement that OCLC is acquiring Relais International, a leading interlibrary loan solutions provider based in Ottawa, Ontario, Canada, to increase resource sharing options and capabilities for both Relais customers and OCLC member libraries and groups worldwide. From News From OCLC, three items were mentioned: (1) the 15 small libraries chosen for the “Small Libraries Create Smart Spaces” project, in cooperation with the Association for Rural and Small Libraries; (2) the results, announced by OCLC and Internet Archive, of a year-long cooperative effort to ensure the future sustainability of persistent URLs (PURLs); and (3) the Culinary Institute of America’s use of CONTENTdm to share historical menus from all 50 states and 80 countries, dating back to 1855 (http://ciadigitalcollections.culinary.edu/cdm/landingpage/collection/p16940coll1).
The floor was then opened for questions, answered by Robert Bremer (Senior Consulting Database Specialist, WorldCat Quality); Hayley Moreno (Database Specialist II, Global Product Management); Rosanna O’Neil (Senior Library Services Consultant, Library Services for Americas); Nathan Putnam (Director, Metadata Quality, Global Product Management); Laura Ramsey (Section Manager, Quality Control); Roy Tennant (Senior Program Officer, OCLC Research Library Partnership); Jay Weitz (Senior Consulting Database Specialist, WorldCat Quality); and Cynthia Whitacre (Manager, WorldCat Quality).
We prefer that all headings be controlled, linking them to the authority record, which would in turn associate them with any identifiers appearing in the authority record. That should be enough for linked data purposes. Identifiers in subfields $0 in the bibliographic record may go away once the heading is controlled and linked to the authority record. You are free to add identifiers in the bibliographic record, but OCLC has no policy on it at this time.
Our colleague Rick Bennett, one of those in OCLC Research who work on FAST and its processing, offers the following explanation:
FAST headings added by processes other than the FAST process itself are maintained on a heading-by-heading basis. FAST headings added by the FAST process are kept in sync with the LCSH in the record as a set. If the process added the FAST headings, maintenance will consist of making all changes needed to keep the FAST headings up to date and consistent with the LCSH. If an LCSH heading is added to or deleted from the record, the corresponding change will be made to the FAST headings. If the FAST headings deviate from what the process added, I will only keep the individual FAST headings up to date. If an LCSH heading is added to or deleted from the record, I won’t do anything to the FAST headings based on that change.
In other words, FAST processing will recognize, retain, and keep updated any FAST headings added by a process other than itself. If you are changing existing LC subject headings or adding new ones, delete all of the corresponding FAST headings and they will be regenerated.
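The maintenance rule described above can be modeled as a simple check: does the record's FAST set still match what the automated process last generated? The following is a toy model, not OCLC's code. The names Record, maintain_fast(), update_heading(), and fast_from_lcsh() are all hypothetical. In particular, fast_from_lcsh() crudely splits LCSH strings on " -- " and does not perform a real FAST conversion.

```python
from dataclasses import dataclass, field

# Toy model of the FAST maintenance rule; all names are hypothetical
# and this is not OCLC's implementation.


def fast_from_lcsh(lcsh: list[str]) -> set[str]:
    """Hypothetical stand-in for LCSH-to-FAST conversion: splits each
    LCSH string on ' -- ' and treats each part as a FAST heading."""
    headings: set[str] = set()
    for heading in lcsh:
        headings.update(part.strip() for part in heading.split(" -- "))
    return headings


def update_heading(heading: str) -> str:
    """Hypothetical per-heading update (e.g. after a FAST authority
    change); returns the heading unchanged in this sketch."""
    return heading


@dataclass
class Record:
    lcsh: list[str]
    fast: set[str] = field(default_factory=set)
    # The FAST set the automated process generated on its last run.
    process_added: set[str] = field(default_factory=set)


def maintain_fast(rec: Record) -> None:
    if not rec.fast or rec.fast == rec.process_added:
        # No FAST headings, or exactly what the process added:
        # regenerate the whole set so it mirrors the current LCSH.
        rec.fast = fast_from_lcsh(rec.lcsh)
        rec.process_added = set(rec.fast)
    else:
        # The set deviates from what the process added: keep each heading
        # up to date individually, but ignore LCSH additions/deletions.
        rec.fast = {update_heading(h) for h in rec.fast}
```

In this model, deleting every FAST heading (leaving an empty set) sends the record back through the regeneration branch. That is why the advice to delete the FAST headings after changing LCSH works.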
Duplicate Detection and Resolution (DDR) tries to strike a delicate balance between the proper elimination of duplicate records and the equally proper retention of legitimately different records that may look similar. We work constantly to improve the matching process, with the matching team meeting at least twice most weeks to discuss problems brought to our attention by members of the cooperative or that we discover ourselves, as well as to discuss and test ongoing issues in matching. Between 2005 and 2010, we developed and extensively tested what is now the basic DDR program that has been in continual service since 2010. Since 2010, DDR has eliminated over 20 million duplicate records in all bibliographic formats. Please report to bibchange@oclc.org or to AskQC@oclc.org any incorrectly merged records that you find. If an incorrect merge occurred recently enough, we can roll it back and restore the original records. Just as important, we try to learn from each and every incorrect merge and to adjust the algorithms so that similar merges do not recur, if at all possible.
That depends on the individual records involved. There are several varieties of what we call “sparse” records, some of which we allow DDR to deal with and some of which we can deal with manually, case-by-case, all according to specific criteria.
No. DDR takes into account the presence of field 502, but not its formatting. Traditionally in the United States, an original typescript copy of a thesis has been considered unpublished. In some European countries, however, publishers have arrangements by which they produce both the official thesis version of the document and its subsequent formal publication. This European practice has made it more difficult for DDR to distinguish these kinds of resources, but we’ve done our best to put in place routines to make the distinction.
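To illustrate "presence, not formatting": a comparison that looks only at whether field 502 exists treats an older single-$a dissertation note and a note subfielded into $b (degree), $c (granting institution), and $d (year) as equivalent. This is a toy illustration, not DDR's actual matching logic, which weighs many other elements. The (tag, field_text) record structure and the function names are hypothetical.

```python
# Toy illustration only; DDR's real matching is far more involved.
# A record is modeled (hypothetically) as a list of (tag, field_text) pairs.
Record = list[tuple[str, str]]


def has_thesis_note(record: Record) -> bool:
    """True if any field 502 is present; content and formatting are ignored."""
    return any(tag == "502" for tag, _ in record)


def thesis_notes_agree(a: Record, b: Record) -> bool:
    """Presence-only check: passes when both records, or neither, carry a 502."""
    return has_thesis_note(a) == has_thesis_note(b)


# The same dissertation note in a single-$a style and in a subfielded style.
single_a = [("245", "$a Example study"),
            ("502", "$a Thesis (Ph. D.)--University of Minnesota, 2016.")]
subfielded = [("245", "$a Example study"),
              ("502", "$b Ph. D. $c University of Minnesota $d 2016")]
published = [("245", "$a Example study")]

print(thesis_notes_agree(single_a, subfielded))  # True
print(thesis_notes_agree(single_a, published))   # False
```

As the answer notes, the harder case is the European one, in which the thesis and its formal publication may both look thesis-like. A presence-only check alone cannot separate them.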
OCLC generally does not strip out valid fields. During the course of the transition from Institution Records (IRs) to Local Bibliographic Data (LBD) in 2015 and 2016, as I understand things, institutions had several options regarding the disposition of the data in their IRs. They could choose to transfer certain fields to LBDs, to transfer certain fields into the master record associated with the IR, or combinations of the two. Customizable options were also available in discussions between each institution and OCLC, as I recall. If a current master record does not include fields that had been present in a former IR, my guess is that this was the choice of the institution, within the limitations of LBDs and the field transfer process.
Not necessarily. Some bibliographic formats have little or no backlog of reported duplicates, whereas others do have a backlog. A delay may simply mean that the report is in a backlog.
If you’re not sure, you have the option of reporting it to bibchange@oclc.org and letting us figure it out.
Very. But seriously, we fully understand that no one has time to report every error and every duplicate. Do what you can. All of it is appreciated by both other catalogers and us. If you can let us know the symbols of the three offending libraries, we can investigate and try to resolve the problems.
That is a promising idea that we’ve talked about many times in the past and will consider again.
We used to use the 936 for non-serials also, especially for recording the OCLC numbers of parallel records. That did not work out well and the practice was discontinued in 2012. We hesitate to expand the field’s use for other purposes.
There’s nothing specific to report about GLIMIR. Just like with DDR, the GLIMIR matching algorithms are constantly under construction.
There is currently no end-of-life date set for Connexion. Record Manager remains very much a work-in-progress. OCLC will give members of the cooperative plenty of notice before Connexion is phased out, but that will not be anytime soon.
There should be only one provider-neutral record. Others can be reported as duplicates. The Program for Cooperative Cataloging (PCC) Provider-Neutral E-Resource MARC Record Guidelines can be found at http://www.loc.gov/aba/pcc/scs/documents/PCC-PN-guidelines.html.
This was an unintended consequence of OCLC’s recent move to support all Unicode characters. The Latin Letter Alveolar Click (U+01C2), which the Connexion client has long used as the subfield delimiter symbol (ǂ), must now be treated by the system the same as any other Unicode character. Reformatting the record (Edit/Reformat in the client) before you attempt to control headings will add the appropriate spaces automatically and allow you to proceed without any problem. This was recently documented in Connexion Client Problems and Troubleshooting under “Controlling Headings” at https://www.oclc.org/support/services/connexion/client_known_problems.en.html#controllingheadings.
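The spacing issue can be seen in a small text normalization: U+01C2 is an ordinary Unicode letter, and the fix is to put one space before each delimiter and one space after its subfield code. The function below is a hypothetical approximation of that spacing only. It is not the Connexion client's Edit/Reformat code, and the name reformat_delimiters() is an assumption.

```python
import re
import unicodedata

# The character the Connexion client displays as the subfield delimiter.
DELIM = "\u01C2"

# Hypothetical normalization (not the client's actual code): one space
# before the delimiter and one after its one-character subfield code.
_DELIM_RE = re.compile(r"\s*" + re.escape(DELIM) + r"(\w)\s*")


def reformat_delimiters(field_text: str) -> str:
    spaced = _DELIM_RE.sub(lambda m: f" {DELIM}{m.group(1)} ", field_text)
    return spaced.strip()


print(unicodedata.name(DELIM))                                # LATIN LETTER ALVEOLAR CLICK
print(reformat_delimiters(f"{DELIM}aCats{DELIM}xBehavior."))  # ǂa Cats ǂx Behavior.
```

The normalization is idempotent: text that is already correctly spaced comes back unchanged. Reformatting before controlling headings is therefore harmless.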
The current increase in transfers of 6XX fields is a result of differences in our new Data Ingest processing, compared to the old batchload processing of Metadata Capture (MDC). Data Ingest, at least for now, has been automatically transferring all subject headings in schemes not already present on the WorldCat record to which the incoming record matches, based on the 6XX Second Indicator values and subfield $2 codes when applicable. Fields 6XX with Second Indicator 4 and fields 6XX with Second Indicator 7 lacking a subfield $2 are considered to be their own scheme, and so may be subject to transfer under the right circumstances. There is currently no option for turning that off in the new Data Ingest process. We are considering these early days of Data Ingest processing to be a work in progress and are hoping that (based both on user feedback and on our own analyses of the results) we can fine-tune certain things later. These sorts of 6XX transfers, although intended to minimize the loss of potentially useful access points, are a prime candidate for future changes. We are already working to rectify this, although it won’t happen immediately. Members of the cooperative should feel free to delete any such redundant headings from the master record in the meantime. Additionally, we have been having some discussions on a possible clean-up project to help alleviate this problem for our users. We hope to begin that effort in the very near future.
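The transfer rule described above amounts to comparing "scheme keys" built from the Second Indicator and, for indicator 7, the subfield $2 code. Here is a minimal sketch under those assumptions. SubjectField, scheme_of(), and fields_to_transfer() are hypothetical names, and Data Ingest's real logic may consider more than this.

```python
from typing import NamedTuple, Optional


class SubjectField(NamedTuple):
    tag: str                      # e.g. "650"
    ind2: str                     # Second Indicator
    text: str
    source: Optional[str] = None  # subfield $2 code, if present


def scheme_of(f: SubjectField) -> tuple:
    """Hypothetical scheme key: the Second Indicator, plus $2 for indicator 7.
    Indicator 4, and indicator 7 without $2, each form their own scheme."""
    if f.ind2 == "7" and f.source:
        return ("7", f.source)
    return (f.ind2,)


def fields_to_transfer(master: list[SubjectField],
                       incoming: list[SubjectField]) -> list[SubjectField]:
    """Incoming 6XX fields whose scheme is not already on the master record."""
    present = {scheme_of(f) for f in master}
    return [f for f in incoming if scheme_of(f) not in present]


master = [SubjectField("650", "0", "Cats."),
          SubjectField("655", "7", "Fiction.", "lcgft")]
incoming = [SubjectField("650", "0", "Cats -- Behavior."),  # LCSH already present
            SubjectField("650", "7", "Cats", "fast"),       # new scheme: fast
            SubjectField("650", "4", "Felines."),           # indicator 4: own scheme
            SubjectField("655", "7", "Fiction.", "lcgft")]  # lcgft already present

print([f.text for f in fields_to_transfer(master, incoming)])
```

The example shows why redundant-looking headings appear. An indicator-4 heading is carried over even when it duplicates an LCSH heading already on the record, because in this model it is treated as a different scheme.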
This is a question that has been asked before. But remember that the authority file is actually under the jurisdiction of the Library of Congress and NACO participants.
At our next possible opportunity, we plan to add the following fields to the Keyword Index: 250 subfields $a and $b, 254 subfield $a, and 258 subfields $a and $b.
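The planned additions can be written as a small table from tag to indexed subfield codes: 250 (Edition Statement), 254 (Musical Presentation Statement), and 258 (Philatelic Issue Data). The sketch below only shows which subfields would contribute keywords under that table. The (tag, [(code, value), ...]) record structure, the lowercase word tokenization, and the names used are assumptions, not a description of how WorldCat's Keyword Index actually tokenizes.

```python
import re

# Hypothetical encoding of the planned additions: tag -> indexed subfield codes.
PLANNED_KEYWORD_FIELDS = {"250": {"a", "b"}, "254": {"a"}, "258": {"a", "b"}}


def keyword_terms(fields: list[tuple[str, list[tuple[str, str]]]]) -> list[str]:
    """Collect lowercase word tokens from the planned fields/subfields only.
    Tokenizing on word characters is an assumption made for illustration."""
    terms: list[str] = []
    for tag, subfields in fields:
        codes = PLANNED_KEYWORD_FIELDS.get(tag, set())
        for code, value in subfields:
            if code in codes:
                terms.extend(re.findall(r"\w+", value.lower()))
    return terms


record = [("250", [("a", "2nd ed.")]),
          ("254", [("a", "Full score")]),
          ("258", [("a", "Canada"), ("b", "50 cents")]),
          ("300", [("a", "1 score")])]  # 300 is not in the planned set

print(keyword_terms(record))
```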
Respectfully submitted by
Doris Seely
University of Minnesota
2017 January 26
With edits by Jay Weitz.
OCLC
2017 March 9