OCLC Expert Cataloging Community Sharing Session minutes, January 2017

Minutes of the OCLC Enhance and Expert Community Sharing Session
ALA Midwinter Conference
Friday, 2017 January 20
10:30 a.m.-12:00 p.m.
Georgia World Congress Center, Atlanta, Georgia

The ALA Midwinter 2017 edition of Breaking Through: What’s New and Next from OCLC and the compilation of News From OCLC were distributed. Two additional items were mentioned: (1) OCLC’s introduction of Tipasa, the first cloud-based interlibrary loan management system that automates routine borrowing and lending functions for individual libraries, and (2) OCLC’s acquisition of Relais International, a leading interlibrary loan solutions provider based in Ottawa, Ontario, Canada, to increase resource sharing options and capabilities for both Relais customers and OCLC member libraries and groups worldwide. From News From OCLC, three items were mentioned: (1) the 15 small libraries chosen for the “Small Libraries Create Smart Spaces” project, in cooperation with the Association for Rural and Small Libraries, (2) the results, announced by OCLC and Internet Archive, of a year-long cooperative effort to ensure the future sustainability of persistent URLs (PURLs), and (3) the Culinary Institute of America’s use of CONTENTdm to share historical menus from all 50 states and 80 countries, dating back to 1855 (http://ciadigitalcollections.culinary.edu/cdm/landingpage/collection/p16940coll1).

The floor was then opened for questions, answered by Robert Bremer (Senior Consulting Database Specialist, WorldCat Quality); Hayley Moreno (Database Specialist II, Global Product Management); Rosanna O’Neil (Senior Library Services Consultant, Library Services for Americas); Nathan Putnam (Director, Metadata Quality, Global Product Management); Laura Ramsey (Section Manager, Quality Control); Roy Tennant (Senior Program Officer, OCLC Research Library Partnership); Jay Weitz (Senior Consulting Database Specialist, WorldCat Quality); and Cynthia Whitacre (Manager, WorldCat Quality).

Do you have any advice about the addition of identifiers in bibliographic records? Where should they go, in subfields $0, $4?

We prefer that all headings be controlled, linking them to the authority record, which would in turn associate them with any identifiers appearing in the authority record. That should be enough for linked data purposes. Identifiers in subfields $0 in the bibliographic record may go away once the heading is controlled and linked to the authority record. You are free to add identifiers in the bibliographic record, but OCLC has no policy on it at this time.

If I remove one subject heading should I remove only the FAST headings that derive from that subject heading?

Our colleague Rick Bennett, one of those in OCLC Research who works on FAST and its processing, offers the following explanation:

FAST headings added by processes other than itself are maintained on a heading-by-heading basis. FAST headings added by the FAST process are kept in sync with the LCSH in the record as a set. If the process added the FAST heading, maintenance of the headings will be to make all changes needed to keep the FAST updated and to reflect what is in the LCSH. If an LCSH heading is added to or deleted from the record, the corresponding change will be made to the FAST headings. If the FAST headings deviate from what the process added, I will only keep the individual FAST headings up to date. If an LCSH heading is added to or deleted from the record, I won’t do anything to the FAST headings based on that change.

In other words, FAST processing will recognize, retain, and keep updated any FAST headings added by a process other than itself. If you change any existing LC subject headings or add new ones, delete all of the corresponding FAST headings and they will be regenerated.
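
The two maintenance modes described above can be illustrated with a minimal sketch. This is not OCLC's actual FAST processing: the function names, the toy LCSH-to-FAST lookup table, and the idea of comparing the record's FAST headings with the set the process last added are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical lookup standing in for the real LCSH-to-FAST conversion.
TOY_LCSH_TO_FAST = {
    "Dogs--Training": {"Dogs--Training"},
    "Cats": {"Cats"},
}


def derive_fast(lcsh_headings: Iterable[str]) -> Set[str]:
    """Toy conversion: union of the FAST headings mapped from each LCSH heading."""
    derived: Set[str] = set()
    for heading in lcsh_headings:
        derived |= TOY_LCSH_TO_FAST.get(heading, set())
    return derived


def maintain_fast(
    lcsh_headings: List[str],
    fast_in_record: List[str],
    fast_added_by_process: List[str],
    derive: Callable[[Iterable[str]], Set[str]] = derive_fast,
) -> List[str]:
    """Return the FAST headings the record should carry after maintenance."""
    if set(fast_in_record) == set(fast_added_by_process):
        # The FAST set is exactly what the process added: keep it in sync
        # with the LCSH now in the record, as a set.
        return sorted(derive(lcsh_headings))
    # The FAST set deviates from what the process added: LCSH additions or
    # deletions cause no FAST additions or deletions. (Keeping each individual
    # heading up to date is omitted from this sketch.)
    return list(fast_in_record)
```

In this sketch, a record whose FAST headings were left untouched follows its LCSH headings, while a record whose FAST headings were edited by hand or by another process keeps them as they are.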

How is the duplicate clean-up in WorldCat progressing?

Duplicate Detection and Resolution (DDR) tries to strike a delicate balance between the proper elimination of duplicate records and the equally proper retention of legitimately different records that may look similar. We work constantly to improve the matching process, with the matching team meeting at least twice most weeks to discuss problems brought to our attention by members of the cooperative or that we discover ourselves, as well as to discuss and test ongoing issues in matching. Between 2005 and 2010, we developed and extensively tested what is now the basic DDR program that has been in continual service since 2010. Since 2010, DDR has eliminated over 20 million duplicate records in all bibliographic formats. Please report to bibchange@oclc.org or to AskQC@oclc.org any incorrectly merged records that you find. If an incorrect merge occurred recently enough, we can roll it back and restore the original records. Just as important, we try to learn from each and every incorrect merge and to adjust the algorithms so that similar merges do not recur, if at all possible.

If the duplicates we report are skimpy records, do you try to resolve them or just ignore them?

That depends on the individual records involved. There are several varieties of what we call “sparse” records, some of which we allow DDR to deal with and some of which we can deal with manually, case-by-case, all according to specific criteria.

Does it matter whether the 502 field is formatted or unformatted for the process of merging records?

No. DDR takes into account the presence of field 502, but not its formatting. Traditionally in the United States, an original typescript copy of a thesis has been considered unpublished. In some European countries, however, publishers have arrangements by which they produce both the official thesis version of the document and its subsequent formal publication. This European practice has made it more difficult for DDR to distinguish these kinds of resources, but we’ve done our best to put in place routines to make the distinction.

A number of bibliographic records are missing fields that were in the original record. Did OCLC strip them out? I miss Institution Records.

OCLC generally does not strip out valid fields. During the course of the transition from Institution Records (IRs) to Local Bibliographic Data (LBD) in 2015 and 2016, as I understand things, institutions had several options regarding the disposition of the data in their IRs. They could choose to transfer certain fields to LBDs, to transfer certain fields into the master record associated with the IR, or combinations of the two. Customizable options were also available in discussions between each institution and OCLC, as I recall. If a current master record does not include fields that had been present in a former IR, my guess is that this was the choice of the institution, within the limitations of LBDs and the field transfer process.

Sometimes when I request a merge it is updated right away; other times months go by and nothing happens. Does this mean the request was rejected?

Not necessarily. Some bibliographic formats have little or no backlog of reported duplicates, whereas others do have a backlog. A delay may simply mean that the report is in a backlog.

Sometimes I think that a record must be wrong but I can’t verify it and don’t know if I dare to change it.

If you’re not sure, you have the option of reporting it to bibchange@oclc.org and letting us figure it out.

My library is a selected government documents repository and not all the documents have been cataloged. The cataloging has been transferred from GovDocs to me and I am cataloging the backlog and correcting old records. When I search by the SUDOC number I find many duplicates for the print versions. Often there are two or three good records with links to Google Books and HathiTrust and a number of bad records with different links. Sometimes the links are present in PCC records and sometimes not. There seems to be a pattern of three institutions contributing most of the bad records. I try to report what I can, but since there are many thousands that I have to do I can’t take the time to report them all. How guilty should I feel?

Very. But seriously, we fully understand that no one has time to report every error and every duplicate. Do what you can. All of it is appreciated by both other catalogers and us. If you can let us know the symbols for the three offending libraries, maybe we can investigate and try to resolve the problems.

Is there a way that OCLC can mark the records that have already been reported and are being worked on?

That is a promising idea that we’ve talked about many times in the past and will consider again.

When problems are reported to CONSER they use the 936 field to tell us what problems are being worked on. Might that work for other records?

We used to use the 936 for non-serials also, especially for recording the OCLC numbers of parallel records. That did not work out well and the practice was discontinued in 2012. We hesitate to expand the field’s use for other purposes.

Anything new on GLIMIR clustering?

There’s nothing specific to report about GLIMIR. Just like with DDR, the GLIMIR matching algorithms are constantly under construction.

Has a date been set for the end of Connexion?

There is currently no end-of-life date set for Connexion. Record Manager remains very much a work-in-progress. OCLC will give members of the cooperative plenty of notice before Connexion is phased out, but that will not be anytime soon.

Are records for electronic resources from different providers allowed or should they be reported as duplicates?

There should be only one provider-neutral record. Others can be reported as duplicates. The Program for Cooperative Cataloging (PCC) Provider-Neutral E-Resource MARC Record Guidelines can be found at http://www.loc.gov/aba/pcc/scs/documents/PCC-PN-guidelines.html.

We have been finding that subject headings and name headings that we know are in the authority file won’t control in Connexion unless we insert the spaces between the subfield delimiters and the text.

This was an unintended consequence of OCLC’s recent move to support all Unicode characters. The Latin Letter Alveolar Click (U+01C2), which the Connexion client has long used for the subfield delimiter symbol (ǂ), must now be treated by the system the same as any other Unicode character. The simple action of reformatting the record (Edit/Reformat in the client) before you attempt to control headings will add the appropriate spaces automatically and allow you to proceed without any problem. This was recently documented in Connexion Client Problems and Troubleshooting under “Controlling Headings” at https://www.oclc.org/support/services/connexion/client_known_problems.en.html#controllingheadings.
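
As a rough illustration of what the reformatting step accomplishes, the sketch below normalizes the spacing around the delimiter character in a heading string. It is not the Connexion client's code: the function name and the assumption that each delimiter is followed by a one-character lowercase or numeric subfield code are illustrative only.

```python
import re

# U+01C2 (ǂ), the character the Connexion client displays as the subfield delimiter.
DELIMITER = "\u01C2"


def reformat_field(text: str) -> str:
    """Put one space before each delimiter and one space after its subfield code."""
    # Match a delimiter plus its one-character code, together with any
    # whitespace around them, and rewrite it with uniform spacing.
    spaced = re.sub(
        rf"\s*{DELIMITER}\s*([a-z0-9])\s*",
        rf" {DELIMITER}\1 ",
        text,
    )
    return spaced.strip()


print(reformat_field("Twain, Mark,ǂd1835-1910."))
```

Running the function on a string that is already spaced leaves it unchanged, so it is safe to apply more than once.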

Recently, there has been a noticeable increase in apparently local 6XX fields that duplicate existing fields. Can you do something about these?

The current increase in transfers of 6XX fields is a result of differences in our new Data Ingest processing, compared to the old batchload processing of Metadata Capture (MDC). Data Ingest, at least for now, has been automatically transferring all subject headings in schemes not already present on the WorldCat record to which the incoming record matches, based on the 6XX Second Indicator values and subfield $2 codes when applicable. Fields 6XX with Second Indicator 4 and fields 6XX with Second Indicator 7 lacking a subfield $2 are considered to be their own scheme, and so may be subject to transfer under the right circumstances. There is currently no option for turning that off in the new Data Ingest process. We are considering these early days of Data Ingest processing to be a work in progress and are hoping that (based both on user feedback and on our own analyses of the results) we can fine-tune certain things later. These sorts of 6XX transfers, although intended to minimize the loss of potentially useful access points, are a prime candidate for future changes. We are already at work trying to rectify this, although it won’t happen immediately. Members of the cooperative should feel free to delete from the master record any such redundant headings in the meantime. Additionally, we have been having some discussions on a possible clean-up project to help alleviate this problem for our users. We hope to begin that effort in the very near future.
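
The transfer rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the Data Ingest implementation: the SubjectField class, the scheme_key function, and the reading that a scheme is identified by the Second Indicator plus the subfield $2 code (when the indicator is 7) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SubjectField:
    tag: str               # a 6XX tag such as "650"
    ind2: str              # Second Indicator value
    source: Optional[str]  # subfield $2 code, or None when absent
    heading: str


def scheme_key(field: SubjectField) -> Tuple[str, Optional[str]]:
    """Identify the subject scheme a 6XX field belongs to (assumed rule)."""
    if field.ind2 == "7":
        # Second Indicator 7: the scheme is named by subfield $2.
        # A missing $2 yields ("7", None), treated as a scheme of its own.
        return ("7", field.source)
    # Other indicators, including 4, name the scheme by themselves.
    return (field.ind2, None)


def fields_to_transfer(
    incoming: List[SubjectField], existing: List[SubjectField]
) -> List[SubjectField]:
    """Incoming 6XX fields whose scheme is absent from the WorldCat record."""
    present = {scheme_key(f) for f in existing}
    return [f for f in incoming if scheme_key(f) not in present]
```

Under this reading, an incoming field with Second Indicator 4 transfers whenever the matched record has no Second Indicator 4 field, even if its text duplicates an existing heading in another scheme, which is consistent with the redundancy reported in the question.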

Have you considered controlling fields in the authority records?

This is a question that has been asked before. But remember that the authority file is actually under the jurisdiction of the Library of Congress and NACO participants.

Would you consider indexing the 250 field?

At our next possible opportunity, we plan to add the following fields to the Keyword Index: 250 subfields $a and $b, 254 subfield $a, and 258 subfields $a and $b.

Respectfully submitted by
Doris Seely
University of Minnesota
2017 January 26

With edits by Jay Weitz.

OCLC
2017 March 9