OCR Is Counting Pages Twice when using the Create Print PDF setting in Project Client


Applies to

CONTENTdm Project Client

Answer

This is expected behavior caused by the version of the ABBYY FineReader SDK that is integrated with the current version of the Project Client. When "Create Print PDF" is selected together with OCR processing, the ABBYY software creates copies of the ingested files to use for the Print PDF. It then runs OCR on both the originals and the copies, which accounts for the doubled page count. The result is a Print PDF file that contains the embedded transcript text.

It is possible to create a print PDF without doubling the OCR count and still get the transcript text via OCR using the following workflow:

1. Ingest the compound object with the options No transcripts and Create print PDF.
   - This will create the Print PDF without using OCR.
2. Open the compound object in a new tab.
3. Select More Actions > Add OCR Text.
4. Choose to OCR the entire compound object and leave Create PDF unselected.
5. Perform OCR.
   - This will populate the record with transcript metadata, but the Print PDF created during ingest will remain unchanged.
6. Upload for Approval as normal.

With the above workflow, users can both create the Print PDF and add the transcript metadata without doubling the OCR page count. The trade-off is that the Print PDF will not have embedded text.
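To estimate the effect on OCR usage before choosing a workflow, the arithmetic can be sketched as follows. This is only an illustration, not part of CONTENTdm or the ABBYY SDK: the function name, its parameters, and the 120-page example are hypothetical, and it assumes OCR usage is counted once per page per OCR pass, as the doubling described above suggests.

```python
def ocr_pages_counted(page_count: int, print_pdf_during_ocr: bool) -> int:
    """Estimate how many pages are counted against OCR for one compound object.

    Assumption (not an official formula): each OCR pass counts every page once.
    Selecting Create Print PDF together with OCR causes two passes, one over
    the originals and one over the copies made for the Print PDF.
    """
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    passes = 2 if print_pdf_during_ocr else 1
    return page_count * passes


pages = 120  # hypothetical compound object size

# Single ingest with OCR and Create Print PDF both selected.
combined = ocr_pages_counted(pages, print_pdf_during_ocr=True)

# Ingest with No transcripts, then run Add OCR Text with Create PDF unselected.
two_step = ocr_pages_counted(pages, print_pdf_during_ocr=False)

print(f"Combined ingest: {combined} pages counted")
print(f"Two-step workflow: {two_step} pages counted")
```

Under these assumptions the two-step workflow counts each page once, at the cost of a Print PDF without embedded text.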

Page ID

48891
