Recommended Settings for Better OCR Results
Applies to: All versions of Ephesoft Transact
This page lists recommended settings that may improve optical character recognition (OCR) results in Transact. These settings may also resolve common issues with the Recostar_HOCR_Plugin, such as color images causing:
- Errors or failures when creating HOCR.xml files.
- Transact to crash or hang during RecoStar batch image processing.
The following factors can affect OCR quality:
- Image quality
- Compression parameters
- Other plugins
The following settings are optimal when the PDF to TIFF Conversion Process is set to Ghostscript and the Image Conversion Process is set to ImageMagick. If you process both color and black and white images, use the settings recommended in Color Images. If your use case is exclusively black and white, use the settings in Black and White Images.
Note: Other combinations of engines for PDF to TIFF Conversion Process and Image Conversion Process may result in better performance and quality depending on your use case.
Color Images
For color images, Ephesoft recommends the following configurations:
- Documents should have a minimum of 200 DPI.
- In the IMPORT_MULTIPAGE_FILES plugin:
  - Ensure the following is added to the IM Convert Input Image Parameters:
    -limit area 100MB
    Note: This field is case sensitive; “MB” must be capitalized.
  - Ensure the following is added to the IM Convert Output Image Parameters:
    -compress LZW
  - Ensure the following is added to the GhostScript Image Parameters:
    -sCompression=lzw
Figure 1. Import Multipage Files Plugin
- Ensure the CREATE_OCR_INPUT plugin exists in the Page Process module.
- Turn the Recostar color switch to ON in the RECOSTAR_HOCR plugin, located in the Page Process module.
Figure 2. RecoStar HOCR Plugin
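To see what the IMPORT_MULTIPAGE_FILES parameter strings listed above do outside Transact, for example on a sample PDF that produces poor OCR, you can run the same engines by hand. The Python sketch below assembles Ghostscript and ImageMagick command lines that carry those three parameter strings. It is an approximation under stated assumptions, not how Transact invokes the engines internally: the Ghostscript device tiff24nc, the -r200 resolution flag, the batch flags, the output file pattern, the helper function names, and the use of the ImageMagick 6 convert executable are all illustrative choices. On Windows, the Ghostscript console executable is typically named gswin64c rather than gs.

```python
import shlex
import shutil
import subprocess

# Parameter strings recommended in this article for the IMPORT_MULTIPAGE_FILES plugin.
IM_INPUT_PARAMS = "-limit area 100MB"  # IM Convert Input Image Parameters ("MB" is case sensitive)
IM_OUTPUT_PARAMS = "-compress LZW"     # IM Convert Output Image Parameters
GS_IMAGE_PARAMS = "-sCompression=lzw"  # GhostScript Image Parameters


def ghostscript_cmd(pdf_path: str, tiff_pattern: str, dpi: int = 200) -> list[str]:
    """Build a Ghostscript PDF-to-TIFF command (hypothetical helper).

    Only GS_IMAGE_PARAMS comes from the article; the device, resolution flag
    and batch flags are assumptions for a manual test run.
    """
    return [
        "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=tiff24nc",  # assumed: 24-bit RGB TIFF output for color pages
        f"-r{dpi}",           # render at the recommended minimum of 200 DPI
        *shlex.split(GS_IMAGE_PARAMS),
        f"-sOutputFile={tiff_pattern}",  # e.g. "page_%03d.tif" writes one file per page
        pdf_path,
    ]


def imagemagick_cmd(src: str, dst: str) -> list[str]:
    """Build an ImageMagick 6 convert command (hypothetical helper).

    -limit is a setting, so it is placed before the input file; -compress is
    given before the output file name so it applies when that file is written.
    """
    return ["convert", *shlex.split(IM_INPUT_PARAMS), src,
            *shlex.split(IM_OUTPUT_PARAMS), dst]


def run_if_available(cmd: list[str]) -> None:
    """Run cmd if its executable is on PATH; otherwise only print it."""
    if shutil.which(cmd[0]) is None:
        print("not on PATH, command would be:", shlex.join(cmd))
        return
    subprocess.run(cmd, check=True)
```

For example, run_if_available(ghostscript_cmd("sample.pdf", "page_%03d.tif")) renders each page of a hypothetical sample.pdf, and run_if_available(imagemagick_cmd("page_001.tif", "page_001_lzw.tif")) re-encodes one page with LZW. Building argument lists instead of a single shell string keeps space-separated values such as -limit area 100MB intact without any shell quoting.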
Black and White Images
For black and white images, Ephesoft recommends the following configurations:
- Documents should have a minimum of 200 DPI.
- In the IMPORT_MULTIPAGE_FILES plugin:
  - Ensure the following is added to the IM Convert Input Image Parameters:
    -limit area 100MB
    Note: This field is case sensitive; “MB” must be capitalized.
  - Ensure the following is added to the IM Convert Output Image Parameters:
    -compress LZW
  - Ensure the following is added to the GhostScript Image Parameters:
    -sCompression=lzw
Figure 3. Import Multipage Files Plugin
- Turn the Recostar color switch to OFF in the RECOSTAR_HOCR plugin, located in the Page Process module.
Figure 4. RecoStar HOCR Plugin
- Remove the CREATE_OCR_INPUT plugin from the Page Process module.
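Both the color and the black and white configurations call for documents of at least 200 DPI. For TIFF input, one way to check this before importing is to read the resolution tags from the file header. The Python sketch below does that with the standard library only; it is a minimal illustration, not an Ephesoft tool. It assumes classic (non-BigTIFF) files, reads only the first page, and converts pixels per centimeter to DPI; the function names and the MIN_DPI constant are hypothetical. A file with no resolution tags, or with ResolutionUnit set to 1 (no absolute unit), is reported as having unknown DPI.

```python
import struct
from typing import Optional, Tuple

MIN_DPI = 200  # minimum recommended in this article

TAG_X_RESOLUTION = 282
TAG_Y_RESOLUTION = 283
TAG_RESOLUTION_UNIT = 296
TYPE_SHORT = 3
TYPE_RATIONAL = 5


def tiff_dpi(data: bytes) -> Optional[Tuple[float, float]]:
    """Return (x_dpi, y_dpi) for the first page of a classic TIFF, or None if unknown."""
    if data[:2] == b"II":
        endian = "<"  # little-endian byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")

    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    resolution = {}
    unit = 2  # TIFF default ResolutionUnit is 2 (inch) when the tag is absent
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, field_type, _count = struct.unpack_from(endian + "HHI", data, entry)
        if tag in (TAG_X_RESOLUTION, TAG_Y_RESOLUTION) and field_type == TYPE_RATIONAL:
            # A RATIONAL does not fit in the 4-byte value field, so the field holds an offset.
            (value_offset,) = struct.unpack_from(endian + "I", data, entry + 8)
            num, den = struct.unpack_from(endian + "II", data, value_offset)
            resolution[tag] = num / den if den else 0.0
        elif tag == TAG_RESOLUTION_UNIT and field_type == TYPE_SHORT:
            # A single SHORT is stored left-justified in the value field.
            (unit,) = struct.unpack_from(endian + "H", data, entry + 8)

    if TAG_X_RESOLUTION not in resolution or TAG_Y_RESOLUTION not in resolution:
        return None
    per_inch_factor = {2: 1.0, 3: 2.54}.get(unit)  # unit 3 is centimeter; 1 inch = 2.54 cm
    if per_inch_factor is None:
        return None  # unit 1 (no absolute unit) or an unknown value
    return (resolution[TAG_X_RESOLUTION] * per_inch_factor,
            resolution[TAG_Y_RESOLUTION] * per_inch_factor)


def meets_min_dpi(path: str, minimum: float = MIN_DPI) -> bool:
    """True only if both axes of the first page are declared at >= minimum DPI."""
    with open(path, "rb") as f:
        dpi = tiff_dpi(f.read())
    return dpi is not None and min(dpi) >= minimum
```

For example, meets_min_dpi("scan_0001.tif") returns False for a page whose header declares 150 DPI. The tags only record the resolution declared by the scanner or converter, so treat a pass as a necessary check rather than proof that the image was captured at that resolution.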