Using Optical Character Recognition - Tips

Applies to TestComplete 15.69, last modified on November 13, 2024
In version 12.60, the Optical Character Recognition feature was upgraded to a new plugin powered by Google Cloud Vision API. To learn more, see Optical Character Recognition.
The deprecated OCR plugin was removed from TestComplete in version 12.60. If you need to use the deprecated OCR plugin with that version of TestComplete, please contact our Customer Care team. The deprecated OCR plugin was restored in TestComplete version 14.0. To use the plugin with this or a later version of TestComplete, you need to install and enable the plugin manually.

In general, optical character recognition cannot ensure 100% recognition accuracy in all circumstances, because it depends on a number of unpredictable factors. The TestComplete OCR is no exception to this rule. The factors that affect the recognition accuracy in TestComplete include, but are not limited to, fonts, font style and size, foreground and background colors, and the text to be recognized.

To enhance the recognition accuracy, TestComplete uses the results of a frequency analysis of how characters are used in English words to predict which character combinations are likely to occur. As a result, Latin characters used in other languages, for example, German, are recognized with less accuracy. For the same reason, the OCR engine may fail to recognize words that are “uncommon” in English vocabulary: identifiers in programming languages, system setting names, product names, and others.
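
How TestComplete applies this analysis internally is not described in this topic. The following self-contained Python sketch only illustrates the general idea of scoring a word by how familiar its letter pairs are in English. The `COMMON_ENGLISH_BIGRAMS` table and the `bigram_score` function are hypothetical and are not part of the TestComplete API.

```python
# Illustrative only: a toy table of letter pairs that are frequent in English.
# It is NOT TestComplete's data, and the scoring below is NOT its algorithm.
COMMON_ENGLISH_BIGRAMS = {"th", "he", "in", "er", "an", "re", "on", "at", "en", "nd"}


def bigram_score(word):
    """Return the fraction of adjacent letter pairs found in the toy table."""
    word = word.lower()
    pairs = [word[i:i + 2] for i in range(len(word) - 1)]
    if not pairs:
        # A single character (or an empty string) has no letter pairs to score.
        return 0.0
    return sum(pair in COMMON_ENGLISH_BIGRAMS for pair in pairs) / len(pairs)


if __name__ == "__main__":
    for candidate in ("then", "hWnd"):
        print(candidate, bigram_score(candidate))
```

With this toy table, an ordinary English word such as "then" scores higher than an identifier such as "hWnd". An engine that favors high-scoring character combinations would therefore be more likely to misread the identifier.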

One more factor that affects the TestComplete recognition accuracy is font smoothing. TestComplete assumes that each character is drawn with one font color. Font smoothing draws the edges of characters in intermediate shades, so if font smoothing is enabled, TestComplete will recognize some characters with less accuracy than if it is disabled.
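
The effect of smoothing on a single-color assumption can be illustrated with a small Python sketch. It does not use the TestComplete API. The two bitmaps, the grayscale values, and the helper functions are made up for illustration.

```python
# Illustrative only: tiny grayscale bitmaps (0 = black text, 255 = white background).
# The bitmaps and helper names are hypothetical, not TestComplete data or API.
FONT_COLOR = 0
BACKGROUND = 255

# A vertical stroke rendered without smoothing: every text pixel is pure black.
UNSMOOTHED = [
    [255, 0, 0, 255],
    [255, 0, 0, 255],
    [255, 0, 0, 255],
]

# A similar stroke with smoothing: edge pixels are blended with the background.
SMOOTHED = [
    [200, 0, 90, 255],
    [200, 0, 90, 255],
    [200, 0, 90, 255],
]


def count_font_pixels(bitmap, font_color=FONT_COLOR):
    """Count pixels whose color exactly equals the font color."""
    return sum(1 for row in bitmap for pixel in row if pixel == font_color)


def count_non_background(bitmap, background=BACKGROUND):
    """Count pixels that differ from the background in any way."""
    return sum(1 for row in bitmap for pixel in row if pixel != background)


if __name__ == "__main__":
    for name, bitmap in (("unsmoothed", UNSMOOTHED), ("smoothed", SMOOTHED)):
        print(name, count_font_pixels(bitmap), "of", count_non_background(bitmap))
```

In the unsmoothed bitmap, an exact match on the font color finds all 6 stroke pixels. In the smoothed bitmap, it finds only 3 of the 9 pixels that differ from the background, so a matcher that expects one font color sees a thinner, distorted shape.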

Below are some tips that will help you improve the recognition accuracy. Before you read them, note that OCR methods are often just one step of a bigger test that compares an object’s text against a baseline value or simulates a mouse click on the text. TestComplete offers a special technology, Text Recognition, that works with text objects differently: it does not try to “read” their text from the screen. In many cases, this technology works faster and provides better results. To learn how to use it to perform OCR tasks, see Optical Character Recognition vs. Text Recognition Technology. If you have to use the OCR engine, read this topic to learn how you can improve recognition results.

To improve the recognition accuracy of the OCR engine, follow the tips below:

  • Make sure that the areas holding the text to be recognized are as small as possible.

    Correct:

    Incorrect:

  • Make sure that these areas include as few graphic elements of the user interface as possible. For instance, try to avoid images of scrollbars and icons in controls’ images.

    Correct:

    Incorrect:

  • If possible, set up the same font color for all text in the selected rectangle.

  • Specify optional recognition parameters, font color in particular. This will help the OCR engine distinguish characters from the background. Specifying font color is recommended if the image contains pixels of different colors.

  • Create templates not only for regular, but also for bold and italic font styles. A template created for a regular style may produce erroneous results when recognizing bold or italic text and vice versa.

    Don’t forget to append the created templates to the recognition settings. For instance, you can add the bold and italic font templates to the default recognition settings or create a new setting collection with the OCRObject.CreateOptions method and then pass this collection as a parameter to the OCRObject.GetText method.

  • Do not use only one font in templates. Use several fonts. This will help the OCR engine create better recognition criteria and will provide more accurate results.

  • Reduce the number of recognizable characters. By using the OCROptions.ActiveRecognitionSet option, you can enable and disable the recognition of digits, special characters, uppercase or lowercase letters. For example, when you need to get a phone number, you can set the option so that only digits are recognized. This will improve the performance of the operation.

  • If possible, disable font smoothing in Windows’ settings or use the Standard smoothing. To change the smoothing settings:

    • In Windows 8.1 and Windows 10:
      • Open the Start menu.
      • Type Adjust visual in the Search box.
      • Click Settings.
      • Click Adjust the appearance and performance of Windows in the Settings area.
      • In the dialog, select Custom and then clear the Smooth edges of screen fonts check box.
      • Click OK to save the changes.
    • In Windows 7:
      • Open the Control Panel.
      • Type Adjust visual effect in the Search box that is in the top-right corner of the Control Panel’s window.
      • Click Adjust the appearance and performance of Windows in the search results. This will invoke the Performance Options dialog box.
      • In the dialog, select Custom and then clear the Smooth edges of screen fonts check box.
      • Click OK to save the changes.
    • In Windows Server 2008 and later:
      • Open Control Panel | Personalization | Window Color and Appearance. The Appearance Settings dialog will appear.
      • In the dialog, click Effects.
      • In the subsequent dialog, clear the Use the following method to smooth edges of screen fonts check box, or leave this check box selected and select the Standard smoothing type from the combo box beneath the check box.
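
Several of the tips above (a dedicated settings collection, an explicit font color, a reduced recognition set) can be combined in one script routine. The following Python sketch shows one possible shape. It relies on the OCRObject.CreateOptions and OCRObject.GetText methods and the OCROptions.ActiveRecognitionSet option named in this topic. The `OCR.CreateObject` call and the `FontColor` property are assumptions about the deprecated plugin's API, and `read_text` and its parameters are hypothetical names. Check the method and property references before relying on any of them.

```python
def read_text(ocr, target, recognition_set=None, font_color=None):
    """Recognize the text of `target` using a dedicated options collection.

    ocr             -- the OCR program object (assumption: it provides CreateObject).
    target          -- the onscreen object or image whose text should be read.
    recognition_set -- value for OCROptions.ActiveRecognitionSet; the valid
                       values are not listed in this topic, so the caller
                       must supply one from the option's reference.
    font_color      -- value for the FontColor option (assumed property name).
    """
    ocr_object = ocr.CreateObject(target)   # assumed factory call
    options = ocr_object.CreateOptions()    # new settings collection

    # Change only the settings the caller asked for; keep the rest as created.
    if recognition_set is not None:
        options.ActiveRecognitionSet = recognition_set
    if font_color is not None:
        options.FontColor = font_color

    # Pass the collection to GetText, as described in the tips above.
    return ocr_object.GetText(options)
```

In a TestComplete Python script unit, such a routine might be called as `read_text(OCR, some_object, font_color=some_color)`, where `some_object` and `some_color` are placeholders. Passing the OCR object as a parameter keeps the routine independent of global names and makes it easy to check with a stub.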

See Also

Using Optical Character Recognition
Using Optical Character Recognition - Overview
Optical Character Recognition vs. Text Recognition Technology
Using Text Recognition Technology - Overview
