PDF.ConvertToText Method

Applies to TestComplete 15.40, last modified on March 25, 2022

Description

The PDF.ConvertToText method extracts the contents of the specified PDF file, recognizes the text using optical character recognition and returns the recognized text.

Requirements
  • Your TestComplete version must be 14.20 or later.

  • Your computer must have access to the ocr.api.dev.smartbear.com web service.

    If you have firewalls or proxies running in your network, they should allow your computer to access the web service. This web service is used to recognize the text content of PDF files.

  • Your firewall must allow traffic through port 443.

  • You need an active license for the TestComplete Intelligent Quality add-on.

  • The Intelligent Quality add-on must be enabled in TestComplete.

    You can enable the add-on during TestComplete installation. If you did not enable it then, you can do so at any time: select File > Install Extensions from the TestComplete main menu and enable the Intelligent Quality > Intelligent Quality Core plugin in the resulting dialog.

  • PDF to Text support must be enabled in TestComplete.

    By default, it is enabled automatically if you enable the Intelligent Quality add-on during TestComplete installation.

    If you experience issues with PDF support in your tests, select File > Install Extensions from the TestComplete main menu and make sure the PDF to Text plugin is enabled (you can find it in the Intelligent Quality group). If the plugin is disabled, enable it. In the confirmation message, click the link to read a third-party license agreement. If you agree to the license terms, click Enable OCR.
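
The network requirements above can be checked before a test run. The following is a minimal sketch in standalone Python, not part of the TestComplete API: it only verifies that a TCP connection to the host and port named in the requirements can be opened. The names `can_reach`, `OCR_HOST` and `OCR_PORT` are hypothetical, and a successful connection does not guarantee that a proxy will let the actual recognition requests through.

```python
import socket

OCR_HOST = "ocr.api.dev.smartbear.com"  # host named in the requirements above
OCR_PORT = 443                          # port named in the requirements above


def can_reach(host=OCR_HOST, port=OCR_PORT, timeout=5.0,
              connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened.

    `connect` is injectable so the check can be exercised without a network.
    """
    try:
        conn = connect((host, port), timeout)
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
    conn.close()
    return True
```

Calling `can_reach()` with no arguments attempts a real connection; a `False` result suggests reviewing the firewall or proxy settings.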

Declaration

PDF.ConvertToText(PathToPDF)

PathToPDF [in]    Required    String    
Result String

Applies To

The method is applied to the following object: PDF

Parameters

The method has the following parameter:

PathToPDF

Specifies the fully-qualified path to the PDF file whose text content you want to get.

If the specified file is not a PDF file, an error will occur.
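
Because a non-PDF file causes an error, it can help to validate the path before calling the method. The sketch below is standalone Python and rests on one assumption: a PDF file begins with the `%PDF-` signature. How the method itself decides whether a file is a PDF is not documented here, so this check may be stricter or looser than the real one. The names `has_pdf_signature` and `is_usable_pdf_path` are hypothetical.

```python
import os

PDF_SIGNATURE = b"%PDF-"  # bytes that open a standard PDF file header


def has_pdf_signature(head):
    """Return True if the leading bytes of a file look like a PDF header."""
    return head.startswith(PDF_SIGNATURE)


def is_usable_pdf_path(path):
    """Return True if `path` is absolute, exists, and starts with %PDF-."""
    if not os.path.isabs(path) or not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return has_pdf_signature(f.read(len(PDF_SIGNATURE)))
```

A test could call `is_usable_pdf_path(path)` first and log a warning instead of passing a relative path or a non-PDF file to the method.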

Result Value

A string value containing the recognized text.
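
Since the result is a plain string, a test will typically compare it against expected text. The sketch below assumes, without confirmation from this page, that the recognized text may contain irregular line breaks and spacing, so it normalizes whitespace before comparing. The helper names `normalize_text` and `contains_phrase` are hypothetical.

```python
import re


def normalize_text(text):
    """Collapse every run of whitespace to one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def contains_phrase(recognized, phrase):
    """Case-insensitive, whitespace-insensitive substring check."""
    return normalize_text(phrase).lower() in normalize_text(recognized).lower()
```

For example, `contains_phrase(text, "Dummy PDF file")` would match whether the words are separated by spaces or by line breaks in the recognized text.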

Example

The example below downloads a PDF file, recognizes the file’s text content and posts it to the test log.

JavaScript, JScript

function GetPDFText()
{
  // Get a PDF file and save it
  var url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf";
  var path = Project.Path + "sample.pdf";
  var request = aqHttp.CreateGetRequest(url);
  var responsePDF = request.Send();
  responsePDF.SaveToFile(path);

  // Make sure the file was saved, then extract and recognize its text content
  if (aqFile.Exists(path))
  {
    var str = PDF.ConvertToText(path);
    Log.Message("View the recognized text in the Details panel of the log", str);
  }
}

Python

def GetPDFText():
  # Get a PDF file and save it
  url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
  path = Project.Path + "sample.pdf"
  request = aqHttp.CreateGetRequest(url)
  responsePDF = request.Send()
  responsePDF.SaveToFile(path)

  # Make sure the file was saved, then extract and recognize its text content
  if aqFile.Exists(path):
    text = PDF.ConvertToText(path)
    Log.Message("View the recognized text in the Details panel of the log", text)

VBScript

Sub GetPDFText
  ' Get a PDF file and save it
  url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
  path = Project.Path & "sample.pdf"
  Set request = aqHttp.CreateGetRequest(url)
  Set responsePDF = request.Send
  responsePDF.SaveToFile(path)

  ' Make sure the file was saved, then extract and recognize its text content
  If aqFile.Exists(path) Then
    str = PDF.ConvertToText(path)
    Call Log.Message("View the recognized text in the Details panel of the log", str)
  End If
End Sub

DelphiScript

procedure GetPDFText();
var url, path, request, responsePDF, str;
begin
  // Get a PDF file and save it
  url := 'https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf';
  path := Project.Path + 'sample.pdf';
  request := aqHttp.CreateGetRequest(url);
  responsePDF := request.Send();
  responsePDF.SaveToFile(path);

  // Make sure the file was saved, then extract and recognize its text content
  if aqFile.Exists(path) then
  begin
    str := PDF.ConvertToText(path);
    Log.Message('View the recognized text in the Details panel of the log', str);
  end;
end;

C++Script, C#Script

function GetPDFText()
{
  // Get a PDF file and save it
  var url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf";
  var path = Project["Path"] + "sample.pdf";
  var request = aqHttp["CreateGetRequest"](url);
  var responsePDF = request["Send"]();
  responsePDF["SaveToFile"](path);

  // Make sure the file was saved, then extract and recognize its text content
  if (aqFile["Exists"](path))
  {
    var str = PDF["ConvertToText"](path);
    Log["Message"]("View the recognized text in the Details panel of the log", str);
  }
}

See Also

PDF Object
