Regular Expressions Syntax

Applies to TestComplete 14.40, last modified on April 22, 2021

Regular expressions are coded strings that describe a search pattern. A single pattern can match many different strings, potentially an unlimited number of them. You can use regular expressions to specify:

Furthermore, you can use regular expressions in scripts. The scripting engines of JavaScript, JScript, Python, VBScript, C#Script and C++Script have native support for regular expressions; for DelphiScript, the support is provided by the HISUtils plugin (see Using Regular Expressions in Scripts).

Note: In this topic certain text fragments are marked with color. The following convention is used:

Regular expression patterns are highlighted in purple.

Literal text to which the regular expression is applied is highlighted in olive.

Literal text that matches the regular expression is highlighted in teal.

Regular Expression Tokens

The syntax of regular expression patterns may differ slightly between TestComplete features, because these features rely on different regular expression engines. In the aqString.StrMatches, Find and some other methods, TestComplete uses its own non-native regular expressions. Native regular expressions use the regular expression objects built into the scripting language engines. For a list of tokens that are handled by various TestComplete features, follow the links below:

Tokens Used in Non-Native Regular Expressions (in Property Checkpoints, Dynamic Identifier Patterns, StrMatches and Find methods)

Tokens Used in Custom String Data Generator

Tokens Used in Native Regular Expressions (in Other Objects, Routines and Dialogs)

When specifying native regular expressions, you can divide an expression into sub-expressions and use special modifier keys. See the Sub-expressions and Mode Modifiers sections for detailed information.

Tokens Used in Non-Native Regular Expressions

The following table lists regular expression tokens that are recognized by the Property Checkpoint keyword test operation, the aqObject.CheckProperty, aqObject.CompareProperty, aqString.StrMatches and BuiltIn.StrMatches scripting methods and patterns for detecting dynamic identifiers of web objects.

Token Description
^ Matches the beginning of a line. For instance, the ^a search pattern lets you find all lines that start with a.
$ Matches the end of a line. For instance, the a$ search pattern lets you find all lines that end with a.
. Matches any single character. For example, a.c matches abc, adc, aec but not aaac or abbc.
* Indicates that the preceding character or group matches 0 or more times. For instance, the abc*d pattern matches abd, abcd, abccd, but not a or abcx. The .* pattern matches a string of any length (including the empty string) that does not contain the newline symbol.

Note that this token is greedy, that is, it matches the longest possible string. For example, when searching on the string abbbb, ab* matches abbbb rather than ab.

+ Indicates that the preceding character or group matches 1 or more times. For instance, the ab+d pattern matches abbd or abbbd, but does not match abcd or abbcd.

Note that this token is greedy, that is, it matches the longest possible string. For example, when searching on the string abbbb, ab+ matches abbbb rather than ab or abb.

? Indicates that the preceding character or group is optional, that is, it either matches once or does not match at all. For example, abc?d will find abd and abcd, but not abccd.

Note that this token is greedy, that is, it matches the longest possible string. For example, when searching on the string abc, ab? matches ab rather than a.

??, +?, *? These tokens are non-greedy versions of ?, + and *. That is, they match the shortest possible string. For example, when searching on the string abbbb, ab*? matches a rather than ab or abbbb.
[ ] Matches any single character specified in brackets. For instance, d[ab]e matches dae or dbe and does not match dwe or dxe. To include the ] character in the search, make it either the first or the last character in the set, or use \]. For example, []abc], [abc]] or [ab\]cd].
[^ ] Matches any single character except for those in brackets. For instance, d[^bc]e matches dae or dxe, but not dbe or dce. d[^b-d]e matches dae, but not dbe, dce or dde.
[a-b] Matches any single character from a to b, inclusive. For instance, d[k-x]e matches dke, dme and dxe, but not dze. To include the - character in the search, make it either the first or the last character in the set, or use \-. For example, [-ab], [abc-] or [a\-z].
[^a-b] Matches any single character not in the range a through b. For instance a[^k-x]z matches abz, aiz and ayz, but not akz.
a|b Matches either the a or b character or a group. For example, A|abc matches Abc and abc, but not A. The htm|(ml) pattern matches htm and html, but not htl or ml.
a!b Matches a not followed by b. For example, colo!ur matches color, but not colour.
( ) Groups characters. For example, (ab)+ matches ab and abab but not acb.

To specify a round bracket that should be treated literally, precede it with a backslash: \( or \).

{ } Indicates a match group. Use braces to capture the value that matches the expression inside them, so that you can refer to this value later in the same pattern. For example, the [0-9]+-[0-9]+ regular expression matches 125-125, but not 125-abcd. It also matches 125-168, because the two numbers are matched independently. To find only strings in which the same number appears on both sides of the hyphen, specify the first part as a match group and then address the matched value by the zero-based index of the group. The index is written after a backslash (\n, where n is the index): {[0-9]+}-\0. TestComplete replaces the \0 expression with the string returned by the first match group, so the pattern matches 168-168, but not 125-168.

To specify a brace that should be treated literally, precede it with a backslash: \{ or \}.

\ A backslash indicates that the next character token (such as ?, !, *, - and others) should be treated literally. For example, \$10 matches the string $10. To match a backslash itself, use the double backslash pattern (\\).

When followed by a number (for example, \0), indicates the match group at the specified index (from 0).

In JavaScript, JScript, Python, C++Script and C#Script, you should use double backslashes to interpret the subsequent special character literally: "\\?", "\\.", "\\2" and so on. To match a backslash character, use the "\\\\" pattern.
\a Matches any alphanumeric character. This token is equivalent to [A-Za-z0-9].
\b Matches a whitespace character.
\c Matches any alphabetic character. This token is equivalent to [A-Za-z].
\d Matches any decimal digit. This token is equivalent to [0-9].
\h Matches any hexadecimal digit. This token is equivalent to [0-9A-Fa-f].
\n Matches a new line character.
\q Matches a quoted string. This token is equivalent to ("[^"]*")|('[^']*').
\w Matches a word. This token is equivalent to [a-zA-Z]+ or \c+.
\z Matches an integer number. This token is equivalent to [0-9]+ or \d+.
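
The equivalences listed in the table can be checked outside TestComplete. The sketch below uses Python's standard re module, which is a native engine and does not understand the non-native tokens themselves, so each token is spelled out as the character class the table gives for it. The EQUIVALENTS dictionary and the fully_matches helper are hypothetical names introduced only for illustration. Python has no { } match groups, so the {[0-9]+}-\0 example is approximated with a capturing group and the one-based back-reference \1.

```python
import re

# Expansions copied from the table above; the dictionary itself is illustrative
# and is not part of any TestComplete API.
EQUIVALENTS = {
    r"\a": "[A-Za-z0-9]",
    r"\c": "[A-Za-z]",
    r"\d": "[0-9]",
    r"\h": "[0-9A-Fa-f]",
    r"\q": "(?:\"[^\"]*\"|'[^']*')",
    r"\w": "[a-zA-Z]+",
    r"\z": "[0-9]+",
}


def fully_matches(token, text):
    """Check whether the whole text matches the expansion of a non-native token."""
    return re.fullmatch(EQUIVALENTS[token], text) is not None


print(fully_matches(r"\h", "F"))         # True: F is a hexadecimal digit
print(fully_matches(r"\h", "G"))         # False
print(fully_matches(r"\q", "'quoted'"))  # True: a single-quoted string
print(fully_matches(r"\z", "2021"))      # True: an integer number

# Python counterpart of {[0-9]+}-\0: group indexes start at 1 in Python.
same_numbers = re.compile(r"([0-9]+)-\1")
print(same_numbers.fullmatch("168-168") is not None)  # True
print(same_numbers.fullmatch("125-168") is not None)  # False
```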

Tokens Used in Custom String Data Generator

Tokens Used in Native Regular Expressions

The following table lists tokens that are recognized by JavaScript’s, JScript’s, Python’s and VBScript’s “native” regular expressions, the HISUtils.RegExpr object, the Find and Replace dialogs, name mapping templates, and edit masks of form components.

Token Description
^ Beginning of a line. For instance, the ^a search pattern lets you find all lines that start with a.
$ End of a line. For instance, the a$ search pattern lets you find all lines that end with a.
. Matches any single character, except for a newline one. To search for any symbol including a newline one, you can use the [\s\S] pattern or enable the single-line mode with the (?s) modifier.
* The asterisk is a symbol-“repeater”. * means 0 or more occurrences of the preceding character or sub-expression. For instance, the abc*d pattern matches abd, abcd, abccd, but not a or abcx. The .* pattern matches a string of any length (including the empty string) that does not contain the newline symbol. The * token is equivalent to {0,}.
+ The plus is a symbol-“repeater”. + indicates 1 or more occurrences of the preceding character or sub-expression. For instance, the ab+d pattern matches abbd or abbbd, but does not match abcd or abbcd. The + token is equivalent to {1,}.
? The question mark means 0 or 1 occurrence of the preceding character or sub-expression. For example, abc?d will find abd and abcd, but not abccd. The ? token is equivalent to {0,1}.
a{n} n occurrences of a. For example, fo{2}t will find foot, but not fot or fooot.
a{n,} n or more occurrences of a. For example, ab{2,}c will find abbc, abbbc, abbbbc, but not abc.
a{n,m} At least n, but no more than m occurrences of a. For instance, ab{2,3}c will find abbc and abbbc, but not abc or abbbbc.

Note:

The ? token can be used after *, +, {n,} and {n,m}. In this position, it makes the pattern non-greedy, that is, the pattern matches as few characters or sub-expressions as possible. For example, when searching the abbbbbcd string, the b{3,} pattern returns bbbbb, while b{3,}? returns bbb (that is, without the question mark you get a string of five b characters, and with the question mark you get a string of three). Similarly, when searching the same string with the b{2,4} pattern, you get bbbb, while b{2,4}? returns bb.
[ ] Any single character specified in brackets. For instance, d[ab]e matches dae or dbe and does not match dwe or dxe. To include the ] character in the search, make it either the first or the last character in the set, or use \]. For example, []abc], [abc]] or [ab\]cd].
[^ ] Any single character except those that are specified in brackets. For instance, d[^bc]e matches dae or dxe, but not dbe or dce. d[^b-d]e matches dae, but not dbe, dce or dde.
[a-b] Any single character from a to b, inclusive. For instance, d[k-x]e matches dke, dme and dxe, but not dze. To include the - character in the search, make it either the first or the last character in the set, or use \-. For example, [-ab], [abc-] or [a\-z].
[^a-b] Any single character not in the range a through b. For instance a[^k-x]z matches abz, aiz and ayz, but not akz.
(aaa) Denotes a sub-expression. For instance, the (abra)(kadabra) pattern contains two sub-expressions: abra and kadabra. To specify a round bracket that should be treated literally, precede it with a backslash: \( or \).
a|b Either a or b. For instance, ab|cde matches ab and cde, but not abde. The ht(m|ml) pattern will find htm and html, but not htl.
\ Backslash is used to specify that special characters, such as ^, $ or . (dot), do not belong to the search pattern and should be treated literally. For instance, the \$10 pattern lets you find the $10 string. To search for a backslash, use the double backslash pattern (\\).
Some more examples:

d[a\-c]e will find dae, d-e or dce, but not dbe.

d[\^bc]e will find d^e, dbe or dce.

\xNN A symbol whose hexadecimal ASCII code is NN. For example, A\x31B will find the string A1B. (Hexadecimal 31 is ASCII code of 1).

You can also use \x{NNNN} to search for characters whose code occupies more than one byte (Unicode).

\t Tab character.
\n Newline character.
\r Carriage return character.
\f Form feed character.
\w “Word” character: an alphanumeric symbol or underscore (_). This token is equivalent to [A-Za-z0-9_].
\W Any symbol except for “word” characters. This token is equivalent to [^A-Za-z0-9_].
\d Any digit character. This token is equivalent to [0-9].
\D Any character except for digit. This token is equivalent to [^0-9].
\s “Whitespace” character: a space, tab (\t), newline (\n), carriage return (\r) and form feed (\f). This token is equivalent to [ \t\n\r\f].
\S Any symbol except for “whitespace” characters. This token is equivalent to [^ \t\n\r\f].
\b Indicates a word boundary, that is, a position between a word character and a non-word character (such as whitespace) or the beginning or end of the text. For example, oo\b matches oo in foo, but not in foot. Similarly, \bfo matches fo in foot, but not in afoot.
\B Indicates any position in a word except for boundary. For example, oo\B matches oo in foot, but not in foo.

Note:

You can use the \t, \w, \W, \d, \D, \s, \S, \b and \B expressions within brackets. For example, b[\d-e]b will find b1b, b2b or beb.
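
Several rows of the table above can be tried directly in one of the native engines. The following is a minimal sketch that uses Python's standard re module; the sample strings come from the table, while the variable names are arbitrary. Other native engines may behave differently in details, so treat the results as Python-specific.

```python
import re

text = "abbbbbcd"

# Greedy quantifiers take as many characters as possible,
# their non-greedy counterparts (with a trailing ?) as few as possible.
greedy = re.search(r"b{3,}", text).group()          # 'bbbbb'
lazy = re.search(r"b{3,}?", text).group()           # 'bbb'
bounded = re.search(r"b{2,4}", text).group()        # 'bbbb'
bounded_lazy = re.search(r"b{2,4}?", text).group()  # 'bb'
print(greedy, lazy, bounded, bounded_lazy)

# \b is a word boundary, \B is any position that is not a boundary.
print(re.search(r"oo\b", "foo") is not None)    # True
print(re.search(r"oo\b", "foot") is not None)   # False
print(re.search(r"\bfo", "afoot") is not None)  # False
print(re.search(r"oo\B", "foot") is not None)   # True

# \xNN addresses a character by its hexadecimal code (0x31 is "1").
print(re.fullmatch(r"A\x31B", "A1B") is not None)  # True

# The dot does not match a newline by default; [\s\S] matches any character.
print(re.fullmatch(r"a.c", "a\nc") is not None)       # False
print(re.fullmatch(r"a[\s\S]c", "a\nc") is not None)  # True

# A raw string (r"...") keeps the backslash as is, so \$ needs no doubling.
print(re.search(r"\$10", "costs $10").group())  # $10
```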

Sub-expressions

You can divide an expression into constituent parts, or sub-expressions. To specify a sub-expression, use parentheses, for instance, (\s\d+,\d+,\d+,)(\d+). The parsing engine detects two sub-expressions in this expression:

  • \s\d+,\d+,\d+,
  • \d+

Besides, the engine assigns an index to the whole expression and to each sub-expression: the index of the whole expression is 0, the index of the first sub-expression is 1, and so on. That is:

  • 0: (\s\d+,\d+,\d+,)(\d+)
  • 1: \s\d+,\d+,\d+,
  • 2: \d+

A text fragment that matches a sub-expression is called a submatch.
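
The index numbering can be illustrated with a short sketch in Python's re module. The pattern has two sub-expressions and is modeled on the one above; the sample string is made up for the demonstration.

```python
import re

# A pattern with two sub-expressions: index 0 is the whole match,
# index 1 is the first sub-expression, index 2 is the second one.
pattern = re.compile(r"(\s\d+,\d+,\d+,)(\d+)")

match = pattern.search(" 10,20,30,40")  # sample input, chosen for illustration

whole = match.group(0)       # ' 10,20,30,40'
first = match.group(1)       # ' 10,20,30,'
second = match.group(2)      # '40'
submatches = match.groups()  # all submatches, without the whole match

print(repr(whole))
print(repr(first))
print(repr(second))
print(submatches)
```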

To address a submatch in dialogs, you can use the syntax ${nn} (where nn stands for the index of the desired sub-expression). This feature gives you the option to operate with parts of the regular expression.

When using regular expressions in the Replace dialog, you can specify sub-expressions in both the Find what and Replace with expressions, thus providing you the ability to replace only a part of the sought-for expression.
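
The same idea of replacing only a part of the found text can be sketched in a script. The example below uses Python's re.sub function. Note that Python writes submatch references in the replacement string as \1 or \g<1>, not with the dialog syntax described above, and that the sample strings are invented for the demonstration.

```python
import re

# Keep the first submatch ("version ") and replace only the number.
# \g<1> is used instead of \1 so that the following digits are not
# read as part of the group index.
updated = re.sub(r"(version )(\d+)", r"\g<1>15", "TestComplete version 14")
print(updated)  # TestComplete version 15

# Submatches can also be reordered in the replacement.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Smith, John")
print(swapped)  # John Smith
```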

The way of addressing submatches from scripts depends on the scripting language.

In JavaScript, JScript, C#Script or C++Script, the submatches are accessed via the $1 through $9 properties of the global RegExp object. These properties return the text fragments that correspond to the first, second, … ninth sub-expression of the last found match. For example, RegExp.$2 returns the second submatch.

In Python, you can access submatches by using the group() method of the match object. If the index parameter of the method is 0, then it returns the entire match. If the index is a positive integer n, then the method returns the n-th submatch. For example, m.group(3) returns the third submatch.
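
A minimal sketch of the group() method follows. The DATE pattern and the sample text are hypothetical and serve only to show how the index selects a submatch; re.search returns None when nothing is found, so the result is checked before use.

```python
import re

# Three sub-expressions: year, month and day.
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
text = "start 2021-04-22, end 2021-05-03"  # sample input

first = DATE.search(text)
if first is not None:
    print(first.group(0))  # 2021-04-22 (the entire match)
    print(first.group(3))  # 22 (the third submatch)

# finditer yields one match object per found fragment.
days = [m.group(3) for m in DATE.finditer(text)]
print(days)  # ['22', '03']

# No match: search returns None instead of a match object.
missing = DATE.search("no dates here")
print(missing)  # None
```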

In VBScript, each found match is represented by a Match object, whose SubMatches collection stores the found submatches. For example, Matches(2).SubMatches(0) refers to the first submatch of the third found matching fragment.

In DelphiScript, you can refer to a submatch by using the RegExpr.Match property. If the Index parameter of this property is 0, the property returns the fragment that matches the whole expression. If Index is a positive integer n, the property returns the fragment that matches the n-th sub-expression. For example, Match[1] returns the first submatch. Besides, you can use the RegExpr.Substitute method, which creates a new string by replacing special characters with found matches and submatches.

Mode Modifiers

VBScript does not support mode modifiers.

Mode modifiers specify how the engine interprets regular expressions. They toggle the engine’s behavior modes. The following modifiers are available:

Modifier Key Default State Description
i Enabled Makes the pattern match case-insensitive.
m Disabled Treats a string as multiple lines. In this mode, the caret ^ and dollar $ match before and after newlines in the subject string.
s Enabled Treats a string as a single line. In this mode, the dot matches the newline symbol.
g Enabled Controls greedy mode. Non-standard modifier.

A greedy repetition operator takes as many matching characters as possible, while a non-greedy one takes as few as possible. For example, b+ and b* applied to the string abbbbc will return bbbb, whereas b+? will return b and b*? will return an empty string.

Switching to non-greedy mode makes + work as +?, * as *? and so on.

x Disabled Permits whitespaces and comments in the pattern. Non-standard modifier.

In this mode, the whitespaces (\s) that are not backslashed nor within a character class are ignored. You can use this to break up your regular expression into more readable parts. Also the # character is treated as a metacharacter introducing a comment. For example:

  ( # This pattern matches
    (this) # the occurrence of 'this'
      | # or
    (that) # the occurrence of 'that'
  )

If you want to place a whitespace or # character in the pattern, you have to prefix it with a backslash (\) or encode it using hexadecimal notation (\xNN).
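
Python's re module offers a comparable mode through the re.VERBOSE flag (or the inline (?x) modifier). The sketch below assumes Python; the commented pattern is the one shown above, and the "item #42" sample is invented for the demonstration.

```python
import re

# Whitespace and #-comments inside the pattern are ignored in verbose mode.
this_or_that = re.compile(r"""
    (            # This pattern matches
      (this)     # the occurrence of 'this'
        |        # or
      (that)     # the occurrence of 'that'
    )
""", re.VERBOSE)

found = this_or_that.search("take that")
print(found.group(1))  # that
print(found.group(2))  # None: the (this) alternative did not take part
print(found.group(3))  # that

# A literal space or # must be escaped with a backslash...
escaped = re.compile(r"item\ \#\d+", re.VERBOSE)
# ...or written in hexadecimal notation (0x20 is a space, 0x23 is #).
hex_coded = re.compile(r"item\x20\x23\d+", re.VERBOSE)

print(escaped.search("item #42").group())    # item #42
print(hex_coded.search("item #42").group())  # item #42
```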

You can specify a modifier in the expression by using the (?key) or (?-key) syntax, where key stands for the modifier key and the minus sign disables the corresponding modifier. If you specify an unsupported modifier key, an error occurs. Modifiers can be applied to the whole expression or only to a sub-expression. For example:

(?i)Las Vegas matches Las vegas and Las Vegas
(?i)Las (?-i)Vegas matches Las Vegas but not Las vegas
(?i)(Las )?Vegas matches Las vegas and las vegas
((?i)Las )?Vegas matches las Vegas, but not las vegas
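
In Python, the inline syntax differs slightly, so the examples above cannot be copied verbatim. A modifier that applies to the whole pattern must stand at the very beginning (Python 3.11 and later reject it elsewhere), and a modifier for a part of the pattern is written in the scoped form (?i:...) or (?-i:...). Default states also differ: in Python, i, m and s are all disabled unless specified. The sketch below is an approximation under these assumptions; the matches helper is a hypothetical name.

```python
import re


def matches(pattern, text):
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# (?i) at the start makes the whole pattern case-insensitive.
print(matches(r"(?i)Las Vegas", "Las vegas"))        # True

# (?-i:...) switches case-insensitivity off for one part only.
print(matches(r"(?i)Las (?-i:Vegas)", "las Vegas"))  # True
print(matches(r"(?i)Las (?-i:Vegas)", "Las vegas"))  # False

# (?i:...) switches it on for one part only.
print(matches(r"(?i:Las )?Vegas", "las Vegas"))      # True
print(matches(r"(?i:Las )?Vegas", "las vegas"))      # False

# m: ^ and $ also match at line breaks inside the string.
lines = "apple\nbanana\navocado"
print(re.findall(r"^a\w*", lines))      # ['apple']
print(re.findall(r"(?m)^a\w*", lines))  # ['apple', 'avocado']

# s: the dot also matches the newline character.
print(re.search(r"a.c", "a\nc") is not None)      # False
print(re.search(r"(?s)a.c", "a\nc") is not None)  # True
```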

See Also

Using Regular Expressions in Scripts
Property Checkpoint Operation
CheckProperty Method
CompareProperty Method
StrMatches Method
StrMatches Method
Find Dialog
Replace Dialog
Name Mapping Templates
RegExpr Object
Using Masks
