Regular Expressions Syntax

Applies to LoadComplete 4.97, last modified on May 20, 2019

Regular expressions are coded strings that define an infinite number of possible matches. You can use regular expressions to specify:

The syntax of regular expression patterns that can be used in various LoadComplete areas is slightly different due to the differences in regular expression engines used. This topic describes regular expressions understood by search queries, data selectors and dynamic parameters.

For information on regular expressions that can be used by the Custom String data generator, see Custom-String Generator - Supported Regular Expressions.

The topic contains the following sections:

  • Color Convention
  • Regular Expression Tokens
  • Sub-Expressions
  • Modifiers

To learn more about regular expressions, visit http://www.regular-expressions.info.

Color Convention

In this topic, we highlight different text fragments with different colors:

  • Regular expression pattern is highlighted with purple color.

  • Literal text to which the regular expression is applied is highlighted with olive color.

  • Literal text that matches the regular expression is highlighted with teal color.

Regular Expression Tokens

The following table lists regular expression tokens supported by LoadComplete user scenarios and dynamic parameters:

Token Description
^ Beginning of a line. For instance, the ^a search pattern lets you find all lines that start with a.
$ End of a line. For instance, the a$ search pattern lets you find all lines that end with a.
. Matches any single character except a newline. To match any character including a newline, use the [\s\S] pattern or enable “single-line mode” with the (?s) modifier.
* The asterisk is a symbol-“repeater”. * means 0 or more occurrences of the preceding character or sub-expression. For instance, the abc*d pattern matches abd, abcd, abccd, but not a or abcx. The .* pattern matches a string of any length (including the empty string) that does not contain the newline symbol. The * token is equivalent to {0,}.
+ The plus is a symbol-“repeater”. + indicates 1 or more occurrences of the preceding character or sub-expression. For instance, the ab+d pattern matches abbd or abbbd, but does not match abcd or abbcd. The + token is equivalent to {1,}.
? The question mark means 0 or 1 occurrence of the preceding character or sub-expression. For example, abc?d will find abd and abcd, but not abccd. The ? token is equivalent to {0,1}.
a{n} n occurrences of a. For example, fo{2}t will find foot, but not fot or fooot.
a{n,} n or more occurrences of a. For example, ab{2,}c will find abbc, abbbc, abbbbc, but not abc.
a{n,m} At least n and at most m occurrences of a. For instance, ab{2,3}c will find abbc, but not abc or abbbbc.
   Note: The ? token can be used after *, +, {n,} and {n,m}. It makes the pattern “non-greedy”, that is, the pattern matches as few characters or sub-expressions as possible. For example, when searching the abbbbbcd string, the b{3,} pattern returns bbbbb, while b{3,}? returns bbb (five b characters without the question mark, three with it). Similarly, when searching the same string, the b{2,4} pattern returns bbbb, while b{2,4}? returns bb.
[ ] Any single character specified in brackets. For instance, d[ab]e matches dae or dbe and does not match dwe or dxe. To include the ] character in the set, make it either the first or the last character in the set, or use \]. For example, []abc], [abc]] or [ab\]cd].
[^ ] Any single character except those that are specified in brackets. For instance, d[^bc]e matches dae or dxe, but not dbe or dce. d[^b-d]e matches dae, but not dbe, dce or dde.
[a-b] Any single character from a to b, inclusive. For instance, d[k-x]e matches dke, dme and dxe, but not dze. To include the - character in the set, make it either the first or the last character in the set, or use \-. For example, [-ab], [abc-] or [a\-z].
[^a-b] Any single character not in the range a through b. For instance a[^k-x]z matches abz, aiz and ayz, but not akz.
(aaa) Denotes a sub-expression. For instance, the (abra)(kadabra) pattern contains two sub-expressions: abra and kadabra. To specify a parenthesis that should be treated literally, precede it with a backslash: \( or \).
LoadComplete allows specifying up to 9 sub-expressions inside one regular expression.
a|b Either a or b. For instance, ab|cde matches ab and cde, but not abde. The ht(m|ml) pattern will find htm and html, but not htl.
\ The backslash escapes special characters, such as ^, $, =, . (dot), parentheses, square brackets and others, so that they are treated literally instead of as pattern tokens. For instance, the \$10 pattern lets you find the $10 string. To search for a backslash, use a double backslash (\\).
Some more examples:

d[a\-c]e will find dae, d-e or dce, but not dbe.

d[\^bc]e will find d^e, dbe or dce.

\xNN A symbol whose hexadecimal ASCII code is NN. For example, A\x31B will find the string A1B (hexadecimal 31 is the ASCII code of 1).

You can also use \x{NNNN} to search for characters whose code occupies more than one byte (Unicode).

\t Tab character.
\n Newline character.
\r Carriage return character.
\f Form feed character.
\w “Word” character: an alphanumeric symbol or underscore (_). This token is equivalent to [A-Za-z0-9_].
\W Any symbol except for “word” characters. This token is equivalent to [^A-Za-z0-9_].
\d Any digit character. This token is equivalent to [0-9].
\D Any character except for digit. This token is equivalent to [^0-9].
\s “Whitespace” character: a space, tab (\t), newline (\n), carriage return (\r) or form feed (\f). This token is equivalent to [ \t\n\r\f].
\S Any symbol except for “whitespace” characters. This token is equivalent to [^ \t\n\r\f].
\b Indicates a word boundary, that is, a position between a word character (\w) and a non-word character (\W), or between a word character and the beginning or end of the text. For example, oo\b matches oo in foo, but not in foot. Similarly, \bfo matches fo in foot, but not in afoot.
\B Indicates any position in a word except for boundary. For example, oo\B matches oo in foot, but not in foo.
Note:

You can use the \t, \w, \W, \d, \D, \s, \S, \b and \B expressions within brackets. For example, b[\d-e]b will find b1b, b2b or beb.
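
The token examples above can be checked with a short script. This is only a sketch that uses Python's re module, not the LoadComplete engine, so it assumes that the two engines treat these basic tokens the same way. The full and first helpers are hypothetical names introduced for this illustration.

```python
import re
from typing import Optional


def full(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the whole text."""
    return re.fullmatch(pattern, text) is not None


def first(pattern: str, text: str) -> Optional[str]:
    """Return the first fragment of text that matches the pattern, or None."""
    found = re.search(pattern, text)
    return found.group(0) if found else None


# Quantifiers: *, ?, {n} and {n,m}
print(full(r"abc*d", "abd"), full(r"abc*d", "abccd"))         # True True
print(full(r"abc?d", "abcd"), full(r"abc?d", "abccd"))        # True False
print(full(r"fo{2}t", "foot"), full(r"fo{2}t", "fooot"))      # True False
print(full(r"ab{2,3}c", "abbc"), full(r"ab{2,3}c", "abbbbc"))  # True False

# Greedy versus non-greedy repetition on the abbbbbcd string
print(first(r"b{3,}", "abbbbbcd"))    # bbbbb
print(first(r"b{3,}?", "abbbbbcd"))   # bbb
print(first(r"b{2,4}", "abbbbbcd"))   # bbbb
print(first(r"b{2,4}?", "abbbbbcd"))  # bb

# Character sets, negated ranges and an escaped hyphen
print(full(r"d[^b-d]e", "dae"), full(r"d[^b-d]e", "dce"))  # True False
print(full(r"d[a\-c]e", "d-e"), full(r"d[a\-c]e", "dbe"))  # True False

# Hexadecimal escape and word boundary
print(first(r"A\x31B", "xxA1Byy"))                    # A1B
print(first(r"oo\b", "foo"), first(r"oo\b", "foot"))  # oo None
```

Results that depend on engine-specific behavior, such as where ] or - may appear inside brackets, should be verified in LoadComplete itself.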

Sub-Expressions

You can divide an expression into constituent parts, or sub-expressions. To specify a sub-expression, use parentheses, for instance, (\s\d+\.\d+\.)(\d+). The parsing engine detects two sub-expressions in this expression:

  • \s\d+\.\d+\.
  • \d+
LoadComplete allows specifying up to 9 sub-expressions inside one regular expression.

In addition, the engine assigns an item index to the whole expression and to each sub-expression: the index of the whole expression is 0, the index of the first sub-expression is 1, and so on. That is:

  • 0: (\s\d+\.\d+\.)(\d+)
  • 1: \s\d+\.\d+\.
  • 2: \d+

A text fragment that matches a sub-expression is called a submatch.

To address a submatch, you can use the syntax ${nn} (where nn stands for the index of the desired sub-expression). This feature gives you the option to operate with parts of the regular expression.
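
The indexing of submatches can be illustrated with Python's re module, which numbers groups the same way (0 for the whole match, 1 and up for sub-expressions). This is a sketch, not LoadComplete's implementation: the sample text is made up, and the expand helper is a hypothetical function that only imitates the ${nn} reference syntax.

```python
import re

# The pattern from this topic: two sub-expressions in parentheses.
PATTERN = re.compile(r"(\s\d+\.\d+\.)(\d+)")

# Hypothetical sample text, not taken from LoadComplete.
TEXT = "Build 4.97.1234 is ready"


def expand(template: str, match: re.Match) -> str:
    """Replace each ${n} reference in template with submatch n of match."""
    return re.sub(
        r"\$\{(\d+)\}",
        lambda ref: match.group(int(ref.group(1))),
        template,
    )


match = PATTERN.search(TEXT)
assert match is not None

print(repr(match.group(0)))  # ' 4.97.1234'  (index 0: the whole expression)
print(repr(match.group(1)))  # ' 4.97.'      (index 1: first sub-expression)
print(repr(match.group(2)))  # '1234'        (index 2: second sub-expression)

# Imitating a ${nn} reference to the second submatch
print(expand("revision=${2}", match))  # revision=1234
```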

Modifiers

Modifiers specify how the engine interprets regular expressions. They toggle the engine’s behavior modes. The following modifiers are available:

Modifier Key Default State Description
i Disabled Makes the pattern match case-insensitive.
m Enabled Treats a string as multiple lines. In this mode, the caret ^ and dollar $ match before and after newlines in the subject string.
s Enabled Treats a string as a single line. In this mode, the dot matches the newline symbol.
g Enabled Controls greedy mode. Non-standard modifier.

“Greedy” repetition operator takes as many matching characters as possible, “non-greedy” takes as few as possible. For example, b+ and b* applied to string abbbbc will return bbbb, whereas b+? will return b and b*? will return an empty string.

Switching to non-greedy mode makes + work as +?, * as *? and so on.
x Disabled Permits whitespaces and comments in the pattern. Non-standard modifier.

In this mode, whitespace characters (\s) that are neither escaped with a backslash nor placed within a character class are ignored. You can use this to break up your regular expression into more readable parts. In addition, the # character is treated as a metacharacter that introduces a comment. For example:

  ( # This pattern matches
    (this) # the occurrence of 'this'
      | # or
    (that) # the occurrence of 'that'
  )

If you want to use a whitespace or # character in the pattern, prefix it with a backslash (\) or encode it in hexadecimal notation (\xNN).
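
The extended mode described above can be tried out in Python, whose re module has a comparable (?x) flag. This is a sketch under the assumption that the two engines ignore whitespace and treat # comments in the same way. The names ALTERNATIVES, HEX_ENCODED and ESCAPED are introduced only for this illustration.

```python
import re

# The commented pattern from this topic, with (?x) enabling extended mode.
ALTERNATIVES = re.compile(r"""(?x)
  (          # This pattern matches
    (this)   # the occurrence of 'this'
      |      # or
    (that)   # the occurrence of 'that'
  )
""")

found = ALTERNATIVES.search("take that")
print(found.group(0))  # that

# A literal space and a literal # encoded in hexadecimal notation.
HEX_ENCODED = re.compile(r"(?x) a \x20 b \x23 c")
print(HEX_ENCODED.fullmatch("a b#c") is not None)  # True

# The same literals escaped with a backslash.
ESCAPED = re.compile(r"(?x) a \  b \# c")
print(ESCAPED.fullmatch("a b#c") is not None)  # True

# Unescaped whitespace is ignored, so "a b" in the pattern matches "ab".
print(re.fullmatch(r"(?x) a b", "ab") is not None)  # True
```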

You can specify a modifier in the expression using the (?key) or (?-key) syntax, where key stands for the modifier key and the minus sign disables the corresponding modifier. If you specify an unsupported modifier key, an error occurs. Modifiers can be applied to the whole expression or only to a sub-expression. For example:

(?i)Las Vegas matches Las Vegas and las Vegas
(?i)Las (?-i)Vegas matches Las Vegas but not Las vegas
(?i)(Las )?Vegas matches Las Vegas and las Vegas
((?i)Las )?Vegas matches las Vegas but not las vegas
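
The effect of these modifiers can be reproduced in Python, with one difference that this sketch has to work around: recent Python versions accept (?i) only at the start of a pattern and have no standalone (?-i), so a modifier that applies to part of the pattern is written in the scoped form (?i:...). The patterns below are therefore Python equivalents of the examples above, not the LoadComplete syntax itself. Python also leaves the m and s modifiers disabled by default, and the full helper is a hypothetical name used for this illustration.

```python
import re


def full(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the whole text."""
    return re.fullmatch(pattern, text) is not None


# (?i) at the start applies to the whole expression.
print(full(r"(?i)Las Vegas", "las Vegas"))  # True

# Equivalent of (?i)Las (?-i)Vegas: only "Las " ignores case.
print(full(r"(?i:Las )Vegas", "las Vegas"))  # True
print(full(r"(?i:Las )Vegas", "Las vegas"))  # False

# Equivalent of ((?i)Las )?Vegas: the modifier is limited to the group.
print(full(r"((?i:Las ))?Vegas", "las Vegas"))  # True
print(full(r"((?i:Las ))?Vegas", "las vegas"))  # False

# (?s): the dot also matches a newline.
print(full(r"a.b", "a\nb"), full(r"(?s)a.b", "a\nb"))  # False True

# (?m): ^ matches at the beginning of every line.
print(re.findall(r"^a", "ab\nac"))      # ['a']
print(re.findall(r"(?m)^a", "ab\nac"))  # ['a', 'a']
```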

See Also

About Data Selectors
About Data Replacers
Create Data Selector Wizard
Finding Text in the Scenario Editor