A regular expression is a pattern string that describes a set of possible matches, which can be arbitrarily large. You can use regular expressions to specify:
- Text to find in user scenarios.
- The data that data selectors extract.
- The data that data replacers replace.
- The masks for values generated by the Custom String data generator.
The regular expression syntax differs slightly between LoadComplete areas because they use different regular expression engines. This topic describes the regular expressions understood by search queries, data selectors and dynamic parameters.
For information on regular expressions that can be used by the Custom String data generator, see Custom-String Generator - Supported Regular Expressions.
The topic contains the following sections:
- Color Convention
- Regular Expression Tokens
- Sub-Expressions
- Modifiers
To learn more about regular expressions, visit http://www.regular-expressions.info.
Color Convention
In this topic, we highlight different text fragments with different colors:
- Regular expression pattern is highlighted with purple color.
- Literal text to which the regular expression is applied is highlighted with olive color.
- Literal text that matches the regular expression is highlighted with teal color.
Regular Expression Tokens
The following table lists regular expression tokens supported by LoadComplete user scenarios and dynamic parameters:
Token | Description
---|---
^ | Beginning of a line. For instance, the ^a search pattern lets you find all lines that start with a.
$ | End of a line. For instance, the a$ search pattern lets you find all lines that end with a.
. | Matches any single character except a newline. To match any character including a newline, use the [\s\S] pattern or enable “single-line mode” with the (?s) modifier.
* | The asterisk is a “repeater” symbol. * means 0 or more occurrences of the preceding character or sub-expression. For instance, the abc*d pattern matches abd, abcd, abccd, but not a or abcx. The .* pattern matches a string of any length (including the empty string) that does not contain the newline symbol. The * token is equivalent to {0,}.
+ | The plus is a “repeater” symbol. + means 1 or more occurrences of the preceding character or sub-expression. For instance, the ab+d pattern matches abd, abbd or abbbd, but does not match abcd or abbcd. The + token is equivalent to {1,}.
? | The question mark means 0 or 1 occurrence of the preceding character or sub-expression. For example, abc?d will find abd and abcd, but not abccd. The ? token is equivalent to {0,1}.
a{n} | n occurrences of a. For example, fo{2}t will find foot, but not fot or fooot.
a{n,} | n or more occurrences of a. For example, ab{2,}c will find abbc, abbbc, abbbbc, but not abc.
a{n,m} | At least n, but no more than m occurrences of a. For instance, ab{2,3}c will find abbc and abbbc, but not abc or abbbbc.
Note: | The ? token can be used after *, +, {n,} and {n,m}. In that case, it makes the search pattern “non-greedy”, that is, the pattern matches as few characters or sub-expressions as possible. For example, when searching the abbbbbcd string, the b{3,} pattern will return bbbbb, while b{3,}? will return bbb (that is, without the question mark you get a string of five “b” symbols, while with the question mark you get a string of three). Similarly, when searching the same string using the b{2,4} pattern, you will get bbbb, while using b{2,4}? you will get bb.
[ ] | Any single character specified in brackets. For instance, d[ab]e matches dae or dbe and does not match dwe or dxe. To include the ] character into the search, make it either first, or last character in the range or use \]. For example, []abc], [abc]] or [ab\]cd].
[^ ] | Any single character except those that are specified in brackets. For instance, d[^bc]e matches dae or dxe, but not dbe or dce. d[^b-d]e matches dae, but not dbe, dce or dde.
[a-b] | Any single character from a to b, inclusive. For instance, d[k-x]e matches dke, dme and dxe, but not dze. To include the - character into the search, make it either first, or last character in the range or use \-. For example, [-ab], [abc-] or [a\-z].
[^a-b] | Any single character not in the range a through b. For instance, a[^k-x]z matches abz, aiz and ayz, but not akz.
(aaa) | Denotes a sub-expression. For instance, the (abra)(kadabra) pattern contains two sub-expressions: abra and kadabra. To specify a round bracket that should be treated literally, precede it with a backslash: \( or \).
a|b | Either a or b. For instance, ab|cde matches ab and cde, but not abde. The ht(m|ml) pattern will find htm and html, but not htl.
\ | The backslash escapes special characters, such as ^, $, =, . (dot), parentheses, square brackets and others, so that they are treated literally rather than as part of the pattern syntax. For instance, the \$10 pattern lets you find the $10 string. To search for a backslash, use the double backslash pattern (\\). Some more examples: d[a\-c]e will find dae, d-e or dce, but not dbe. d[\^bc]e will find d^e, dbe or dce.
\xNN | A symbol whose hexadecimal ASCII code is NN. For example, A\x31B will find the string A1B (hexadecimal 31 is the ASCII code of 1). You can also use \x{NNNN} to search for characters whose code occupies more than one byte (Unicode).
\t | Tab character.
\n | Newline character.
\r | Carriage return character.
\f | Form feed character.
\w | “Word” character: an alphanumeric symbol or underscore (_). This token is equivalent to [A-Za-z0-9_].
\W | Any symbol except for “word” characters. This token is equivalent to [^A-Za-z0-9_].
\d | Any digit character. This token is equivalent to [0-9].
\D | Any character except for a digit. This token is equivalent to [^0-9].
\s | “Whitespace” character: a space, tab (\t), newline (\n), carriage return (\r) or form feed (\f). This token is equivalent to [ \t\n\r\f].
\S | Any symbol except for “whitespace” characters. This token is equivalent to [^ \t\n\r\f].
\b | Indicates a word boundary, that is, a position between a word character and a non-word character (such as whitespace), or the beginning or end of the text. For example, oo\b matches oo in foo, but not in foot. Similarly, \bfo matches fo in foot, but not in afoot.
\B | Indicates any position in a word except for a boundary. For example, oo\B matches oo in foot, but not in foo.
Note: | You can use the \t, \w, \W, \d, \D, \s, \S, \b and \B expressions within brackets. For example, b[\d-e]b will find b1b, b2b or beb.
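The token behavior described in the table can be checked outside LoadComplete. The following is a minimal sketch that uses Python's `re` module, which is an assumption made for illustration only: it is not the engine LoadComplete uses, and the two may differ in details. The sample strings are taken from the table rows and notes above.

```python
import re

# Greedy vs. non-greedy quantifiers on the sample string from the note.
text = "abbbbbcd"
greedy = re.search(r"b{3,}", text).group(0)         # as many b's as possible
lazy = re.search(r"b{3,}?", text).group(0)          # as few b's as possible
bounded = re.search(r"b{2,4}", text).group(0)       # at most four b's
bounded_lazy = re.search(r"b{2,4}?", text).group(0)
print(greedy, lazy, bounded, bounded_lazy)  # bbbbb bbb bbbb bb

# The dot does not match a newline unless single-line mode is enabled.
dot_default = re.search(r"a.b", "a\nb") is not None
dot_single_line = re.search(r"(?s)a.b", "a\nb") is not None
any_char_class = re.search(r"a[\s\S]b", "a\nb") is not None

# Word boundary: oo\b matches in "foo" but not in "foot".
boundary_foo = re.search(r"oo\b", "foo") is not None
boundary_foot = re.search(r"oo\b", "foot") is not None
```

Before relying on a pattern in a data selector or search query, it is still worth testing it in LoadComplete itself, because this sketch only reflects Python's behavior.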
Sub-Expressions
You can divide an expression into constituent parts, or sub-expressions. To specify a sub-expression, enclose it in parentheses, for instance, (\s\d+\.\d+\.)(\d+). The parsing engine detects two sub-expressions in this expression:
- \s\d+\.\d+\.
- \d+
LoadComplete allows specifying up to 9 sub-expressions inside one regular expression.
The engine also assigns an index to the whole expression and to each sub-expression: the index of the whole expression is 0, the index of the first sub-expression is 1, and so on. That is:
- 0: (\s\d+\.\d+\.)(\d+)
- 1: \s\d+\.\d+\.
- 2: \d+
A text fragment that matches a sub-expression is called a submatch.
To address a submatch, use the ${nn} syntax, where nn stands for the index of the desired sub-expression. This lets you work with individual parts of the matched text.
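The indexing scheme can be illustrated with the sample expression from this section. The sketch below uses Python's `re` module as a stand-in, which is an assumption: LoadComplete addresses submatches with ${nn}, while Python uses `group(n)` and, in replacement strings, `\g<n>`. The input string is hypothetical.

```python
import re

# The sample expression from this section: two sub-expressions.
pattern = re.compile(r"(\s\d+\.\d+\.)(\d+)")

# Hypothetical input text, chosen only for illustration.
source = "Build 12.40.1385 released"
match = pattern.search(source)

whole = match.group(0)   # index 0: the whole match
first = match.group(1)   # index 1: the first sub-expression
second = match.group(2)  # index 2: the second sub-expression
print(repr(whole), repr(first), repr(second))  # ' 12.40.1385' ' 12.40.' '1385'

# Reusing a submatch in a replacement: keep submatch 1, replace submatch 2.
replaced = pattern.sub(r"\g<1>0", source)
print(replaced)  # Build 12.40.0 released
```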
Modifiers
Modifiers specify how the engine interprets regular expressions. They toggle the engine’s behavior modes. The following modifiers are available:
You can specify a modifier in the expression using the (?key) or (?-key) syntax, where key stands for the modifier key and the minus sign disables the corresponding modifier. If you specify an unsupported modifier key, an error occurs. A modifier can be applied to the whole expression or only to a sub-expression. For example:
Pattern | Result
---|---
(?i)Las Vegas | matches Las Vegas and las Vegas
(?i)Las (?-i)Vegas | matches Las Vegas but not Las vegas
(?i)(Las )?Vegas | matches Las Vegas and las Vegas
((?i)Las )?Vegas | matches las Vegas but not las vegas
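The scoping of the case-insensitivity modifier can be reproduced in other engines, with some syntax differences. The sketch below uses Python's `re` module (3.6 or later), which is an assumption for illustration and not LoadComplete's engine. Python does not accept a bare (?-i) in the middle of a pattern, so the scoped forms (?-i:...) and (?i:...) are used as the closest equivalents. The `matches` helper is a hypothetical name introduced here.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Modifier applied to the whole expression.
whole = r"(?i)Las Vegas"

# Python equivalent of (?i)Las (?-i)Vegas: case sensitivity restored for "Vegas".
scoped_off = r"(?i)Las (?-i:Vegas)"

# Python equivalent of ((?i)Las )?Vegas: only "Las " ignores case.
scoped_on = r"(?i:Las )?Vegas"

print(matches(whole, "las Vegas"))       # True
print(matches(scoped_off, "Las vegas"))  # False
print(matches(scoped_on, "las Vegas"))   # True
print(matches(scoped_on, "las vegas"))   # False
```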
See Also
About Data Selectors
About Data Replacers
Create Data Selector Wizard
Finding Text in the Scenario Editor