Regular Expression Syntax
All characters are taken literally except the following:
".", "|", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^", "$" and "\".
These characters have special meaning and must be preceded by a "\" to be taken
literally.
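The escaping rule above can be sketched as follows. This is only an illustration using Python's standard re module as a stand-in (an assumption: the editor's own engine is not used here, and the two dialects differ in places); the sample strings are hypothetical.

```python
import re

text = "price: 3.50 (approx.)"

# Unescaped, "." is a wildcard, so the pattern also accepts "3x50".
wildcard_hit = re.search(r"3.50", "3x50")

# Preceded by "\", the dot and the parentheses are taken literally.
literal_hit = re.search(r"3\.50 \(approx\.\)", text)
literal_miss = re.search(r"3\.50", "3x50")

print(wildcard_hit is not None)  # True
print(literal_hit.group(0))      # 3.50 (approx.)
print(literal_miss)              # None
```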
The dot "." matches any single character, including the newline characters [CR] and [LF].
An expression followed by "*" can be repeated any number of times including zero.
An expression followed by "+" can be repeated one or more times.
An expression followed by "?" is optional: it can occur zero times or once.
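A minimal sketch of the three repetition operators above, again assuming Python's re module as a stand-in for the editor's engine. The helper name matches() is hypothetical and exists only for this example.

```python
import re

def matches(pattern, text):
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# "*": the preceding "b" may occur any number of times, including zero.
star = [matches(r"ab*", s) for s in ("a", "ab", "abbb")]

# "+": the preceding "b" must occur at least once.
plus = [matches(r"ab+", s) for s in ("a", "ab", "abbb")]

# "?": the preceding "b" may occur at most once.
optional = [matches(r"ab?", s) for s in ("a", "ab", "abb")]

print(star)      # [True, True, True]
print(plus)      # [False, True, True]
print(optional)  # [True, True, False]
```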
The bounds "{" "}" may be used to specify the number of repetitions: "{N}" means that
the expression must be repeated exactly N times, and
"{N,M}" means that the expression must be repeated from N to M times.
Parentheses "(" ")" are used to mark subexpressions, which are counted starting
from 1 from left to right.
Subexpression zero is the whole match of the expression.
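The numbering of subexpressions can be illustrated with a short sketch. It assumes Python's re module, where subexpressions are called groups and are numbered the same way; the sample text is hypothetical.

```python
import re

# Two parenthesised subexpressions, numbered 1 and 2 from left to right.
match = re.search(r"([0-9]+)-([0-9]+)", "pages 12-34")

whole = match.group(0)   # subexpression zero: the whole match
first = match.group(1)   # text matched by the first "(" ... ")"
second = match.group(2)  # text matched by the second "(" ... ")"

print(whole, first, second)  # 12-34 12 34
```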
Alternative expressions are separated by "|" or put on separate lines in the expression.
The empty string at the beginning of a line is matched by the "^" character.
The empty string at the end of a line is matched by the "$" character.
"\`" matches the start of the whole text.
"\A" matches the start of the whole text.
"\'" matches the end of the whole text.
"\z" matches the end of the whole text.
"\Z" matches the end of the whole text, or any newline characters at the end.
|
|
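The difference between the line anchors and the whole-text anchors described above can be sketched as follows. The sketch assumes Python's re module, which needs the re.MULTILINE flag before "^" and "$" match at every line; it does not implement "\`" or "\'", and its "\Z" matches only at the very end of the text, so only "^", "$" and "\A" are shown.

```python
import re

text = "one\ntwo\nsix"

# "^" and "$" match at the beginning and end of every line.
line_starts = re.findall(r"^[a-z]", text, flags=re.MULTILINE)
line_ends = re.findall(r"[a-z]$", text, flags=re.MULTILINE)

# "\A" matches only at the start of the whole text.
text_start = re.findall(r"\A[a-z]", text, flags=re.MULTILINE)

print(line_starts)  # ['o', 't', 's']
print(line_ends)    # ['e', 'o', 'x']
print(text_start)   # ['o']
```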
The character set enclosed in brackets "[" "]" matches any symbol it contains, for
example "[abc]" matches either "a", "b" or "c".
Sets that start with "^" match any character that is not a member of the set, for
example "[^abc]" matches any character except "a", "b" and "c".
Character ranges can be specified as "[a-d]", which matches any symbol between "a"
and "d", inclusive.
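The three forms of character set above can be compared side by side. This is a sketch assuming Python's re module, which treats plain sets, negated sets and ranges in the same way; the sample text is hypothetical.

```python
import re

text = "abcdef"

in_set = re.findall(r"[abc]", text)       # any of "a", "b", "c"
not_in_set = re.findall(r"[^abc]", text)  # anything except "a", "b", "c"
in_range = re.findall(r"[a-d]", text)     # "a" through "d", inclusive

print(in_set)      # ['a', 'b', 'c']
print(not_in_set)  # ['d', 'e', 'f']
print(in_range)    # ['a', 'b', 'c', 'd']
```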
Character classes are denoted by "[:class:]" within a set declaration.
Commonly used character classes are:
[:alnum:] Alphanumeric character.
[:alpha:] Alphabetical character a-z and A-Z.
[:blank:] Blank character, either a space or a tab.
[:cntrl:] Control character.
[:digit:] Digit 0-9.
[:graph:] Graphical character.
[:lower:] Lower case character a-z.
[:print:] Printable character.
[:punct:] Punctuation character.
[:space:] Whitespace character.
[:upper:] Upper case character A-Z.
[:xdigit:] Hexadecimal digit character, 0-9, a-f and A-F.
[:word:] Word character - all alphanumeric characters plus the underscore.
[:Unicode:] Character whose code is greater than 255; this applies to Unicode
characters only.
|
|
Characters may be matched by octal code "\0NNN" or hexadecimal code "\xHH",
enclosed in braces "{" "}" if necessary: "\0{NNN}" "\x{HH}".
"\<" matches the null string at the start of a word.
"\>" matches the null string at the end of the word.
"\b" matches the null string at either the start or the end of a word.
"\B" matches a null string within a word.
The beginning of the text is a potential start of the word and the end of the text
is a potential end of the word.
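A minimal sketch of the word-boundary assertions, assuming Python's re module. Python does not implement "\<" and "\>", so only "\b" and "\B" are shown; the sample text is hypothetical.

```python
import re

text = "cat concat cathode"

# "\b" on both sides: "cat" only as a complete word.
whole_word = re.findall(r"\bcat\b", text)

# "\b" in front: positions where a word starts with "cat".
word_start = [m.start() for m in re.finditer(r"\bcat", text)]

# "\B" in front: positions where "cat" continues an existing word.
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]

print(whole_word)   # ['cat']
print(word_start)   # [0, 11]
print(inside_word)  # [7]
```

The match at position 0 illustrates that the beginning of the text counts as a potential start of a word.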
Subexpressions can be referenced later in the same expression: the labels "\1" to "\9"
match the text already matched by the corresponding subexpression.
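A typical use of such a label is finding a word that is repeated by mistake. The sketch below assumes Python's re module, where "\1" has the same meaning; the sample sentence is hypothetical.

```python
import re

text = "this is is a test of of the parser"

# "\1" must match exactly the text that subexpression 1 matched.
pattern = r"\b([a-z]+) \1\b"

doubled = re.findall(pattern, text)
collapsed = re.sub(pattern, r"\1", text)

print(doubled)    # ['is', 'of']
print(collapsed)  # this is a test of the parser
```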
|
|
\w Equivalent to [[:word:]].
\W Equivalent to [^[:word:]].
\s Equivalent to [[:space:]].
\S Equivalent to [^[:space:]].
\d Equivalent to [[:digit:]].
\D Equivalent to [^[:digit:]].
\l Equivalent to [[:lower:]].
\L Equivalent to [^[:lower:]].
\u Equivalent to [[:upper:]].
\U Equivalent to [^[:upper:]].
\C Any single character, equivalent to ".".
\X Match any Unicode combining character sequence, for example "a\x{0301}" (the letter
a with an acute accent).
\Q The begin quote operator: everything that follows is treated as literal characters
until a \E end quote operator is found.
\E The end quote operator, terminates a sequence started with \Q.
\a Bell character 0x07.
\f Form feed character 0x0C.
\n Newline character 0x0A.
\r Carriage return character 0x0D.
\t Tab character 0x09.
\v Vertical tab character 0x0B.
\e ASCII Escape character 0x1B.
\0dd An octal character code, where dd is one or more octal digits.
\xXX A hexadecimal character code, where XX is one or more hexadecimal digits.
\x{XX} A hexadecimal character code, where XX is one or more hexadecimal digits,
optionally a Unicode character.
\cZ An ASCII escape sequence control-Z, where Z is any ASCII character greater than
or equal to the character code for '@'.
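Some of the escapes listed above can be tried out as follows. This sketch assumes Python's re module, which shares "\d", "\w", "\s", "\t" and "\xHH" with the list but does not implement "\l", "\u", "\C", "\X" or "\Q" ... "\E"; the sample text is hypothetical.

```python
import re

text = "ID\t42: ok_1\n"

digits = re.findall(r"\d+", text)  # runs of digit characters
words = re.findall(r"\w+", text)   # runs of word characters (includes "_")
spaces = re.findall(r"\s", text)   # single whitespace characters

# "\x49\x44" spells "ID" by hexadecimal character code.
by_hex_code = re.search(r"\x49\x44", text).group(0)

# "\t" matches the tab character 0x09.
tab_position = re.search(r"\t", text).start()

print(digits)        # ['42', '1']
print(words)         # ['ID', '42', 'ok_1']
print(spaces)        # ['\t', ' ', '\n']
print(by_hex_code)   # ID
print(tab_position)  # 2
```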
Current version: 2.3.5, updated on Fri 07/10/2009