Regular expressions in JavaScript

Regular expressions are a powerful tool for matching strings. JavaScript provides convenient methods for searching and replacing strings using regular expression patterns, which is especially useful for validating and manipulating user-entered text.

A regular expression is a specific kind of string used to search and manipulate textual content based on patterns. Often referred to as regex or regexp, a regular expression or pattern is an expression that describes a set of strings. Thus, we refer to pattern matching as the process of finding specific sets of strings described by regular expressions.

Regular expressions are written in a formal language interpreted by a regular expression engine. The language provides a concise way to describe sets of strings through a combination of normal characters and metacharacters. Normal characters are treated as literals that have no special meaning and only match themselves. Metacharacters or metasequences, on the other hand, are characters or sequences of characters that are interpreted in a special way and represent things such as quantity, location, types and ranges of characters. If metadata are data about data, then metacharacters are data about characters.
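
As a small illustration of the difference between literals and metacharacters, consider the sketch below. The sample strings and variable names are made up for this example.

```javascript
// In "a.c" the letters a and c are literals; the dot is a metacharacter.
const anyMiddle = /a.c/;
// Escaping the dot with a backslash turns it into a literal dot.
const literalDot = /a\.c/;

const metaResults = [
  anyMiddle.test("abc"),  // true: the dot stands for "b"
  anyMiddle.test("a.c"),  // true: the dot also stands for a real "."
  literalDot.test("abc"), // false: a real "." is required
  literalDot.test("a.c"), // true
];
```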

Most programming languages support regular expressions in one way or another, and in some languages, such as Perl, they are built into the syntax. JavaScript is no exception: it has had built-in support for regular expressions since version 1.2, and it uses a Perl-like syntax.

Regular expressions cheat sheet

Modifiers or flags

In JavaScript, optional flags change how the regular expression engine performs the actual matching. The three classic flags are listed here (newer engines support additional ones, such as s, u and y):

g
Global match — find all matches rather than only the first one
i
Ignore case — match both lower and upper case letters
m
Multi-line match — tells the engine to treat the subject string as multiple lines. ^ and $ then also match next to line breaks such as \n, in addition to the start and end of the entire string
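
The effect of the three flags can be sketched as follows. The subject string and variable names are invented for the illustration.

```javascript
const flagText = "Cat\ncat\nCAT";

// No flags: only the first case-sensitive match is reported.
const noFlags = flagText.match(/cat/);            // "cat" at index 4
// g: every case-sensitive match.
const globalOnly = flagText.match(/cat/g);        // ["cat"]
// g + i: every match, ignoring case.
const globalIgnoreCase = flagText.match(/cat/gi); // ["Cat", "cat", "CAT"]
// Without m, ^ only matches at the very start of the string.
const stringStart = flagText.match(/^cat/gi);     // ["Cat"]
// With m, ^ also matches right after each newline.
const lineStarts = flagText.match(/^cat/gim);     // ["Cat", "cat", "CAT"]
```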

Special characters and character classes

.
Matches any single character except the newline character. Example: .at matches bat, cat, rat and also .at, 1at
Equivalent to [^\x0A\x0D\u2028\u2029]
[...]
The bracket expression specifies a character class and matches any single character contained within the brackets or range of characters. Example: [abc] matches exactly one character: a, b or c.
A range of characters is specified using the -, for example [a-z] matches any lowercase ASCII letter from a to z. Other examples include [A-F] which matches any uppercase ASCII letter from A to F, and [4-7] which matches any digit from 4 to 7. The - character is treated as a literal character if it's listed first, last or escaped: [-] matches -, [a-] matches a or -, [a\-z] matches a, - or z.
Additionally, listed characters can be mixed with ranges of characters. Example: [0-9a-fA-F] matches any hexadecimal digit, that is, any digit and also the letters from a to f irrespective of their case; [02468aeiouy-] matches even digits, vowels and the - character.
Brackets inside bracket expressions are treated as literals if they are escaped. Example: [\[\]] matches [ or ]. The [ doesn't need to be escaped if it's listed first: [[] matches [
[^...]
The negated bracket expression or negated character class matches any single character not contained within the brackets or range of characters. Same as above, except that the ^ negates the expression. Example: [^0-9] matches any character that's not a digit.
The ^ character is only special when it is the first character inside the brackets; anywhere else it is treated as a literal and doesn't need to be escaped. Example: [^] matches any character (including newlines), [^^] matches anything except the ^ character.
\w
Word character
Equivalent to [A-Za-z0-9_]
\W
Non-word character
Equivalent to [^A-Za-z0-9_]
\d
Digit character
Equivalent to [0-9]
\D
Non-digit character
Equivalent to [^0-9]
\s
Whitespace character
Equivalent to [ \f\n\r\t\v\u00A0\u2028\u2029] plus the remaining Unicode space characters such as \uFEFF (the class includes the ordinary space; \u00A0 means "no-break space", \u2028 means "line separator", \u2029 means "paragraph separator")
\S
Non-whitespace character
Equivalent to [^ \f\n\r\t\v\u00A0\u2028\u2029], also excluding the remaining Unicode space characters
\b
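Backspace (\x08), but only inside a bracket expression, as in [\b]. Outside brackets \b is the word boundary anchor described under anchor points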
\f
Form-feed (\x0C)
\n
Linefeed or newline (\x0A)
\r
Carriage return (\x0D)
\t
Tab (\x09)
\v
Vertical tab (\x0B)
\0
Null character (\x00)
\xhh
Character with hexadecimal code hh.
\uhhhh
Character with hexadecimal code hhhh.
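
A few of the character classes above can be tried out as in the following sketch. The sample strings and variable names are assumptions made for the illustration.

```javascript
// . matches any single character except line terminators.
const dotMatches = "bat cat .at 1at\nat".match(/.at/g);
// -> ["bat", "cat", ".at", "1at"] (the "\nat" at the end is skipped)

// A bracket expression mixing ranges: runs of hexadecimal digits.
const hexRuns = "c0ffee-XYZ-42".match(/[0-9a-fA-F]+/g); // ["c0ffee", "42"]

// A negated class: remove everything that is not a digit.
const digitsOnly = "phone: 555-1234".replace(/[^0-9]/g, ""); // "5551234"

// Shorthand classes.
const wordRuns = "id_42 = 7;".match(/\w+/g);  // ["id_42", "7"]
const digitRuns = "id_42 = 7;".match(/\d+/g); // ["42", "7"]
const fields = "a b\tc\nd".split(/\s/);       // ["a", "b", "c", "d"]

// \b means backspace only inside brackets.
const backspaceInClass = /[\b]/.test("\b"); // true
const boundaryOnly = /\b/.test("\b");       // false: no word character nearby
```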

Quantifiers

Repetition is specified by quantifiers:

?
Match 0 or 1 times. Example: ab? matches a and ab
*
Match 0 or more times. Example: ab* matches a, ab, abb, abbb etc.
+
Match 1 or more times. Example: ab+ matches ab, abb, abbb etc.
{n}
Match exactly n times. Example: ab{2} matches abb
{n,}
Match n or more times. Example: ab{2,} matches abb, abbb, abbbb etc.
{n,m}
Match at least n times, but no more than m times. Example: ab{2,3} matches abb and abbb
??
Match 0 or 1 times, but as few times as possible. Example: ab?? against abbbbb matches a
*?
Match 0 or more times, but as few times as possible. Example: ab*? against abbbbb matches a
+?
Match 1 or more times, but as few times as possible. Example: ab+? against abbbbb matches ab
{n,}?
Match n or more times, but as few times as possible. Example: ab{2,}? against abbbbb matches abb
{n,m}?
Match at least n times, no more than m times, but as few times as possible. Example: ab{2,3}? against abbbbb matches abb
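
The difference between greedy and lazy quantifiers is easiest to see side by side. The sketch below runs several of the patterns against the same subject string; firstMatch is a hypothetical helper written for this example, not a built-in.

```javascript
const quantSubject = "abbbbb";

// Hypothetical helper: text of the first match, or null if there is none.
function firstMatch(pattern) {
  const found = quantSubject.match(pattern);
  return found === null ? null : found[0];
}

const greedy = {
  optional: firstMatch(/ab?/),      // "ab"
  star: firstMatch(/ab*/),          // "abbbbb"
  plus: firstMatch(/ab+/),          // "abbbbb"
  exact: firstMatch(/ab{2}/),       // "abb"
  bounded: firstMatch(/ab{2,3}/),   // "abbb"
};

const lazy = {
  optional: firstMatch(/ab??/),     // "a"
  star: firstMatch(/ab*?/),         // "a"
  plus: firstMatch(/ab+?/),         // "ab"
  atLeast: firstMatch(/ab{2,}?/),   // "abb"
  bounded: firstMatch(/ab{2,3}?/),  // "abb"
};
```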

Subpatterns and alternative patterns

(...)
Capturing group - group subpattern and capture the match. Example: (foo)bar matches foobar and captures foo
(?:...)
Non-capturing group - group subpattern, but don't capture the match. Example: (?:foo)bar matches foobar and doesn't capture anything
...|...
Alternation operator - matches one of the alternative subpatterns. Example: foo|bar|baz matches either foo, bar or baz
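
A minimal sketch of how grouping and alternation show up in match results (variable names are made up for the example):

```javascript
// A capturing group adds the captured text to the result array.
const captured = /(foo)bar/.exec("foobar");
// captured[0] is "foobar", captured[1] is "foo"

// A non-capturing group only groups; nothing extra is stored.
const notCaptured = /(?:foo)bar/.exec("foobar");
// notCaptured[0] is "foobar", and the array has length 1

// Alternation matches any one of the listed subpatterns.
const alternatives = "foo bar baz qux".match(/foo|bar|baz/g);
// -> ["foo", "bar", "baz"]
```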

Anchor points

Anchors match positions in the subject string:

^
Start of search string, or after any newline in multi-line mode m. Example: ^The matches The at the beginning of the string, and at the beginning of every line in multi-line mode
$
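End of search string, or before any newline in multi-line mode m. Unlike in Perl, $ does not match before a trailing newline unless multi-line mode is on. Example: !$ matches ! at the end of the string, and at the end of every line in multi-line mode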
\b
Word boundary — position before or after a word. Example: \bipsum matches ipsum against lorem ipsum but doesn't match anything against lipsum. Word boundaries need not be spaces. For example: \w+\b matches both yeah and whatever against yeah, whatever!
\B
Non-word or not-word boundary — position between word boundaries (i.e. inside a word). Example: \Bipsum matches ipsum against lipsum but doesn't match anything against lorem ipsum
(?=...)
Positive lookahead — matches something followed by something else. Example: ab(?=c) matches ab against abc but doesn't match anything against ab or aba
(?!...)
Negative lookahead — matches something not followed by something else. Example: ab(?!c) matches ab against ab or aba but doesn't match anything against abc

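Note: lookbehind, written (?<=...) and (?<!...), was not supported in JavaScript when this cheat sheet was written. It was added in ECMAScript 2018 and is available in current engines.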
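
The anchors above can be checked with a short sketch such as the following. The subject strings and variable names are invented for the illustration.

```javascript
const anchorText = "The end!\nThe sequel!";

// ^ and $ with and without multi-line mode.
const startsSingle = anchorText.match(/^The/g); // ["The"]
const startsMulti = anchorText.match(/^The/gm); // ["The", "The"]
const endsSingle = anchorText.match(/!$/g);     // ["!"]
const endsMulti = anchorText.match(/!$/gm);     // ["!", "!"]

// Word boundaries need not be spaces.
const wordsFound = "yeah, whatever!".match(/\w+\b/g); // ["yeah", "whatever"]
const boundaryChecks = [
  /\bipsum/.test("lorem ipsum"), // true
  /\bipsum/.test("lipsum"),      // false
  /\Bipsum/.test("lipsum"),      // true
  /\Bipsum/.test("lorem ipsum"), // false
];

// Lookaheads check what follows without including it in the match.
const lookaheadMatch = "abc".match(/ab(?=c)/)[0]; // "ab"
const lookaheadChecks = [
  /ab(?=c)/.test("aba"), // false
  /ab(?!c)/.test("aba"), // true
  /ab(?!c)/.test("abc"), // false
];
```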

Regular expression objects and methods

JavaScript provides two objects for dealing with regular expressions:

The RegExp object

RegExp is a global object in JavaScript used to create regular expression objects. A RegExp object can be created with the constructor:

new RegExp(pattern [, flags])

or as a literal:

/pattern/flags

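The advantage of the constructor function is that the pattern can be built dynamically at run time from a string. Because the pattern is then an ordinary string, every backslash in it has to be doubled: "\\d" produces the pattern \d.

Both forms produce the same kind of object and are handled the same way, no matter how you define them.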
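
The two ways of defining a pattern can be compared with a sketch like the one below. The wordPattern helper is hypothetical, and it assumes the word passed in contains no characters that are special in a pattern.

```javascript
// The same pattern written as a literal and through the constructor.
const literalDigits = /\d+/g;
const constructedDigits = new RegExp("\\d+", "g"); // backslash doubled in the string

const sameSource = literalDigits.source === constructedDigits.source; // true
const sameGlobal = literalDigits.global === constructedDigits.global; // true

// Hypothetical helper: build a whole-word, case-insensitive pattern at run time.
// Assumes `word` holds only plain letters (no metacharacters to escape).
function wordPattern(word) {
  return new RegExp("\\b" + word + "\\b", "i");
}

const hasCat = wordPattern("cat").test("A Cat sat.");        // true
const catInsideWord = wordPattern("cat").test("concatenate"); // false
```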

Parameters

pattern
Specifies the text of the regular expression
flags
Flags are optional and they specify how the regular expression should behave

Methods

RegExp objects provide two methods for working with regular expressions:

test(text)
The test(text) method tests for a match in the input string. It searches the string for the specified pattern and returns true if the pattern matches the string or false otherwise.
exec(text)
The exec(text) method executes the specified pattern on the input string and returns an array of matched strings if it succeeds or null if it fails. The first element of the array contains the text matched by the entire pattern while the other elements correspond to text that matched captured subpatterns.
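
A minimal sketch of both methods follows. The date pattern and the sample strings are assumptions made for this example.

```javascript
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const dateText = "Released on 2013-05-17.";

// test() only answers whether the pattern matches somewhere in the string.
const isDate = datePattern.test(dateText); // true

// exec() returns the whole match followed by the captured subpatterns.
const dateParts = datePattern.exec(dateText);
// dateParts[0] is "2013-05-17", dateParts[1] is "2013",
// dateParts[2] is "05", dateParts[3] is "17", dateParts.index is 12

// exec() returns null when nothing matches.
const noParts = datePattern.exec("no date here"); // null

// With the g flag, repeated exec() calls continue after the previous match.
const digitRun = /\d+/g;
const runs = [];
let run;
while ((run = digitRun.exec("10, 20, 30")) !== null) {
  runs.push(run[0]);
}
// runs is ["10", "20", "30"]
```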

The String object

The String global object may also be used to search and manipulate strings in JavaScript. It offers four methods for matching and manipulating strings.

Methods

str.match(pattern)
Matches the pattern against the input string. With the g (global search) flag it returns an array containing all matches. Without the g flag it returns an array holding only the first match, followed by any captured subpatterns. If there are no matches it returns null.
str.search(pattern)
Searches the input string for the given pattern and returns the index of the start of the match. If no match is found, it returns -1.
str.replace(pattern, replacement)
Performs a search and replace operation on the input string. It replaces the first match, or every match if the pattern has the g flag, with the replacement string and returns the new string. The subject string itself is never modified, and the result equals it if there are no matches.
str.split(pattern [, limit])
Splits the input string by a regular expression and returns an array. The optional limit caps the number of items in the returned array.
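
The four String methods can be sketched together as follows. The sample sentence and variable names are made up for the illustration.

```javascript
const sentence = "one two  three";

// match(): first match without g, all matches with g, null if none.
const firstWord = sentence.match(/\w+/);  // firstWord[0] is "one"
const allWords = sentence.match(/\w+/g);  // ["one", "two", "three"]
const noNumbers = sentence.match(/\d+/);  // null

// search(): index of the first match, or -1.
const twoAt = sentence.search(/two/);     // 4
const fourAt = sentence.search(/four/);   // -1

// replace(): first match only, unless the pattern has the g flag.
const replacedFirst = sentence.replace(/o/, "0"); // "0ne two  three"
const replacedAll = sentence.replace(/o/g, "0");  // "0ne tw0  three"

// split(): the pattern is the separator; limit caps the result length.
const pieces = sentence.split(/\s+/);        // ["one", "two", "three"]
const firstTwo = sentence.split(/\s+/, 2);   // ["one", "two"]
```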

Copyright © 2010-2013 Dive Into JavaScript