Regex Generator

Build regular expressions from common tokens, anchors, and quantifiers, then test the pattern instantly against sample text. You can also edit the generated regex directly before copying or downloading it.


Regex Generator — Build and Test Regular Expressions Online

Regular expressions sit somewhere between a superpower and a puzzle. When you know them well, you can validate an email address, extract all URLs from a document, parse every timestamp from a log file, or sanitize user input, all in a single concise pattern. When you don't, even a simple pattern can produce unexpected matches, miss obvious cases, or trigger catastrophic backtracking, a performance failure that can take down a production server. The gap between "I know regex exists" and "I can write regex confidently" is mostly a matter of understanding the building blocks and having a way to test your patterns against real input before deploying them. This regex generator lets you construct patterns from named components, apply them to test text, and see matches highlighted in real time, so you know exactly what your expression does before it goes anywhere near your code.

The live testing runs in your browser using JavaScript's RegExp engine. Patterns you build here are directly compatible with JavaScript, TypeScript, and modern browser environments — and very similar to the regex syntax used in PHP (PCRE), Python, Java, Ruby, and most other languages, with some minor differences noted below.

The Core Building Blocks Every Regex Uses

Regular expressions are composed from a small set of concepts. Understanding each one clearly makes the rest fall into place:

Character classes: Define what characters can appear at a position. \d matches any digit (0–9). \w matches word characters (letters, digits, underscore). \s matches whitespace (space, tab, newline). [a-z] matches any lowercase letter. [^abc] matches any character that is not a, b, or c. The dot . matches any character except a newline (unless the s flag is set).

Quantifiers: Define how many times to match the preceding element. * means zero or more. + means one or more. ? means zero or one (making the element optional). {n} means exactly n times. {n,m} means between n and m times. By default, quantifiers are greedy — they match as much as possible. Adding ? after a quantifier (*?, +?) makes it lazy — matching as little as possible.

Anchors: Assert position rather than matching characters. ^ asserts the start of the input string. $ asserts the end. \b asserts a word boundary (the transition between a word character and a non-word character). With the m (multiline) flag, ^ and $ match the start and end of each line rather than the whole string.

Groups: Parentheses (...) create a capture group that both groups part of the pattern and captures the matched text for later use. Non-capturing groups (?:...) group without capturing. Named groups (?<name>...) let you reference captured content by name instead of index.

Alternation: The pipe character | acts as OR between alternatives. cat|dog matches either "cat" or "dog". When combined with groups, (cat|dog)s? matches "cat", "cats", "dog", or "dogs".

Flags: Modify how the entire pattern behaves. i makes matching case-insensitive. g finds all matches in the string rather than stopping after the first. m makes ^ and $ match line boundaries. s makes . match newlines. A missing g or m flag is one of the most common reasons a pattern that looks correct doesn't behave as expected.
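
The building blocks above can be tried together in a few lines. This is a minimal sketch that assumes Node.js or a modern browser; the sample strings and variable names are purely illustrative.

```javascript
// Character class + quantifier: \d+ matches each run of one or more digits.
const digits = 'Order 66 shipped 3 items'.match(/\d+/g);

// Anchor + m flag: ^ matches at the start of every line.
const firstWords = 'alpha one\nbeta two'.match(/^\w+/gm);

// Same pattern without m: ^ only matches at the start of the whole string.
const firstWordOnly = 'alpha one\nbeta two'.match(/^\w+/g);

// Alternation inside a non-capturing group, with an optional trailing "s".
const pets = 'cats and a dog'.match(/\b(?:cat|dog)s?\b/g);

// i flag: case-insensitive matching.
const greeted = /hello/i.test('HELLO');

console.log(digits, firstWords, firstWordOnly, pets, greeted);
```

Removing the g flag from any of the match calls returns only the first match, which is a quick way to see what that flag contributes.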

Greedy vs. Lazy Matching — and Why It Matters

The greedy-vs-lazy distinction is one of the most important regex concepts to understand, because greedy matching produces confusing results until you internalize the rule.

By default, quantifiers are greedy: they match as many characters as possible while still allowing the overall pattern to succeed. Consider the pattern <.+> applied to the string <b>hello</b>. You might expect it to match <b>, but a greedy .+ will expand as far as possible — matching <b>hello</b> as a single match because the overall pattern still succeeds with the last > at the end of the string.

Making the quantifier lazy with <.+?> tells it to match the shortest possible string that still satisfies the pattern. Now it matches <b> and then </b> separately. For extracting content between delimiters — HTML tags, brackets, quotation marks — lazy quantifiers are almost always what you want.
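
The difference is easy to confirm in JavaScript. The sketch below runs both patterns against the same string used in the explanation above; the variable names are illustrative.

```javascript
const html = '<b>hello</b>';

// Greedy: .+ expands to the last ">" in the string, giving one long match.
const greedy = html.match(/<.+>/g);

// Lazy: .+? stops at the first ">" it can, giving each tag separately.
const lazy = html.match(/<.+?>/g);

console.log(greedy); // one element: the whole string
console.log(lazy);   // two elements: the opening and closing tags
```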

Common Regex Patterns Used in Real Development

Email validation: /^[\w.+-]+@(?:[\w-]+\.)+[a-zA-Z]{2,}$/ matches a basic email format, including multi-label domains such as mail.example.co.uk. Note that truly RFC-compliant email validation via regex is extremely complex; for most purposes a simpler pattern that catches obvious errors is preferable, with actual delivery confirmation handled on the server side.
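
As a rough illustration, a format check of this kind might be wrapped in a small helper. The pattern below has the same basic shape and is written so that dotted domains such as mail.example.co.uk are accepted. The helper name isPlausibleEmail is hypothetical, and the check only screens out obvious mistakes.

```javascript
// Assumption: a deliberately simple shape check, not RFC-compliant validation.
const EMAIL_SHAPE = /^[\w.+-]+@(?:[\w-]+\.)+[a-zA-Z]{2,}$/;

// Hypothetical helper: true when the input looks like local@domain.tld.
function isPlausibleEmail(input) {
  return EMAIL_SHAPE.test(input.trim());
}

console.log(isPlausibleEmail('ada@example.com'));
console.log(isPlausibleEmail('ada@example'));
```

A true result only means the address is well-formed enough to attempt delivery; whether the mailbox exists still has to be confirmed elsewhere.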

URL matching: /https?:\/\/[\w\-.]+(\/[\w\-./?%&=]*)?/g — extracts http and https URLs from text. Useful for link extraction and URL sanitization.
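
A short sketch of link extraction with this pattern follows. Because the host part allows dots, a sentence-ending period gets swept into the match, so the example strips trailing punctuation afterwards. That cleanup step is an addition for illustration, not part of the pattern itself.

```javascript
const URL_PATTERN = /https?:\/\/[\w\-.]+(\/[\w\-./?%&=]*)?/g;

const text =
  'Docs at https://example.com/guide?page=2 and http://test.example.org.';

// With the g flag, match() returns every full match (or null if none).
const rawUrls = text.match(URL_PATTERN) || [];

// Assumed cleanup: drop periods or commas left over from the sentence.
const urls = rawUrls.map((url) => url.replace(/[.,]+$/, ''));

console.log(urls);
```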

Date format (YYYY-MM-DD): /\b\d{4}-\d{2}-\d{2}\b/g matches text shaped like an ISO 8601 calendar date. It checks digits only, so 2024-13-45 also matches; combine it with range checking in your application code to validate actual date values.
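
One way to pair the pattern with range checking is sketched below. The function name findValidDates is hypothetical. It adds capture groups to the pattern and round-trips each candidate through JavaScript's Date to reject impossible values.

```javascript
const DATE_SHAPE = /\b(\d{4})-(\d{2})-(\d{2})\b/g;

// Hypothetical helper: returns only the matches that are real calendar dates.
function findValidDates(text) {
  const valid = [];
  for (const [whole, y, m, d] of text.matchAll(DATE_SHAPE)) {
    const year = Number(y);
    const month = Number(m);
    const day = Number(d);
    // Date.UTC silently rolls invalid values over (Feb 30 becomes Mar 1),
    // so a date is valid only if it survives the round trip unchanged.
    // Note: years 0000-0099 are rejected by this sketch, because Date.UTC
    // remaps them to 1900-1999.
    const date = new Date(Date.UTC(year, month - 1, day));
    if (
      date.getUTCFullYear() === year &&
      date.getUTCMonth() === month - 1 &&
      date.getUTCDate() === day
    ) {
      valid.push(whole);
    }
  }
  return valid;
}

console.log(findValidDates('2024-01-15, 2024-02-30, 2024-13-01, 2024-02-29'));
```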

Log parsing: Server and application logs are some of the most practical regex use cases. An Apache combined log line has a predictable format: IP address, timestamp, request line, status code, response size, referrer, user agent. A well-built regex can extract all of these fields from every line in a log file, turning unstructured text into structured data for analysis.
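
The sketch below shows what such a pattern might look like for one combined-format line, using named groups so each field gets a readable key. The group names and the parseLogLine helper are illustrative choices. It assumes fields contain no escaped quotes, which real logs can violate.

```javascript
// Assumed field layout: ip ident user [time] "request" status size "referrer" "agent"
const COMBINED_LOG =
  /^(?<ip>\S+) (?<ident>\S+) (?<user>\S+) \[(?<time>[^\]]+)\] "(?<request>[^"]*)" (?<status>\d{3}) (?<size>\d+|-) "(?<referrer>[^"]*)" "(?<agent>[^"]*)"$/;

// Hypothetical helper: returns an object of fields, or null if the line does not fit.
function parseLogLine(line) {
  const match = COMBINED_LOG.exec(line);
  return match ? { ...match.groups } : null;
}

const sample =
  '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';

console.log(parseLogLine(sample));
```

Returning null for lines that do not fit makes it easy to count and inspect the rejects instead of silently dropping them.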

Input sanitization: Stripping non-alphanumeric characters from user input, removing consecutive whitespace, extracting phone number digits regardless of formatting — all of these are regex substitution operations that are more concise and readable than equivalent character-by-character loops.
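
Each of the three substitutions mentioned above fits in a single replace call. The inputs here are made up for illustration.

```javascript
// Strip everything that is not an ASCII letter or digit.
const alnumOnly = 'Hello, World! #42'.replace(/[^a-zA-Z0-9]/g, '');

// Collapse any run of whitespace (spaces, tabs, newlines) into one space.
const collapsed = 'too    many \t\n spaces'.replace(/\s+/g, ' ');

// Keep only the digits of a phone number, whatever the formatting.
const phoneDigits = '(555) 123-4567'.replace(/\D/g, '');

console.log(alnumOnly, collapsed, phoneDigits);
```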

Regex Differences Across Languages

The core syntax (character classes, quantifiers, anchors, groups) is consistent across most languages. The differences are at the edges. Python's re module takes flags as function arguments (re.IGNORECASE, re.MULTILINE) or inline as (?i), and writes named groups as (?P<name>...). PHP's preg_ functions expect the delimiters and flags inside the pattern string itself, as in '/pattern/i'. Java has no regex literal, so backslashes must be doubled inside string literals ("\\d+"). Lookbehind support varies: fixed-length lookbehinds work in most engines, while variable-length lookbehinds are supported in modern JavaScript (ES2018+) and .NET but not in Python's built-in re module, which requires fixed-width lookbehind. Test in this tool for the structure, then verify in your target language's environment for the edge cases specific to that engine.
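
Two of these edge cases can be seen from the JavaScript side. The sketch assumes an ES2018+ engine; the strings are illustrative. The first pattern uses a lookbehind whose length varies, and the second shows that building a pattern from a string requires doubled backslashes, much as Java string literals do.

```javascript
// Variable-length lookbehind: \s* inside (?<=...) may match any number of spaces.
const prices = 'price: 42, price:    7'.match(/(?<=price:\s*)\d+/g);

// A regex literal and a string-built RegExp describe the same pattern,
// but the string form needs "\\d" to produce a single backslash.
const fromLiteral = /\d+/;
const fromString = new RegExp('\\d+');
const samePattern = fromLiteral.source === fromString.source;

console.log(prices, samePattern);
```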

Frequently Asked Questions About Regular Expressions

The live test runs JavaScript's built-in RegExp engine in your browser. JavaScript regex follows the ECMAScript specification and is very similar to the regex used in most modern languages, but it differs in places from Python's re module, PHP's PCRE, and Java's regex. Notably, lookbehind and named capture groups were only added in ES2018, so older JavaScript environments lack them, and full Unicode matching (code points, \p{...} property escapes) requires the u flag. Test your regex in the target language's environment for final verification.
By default, quantifiers like * and + are greedy — they match as much as possible. So <.+> applied to <b>text</b> matches the entire string from the first < to the last >, not just <b>. Adding ? after the quantifier makes it lazy: <.+?> matches the shortest possible string, giving you <b> then </b> separately. This distinction is a common source of bugs when extracting content between tags or delimiters.
When a pattern that looks correct doesn't match as expected, the most common causes are: missing the g flag (without it, only the first match is returned), anchors (^ and $) matching the whole string when you expected line-by-line matching (add the m flag for multiline), unescaped special characters in the pattern (a literal . must be written as \.), and case sensitivity (add the i flag if case shouldn't matter). Check each of these systematically before assuming the pattern logic is wrong.
Catastrophic backtracking happens when a regex engine tries exponentially many combinations because the pattern has nested quantifiers on overlapping character classes, like (a+)+$ applied to a string like aaaaaab. Because the trailing b prevents a match, the engine tries every possible way to split the run of a's across the group's repetitions before giving up; that number grows exponentially with input length, causing the regex to hang. Avoid patterns with nested quantifiers on similar character sets. Prefer atomic groups or possessive quantifiers if your regex engine supports them (JavaScript's RegExp does not), and always test against inputs designed to trigger backtracking.
Wrapping part of your pattern in parentheses creates a capture group. When the pattern matches, the content inside the group is available separately from the full match. For example, /(\d{4})-(\d{2})-(\d{2})/ applied to 2024-01-15 gives you the full match 2024-01-15 plus three captured groups: 2024, 01, and 15. In JavaScript, these are accessed as match[1], match[2], match[3]. Named groups ((?<year>\d{4})) let you access captures by name instead of index.
Regex can work on HTML or XML in simple, controlled cases, such as extracting a known attribute from a known tag in generated output. But as a general rule, don't use regex to parse arbitrary HTML or XML. These formats allow arbitrarily deep nesting, which a regular expression cannot track in the general case. Nested tags, optional closing tags (in HTML), attributes in any order, and varying whitespace all break naive regex parsers. Use a proper parser (DOMParser in JavaScript, BeautifulSoup in Python, DOMDocument in PHP) for reliable HTML/XML parsing.
To use a generated pattern in code: in JavaScript, write const regex = /your-pattern/flags; or new RegExp('your-pattern', 'flags'), then call string.match(regex), string.replace(regex, replacement), or regex.test(string). In PHP, use preg_match('/your-pattern/flags', $string, $matches) or preg_replace(). In Python, import re and call re.search(r'your-pattern', string) or re.findall(); note that re.match() only matches at the start of the string. The core pattern syntax is similar across languages but flags and some features differ, so always verify in the target environment.
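
Several of the answers above (capture groups, named groups, the g flag, and the JavaScript calls) can be seen together in one short sketch. It assumes an ES2018+ engine for the named groups; the pattern and strings are illustrative.

```javascript
const DATE = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// match(): index 0 is the full match, 1..n are the capture groups,
// and .groups holds the named captures.
const found = 'Released 2024-01-15'.match(DATE);
const fullMatch = found[0];
const yearByIndex = found[1];
const monthByName = found.groups.month;

// test(): a plain yes/no answer.
const hasDate = DATE.test('no date here');

// replace(): $1..$3 refer back to the capture groups.
const reordered = '2024-01-15'.replace(DATE, '$3/$2/$1');

// Without g only the first match comes back; with g, all of them do.
const firstNumber = 'a1 b22 c333'.match(/\d+/)[0];
const allNumbers = 'a1 b22 c333'.match(new RegExp('\\d+', 'g'));

console.log(fullMatch, yearByIndex, monthByName, hasDate, reordered);
console.log(firstNumber, allNumbers);
```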