Python Programming for Beginners: A Comprehensive Guide

RegEx in Python

Last updated Apr 25, 2024

Regular expressions (regex or regexp) in Python are used for pattern matching and text manipulation. Python provides the re module, which allows you to work with regular expressions.

Here are some common tasks and examples of how to use regular expressions in Python:

Importing the re Module:

To use regex in Python, you need to import the re module first:

import re

Matching Patterns:

The re.match() function checks for the pattern only at the beginning of a string. It returns a match object if the pattern matches there, and None otherwise.

Example:

import re

pattern = r"Sun"
text = "Sun is the ultimate source of energy"

match = re.match(pattern, text)
if match:
    print("Pattern found:", match.group())
else:
    print("Pattern not found")

Output:

Pattern found: Sun
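
Because re.match() only looks at the start of the string, it can be surprising when the pattern appears later in the text. The following sketch contrasts it with re.search() and re.fullmatch(); the sample sentence and variable names are illustrative and not part of the original example.

```python
import re

text = "The Sun is the ultimate source of energy"

# re.match() only tries the pattern at position 0, so this returns None.
at_start = re.match(r"Sun", text)

# re.search() scans the whole string and returns the first occurrence.
anywhere = re.search(r"Sun", text)

# re.fullmatch() succeeds only if the entire string matches the pattern.
whole = re.fullmatch(r"Sun", "Sun")

print(at_start)           # None
print(anywhere.group())   # Sun
print(anywhere.span())    # (4, 7)
print(whole is not None)  # True
```

Checking the result against None before calling group() avoids an AttributeError when there is no match.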

Searching for Patterns:

The re.search() function finds and returns the first occurrence of a pattern within a string.

Example:

import re

pattern = r"\d+"  # match one or more digits
text = "The first code is 9097 and the second code is 2032"

result = re.search(pattern, text)
print(result.group())

Output:

9097

Searching for Patterns while Ignoring Case:

To perform case-insensitive matching, you can use flags like re.IGNORECASE.

Example:

import re

pattern = r"tree"
text = "The TREE is tall."

match = re.search(pattern, text, re.IGNORECASE)
if match:
    print("Pattern found:", match.group())
else:
    print("Pattern not found.")

Output:

Pattern found: TREE
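
Flags can also be combined or written inside the pattern itself. The sketch below is one possible illustration, using a made-up two-line text: it combines re.IGNORECASE with re.MULTILINE through re.compile(), and shows the inline (?i) form.

```python
import re

text = "The TREE is tall.\nA tree gives shade."

# Combine flags with the bitwise OR operator.
# re.MULTILINE makes ^ match at the start of every line, not only the string.
pattern = re.compile(r"^a tree", re.IGNORECASE | re.MULTILINE)
line_start = pattern.search(text)

# The inline flag (?i) at the start of a pattern is equivalent to re.IGNORECASE.
inline = re.findall(r"(?i)tree", text)

print(line_start.group())  # A tree
print(inline)              # ['TREE', 'tree']
```

Compiling a pattern once with re.compile() is convenient when the same pattern is reused many times.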

Finding all Matches:

The re.findall() function finds all non-overlapping occurrences of a pattern in a string and returns them as a list.

Example:

import re

pattern = r"\d+"  # match one or more digits
text = "The first code is 9091. The second code is 2043. The third code is 7203. The fourth code is 5499."

result = re.findall(pattern, text)
print(result)

Output:

['9091', '2043', '7203', '5499']
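
When the pattern contains capturing groups, re.findall() returns the group contents rather than the whole match. The sketch below, which uses an illustrative sample text, shows that behavior alongside re.finditer(), which yields match objects instead of strings.

```python
import re

text = "code A is 9091, code B is 2043"

# With two or more capturing groups, findall returns a list of tuples,
# one tuple per match, holding the text captured by each group.
pairs = re.findall(r"code (\w) is (\d+)", text)
print(pairs)  # [('A', '9091'), ('B', '2043')]

# finditer yields match objects, so the position of each match is available.
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]
print(positions)  # [('9091', 10), ('2043', 26)]
```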

Replacing Matches:

The re.sub() function replaces all occurrences of a pattern with a specified string and returns the new string. If the pattern is not found, the string is returned unchanged.

Example:

import re

pattern = r"\s+"  # Pattern for matching one or more whitespace characters
text = "The     first code    is    9091."

replace_with = " "  # Replace with a single space

replaced_text = re.sub(pattern, replace_with, text)
print("Replaced text:", replaced_text)

Output:

Replaced text: The first code is 9091.
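
re.sub() accepts a few useful variations beyond a fixed replacement string. The following is a small sketch of some of them; the sample inputs are invented for illustration.

```python
import re

text = "The     first code    is    9091."

# count=1 limits the replacement to the first match only.
first_only = re.sub(r"\s+", " ", text, count=1)
print(first_only)  # The first code    is    9091.

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@example")
print(swapped)  # example at user

# The replacement may also be a function that receives each match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "9091 and 10")
print(doubled)  # 18182 and 20

# re.subn() returns the new string together with the number of replacements.
collapsed, n = re.subn(r"\s+", " ", text)
print(collapsed, n)  # The first code is 9091. 4
```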

Splitting using a Matching Pattern:

The re.split() function splits the source string at every match of the pattern and returns the resulting list of substrings.

Example:

import re

pattern = r"[,\s]+"  # Pattern for splitting on commas and whitespace
text = "a@example.com , b@example.com, c@example.com, d@example.com"

result = re.split(pattern, text)
print(result)

Output:

['a@example.com', 'b@example.com', 'c@example.com', 'd@example.com']
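
A few details of re.split() are easy to miss: the maxsplit argument, the effect of a capturing group in the pattern, and the empty strings produced by separators at the edges. The sketch below illustrates each with made-up inputs.

```python
import re

text = "a@example.com , b@example.com, c@example.com"

# maxsplit=1 stops after the first split; the rest is returned unsplit.
head_tail = re.split(r"[,\s]+", text, maxsplit=1)
print(head_tail)  # ['a@example.com', 'b@example.com, c@example.com']

# A capturing group in the pattern keeps the separators in the result.
with_seps = re.split(r"(,)", "a,b,c")
print(with_seps)  # ['a', ',', 'b', ',', 'c']

# A separator at the very start or end produces an empty string.
edges = re.split(r",", ",a,b,")
print(edges)  # ['', 'a', 'b', '']
```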

Regular Expression Metacharacters

Metacharacters are characters with a special meaning in a pattern. The following table lists the regular expression metacharacters in Python:

Character	Description	Example
.	This (dot) matches any character except a newline.	"sec..t"
^	This (caret) matches the start of the string.	"^complete"
$	This matches the end of the string.	"secret$"
[ ]	This matches any single character from a set of characters.	"[A-Za-z]"
+	This matches 1 or more occurrences of the preceding pattern.	"[A-Za-z]+"
*	This matches 0 or more occurrences of the preceding pattern.	"[A-Za-z]*"
{ }	This matches a specified number of occurrences of the preceding pattern, either an exact count {n} or a range {m,n}.	"[a-z]{3,10}"
|	This matches either the pattern on its left or the pattern on its right (alternation).	"a|b"
(re)	This matches a pattern as a group.	"(hello)"
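
The metacharacters in the table can be tried out directly. The following is a short sketch that reuses the example patterns from the table; the sample strings are invented for illustration.

```python
import re

# . matches any single character except a newline.
dots = re.findall(r"sec..t", "secret sect secant")
print(dots)  # ['secret', 'secant']

# ^ and $ anchor the pattern to the start and end of the string.
starts = re.search(r"^complete", "complete work") is not None
ends = re.search(r"secret$", "top secret") is not None
print(starts, ends)  # True True

# + needs at least one character, * also accepts none.
plus_empty = re.fullmatch(r"[A-Za-z]+", "")
star_empty = re.fullmatch(r"[A-Za-z]*", "")
print(plus_empty is None, star_empty is not None)  # True True

# {3,10} matches between 3 and 10 repetitions.
ranged = re.findall(r"[a-z]{3,10}", "ab abc abcdefghijkl")
print(ranged)  # ['abc', 'abcdefghij']

# | matches either alternative; ( ) captures a group.
either = re.findall(r"a|b", "cab")
groups = re.search(r"(hello) (world)", "hello world").groups()
print(either, groups)  # ['a', 'b'] ('hello', 'world')
```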

Regular Expression Sequence Characters

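A backslash ("\") followed by certain characters forms a special sequence that stands for a class of characters or a position in the string. A backslash placed before a metacharacter instead removes its special meaning, so that it matches literally. The following table lists the special sequences in Python: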

Character	Description	Example
\A	Matches only at the beginning of the string.	r"\Ahello"
\b	Matches the empty position at the beginning or end of a word. Use a raw string (the "r" prefix) so that Python does not interpret "\b" as a backspace character.	r"\bhello" r"hello\b"
\B	Matches the empty position that is not at the beginning or end of a word. Use a raw string (the "r" prefix) here as well.	r"\Bhello" r"hello\B"
\d	Matches any decimal digit.	r"\d"
\D	Matches any character that is not a digit.	r"\D"
\s	Matches any whitespace character.	r"\s"
\S	Matches any character that is not whitespace.	r"\S"
\w	Matches any word character (a letter, a digit, or an underscore).	r"\w"
\W	Matches any character that is not a word character.	r"\W"
\Z	Matches only at the end of the string.	r"buddy\Z"
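
The special sequences above can be checked with a few short calls. This is a minimal sketch with invented sample strings; it also shows why raw strings matter for "\b" and how \A and \Z differ from ^ and $.

```python
import re

# In a normal string "\b" is a single backspace character;
# in a raw string it stays as a backslash followed by "b".
print(len("\b"), len(r"\b"))  # 1 2

words = "hello othello hellos"
word_start = re.findall(r"\bhello", words)  # "hello" and "hellos"
word_end = re.findall(r"hello\b", words)    # "hello" and "othello"
inside = re.findall(r"\Bhello", words)      # only inside "othello"
print(len(word_start), len(word_end), len(inside))  # 2 2 1

sample = "a1 b22!"
digits = re.findall(r"\d", sample)       # ['1', '2', '2']
non_digits = re.findall(r"\D+", sample)  # ['a', ' b', '!']
word_chars = re.findall(r"\w+", sample)  # ['a1', 'b22']
non_word = re.findall(r"\W", sample)     # [' ', '!']
spaces = re.findall(r"\s", sample)       # [' ']

# With re.MULTILINE, ^ matches at every line start, but \A only at the string start.
two_lines = "hello\nhello"
caret = re.findall(r"^hello", two_lines, re.MULTILINE)
start_only = re.findall(r"\Ahello", two_lines, re.MULTILINE)
print(len(caret), len(start_only))  # 2 1

# $ also matches just before a trailing newline; \Z does not.
dollar = re.search(r"buddy$", "hey buddy\n")
end_only = re.search(r"buddy\Z", "hey buddy\n")
print(dollar is not None, end_only is None)  # True True
```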