RegEx in Python
- Last updated Apr 25, 2024
Regular expressions (regex or regexp) in Python are used for pattern matching and text manipulation. Python provides the re module, which allows you to work with regular expressions.
Here are some common tasks and examples of how to use regular expressions in Python:
- Importing the re Module:
- Matching Patterns:
- Searching for Patterns:
- Searching for Patterns while Ignoring Case:
- Finding all Matches:
- Replacing Matches:
- Splitting using a Matching Pattern:
To use regex in Python, you need to import the re module first:
import re
The re.match() function checks for a match only at the beginning of a string. It returns a match object on success and None otherwise.
Example:
import re
pattern = r"Sun"
text = "Sun is the ultimate source of energy"
match = re.match(pattern, text)
if match:
    print("Pattern found:", match.group())
else:
    print("Pattern not found")
Output:
Pattern found: Sun
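Because re.match() is anchored at the start of the string, a pattern that occurs later in the text does not match. The following is a minimal sketch of that behavior; the second pattern ("source") is chosen here only for illustration.

```python
import re

text = "Sun is the ultimate source of energy"

# "Sun" sits at position 0, so re.match() returns a match object.
at_start = re.match(r"Sun", text)

# "source" occurs later in the string, so re.match() returns None.
not_at_start = re.match(r"source", text)

print(at_start is not None)
print(not_at_start is None)
```

Checking the result against None before calling group() avoids an AttributeError when there is no match.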
The re.search() function finds and returns the first occurrence of a pattern within a string.
Example:
import re
pattern = r"\d+"  # match one or more consecutive digits
text = "The first code is 9097 and the second code is 2032"
result = re.search(pattern, text)
print(result.group())
Output:
9097
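The example above calls result.group() directly, which works only when a match exists. As a hedged sketch, the snippet below guards against a missing match and also reads the position of the match with span(); the variable names and the "xyz" pattern are illustrative assumptions.

```python
import re

text = "The first code is 9097 and the second code is 2032"

result = re.search(r"\d+", text)
if result is not None:
    first_code = result.group()  # the matched text
    position = result.span()     # (start, end) indexes of the match
    print(first_code, position)

# "xyz" does not occur in the text, so re.search() returns None.
missing = re.search(r"xyz", text)
print(missing)
```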
To perform case-insensitive matching, you can use flags like re.IGNORECASE.
Example:
import re
pattern = r"tree"
text = "The TREE is tall."
match = re.search(pattern, text, re.IGNORECASE)
if match:
    print("Pattern found:", match.group())
else:
    print("Pattern not found.")
Output:
Pattern found: TREE
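The same flag can be supplied in other ways. The sketch below shows two alternatives that should behave like the example above: precompiling the pattern with re.compile() and using the inline flag (?i) at the start of the pattern. The variable names are illustrative.

```python
import re

text = "The TREE is tall."

# Option 1: compile the pattern once with the flag, then reuse it.
tree_re = re.compile(r"tree", re.IGNORECASE)
compiled_match = tree_re.search(text)

# Option 2: put the inline flag (?i) at the very start of the pattern.
inline_match = re.search(r"(?i)tree", text)

print(compiled_match.group())
print(inline_match.group())
```

Precompiling is mainly useful when the same pattern is applied to many strings.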
The re.findall() function finds all non-overlapping occurrences of a pattern in a string and returns them in a list.
Example:
import re
pattern = r"\d+"  # raw string; match one or more digits
text = "The first code is 9091. The second code is 2043. The third code is 7203. The fourth code is 5499."
result = re.findall(pattern, text)
print(result)
Output:
['9091', '2043', '7203', '5499']
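When the pattern contains capturing groups, re.findall() returns the groups rather than the whole match. The sketch below illustrates this with a shortened version of the sample text, and also shows re.finditer(), which yields match objects instead of strings. The pattern with two groups is an assumption made for illustration.

```python
import re

text = "The first code is 9091. The second code is 2043."

# With two capturing groups, findall() returns a list of (group1, group2) tuples.
pairs = re.findall(r"The (\w+) code is (\d+)", text)
print(pairs)

# finditer() yields match objects, which also expose each match's position.
starts = [m.start() for m in re.finditer(r"\d+", text)]
print(starts)
```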
The re.sub() function replaces all occurrences of a pattern with a specified string. The unchanged string is returned if the pattern is not found.
Example:
import re
pattern = r"\s+"  # Pattern for matching one or more whitespace characters
text = "The   first  code    is 9091."
replace_with = " " # Replace with a single space
replaced_text = re.sub(pattern, replace_with, text)
print("Replaced text:", replaced_text)
Output:
Replaced text: The first code is 9091.
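re.sub() also accepts a count argument that limits the number of replacements, and the replacement string can refer back to captured groups with \1, \2, and so on. The following is a small sketch of both features; the sample patterns are invented for illustration.

```python
import re

text = "The first code is 9091."

# Replace only the first two digits; count limits the number of substitutions.
masked = re.sub(r"\d", "#", text, count=2)
print(masked)

# Swap the two halves of the four-digit code using backreferences \1 and \2.
swapped = re.sub(r"(\d{2})(\d{2})", r"\2\1", text)
print(swapped)
```

Passing count as a keyword argument keeps the call unambiguous, since the next parameter in the signature is flags.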
The re.split() function splits the source string at every match of the pattern and returns the resulting list of substrings.
Example:
import re
pattern = r"[,\s]+"  # Pattern for splitting on runs of commas and/or whitespace
text = "a@example.com , b@example.com, c@example.com, d@example.com"
result = re.split(pattern, text)
print(result)
Output:
['a@example.com', 'b@example.com', 'c@example.com', 'd@example.com']
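Two details of re.split() are worth trying out: the maxsplit argument caps the number of splits, and a separator at the very start of the string produces an empty first element. The sketch below uses shortened sample strings that are assumptions for illustration.

```python
import re

pattern = r"[,\s]+"
text = "a@example.com , b@example.com, c@example.com"

# maxsplit=1 splits at the first separator only; the rest stays intact.
head_and_rest = re.split(pattern, text, maxsplit=1)
print(head_and_rest)

# A leading separator yields an empty string as the first element.
with_leading = re.split(pattern, ", a, b")
print(with_leading)
```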
Regular Expression Metacharacters
Metacharacters are characters with special meaning. The following table lists the regular expression metacharacters in Python:
Character | Description | Example |
. | Dot: matches any single character except a newline. | "sec..t" |
^ | Caret: matches at the start of the string. | "^complete" |
$ | Matches at the end of the string (or just before a trailing newline). | "secret$" |
[ ] | Matches any one character from the set inside the brackets. | "[A-Za-z]" |
+ | Matches 1 or more occurrences of the preceding pattern. | "[A-Za-z]+" |
* | Matches 0 or more occurrences of the preceding pattern. | "[A-Za-z]*" |
{ } | Matches a specified number of occurrences of the preceding pattern, either an exact count {n} or a range {m,n}. | "[a-z]{3,10}" |
| | Alternation (OR): matches the pattern on either side. | "a|b" |
(re) | Groups the enclosed pattern so that it can be captured or quantified as a unit. | "(hello)" |
Regular Expression Sequence Characters
In a regular expression, the backslash "\" either escapes a metacharacter so that it is matched literally or introduces a special sequence. The following table lists the special sequences in Python:
Character | Description | Example |
\A | Matches only at the start of the string. | r"\Ahello" |
\b | Matches at a word boundary, that is, at the beginning or end of a word. The "r" prefix makes the pattern a raw string so that "\b" is not interpreted as a backspace. | r"\bhello" r"hello\b" |
\B | Matches at a position that is not a word boundary. As with \b, use a raw string. | r"\Bhello" r"hello\B" |
\d | Matches any digit character. | r"\d" |
\D | Matches any character that is not a digit. | r"\D" |
\s | Matches any whitespace character (space, tab, newline, and so on). | r"\s" |
\S | Matches any character that is not whitespace. | r"\S" |
\w | Matches any word character (a letter, digit, or underscore). | r"\w" |
\W | Matches any character that is not a word character. | r"\W" |
\Z | Matches only at the end of the string. | r"buddy\Z" |