RegEx in Python

  • Last updated Apr 25, 2024

Regular expressions (regex or regexp) in Python are used for pattern matching and text manipulation. Python provides the re module, which allows you to work with regular expressions.

Here are some common tasks and examples of how to use regular expressions in Python:

  1. Importing the re Module:
  2. To use regex in Python, you need to import the re module first:

    import re
  3. Matching Patterns:
  4. The re.match() function checks for a match only at the beginning of a string and returns None if the pattern does not match there.

    Example:

    import re
    
    pattern = r"Sun"
    text = "Sun is the ultimate source of energy"
    
    match = re.match(pattern, text)
    if match:
        print("Pattern found:", match.group())
    else:
        print("Pattern not found")

    Output:

    Pattern found: Sun
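Because re.match() only looks at the start of the string, a pattern that occurs later in the text is not found by it, while re.search() does find it. The following is a minimal sketch of that difference, reusing the sentence from the example above; the variable names at_start and anywhere are illustrative.

```python
import re

pattern = r"energy"
text = "Sun is the ultimate source of energy"

# re.match() anchors at position 0, so a word later in the string is not matched.
at_start = re.match(pattern, text)
# re.search() scans the whole string and returns the first occurrence.
anywhere = re.search(pattern, text)

print(at_start)          # None
print(anywhere.group())  # energy
```

Since both functions return None when nothing matches, it is safest to check the result before calling group().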
  5. Searching for Patterns:
  6. The re.search() function finds and returns the first occurrence of a pattern within a string.

    Example:

    import re
    
    pattern = r"\d+"  # pattern to match one or more digits
    text = "The first code is 9097 and the second code is 2032"
    
    result = re.search(pattern, text)
    print(result.group())

    Output:

    9097
  7. Searching for Patterns while Ignoring Case:
  8. To perform case-insensitive matching, you can use flags like re.IGNORECASE.

    Example:

    import re
    
    pattern = r"tree"
    text = "The TREE is tall."
    
    match = re.search(pattern, text, re.IGNORECASE)
    if match:
        print("Pattern found:", match.group())
    else:
        print("Pattern not found.")

    Output:

    Pattern found: TREE
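To see what the flag changes, the same search can be run with and without it. This is a small sketch using the text from the example above; re.I is the short alias for re.IGNORECASE, and the variable names are illustrative.

```python
import re

pattern = r"tree"
text = "The TREE is tall."

# Without a flag the search is case-sensitive, so nothing is found.
case_sensitive = re.search(pattern, text)
# re.I is the short alias for re.IGNORECASE.
case_insensitive = re.search(pattern, text, re.I)

print(case_sensitive)            # None
print(case_insensitive.group())  # TREE
```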
  9. Finding all Matches:
  10. The re.findall() function finds all occurrences of a pattern in a string and returns them in a list.

    import re
    
    pattern = r"\d+"  # match one or more digits
    text = "The first code is 9091. The second code is 2043. The third code is 7203. The fourth code is 5499."
    
    result = re.findall(pattern, text)
    print(result)

    Output:

    ['9091', '2043', '7203', '5499']
  11. Replacing Matches:
  12. The re.sub() function replaces all occurrences of a pattern with a specified string. If the pattern is not found, the string is returned unchanged.

    Example:

    import re
    
    pattern = r"\s+"  # Pattern for matching one or more whitespace characters
    text = "The     first code    is    9091."
    
    replace_with = " "  # Replace with a single space
    
    replaced_text = re.sub(pattern, replace_with, text)
    print("Replaced text:", replaced_text)

    Output:

    Replaced text: The first code is 9091.
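The behavior described above, including the no-match case, can be checked directly. The sketch below uses made-up sample strings and also shows the optional count argument of re.sub(), which limits how many occurrences are replaced.

```python
import re

pattern = r"\d+"

# No digits in the text, so re.sub() returns the input unchanged.
no_match = re.sub(pattern, "#", "No codes here.")
# With matches present, every occurrence is replaced.
all_replaced = re.sub(pattern, "#", "Codes: 9091 and 2043.")
# count=1 limits the replacement to the first occurrence.
first_only = re.sub(pattern, "#", "Codes: 9091 and 2043.", count=1)

print(no_match)      # No codes here.
print(all_replaced)  # Codes: # and #.
print(first_only)    # Codes: # and 2043.
```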
  13. Splitting using a Matching Pattern:
  14. The re.split() function splits the source string at every match of the pattern and returns the resulting list of substrings.

    Example:

    import re
    
    pattern = r"[,\s]+"  # Pattern for splitting on runs of commas and whitespace
    text = "a@example.com , b@example.com, c@example.com, d@example.com"
    
    result = re.split(pattern, text)
    print(result)

    Output:

    ['a@example.com', 'b@example.com', 'c@example.com', 'd@example.com']
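One detail worth knowing when splitting: if the text begins or ends with a separator, re.split() includes an empty string at that end of the list. The following sketch uses an illustrative input with a trailing comma and shows one possible way to drop the empty entries.

```python
import re

pattern = r"[,\s]+"
text = "a@example.com, b@example.com, "

# The trailing separator produces an empty string at the end of the list.
parts = re.split(pattern, text)
# Filter out empty strings if they are not wanted.
cleaned = [part for part in parts if part]

print(parts)    # ['a@example.com', 'b@example.com', '']
print(cleaned)  # ['a@example.com', 'b@example.com']
```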
Regular Expression Metacharacters

Metacharacters are characters that have a special meaning in a pattern. The following are the regular expression metacharacters in Python:

Character  Description                                                           Example
.          Matches any single character except a newline.                        "sec..t"
^          Matches the start of the string.                                      "^complete"
$          Matches the end of the string.                                        "secret$"
[ ]        Matches any one character from the set in the brackets.               "[A-Za-z]"
+          Matches one or more occurrences of the preceding pattern.             "[A-Za-z]+"
*          Matches zero or more occurrences of the preceding pattern.            "[A-Za-z]*"
{ }        Matches a given count, or range of counts, of the preceding pattern.  "[a-z]{3,10}"
|          Matches either the pattern on its left or the one on its right.       "a|b"
(re)       Groups the enclosed pattern so it is matched as a unit.               "(hello)"
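The example patterns for the metacharacters can be tried against short strings. The sample inputs and variable names in this sketch are invented for illustration only.

```python
import re

# Each dot matches exactly one character.
dot = bool(re.search(r"sec..t", "secret"))
# ^ and $ anchor the pattern to the start and end of the string.
caret = bool(re.search(r"^complete", "complete it"))
dollar = bool(re.search(r"secret$", "top secret"))
# A character set followed by + matches runs of letters.
letters = re.findall(r"[A-Za-z]+", "abc 123 Def")
# {3,10} accepts between 3 and 10 repetitions, so "ab" is too short.
ranged = re.findall(r"[a-z]{3,10}", "ab abc abcdef")
# | matches either alternative.
either = re.findall(r"a|b", "cab")
# Parentheses form a group whose text can be read back with group(1).
grouped = re.search(r"(hello) world", "hello world").group(1)

print(dot, caret, dollar)  # True True True
print(letters)             # ['abc', 'Def']
print(ranged)              # ['abc', 'abcdef']
print(either)              # ['a', 'b']
print(grouped)             # hello
```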
Regular Expression Sequence Characters

In a regular expression, the "\" character either escapes a metacharacter so that it is matched literally or introduces a special sequence such as \d. The following are the special sequences in Python:

Character  Description                                                  Example
\A         Matches only at the start of the string.                     r"\Ahello"
\b         Matches at a word boundary (the start or end of a word).     r"\bhello", r"hello\b"
\B         Matches at a position that is not a word boundary.           r"\Bhello", r"hello\B"
\d         Matches any digit character.                                 r"\d"
\D         Matches any character that is not a digit.                   r"\D"
\s         Matches any whitespace character.                            r"\s"
\S         Matches any character that is not whitespace.                r"\S"
\w         Matches any word character (letter, digit, or underscore).   r"\w"
\W         Matches any character that is not a word character.          r"\W"
\Z         Matches only at the end of the string.                       r"buddy\Z"

The r prefix marks each example as a raw string, so the backslash reaches the regex engine unchanged.
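A few of these sequences can be checked against a short sample string. The text and variable names below are illustrative; the last line shows why raw strings matter for \b, which in an ordinary Python string means the backspace character.

```python
import re

text = "hello world 42"

digits = re.findall(r"\d", text)              # single digit characters
words = re.findall(r"\w+", text)              # runs of word characters
spaces = re.findall(r"\s", text)              # single whitespace characters
non_digits = re.findall(r"\D+", "a1b22c")     # runs of non-digit characters
starts = bool(re.search(r"\Ahello", text))    # anchored at the start
ends = bool(re.search(r"42\Z", text))         # anchored at the end
boundary = re.findall(r"\bwor", text)         # "wor" at the start of a word
inside = re.findall(r"\Borl", text)           # "orl" inside a word
# Without the r prefix, "\b" is the backspace character, not a word boundary.
not_raw = re.findall("\bhello", text)

print(digits)        # ['4', '2']
print(words)         # ['hello', 'world', '42']
print(spaces)        # [' ', ' ']
print(non_digits)    # ['a', 'b', 'c']
print(starts, ends)  # True True
print(boundary)      # ['wor']
print(inside)        # ['orl']
print(not_raw)       # []
```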