Regular Expression Cheat Sheet
Regular expressions are useful when working with strings, but I often find it hard to get started when I only vaguely remember the regex grammar and the code syntax. This post summarizes handy regex code for Python that I found online (mainly from here).
1. Code Syntax
import re
text = "At 12, I bought 13 bananas and 3 oranges. Amazing!"
p = re.compile(r"\d+")
A compiled pattern object has four main methods:
- match(): determine whether the RE matches at the beginning of the string
- search(): scan through the string and return the first match
- findall(): find all substrings that match the RE and return them as a list
- finditer(): find all substrings that match the RE and return them as an iterator of match objects
With the pattern above:
- p.match(text) returns None because the string does not begin with a number.
- p.search(text).group() returns '12' because it is the first run of digits.
- p.findall(text) returns ['12', '13', '3'].
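A minimal sketch that runs all four methods on the same text, so the results listed above can be checked directly; the finditer() loop also prints where each match sits in the string. The outputs in the comments assume Python 3.

```python
import re

text = "At 12, I bought 13 bananas and 3 oranges. Amazing!"
p = re.compile(r"\d+")

# match() only tries the start of the string; "At" is not a digit.
print(p.match(text))              # None

# search() scans the whole string and stops at the first match.
print(p.search(text).group())     # 12

# findall() returns every matching substring as a list of str.
print(p.findall(text))            # ['12', '13', '3']

# finditer() yields match objects, which also carry positions.
for m in p.finditer(text):
    print(m.group(), m.span())    # 12 (3, 5) / 13 (16, 18) / 3 (31, 32)
```

finditer() is the one to reach for when you need positions or groups and not just the matched text.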
Alternatively, you can skip creating a pattern object and call the module-level functions directly:
import re
re.search(r'\d+', text)
<re.Match object; span=(3, 5), match='12'>
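A short sketch of the module-level equivalents, using the same example text. Each function takes the pattern as its first argument; the `re` module caches recently compiled patterns, so calling these repeatedly is not much slower than compiling once.

```python
import re

text = "At 12, I bought 13 bananas and 3 oranges. Amazing!"

m = re.search(r"\d+", text)
print(m.group(), m.span())          # 12 (3, 5)

print(re.findall(r"\d+", text))     # ['12', '13', '3']
print(re.match(r"\d+", text))       # None
```

Compiling with re.compile() is still a reasonable choice when one pattern is reused in many places, mostly for readability.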
2. RE Grammar
If you sort of know the grammar, check your RE here and skip the rest.
This RegEx Cookbook may be useful.
Otherwise, read below and here if you need more clarification.
Python’s Raw String to Handle Backslashes
Use Python’s raw string notation when writing regexes: simply put an r in front of the string literal. This avoids the double interpretation of backslashes, first by Python and then by the regex engine.
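A small demonstration of why the r prefix matters. In a normal string literal, Python turns "\b" into a single backspace character before the regex engine ever sees it, so the word-boundary pattern silently stops working. The sample sentence is made up for illustration.

```python
import re

# Python processes escapes first: "\b" is one backspace char (\x08).
print(len("\b"), len(r"\b"))            # 1 2

# Without a raw string you have to double every backslash.
print("\\d" == r"\d")                   # True

text = "cat concat cat"
print(re.findall("\bcat\b", text))      # [] (pattern contains backspaces)
print(re.findall(r"\bcat\b", text))     # ['cat', 'cat']
```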
Symbols
Anchor
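- ^Start: "Start" at the start of the string
- end$: "end" at the end of the string
- \b: a word boundary, where one side is a word character and the other is not (a space, punctuation, or the start or end of the string)
- \B: the negation of \b; \Babc\B matches abc only in the middle of a word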
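A quick sketch of the anchors on made-up strings. Note that anchors match positions, not characters, so they never add text to the match.

```python
import re

# ^ and $ pin the pattern to the start and end of the string.
print(bool(re.search(r"^Start", "Start here")))      # True
print(bool(re.search(r"^Start", "Don't Start")))     # False
print(bool(re.search(r"end$", "the end")))           # True
print(bool(re.search(r"end$", "the end is near")))   # False

# \b needs a word/non-word transition; \B needs its absence.
print(re.findall(r"\bcat\b", "cat concat category"))  # ['cat']
print(re.findall(r"\Babc\B", "abc xabcx abcx"))       # ['abc']
```

By default ^ and $ refer to the whole string; passing re.MULTILINE makes them match at the start and end of each line as well.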
Quantifiers
- *: zero or more
- +: one or more
- ?: zero or one
- {2}: exactly 2
- {2,}: 2 or more
- {2,5}: between 2 and 5 (inclusive)
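A sketch exercising each quantifier. The \b anchors around a{...} make sure a whole run of a's is counted, not just part of a longer run; the strings are arbitrary examples.

```python
import re

text = "a aa aaa aaaa aaaaaa"

print(re.findall(r"\ba{2}\b", text))     # ['aa']
print(re.findall(r"\ba{2,}\b", text))    # ['aa', 'aaa', 'aaaa', 'aaaaaa']
print(re.findall(r"\ba{2,5}\b", text))   # ['aa', 'aaa', 'aaaa']

# ? makes the preceding item optional.
print(re.findall(r"colou?r", "color colour"))   # ['color', 'colour']

# * allows zero repetitions, + requires at least one.
print(re.fullmatch(r"ab*", "a") is not None)    # True
print(re.fullmatch(r"ab+", "a") is not None)    # False
```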
Character Classes
- \d: digit
- \w: alphanumeric character or underscore
- \s: whitespace (space, tab, line break)
- .: any character except a newline
Capitalized classes (\D, \W, and \S) are their negations.
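A sketch of the character classes and one negated class on a made-up string that mixes letters, digits, punctuation, a tab and a newline.

```python
import re

s = "id_42: x-ray\tok\n!"

print(re.findall(r"\d", s))     # ['4', '2']
print(re.findall(r"\w+", s))    # ['id_42', 'x', 'ray', 'ok']
print(re.findall(r"\s", s))     # [' ', '\t', '\n']

# . skips newlines unless re.DOTALL is passed.
print(re.findall(r"o.", "ok\no\n"))              # ['ok']
print(re.findall(r"o.", "ok\no\n", re.DOTALL))   # ['ok', 'o\n']

# Capitalized classes are negations: \D is "not a digit".
print(re.findall(r"\D+", "a1b22c"))   # ['a', 'b', 'c']
```

In Python 3, \d and \w on str patterns also match non-ASCII digits and letters; pass re.ASCII if only [0-9] and [a-zA-Z0-9_] are wanted.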
Capturing
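- a(bc): matches an a followed by the sequence bc, and captures bc as a group
- a(?:bc): like the previous one, but ?: makes the group non-capturing
- a(?P<foo>bc): the group is named foo, and the result can be accessed like a dictionary (m['foo'] or m.groupdict()). Python spells named groups (?P<name>...); the a(?<foo>bc) form used by some other regex engines raises an error in Python's re.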
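A sketch contrasting the three group types. One visible difference: when a pattern has capturing groups, findall() returns the group contents instead of the whole match. The group names num and fruit are just illustrative.

```python
import re

text = "abc abcbc"

# Capturing group: findall returns what the group captured.
print(re.findall(r"a(bc)", text))        # ['bc', 'bc']
# Non-capturing group: findall returns the whole match.
print(re.findall(r"a(?:bc)+", text))     # ['abc', 'abcbc']

# Named groups use Python's (?P<name>...) syntax.
m = re.search(r"(?P<num>\d+) (?P<fruit>\w+)", "I bought 13 bananas")
print(m.group("num"))     # 13
print(m["fruit"])         # bananas
print(m.groupdict())      # {'num': '13', 'fruit': 'bananas'}
```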
Brackets
- [a-z]: matches any single letter from a to z
- [^a-zA-Z]: matches any single character that is not a letter from a to z or from A to Z. Here the ^, as the first character inside the brackets, negates the set.
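A sketch showing that a bracket expression matches exactly one character, and that ^ only means "not" when it comes first. The phone-number sentence is invented for the example.

```python
import re

text = "Call me at 555-0199, OK?"

# [a-z] matches ONE lowercase letter; + extends it to a run.
print(re.findall(r"[a-z]+", text))       # ['all', 'me', 'at']

# [^a-zA-Z] matches ONE character that is not an ASCII letter.
print(re.sub(r"[^a-zA-Z]", "", text))    # CallmeatOK

# ^ anywhere else inside brackets is just a literal caret.
print(re.findall(r"[a^]", "a^b"))        # ['a', '^']
```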
Lookahead and Lookbehind
- d(?=r): matches a d that is followed by an r; the r is not part of the match.
- (?<=r)d: matches a d that is preceded by an r; the r is not part of the match.
- Use ! for their negations: d(?!r) and (?<!r)d.
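A sketch of all four lookarounds. Printing match spans (via a small helper, spans(), defined here only for illustration) makes it visible that every match is one character wide: the r is checked but never included. The last two lines show a more practical use on a made-up price string.

```python
import re

def spans(pattern, s):
    """Hypothetical helper: (start, end) of every match of pattern in s."""
    return [m.span() for m in re.finditer(pattern, s)]

s = "drd dx rd"   # d sits at indices 0, 2, 4 and 8

print(spans(r"d(?=r)", s))     # [(0, 1)]
print(spans(r"(?<=r)d", s))    # [(2, 3), (8, 9)]
print(spans(r"d(?!r)", s))     # [(2, 3), (4, 5), (8, 9)]
print(spans(r"(?<!r)d", s))    # [(0, 1), (4, 5)]

# Grab a number by its context without capturing the context.
print(re.findall(r"(?<=\$)\d+", "cost $30 or 40 EUR"))    # ['30']
print(re.findall(r"\d+(?= EUR)", "cost $30 or 40 EUR"))   # ['40']
```

Python's re requires lookbehind patterns to have a fixed width, so something like (?<=r+)d raises an error.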
Back reference [to be updated]
Use Case Examples [to be updated]