Representation of regular expressions
Regular expressions are represented as ordinary strings.
This raises a small problem: a string may contain escape sequences. Take the string "hi this is \nepal". Python interprets "\n" as a newline character, so the string no longer contains a backslash followed by the letter n.
The best way to avoid this problem is to write regular expressions as raw strings, for example r"hi this is \nepal", in which a backslash is kept as an ordinary character. In a normal string you would have to double the backslash ("hi this is \\nepal") to get the same result. Here a list of examples is given for practice.
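The difference between the two kinds of literal is easy to check by comparing their lengths. The following is a small sketch; the variable names and the "\bcat\b" pattern are chosen only for illustration.

```python
import re

# In a normal string literal, "\n" is a single newline character.
normal = "hi this is \nepal"
# In a raw string literal, the backslash and the "n" stay two characters.
raw = r"hi this is \nepal"
# Doubling the backslash in a normal literal gives the same text as the raw one.
escaped = "hi this is \\nepal"

print(len(normal))     # 16
print(len(raw))        # 17
print(escaped == raw)  # True

# The same issue affects patterns: "\b" in a normal literal is a backspace
# character, while r"\b" reaches the regex engine as a word boundary.
word_normal = re.search("\bcat\b", "a cat")
word_raw = re.search(r"\bcat\b", "a cat")
print(word_normal)       # None
print(word_raw.group())  # cat
```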
Sample Program: Search a string in another string:
>>> import re
>>> x = re.search("cat", "A cat and a rat can't be friends")
>>> print(x)
<_sre.SRE_Match object; span=(2, 5), match='cat'>
>>> x = re.search("cow", "A cat and a rat can't be friends")
>>> print(x)
None
>>> sentence = "A cat and a rat can't be friends"
>>> s = re.search("cat", sentence)
>>> print(s)
<_sre.SRE_Match object; span=(2, 5), match='cat'>
>>> t = re.search("cow", sentence)
>>> print(t)
None
We need to import the “re” module to get the above output. We have used the search function of the “re” module.
We use the following syntax:
re.search(expr, s)
It checks the string s for a substring that matches the regular expression expr. The first such substring (from the left) is returned as a match object. We can use the result directly in conditional statements: a match object is treated as True, while None, the return value when nothing matches, is treated as False:
>>> sentence = "A cat and a rat can't be friends"
>>> if re.search("cat", sentence):
...     print("cat found")
... else:
...     print("cat not found")
...
cat found
>>> if re.search("cow", sentence):
...     print("cow found")
... else:
...     print("cow not found")
...
cow not found
Task:
Write Python code to check whether a given mobile number is valid. The conditions to be satisfied for a mobile number are:
a) The number of characters must be 10.
b) All characters must be digits, and the number must not begin with ‘0’.
How?
Input | Processing | Output |
A string representing a mobile number. | Take the string character by character and check whether it is valid. | Print “Valid” or “Invalid”. |
Test Case 1:
- abc8967891
- Invalid
- Alphabets are not allowed
Test Case 2:
- 440446845
- Invalid
- Only 9 digits
Test Case 3:
- 0440446845
- Invalid
- Should not begin with a zero
Test Case 4:
- 8440446845
- Valid
- All conditions satisfied
Example 1: Python Code to check the validity of a mobile number (Long Code)
Output:
enter the phone number 9985327199
Valid |
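The listing for Example 1 is not reproduced above. A character-by-character check along the lines of the Processing column could look roughly like the sketch below. The function name is_valid_mobile is an assumption made for this sketch; it is not the original code.

```python
def is_valid_mobile(number):
    """Return True if number has 10 characters, all digits, not starting with '0'."""
    # Condition a): exactly 10 characters.
    if len(number) != 10:
        return False
    # Condition b), second part: must not begin with '0'.
    if number[0] == "0":
        return False
    # Condition b), first part: every character must be a digit.
    for ch in number:
        if ch not in "0123456789":
            return False
    return True


# The four test cases from the task description.
for candidate in ["abc8967891", "440446845", "0440446845", "8440446845"]:
    print(candidate, "Valid" if is_valid_mobile(candidate) else "Invalid")
```

With input() the same function can be used interactively, for example `print("Valid" if is_valid_mobile(input("enter the phone number ")) else "Invalid")`.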
- Regexes are strings containing text and special characters that describe a pattern with which to recognize multiple strings.
- Regexes without special characters:
Regex Pattern | String(s) Matched |
foo | foo |
Python | Python |
abc123 | abc123 |
- These are simple expressions that each match a single string.
- The power of regular expressions comes in when special characters are used to define character sets, subgroup matching, and pattern repetition.
Special Symbols and Characters:
Except for the control characters (+ ? . * ^ $ ( ) [ ] { } | \), all characters match themselves. You can escape a control character by preceding it with a backslash.
The following table lists a few of the regular expression patterns:
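A short sketch of what escaping changes in practice. The sample string price is made up for illustration; re.escape is the standard-library helper that adds the backslashes for you.

```python
import re

price = "cost: $5.00 (approx.)"

# Unescaped, "$" is the end-of-string anchor, so nothing can follow it
# and the pattern does not find the literal text "$5.00".
unescaped = re.search("$5.00", price)
print(unescaped)  # None

# Preceding the control characters with a backslash makes them match themselves.
escaped = re.search(r"\$5\.00", price)
print(escaped.group())  # $5.00

# re.escape builds the escaped pattern from a literal string.
auto = re.search(re.escape("$5.00"), price)
print(auto.group())  # $5.00
```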
Pattern | Description |
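^ | Matches the beginning of the string (and of each line when the re.MULTILINE flag is used). |
$ | Matches the end of the string (and of each line when the re.MULTILINE flag is used). |
. | Matches any single character except newline. Using the re.DOTALL flag allows it to match newline as well. |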
[…] | Matches any single character in brackets. |
[^…] | Matches any single character not in brackets |
re* | Matches 0 or more occurrences of the preceding expression. |
re+ | Matches 1 or more occurrences of the preceding expression. |
re? | Matches 0 or 1 occurrence of the preceding expression. |
re{n} | Matches exactly n occurrences of the preceding expression. |
re{n,} | Matches n or more occurrences of the preceding expression. |
re{n,m} | Matches at least n and at most m occurrences of the preceding expression. |
a|b | Matches either a or b. |
(re) | Groups regular expressions and remembers matched text. |
\w | Matches word characters. |
\W | Matches nonword characters. |
\s | Matches whitespace. For ASCII text, equivalent to [ \t\n\r\f\v]. |
\S | Matches non-whitespace. |
\d | Matches digits. For ASCII text, equivalent to [0-9]. |
\D | Matches nondigits. |
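A few rows of the table can be tried out directly. The sketch below uses a made-up sample string; the variable names are chosen only for illustration.

```python
import re

text = "Order 66 shipped on 2024-05-17 to Mr_Smith"

# \d+ : one or more digits.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['66', '2024', '05', '17']

# \d{4} : exactly four digits in a row.
years = re.findall(r"\d{4}", text)
print(years)  # ['2024']

# \w+ : runs of word characters (letters, digits, underscore).
last_word = re.findall(r"\w+", text)[-1]
print(last_word)  # Mr_Smith

# (re) : groups remember the text they matched.
date_parts = re.search(r"(\d+)-(\d+)-(\d+)", text).groups()
print(date_parts)  # ('2024', '05', '17')

# re? : the preceding expression is optional.
optional_u = [w for w in ["color", "colour"] if re.fullmatch(r"colou?r", w)]
print(optional_u)  # ['color', 'colour']

# . does not match a newline unless re.DOTALL is given.
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"a.b", "a\nb", re.DOTALL)
print(dot_default is None, dot_all is not None)  # True True
```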
Example 2:
Output:
Matched |
Reason: the search only looks for the pattern ‘f.o’ somewhere in the string.
Example 3:
Output:
Not matched |
Reason: the pattern requires the entire string to start with ‘f’, end with ‘o’, and contain exactly one character in between.
Example 4:
Output:
Matched |
Reason: Two dots match any pair of characters.
Example 5:
Output:
Not matched |
Reason: Including a ‘$’ at the end will match only strings of length 2.
Example 6:
Output:
Matched |
Reason: The ‘.’ in the expression used in the example matches any character.
Now remove the “.” from the regular expression and check the output.
Output:
Not matched |
Reason: the string has three letters, and the match function only matches at the beginning of the string. If you use the search function of the “re” module instead, you get a different output. See Example 7.
Example 7:
Output:
Matched |
Reason: we used the search function instead of the match function.
Example 8:
Output:
Not matched |
Reason: we used ‘emd’.
Example 9:
Output:
Not matched |
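The listings for Examples 2 to 9 are not reproduced above. The sketch below illustrates the same ideas (the dot, the ‘$’ anchor, and match versus search). The patterns and test strings are assumptions chosen for illustration, not the original ones, and report is a hypothetical helper.

```python
import re


def report(result):
    """Turn a match object or None into the messages used in the examples."""
    return "Matched" if result else "Not matched"


# search finds 'f.o' anywhere in the string ("foo" inside "seafood").
print(report(re.search("f.o", "seafood")))    # Matched

# With ^ and $, the whole string must be 'f', any character, 'o'.
print(report(re.search("^f.o$", "seafood")))  # Not matched
print(report(re.search("^f.o$", "foo")))      # Matched

# Two dots match any pair of characters.
print(report(re.search("..", "end")))         # Matched

# Adding ^ and $ restricts the match to strings of length 2.
print(report(re.search("^..$", "end")))       # Not matched

# match only tries at the beginning of the string; search tries everywhere.
print(report(re.match("nd", "end")))          # Not matched
print(report(re.search("nd", "end")))         # Matched
```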
Matching from the Beginning or End of Strings or Word Boundaries (^, $)
^ – Match beginning of string
$ – Match End of string.
Regex Pattern | Strings Matched |
^From | Any string that starts with From |
/bin/tcsh$ | Any string that ends with /bin/tcsh |
^Subject: hi$ | Any string consisting solely of the string Subject: hi |
If you wanted to match any string that ended with a dollar sign, one possible regex solution would be the pattern .*\$$.
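The anchors can be tried on a handful of strings. In this sketch the list lines is made up for illustration, and the Subject pattern is written with a space after the colon.

```python
import re

# Hypothetical sample strings.
lines = [
    "From: alice@example.com",
    "Reply sent From the office",
    "shell is /bin/tcsh",
    "/bin/tcsh is a shell",
    "Subject: hi",
    "Subject: hi there",
    "Total due: 25$",
]

# ^From : only strings that start with "From".
starts_with_from = [s for s in lines if re.search(r"^From", s)]
# /bin/tcsh$ : only strings that end with "/bin/tcsh".
ends_with_tcsh = [s for s in lines if re.search(r"/bin/tcsh$", s)]
# ^Subject: hi$ : the string must be exactly "Subject: hi".
only_subject = [s for s in lines if re.search(r"^Subject: hi$", s)]
# .*\$$ : an escaped, literal "$" followed by the end-of-string anchor.
ends_with_dollar = [s for s in lines if re.search(r".*\$$", s)]

print(starts_with_from)  # ['From: alice@example.com']
print(ends_with_tcsh)    # ['shell is /bin/tcsh']
print(only_subject)      # ['Subject: hi']
print(ends_with_dollar)  # ['Total due: 25$']
```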
Example 10:
Check whether the given register number of a VIT student is valid.
Example register number – 15bec1032
A register number is valid if it has two digits, followed by three letters, followed by four digits.
Denoting Ranges (-) and Negation (^)
- Brackets also support ranges of characters.
- A hyphen between a pair of symbols enclosed in brackets indicates a range of characters, for example A-Z, a-z, or 0-9 for uppercase letters, lowercase letters, and numeric digits, respectively.
Regex Pattern | Strings Matched |
z.[0-9] | “z” followed by any character, followed by a single digit. |
[r-u][env-y][us] | “r,” “s,” “t,” or “u” followed by “e,” “n,” “v,” “w,” “x,” or “y” followed by “u” or “s” |
[^aeiou] | A non-vowel character (Exercise: why do we say “non-vowels” rather than “consonants”?) |
[^\t\n] | Not a TAB or \n. |
["-a] | In an ASCII system, all characters that fall between '"' and 'a', that is, between ordinals 34 and 97 |
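The rows of this table can be checked with a few lines of Python. The test strings below are made up for illustration.

```python
import re

# z.[0-9] : "z", then any character, then one digit.
z_hits = re.findall(r"z.[0-9]", "z-9 zx7 zab z12")
print(z_hits)  # ['z-9', 'zx7', 'z12']

# [r-u][env-y][us] : three characters, each taken from its own set.
three_letter = [w for w in ["snu", "rxs", "tau", "qes"]
                if re.fullmatch(r"[r-u][env-y][us]", w)]
print(three_letter)  # ['snu', 'rxs']

# [^aeiou] : any character that is not a lowercase vowel.
non_vowels = re.findall(r"[^aeiou]", "regex 1")
print(non_vowels)  # ['r', 'g', 'x', ' ', '1']

# ["-a] : characters with ordinals 34 ('"') through 97 ('a').
in_range = [c for c in ['!', '"', 'A', 'a', 'b'] if re.fullmatch(r'["-a]', c)]
print(in_range)  # ['"', 'A', 'a']
```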
Multiple Occurrence/Repetition Using Closure Operators (*, +, ?, {})
- The special symbols *, +, and ? can be used to match single, multiple, or no occurrences of string patterns.
- Asterisk or star operator (*): matches zero or more occurrences of the regex immediately to its left.
- Plus operator (+): matches one or more occurrences of a regex.
- Question mark operator (?): matches exactly 0 or 1 occurrences of a regex.
- There are also brace operators ({}) with either a single value or a comma-separated pair of values. These indicate a match of exactly N occurrences (for {N}) or a range of occurrences; for example, {M,N} will match from M to N occurrences. Do not put a space after the comma: Python treats {M, N} as literal text.
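The operators listed above can be compared side by side on the same inputs. In this sketch, samples and the helper matching are hypothetical names used only for illustration.

```python
import re

samples = ["ct", "cat", "caat", "caaat"]


def matching(pattern):
    """Return the sample strings that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]


print(matching(r"ca*t"))      # ['ct', 'cat', 'caat', 'caaat']  zero or more "a"
print(matching(r"ca+t"))      # ['cat', 'caat', 'caaat']        one or more "a"
print(matching(r"ca?t"))      # ['ct', 'cat']                   zero or one "a"
print(matching(r"ca{2}t"))    # ['caat']                        exactly two "a"
print(matching(r"ca{2,3}t"))  # ['caat', 'caaat']               two to three "a"
```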
Code for Example 10:
Output:
enter any register number17bce1124
Matched |
enter any register number09efg1234
Not matched |
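The listing for Example 10 is not reproduced above. The sketch below spells the pattern out one character class at a time, without braces. The sample output rejects 09efg1234, so this sketch assumes that the first digit must be 1 to 9 and that the letters are lowercase; those restrictions, and the names PATTERN_LONG and check_register_number, are guesses made for illustration, not the original code.

```python
import re

# Two digits (the first assumed to be non-zero), three lowercase letters,
# four digits, written without repetition operators.
PATTERN_LONG = r"^[1-9][0-9][a-z][a-z][a-z][0-9][0-9][0-9][0-9]$"


def check_register_number(reg_no):
    """Return 'Matched' or 'Not matched' for a register number string."""
    return "Matched" if re.search(PATTERN_LONG, reg_no) else "Not matched"


for reg_no in ["15bec1032", "17bce1124", "09efg1234"]:
    print(reg_no, check_register_number(reg_no))
```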
Since {m} indicates that the pattern before the braces must occur m times, we can rewrite Example 10 as shown below:
Example 11: Observe the changes compared with Example 10.
Output:
enter any register number17bce1124
Matched |
enter any register number09efg1234
Not matched |
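The listing for Example 11 is not reproduced above either. A brace-based version of the register number check might look like the sketch below. As before, the restriction of the first digit to 1 to 9 is an assumption inferred from the sample output, and PATTERN_BRACES and check_register_number_braces are hypothetical names.

```python
import re

# {3} and {4} replace the repeated character classes of the long form.
PATTERN_BRACES = r"^[1-9][0-9][a-z]{3}[0-9]{4}$"


def check_register_number_braces(reg_no):
    """Return 'Matched' or 'Not matched' for a register number string."""
    return "Matched" if re.search(PATTERN_BRACES, reg_no) else "Not matched"


for reg_no in ["17bce1124", "09efg1234"]:
    print(reg_no, check_register_number_braces(reg_no))
```

The pattern is shorter, and the counts are visible at a glance instead of having to be counted by eye.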
Example 12:
Output:
enter a mobile number9985327199
Valid |
enter a mobile numberZ985327199
Valid |
We can observe a bug in the second test case: the pattern also accepts a letter, here “Z”, as the first character. Such a mobile number should give the result “Invalid”. The cause is that [^0] matches any character other than 0, not only digits. Instead of [^0], use [1-9].
Example 13:
Output:
enter a mobile number9985327199
Valid |
enter a mobile numberZ985327199
Invalid |
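The listings for Examples 12 and 13 are not reproduced above. The two patterns below are assumptions reconstructed from the discussion of [^0] and [1-9]; BUGGY, FIXED, and check_mobile are hypothetical names used only in this sketch.

```python
import re

# [^0] means "any character except 0", so letters and symbols slip through.
BUGGY = r"^[^0][0-9]{9}$"
# [1-9] allows only the digits 1 to 9 in the first position.
FIXED = r"^[1-9][0-9]{9}$"


def check_mobile(pattern, number):
    """Return 'Valid' or 'Invalid' for number under the given pattern."""
    return "Valid" if re.search(pattern, number) else "Invalid"


for number in ["9985327199", "Z985327199", "0440446845"]:
    print(number, check_mobile(BUGGY, number), check_mobile(FIXED, number))
```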
Example 14: Program to check whether the given PAN card number is valid or not.
Output:
enter any pan card numberAPKPM0642K
Your card APKPM0642K is valid |
enter any pan card number%12352d5ab
No Special Symbols allowed |
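The listing for Example 14 is not reproduced above, and the text does not state the validity rule. The sketch below assumes the usual PAN layout of five uppercase letters, four digits, and one uppercase letter. The names PAN_PATTERN and check_pan, and the fallback message "Invalid PAN card number", are hypothetical.

```python
import re

# Assumed layout: 5 uppercase letters, 4 digits, 1 uppercase letter.
PAN_PATTERN = r"[A-Z]{5}[0-9]{4}[A-Z]"


def check_pan(pan):
    """Return a message describing whether pan looks like a valid PAN."""
    if re.fullmatch(PAN_PATTERN, pan):
        return "Your card " + pan + " is valid"
    # Anything other than letters and digits counts as a special symbol.
    if re.search(r"[^A-Za-z0-9]", pan):
        return "No Special Symbols allowed"
    return "Invalid PAN card number"


for pan in ["APKPM0642K", "%12352d5ab"]:
    print(check_pan(pan))
```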
Example 15: We have a phone list of the Simpsons, yes, the famous Simpsons from the American animated TV series. There are some people with the surname Neu. We are looking for a Neu, but we don’t know the first name; we just know that it starts with a J. Let’s write a Python script that finds all the lines of the phone book containing a person with this surname and a first name starting with J. This example assumes that you know how to read and work with files.
Make sure the following file is available in your directory. If not, create a file with the name simpsons_phone_book.txt with the following content.
Now,
Output:
Jack Neu 555-7666
Jeb Neu 555-5543
Jennifer Neu 555-3652 |
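The script and the file contents for Example 15 are not reproduced above. A minimal sketch of the search might look like this. The function name find_j_neu is hypothetical, and sample_lines contains the three entries from the output plus two made-up entries, which stand in for the real file.

```python
import re


def find_j_neu(lines):
    """Return the lines containing a 'J' followed later by 'Neu'."""
    # "J.*Neu": a J, then any characters, then the surname Neu.
    return [line.rstrip() for line in lines if re.search(r"J.*Neu", line)]


# Stand-in for the phone book; the first and last entries are invented.
sample_lines = [
    "Allison Neu 555-8396\n",
    "Jack Neu 555-7666\n",
    "Jeb Neu 555-5543\n",
    "Jennifer Neu 555-3652\n",
    "Homer Simpson 555-0000\n",
]

for entry in find_j_neu(sample_lines):
    print(entry)

# With the real file, the same function can read the open file object:
# with open("simpsons_phone_book.txt") as fh:
#     for entry in find_j_neu(fh):
#         print(entry)
```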
Instead of downloading simpsons_phone_book.txt, we can use the file directly from the website by using urlopen from the module urllib.request:
Output:
Jack Neu 555-7666
Jeb Neu 555-5543
Jennifer Neu 555-3652 |
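The urlopen listing is not reproduced above, and the address of the file is not given. The main difference from reading a local file is that urlopen yields bytes, which must be decoded before a string pattern can be applied. The sketch below assumes UTF-8 text; the function names are hypothetical, and url is a placeholder for wherever the phone book is hosted.

```python
import re
from urllib.request import urlopen


def find_j_neu_in_bytes(byte_lines, encoding="utf-8"):
    """Decode each bytes line and keep those matching 'J.*Neu'."""
    found = []
    for raw_line in byte_lines:
        line = raw_line.decode(encoding).rstrip()
        if re.search(r"J.*Neu", line):
            found.append(line)
    return found


def find_j_neu_at(url):
    """Fetch the phone book from url (a placeholder) and search it."""
    # The response object can be iterated line by line, yielding bytes.
    with urlopen(url) as response:
        return find_j_neu_in_bytes(response)


# Offline check with bytes lines standing in for the downloaded file.
print(find_j_neu_in_bytes([b"Jack Neu 555-7666\n", b"Homer Simpson 555-0000\n"]))
```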