Example HTML page

Representation of regular expressions

Regular expressions are represented as normal strings.

There is a small problem with this: a normal string may contain escape sequences. Take the string "hi this is \nepal" as an example. Python reads the "\n" as a newline character, not as a backslash followed by the letter "n".

The best way to overcome this problem is to mark regular expressions as raw strings, for example r"hi this is \nepal". In a raw string the backslash is kept as an ordinary character. If you want a literal backslash in a normal string, put another backslash before it ("\\n"). A list of examples is given below for practice.
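
The difference between a normal and a raw string literal can be checked directly. The following is a small sketch; the variable names are chosen only for illustration.

```python
import re

normal = "hi this is \nepal"   # "\n" is turned into one newline character
raw = r"hi this is \nepal"     # backslash and "n" stay as two separate characters

# The raw string is exactly one character longer than the normal one.
print(len(normal), len(raw))

# "\\d+" and r"\d+" are the same pattern; the raw form is easier to read.
same_pattern = "\\d+" == r"\d+"
match = re.search(r"\d+", "room 42")
print(same_pattern, match.group())
```

Writing every pattern as a raw string avoids having to remember which backslash sequences Python itself would interpret.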

Sample Program: Search a string in another string:

>>> import re
>>> x = re.search("cat", "A cat and a rat can't be friends")
>>> print(x)
<re.Match object; span=(2, 5), match='cat'>
>>> x = re.search("cow", "A cat and a rat can't be friends")
>>> print(x)
None
>>> text = "A cat and a rat can't be friends"
>>> s = re.search("cat", text)
>>> print(s)
<re.Match object; span=(2, 5), match='cat'>
>>> t = re.search("cow", text)
>>> print(t)
None

We need to import the "re" module to get the above output. We have used the search method of the "re" module.

We use the following Syntax:

re.search(expr,s)

It checks a string s for an occurrence of a substring which matches the regular expression expr. The first such substring (from the left) is returned as a match object. We can use the result in conditional statements: if the regular expression matches, we get a match object, which is taken as a True value; if it doesn't match, the return value is None, which is taken as False:

>>> text = "A cat and a rat can't be friends"
>>> if re.search("cat", text):
...     print("cat found")
... else:
...     print("cat not found")
...
cat found
>>> if re.search("cow", text):
...     print("cow found")
... else:
...     print("cow not found")
...
cow not found

Task:

Write Python code to check whether a given mobile number is valid. The conditions to be satisfied for a mobile number are:

a) Number of characters must be 10

b) All characters must be digits, and the number must not begin with a ‘0’.

Input: A string representing a mobile number.
Processing: Take the string character by character and check whether it is valid.
Output: Print valid or invalid.

Test Case 1:

  • abc8967891
  • Invalid
  • Alphabets are not allowed

Test Case 2:

  • 440446845
  • Invalid
  • Only 9 digits

Test Case 3:

  • 0440446845
  • Invalid
  • Should not begin with a zero

Test Case 4:

  • 8440446845
  • Valid
  • All conditions satisfied

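Example 1: Python Code to check the validity of a mobile number (Long Code)

import sys

number = input("enter the phone number ")
# Condition a): exactly 10 characters.
if len(number) != 10:
    print('Invalid')
    sys.exit(0)
# Condition b): must not begin with a zero.
if number[0] == '0':
    print('Invalid')
    sys.exit(0)
for ch in number:
    # Reject anything that is not a digit (letters, spaces, symbols).
    if ch not in '0123456789':
        print('Invalid')
        break
else:
    # The loop ended without a break, so every character is a digit.
    print('Valid')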

Output:

enter the phone number 9985327199

Valid

  • Regexes are strings containing text and special characters that describe a pattern with which to recognize multiple strings.
  • Regexes without special characters match only themselves:

Regex Pattern    String(s) Matched
foo              foo
Python           Python
abc123           abc123

  • These are simple expressions that each match a single string.
  • The power of regular expressions comes in when special characters are used to define character sets, subgroup matching, and pattern repetition.

Special Symbols and Characters:

Except for the control characters, (+ ? . * ^ $ ( ) [ ] { } | \), all characters match themselves. You can escape a control character by preceding it with a backslash.
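
Escaping can be tried out with a short sketch like the one below. The sample text is made up for illustration; re.escape is the standard library helper that adds the backslashes automatically.

```python
import re

text = "Total: 3+4 (approx.)"

# Unescaped, "+" is a repetition operator: the pattern 3+4 means
# "one or more 3s followed by a 4", which does not occur in the text.
unescaped = re.search("3+4", text)
print(unescaped)

# Escaped with a backslash, "+" is matched literally.
literal = re.search(r"3\+4", text)
print(literal.group())

# re.escape() puts a backslash in front of every control character.
escaped = re.escape("(approx.)")
print(re.search(escaped, text).group())
```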

The following table lists a few of the regular expression symbols:

Pattern     Description
^           Matches the beginning of the string (of each line in multiline mode).
$           Matches the end of the string (of each line in multiline mode).
.           Matches any single character except newline. The s option (re.DOTALL) allows it to match newline as well.
[…]         Matches any single character in brackets.
[^…]        Matches any single character not in brackets.
re*         Matches 0 or more occurrences of the preceding expression.
re+         Matches 1 or more occurrences of the preceding expression.
re?         Matches 0 or 1 occurrence of the preceding expression.
re{n}       Matches exactly n occurrences of the preceding expression.
re{n,}      Matches n or more occurrences of the preceding expression.
re{n,m}     Matches at least n and at most m occurrences of the preceding expression.
a|b         Matches either a or b.
(re)        Groups regular expressions and remembers matched text.
\w          Matches word characters (letters, digits and underscore).
\W          Matches nonword characters.
\s          Matches whitespace. Equivalent to [ \t\n\r\f\v].
\S          Matches non-whitespace.
\d          Matches digits. Equivalent to [0-9] for ASCII text.
\D          Matches nondigits.
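
A few rows of the table can be checked with re.findall, which returns every non-overlapping match as a list. This is only a sketch; the sample string is invented for the purpose.

```python
import re

sample = "Room 42,\tfloor_3\nNext line"

digits = re.findall(r"\d", sample)    # every single digit
words = re.findall(r"\w+", sample)    # runs of letters, digits and underscore
spaces = re.findall(r"\s", sample)    # space, tab and newline characters

# "." stops at the newline unless the re.DOTALL flag is given.
first_line = re.match(r".*", sample).group()
everything = re.match(r".*", sample, re.DOTALL).group()

print(digits)
print(words)
print(spaces)
print(repr(first_line))
print(everything == sample)
```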

Example 2:

import re
if re.match("f.o","fooo"):
    print("Matched")
else:
    print("Not Matched")

Output:

Matched

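Reason: re.match only checks whether the pattern ‘f.o’ matches at the start of the string. The first three characters ‘foo’ satisfy it, and the remaining ‘o’ is ignored.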

Example 3:

import re
if re.match("f.o$","fooo"):
    print("Matched")
else:
    print("Not matched")

Output:

Not matched

Reason: With ‘$’ at the end, the entire string must start with ‘f’, end with ‘o’ and contain exactly one character in between. ‘fooo’ has four characters, so it does not match.

Example 4:

import re
if re.match("..","fooo"):
    print("Matched")
else:
    print("Not matched")

Output:

Matched

Reason: Two dots match any pair of characters, here the first two characters of the string.

Example 5:

import re
if re.match("..$","fooo"):
    print("Matched")
else:
    print("Not matched")

Output:

Not matched

Reason: Including a ‘$’ at the end means the pattern matches only strings of length 2, and ‘fooo’ has four characters.

Example 6:

import re
if re.match(".end","bend"):
    print("Matched")
else:
    print("Not matched")

Output:

Matched

Reason: The ‘.’ in the expression matches any character, here the ‘b’.

Now remove the “.” from the regular expression and check the output:

import re
if re.match("end","bend"):
    print("Matched")
else:
    print("Not matched")

Output:

Not matched

Reason: The match function only looks for the pattern at the start of the string, and ‘bend’ starts with ‘b’, not with ‘end’. If we use the “search” function of the “re” module instead, we get a different output. See Example 7.

Example 7:

import re
if re.search("end","bend"):
    print("Matched")
else:
    print("Not matched")

Output:

Matched

Reason: We used the search function instead of the match function. search scans the whole string and finds ‘end’ starting at the second character.

Example 8:

import re
if re.match(".emd","bends"):
    print("Matched")
else:
    print("Not matched")

Output:

Not matched

Reason: The pattern contains ‘emd’, which does not occur in ‘bends’.

Example 9:

import re
if re.match(".end$","bends"):
    print("Matched")
else:
    print("Not matched")


Output:

Not matched

Reason: The ‘$’ requires the string to end right after ‘end’, but ‘bends’ has a trailing ‘s’.

Matching from the Beginning or End of Strings or Word Boundaries (^, $)

^ – Match beginning of string

$ – Match End of string.

Regex Pattern     Strings Matched
^From             Any string that starts with From
/bin/tcsh$        Any string that ends with /bin/tcsh
^Subject: hi$     Any string consisting solely of the string Subject: hi

If you wanted to match any string that ended with a dollar sign, one possible regex solution would be the pattern .*\$$.
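
The anchor patterns from the table, including the dollar sign case, can be tried on a handful of strings. The sample strings below are invented for illustration.

```python
import re

samples = ["From: alice", "Mail From bob", "shell=/bin/tcsh", "price in $", "$5 only"]

# ^From matches only at the beginning of the string.
starts_with_from = [s for s in samples if re.search(r"^From", s)]

# /bin/tcsh$ matches only at the end of the string.
ends_with_tcsh = [s for s in samples if re.search(r"/bin/tcsh$", s)]

# In .*\$$ the first "$" is escaped (a literal dollar sign),
# the second "$" is the end-of-string anchor.
ends_with_dollar = [s for s in samples if re.search(r".*\$$", s)]

print(starts_with_from)
print(ends_with_tcsh)
print(ends_with_dollar)
```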

Example 10:

Check whether the given register number of a VIT student is valid or not.

Example register number – 15bec1032

A register number is valid if it has two digits, followed by three letters, followed by four digits.

Denoting Ranges (-) and Negation (^)

  • Brackets also support ranges of characters.
  • A hyphen between a pair of symbols enclosed in brackets is used to indicate a range of characters, for example A–Z, a–z, or 0–9 for uppercase letters, lowercase letters, and numeric digits, respectively.

Regex Pattern       Strings Matched
z.[0-9]             "z" followed by any character, then followed by a single digit
[r-u][env-y][us]    "r", "s", "t", or "u", followed by "e", "n", "v", "w", "x", or "y", followed by "u" or "s"
[^aeiou]            A non-vowel character (Exercise: why do we say "non-vowels" rather than "consonants"?)
[^\t\n]             Not a TAB or \n
["-a]               In an ASCII system, all characters that fall between '"' and "a", that is, between ordinals 34 and 97
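
The range and negation patterns above can be explored with a short sketch. The test words are arbitrary and chosen only to show one match and one non-match per pattern.

```python
import re

# z.[0-9]: "z", then any one character, then a single digit.
z_match = re.search(r"z.[0-9]", "size: z-7")
print(z_match.group())

# [r-u][env-y][us]: one character from each of the three sets.
candidates = ["sys", "run", "tnu", "rex"]
three_letters = [w for w in candidates if re.fullmatch(r"[r-u][env-y][us]", w)]
print(three_letters)

# [^aeiou] matches anything that is not a lowercase vowel,
# which includes spaces and digits as well as consonants.
non_vowels = re.findall(r"[^aeiou]", "regex 101")
print(non_vowels)
```

The last result is a hint for the exercise in the table: a negated set excludes only the listed characters, so everything else qualifies.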

Multiple Occurrence/Repetition Using Closure Operators (*, +, ?, {})

  • The special symbols *, +, and ? can be used to match single, multiple, or no occurrences of string patterns.
  • Asterisk or star operator (*): matches zero or more occurrences of the regex immediately to its left.
  • Plus operator (+): matches one or more occurrences of the regex immediately to its left.
  • Question mark operator (?): matches exactly 0 or 1 occurrences of the regex immediately to its left.
  • There are also brace operators ({}) with either a single value or a comma-separated pair of values. These indicate a match of exactly N occurrences (for {N}) or a range of occurrences; for example, {M,N} will match from M to N occurrences.
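
The closure operators can be compared side by side with a small helper. This is a sketch: the helper name full is made up here, and it uses re.fullmatch so that the whole string has to fit the pattern.

```python
import re

def full(pattern, text):
    """Return True if the whole text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

print(full(r"ab*", "a"), full(r"ab*", "abbb"))                  # * : zero or more b
print(full(r"ab+", "a"), full(r"ab+", "abbb"))                  # + : one or more b
print(full(r"ab?", "ab"), full(r"ab?", "abb"))                  # ? : zero or one b
print(full(r"[0-9]{3}", "123"), full(r"[0-9]{3}", "12"))        # {3}: exactly three
print(full(r"[0-9]{2,4}", "1234"), full(r"[0-9]{2,4}", "12345"))  # {2,4}: two to four
```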

Code for Example 10:

import re
reg_num=input("enter any register number")
if re.match("^[1-9][0-9][a-zA-Z][a-zA-Z][a-zA-Z][0-9][0-9][0-9][0-9]$",reg_num):
    print("Matched")
else:
    print("Not matched")

Output:

enter any register number17bce1124

Matched

 

enter any register number09efg1234

Not matched

Since {m} indicates that the pattern before the braces should occur m times, we can rewrite Example 10 as shown below:

Example 11: Observe the changes compared with Example 10.

import re
reg_num=input("enter any register number")
if re.match("^[1-9][0-9][a-zA-Z]{3}[0-9]{4}$",reg_num):
    print("Matched")
else:
    print("Not matched")

Output:

enter any register number17bce1124

Matched

 

enter any register number09efg1234

Not matched

Example 12:

import re
number=input("enter a mobile number")
if re.match("[^0][0-9]{9}",number):
    print("Valid")
else:
    print("Invalid")

Output:

enter a mobile number9985327199

Valid

 

 

enter a mobile numberZ985327199

Valid

The second run reveals a bug: [^0] matches any character other than “0”, so the letter “Z” was accepted as the first character. Such a mobile number should give the result “Invalid”. Instead of [^0], use [1-9].

Example 13:

import re
number=input("enter a mobile number")
# [1-9] rejects a leading zero or letter; $ rejects anything after the tenth digit.
if re.match("[1-9][0-9]{9}$",number):
    print("Valid")
else:
    print("Invalid")


Output:

enter a mobile number9985327199

Valid

 

enter a mobile numberZ985327199

Invalid
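
The four test cases from the task can be run against the [1-9] pattern in one go. The sketch below wraps the check in a helper whose name, is_valid_mobile, is made up for this example. It uses re.fullmatch, which requires the whole string to fit the pattern, so a number with extra trailing digits is rejected as well (the last candidate is an added case, not one of the original four).

```python
import re

def is_valid_mobile(number):
    """True if number is exactly 10 digits and does not start with 0."""
    return re.fullmatch(r"[1-9][0-9]{9}", number) is not None

candidates = ["abc8967891", "440446845", "0440446845", "8440446845", "84404468451"]
for candidate in candidates:
    print(candidate, "Valid" if is_valid_mobile(candidate) else "Invalid")
```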

Example 14: Program to check whether the given PAN card number is valid or not.

import re
pan=input("enter any pan card number")

if len(pan) != 10:
    # A PAN card number has exactly 10 characters.
    print("PAN number should have 10 characters")
elif re.search("[^a-zA-Z0-9]",pan):
    # Anything other than a letter or a digit is a special symbol.
    print("No Special Symbols allowed")
elif re.search("[0-9]",pan[0:5]):
    # The first five characters must be letters.
    print("Invalid 1")
elif re.search("[A-Za-z]",pan[5:9]):
    # Characters six to nine must be digits.
    print("Invalid 2")
elif re.search("[0-9]",pan[-1]):
    # The last character must be a letter.
    print("Invalid 3")
else:
    print("Your card "+pan+" is valid")

Output:

enter any pan card numberAPKPM0642K

Your card APKPM0642K is valid

 

enter any pan card number%12352d5ab

No Special Symbols allowed
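
The same checks can be folded into a single pattern using the brace operator. This is a sketch that assumes the format implied by the program above (five letters, four digits, one letter); the names PAN_PATTERN and is_valid_pan are invented here, and real PAN rules may be stricter. Unlike the step-by-step version, it does not say which rule was broken.

```python
import re

# Five letters, four digits, one letter: the layout the checks above enforce.
PAN_PATTERN = re.compile(r"[A-Za-z]{5}[0-9]{4}[A-Za-z]")

def is_valid_pan(pan):
    """True if the whole string has the assumed PAN layout."""
    return PAN_PATTERN.fullmatch(pan) is not None

for pan in ["APKPM0642K", "%12352d5ab", "APKPM0642", "APKP10642K"]:
    print(pan, "valid" if is_valid_pan(pan) else "invalid")
```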

Example 15: We have a phone list of the Simpsons, yes, the famous Simpsons from the American animated TV series. There are some people with the surname Neu. We are looking for a Neu, but we don’t know the first name; we just know that it starts with a J. Let’s write a Python script which finds all the lines of the phone book that contain a person with the described surname and a first name starting with J. This example assumes you know how to read and work with files.

Make sure the following file is available in your directory. If not, create a file with the name simpsons_phone_book.txt with the following content.

Allison Neu 555-8396
Bob Newhall 555-4344
C. Montgomery Burns 555-0001
C. Montgomery Burns 555-0113
Canine College 555-7201
Canine Therapy Institute 555-2849
Cathy Neu 555-2362
City of New York Parking Violation Bureau 555-BOOT
Dr. Julius Hibbert 555-3642
Dr. Nick Riviera 555-NICK
Earn Cash For Your Teeth 555-6312
Family Therapy Center 555-HUGS
Homer Jay Simpson (Plow King episode) 555-3223
Homer Jay Simpson (work) 555-7334
Jack Neu 555-7666
Jeb Neu 555-5543
Jennifer Neu 555-3652
Ken Neu 555-8752
Lionel Putz 555-5299
MAD Magazine 555-8628
Marital Street Hotline 555-1680
Marvin Monroe 555-3700
Marvin Monroe's radio therapy show 555-PAIN
Moe Szyslak (phone number spells SMITHERS) 7648-4377
Moe Szyslak 555-0000
Moe's Tavern 555-1239
Mr. Plow 555-3226
NY Metro 555-5680
Ned Flanders 555-8904
New York Parking Violation Bureau 555-2668
ORB 	Dr Nick's "B"argain Medical Services 1-800-DOCT
Original Famous Ray's Pizza 555-PIZA
Otto's "How's my Driving" 555-8821
Plow King 555-4796
Pretzel Wagon 555-3226
Prof John Frink's Lab 555-5782
Radio Psychaiatrist 555-7246
Reverend Timothy Lovejoy 555-6542
Richard Nash 555-9996
Richard Newhall 555-9973
Ruff-form Dog School 555-0078
Santitarium for Dogs 555-9716
Sleep-Eazy Motel 555-1000
Sugar Truck 555-3872
Susan Newhall 555-2362
The Nuclear Powerplant 555-5246
The Simpsons' residence 555-8707
The Simpsons, 742 Evergreen Terrace 555-0113
Toby Muntz 555-9972


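Now run the following script:

import re

# The with statement closes the file automatically.
with open("simpsons_phone_book.txt") as fh:
    for line in fh:
        # "J", then any characters, then "Neu" on the same line.
        if re.search(r"J.*Neu",line):
            print(line.rstrip())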

Output:

Jack Neu 555-7666

Jeb Neu 555-5543

Jennifer Neu 555-3652

Instead of downloading simpsons_phone_book.txt, we can use the file directly from the website by using urlopen from the module urllib.request:

import re

from urllib.request import urlopen
with urlopen('https://www.python-course.eu/simpsons_phone_book.txt') as fh:
    for line in fh:
        # line is a byte string so we transform it to utf-8:
        line = line.decode('utf-8').rstrip() 
        if re.search(r"J.*Neu",line):
            print(line)

Output:

Jack Neu 555-7666

Jeb Neu 555-5543

Jennifer Neu 555-3652
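
Note that J.*Neu accepts any line with a J somewhere before Neu, which works for this phone book but is fairly loose. The sketch below compares it with a stricter pattern on a few lines held in memory. The last entry is made up and is not in the original file, and the stricter pattern assumes the first name is the first word on the line and the surname follows directly.

```python
import re

lines = [
    "Allison Neu 555-8396",
    "Jack Neu 555-7666",
    "Jennifer Neu 555-3652",
    "Homer Jay Neumann 555-0199",  # hypothetical entry, for illustration only
]

# Loose: a "J" anywhere, later followed by "Neu".
loose = [entry for entry in lines if re.search(r"J.*Neu", entry)]

# Stricter: line starts with a word beginning with "J",
# followed by whitespace and the complete word "Neu".
strict = [entry for entry in lines if re.search(r"^J\w*\s+Neu\b", entry)]

print(loose)
print(strict)
```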

 

 
