String pattern matching (Regular Expressions):
This topic covers the theoretical aspects of regular expressions and shows how to use them in Python scripts. The term “regular expression”, sometimes shortened to regex or regexp, originated in theoretical computer science.
Regular expressions are used in programming languages to filter texts or text strings: it is possible to check whether a text or a string matches a regular expression. A great thing about regular expressions is that their core syntax is largely the same across programming and scripting languages, e.g. Python, Perl, Java, SED, AWK and even X#, although the individual dialects differ in some details.
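As a first impression of what such a check can look like in Python, here is a minimal sketch using the standard re module. The sample text and the two patterns are chosen only for illustration:

```python
import re

text = "welcome to vellore institute of technology"

# re.search scans the whole string and returns a match object,
# or None if the pattern does not occur anywhere in the string.
found = re.search(r"vel+ore", text)    # "l+" means: one or more "l"
missing = re.search(r"vel+ure", text)  # no such substring in text

print(found is not None)   # True
print(found.group())       # vellore
print(missing is None)     # True
```

Testing the result against None is the usual way to turn a search into a yes/no answer.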
For sequence data types, the "in" operator checks whether one string, e.g. "vellore", is a substring of another string, e.g. "welcome to vellore institute of technology":
>>> s = "welcome to vellore institute of technology"
>>> "vellore" in s
True
>>> "come" in s
True
>>> "f t" in s
True
>>> "f i" in s
False
>>> "lc" in s
True
>>> "lo" in s
True
>>> "oy" in s
False
How it works
The following diagrams show, step by step, how this matching is performed:
We check whether the string sub = "abc" is contained in the string s = "xaababcbcd".
By the way, the string sub = "abc" can be seen as a regular expression, just a very simple one.
First, we check whether the first positions of the two strings match, i.e. s[0] == sub[0].
This is not satisfied in our example. We mark this fact by the colour red:
Then we check whether s[1:4] == sub. This means that we first have to check whether sub[0] is equal to s[1]. This is true, and we mark it with the colour green. Then we have to compare the next positions. s[2] is not equal to sub[1], so we don’t have to proceed further with the next position of sub and s:
Now we have to check if s[2:5] and sub are equal. The first two positions are equal but not the third:
The following steps should be clear without any explanations:
Finally, we have a complete match with s[4:7] == sub:
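The steps above can be sketched as a short Python function. This is only an illustration of the naive position-by-position comparison described here; the name naive_find is made up for this example, and Python's own "in" operator may use a faster algorithm internally:

```python
import re

def naive_find(s, sub):
    """Return the index of the first occurrence of sub in s, or -1."""
    # Try every start position at which sub could still fit into s.
    for start in range(len(s) - len(sub) + 1):
        # Compare position by position; stop at the first mismatch.
        for offset, ch in enumerate(sub):
            if s[start + offset] != ch:
                break
        else:
            # The inner loop finished without a mismatch: complete match.
            return start
    return -1

s = "xaababcbcd"
sub = "abc"
pos = naive_find(s, sub)

print(pos)                            # 4
print(s[pos:pos + len(sub)] == sub)   # True
# Treating sub as a very simple regular expression gives the same position.
print(re.search(sub, s).span())       # (4, 7)
```

The start positions 0, 1, 2 and 3 fail exactly as in the steps above, and the search stops at position 4 with the match s[4:7].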