Regular_Expression _Matching

Let me continue….

Problem Statement:

Given an input string (s) and a pattern (p), implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

Note:

  • s could be empty and contains only lowercase letters a-z.
  • p could be empty and contains only lowercase letters a-z, and characters like . or *.

Example 1:

Input:
s = "aa"
p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".

Example 2:

Input:
s = "aa"
p = "a*"
Output: true
Explanation: '*' means zero or more of the precedeng element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".

Example 3:

Input:
s = "ab"
p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".

Example 4:

Input:
s = "aab"
p = "c*a*b"
Output: true
Explanation: c can be repeated 0 times, a can be repeated 1 time. Therefore it matches "aab".

Example 5:

Input:
s = "mississippi"
p = "mis*is*p*."
Output: false

Solution One: Recursion

To make things simple, the only rule we need to worry about is ‘.’ and ‘*’. The corresponding meaning for ‘.’ is obvious, which means that that dot corresponds to every character. However, the ‘*’ is more complex, it means how many times the preceding character repeats (note the preceding character can repeat 0 time, which means get rid of the preceding character).

First thing first, if we do not consider ‘*’ character, everything will be simple. We just need to compare each character respectively. Let’s write our recursive code to provide a solution for this (without ‘*’):

Step 1: Let’s assume we have a function which could tell us whether the two string s and p will correspond to each other:

bool isMatch(string s, string p);

Step 2: If we want s and p to match, which means the first element of the s and p should match, and the rest of the two strings should match as well. We can use isMatch function to calculate the rest of the substring:

(s[0] == p[0] || p[0] == ‘.’ )&& isMatch(s.substring(1), p.substring(1));

Step 3: Exit of the recurrence. As the sub string shrinks, we are approaching the exit of the recurrence. We need to handle that: if string p is empty and s is empty, we return true, else we return false.

if(p.empty()) return s.empty();

Then we are done with this problem, the full code is there:

Then we need to take care of the ‘*’.  Although the rule seems a little bit confusing, however, we do not need to change too much code.

We can see clearly from the rule description, if we encounter ‘*’, basically, there are two possible outcomes:

1. We skip two characters, which is the preceding character and ‘*’, if the preceding character from p does not match the corresponding character from s, so we totally get rid of the preceding character.

2. We repeat the preceding character (in p) multiple times to test whether there is a match with several same characters in s. We can consider this progress from another perspective, we do not repeat the character in p, while we skip character in s several times. For example: s = “aa”, p = “a*”, this pair is a good match, because we can repeat a twice in p.

The last thing we need to take care of is that we only consider ‘*’ if and only if size of p is greater or equal to 2. It make sense, because you need to guarantee that there are at least two characters in p when consider ‘*’, which are the preceding character and ‘*’.

Time complexity: A little bit complex, I believe it should be something exponential for the worst case. I am not sure about the average case.

Space complexity: same as above.

So the final code is here:


Solution Two: Dynamic Programming

OK, if you have got the idea from the above explanation, then we can go to dynamic programming part, let’s start from observation about the above code:

If we want to calculate the s[0 : length], we need to know the comparative result from s[1 : length];

If we want to calculate the s[1 : length], we need to know the comparative result from s[2 : length];

If we want to calculate the s[2 : length], we need to know the comparative result from s[3 : length];

……..

If we want to calculate the s[length – 1 : length], we need to know the comparative result from s[length : length];

We can easily know the comparative result s[length : length], since it is an empty string.

If we know the result of s[length : length], we can easily know s[length – 1 : length];

If we know the result of s[length – 1 : length], we can easily know s[length – 2 : length];

If we know the result of s[length – 2 : length], we can easily know s[length – 3 : length];

…..

If we know the result of s[2 : length], we can easily know s[1 : length];

If we know the result of s[1 : length], we can easily know s[0 : length];

Then we are done!

Basically, the whole progress is like a stack, we first keep pushing calculations to the stack, then we gradually pop results from the stack. We can skip the first half part by starting our loop from the last element to the first element.

Look at the picture below, we use a 2 dimensional array dp[i][j] to represent the matching results for substring s[i : ] and p[j : ]. For example, dp[2][2] represents the matching result of the orange part in this picture.

10_2

If we look at our calculation methods in an array graph, we can see things like the picture below:

TIM截图20180929234633

Each block in this 2d array represent the calculated result from our approach, we can see that s[i : ] and p[j : ] can be determined by either of these three conditions:

1. s[i : ] and p[j+2 : ]: This is the case that the preceding character in p does not match the corresponding character in s, so we skip two the preceding character and ‘*’;

2. s[i+1 : ] and p[j]: In this case, we are doing things like repeating the preceding element in p, it’s also equivalent to skip element in s;

3. s[i+1 : ] and p[j +1 : ]: This is the base case, it’s like if we do not encounter ‘*’, we compare the next pair of elements.

So, based on the analysis above, we are going to write our code. Before we look at the final code, there is still something we need to worry about.

How large our dp matrix will be. You will notice that we define our dp array as length + 1 size. The reason we do that is because when we calculate dp[len – 1][j], we need to know dp[len][j]. It will be better if we add one extra column to store the dp[len][j] information, same thing for rows.

Time complexity: O(SP), if S is the length of string s, and P is the length of string p.

Space complexity: O(SP)

Well, can we do better? If we take a close look at the dp value, we will notice that every time when we calculate dp[i][j], we only need two columns instead of the whole array. Look at the picture below:

ArrayNeed

Can you optimize the space complexity?

Time complexity: O(SP)

Space complexity: O(2*P), which is O(P)

Code is here:

Thanks for reading.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s