Let me continue….
Given an input string (
s) and a pattern (
p), implement regular expression matching with support for
'.' Matches any single character. '*' Matches zero or more of the preceding element.
The matching should cover the entire input string (not partial).
scould be empty and contains only lowercase letters
pcould be empty and contains only lowercase letters
a-z, and characters like
Input: s = "aa" p = "a" Output: false Explanation: "a" does not match the entire string "aa".
Input: s = "aa" p = "a*" Output: true Explanation: '*' means zero or more of the precedeng element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".
Input: s = "ab" p = ".*" Output: true Explanation: ".*" means "zero or more (*) of any character (.)".
Input: s = "aab" p = "c*a*b" Output: true Explanation: c can be repeated 0 times, a can be repeated 1 time. Therefore it matches "aab".
Input: s = "mississippi" p = "mis*is*p*." Output: false
Solution One: Recursion
To make things simple, the only rule we need to worry about is ‘.’ and ‘*’. The corresponding meaning for ‘.’ is obvious, which means that that dot corresponds to every character. However, the ‘*’ is more complex, it means how many times the preceding character repeats (note the preceding character can repeat 0 time, which means get rid of the preceding character).
First thing first, if we do not consider ‘*’ character, everything will be simple. We just need to compare each character respectively. Let’s write our recursive code to provide a solution for this (without ‘*’):
Step 1: Let’s assume we have a function which could tell us whether the two string s and p will correspond to each other:
bool isMatch(string s, string p);
Step 2: If we want s and p to match, which means the first element of the s and p should match, and the rest of the two strings should match as well. We can use isMatch function to calculate the rest of the substring:
(s == p || p == ‘.’ )&& isMatch(s.substring(1), p.substring(1));
Step 3: Exit of the recurrence. As the sub string shrinks, we are approaching the exit of the recurrence. We need to handle that: if string p is empty and s is empty, we return true, else we return false.
if(p.empty()) return s.empty();
Then we are done with this problem, the full code is there:
Then we need to take care of the ‘*’. Although the rule seems a little bit confusing, however, we do not need to change too much code.
We can see clearly from the rule description, if we encounter ‘*’, basically, there are two possible outcomes:
1. We skip two characters, which is the preceding character and ‘*’, if the preceding character from p does not match the corresponding character from s, so we totally get rid of the preceding character.
2. We repeat the preceding character (in p) multiple times to test whether there is a match with several same characters in s. We can consider this progress from another perspective, we do not repeat the character in p, while we skip character in s several times. For example: s = “aa”, p = “a*”, this pair is a good match, because we can repeat a twice in p.
The last thing we need to take care of is that we only consider ‘*’ if and only if size of p is greater or equal to 2. It make sense, because you need to guarantee that there are at least two characters in p when consider ‘*’, which are the preceding character and ‘*’.
Time complexity: A little bit complex, I believe it should be something exponential for the worst case. I am not sure about the average case.
Space complexity: same as above.
So the final code is here:
Solution Two: Dynamic Programming
OK, if you have got the idea from the above explanation, then we can go to dynamic programming part, let’s start from observation about the above code:
If we want to calculate the s[0 : length], we need to know the comparative result from s[1 : length];
If we want to calculate the s[1 : length], we need to know the comparative result from s[2 : length];
If we want to calculate the s[2 : length], we need to know the comparative result from s[3 : length];
If we want to calculate the s[length – 1 : length], we need to know the comparative result from s[length : length];
We can easily know the comparative result s[length : length], since it is an empty string.
If we know the result of s[length : length], we can easily know s[length – 1 : length];
If we know the result of s[length – 1 : length], we can easily know s[length – 2 : length];
If we know the result of s[length – 2 : length], we can easily know s[length – 3 : length];
If we know the result of s[2 : length], we can easily know s[1 : length];
If we know the result of s[1 : length], we can easily know s[0 : length];
Then we are done!
Basically, the whole progress is like a stack, we first keep pushing calculations to the stack, then we gradually pop results from the stack. We can skip the first half part by starting our loop from the last element to the first element.
Look at the picture below, we use a 2 dimensional array dp[i][j] to represent the matching results for substring s[i : ] and p[j : ]. For example, dp represents the matching result of the orange part in this picture.
If we look at our calculation methods in an array graph, we can see things like the picture below:
Each block in this 2d array represent the calculated result from our approach, we can see that s[i : ] and p[j : ] can be determined by either of these three conditions:
1. s[i : ] and p[j+2 : ]: This is the case that the preceding character in p does not match the corresponding character in s, so we skip two the preceding character and ‘*’;
2. s[i+1 : ] and p[j]: In this case, we are doing things like repeating the preceding element in p, it’s also equivalent to skip element in s;
3. s[i+1 : ] and p[j +1 : ]: This is the base case, it’s like if we do not encounter ‘*’, we compare the next pair of elements.
So, based on the analysis above, we are going to write our code. Before we look at the final code, there is still something we need to worry about.
How large our dp matrix will be. You will notice that we define our dp array as length + 1 size. The reason we do that is because when we calculate dp[len – 1][j], we need to know dp[len][j]. It will be better if we add one extra column to store the dp[len][j] information, same thing for rows.
Time complexity: O(SP), if S is the length of string s, and P is the length of string p.
Space complexity: O(SP)
Well, can we do better? If we take a close look at the dp value, we will notice that every time when we calculate dp[i][j], we only need two columns instead of the whole array. Look at the picture below:
Can you optimize the space complexity?
Time complexity: O(SP)
Space complexity: O(2*P), which is O(P)
Code is here:
Thanks for reading.