Word Break I && II

Let’s continue…

Problem Statement:

Word Break I

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

Note:

  • The same word in the dictionary may be reused multiple times in the segmentation.
  • You may assume the dictionary does not contain duplicate words.

Example 1:

Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true
Explanation: Return true because "leetcode" can be segmented as "leet code".

Example 2:

Input: s = "applepenapple", wordDict = ["apple", "pen"]
Output: true
Explanation: Return true because "applepenapple" can be segmented as "apple pen apple".   Note that you are allowed to reuse a dictionary word.

Example 3:

Input: s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"]
Output: false

Solution One – Recursion:

This is a typical problem related to recursion. Since we want to know whether a string can be segmented as space separated words, we can solve it by iterating and adding each character to a tempString, if we can find the tempString in the dictionary, then we search for the next valid tempString. The code is pretty self-explanatory. Let’s go to the code:

Time Complexity: O(2^n) The reason is: T(n) = T(n-1) + T(n-2) + …. + T(0) and T(n+1) = T(n) + T(n-1) + … + T(0) => T(n+1) = 2 * T(n) = 2^2 * T(n-1) = … = 2^(n+1)*T(0)

Space Complexity: O(n)


Solution Two – Recursion + Memorization:

As you can see, the solution one is so slow. A typical way to improve the efficiency for recursion problem is to cache the intermediate results so we do not need to recalculate them again and again. If you draw a recursive tree, you can clearly see the redundant calculation. For more details about recursive tree, please see here. For this problem, we create a vector called memo, which has n + 1 slots (n is the number of characters in the string). The memo[i] means whether the sub string from position i to the end can be segmented with the words from word dictionary. So each time when we recursively call the function, we can check whether memo[i] has a valid value, we can return memo[i] if it has already been set instead of doing the repetitive calculation.

Let’s go to the code:

Time Complexity: O(n^2) We have to iterate a for loop for each recursive call, which results in O(n^2) complexity.

Space Complexity: O(n)


Solution Three – Dynamic Programming:

If we can solve one problem using recursion + memorization, then we can do it with dynamic programming (solve it iteratively). For this problem, the dynamic programming solution is pretty straightforward, we create an array to keep track of whether sub string from position i to the end can be segmented with the words from word dictionary. We iterate from the end to the beginning, and each time we add one character to the temp string, we then check whether the temp string is in the word dictionary. Note dp[i] has different meanings during the inner loop iteration and after the inner loop iteration. For this problem, I think recursion + memorization should a little bit more efficient, because the recursion solution (actually depth first search) will immediate return when it finds a valid solution, while dynamic programming has to finish filling all the results in the array.

Time Complexity: O(n^2)

Space Complexity: O(n)


Solution Four – Breadth First Search:

We can also do a breadth first search. In this approach, we maintain a queue to keep track of the index we need to visit for the next iteration and an unordered set to store the index that we have already visited so far.  As long as the queue is not empty, we continue our iteration. Since we can search the element in the set with O(1) average time complexity, we can check the visited elements with relatively cheap cost. If we find that the element has already been visited, we skip the inner for loop.

In the inner for loop, we check the sub string one by one, starting from next (which is the end index + 1 of the last valid sub string), until we find a following sub string which can be found in the word dictionary. Let’s go to the code:

Time Complexity: O(n^2)

Space Complexity: O(n)

 



Word Break II

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, add spaces in s to construct a sentence where each word is a valid dictionary word. Return all such possible sentences.

Note:

  • The same word in the dictionary may be reused multiple times in the segmentation.
  • You may assume the dictionary does not contain duplicate words.

Example 1:

Input:
s = "catsanddog" wordDict = ["cat", "cats", "and", "sand", "dog"] Output: [
  "cats and dog",
  "cat sand dog"
]

Example 2:

Input:
s = "pineapplepenapple"
wordDict = ["apple", "pen", "applepen", "pine", "pineapple"]
Output:
[
  "pine apple pen apple",
  "pineapple pen apple",
  "pine applepen apple"
]
Explanation: Note that you are allowed to reuse a dictionary word.

Example 3:

Input:
s = "catsandog"
wordDict = ["cats", "dog", "sand", "and", "cat"]
Output:
[]

Solution – DFS + DP:

Similar to the problem above. However, this time, we are required to store all the valid results to an array. I am going to provide a optimized solution from this post (dong wang).

In this solution, we divide the problem into two parts: the first part is that we apply dynamic programming technique to determine whether the string s can be segmented to dictionary words. If the result is false, we can simply return null array; The second part is that we use dfs to build our results if we find the s can be segmented.

The code is pretty self-explanatory, a very brilliant idea is that we keep track of the length of both shortest element and longest element in the word dictionary. So we can optimize our iteration a little bit. Each time, instead of adding one character to the temp string, we search between the gap of minimum length and maximum length because sub strings who are less than the minimum length or greater than the maximum length are guaranteed not to find in the word dictionary. With this optimization, we can skip several redundant search if not the worst case.

Time Complexity: O(n^2)

Space Complexity: O(n)

That’s all for today, thanks for reading…

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s