# Word Break I && II

Let’s continue…

Problem Statement:

Word Break I

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

Note:

• The same word in the dictionary may be reused multiple times in the segmentation.
• You may assume the dictionary does not contain duplicate words.

Example 1:

```Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true
Explanation: Return true because `"leetcode"` can be segmented as `"leet code"`.```

Example 2:

```Input: s = "applepenapple", wordDict = ["apple", "pen"]
Output: true
Explanation: Return true because `"`applepenapple`"` can be segmented as `"`apple pen apple`"`.   Note that you are allowed to reuse a dictionary word.```

Example 3:

```Input: s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"]
Output: false
```

Solution One – Recursion:

This is a typical problem related to recursion. Since we want to know whether a string can be segmented as space separated words, we can solve it by iterating and adding each character to a tempString, if we can find the tempString in the dictionary, then we search for the next valid tempString. The code is pretty self-explanatory. Let’s go to the code:

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

 class Solution { public: bool wordBreak(string s, vector& wordDict) { if(wordDict.empty()) return false; unordered_set dict(wordDict.begin(), wordDict.end()); int len = wordDict.size(); return dfs(0, s, dict); } bool dfs(int start, string& s, unordered_set& dict){ if(start == s.size()) return 1; string subS; for(int i = start; i < s.size(); i++){ subS += s[i]; if(dict.count(subS) != 0 && dfs(i+1,s,dict)){ return 1; } } return 0; } };

Time Complexity: O(2^n) The reason is: T(n) = T(n-1) + T(n-2) + …. + T(0) and T(n+1) = T(n) + T(n-1) + … + T(0) => T(n+1) = 2 * T(n) = 2^2 * T(n-1) = … = 2^(n+1)*T(0)

Space Complexity: O(n)

Solution Two – Recursion + Memorization:

As you can see, the solution one is so slow. A typical way to improve the efficiency for recursion problem is to cache the intermediate results so we do not need to recalculate them again and again. If you draw a recursive tree, you can clearly see the redundant calculation. For more details about recursive tree, please see here. For this problem, we create a vector called memo, which has n + 1 slots (n is the number of characters in the string). The memo[i] means whether the sub string from position i to the end can be segmented with the words from word dictionary. So each time when we recursively call the function, we can check whether memo[i] has a valid value, we can return memo[i] if it has already been set instead of doing the repetitive calculation.

Let’s go to the code:

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

 class Solution { public: bool wordBreak(string s, vector& wordDict) { if(wordDict.empty()) return false; unordered_set dict(wordDict.begin(), wordDict.end()); int len = wordDict.size(); vector memo(s.size(), –1); return dfs(0, s, memo, dict); } //Here memo[i] record the result of substring[0…i] and substring[i+1… len] bool dfs(int start, string& s, vector& memo, unordered_set& dict){ if(start == s.size()) return 1; if(memo[start] != –1) return memo[start]; string subS; for(int i = start; i < s.size(); i++){ subS += s[i]; if(dict.count(subS) != 0 && dfs(i+1,s,memo,dict)){ return memo[start] = 1; } } return memo[start] = 0; } };

Time Complexity: O(n^2) We have to iterate a for loop for each recursive call, which results in O(n^2) complexity.

Space Complexity: O(n)

Solution Three – Dynamic Programming:

If we can solve one problem using recursion + memorization, then we can do it with dynamic programming (solve it iteratively). For this problem, the dynamic programming solution is pretty straightforward, we create an array to keep track of whether sub string from position i to the end can be segmented with the words from word dictionary. We iterate from the end to the beginning, and each time we add one character to the temp string, we then check whether the temp string is in the word dictionary. Note dp[i] has different meanings during the inner loop iteration and after the inner loop iteration. For this problem, I think recursion + memorization should a little bit more efficient, because the recursion solution (actually depth first search) will immediate return when it finds a valid solution, while dynamic programming has to finish filling all the results in the array.

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

 class Solution { public: bool wordBreak(string s, vector& wordDict) { if(wordDict.empty()) return false; int len_s = s.size(); vector dp(len_s + 1, 0); dp[len_s] = 1; unordered_set dict(wordDict.begin(), wordDict.end()); for(int i = len_s – 1; i>=0; i–){ string tempS = ""; for(int j = i; j < len_s; j++){ tempS += s[j]; dp[i] = dict.count(tempS); // dp[i] here means whether substring[i, j] can be segmented //When the inner loop is over, dp[i] means whether substring[i, end] can be segmented if(dp[i]!=0 && dp[j+1] !=0) break; } } return dp[0]; } };

Time Complexity: O(n^2)

Space Complexity: O(n)

Solution Four – Breadth First Search:

We can also do a breadth first search. In this approach, we maintain a queue to keep track of the index we need to visit for the next iteration and an unordered set to store the index that we have already visited so far.  As long as the queue is not empty, we continue our iteration. Since we can search the element in the set with O(1) average time complexity, we can check the visited elements with relatively cheap cost. If we find that the element has already been visited, we skip the inner for loop.

In the inner for loop, we check the sub string one by one, starting from next (which is the end index + 1 of the last valid sub string), until we find a following sub string which can be found in the word dictionary. Let’s go to the code:

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

 class Solution { public: bool wordBreak(string s, vector& wordDict) { if(wordDict.empty()) return false; unordered_set dict(wordDict.begin(), wordDict.end()); unordered_set visited; queue q({0}); //Queue initialization should include container while(!q.empty()){ int next = q.front(); q.pop(); if(visited.count(next)!=0) continue; visited.insert(next); string tempS = ""; for(int i = next; i < s.size(); i++){ if(dict.count(tempS += s[i]) != 0){ q.push(i+1); if(i+1 == s.size()) return true; } } } return false; } };

Time Complexity: O(n^2)

Space Complexity: O(n)

Word Break II

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, add spaces in s to construct a sentence where each word is a valid dictionary word. Return all such possible sentences.

Note:

• The same word in the dictionary may be reused multiple times in the segmentation.
• You may assume the dictionary does not contain duplicate words.

Example 1:

```Input:
s = "`catsanddog`" wordDict = `["cat", "cats", "and", "sand", "dog"]` Output: ```[
"cats and dog",
"cat sand dog"
]``````

Example 2:

```Input:
s = "pineapplepenapple"
wordDict = ["apple", "pen", "applepen", "pine", "pineapple"]
Output:
[
"pine apple pen apple",
"pineapple pen apple",
"pine applepen apple"
]
Explanation: Note that you are allowed to reuse a dictionary word.
```

Example 3:

```Input:
s = "catsandog"
wordDict = ["cats", "dog", "sand", "and", "cat"]
Output:
[]```

Solution – DFS + DP:

Similar to the problem above. However, this time, we are required to store all the valid results to an array. I am going to provide a optimized solution from this post (dong wang).

In this solution, we divide the problem into two parts: the first part is that we apply dynamic programming technique to determine whether the string s can be segmented to dictionary words. If the result is false, we can simply return null array; The second part is that we use dfs to build our results if we find the s can be segmented.

The code is pretty self-explanatory, a very brilliant idea is that we keep track of the length of both shortest element and longest element in the word dictionary. So we can optimize our iteration a little bit. Each time, instead of adding one character to the temp string, we search between the gap of minimum length and maximum length because sub strings who are less than the minimum length or greater than the maximum length are guaranteed not to find in the word dictionary. With this optimization, we can skip several redundant search if not the worst case.

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.