TopCoder problem "QueryFilter" used in TCHS08 Round 2 (Division I Level One)



Problem Statement

    

You are working on the query preprocessor for your search engine. When a user submits a search query, the preprocessor must do the following:

  1. Remove all commonly used words. These words do not increase search quality.
  2. Remove all duplicate words. Each distinct word in the query must appear only once.
  3. Sort the remaining words in alphabetical order.

You are given a String query containing a list of words separated by single spaces. You are also given a String[] common, each element of which is a commonly used word. Preprocess the query using the process described above and return the resulting list of words as a String[].
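
The three steps above can be sketched in Java as follows. This is one possible approach under the stated constraints, not the writer's reference solution: it assumes a HashSet for looking up common words and a TreeSet to remove duplicates and sort in a single pass. The local names stop and kept are illustrative only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Sketch only. The class is declared without "public" so that the snippet
// compiles in any file; a submission would typically declare it public.
class QueryFilter {
    public String[] preprocess(String query, String[] common) {
        // Step 1 lookup table: whole-word membership test for common words.
        Set<String> stop = new HashSet<>(Arrays.asList(common));

        // Steps 2 and 3: a TreeSet keeps each word once, in natural
        // (lexicographic) order, which is alphabetical for lowercase words.
        TreeSet<String> kept = new TreeSet<>();

        // The constraints guarantee no leading, trailing, or consecutive
        // spaces, so splitting on a single space yields only non-empty words.
        for (String word : query.split(" ")) {
            if (!stop.contains(word)) {
                kept.add(word);
            }
        }
        return kept.toArray(new String[0]);
    }
}
```

For instance, new QueryFilter().preprocess("an easy test", new String[] {"a", "an", "the"}) yields {"easy", "test"}, matching example 0 below. If every word in the query is common, the sketch returns an empty array.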

 

Definition

    
Class:QueryFilter
Method:preprocess
Parameters:String, String[]
Returns:String[]
Method signature:String[] preprocess(String query, String[] common)
(be sure your method is public)
    
 

Constraints

-query will contain between 1 and 50 characters, inclusive.
-query will contain only lowercase letters ('a'-'z') and spaces (' ').
-query will contain no leading, trailing, or consecutive spaces.
-common will contain between 1 and 50 elements, inclusive.
-Each element of common will contain between 1 and 50 characters, inclusive.
-Each element of common will contain only lowercase letters ('a'-'z').
-All elements of common will be distinct.
 

Examples

0)
    
"an easy test"
{"a", "an", "the"}
Returns: {"easy", "test"}
Here "an" is removed as a commonly used word.
1)
    
"money money money"
{"a", "an", "the"}
Returns: {"money"}
Two occurrences of "money" must be removed as duplicate words.
2)
    
"some really cool stuff that i forgot where to look"
{"i", "the", "to", "a", "an", "that"}
Returns: {"cool", "forgot", "look", "really", "some", "stuff", "where"}
Don't forget to sort the words!
3)
    
"aaaaaaaaaaaaaaaa"
{"a"}
Returns: {"aaaaaaaaaaaaaaaa"}
The common word "a" matches only whole words, so the query word is kept.

Problem url:

http://www.topcoder.com/stat?c=problem_statement&pm=8554

Problem stats url:

http://www.topcoder.com/tc?module=ProblemDetail&rd=11151&pm=8554

Writer:

andrewzta

Testers:

PabloGilberto, Olexiy, marek.cygan, ivan_metelsky

Problem categories:

Brute Force, String Manipulation