TopCoder problem "TextCompressor" used in SRM 222 (Division II Level One)



Problem Statement

    

Your company is working on writing a piece of software to compress a text document. As part of the software development team, you have been asked to write a function that will find the longest repeated sub-string within a piece of text, such that the two chosen occurrences of the sub-string do not overlap. Your software is case sensitive, so two sub-strings are not the same if their capitalization differs. You have been instructed that, in a case where more than one such sub-string is the longest, the one that occurs earliest in the source string should be chosen.

For instance, given the string "ABCDABCFG", the longest repeated sub-string is "ABC". In the string "ABABA", both "AB" and "BA" are repeated substrings, however, we choose "AB", since it occurs earlier. (Even though "ABA" appears twice as a sub-string, the two occurrences overlaps, so that can not be used.)

You are given a String sourceText. You are to return a String that is the longest repeated sub-string. If more than one substring of equal maximum length is repeated, return the one that occurs first in the string. If no sub-string repeats itself, return "".

 

Definition

    
Class:TextCompressor
Method:longestRepeat
Parameters:String
Returns:String
Method signature:String longestRepeat(String sourceText)
(be sure your method is public)
    
 

Constraints

-sourceText will contain between 1 and 50 characters, inclusive.
-Each character of sourceText will be 'A'-'Z', 'a'-'z', '0'-'9' or ' '.
 

Examples

0)
    
"ABCDABCFG"
Returns: "ABC"
The first example from the problem statement.
1)
    
"ABABA"
Returns: "AB"
The second example from the problem statement. Notice in particular that the two occurrences cannot overlap. Also, notice that "BA" is repeated as well, but "AB" occurs earlier in the string.
2)
    
"This is a test."
Returns: "is "
Notice here that the longest repeated substring is not restricted to just being a complete word, or part of a word, and may include spaces.
3)
    
"Testing testing 1 2 3."
Returns: "esting "
Notice here that although the word "testing" appears twice, the capitalization is different, so it's not actually a repeated substring.
4)
    
"The quick brown fox jumps over the lazy dog."
Returns: "he "

Problem url:

http://www.topcoder.com/stat?c=problem_statement&pm=3442

Problem stats url:

http://www.topcoder.com/tc?module=ProblemDetail&rd=5868&pm=3442

Writer:

timmac

Testers:

PabloGilberto , lbackstrom , brett1479

Problem categories:

Encryption/Compression, String Manipulation