TopCoder problem "CompressedStringSearch" used in TCO06 Semi 3 (Division I Level Three)

Problem Statement


One form of data compression is to use a dictionary of symbols and their expansions, which can contain other dictionary symbols. Thus if we have the dictionary


then an occurrence of c119 in the original text could be represented by the string b instead (which would require that b not occur in the original text). This substitution by dictionary entries can repeat, and uncompression is just the reverse of compression, looking up each character to see if it occurs in the dictionary and if so expanding it. For instance, the compressed text aAa uses a dictionary entry for A, and so expands to abCca, and then to ac119C123a, then to a123119C123a, at which point expansion would stop because no remaining symbols have corresponding dictionary entries. (Note that we are not restricting dictionary symbols to be upper case or letters; any character a-z A-Z 0-9 and _ can be a dictionary symbol.)

Given a dictionary dictionary, a compressed string haystack to search, and a literal string (not compressed) needle to search for, return the earliest 0-based position of the literal string in the expanded, uncompressed original string. Return -1 if the literal string does not occur in the uncompressed string. The dictionary is formatted as "X=abcde", where X is the compressed representation of the string "abcde" (which itself may be compressed using other dictionary entries). For instance, if the compressed string is abCca, with the dictionary above, and the string to search for is 9C1, then the result is 6.



Parameters:String[], String, String
Method signature:long indexOf(String[] dictionary, String haystack, String needle)
(be sure your method is public)


-Dictionary will contain between 0 and 50 elements, inclusive.
-Each element of dictionary will contain between 2 and 50 characters, inclusive.
-Strings in dictionary are composed of the characters a-z, A-Z, 0-9, _, and =.
-Each element of dictionary will have a second character of '=', and no other character of '='.
-The dictionary must not be recursive; every symbol should resolve to some finite string.
-Each element of dictionary will have a distinct first character.
-Haystack for will have between 1 and 50 characters, inclusive.
-Haystack will be composed of the characters a-z, A-Z, 0-9, and _.
-Needle will have between 1 and 50 characters, inclusive.
-Needle will be composed of the characters a-z, A-Z, 0-9, and _.
-Needle will not contain any characters that occur in the first position in any string in dictionary.
-The length of a fully uncompressed dictionary entry will be less than 2^50.


Returns: 6
The string "abCca" expands to "ac119C123a" which expands to "a123119C123a". At this point no more characters exist in the string that have corresponding entries in the dictionary, so we have the full uncompressed string. The first occurrence of "9C1" is at position 6.
Returns: 8
Returns: -1
Returns: 7625597484986
Returns: 181
A compressed nursery rhyme.

Problem url:

Problem stats url:




PabloGilberto , lbackstrom , brett1479 , vorthys , Olexiy

Problem categories:

Encryption/Compression, Search