Consider a family of compression algorithms that exploit repeated substrings
in the text to be compressed. The basic idea is to replace selected substrings with
references of the form "&startPos-endPos", where the substring
to be replaced is identical to the substring of the original text that begins at position
startPos and ends at position endPos, inclusive. For
example, the text "ABCDEFG ABCDEFG" might be compressed to
"ABCDEFG &0-6". Notice that positions are zero-based and also that positions are
given relative to the original text, not the compressed text. The original text is assumed
to contain only uppercase letters ('A'-'Z') and spaces (' ').
Compression algorithms in this family work as follows:
1. copy the original text into W
2. while not done do
A. choose some substring S of length > 1 that occurs both in W and in the original text
B. find the starting position startPos and ending position endPos
of some occurrence of S in the original text
C. replace some occurrence of S in W with "&startPos-endPos"
3. output W
Different algorithms in this family might decide when to finish or how to choose S
differently, but they all follow this basic outline. Your task is to write a decompression method
that will decompress text compressed by any member of this family. In other words, given the output
of a compression algorithm in this family, you are to reconstruct and return the original text.
The input will be a String compressed containing only uppercase letters, spaces,
and references of the form "&startPos-endPos", where startPos and
endPos are non-negative integers written without extraneous leading zeros, and startPos
< endPos. The original text is guaranteed to contain no more than 256 characters.
Notice that if the compression algorithm makes unwise choices, the decompression algorithm may be unable to
reconstruct some of the characters in the original text. In such cases, the decompression algorithm should
return a '?' in each position for which the character cannot be determined.
For example, consider the compressed text
"AB&7-9&2-6". We know that the original text contains 10 characters, and that the first two
characters of the original text are 'A' and 'B', but we cannot tell what the remaining
characters are. Therefore, your method should return "AB????????".
|