A microbiologist wants us to partition a collection of strings
into disjoint groups based on common prefixes.
The desired partition is formed by the following process:
while there are at least 2 strings that have not been assigned to a group:
find the longest prefix that appears in at least 2 unassigned strings
form a new group consisting of all unassigned strings having that prefix
if there is an unassigned string, assign it to its own group
Note that the "longest prefix" in the algorithm above may have length 0.
We want to produce a listing of the strings organized by group, with a string
of '-'s following the members of each group. The '-' string should contain
one '-' for each character in the common prefix of the group. So a group
whose common prefix has length 0 should be followed by the string "". As a special
case, a group that contains just one string is considered to have a common prefix of
length 0.
List the groups with the longest common prefix first. Among groups with the
same length common prefix, list the groups alphabetically. Among strings within a group,
list the strings alphabetically.
Given a String[] protein, return a String[] that is the desired listing.
|