TopCoder problem "PointClustering" used in TCCC '04 Finals (Division I Level One)



Problem Statement

    Clustering algorithms are becoming increasingly important in many applications and there are dozens of different ways to cluster, each of which has its own pros and cons. In this problem, your task is to implement one of the simpler algorithms on a set of points in two-dimensional space (corresponding elements of int[]s x and y represent a single point). The algorithm starts by initializing each point as its own cluster (a cluster is just a group of points). Then, the algorithm repeatedly merges the two closest clusters, where the distance between two clusters is defined as the average distance between each pair of points, p and q, where p is a point in one cluster and q is a point in the other cluster. Hence, the distance between a cluster of 4 points and a cluster of 2 points would be the average of 8 distances. The algorithm terminates when there is only one cluster left. In most applications, we wouldn't have to worry about ties, so in this problem there will be no cases where the comparison of average distances with a relative difference of 1E-3 or less has an effect on the clustering.



Once you have successfully clustered the points, you need to return the final cluster as a String. The format of this String can be defined recursively. If you have a cluster A, with more than one element, which was formed from clusters which have String forms B and C, then the String form of A should be "[BC]" (quotes for clarity), where B represents the cluster with the lowest indexed point out of all the points in B and C (indexed in the input). If you have a cluster A that contains only one point, (xi,yi), then it should be formatted as "[(xi,yi)]", where (xi,yi) are the coordinates of the points..



For example, if x = {5,17,5} and y = {4,4,9}, we get the following distances:

p0 - p1 = 12

p0 - p2 = 5

p1 - p2 = 13

Thus, we would first merge p0 and p2. This gives us 2 clusters, and so we would merge them, as there is no other choice. This gives us a return value of "[[[(5,4)][(5,9)]][(17,4)]]".
 

Definition

    
Class:PointClustering
Method:cluster
Parameters:int[], int[]
Returns:String
Method signature:String cluster(int[] x, int[] y)
(be sure your method is public)
    
 

Constraints

-x and y will contain the same number of elements.
-x and y will each contain between 1 and 50 elements, inclusive.
-Each element of x and y will be between -1000 and 1000, inclusive.
-There will be no cases where the comparison of average distances with a relative difference of 1E-3 or less has an effect on the clustering.
 

Examples

0)
    
{5,17,5}
{4,4,9}
Returns: "[[[(5,4)][(5,9)]][(17,4)]]"
The example from the problem statement.
1)
    
{9,3,6,3,1,7,3,2,6}
{9,1,2,4,6,8,2,3,4}
Returns: 
"[[[(9,9)][(7,8)]][[[[[(3,1)][(3,2)]][[(3,4)][(2,3)]]][[(6,2)][(6,4)]]][(1,6)]]]"
This image shows all of the clusters that the algorithm creates, where each cluster is represented by an oval.

2)
    
{-10,10,-10,100}
{-10,0,-10,0}
Returns: "[[[[(-10,-10)][(-10,-10)]][(10,0)]][(100,0)]]"
3)
    
{1000,-1000,0}
{1000,-1000,1}
Returns: "[[[(1000,1000)][(0,1)]][(-1000,-1000)]]"

Problem url:

http://www.topcoder.com/stat?c=problem_statement&pm=2445

Problem stats url:

http://www.topcoder.com/tc?module=ProblemDetail&rd=5080&pm=2445

Writer:

lars2520

Testers:

PabloGilberto , schveiguy , vorthys

Problem categories:

Brute Force, Simulation