Problem Statement |
| In the real estate business, most purchases are made through real estate
agents, who typically have specific regions in which they work. Naturally,
different agents have different specialties, both geographically and
otherwise.
The goal of this problem is to match an interested customer to the most
appropriate agent -- the one most likely to provide a successful outcome to
the customer. To this end,
your task is to develop an algorithm which assigns a score to a number of
customer-agent pairings. You should assign higher scores to pairings which are
more likely to conclude successfully, with a property being purchased. To develop
this algorithm, you will be given a corpus of data from past pairings, along
with the outcomes for those pairings. After submitting your algorithm, you
will be scored on an independent dataset, using the AUC metric.
For this evaluation metric, the exact scores do not matter, only the ordering they impose. Ideally, the ordering will consist of all the negative instances, followed by all the positive ones.
To make these predictions, you will be given three pieces of data. The first
two will be constant, and contain information about the agents. For the
third, you will be given half the data for training, with the remaining half
held back for testing. The data is described below:
- A partitioning of agents into brokerages. More info and
download
- The coverage areas of the agents, as a list of agent-zip code pairs.
More info and download
- The customer-agent pairings, along with information about the
customer and property. More info and download
You will be given all three of the above as String[]'s with one
element representing one line from the files for download (one record). You
will also be given a String[], test representing a number
of pairings with unknown outcomes. You are to return a double[]
with elements corresponding to elements of test. These will be your
scores, upon which you will be evaluated. The elements of test will be the same as the elements of train except that they will lack the final two fields (which give the outcome). Each element will be in CSV Format.
Scoring
If you ranked the test cases perfectly, you would assign all of the successful pairings (those which would have a one in their final field) higher scores than all of the unsuccessful pairings. In this case, your score would be 1.0. If you just returned a random ordering, you would expect to score about 0.5. Another way to think about the scoring is that it is a decreasing function in the number of inversions your ordering has compared to a correct ordering. In other words, every time you rank a bad test case higher than a good test case (give the bad one a higher score) that decreases your score.
|
|
Definition |
| Class: | AgentMatching | Method: | evaluate | Parameters: | String[], String[], String[], String[] | Returns: | double[] | Method signature: | double[] evaluate(String[] agents, String[] zips, String[] train, String[] test) | (be sure your method is public) |
|
|
|
|
Notes |
- | During the provisional testing, only 10% of the full dataset will be used for scoring. The remaining 40% (50% used for training, 10% for provisional testing) will be used for final testing. |
- | As the data comes from a real website, it is noisy and may contain missing or incorrect values. |
- | The time limit for the call to your method is 5 minutes. The memory limit is 1024MB. |
- | The example test is the same as the provisional test. |
- | The training data contains 39797 matchings, the example/provisional test set contains 8051, and the final test set contains 31902. |
- | More information about the AUC metric and ROC curves can be found at Wikipedia. |
|
Examples |
0) | |
| | Returns: "Provisional Test" | |
|