TopCoder problem "ExperimentalAnalyzer" used in SRM 226 (Division II Level Two)



Problem Statement

    

You have collected data from a research study. Each experiment in this study has its own set of values for the relevant variables and an outcome of either 0 or 1. You wish to analyze the data to determine which variables can independently predict the outcome using a simple threshold. In this problem, a variable v is an independent predictor if there exists a threshold t such that every experiment with v less than or equal to t has the same outcome, every experiment with v greater than t has the other outcome, and both outcomes are each attained by at least one experiment (otherwise, the research study should probably be redesigned).

Write a class ExperimentalAnalyzer with a method getPredictors that takes a String[] data containing the outcome and variable values for each experiment and returns a int[] containing the one-based indices of all the independent predictors, sorted in increasing order. Each element of data corresponds to one experiment and will contain its outcome (either 0 or 1) followed by its list of variable values (which are non-negative integers). The value list for each experiment will be arranged in the same order, with the value for the first variable immediately following the outcome.

 

Definition

    
Class:ExperimentalAnalyzer
Method:getPredictors
Parameters:String[]
Returns:int[]
Method signature:int[] getPredictors(String[] data)
(be sure your method is public)
    
 

Notes

-If all the experiments have identical outcomes, there can be no independent predictors.
 

Constraints

-data will contain between 2 and 50 elements, inclusive.
-Each element of data will contain between 3 and 50 characters, inclusive.
-Each element of data will contain an outcome (0 or 1) followed by one or more variable values (integers between 0 and 2147483647 inclusive, with no extra leading zeroes), all separated by single spaces.
-Each element of data will contain the same number of variable values.
 

Examples

0)
    
{
"0 10 20 20 0",
"1 20 30 17 98765",
"0 10 30 29 1234567",
"1 20 40 10 42"}
Returns: { 1,  3 }

There are two experiments with outcome 0 and two with outcome 1. Variable 1 (the first column after the outcome) is an independent predictor because the experiments with outcome 0 have values of 10 or less, while experiments with outcome 1 have values greater than 10. Variable 2 is not an independent predictor because two experiments with the same value have different outcomes. Variable 3 is an independent predictor, but unlike Variable 1, the experiments with outcome 1 have the smaller values. Variable 4 is not an independent predictor since there is no threshold that can separate the values for the different outcomes.

1)
    
{
"1 220 212 247 764 928 956 946 66 640 983 125 994",
"0 816 835 98 81 783 267 946 584 309 757 876 670"}
Returns: { 1,  2,  3,  4,  5,  6,  8,  9,  10,  11,  12 }

With only two experiments with distinct outcomes, a variable will be an independent predictor unless it has the same value in both experiments.

2)
    
{
"0 1944914038 1696137778 1525367830",
"0 1547932733 1185820653 1500052399",
"0 230149443 1358715189 501418065",
"0 1676118083 1499656529 2103271593",
"0 1441540020 1189300515 1544659186"}
Returns: { }

All the experiments have outcome 0, so there are no independent predictors.

3)
    
{
"0 163869663 388719849 383049741",
"1 1982032201 1346175990 1500891700",
"0 436834674 559375803 994453722",
"0 652316051 372955428 361692727",
"1 1946362869 1204080206 2066121600",
"0 840867095 22073435 1166658385",
"1 1864235269 2041251772 1847305529",
"0 852306016 447986701 407997336",
"1 1183214776 728141214 1985649244",
"0 70064437 7110416 107908753",
"1 1383409284 1328770197 1942831571",
"1 1023334064 1596272317 1226876467",
"1 2068895243 1481323649 1955807390",
"0 758836687 541737411 312747384",
"1 1432983907 1475284843 1512945413",
"1 1512506825 1933755150 2041997368",
"1 2041700103 1859742986 1995865005",
"0 323696628 328891715 893352493",
"1 1623710967 1193592990 1871502957",
"0 9241593 127579695 36337622",
"1 1935123182 1332735215 1400991717",
"1 2130762600 1874898210 1234793873",
"0 411777048 142083649 61450530",
"1 1042383468 1979605937 1276643901",
"0 219279208 371281702 65383690",
"0 948808405 243728462 984221323",
"0 999393888 131231007 1186873391",
"1 1294115986 1618148416 1324126407",
"1 1710811842 2024808989 1696767048",
"1 1399327255 1317859960 1427366434",
"0 145887863 452552798 165691442",
"0 685701683 334764463 1001631935",
"1 1596433536 1911594193 1533322508",
"0 128647261 372955025 1051296077",
"0 946946329 548039713 83591687",
"1 1490995704 1541151932 2052868342",
"0 805037508 405134691 593191395",
"1 2048614262 1171142414 1309623012",
"1 1024870244 1791074791 1778846631",
"1 1403452711 2033486235 1555085078",
"0 900806815 269954427 806033528",
"1 1598931622 938905156 1474311731",
"1 1468418323 749319445 2060324871",
"0 407246582 393808982 163347811",
"0 52629967 643176802 619367349"}
Returns: { 1,  2,  3 }

Problem url:

http://www.topcoder.com/stat?c=problem_statement&pm=3493

Problem stats url:

http://www.topcoder.com/tc?module=ProblemDetail&rd=6515&pm=3493

Writer:

the_one_smiley

Testers:

PabloGilberto , lbackstrom , brett1479

Problem categories:

Simple Math, String Parsing