TopCoder problem "Zebras" used in ()



Problem Statement

    We are interested in the genetics of a herd of zebras. For the purposes of this problem, a herd consists of exactly 100 zebras. Each zebra has an age between 0 and 9, inclusive, and it turns out that there are always exactly 10 zebras of each age. Every year, 10 random pairs of one male and one female zebra produce one offspring. Young zebras (2 year olds and younger) are not eligible for this pairing. Each offspring zebra is equally likely to be male or female. This herd of zebras have been doing this for many years now. Your task is to determine how related each pair of zebras are, given their DNA sequences.



Each zebra has two independent copies of each of five chromosomes. Each copy of each chromosome consist of roughly 1.6 million bases, which can each be thought of as an integer in [0,3], and thus a zebra's entire DNA is roughly 16 million bases. When two zebras produce an offspring, for each chromosome the offspring gets one copy at random from each parent. Thus, if we call the copies of chromosome 1 from zebra one A and B, and we call the copies of chromosome 1 from zebra two C and D, then the offspring of zebras one and two will end up with AC, AD, BC, or BD, each with equal probability.



Whenever a new offspring is produced some mutation occurs to that offspring's DNA. We can think of this mutation as a two step process. First, each base in the DNA is mutated to a random base with probability p_mutation. After this, some random sections are deleted or inserted. Each base has a probability p_change of being the start of an insertion or deletion. When an insertion or deletion occurs, it's length is exponentially distributed with lambda = p_return. An insertion inserts random bases, while a deletion simply deletes the bases. Insertions and deletions are considered one base at a time, and a base may not be the start of an insertion or deletion if it was just inserted or already deleted.



For each pair of zebras, you need to compute that pair's relatedness. In defining the relatedness of two zebras, A and B, we will first define a valid path from A to B as one that starts at A, goes up the ancestry tree to some point (first to A's parents, then to one of A's grandparents, and so on) and then goes down the ancestry tree to B. For instance, a path from A to A's mother to A's maternal grandfather and then down to an offspring of A's maternal grandfather would be valid. A valid path may not contain a zebra more than once, so a path from A to A's mother, to A's maternal grandfather and back down to A's mother is invalid. To define the relatedness between A and B we will count the number of valid paths between A and B with at most 6 hops. The relatedness will be the sum of the number of length 6 paths, plus 2 times the number of length 5 paths, plus 4 times the number of length 4 paths, and so on up to 64 times the number of length 0 paths.

Implementation Details

First, a method init will be called giving you the mutation parameters. A number of calls (100) will then be made to zebra giving you the DNA sequences of each of the 100 zebras. A int[] will give you the DNA sequence, where each integer represents 16 bases, each using two bits. Thus, bits 0 and 1 (the low order ones) of element 0 of the input give the very first base of the first chromosome, while bits 30 and 31 of the last element give the last base of the last chromosome. A int[] chromosomes will tell you the starting indices (indexing into the int[]) of the 10 chromosomes. Chromosomes 0 and 1 will be a matched pair, as will 2 and 3 and so on. An int age gives you the age of the zebra, while index specifies the zebra (the indices will be ordered from 0 to 99 in the same order as the calls). Your returns from init and zebra will be ignored (return any int). From relatedness you should return a String[] with 100 elements. Each element should have 100 floating point numbers in it, separated by spaces. The ith number in element j of your return should be your guess at the relatedness of zebras i and j.

Simulation Details

At the beginning of the simulation, all zebras will be identical with exactly 1.6 million bases (100,000 integers) per chromosome; each base will be randomly generated and given to each of the 100 zebras. Note that this does not mean that each zebra has two identical chromosomes for each pair. The simulation will then be run for 200 years, and you will be given the results.

Scoring

For each test case, your squared error will be computed as the total squared difference between each of the numbers in your return and the correct values. This will be your score for an individual test case. The overall scoring will be relative, with your score from a particular test case being BEST/YOU where BEST is the lowest error on that test case, and YOU is your score.

Tools

 

Definition

    
Class:Zebras
Method:init
Parameters:double, double, double
Returns:int
Method signature:int init(double p_mut, double p_change, double p_ret)
 
Method:zebra
Parameters:int[], int[], int, char, int
Returns:int
Method signature:int zebra(int[] bases, int[] chromosomes, int age, char gender, int index)
 
Method:relatedness
Parameters:
Returns:String[]
Method signature:String[] relatedness()
(be sure your methods are public)
    
 

Notes

-A mutation does not necessarily change a base, it just causes a base to be randomly selected.
-Zebras stop reproducing after their ninth year (and only the 0-9 year olds are part of the input to your program).
-This problem is fictional. Any similarity between it and true zebras is purely coincidental.
-After each mutation, all trailing base 0's are removed from each chromosome, and then each chromosome is padded at the end with base 0's so that its length is divisible by 16.
-Note that each zebra is related to itself with relatedness 64.
-You may have threads working while you are receiving data. This will give you significantly more total processing time (on the order of a minute). If you spawn too many threads and cause data transmission to slow excessively, you will receive a 0. More specifically, if the total time exceeds 130 seconds, you'll fail.
-For performance reasons, the mutation process uses a continuous approximation when determining which bases to mutate and where to insert and delete. The difference is inconsequential.
-There are 40 provisional tests.
 

Constraints

-p_change will be between 1E-7 and 1E-9
-p_mut will be between 1E-5 and 1E-6
-p_ret will be between 1E-2 and 1E-4
-The memory limit is 1024M
-The time limit is 30 seconds
-The thread limit is 32 threads
 

Examples

0)
    
"1"
Returns: 
"seed = 1<br/>
p_mut = 4.593229603372773E-6<br/>
p_change = 5.93149224206196E-8<br/>
p_ret = 0.00340177671875254<br/>
"
1)
    
"2"
Returns: 
"seed = 2<br/>
p_mut = 1.2305219227908448E-6<br/>
p_change = 4.4595125599329114E-8<br/>
p_ret = 0.005575076850947532<br/>
"
2)
    
"3"
Returns: 
"seed = 3<br/>
p_mut = 2.0771485617064452E-6<br/>
p_change = 8.188043961958693E-8<br/>
p_ret = 0.005531753367993433<br/>
"
3)
    
"4"
Returns: 
"seed = 4<br/>
p_mut = 2.564704548673069E-6<br/>
p_change = 8.728608328557298E-8<br/>
p_ret = 0.005897852531967956<br/>
"
4)
    
"5"
Returns: 
"seed = 5<br/>
p_mut = 1.4373876861986961E-6<br/>
p_change = 4.389987797713309E-8<br/>
p_ret = 0.007991144485645106<br/>
"
5)
    
"6"
Returns: 
"seed = 6<br/>
p_mut = 6.572635643776048E-6<br/>
p_change = 9.838017443255778E-8<br/>
p_ret = 0.0034035382849887563<br/>
"
6)
    
"7"
Returns: 
"seed = 7<br/>
p_mut = 3.7525074975768198E-6<br/>
p_change = 3.695995322172842E-8<br/>
p_ret = 0.005597827473284887<br/>
"
7)
    
"8"
Returns: 
"seed = 8<br/>
p_mut = 5.554124266699317E-6<br/>
p_change = 4.1627087442767915E-8<br/>
p_ret = 0.0021637423061094236<br/>
"
8)
    
"9"
Returns: 
"seed = 9<br/>
p_mut = 8.487989527259487E-6<br/>
p_change = 6.796809871623912E-8<br/>
p_ret = 0.008900028418135551<br/>
"
9)
    
"10"
Returns: 
"seed = 10<br/>
p_mut = 1.192643604024191E-6<br/>
p_change = 2.8006363485802505E-8<br/>
p_ret = 0.00854322260126697<br/>
"

Problem url:

http://www.topcoder.com/stat?c=problem_statement&pm=9714

Problem stats url:

http://www.topcoder.com/tc?module=ProblemDetail&rd=12167&pm=9714

Writer:

Testers:

Problem categories: