TopCoder problem "TandemRepeats" used in SRM 185 (Division I Level One , Division II Level Two)



Problem Statement

    

An important problem in DNA analysis is finding tandem repeats. A tandem repeat occurs when a base sequence of nucleotides appears twice in a row. (In this problem, a base sequence is simply a contiguous substring and a nucleotide is simply a letter.) For example, in the DNA sequence "GATCATCA", the base sequence "ATC" appears twice in a row. In contrast, "ATCGATC" is not a tandem repeat of the base sequence "ATC" because of the 'G' in between the two copies.

Real DNA often contains errors in which one nucleotide is substituted for another. A k-approximate tandem repeat occurs when a base sequence of nucleotides appears twice in a row, allowing the second copy of the base sequence to differ from the first copy in at most k nucleotides. For example, in the DNA sequence "GATCATGA", the base sequence "ATC" is immediately repeated with a single error as "ATG". Thus, "ATCATG" is a 1-approximate tandem repeat. Note that both copies of the base sequence must be the same length.

Given a DNA sequence dna (represented as a String) and a limit k on the number of errors, find the length of the longest sequence that appears in dna as the base sequence of a k-approximate tandem repeat. Note that a k-approximate tandem repeat contains at most k errors, but it is allowed to contain fewer errors.

 

Definition

    
Class:TandemRepeats
Method:maxLength
Parameters:String, int
Returns:int
Method signature:int maxLength(String dna, int k)
(be sure your method is public)
    
 

Constraints

-dna contains between 2 and 50 characters, inclusive.
-Every character in dna is a 'G', 'A', 'T', or 'C'.
-k is between 0 and 5, inclusive.
 

Examples

0)
    
"GATCATCA"
0
Returns: 3
A tandem repeat with a base sequence of length 3 is "ATCATC", with base sequence "ATC".
1)
    
"GATCATGA"
1
Returns: 3
A 1-approximate tandem repeat is "ATCATG". The base sequence "ATC" is repeated with a single error.
2)
    
"GATCATGA"
0
Returns: 0
This example has no tandem repeats.
3)
    
"AGAGAAAGAA"
3
Returns: 5
4)
    
"ATTAGCATTGCACACCTTGAGGACTTAGACAAACCTAGTACACAGGTGTA"
5
Returns: 11

Problem url:

http://www.topcoder.com/stat?c=problem_statement&pm=2377

Problem stats url:

http://www.topcoder.com/tc?module=ProblemDetail&rd=4745&pm=2377

Writer:

vorthys

Testers:

lbackstrom , brett1479

Problem categories:

Brute Force, String Manipulation