An important problem in DNA analysis is finding tandem repeats. A tandem repeat occurs
when a base sequence of nucleotides appears twice in a row. (In this problem, a base sequence is
simply a contiguous substring and a nucleotide is simply a letter.) For example, in the DNA
sequence "GATCATCA", the base sequence "ATC" appears twice in a row. In contrast,
"ATCGATC" is not a tandem repeat of the base sequence "ATC" because of the 'G' in between
the two copies.
Real DNA often contains errors in which one nucleotide is substituted for another. A k-approximate tandem
repeat occurs when a base sequence of nucleotides appears twice in a row, allowing
the second copy of the base sequence to differ from the first copy in at most k nucleotides.
For example, in the DNA sequence "GATCATGA", the base sequence "ATC" is immediately repeated with
a single error as "ATG". Thus, "ATCATG" is a 1-approximate tandem repeat. Note
that both copies of the base sequence must be the same length.
Given a DNA sequence dna (represented as a String) and a limit k on the
number of errors, find the length of the longest sequence that appears in dna as the base sequence
of a k-approximate tandem repeat. Note that a k-approximate tandem repeat contains at most k
errors, but it is allowed to contain fewer errors.
|