TopCoder problem "DNADeletion" used in SRM 435 (Division I Level Two)

Problem Statement

    In genetics, a DNA sequence is a sequence of nucleotides (A, C, G or T). Some DNA sequences translate to proteins, which are non-empty sequences of amino acids. Let's examine the DNA translation process::

  1. From left to right, split the DNA sequence into consecutive, non-overlapping triples of nucleotides. Each triple is called a codon. There may be one or two nucleotides left over at the end - those should be ignored. For example, the DNA sequence "ACCTGTACG" will produce the codon sequence "ACC", "TGT", "ACG". The DNA sequence "ACCTGTAC" will produce the codon sequence "ACC", "TGT" ("AC" is left over and ignored).
  2. You are given a codon table that maps codons to their associated amino acids. From left to right, look up each codon in the sequence generated above and replace it with its associated amino acid. Every codon in the sequence must have an associated amino acid - otherwise, the DNA sequence does not translate to a protein. For example, if "ACC" and "ACG" each map to threonin ("thr") and "TGT" maps to cysteine ("cys"), the DNA sequence "ACCTGTACG" will translate to the protein "thr cys thr".
Sometimes, after replication, one or more nucleotides in a DNA sequence go missing. This situation is called deletion. After a deletion, a DNA sequence can become any of its subsequences. For example, "ACTG" may become "ACG" or "CG".

You are given a String[] DNASequence containing a DNA sequence. Concatenate the elements of DNASequence to obtain the DNA sequence. You are also given a String[] codonTable containing the codon table. Each element is formatted "CODON AMINOACID" (quotes for clarity), where AMINOACID is the name of the amino acid associated with codon CODON. Compute the number of different possible proteins that the given DNA sequence can translate to after undergoing zero or more deletions. Since this number can be quite large, return its value modulo 1000000007. Remember that only nonempty amino acid sequences are considered proteins.


Parameters:String[], String[]
Method signature:int differentProteins(String[] DNASequence, String[] codonTable)
(be sure your method is public)


-DNASequence will contain between 1 and 50 elements, inclusive.
-Every element of DNASequence will contain between 1 and 50 characters, inclusive.
-Every element of DNASequence will contain only characters 'A', 'C', 'T', 'G'.
-codonTable will contain between 1 and 50 elements, inclusive.
-Every element of codonTable will contain a codon, followed by a single space, followed by an amino acid.
-Every codon in codonTable will contain exactly 3 characters.
-Every codon in codonTable will contain only characters 'A', 'C', 'T', 'G'.
-Every codon in codonTable will be unique.
-Every amino acid in codonTable will contain between 1 and 20 characters.
-Every amino acid in codonTable will contain only letters ('a'-'z', 'A'-'Z').


{"ACT gua", "ACG cys", "ATG leu", "CTG thr"}
Returns: 4
You can get proteins:

"gua" (deletion of 'G' or no deletion),

"cys" (deletion of 'T'),

"leu" (deletion of 'C'),

"thr" (deletion of 'A').

Other deletions do not result in proteins.
{"AAA thr", "CCC cys"}
Returns: 3
You can get proteins: "thr", "cys" and "thr cys".
{"AAA gua","TCC dop","AAT dop","CCC gua"}
Returns: 5
You can get proteins: "gua", "dop", "gua dop" (from sequence "AAATCC"), "dop gua" (from sequence "AATCCC"), "gua gua" (from sequence "AAACCC").
{"AAC RpjZt","AAT ZeiC","GCA ChZwh","TCC RpjZt","GAA I",
 "TAG ZeiC","CTG dVK","GAG ZeiC","GTG I","AAG q","ATT dVK",
 "GAC q","CCA q","GCC ZRV","GCG RpjZt","CCT ZRV","ATG dVK",
 "ATC ChZwh","CTC cJEjM","CCC q","ATA dWjz","TTG DkEG",
 "CAG q","CAA ZRV","ACT dVK","TCG dVK","ACC I","CGC dVK"}
Returns: 455985264
Be sure to concatenate the elements of DNASequence.

Problem url:

Problem stats url:




PabloGilberto , Olexiy , ivan_metelsky

Problem categories:

Dynamic Programming, Simulation