TopCoder problem "DNAsynth" used in TCI '02 Semifinals 4 (Division I Level Three)

Problem Statement


Dr. Ford of TC Laboratories has discovered a new way of synthesizing DNA sequences from shorter DNA sequences. DNA sequences are composed of nucleotides, A, G, C, and T

He's discovered a miracle catalyst that enables certain sequences of DNA to append to other certain sequences of nucleotides. A catalyzation represented by the notation "<SEQ1>:<SEQ2>" means that any sequence starting in <SEQ2> may be appended to any sequence ending in <SEQ1>.

For example, "GCT:AGG" means that any sequence starting with AGG may be appended to any sequence ending in GCT. Thus, by this rule, AGGCGACG may be appended to CATGCT to produce the sequence CATGCTAGGCGACG.

Also, since a DNA sequence is identical to its reverse, "<SEQ1>:<SEQ2>" implies "<reverse(SEQ2)>:<reverse(SEQ1)>". For example, "GCT:AGG" is the same as "GGA:TCG". Note, however, "GCT:AGG" is NOT the same as "AGG:GCT".

Given a set of possible catalyzations determine the length of the longest DNA sequence which can be constructed, starting with an unlimited supply of the sequences <SEQ1> and <SEQ2> in the list of catalyzations. If sequences of unlimited length are possible, return -1.



Method signature:int longest(String[] reactivity)
(be sure your method is public)


-Assume that Dr. Ford has an unlimited starting supply of all the strands represented by SEQ1 and SEQ2 in each element of reactivity.


-reactivity will contain between 1 and 50 elements, inclusive.
-each element of reactivity will contain between 5 and 9 characters, inclusive.
-each element of reactivity will be formatted as (quotes added for clarity) "SEQ1:SEQ2" with SEQ1 and SEQ2 both being of length between 2 and 4 characters, inclusive, and separated by a single colon (':').
-SEQ1 and SEQ2 may only contain the capital letters 'A', 'G', 'C', and 'T'.


Returns: 6
The longest possible strand is "TTAAGG" (or "GGAATT", which is the same thing backwards--these will no longer be mentioned).
Returns: 15
The longest possible strand is "TTAAGGCCCGGAATT"
Returns: -1
Returns: 8
Returns: 9
Notice that since he has "ACG" ("GCA" backwards) in supply, and it begins with "AC", it can be added to "GAT". This gives the sequence "GATACG". Adding "ATC" on the front end gives the longest possible strand, which is "CTAGATACG".
Returns: 12
The longest sequence here is CGGGTGGAAAAC. First, we form TGGA from the first rule. Since this starts with TGG, we can append it to CGGG to get CGGGTGGA, which ends with GGA. So we can append AAAC to the end and get CGGGTGGAAAAC
Returns: -1

Problem url:

Problem stats url:




alexcchan , lbackstrom , brett1479

Problem categories:

Recursion, Search, String Manipulation