TopCoder problem "bloggoDocStructure" used in SRM 214 (Division I Level Two)



Problem Statement

    

Note: This problem statement includes images that may not appear if you are using a plugin. For best results, use the Arena editor.

In order to attract users of other weblog management systems, bloggo offers tools for content migration. Not only is bloggo capable of reading some of its chief competitors' back-end formats, but it can analyze the HTML that must by definition emerge at the front end of any weblog.

An HTML document consists of tags, which are angle-bracketed strings that look like "<b>" or "</li>", as well as plaintext, which is everything else. Each opening tag, such as "<h1>", has a similar closing tag, in this case "</h1>". A substring that begins with an opening tag and ends with a similar closing tag is called a span. The type of a span is decided by its first and last tags, so a span that begins with "<h1>" and ends with "</h1>" is called an "h1" span. We say that the substring found between the first and last tags of a span is enclosed by the span. Each span of an HTML document encloses zero or more plaintext characters interspersed with zero or more further spans.

Thus, the spans of an HTML document form a nested structure that is described by a unique tag tree, each node of which is labeled with the type of its corresponding span. The outermost span of an HTML document is required to be of the "html" type, so the root of a tag tree is always labeled "html". If a span encloses further spans, then its corresponding node in the tag tree has children that describe, in order, each of the immediately enclosed spans. A span that encloses no further spans corresponds to a leaf of the tag tree. For example, consider the following document.


  <html>
    <h1> Snapping Turtles </h1>
    <ul>
      <li>
        <h2> Common Snapping Turtle (<i>Chelydra serpentina</i>) </h2>
        <p> With its dark coloring, sinuous neck, and swift lunging
        motion, Chelydra serpentina is an adept hunter of nearly all
        creatures that live in or near water. A fully grown specimen,
        weighing on average 20 pounds with a 12-inch-long carapace,
        has no enemy but man. The long lifespan of the adult, reaching
        30 years or more, compensates for the low hatchling survival
        rate. </p>
        <p> The snapping turtle rarely basks, preferring
        to lurk at the silty bottom or among underwater
        vegetation. Although reclusive in its element, when on
        land it ferociously confronts larger beasts that approach
        it, for it cannot withdraw into its undersized shell. It
        is a solitary creature, declining to fraternize within
        its species. </p>
        <p> Snapping turtle populations are jeopardized by
        trapping, habitat loss, and by automobile traffic crossing
        the overland routes of nesting females. </p>
      </li>
      <li>
        <h2> Alligator Snapping Turtle (<i>Macroclemys temminckii</i>) </h2>
        <p> Like the common snapping turtle, the alligator snapping
        turtle has a massive head and a generally saurian aspect.
        Whereas the common snapping turtle's talent for hibernation
        lets it range as far north as Canada, the alligator snapping
        turtle is restricted to the warm waters flowing into the Gulf
        of Mexico. </p>
        <p> With an average shell diameter of two feet and
        a weight of 150 pounds, this is the largest freshwater
        turtle in the world. Macroclemys temminckii's short
        neck requires that it lie in wait for prey, which it
        dispatches with a sudden blow of its powerful jaws.
        A unique feature of the alligator snapping turtle is a
        tubular pink appendage, growing from the dark interior
        of its mouth, which it wriggles in wormlike fashion to
        lure fish to their doom. </p>
      </li>
    </ul>
  </html>

This document has the following tag tree.

For the purpose of content migration, it is useful to know how the tag trees of a pair of HTML documents relate to each other. One interesting relationship is that of the intree, which is defined recursively. First of all, every tree is an intree of itself. In addition, if A is an intree of B and we remove any leaf from A to obtain a smaller tree A', then A' is also an intree of B. If A is an intree of B, then B is called an outtree of A. If A is both an intree and an outtree of B, then A and B are equivalent. If A is neither an intree nor an outtree of B, then A and B are incompatible.

Given a pair of String[]s, docA and docB, each containing an HTML document, you are to determine how the tag tree of docA relates to the tag tree of docB. The elements of each String[] are concatenated to make up the respective document. The only tags permitted in these documents are the opening tags "<html>", "<h1>", "<h2>", "<h3>", "<ul>", "<ol>", "<li>", "<p>", "<b>", "<i>", "<u>", and the similar closing tags. Tag names are case-sensitive and may be broken over several lines, but must not otherwise be altered. The outermost tags of a document are of "html" type and may be padded with spaces on either side. Plaintext may only contain alphanumeric characters, space characters, and the punctuation symbols ',', ';', '.', '!', '?', '-', '(', and ')'.

If the two documents are equivalent or incompatible, return the String "equivalent" or "incompatible", respectively. If docA is an intree of docB, return "intree X", where X is the number of nodes in docB that are not present in docA . If docA is an outtree of docB, return "outtree X", where X is the number of nodes in docA that are not present in docB.

 

Definition

    
Class:bloggoDocStructure
Method:compare
Parameters:String[], String[]
Returns:String
Method signature:String compare(String[] docA, String[] docB)
(be sure your method is public)
    
 

Constraints

-docA and docB each contain between 1 and 50 elements, inclusive
-each element of docA and docB is between 1 and 50 characters long, inclusive
-the only characters allowed in docA and docB are '<', '>', '/', 'a' to 'z', 'A' to 'Z', '0' to '9', the space character, ',', ';', '.', '!', '?', '-', '(', and ')'
-if the elements of docA or of docB are concatenated into a single string, the result is an HTML document as defined above
 

Examples

0)
    
{"<html> <h1>Snapping Turtles</h1> <ul> <li> <h2>",
 "Common Snapping Turtle (<i>Chelydra serpentina</i>",
 ") </h2> <p> With its dark coloring, sinuous neck,",
 "and swift lunging motion, Chelydra serpentina is a", 
 "n adept hunter. </p><p> The snapping turtle rarely",
 "basks. </p><p> Snapping turtle populations are jeo",
 "pardized by automobile traffic.   </",
 "p>    </li> <li> <h2> Alligator Snapping Turtle (", 
 "<i>Macroclemys temminckii</i>) </h2> <p> Like the",
 " common snapping turtle, the alligator snapping ",
 "turtle has a massive head.</p><p>This is the lar",
 "gest freshwater turtle. A tubular pink append",
 "age grows from its mouth.</p>   <", 
 "/li> </ul> </html>"}
{" <html> turtles <h1> snapping </h1> <ul> <li> <h2",
 "> common <i> chelydra serpentina </i> </h2> <p>",
 "hunter </p> (adept?) <p> rarely basks </p> (hmm)",
 "<p> jeopardized by traffic </p></li>",
 "<li> often confused with... <h2> alligator snapp",
 "ing turtle <i>macroclemys temminckii</i> </h2>",
 "<p> massive head </p> big! <p>",
 "largest freshwater turtle. pink wormlike thing <", 
 "/p></li></ul></html>  "}
Returns: "equivalent"
These two documents have the same structure as the example shown above.
1)
    
{" <html> turtles <h1> snapping </h1> <ul> <li> <h2",
 "> common <i> chelydra serpentina </i> </h2> <p>",
 "hunter </p> (adept?) <p> rarely basks </p> (hmm)",
 "<p> jeopardized by traffic </p></li>",
 "<li> often confused with... <h2> alligator snapp",
 "ing turtle <i>macroclemys temminckii</i> </h2>",
 "<p> massive head </p> big! <p>",
 "largest freshwater turtle. pink wormlike thing <", 
 "/p></li></ul></html>  "}
{"<html><h1></h1><ul><li><h2><i></i></h2><p></p><p>",
 "</p><p></p></li><li><h2><i></i></h2><p></p><p></p",
 "></li></ul></html>"}
Returns: "equivalent"
The docB here is a minimal document with the same structure.
2)
    
{"<html><h1></h1><ul><li><h2><i></i></h2><p></p><p>",
 "</p><p></p></li><li><h2><i></i></h2><p></p><p></p",
 "></li></ul></html>"}
{" <html> snapping turtles <ul> <li> common ",
 "snapping turtle, chelydra serpentina <p>",
 "hunter </p> <p> rarely basks </p> ",
 "<p> jeopardized by traffic </p></li>",
 "<li> often confused with... <h2> alligator snapp",
 "ing turtle <i>macroclemys temminckii</i> </h2>",
 "<p> massive head; largest freshwater turtle;",
 "pink wormlike appendage lures fish </p>",
 "</li></ul></html>  "}
Returns: "outtree 4"
The tag tree for docB is the following.

It is evident from comparison with the earlier illustration that docA has four nodes not present in docB.
3)
    
{"<html><h1></h1><ul><li><h2><i></i></h2><p></p><p>",
 "</p><p></p></li><li><h2><i></i></h2><p></p><p></p",
 "></li></ul></html>"}
{" <html> turtles <h1> snapping </h1> <ul> <li> <h2",
 "> common <i> chelydra serpentina </i> </h2> <p>",
 "hunter </p> <p> rarely basks <h3>",
 "<i>almost</i> <b>never</b> </h3> </p>",
 "<p> jeopardized by traffic </p></li>",
 "<li> often confused with... <h2> alligator snapp",
 "ing turtle <i>macroclemys temminckii</i> </h2>",
 "<p> massive head </p> big! <p>",
 "largest freshwater turtle. </p> <p> <b>pink</b>",
 " <b>wormlike</b> lure in mouth </p> <p> imposing",
 "sight on land or water </p> </li></ul></html>  "}
Returns: "intree 7"
We now have the following tag tree for docB.

Here, docB has seven nodes not found in docA.
								
4)
    
{"<html><h1></h1><ul><li><h2><i></i></h2><p></p><p>",
 "</p><p></p></li><li><h2><i></i></h2><p></p><p></p",
 "></li></ul></html>"}
{"<html><ul><li><h2><i></i></h2><p></p><p>",
 "</p><p></p></li><li><h2><i></i></h2><p></p><p></p",
 "></li></ul><h1></h1></html>"}
Returns: "incompatible"
5)
    
{"<html></html>"}
{"<html><html><html></html><html></html><ul>",
 "</ul><ol></ol></html></html>"}
Returns: "intree 5"
6)
    
{"<html><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><b></b></html>"}
{"<html><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p><p></p>",
 "<p></p><p></p><p></p><p></p><p></p><p></p></html>"}
Returns: "incompatible"

Problem url:

http://www.topcoder.com/stat?c=problem_statement&pm=3025

Problem stats url:

http://www.topcoder.com/tc?module=ProblemDetail&rd=5860&pm=3025

Writer:

Eeyore

Testers:

PabloGilberto , lbackstrom , brett1479

Problem categories:

Recursion, String Parsing