TopCoder problem "BadXML" used in TCO04 Semifinal 2 (Division I Level One)

Problem Statement

    

XML documents are widely used today to describe many different kinds of data. The primary purpose of XML is to allow people to share textual and numerical data in a structured format across the Internet. Note that the XML document format described in this problem is a simplified version of the actual XML format.

An XML document contains tags and plain text. Tags are used to define blocks within the XML document. A block always begins with a start-tag and ends with a corresponding end-tag. The formats of these tags are <tag-name> and </tag-name>, respectively. The tag-name of an end-tag is the same (including case) as the tag-name of the start-tag for the block it closes. All plain text data must be inside at least one block. The plain text will not contain the characters '<', '>' or '/'.
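As a rough illustration of the tag/text distinction described above, the following Java sketch splits a document string into tag tokens and plain-text tokens. The class name XmlTokenizer and the method tokenize are hypothetical helpers, not part of the required interface. The sketch relies on the guarantee that '<' and '>' appear only in tags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the BadXML interface.
class XmlTokenizer {
    // Splits a simplified XML document into tags ("<...>" or "</...>")
    // and maximal runs of plain text, in document order.
    public static List<String> tokenize(String xml) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < xml.length()) {
            int end;
            if (xml.charAt(i) == '<') {
                // A tag extends up to and including the next '>'.
                end = xml.indexOf('>', i) + 1;
                if (end == 0) {
                    throw new IllegalArgumentException("unterminated tag");
                }
            } else {
                // Plain text extends up to (but excluding) the next '<'.
                end = xml.indexOf('<', i);
                if (end < 0) {
                    end = xml.length();
                }
            }
            tokens.add(xml.substring(i, end));
            i = end;
        }
        return tokens;
    }
}
```

Because a tag-name cannot contain '/', a token that starts with "</" is an end-tag, any other token that starts with '<' is a start-tag, and everything else is plain text.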

Blocks may be nested, but cannot overlap. So for instance, "<root><data>Hello world</data></root>" and "<root>Hello</root><root>world</root>" are valid XML documents, while "<root><data>Hello world</root></data>" and "<root>Hello</root> <root>world</root>" are not. The first invalid document has overlapping blocks (the <data> block must end before the enclosing <root> block ends). The second has text, a single space, outside all blocks (in this problem, spaces are treated just like any other character; see example 2).
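The two validity rules illustrated above (no overlapping blocks, no text outside a block) can be checked with a stack of open tag-names. The Java sketch below is only an illustration: XmlValidator and isValid are hypothetical names, and the check deliberately ignores the other rules, such as non-empty tag-names or forbidden characters in text. The task itself guarantees valid input, so a solution does not need this check.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper for illustration; the task guarantees valid input.
class XmlValidator {
    // Returns true if blocks nest properly and no text lies outside a block.
    public static boolean isValid(String xml) {
        Deque<String> open = new ArrayDeque<>(); // names of currently open blocks
        int i = 0;
        while (i < xml.length()) {
            if (xml.charAt(i) != '<') {
                // Plain text (spaces included) must be inside some block.
                if (open.isEmpty()) {
                    return false;
                }
                i++;
                continue;
            }
            int close = xml.indexOf('>', i);
            if (close < 0) {
                return false; // unterminated tag
            }
            String body = xml.substring(i + 1, close);
            if (body.startsWith("/")) {
                // An end-tag must close the most recently opened block.
                if (open.isEmpty() || !open.pop().equals(body.substring(1))) {
                    return false;
                }
            } else {
                open.push(body);
            }
            i = close + 1;
        }
        return open.isEmpty(); // every block must have been closed
    }
}
```

Applied to the four documents quoted above, the sketch accepts the first two and rejects the last two.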

Your task is to write a program that formats an XML document according to the following rules: If a block contains other blocks, the start- and end-tags for that block should be on lines by themselves, and all tags and text inside this block should be indented by 3 spaces per open tag. Otherwise, the start- and end-tags should be on the same line, with the textual content of the block (if any) between the tags; nothing else, except indentation, may appear on this line. See example 0 for clarifications.

Create a class BadXML containing the method format which takes a String[] doc, the XML document, and returns a String[] containing the properly indented XML document. Concatenate the elements in doc to get the full XML document.
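One possible way to approach the task is sketched below in Java. It is not the reference solution, only a minimal sketch that assumes valid input as promised by the constraints. It concatenates doc, splits the result into tags and text runs, and then walks the tokens while tracking the nesting depth. The helper names indent, isTag and isEndTag are illustrative choices. A block is treated as having no nested blocks when its start-tag is followed either directly by an end-tag or by one text run and then an end-tag.

```java
import java.util.ArrayList;
import java.util.List;

// A submission would normally declare this class public; the modifier is
// omitted here only so the sketch compiles regardless of the file name.
class BadXML {
    public String[] format(String[] doc) {
        String xml = String.join("", doc);

        // Tokenize: tags run from '<' to '>', text runs up to the next '<'.
        // Valid input guarantees that both indexOf calls find a match.
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < xml.length()) {
            int end = (xml.charAt(pos) == '<')
                    ? xml.indexOf('>', pos) + 1
                    : xml.indexOf('<', pos);
            tokens.add(xml.substring(pos, end));
            pos = end;
        }

        List<String> out = new ArrayList<>();
        int depth = 0; // number of open blocks that contain nested blocks
        for (int k = 0; k < tokens.size(); k++) {
            String t = tokens.get(k);
            if (isEndTag(t)) {
                // Closes a block that contained nested blocks.
                depth--;
                out.add(indent(depth) + t);
            } else if (isTag(t)) {
                String next = tokens.get(k + 1);
                if (isEndTag(next)) {
                    // Empty block: both tags on one line.
                    out.add(indent(depth) + t + next);
                    k += 1;
                } else if (!isTag(next) && isEndTag(tokens.get(k + 2))) {
                    // Text-only block: tags and text on one line.
                    out.add(indent(depth) + t + next + tokens.get(k + 2));
                    k += 2;
                } else {
                    // Block with nested blocks: start-tag on its own line.
                    out.add(indent(depth) + t);
                    depth++;
                }
            } else {
                // Text inside a block that also contains nested blocks.
                out.add(indent(depth) + t);
            }
        }
        return out.toArray(new String[0]);
    }

    private static boolean isTag(String s) { return s.startsWith("<"); }

    private static boolean isEndTag(String s) { return s.startsWith("</"); }

    private static String indent(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3 * depth; i++) sb.append(' ');
        return sb.toString();
    }
}
```

The depth counter only increases for blocks that contain nested blocks, because text-only and empty blocks are emitted as a single line together with their end-tag.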

 

Definition

    
Class: BadXML
Method: format
Parameters: String[]
Returns: String[]
Method signature: String[] format(String[] doc)
(be sure your method is public)
    
 

Notes

-Spaces are treated like any other character; see example 2.
 

Constraints

-doc will contain between 1 and 50 elements, inclusive.
-Each element in doc will contain between 0 and 50 characters, inclusive.
-The characters in the elements of doc will have ASCII values between 32 and 126, inclusive.
-The characters '<', '>' and '/' will only be used in tags.
-A tag-name will not contain any of the characters '<', '>', '/' or space.
-A tag-name will contain at least one character.
-doc will describe a valid XML document according to the description above, and will contain at least one block.
-The return value will contain at most 100 elements, and no element will contain more than 80 characters.
 

Examples

0)
    
{"<article>",
 "<author>writer</author>",
 "<headline>TopCoder",
 " ",
 "Open 2004</headline>",
 "<ingress>",
 "</ingress>",
 "<paragraph>",
 "TopCoder Open is being held at <st:hotel>",
 "Santa Clara Marriott</st:hotel>",
 "which lies in the northern part of ",
 "<st:state>California</st:state>.",
 "</paragraph>",
 "<paragraph>",
 "&lbr;Image&rbr;",
 "</paragraph>",
 "</article>"}
Returns: 
{ "<article>",
 "   <author>writer</author>",
 "   <headline>TopCoder Open 2004</headline>",
 "   <ingress></ingress>",
 "   <paragraph>",
 "      TopCoder Open is being held at ",
 "      <st:hotel>Santa Clara Marriott</st:hotel>",
 "      which lies in the northern part of ",
 "      <st:state>California</st:state>",
 "      .",
 "   </paragraph>",
 "   <paragraph>&lbr;Image&rbr;</paragraph>",
 "</article>" }

The first block surrounded by paragraph tags contains two nested blocks as well as three plain text strings. Each of these ends up on a separate line. The other plain text strings end up on the same line as the tags of the blocks that enclose them.

1)
    
{"<ro","ot>A roo","","t node</r","oot><","root>Anot","her root node</ro","ot>",""}
Returns: { "<root>A root node</root>",  "<root>Another root node</root>" }
An XML document may contain several blocks at the top level.
2)
    
{"<outer_tag>",
 "   <inner_tag>",
 "      Some text",
 "   </inner_tag>",
 "</outer_tag>"}
Returns: 
{ "<outer_tag>",
 "      ",
 "   <inner_tag>      Some text   </inner_tag>",
 "</outer_tag>" }
The indentation in the input is treated as space characters and is not removed.

Problem url:

http://www.topcoder.com/stat?c=problem_statement&pm=3041

Problem stats url:

http://www.topcoder.com/tc?module=ProblemDetail&rd=5883&pm=3041

Writer:

Yarin

Testers:

PabloGilberto, lbackstrom, vorthys

Problem categories:

String Manipulation, String Parsing