TopCoder problem "Comment" used in TCI '02 Finals (Division I Level Two)



Problem Statement

    

Comments are an important part of most code; however, in some circumstances they end up just taking up space and bandwidth. Your task is to write a class Comment with a method strip, which removes all the comments from a piece of code. For this problem, we will consider only two types of comments: comments starting with "//" and comments starting with "/*" and ending with "*/". The first type of comment removes everything starting from "//" until the end of the line. The second type of comment removes everything, starting at "/*", until the sequence "*/" is found. In addition, comments may not be nested in any way. So, as soon as a "/*" comment is opened, everything is commented until a "*/" is found. Thus the String "/*//*/code", without comments, is "code". Note that even though "//" is present, it does not comment out code, because it is itself commented out by the comment starting with "/*". Similarly, neither "//*" nor "///*" opens a "/*" comment, because everything after the "//" is already commented out and thus cannot start a comment.

However, it is not enough to simply search for the sequences "//" and "/*", because these could be surrounded by quotes, and thus not start a comment. For example, in the code:

String s = "//";

"//" does not start a comment, because it is part of a String. A String begins at a '"' which is not commented. A String ends at the first non-escaped '"'. An escaped quote is one that is preceded by a '\'. Any time a backslash is found within a String, the next character is escaped, and the two of them together make one character. For example, in Java, C++, and C#, "\n" represents a new line, "\"" represents a quote, and "\\" represents a single '\' (not two, because the first one escapes the second, and together they make one '\'). Your method must take all of this into consideration when removing comments. It must remove the rest of the line when a "//" comment is found, and remove characters starting with "/*" up to "*/" whenever a "/*" comment is found.

In addition, it must take into account that characters which are part of a String cannot begin a comment (though they may end a comment which was previously begun by "/*"). You will be given an input String[], code, where each element represents a line of code. Your method should remove all of the commented parts of the code, and return the result. If, after removing commented text, any of the lines have been entirely removed, your return value should not contain elements for those lines.
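The escaping rule described above can be sketched in isolation. The helper below is only an illustration: the class name StringScan and the method closingQuote are hypothetical and do not appear in the problem. It scans forward from an opening quote and treats a backslash together with the character after it as a single unit.

```java
// Hypothetical helper, not part of the required Comment class.
class StringScan {
    // Returns the index of the quote that closes the String opened at
    // index `open`, or -1 if the line ends first (the constraints state
    // that every String is closed on the line where it starts).
    static int closingQuote(String line, int open) {
        int i = open + 1;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '\\') {
                i += 2;          // backslash plus the escaped character
            } else if (c == '"') {
                return i;        // first non-escaped quote ends the String
            } else {
                i++;
            }
        }
        return -1;
    }
}
```

Everything between the opening index and the returned index belongs to the String and therefore cannot start a comment.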

To summarize:

Upon finding "//" (not in a String) - remove the entire rest of the line.

Upon finding "/*" (not in a String) - remove all characters until "*/".

Upon finding '"' (not in a comment) - all characters until the next unescaped quote are part of a String, and can not start a comment.

Upon finding '\' (in a String) - the next character is escaped, and can not close the String.
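The four rules above map naturally onto a small state machine. The following is a minimal sketch of one possible approach, not a reference solution. It assumes that each element is processed on its own, so a "/*" comment spanning several elements leaves the uncommented remainder of each element in its own output element; the variable names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Declared without `public` so the sketch compiles in any file;
// the method itself is public, as the statement requires.
class Comment {
    public String[] strip(String[] code) {
        List<String> result = new ArrayList<>();
        boolean inBlock = false;              // "/*" comments may span lines
        for (String line : code) {
            StringBuilder kept = new StringBuilder();
            boolean inString = false;         // Strings never span lines
            int i = 0;
            while (i < line.length()) {
                char c = line.charAt(i);
                if (inBlock) {
                    // Inside "/*": skip everything until "*/".
                    if (line.startsWith("*/", i)) {
                        inBlock = false;
                        i += 2;
                    } else {
                        i++;
                    }
                } else if (inString) {
                    kept.append(c);
                    if (c == '\\' && i + 1 < line.length()) {
                        // Escaped character: keep it, it cannot close the String.
                        kept.append(line.charAt(i + 1));
                        i += 2;
                    } else {
                        if (c == '"') inString = false;
                        i++;
                    }
                } else if (line.startsWith("//", i)) {
                    break;                    // drop the rest of the line
                } else if (line.startsWith("/*", i)) {
                    inBlock = true;
                    i += 2;                   // "*/" is searched after the opener
                } else {
                    if (c == '"') inString = true;
                    kept.append(c);
                    i++;
                }
            }
            // Lines that were removed entirely are omitted from the result.
            if (kept.length() > 0) result.add(kept.toString());
        }
        return result.toArray(new String[0]);
    }
}
```

Under these assumptions, strip applied to the single element "/*//*/code" yields "code", and a line that is reduced to a single space is kept because it is not empty.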

 

Definition

    
Class:Comment
Method:strip
Parameters:String[]
Returns:String[]
Method signature:String[] strip(String[] code)
(be sure your method is public)
    
 

Notes

-code will not necessarily be syntactically correct. Your method should remove based simply on the rules above.
-When removing characters after finding "/*", remove until "*/" is found. Do not worry about the */ being part of a String, as everything between "/*" and "*/" is part of a comment, and thus can not be a String.
-For the purposes of this problem, characters may only be escaped by '\' when they are within a String.
-The beginning of a comment must be on one line. Thus if one element ends with '/' and the next element starts with '*', this will not start a comment.
-"/*" comments may not be nested. After removing comments, "/*/**/*/" becomes "*/"
-In reality, you would have to deal with code like "char c = '"';//comment" (single quote, double quote, single quote). However, for the purposes of this problem, you should treat such sequences as described above, in which case there are no comments here because the quote starts a String.
 

Constraints

-code will contain between 1 and 50 elements, inclusive.
-Each element of code will contain between 1 and 50 characters, inclusive.
-Each character of each element of code will be a letter ('a' to 'z' or 'A' to 'Z'), a number ('0' to '9'), a space, or a character from: ;:"',<.>/?[{]}\|`~!@#$%^&*()-_=+
-All "/*" comments will be closed by the end of the last element of code.
-All Strings (uncommented code starting with '"') will be closed before the end of the line they were started on.
 

Examples

0)
    
{"public class Comment{"
," public String[] strip(String[] code){"
,"//this isn't right!"
,"  return code;"
," }"
,"}"}
Returns: 
{ "public class Comment{",
 " public String[] strip(String[] code){",
 "  return code;",
 " }",
 "}" }
We remove the one commented line.
1)
    
{"/*public class Comment{"
," public String[] strip(String[] code){"
,"//this isn't right!"
,"  return code;"
," }"
,"}*/"}
Returns: { }
Here, everything is commented out.
2)
    
{"String s = \"\\\"/**/\\\"\""}
Returns: { "String s = \"\\\"/**/\\\"\"" }
The strings above are doubly escaped so that you can use the {} button in the arena. The actual input and output string are the same and are:

String s = "\"/**/\""
3)
    
{"public class Comment{"
," public String[] strip(String[] code){"
," //this isn't right!"
,"  return code;"
," }"
,"}"}
Returns: 
{ "public class Comment{",
 " public String[] strip(String[] code){",
 " ",
 "  return code;",
 " }",
 "}" }
4)
    
{"char c = '\"'//\""}
Returns: { "char c = '\"'//\"" }
Note again that there are extra escape characters.

Problem url:

http://www.topcoder.com/stat?c=problem_statement&pm=1018

Problem stats url:

http://www.topcoder.com/tc?module=ProblemDetail&rd=4375&pm=1018

Writer:

lars2520

Testers:

zoidal, alexcchan, brett1479

Problem categories:

String Manipulation, String Parsing