Problem Statement |
|
Bloggo lets content producers integrate a text-search facility into
their weblogs, making it easy for end users to sift through archival
content. One type of query offered by the bloggo search engine is the
sequence search. In such a query, consecutive words are separated
by an ellipsis, "...". The query matches strings in which all
words of the query appear, in the same order as they are given in the
query. Such a string, called a passage, begins with the first
word of the query and ends with the last word of the query.
For example, the sequence-search query
with...here...there
matches the following passages.
With a quack-quack here and a quack-quack there
With an oink-oink here and an oink-oink there
With a moo-moo here and a moo-moo there
Observe that matching is not case-sensitive. Also bear in mind that words
are defined as sequences of alphabetic characters, 'a' to 'z' and 'A'
to 'Z', that are not included in a longer word. Thus, the query shown
above does not match either of the following passages in part or in whole.
without a woof-woof here or a woof-woof there
with a meow-meow here and a meow-meow thereabouts
In the special case of a single-word query, a matching passage is also one word long.
The query
pudding
therefore matches passages that look like this.
pudding
In general, a query word must appear in a matching
passage at least as many times as it appears in the query. Thus, the query
a...quack...quack...night
does not match
a quack in the night
but it does match both of the following.
a quack in the quack factory at night
a quack doctor said quack quack last night
You are given a String[], documents, each element of
which contains a single document. A document may include, in addition to
words and spaces, the punctuation symbols ',', ';', '.', '!', '?', '-',
'(', and ')'.
You are also given a String, query, consisting
of one or more words separated by ellipses. Find the top five passages
matching this query and return them in a String[] formatted
according to the specifications below and sorted by descending order
of quality, where quality is also defined below. Each passage must
be a substring of an element of documents, and must begin and
end with a whole word.
The shorter a passage, the higher its quality. The length of a passage
is counted in characters, and a passage extends from the first character
of the first query word to the last character of the last query word. If
two passages are equally long, the one drawn from the lower-numbered
document takes precedence. In the case of equally long passages in the
same document, earlier ones are better. Passages may overlap. If there
are fewer than five matching passages among the documents, return the
ones that do exist. To make the search results helpful to users, each
passage should be presented in the format
DOC_ID START_INDEX [PASSAGE_TEXT]
where DOC_ID is the zero-based index of the document from which the
passage is drawn, START_INDEX is the zero-based character index of the
first character in the passage, and PASSAGE_TEXT is the text of the
passage. There is one space between DOC_ID and START_INDEX, one space
between START_INDEX and the left bracket, and no other spaces except
possibly in the PASSAGE_TEXT.
|
|
Definition |
| Class: | bloggoSequenceSearch | Method: | findPassages | Parameters: | String[], String | Returns: | String[] | Method signature: | String[] findPassages(String[] documents, String query) | (be sure your method is public) |
|
|
|
|
Constraints |
- | documents contains between 1 and 50 elements, inclusive |
- | each element of documents is between 1 and 50 characters long, inclusive |
- | the only characters allowed in documents are 'a' to 'z', 'A' to 'Z', the space character, ',', ';', '.', '!', '?', '-', '(', and ')' |
- | query is between 1 and 50 characters long, inclusive |
- | query consists of a sequence-search query as defined above |
|
Examples |
0) | |
| {"But in a larger sense we can not dedicate --",
"we can not consecrate -- we can not hallow this",
"ground. The brave men, living and dead, who",
"struggled, here, have consecrated it far above",
"our poor power to add or detract. The world will",
"little note, nor long remember, what we say here,",
"but can never forget what they did here. It is",
"for us, the living, rather to be dedicated here",
"to the unfinished work which they have, thus",
"far, so nobly carried on. It is rather for us",
"to be here dedicated to the great task remaining",
"before us -- that from these honored dead we take",
"increased devotion to that cause for which they",
"here gave the last full measure of devotion --",
"that we here highly resolve that these dead",
"shall not have died in vain; that this nation",
"shall have a new birth of freedom; and that",
"this government of the people, by the people,",
"for the people, shall not perish from the earth."} | "the...People" |
| Returns:
{ "17 19 [the people]",
"17 34 [the people]",
"18 4 [the people]",
"17 19 [the people, by the people]" } | These documents are fragments of the Gettysburg Address. Observe that word matching is case-insensitive and that passages may overlap. |
|
|
1) | |
| {"But in a larger sense we can not dedicate --",
"we can not consecrate -- we can not hallow this",
"ground. The brave men, living and dead, who",
"struggled, here, have consecrated it far above",
"our poor power to add or detract. The world will",
"little note, nor long remember, what we say here,",
"but can never forget what they did here. It is",
"for us, the living, rather to be dedicated here",
"to the unfinished work which they have, thus",
"far, so nobly carried on. It is rather for us",
"to be here dedicated to the great task remaining",
"before us -- that from these honored dead we take",
"increased devotion to that cause for which they",
"here gave the last full measure of devotion --",
"that we here highly resolve that these dead",
"shall not have died in vain; that this nation",
"shall have a new birth of freedom; and that",
"this government of the people, by the people,",
"for the people, shall not perish from the earth."} | "Shall...The" |
| Returns: { "18 16 [shall not perish from the]" } | Note that the substring "the people, shall" in the last document does not match this query. |
|
|
2) | |
| {"But in a larger sense we can not dedicate --",
"we can not consecrate -- we can not hallow this",
"ground. The brave men, living and dead, who",
"struggled, here, have consecrated it far above",
"our poor power to add or detract. The world will",
"little note, nor long remember, what we say here,",
"but can never forget what they did here. It is",
"for us, the living, rather to be dedicated here",
"to the unfinished work which they have, thus",
"far, so nobly carried on. It is rather for us",
"to be here dedicated to the great task remaining",
"before us -- that from these honored dead we take",
"increased devotion to that cause for which they",
"here gave the last full measure of devotion --",
"that we here highly resolve that these dead",
"shall not have died in vain; that this nation",
"shall have a new birth of freedom; and that",
"this government of the people, by the people,",
"for the people, shall not perish from the earth."} | "wE...nOT" |
| Returns:
{ "0 22 [we can not]",
"1 0 [we can not]",
"1 25 [we can not]",
"1 0 [we can not consecrate -- we can not]" } | |
|
3) | |
| {"But in a larger sense we can not dedicate --",
"we can not consecrate -- we can not hallow this",
"ground. The brave men, living and dead, who",
"struggled, here, have consecrated it far above",
"our poor power to add or detract. The world will",
"little note, nor long remember, what we say here,",
"but can never forget what they did here. It is",
"for us, the living, rather to be dedicated here",
"to the unfinished work which they have, thus",
"far, so nobly carried on. It is rather for us",
"to be here dedicated to the great task remaining",
"before us -- that from these honored dead we take",
"increased devotion to that cause for which they",
"here gave the last full measure of devotion --",
"that we here highly resolve that these dead",
"shall not have died in vain; that this nation",
"shall have a new birth of freedom; and that",
"this government of the people, by the people,",
"for the people, shall not perish from the earth."} | "we...can...not" |
| Returns:
{ "0 22 [we can not]",
"1 0 [we can not]",
"1 25 [we can not]",
"1 0 [we can not consecrate -- we can not]" } | |
|
4) | |
| {"B A b B B A b a A B b b b a a a B a A b a B B b b ",
"A B b b A a A b A A a b b A A a A a A b A a A B b ",
"B B a A a a A A A b A b b a a B B b a A a A b a A ",
"a b b b b A B a A A A A a b A b A A B a A b b a A ",
"A b a A b B B a b A A A A B A A B a A B a b b b B ",
"A A A b b a a a A b b b a A A b a B a a B b B a b ",
"B A B A a b A B a B b A b B b b A B A A a B A b B ",
"A A b B a a b B b a b a A A A B A b b b b A b b A ",
"B B A A B B A b a a b a a A B B A A B B A B b A b ",
"A a A b B B b A a b b B A a B b b A B b a a a b b ",
"B B a b A A a b a A a A a B B A B A A a B B b a B ",
"B b B B a B A B b B B a A a a a b b B a A A b a A ",
"b B a a a B b a A a A b a a A b A B b a B a b A A ",
"A A b B b A a B a a a b B A b B B a b B b b b B B ",
"b B a A A A B B b b b b a b B B a a a b B b A A B ",
"a B A a a a a A A B A A B b a b A A a A a a B B B ",
"B B A B b B B a b B A B A b b A b A B A A b A a b ",
"B A A A b A B a b b b a b b b a a A A b A b a A b ",
"A a B B B A B b b A B a B a A b B b b a B A B b a ",
"B a A A B a a b b B b B A A a A b a a b b b a B b ",
"a a b A B a b B B B A A B a B b B A b B b a b A B ",
"A a a b a A b B A b B a A B a a A B a A b a b a B ",
"A b A b b A b b a B b a A a b A a A A A A a A b A ",
"a B B A a B B a b B b a B b A a B B B a B a a b a ",
"a a A b B B B a A B a B a A a B a a B b A B b b B ",
"A a A B A A A b b B A A a A b B b a A b B A b b A ",
"A A a a b B A b a b a A A A B B b B A a A B A B b ",
"b a b B a A B a a b B B B A b a A a A A A a a A b ",
"a b a a B a B b a B B A A b a B A B b A b a A a a ",
"b a a B a a b A B A a b a B A B b b B A b A a b a ",
"a B b B a a a A b B B A b b b A A B A b b b B b A ",
"B B a A b b a B B b A a a A B B A a A b a a B A A ",
"a A a A b b B B a B B b a b b A B a B A a a b a A ",
"A B A A a a b A b A B B A b A B b B B a a a A b a ",
"b A B b b b a b b A A a A B b B B A a b b B a b a ",
"A B B b b a a a B A B b b a b A a a B A b B A B B ",
"A b B b B a b a B B B B A a A a a B B b b a b A A ",
"B B a a b B a A b A A A a A a b b b a B B A a B a ",
"A B a a B A B B a B B b b a B a b b A A B b A A b ",
"B b a a A a B b B a b A B B B A b B b A A B a a B ",
"B b b A B a a b a B a b A A b B a b b a A B B a A ",
"A A b A a a a a B B b b b b A B A B B a B b A a a ",
"B B B A b A a a a B a B b b B b a A a b b A B a B ",
"A B b b B a b A b B b A B A A A a B B a A b b b B ",
"B a A B A b b A b b a A A b b b A B a b B a b A b ",
"B a a B a B A b B A b b a b B a b a A b b a A b B ",
"a b b a A b A a A B B a b B B a A B A a b B a a a ",
"b b a A a a A a a B b A B a A B B a A a B A b A a ",
"b b B b B a a b a B B A B b A b A a b b B A B b a ",
"a b a B a A B a A A A A A a B a A A a B a b b b b "} | "a...b...a...b...a...a...b...b...a...a...b...b" |
| Returns:
{ "45 24 [a b B a b a A b b a A b B]",
"8 12 [A b a a b a a A B B A A B B]",
"18 18 [A B a B a A b B b b a B A B b]",
"40 16 [a B a b A A b B a b b a A B B]",
"47 16 [a B b A B a A B B a A a B A b]" } | |
|
5) | |
| { "this government of the people, by the people,",
"for the people, shall not perish from the earth."} | "people" |
| Returns: { "0 23 [people]", "0 38 [people]", "1 8 [people]" } | |
|