Problem Statement |
|
To help end users sift through archival content, bloggo makes it
possible for content authors to incorporate a search facility into their
websites. Among the query types supported by the bloggo search engine
is proximity search, which lets users look for passages in which
specified words occur near one another. The syntax of a proximity-search
query is prescribed by the following grammar.
Query -> Word | "(" Query " " Near " " Query ")"
Near -> "+" Num
Num -> Digit | Digit Digit | Digit Digit Digit
Digit -> "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
A passage returned by proximity search is displayed to the user as a
substring of a text document. Internally, a passage is represented
by a pair of non-negative integers (a,b) such that
a<=b, where a and b are the zero-based
position numbers respectively of the first and last words in the
substring. A word is a sequence of alphabetic characters that is not
contained within a longer word. If a passage consists of a single word,
then a=b. For example, the passage (9,11) of the following
document is "appraising his wheat", while passage (17,17)
is "actor". Position numbers have been added for purposes
of illustration.
His expression combined that of a Middle Western farmer appraising
0 1 2 3 4 5 6 7 8 9
his wheat crop and that of an actor wondering whether he is
10 11 12 13 14 15 16 17 18 19 20 21
observed--the public manner of all good Americans. (Fitzgerald)
22 23 24 25 26 27 28 29 30
If a query consists of a word, it matches all passages consisting of
just that word. Matching is case-insensitive, so the query "HIS" would
match passages (0,0) and (10,10) in the above document.
If a query consists of two queries X and Y joined by the
"Near" operator, it matches a passage (p, q) under
the following conditions. There must be a passage (a,b)
matching query X, and a passage (c,d) matching
query Y. Letting n be the number specified by the "Near"
operator, it must be true that if the two passages don't overlap, they
are separated by at most n words. So if b<c or
d<a, we must have (c-b)<=n+1
or (a-d)<=n+1, respectively. Finally, p
and q are the leftmost and rightmost endpoints among the two
passages, meaning that p=min{a,c}
and q=max{b,d}.
You are given a String[], documents, each element
of which is a single document that may include, in addition to words and
space characters, the punctuation symbols ',', ';', '.', '!', '?', '-',
'(', and ')'. You are also given a String, query,
containing a proximity-search query. Find the top ten passages matching
this query and return them in a String[] formatted according
to the specifications below and sorted by descending order of quality,
where quality is also defined below. Each passage must be a substring
of an element of documents, and must begin and end with a whole word.
The smaller the width q-p of a passage (p,q),
the higher its quality. In the event of a tie between two passages, the
one occurring in the lower-numbered document takes precedence. If both
are in the same document, the one occurring earlier is better. Although
every pair of passages in the search results must be different in the
sense of covering different spans or occurring in different documents,
they may have identical textual content. If there are fewer than ten
matching passages among the documents, return the ones that do exist. To
make the search results helpful to users, each passage should be presented
in the format
DOC_ID START_INDEX [PASSAGE_TEXT]
where DOC_ID is the zero-based index of the document from which the
passage is drawn, START_INDEX is the zero-based character index of the
first character in the passage, and PASSAGE_TEXT is the text of the
passage. There is one space between DOC_ID and START_INDEX, one space
between START_INDEX and the left bracket, and no other spaces except
possibly in the PASSAGE_TEXT.
|
|
Definition |
| Class: | bloggoProximitySearch | Method: | findPassages | Parameters: | String[], String | Returns: | String[] | Method signature: | String[] findPassages(String[] documents, String query) | (be sure your method is public) |
|
|
|
|
Constraints |
- | documents contains between 1 and 50 elements, inclusive |
- | each element of documents is between 1 and 50 characters long, inclusive |
- | the only characters allowed in documents are 'a' to 'z', 'A' to 'Z', the space character, ',', ';', '.', '!', '?', '-', '(', and ')' |
- | query is between 1 and 50 characters long, inclusive |
- | query consists of a proximity-search query as defined above |
|
Examples |
0) | |
| {"But in a larger sense we can not dedicate --",
"we can not consecrate -- we can not hallow this",
"ground. The brave men, living and dead, who",
"struggled, here, have consecrated it far above",
"our poor power to add or detract. The world will",
"little note, nor long remember, what we say here,",
"but can never forget what they did here. It is",
"for us, the living, rather to be dedicated here",
"to the unfinished work which they have, thus",
"far, so nobly carried on. It is rather for us",
"to be here dedicated to the great task remaining",
"before us -- that from these honored dead we take",
"increased devotion to that cause for which they",
"here gave the last full measure of devotion --",
"that we here highly resolve that these dead",
"shall not have died in vain; that this nation",
"shall have a new birth of freedom; and that",
"this government of the people, by the people,",
"for the people, shall not perish from the earth."} | "(can +000 we)" |
| Returns: { "0 22 [we can]", "1 0 [we can]", "1 25 [we can]" } | These documents are made of fragments of the Gettysburg Address. The words "can" and "we" occur next to each other in three places. |
|
|
1) | |
| {"But in a larger sense we can not dedicate --",
"we can not consecrate -- we can not hallow this",
"ground. The brave men, living and dead, who",
"struggled, here, have consecrated it far above",
"our poor power to add or detract. The world will",
"little note, nor long remember, what we say here,",
"but can never forget what they did here. It is",
"for us, the living, rather to be dedicated here",
"to the unfinished work which they have, thus",
"far, so nobly carried on. It is rather for us",
"to be here dedicated to the great task remaining",
"before us -- that from these honored dead we take",
"increased devotion to that cause for which they",
"here gave the last full measure of devotion --",
"that we here highly resolve that these dead",
"shall not have died in vain; that this nation",
"shall have a new birth of freedom; and that",
"this government of the people, by the people,",
"for the people, shall not perish from the earth."} | "this" |
| Returns: { "1 43 [this]", "15 34 [this]", "17 0 [this]" } | The word "this" appears three times. |
|
|
2) | |
| {"But in a larger sense we can not dedicate --",
"we can not consecrate -- we can not hallow this",
"ground. The brave men, living and dead, who",
"struggled, here, have consecrated it far above",
"our poor power to add or detract. The world will",
"little note, nor long remember, what we say here,",
"but can never forget what they did here. It is",
"for us, the living, rather to be dedicated here",
"to the unfinished work which they have, thus",
"far, so nobly carried on. It is rather for us",
"to be here dedicated to the great task remaining",
"before us -- that from these honored dead we take",
"increased devotion to that cause for which they",
"here gave the last full measure of devotion --",
"that we here highly resolve that these dead",
"shall not have died in vain; that this nation",
"shall have a new birth of freedom; and that",
"this government of the people, by the people,",
"for the people, shall not perish from the earth."} | "((the +099 people) +999 by)" |
| Returns:
{ "17 19 [the people, by]",
"17 23 [people, by the]",
"17 31 [by the people]",
"17 19 [the people, by the people]" } | |
|
3) | |
| {"But in a larger sense we can not dedicate --",
"we can not consecrate -- we can not hallow this",
"ground. The brave men, living and dead, who",
"struggled, here, have consecrated it far above",
"our poor power to add or detract. The world will",
"little note, nor long remember, what we say here,",
"but can never forget what they did here. It is",
"for us, the living, rather to be dedicated here",
"to the unfinished work which they have, thus",
"far, so nobly carried on. It is rather for us",
"to be here dedicated to the great task remaining",
"before us -- that from these honored dead we take",
"increased devotion to that cause for which they",
"here gave the last full measure of devotion --",
"that we here highly resolve that these dead",
"shall not have died in vain; that this nation",
"shall have a new birth of freedom; and that",
"this government of the people, by the people,",
"for the people, shall not perish from the earth."} | "((TO +20 (tO +20 tO)) +20 ((TO +20 tO) +20 To))" |
| Returns:
{ "4 15 [to]",
"7 27 [to]",
"8 0 [to]",
"10 0 [to]",
"10 21 [to]",
"12 19 [to]",
"10 0 [to be here dedicated to]" } | Word matching is case-insensitive. |
|
|
4) | |
| {"c B b B A C b b A C A a c a B A b C c b b b b A a ",
"b A A C c c B c A c c C b c B C B A a B b a b c A ",
"a C C B A b A a A c A B b b a A A C a B C A A b C ",
"C A a a c a A B C a c B c B b A c B A A c b b B b ",
"a A a A b C A b c a B B a c B A a c a b c b C b C ",
"c C A a C C A c A c b a b A b A c C A C b C C B c ",
"a B C B c A b C c c a a c a c A A c a A B c c A c ",
"A C A C c a b C C C c A a b a b b C C a A C b a c ",
"C A B c a A B b b c b C b B C c A b A B A c b B a ",
"c b a B a C B C c A c b B c C A A C c C a B C a a ",
"A C C A a C C B a c C a C a c A A A C c A a B a A ",
"a b c B C C A b b a c c b B A B b A b b b b a a b ",
"C B c C B b c c a C B C b B B a C B B a A B B c c ",
"a B B B B A c c C a b b A c B A A b B B B c C c B ",
"c A B C b B b c a a C b b c b c b a B A c c A a A ",
"a B A B c c b C A A b b A a A B b C A A B b c b C ",
"b b b A A c a c a c a B A B b a a A c B b B B A b ",
"c a b B a b A c C B b B b c A A C c b C B a c c c ",
"C A A C b C C A a c B c b B C A c B a c c B c b c ",
"b A A B b B b b b A C b a b A C b B C A c c A B A ",
"b a c b B a B B B a a a A C a a A A A A A B a c a ",
"c a A C c A c b a a C B b c c B C B c b a c C B B ",
"a b A c B b b B B C c b c B A a a b C b a C b c b ",
"c b b b a a A b b b b c B b b c c a a c a C A a b ",
"C B b b A A A b B c b C a c C a B A A C C c C b C ",
"B C a a B A a B c B a C c B A B b c B a b B B b C ",
"a A a C C b A C a C b b A a B C C C C A c C c C b ",
"a c a a A a c C b B b c A c B b A A A B a C c c c ",
"A c B b c C c a b b C B b c B b b A C c B b a b b ",
"c a c A A B C C b C b C c C c C c b b a A C c C a ",
"A b a A b B B C c C c c a b C a C a b c c A A B c ",
"c C a b A a b A C C a C A a B C c a C b c b B C C ",
"B C C B b B C A c a C B c a B b c B A A a a a b C ",
"b a a A A B b b a b A b b a b c A B B c C a a A C ",
"a c a C A b a A C c a C c b b A A B a C B a A C B ",
"B A B B a a c c a c b A c c c B C A A B a b B B B ",
"C c B c A c b C b a c b a c C A c A a c b a b c c ",
"B C b c c A C A b A B c c c B A C b b C c C a a b ",
"C a b b C a C B a B C b c C C c b b c A b B B b a ",
"a B A b a A C C B C B A B a b a a c a c b b b B a ",
"c b c B A C C c A B a b B a A A C b A C b C a a c ",
"b b a b a A c c A c A A A a A b c a C C b a A C a ",
"a B b A b C c c a A b A c A B A b B a B a b A c A ",
"c B b b C b c c c a B C c c c a b a c B B a C c A ",
"C A a b c C c b a B A c A C C c B A C b A C c b c ",
"a c A a b C b C A C A A a c B c a B A B B b A c a ",
"B c C B a B C C B a a B A b b c b b A c A B b A a ",
"c c c a C A B a c b c a B c b a C B a B A a b c A ",
"B c b a C B c A A b a a c c c A c B C B B C b b a ",
"C a C c a c c c b a C C B C a B a C A b a C a c B "} | "(a +3 (b +2 (a +6 (((b +1 c) +3 a) +0 c))))" |
| Returns:
{ "0 6 [B A C]",
"0 8 [A C b]",
"0 14 [b A C]",
"0 24 [c a B]",
"0 30 [A b C]",
"1 12 [B c A]",
"1 30 [C B A]",
"1 42 [a b c]",
"1 44 [b c A]",
"2 4 [C B A]" } | |
|