TopCoder problem "PhoneSearch" used in TCO04 Qual 4 (Division I Level Three)



Problem Statement

    

When you were young and naive, you used to look up phone numbers by starting at the beginning of the phone book and flipping through the pages one at a time. By the time you found a phone number, you had usually forgotten why you wanted to call! Then you learned about binary search, and life was good. But now it has occurred to you that you might be able to do even better. You've realized that it is silly to look up a name like Wachowski by starting in the middle of the phone book. Instead, you should begin by opening the book to a page near the end.

Based on a study of the ethnicities present in your home town, you have compiled a int[] freqs of the expected relative frequencies of each letter, 'A' through 'Z'. For example, if freqs[0] is 7 and freqs[25] is 1, then you expect 7 times as many names to start with 'A' as start with 'Z'. Although it is unrealistic, you have decided to make the simplifying assumption that the same frequencies apply to the second and third letters, regardless of the values of the preceding letters. For example, among the names that start with 'F', you assume that 7 times as many have 'A' as the second letter as have 'Z' as the second letter (using the above frequencies), and among names that start with "GR", you assume that 7 times as many have 'A' as the third letter as have 'Z' as the third letter.

Each page in your phone book is labeled at the top with the three-letter prefix of the first name on that page and the three-letter prefix of the last name on that page, in the form "XXX-YYY" (nobody in your home town has a name shorter than three letters). The prefixes are always written in uppercase letters, and the pages are arranged in alphabetical order. Every page contains the same number of names, and page breaks never fall between two names that have the same three-letter prefix. You will be given information on the pages of your phone book as a String[] pages, containing the labels of all the pages in the phone book, in order.

You will be given the three-letter prefix of a name you want to look up. You intend to use the following strategy for searching the phone book. Based on your knowledge of the expected frequencies and the number of pages in the phone book, you will calculate the page where you expect the prefix to appear, and flip to that page. If it is the correct page for that prefix, you are done. Otherwise, you have narrowed down the pages on which it might appear, so you will recalculate and repeat. Eventually, you will either find the right page or discover that the prefix cannot possibly appear in the phone book (because it falls between the last name on one page and the first name on the next page, for example).

When recalculating the page where you expect the prefix to appear, you must take into account the prefixes on the pages you have seen. For example, if you have seen "HAR-JOH" on page 7 and "MCD-NAS" on page 11, and you are looking for "LOA", then you know it must appear between pages 7 and 11 (exclusive). You assume that the names on those pages, which fall between "JOH" and "MCD" (exclusive), are distributed according to the expected frequencies and calculate your next page flip accordingly.

Notice that the expected frequencies might predict that a certain prefix appears on several adjacent pages, even though we know that each prefix appears on at most a single page in the actual book. When this occurs, you will flip to the middle of those adjacent pages, rounding down if necessary. For example, if the frequencies predict that a given prefix appears on pages 22-25, you would flip to page 23.

Given freqs, pages, and the three-letter prefix of the name you are searching for, calculate the number of page flips it takes to either find the right page or discover that the prefix cannot possibly appear in the phone book. If you find the right page, return the number of page flips. If you discover that the prefix cannot appear in the phone book, return the negative of the number of page flips.

 

Definition

    
Class:PhoneSearch
Method:pageFlips
Parameters:int[], String[], String
Returns:int
Method signature:int pageFlips(int[] freqs, String[] pages, String prefix)
(be sure your method is public)
    
 

Constraints

-freqs contains exactly 26 elements.
-Each element of freqs is between 1 and 999, inclusive.
-pages contains between 1 and 50 elements.
-Each element of pages is formatted as "UUU-UUU", where each 'U' is an uppercase letter ('A'-'Z').
-The first prefix on each page is equal to or earlier alphabetically than the last prefix on the same page.
-The last prefix on each page is strictly earlier alphabetically than the first prefix on the next page.
-prefix contains exactly three characters.
-Each character in prefix is an uppercase letter ('A'-'Z').
 

Examples

0)
    
{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 }
{ "AAA-EZZ", "FAA-JZZ", "KAA-OZZ", "PAA-TZZ", "UAA-ZZZ" }
"WAC"
Returns: 1
All prefixes are equally likely. "WAC" is the 14875th prefix in alphabetical order, out of 17576 total possible prefixes. This puts it in the 84th percentile, so clearly you should flip to page 5 first, which turns out to be the right page.
1)
    
{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 }
{ "AAA-EZZ", "FAA-JZZ", "KAA-OZZ", "PAA-TOO", "TOQ-ZZZ" }
"TOP"
Returns: -2
You first flip to page 4 and see that it is not the right page. "TOP" comes after "TOO", so you next turn to page 5. However, "TOP" comes before "TOQ", so you know that "TOP" doesn't appear in the phone book.
2)
    
{ 51,50,50,50,50,50,50,50,50,50,50,50,50,
  50,50,50,50,50,50,50,50,50,50,50,50,50 }
{ "ZAA-ZBZ", "ZCA-ZDZ", "ZEA-ZFZ", "ZGA-ZHZ", "ZIA-ZJZ",
  "ZKA-ZLZ", "ZMA-ZNZ", "ZOA-ZPZ", "ZQA-ZRZ", "ZSA-ZTZ",
  "ZUA-ZVZ", "ZWA-ZXZ", "ZYA-ZZZ" }
"YYY"
Returns: -13
This is a strange town in which everybody's name starts with 'Z'. Based on the expected frequencies, you always expect "YYY" to appear on the last available page, so you flip backward through the pages one by one. (You begin to think that maybe binary search wasn't so bad after all...)

Also, beware of overflow.

3)
    
{ 1,999,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 }
{ "AAA-CZZ", "DAA-FZZ", "GAA-IZZ", "JAA-LZZ",
  "MAA-OZZ", "PAA-RZZ", "SAA-UZZ", "VAA-ZZZ" }
"BBB"
Returns: 3
Based on the expected frequencies, you expect the names beginning with "BBB" to start partway down page 1 and end partway down page 8, so you first flip to page 4, which is too high. Next, you expect the names beginning with "BBB" to start partway down page 1 and end partway down page 3, so you flip to page 2, which is again too high. Finally, you flip to page 1.
4)
    
{ 24,1,32,1,1,1,1,4,1,1,1,1,4,1,1,1,1,1,1,4,1,1,1,1,1,1 }
{ "HAT-HAT", "MCC-MCC" }
"HAT"
Returns: 2
A two-family town in which everybody is named either Hatfield or McCoy. Notice that a page can be entirely filled with a single prefix.
5)
    
{ 4,4,5,4,4,4,4,4,4,4,4,4,4,4,4,5,4,5,6,4,4,3,1,3,2,2 }
{ "AAA-CJS", "CKN-DEF", "EFD-FFT", "GAB-HUD", "ICH-KAL",
  "LIM-MON", "NOP-PQR", "STU-STY", "TAL-WAC", "WAD-ZUC" }
"CJS"
Returns: 2
6)
    
{ 15,10,17,7,5,3,10,19,13,156,3,8,7,1,3,9,21,22,15,12,10,5,5,6,7,1 }
{ "AAA-AAA","AAB-AZZ","BAA-BAA","BAB-BZZ","CAA-CAA",
  "CAB-CZZ","DAA-DAA","DAB-DZZ","EAA-EAA","EAB-EZZ",
  "FAA-FAA","FAB-FZZ","GAA-GAA","GAB-GZZ","HAA-HAA",
  "HAB-HZZ","IAA-IAA","IAB-IZZ","JAA-JAA","JAB-JAB",
  "JAC-JAC","JAD-JZZ","LAA-LAA","LAB-LZZ","MAA-MAA",
  "MAB-MZZ","NAA-NAA","NAB-NZZ","OAA-OAA","OAB-OZZ",
  "PAA-PAA","PAB-PZZ","QAA-QAA","QAB-QZZ","RAA-RAA",
  "RAB-RZZ","SAA-SAA","SAB-SZZ","TAA-TAA","TAB-TZZ",
  "UAA-UAA","UAB-UZZ","VAA-VAA","VAB-VZZ","WAA-WAA",
  "WAB-WZZ","XAA-XAA","XAB-XZZ","YAA-YAA","YAB-YZZ" }
"JJJ"
Returns: 4

Problem url:

http://www.topcoder.com/stat?c=problem_statement&pm=2994

Problem stats url:

http://www.topcoder.com/tc?module=ProblemDetail&rd=5876&pm=2994

Writer:

vorthys

Testers:

PabloGilberto , lbackstrom

Problem categories:

Search