Pick a random financial transaction from the ledger of a typical business
and there is about a 30% chance that the first non-zero digit of the amount of
money involved is a 1. This counter-intuitive fact is a result of Benford's Law,
discovered by astronomer Simon Newcomb in the late 1800's and rediscovered
by physicist Frank Benford in the early 1900's. They found that real world
data are not evenly distributed. Instead, given a random number related to
some natural or social phenomenon satisfying certain conditions, the probability
that the first non-zero digit of
the number is D is log10(1 + 1/D).
Increasingly, financial auditors are using Benford's Law to detect
possible fraud. Given a record of the financial transactions of a company,
if some leading digit appears with a frequency that significantly differs
from that predicted by Benford's Law, it is a signal that those transactions
deserve a closer look.
Create a class BenfordsLaw with a method questionableDigit
that takes a int[] transactions and an int
threshold and returns the smallest digit that appears as the
first non-zero digit of numbers in transactions with a frequency
that is at least threshold times more or less than its expected
frequency (e.g., more than three times the expected frequency or less than
one-third the expected frequency when threshold=3).
If no such digit exists, return -1.
For example, consider the data
5236 7290 200 1907 3336
9182 17 4209 8746 7932
6375 909 2189 3977 2389
2500 1239 3448 6380 4812
The following chart gives the actual frequency of each leading digit,
and its expected frequency according to Benford's law.
digit | 1 2 3 4 5 6 7 8 9
---------|---------------------------------------------
actual | 3 4 3 2 1 2 2 1 2
expected | 6.02 3.52 2.50 1.94 1.58 1.34 1.16 1.02 0.92
Assuming a threshold of 2, there are two digits
that are questionable: the leading digit 1 appears less
than half as often as expected and the leading digit 9
appears more than twice as often as expected. Since 1
is smaller than 9, the answer is 1.
Note that, although the expected frequencies have been rounded to two decimal places
in the above table for the purposes of presentation, you should perform all such calculations
without rounding. Errors of up to 0.000001 in the expected frequencies will not affect the
final answer.
|