The website http://testingbenfordslaw.com/ has a number of large datasets for various magnitudes, such as the number of followers of Twitter users, the passcodes of iPhone users, and the populations of cities in Spain.
The website analyses each dataset to find the distribution of first digits for the magnitude in question.
For example, take a dataset which gives the population of every country in the world.
For the first five countries in alphabetical order, the first digits are 2, 3, 3, 8, 1.
You’d think it must be just a fluke that those few countries with names beginning with A show more small first digits (1, 2, 3) than bigger ones (4, 5, 6, 7, 8, 9). But it’s not just a fluke.
Use the website to analyse a number of datasets. Calculate the mean cumulative percentages (over all the datasets) for first digit 1, first digit 1 or 2, first digit 1 or 2 or 3, etc. (so your final cumulative percentage, for first digit ≤ 9, must be 100%). Draw a graph of mean cumulative percentage vs. first digit.
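If you would rather not do the arithmetic by hand, here is a rough Python sketch of the calculation. It is only one possible way to do it and has nothing to do with the website itself: the function names (first_digit, digit_percentages, mean_cumulative) are made up for illustration, and the sample input is just the five first digits quoted above (2, 3, 3, 8, 1) standing in for a real dataset. It assumes every value is a positive whole number, and that you pass in one list of nine percentages per dataset.

```python
from itertools import accumulate


def first_digit(n):
    """Leading digit of a positive whole number, e.g. 8421 -> 8."""
    return int(str(n)[0])


def digit_percentages(values):
    """Percentage of values whose first digit is 1, 2, ..., 9 (in that order)."""
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    return [100 * c / len(values) for c in counts]


def mean_cumulative(per_dataset_percentages):
    """Mean cumulative percentage for first digit <= 1, <= 2, ..., <= 9.

    per_dataset_percentages is a list with one entry per dataset; each entry
    is a list of nine percentages (first digit 1 to 9), either read off the
    website or produced by digit_percentages above.
    """
    # Running total within each dataset, then average across datasets.
    cumulative = [list(accumulate(p)) for p in per_dataset_percentages]
    return [sum(column) / len(cumulative) for column in zip(*cumulative)]


if __name__ == "__main__":
    # Stand-in data only: the five first digits from the example above.
    sample = digit_percentages([2, 3, 3, 8, 1])
    for digit, pct in enumerate(mean_cumulative([sample]), start=1):
        print(f"first digit <= {digit}: {pct:.1f}%")
```

With real data you would put one list of nine percentages per dataset into the call to mean_cumulative; the nine numbers it returns are the heights to plot against first digits 1 to 9, and the last one should come out as 100.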
By thinking, or by research on the web, find an explanation for your result. Julian Havil’s session on 20 October will give you a lot of help. Written answers to Mr Adebayo, Mr Osborn, or Mr Thomas by Friday 21 October. Every good partial answer wins a Freddo. Even better a full answer, of course.
This result is so reliable that it is used by accountants to detect fraud. When people manufacture fake invoices in order to embezzle money from a firm, the first digits of the amounts they invent tend to follow a very different pattern from the first digits in real invoices (the most common first digits they choose are 7 and 8).