The winning numbers in a city’s pick-five lottery seem non-random to a well-known investment advisor. Is there a problem with the way winning numbers are selected? You will use Access and Excel to analyse the soundness of the city’s lottery system.
The city you live in has run a pick-five lottery for several years. A player pays £1 for a ticket and picks five numbers between 1 and 56. The lottery system determines winners by generating five numbers between 1 and 56 and displaying them on ping-pong balls, which are rolled out one at a time during a televised drawing for dramatic effect. If all five numbers match a player’s numbers, the player wins a large amount of money. Players who match fewer numbers win less money. Of course, most players match no numbers and gain nothing, except for some entertainment. Lottery proceeds are used by the city council to fund programmes in education, health care and other areas.
Most people think that lotteries are honest in the sense that the winning numbers are randomly chosen. In a random selection, the number 2 is as likely to be chosen as the number 56, for example. People would be disturbed to learn that some numbers are more likely to be chosen than others, and they would probably avoid a lottery that was shown to select winning numbers in a non-random manner. So lottery officials were alarmed when Robin A. Bank, a well-known financial advisor wrote a newspaper article making such a claim. The city lottery has had drawings twice a week for almost seven years. Bank researched each drawing and was troubled by the distribution of numbers in the 706 winning combinations. The distribution is shown graphically in Figure 1.
Figure 1: Bar graph showing distribution of numbers in winning combinations
For example, the number 48 appeared 82 times in the 706 five-number combinations, but the number 55 appeared only 51 times. The distribution is shown in tabular form in Figure 2.
Num Freq Num Freq Num Freq Num Freq
1 55 15 58 29 68 43 63
2 69 16 66 30 60 44 60
3 60 17 69 31 71 45 67
4 68 18 64 32 55 46 69
5 68 19 61 33 59 47 50
6 52 20 69 34 52 48 82
7 61 21 62 35 63 49 49
8 54 22 63 36 74 50 66
9 68 23 54 37 51 51 72
10 67 24 68 38 65 52 71
11 58 25 68 39 69 53 74
12 72 26 58 40 62 54 59
13 65 27 71 41 48 55 51
14 69 28 63 42 59 56 61
Figure 2: Table showing distribution of numbers in winning combinations
Figure 3 shows the five numbers that appeared most frequently in the 706 winning combinations.
Most frequent 48 82
2nd most frequent 36 74
3rd most frequent 53 74
4th most frequent 12 72
5th most frequent 51 72
Figure 3: Table showing five numbers most likely to appear in winning combinations
How can it be, Bank asks, that these five of the 56 numbers appear much more frequently than other numbers? Such an anomaly might be expected if the lottery had been running only a few weeks, he wrote in his article. “You’d expect lumpiness in the distribution early on”, Bank said, but the lottery has now drawn a total of 3530 numbers in the winning combinations after seven years. The distribution should be evening out after thousands of numbers, but Bank says it is not. His advice to his listeners and readers is to find another lottery if you must gamble. If you decide to play the city lottery, include some or all of the numbers 48, 36, 53, 12 and 51 in your choices, because these numbers are obviously favoured by the lottery’s number-selection method.
The problem for city lottery officials is that many people seem to believe Bank’s advice. These officials are convinced that the lottery’s methods are sound and they need some way to reassure the public. You have been called in to help.
You enquire about the lottery’s method for selecting numbers, and you learn that the city has a contract with a company which generate random numbers as needed. Five random numbers between 1 and 56 are selected and then transmitted over secure telecommunications channels to the city lottery office. The same company services many other lotteries. The company’s random number algorithm is well known and is used by many major software vendors. The company’s implementation of the algorithm is certified periodically by independent statisticians. Your city’s lottery officials do not understand what could go wrong within that.
In theory, nothing should go wrong, and professional statisticians would be comfortable with the lottery. But most of the city’s citizens are not statisticians, and at first glance, Bank’s criticisms seem credible. Why are some numbers picked more frequently than others? Wouldn’t 3530 picks in more than 700 lottery drawings be enough to even out the distribution? You decide to conduct the following test:
1. Obtain recent winning lottery numbers from other cities.
2. Examine whether the selection of these numbers is distributed like your city’s lottery numbers.
3. If the numbers are similarly distributed, inform the public to reassure them. However, if the numbers are not distributed similarly, further investigation into your city’s number-selection method is required.
Lottery officials have provided winning number combinations from pick-five lotteries in three other cities. In each case, the most recent 706 winning numbers are provided. You have not been told the names of the cities; you know only that the winning numbers come from cities designated A, B and C. An Access database named LotteryAnalysis.accdb contains the data and is available to you in the MN1505 subfolder within the Management folder on the DEPTS on ‘NTSx’ (R:) drive with the filename LotteryAnalysis.accdb.
Figure 4 shows the design of the WinningNumbers table in the database file.
Figure 4: WinningNumbers table design
The values in the City field are either A, B or C. The values in the Play Num field range from 1 to 706 and represent each lottery drawing. Ball 1 values represent the first number drawn in each lotter; these values range from 1 to 56. The Ball 2, Ball 3, Ball 4, and Ball 5 fields use the same logic as the Ball 1 field.
Figure 5 shows a few records in the WinningNumbers table.
Figure 5: WinningNumbers table data records
For example, in the first lottery shown for City A, the winning numbers were 20, 53, 5, 17 and 25. In the tenth lottery drawing for City A, the winning numbers were 41, 49, 38, 52 and 21.
TASK 1: MAKING QUERIES IN ACCESS
In this task, you will design and run three make-table queries in Access to make three new tables each one containing the winning numbers for one city, A B and C. Obviously, each table should contain 706 records. You should call the tables CityA, CityB and CityC. You should also save each make-table query separately.
When you finish the queries, save and close the LotteryAnalysis.accdb file.
TASK 2: USING EXCEL FOR DECISION SUPPORT
In this task, you will import the data from the three Access tables you created into Excel worksheets. Then you develop information needed to compare the three city lotteries with your city’s lottery.
Importing Table Data
Open a new file in Excel and save it in the MN1505 folder on your Y: drive as LotteryAnalysis.xlsx.
Import the CityA table records into Excel. Click the Data tab, and the select From Access in the Get External Data group on the ribbon. Specify the Access filename LotteryAnalysis.accdb (you may have to navigate to the correct drive and folder), the CityA table name, and where to place the data in the worksheet (cell AI is recommended).
The data is imported into Excel as an Excel data table, which is the format you want. If cell A1 is not already selected, click it. In the Table Style Options group of the Table Tools Design tab, select Total Row to add a Totals row to the bottom of the table. Rename the worksheet CityA. The first few rows of your worksheet should look like Figure 6.
Figure 6: Rows in the CityA worksheet
Import the CityB and CityC table records into other worksheets in the same way. Add a Totals row to each table. Name the worksheets CityB and CityC, respectively.
Using Data Tables, Pivot Tables and Charts to Gather Data
The data tables help you to gather some of the information you need. For each of the three cities, you use the data tables to develop pivot tables and then use them to tabulate frequencies of sinning numbers. You then develop bar graphs to illustrate these frequencies, similar to the bar graph shown in Figure 1.
In the CityA worksheet, use the Total row to compute the average of the values shown for each ball. Then, below the Total row, use the =Average() function to compute the average of all values, which should be in the range C2..G707. Your results should look like the illustrative data shown in Figure 7, but the numbers will be different – from now on the figures will show illustrative data rather than the correct data for CityA.
Figure 7: Summary data for a city’s lottery numbers
There are 56 possible integers. Logically, half of the balls drawn should be numbered between 1 and 28 and the other half should be numbered between 29 and 56. Thus, you would expect the average for any ball’s values, and the overall average, to be near 28 or 29.
You need to develop a frequency table for all values between 1 and 56. To build this table, you can develop frequency tables for each ball and then combine the values for the five balls.
To create a frequency table for a ball, use a pivot table. First, create the pivot table based on the entire data table. Insert the pivot table in the existing worksheet; choose cell location J1. To create a pivot table based on only ball 1 data, drag the Ball 1 field label from the PivotTable Fields list to the Row Labels window and to the ? Values window. Both windows are in the lower-right corner of the worksheet. Drop down the arrow next to Sum of Ball 1 in the ? Values window and select Value Field Settings so that you can change this from Sum to Count to get a count of the 56 ball values, which is equivalent to a frequency chart. Rename cell J1 to Num and cell K1 to Freq and resize these columns to fit the widths of their new labels. The results should resemble the data shown in Figure 8.
Figure 8: Developing a pivot table for ball 1 values
Use the same steps to develop pivot tables for all five balls, as shown in Figure 9.
Figure 9: Developing pivot tables for all ball values
You can then create a summary of the frequencies for all five balls using the =SUM() function, as shown in Figure 10. NOTE: You will have to type the relevant cell references into the function, if you click to select, Excel will put in absolute data table cell references and you won’t be able to use the fill cell to copy the function for the rest of the column. Figure 10 shows only partial data.
Figure 10: Developing a frequency table for numbers 1 to 56
Check your work by summing the Frequency column. For example, a likely cause of error would be forgetting to change from Sum to Count when creating a pivot table. The total should be 3530 (706 lotteries multiplied by five ball values for each lottery). You can also check the table by adding the grand totals for the five pivot tables, which should equal 3530 as well. See the illustrative data in Figure 11.
Figure 11: Checking frequency data
Use the frequency table data to begin creating a stacked column chart, as shown in Figure 12.
Figure 12: Start of column chart
The columns are a combination of the Number and Frequency values, as indicated by the coloured legends. You do not want to include the Number values in the column, so click in the Number section of the chart’s bars and then press the Delete key. You chart should resemble the one shown in Figure 13.
Figure 13: Developing the column chart
You can make the chart look more professional by adding axis labels and changing the title. Values in the figure are for illustrative purposes only and may not match your own. By visual inspection you can see which five values were chosen most frequently in the lottery. Just examine the frequency values on the Y-axis of the chart. You can also verify these values by placing the cursor at the top of the column: Excel will show the related X and Y values.
You should also develop a list of the five most frequently selected numbers by sorting the frequency data. Select the frequency data and then copy it to the right of the chart (be sure to make the copy using the Paste-Values option. Then use the Data-Sort option to sort the values on the Frequency field from largest to smallest. You should see results similar to those in Figure 14.
Figure 14: Five most frequently selected numbers developed by sorting frequency values
As you can see in the figure, the five most frequently selected numbers in this illustrative data were 7 (80 times), 44 (80), 21 (76), 42 (76) and 35 (74).
Use the same procedures for the data in the CityB and CityC worksheets.
Summarising Data for Three Cities
Your city’s lottery officials hope your analysis shows that the city lottery is working properly, and that Robin A. Bank’s objections are nothing to worry about. To dispel these objections, your analysis must show that the number-generating algorithm used by your city’s lottery and by other cities produces reasonable results in a non-repeating manner. Specifically, officials hope your analysis answers the following questions in clear-cut ways that impress lay people:
1. Does the algorithm produce many unusual ball values? Ball averages should be close to each other and in the range of 27 to 29.
2. Does the algorithm favour the five numbers that Bank says people should select
3. Do the top five values in the other three cities follow your city’s top five pattern? The five most common frequencies should range from 71 to 85 which is your city’s range.
4. Does the algorithm favour any ball values? Do values repeat across the city lotteries?
If your analysis cannot dispel doubts, then more sophisticated statistical analysis will be needed to refute Bank.
Summarize your data in a way that allows you to answer the preceding questions. Create a new worksheet named Summary in which you manually gather summary data from the other three worksheets. In one part of the summary sheet, record the average of the values for each of the five balls in the three cities (this data has been computed elsewhere, see Figure 7). This data will be useful when you address question 1.
In another part of the summary sheet, record the five balls drawn most frequently as winning numbers, including the frequencies for your city. This data will be useful when you answer questions 2 and 3.
In another part of the summary sheet, you should copy the top five frequency values for cities A, B, and C into a single column, and then sort the data from largest to smallest frequencies. This data shows whether any ball values repeated in the three city lotteries. This data helps you answer question 4. One repeating values would probably not be surprising, but if other ball values repeated, further statistical analysis might be needed to show that the algorithm’s number generation is random.
In your summary sheet, manually enter notes that answer the preceding list of questions. You need these notes for later reference in Task 3.
TASK 3: DOCUMENTING FINDINGS IN A MEMORANDUM
For this assignment, you write a memorandum in Microsoft Word that documents your findings. In your memo, observe the following requirements:
• Your memo should have proper headings, such as Date, To, From, and Subject. You can address the memo to the city lottery officials. Please use a fictional name rather than your own name on this memorandum.
• Briefly outline the situation. However, you need not provide too much background — you can assume that readers are generally familiar with your task — but you need to show that you have understood the situation and what is required of you.
• In the body of the memo, summarise the four questions from the preceding section and how your analysis addresses the questions.
• List the answers to the four questions using relevant data to support them, this should include charts and/or tabular data as appropriate. Then tell the lottery officials whether you think more sophisticated statistical analysis is needed to dispel Bank’s objections.