by: Louie Beuschlein

YES! I'd like to download the original file in Microsoft Word 6.0 or higher.

I'd like the MAC Mathematica File for this lesson!

I'd like the PC Mathematica File for this lesson!

Note to teacher :
This unit is intended for high school seniors in AP Statistics.
Its intent is to introduce students both to Mathematica and to
the Monte Carlo method of solving probability problems, in particular,
the "Birthday Problem." Using Mathematica's random-number-generating
capability along with its ability to work with lists, students
can use the computer to simulate the birthdays of any number of
people and find approximate but reliable solutions to the "Birthday
Problem." The unit will also serve as an introduction to
programming in Mathematica for those students who are not yet
familiar with this powerful piece of software. In this unit,
students will be asked to copy and evaluate Mathematica code and
analyze the output. They will also be asked to modify code and
create their own code in order to obtain certain desired results.
Questions have been inserted throughout the unit as an aid to
assessment. It is recommended that students do not proceed with
the unit if they are unable to answer any of the questions, since
this may mean that the students are not fully understanding either
some concepts of probability or the Mathematica code necessary
in the simulation.

__NCTM Standards__:
This unit incorporates several of the goals from the NCTM Statistics
Standard. In particular, students will be guided in designing
and conducting a "probability experiment" whose results
will then be interpreted. Students will also be asked to draw
inferences from probability graphs generated by computer simulation.

The Birthday Problem

You may be sitting in a classroom
right now with about 25 other students. Do you think that any
of you have the same birthdays? It probably seems unlikely, since
there are 365 days in the year and only around 25 of you. Write
down your guess (as a percentage) about the likelihood of there
being a match.

Let's use Mathematica to simulate
this problem and get an idea of what the probability is that there
is at least one pair of matching birthdays in the class. Here's
the plan. We'll let Mathematica create a list of 25 random integers
between 1 and 365, inclusive. Each number will represent the
birthday of one of the 25 people in the room. We'll assume that
people are born on all days of the year with equal likelihood.
That is, being born on, say, November 12 is just as likely as
being born on, say, February 3. Why do think we restrict the
random numbers to integers from 1 to 365?

The following is the Mathematica
code for creating such a list. You can try it out by typing it
in to a Mathematica notebook and evaluating it by pressing the
Shift and Enter keys together.

**Table[Random[Integer,{1,365}],{25}]
**

Were there any matching birthdays?
Explain. Can you think of any ways of simulating birthdays without
using a computer? See if you can write the code to create a list
of 7 random REAL numbers between 2 and 5.

An easier way to check for matches
is to have Mathematica sort the list after it creates it. Go
back to your first input and sort it as follows. (Don't forget
the square brackets.)

**Sort[Table[Random[Integer,{1,365}],{25}]]
**

An even better way to determine whether or not there is a match is to have Mathematica check the list for us. That way, we won't even have to inspect it. We can do this using the Union function. Type and evaluate each of these inputs.

**Union[{3,2,6,1,8}, {6,3,8,1.6,12,9}]
**

**Union[{5,6,6,14,34,77,89,89,89,103,235}]
**

Notice that when you take the
Union of one set, any elements of the set that are repeated are
eliminated. Therefore, if the Union of our list is different
than the list itself, we must have had at least one match! To
test if they are different, we can use the **If** function.
Notice that the **If** function can take three arguments:
the test condition, what to do if the condition tests true, and
what to do if it tests false. Try the next input and explain
the output. (Note: "=!=" means "not equal.")

**If[{1,2,3,3,4} =!= Union{1,2,3,3,4},
Print["Match!"], Print["No Match"]]
**

See if you can write an **If**
statement that outputs "Even Number" if an integer is
even and "Odd Number" if the integer is odd. You may
wish to use the **EvenQ** function. Note: information on any
function can be found by looking it up in the Help menu (top right)
or by typing a question mark ("?") and then the name
of the function in a new cell. For example:

**?EvenQ
**

EvenQ[expr] gives True if expr is an even

integer, and False otherwise.

For our experiment, we would like
to test whether or not any of the 25 random integers in our list
is duplicated. To make our programming a little less cumbersome,
let's give our list a name, "datelist." Then we'll
use that list in an **If** statement. You can simply click
on your first input and insert "datelist =" at the beginning
of the line.

**datelist = Table[Random[Integer,
{1,365}],{25}]
**

Now for our **If** statement:

**If[datelist =!= Union[datelist],**

** Print["Match!"],
Print["Sorry, no match this time."]]
**

Note: Mathematica generally ignores
carriage returns, extra spaces, tabs, etc. Therefore, you should
include them as necessary in order to make your code more readable.
Evaluate the cell containing the **If** statement ten times,
keeping track of how many times a match is made and record your
results. Each time you evaluate that cell, you are really performing
a trial. What does each trial represent? Based on this little
experiment, what is the approximate probability of their being
a match?

So far we've found a way to simulate
the birthdays of 25 people and decide if any of them is the same.
Of course, doing a mere ten trials is not sufficient to determine
the probability with great confidence. This would take hundreds,
perhaps thousands, of trials. It would become very tiresome to
evaluate that cell hundreds of times and keep a tally on paper.
Therefore, let's get Mathematica to do it! To do this we will
need a **Do** loop. Let's see how one works first. Try the
next line and explain the output.

**Do[Print[i," ",
i^2," ", i^3], {i,1,3}]**

1 1 1

2 4 8

3 9 27

The "i" in the above
cell is called an iterator; it starts at 1 and stops at 3. The
**Print** statement is called three times (while i <= 3).
The quotation marks only serve to insert spaces between results.
See if you can write a **Do** statement that will print the
integers and the decimal approximations of their square roots
from 4 to 16. For this you will want to use the **Sqrt** function
and the **N** function, which gives decimal approximations
of numbers. Note: It is also possible to accomplish this using
the **Table** function.

Now let's build our **Do**
loop. Besides creating and testing many lists, we would also
like Mathematica to keep track of how many of those lists contain
matches:

**c=0;**

**Do[{ datelist[i]=Sort[Table[Random[Integer,
{1,365}],{25}]]**

** If[Union[datelist[i]]
=!= datelist[i], c=c+1]**

** }, {1000}**

** ]**

Let's take a closer look at the
above code. "datelist[i]" is a subscripted variable
as in x1 or x57. Rather than having a certain numerical value,
though, each datelist[i]
is its own list of twenty five numbers. That is, datelist[1],
datelist[2], datelist[3], ... , datelist[1000] each represent
a set of 25 integers from 1 to 365. Each datelist[i] is checked
for repeated numbers, and if at least one pair of numbers is repeated,
c is increased by one. c, therefore, serves as a counter, keeping
track of "the number of groups of 25 people with at least
one pair of matching birthdays." Note that c is set equal
to zero in the beginning. The braces in the above code group
together all the operations that will be carried out in the **Do
**loop, in this case, 1000 times.

If you tried evaluating this code,
you probably noticed that there was no output. That's because
we haven't yet asked Mathematica to tell us anything. It knows
how many of the thousand lists contain matches, but we must ask
it to share that information. Let's have it output the portion
of lists that had at least one match:

**N[c/1000]
**

What is the approximate probability
that in any given set of 25 people there will be at least two
people born on the same day of the year? Explain your answer.
In what way could the Mathematica code be modified in order to
better approximate the probability?

Note: Using Mathematica to perform
excessive numbers of trials is acceptable as long as computer
time is not at a premium! Every time you evaluate that cell,
Mathematica will approximate the desired probability for 25 people.
See if you can modify the code to find the comparable probabilities
for only two people. Record your result and compare it to what
you would predict based on your knowledge of probability. *Hint:
What's the probability of two birthdays *not* matching?
*

Do you think there is a certain
number of people beyond which there is absolute certainty that
there will be at least one match? Explain your reasoning.

Try changing the number of people
several times between (2 and 100 people) and record your results.
What are the maximum and minimum values for the probability?
What appears to be happening to the probability when the number
of people gets large? Is this what you would expect?

It is possible to define a function
in Mathematica that takes as its argument the number of people
and returns the probability of there being a matching birthday
in a group of that many people. Before we do that, though, let's
learn about how to create and work with functions in Mathematica.
First of all, square brackets are always used, not parentheses,
and the variables can be entire words. For reasons which will
not be explained here, the independent variable is followed by
an underscore, and ":=" is used rather than "="
in defining the function. In the example below, a function called
"func" of the independent variable x is defined to be
2x+ 5x + 1. Then func is evaluated at x
= 1, 0, and var1 + var2. Note that unless you've told Mathematica
the values of var1 and var2 beforehand, it just substitutes their
sum in for x.

**func[x_]:= 2x^2 + 5x + 1**

**func[1] **

**func[0]**

**func[var1+var2]**

8

1

2

1 + 5 (var1 + var2) + 2 (var1
+ var2)

See if you can write a function
called "logsum" that takes two variables, "num1"
and "num2" and adds the sum of their natural logs, returning
a decimal approximation. In defining your function you will want
to use Mathematica's built in **Log **and** N **functions.
Then use logsum to find the sum of the logs of 2 and 3. Compare
that result with what Mathematica gives for the log of 6. Surprised?
Record your results.

Now let's write a function called
"prob" that takes one variable, "people" and
returns the corresponding probability:

**prob[people_]:= ( c=0;**

** Do[ { datelist[i] = Sort[Table[Random[Integer,
{1,365}],{people}]]**

** If[Union[datelist[i]] =!=
datelist[i], c = c+1] }, {1000}]; N[ c/1000] )**

Notice that if we want a function
to several operation (such as initializing a variable, carrying
out a **Do** loop, and returning a number), we need to group
those operations with parentheses. To find the probability of
at least one duplicate birthday in a group of, say, 30 people,
all we need to do now is to find prob(30):

**prob[30]**

0.715

Let's incorporate our "func"
function into a **Do** loop so that we can evaluate it at several
x values all at once:

**Do[ Print[func[x]], {x,-2,5}
]**

-2

1

8

19

34

53

76

We could have also used the **Table**
function:

**Table[func[x], {x,-2,5}]**

{-1, -2, 1, 8, 19, 34, 53, 76}

For the purposes of our "Birthday
Problem," we would like to include our "test" function
into a **Table** so that we can evaluate "test" at
many different values. **Table** is preferable to **Do **in
this case since it returns lists, which are easily manipulated
in Mathematica. We'll call our list "data." We will
also include the number of people in the output:

**data = Table[{people, prob[people]},
{people,15,30}]**

{{15, 0.246}, {16, 0.273}, {17, 0.291},

{18, 0.371}, {19, 0.376}, {20, 0.433},

{21, 0.447}, {22, 0.458}, {23, 0.527},

{24, 0.524}, {25, 0.547}, {26, 0.581},

{27, 0.626}, {28, 0.626}, {29, 0.67},

{30, 0.713}}

Notice that Mathematica outputs
a "list of lists." What is the approximate probability
when there are 17 people? About how many people do you need to
have in a room in order to have about a 50-50 chance of there
being a matching birthday?

We can make the output move visually
appealing with the **TableForm** function:

**TableForm[data]**

15 0.246

16 0.273

17 0.291

18 0.371

19 0.376

20 0.433

21 0.447

22 0.458

23 0.527

24 0.524

25 0.547

26 0.581

27 0.626

28 0.626

29 0.67

30 0.713

You may have noticed that Mathematica
lists the probability for 24 people as slightly less than that
for 23. This obviously cannot be true. Did Mathematica make
a miscalculation? Explain this phenomenon.

We can also ask Mathematica to
display these results graphically:

**ListPlot[data]
**

Notice that built-in Mathematica
functions always start with a capital letter, and if they are
made up of two English words, they are both capitalized with no
spaces in between. We can change the appearance of the graph
as follows:

**ListPlot[data, PlotRange->{{0,60},{0,1}},**

** AxesLabel->{"
Number of People", "Probability"}]**

**
**

"PlotRange" and "AxesLabel"
are examples of *options. *To find out what options a function
can take you can type "**??***FunctionName".
*

It may seem by looking at the
graph that the probability is a linear function of the number
of people. Explain why this can't be so, at least over a large
range of people.

See if you can make your own graph
that displays the probabilities for 2 through 60 people. You should
get something like this:

Does this graph make sense? What
appears to be happening to the probability as the number of people
gets very large? Explain.

With the "PlotJoined"
option we can "connect the dots."

**ListPlot[data, PlotRange->{{0,60},{0,1}},
PlotJoined->True, **

** AxesLabel->{" Number
of People", "Probability"}]**

Compare the previous graph with
the following graph. See if you can explain why the following
graph is much smoother. *Hint: It has nothing to do with the
options for ListPlot. Rather, it came about from a slight
modification of the "prob" function.
*

*Note: Creating the data necessary
to produce the above required about an hour of computer time on
my computer. (I used 10,000 trials for each of 59 different-sized
groups of people, groups of 2 up to groups of 60).
*

Now that you're a master at programming
probability simulations in Mathematica, see if you can write the
code needed to solve the "Birth Month Problem"--the
probability of there being at least two people in a room who were
born in the same month--using the Monte Carlo Method. Graph probability
as a function of people. How many people are required for a match
to be a certainty? What about for the probability to be at least
50%?