An Introduction to Simulations

Introduction

Simulations represent an important application area within the field of computing. Questions naturally arise whose exact answers are either impractical to obtain through experimentation or too time-consuming to derive to be helpful.

As with many computing applications, a simulation that addresses a simple initial question may evolve into a series of large and sophisticated problems. As a result, this reading is paired with a continuing discussion of program management, which identifies software-development approaches that help manage complexity and ensure program correctness.

Simple Simulation: Version 1

To begin, consider the following question (that I was asked during a work break while I was on my first sabbatical leave).

Simulation Question

A couple decides to have children until they have at least one boy and at least one girl, after which they will stop having children. How many children can the couple expect to have?

Some assumptions and a first simulation

In starting to investigate this question, several approaches might be considered.

A. One might try to identify a large number of couples who followed this approach, collect data on the number of children they had, and do a statistical study.
B. One might make some assumptions about the gender of successive children and build a mathematical model. One also might consider how changes in assumptions might impact the model and the results obtained.
C. One might build assumptions about gender into a computer program that simulates family size and then run the simulation for 1 or 20 or 1000 couples. Changes in assumptions might then require only reasonably easy adjustments to program parameters.

In considering these alternative investigations, Approach A would be challenging. Finding such couples could require substantial time and effort, and locating a large number of them seems impractical. Approach B requires in-depth knowledge of the notion of "expected value" within statistics, a solid understanding of power series (in calculus), and substantial analytical skill. Although feasible under simplifying assumptions (this was the basis of my first solution to the problem), Approach B may require tinkering with the underlying model and adjusting the mathematical analysis as assumptions change.

In contrast to Approaches A and B, Approach C can be reasonably straightforward. To begin, we make two simplifying assumptions.

Assume the next child is equally likely to be a boy or a girl.
Assume the gender of any child is independent of the genders of any previous children.

Later, in a subsequent reading on program management, we consider how to adjust our study by changing some of these assumptions. For now, we use these assumptions as a starting point.

Within a program, the basic approach is to determine the gender of successive children, so we need a mechanism that sometimes yields "girl" and sometimes "boy". A common approach uses a pseudo-random number generator.

Pseudo-random Numbers

C libraries provide several functions that return apparently random numbers. Successive calls to such a function, called a pseudo-random number generator, give a sequence of numbers that seem random according to various statistical tests. Behind the scenes, an algorithm produces one number after another, based on an underlying value called a seed. (Since an algorithm is involved, the number sequence is not truly random; hence the term "pseudo-random number generator.") From a user's perspective, however, the numbers produced by a pseudo-random number generator seem quite random.

The traditional random number generator in C is called rand. Declared in the stdlib.h library, the function returns integers from 0 through a defined constant RAND_MAX, inclusive. Given this range, the expression rand() / ((double) RAND_MAX) yields a pseudo-random number from 0.0 through 1.0.

As a final detail in using rand or another pseudo-random number generator, the computation of these numbers depends upon a behind-the-scenes value called the seed. By default, when a program runs, this seed is set to 1, so the program produces the same sequence of numbers each time it is run. To get different numbers with each run, a common approach is to use the current time of day to set the seed at the start of a simulation program. Although the details are somewhat intricate, the appropriate initial command is

```
srand (time ((time_t *) 0) );
```

`rand/srand` example

To illustrate how rand works and the need for srand, consider the following program, rand-basic.c:

```
#include <stdio.h>    /* printf */
#include <stdlib.h>   /* rand */

int main ()
{
  printf ("the first 12 numbers returned by the rand function are:\n");
  for (int i = 0; i < 4; i++)
    printf ("%15d %15d %15d\n", rand(), rand(), rand());

  return 0;
}
```

Without any further code, this program printed the following each time it was run on a particular Linux workstation:

```
the first 12 numbers returned by the rand function are:
     1681692777       846930886      1804289383
      424238335      1957747793      1714636915
      596516649      1649760492       719885386
     1350490027      1025202362      1189641421
```

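When the program starts, some bookkeeping is performed to initialize the random number generator. Calls to rand follow a behind-the-scenes algorithm to produce successive numbers that seem random. However, since the initial bookkeeping is the same for each program run, the subsequent sequence of generated numbers also is the same for every program run. (Within each row, the numbers also appear in reverse order of generation: C does not specify the order in which a function's arguments are evaluated, and on this system the three calls to rand in each printf happened to be evaluated from right to left.)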

To avoid this difficulty, srand is placed at the beginning of a program (not in the loop) to change the behind-the-scenes bookkeeping.

The expression time ((time_t *) 0) uses C's general time function. The parameter (time_t *) 0, a null pointer, tells the time function simply to return the current time; on Linux and other POSIX systems, this is the number of seconds since midnight (UTC) on 1 January 1970. In practice, this means that time ((time_t *) 0) provides a different value each time the program is run, as long as the runs are at least a second apart.
srand uses its parameter (in this case, a number of seconds) to adjust the behind-the-scenes bookkeeping for the random number generator. Since the bookkeeping is different with each run of the program, the sequence of random numbers also changes whenever the program is run.

A first simulation program

Using these details for rand, program couple-1.c simulates the family size for one couple.

```
/* This program simulates counting how many children
      the couple might have.
*/

#include <stdio.h>

/* libraries for the random number generator */
#include <stdlib.h>
#include <time.h>
/* Within the time.h library,
 *    time returns a value based upon the time of day
 * Within the stdlib.h library
 *    rand returns a random integer between 0 and RAND_MAX
 */
```

Preliminaries

The program requires C's standard libraries for printing (stdio.h), for the random-number functions rand and srand (stdlib.h), and for the time function (time.h).

```
int main ()
{
  /* initialize random number generator */
  /* change the seed to the random number generator,
     based on the time of day */
  srand (time ((time_t *) 0) );
```

Initializing the random number generator

The time of day is used to set the behind-the-scenes seed for the random number generator, so the program will utilize a different sequence of numbers for each run (each time the program is run, the time will be different).

```
  /* couple starts with no children */
  int boys = 0;
  int girls = 0;
```

Loop variable initialization

The variables boys and girls record how many children of each gender the current family has.

```
  /* couple has children */
  while ((boys == 0) || (girls == 0))
```

The main simulation loop

The while loop continues as long as the couple has no boys or no girls; it stops as soon as both counts are at least 1.

```
    {
      if ((((double) rand()) / ((double) RAND_MAX)) < 0.5)
        boys++;
      else
        girls++;
    }
```

Determining gender

The rand function returns a pseudo-random integer from 0 through RAND_MAX, so

```
((double) rand()) / ((double) RAND_MAX)
```

is between 0.0 and 1.0. About half of the time, this fraction should be less than 0.5, and those cases are considered to represent boys.

```
  /* reporting of family size */
  printf ("Simulation of family size\n");
  printf ("    boys: %2d    girls: %2d    total:  %2d\n",
          boys, girls, boys + girls);
  return 0;
}
```

Wrap up

The results are printed after the couple stops having children. Here are three sample runs.

```
Simulation of family size
    boys:  1    girls:  1    total:   2
Simulation of family size
    boys:  1    girls:  2    total:   3
Simulation of family size
    boys:  3    girls:  1    total:   4
```

A Slightly Better Simulation: Version 2

Although this basic simulation provides information about one couple, the program must be run several times to learn about many couples. To consider multiple couples, the simulation itself might be placed within a loop that repeats the simulation 20 times.

```
#include <stdio.h>
#include <stdlib.h>   /* rand, srand, RAND_MAX */
#include <time.h>     /* time */

int main ()
{
  /* initialize pseudo-random number generator */
  /* change the seed to the pseudo-random number generator,
     based on the time of day */
  srand (time ((time_t *) 0) );

  printf ("Simulation of family size\n");

  int couple;
  for (couple = 0; couple < 20; couple++)
    {
      /* couple starts with no children */
      int boys = 0;
      int girls = 0;

      /* couple has children */
      while ((boys == 0) || (girls == 0))
        {
          if ((((double) rand()) / ((double) RAND_MAX)) < 0.5)
            boys++;
          else
            girls++;
        }

      /* reporting of family size */
      printf ("    boys: %2d    girls: %2d    total:  %2d\n",
              boys, girls, boys + girls);
    }

  return 0;
}
```

In this version:

- The seed is set from the time of day once, at the start, and is not reset later. If the seed were re-initialized inside the loop, the clock (whose value typically changes only once per second) might not have changed between iterations, so srand would restart the same sequence of pseudo-random numbers and successive couples could have identical families.
- The opening printf statement is moved early, outside the main loop, so the title, "Simulation of family size", is printed just once, not many times.
- The variable couple keeps track of the number of times the simulation is run.
- The entire simulation is repeated within the main loop:
  - The number of girls and boys is reset to 0.
  - The couple has children until there is at least one boy and at least one girl.
  - The number of boys and girls is printed.

Sample output from one run of this program follows:

```
Simulation of family size
    boys:  1    girls:  1    total:   2
    boys:  1    girls:  1    total:   2
    boys:  1    girls:  3    total:   4
    boys:  1    girls:  3    total:   4
    boys:  1    girls:  2    total:   3
    boys:  3    girls:  1    total:   4
    boys:  3    girls:  1    total:   4
    boys:  1    girls:  1    total:   2
    boys:  1    girls:  2    total:   3
    boys:  2    girls:  1    total:   3
    boys:  1    girls:  2    total:   3
    boys:  1    girls:  1    total:   2
    boys:  2    girls:  1    total:   3
    boys:  1    girls:  1    total:   2
    boys:  2    girls:  1    total:   3
    boys:  3    girls:  1    total:   4
    boys:  1    girls:  3    total:   4
    boys:  1    girls:  1    total:   2
    boys:  1    girls:  1    total:   2
    boys:  2    girls:  1    total:   3
```

Avoiding "Magic Numbers"

Although this code works, it has at least three difficulties:

1. The number of couples (20) appears out of context; such a value is sometimes called a magic number. The number is arbitrary and easily mistyped. Further, changing the number likely requires a programmer to search through the entire program. In a longer program, the programmer may have to track down all references to this number, a potentially time-consuming and error-prone endeavor.
2. The nesting of one loop (here, a while loop) within another (a for loop) adds logical complexity. The point of the inner while loop is to simulate the number of children for one couple, but this program combines all details in one code segment. Such complexity may be manageable for this reasonably simple problem, but complexity can become overwhelming without mechanisms to control it.
3. The program prints the results for all couples, so a user must review the full data to draw any conclusions. Since the main question is "how many children might a couple expect to have?", a more useful output might display the average and the maximum number of children over 100 or 1000 couples.

This section addresses the first of these difficulties. The next reading on program management addresses the second and third.

As noted, the number 20 in the above program does not give a programmer or reader any insight into what it represents; a natural question is "why 20?" The program would be clearer and easier to understand if the number 20 were replaced by a descriptive name, such as numberOfCouples.

In practice, numberOfCouples could be specified in two or three ways.

First, the program could include a directive telling the compiler that the name numberOfCouples will always represent the number 20. This is done by inserting the statement
```
#define numberOfCouples 20
```
early in the program, before the main procedure. This location is easy for both programmers and readers to find and change as needed, and all subsequent references then use the descriptive name numberOfCouples rather than 20.

In C, the #define directive tells the compiler (strictly, the C preprocessor, which runs before compilation proper) to replace each later occurrence of the name numberOfCouples by the text that follows, in this case 20.

With this definition, the for loop in the above program becomes
```
  for (couple = 0; couple < numberOfCouples; couple++)
```
and the compiler substitutes 20 for numberOfCouples, as desired.

Since #define instructs the compiler to replace the name numberOfCouples by whatever text follows (a type of "find-and-replace", as in a word processor), a programmer must be careful to specify exactly the text desired. For example, if a semicolon were added to yield

```
#define numberOfCouples 20;
```

then the for loop, after substitution, would contain an unwanted semicolon, resulting in a syntax error:

```
  for (couple = 0; couple < 20;; couple++)
```

Second, the program could define an additional, new variable:
```
int numberOfCouples = 20;
```
or
```
const int numberOfCouples = 20;
```
Within the program, this variable might be declared at the very beginning (e.g., before main), where it is visible to all parts of the program; or it might be declared at the start of main itself, where it can be used anywhere within main.

Although numberOfCouples could be specified either with #define or with a variable declaration, many C programmers will state a strong preference for #define for two reasons.

#define is handled by the C preprocessor, a step that runs before the program is compiled, so the substitution costs nothing when the program runs.
The declaration int numberOfCouples asks the computer to allocate space for a variable when the program runs, and each access to this variable takes time. Thus, the variable declaration can cost both space and time every time a program is run (although an optimizing compiler may often substitute the value of a const variable directly).

In the jargon of C, #define specifies a macro, which is expanded by the preprocessor before the program is compiled, and therefore well before it is run.

Program couple-2.c contains the full, revised simulation, using a #define statement to name the number of couples considered and to avoid "magic numbers".

For the variable approach, adding the word const to the declaration of numberOfCouples tells the compiler that the variable must not be changed within the program. Any attempt to assign a new value to this variable generates an error when the program is compiled.

created 24 July 2016 by Henry M. Walker; revised 5 August 2016 by Henry M. Walker
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu .

CSC 115.005/006	Sonoma State University	Spring 2022
	CSC 115.005/006: Programming I
Instructor: Henry M. Walker Lecturer, Sonoma State University Professor Emeritus of Computer Science and Mathematics, Grinnell College
