More Program Management: Functions, Value Parameters, and Assertions

Preliminary Note

This reading continues the discussion of a family-size simulation, begun in the introductory reading on simulation. Please review that discussion before proceeding.

Introduction

This reading expands the ongoing discussion in this course of approaches to program management when working to solve complex problems. As part of this discussion, the reading builds on the family-size simulation begun in the introductory reading on simulation.

As with many computing applications, the family-size problem illustrates that a simple initial question may evolve into a series of large and sophisticated problems. To anticipate the evolution of problems under study, this reading identifies software-development approaches that help manage complexity and ensure program correctness. Some basic issues include:

Defining common code segments once, in a way that can be used throughout the program.
Separating the high-level framework of a solution from the numerous details, so that the details do not distract from the main ideas.
Incorporating error detection and/or correction within a program, so that errors in one part of a program do not cause harm later on (e.g., do not administer a drug to a patient if an error is detected).

The beginning segment of this course introduced simple functions as one mechanism to help organize code and address complexity. This reading expands the study of functions in several ways and introduces the concept of assertions.

Reading Outline

Simulation with Functions (Version 3)
Simulation with Functions and Parameters (Version 4)
Functions with Return Values (Version 5)
Functions with Assertions (Version 6)

Simulation with Functions (Version 3)

As a start in trying to manage complexity for our simulation, we might create a new procedure for the simulation to count children for a couple, using ideas from the introductory reading on program organization. In this approach, we create a procedure simulate_couple ():

/* procedure to simulate the number of children for one couple */
void simulate_couple ()
{
       /* couple starts with no children */
       int boys = 0;
       int girls = 0;

       /* couple has children */
       while ((boys == 0) || (girls == 0))
         {  
           if ((((double) rand()) / ((double) RAND_MAX)) < 0.5) 
              boys++;
           else
              girls++;
         }

       /* reporting of family size */
       printf ("    boys: %2d    girls: %2d    total:  %2d\n", 
          boys, girls, boys + girls);
}

The procedure begins with a header that includes both a comment describing the function's purpose and the formal C definition:

void simulate_couple ()
{
   // body of function goes here
}

The body of the function, placed within braces { }, describes the full simulation for a couple, previously contained within the main program.

The main part of the program then becomes quite simple:

  int couple;
  for (couple = 0; couple < numberOfCouples; couple++)
    {
      simulate_couple ();
    }

With void simulate_couple() defined, the main program simply calls the procedure.

The main program highlights the high-level structure of the overall program.
Function simulate_couple organizes details in an easily-found context that does not distract a programmer or user from the overall structure of the program solution.

Taking program organization one step further, we might put the main couple loop in its own simulate_several_couples procedure.

/* procedure to conduct simulation for several couples */
void simulate_several_couples ()
{
  int couple;
  for (couple = 0; couple < numberOfCouples; couple++)
    {
      simulate_couple ();
    }
}

Since this procedure uses simulate_couple, the procedure simulate_several_couples should be defined after simulate_couple. With this placement, the compiler will know what simulate_couple represents when the procedure is referenced within simulate_several_couples.

Similarly, since simulate_several_couples is used within main, the procedure simulate_several_couples should be defined before main.

The full program couple-3.c combines these elements into a working program.

Simulation with Functions and Parameters (Version 4)

Within this basic program structure, let us return to the simulation assumption that the gender of a child is equally likely to be a boy or a girl. The programs up to now utilize this assumption by generating a random number between 0.0 and 1.0 and then determining if the number is less than 0.5. To change this assumption, we could compare the random number with a different value. For example, if one assumed the percentage of boys was 51.2%, then 0.5 might be replaced by the decimal 0.512 in the simulation program.

According to an article "Is a pregnant woman's chance of giving birth to a boy 50 percent?" by Marc Weisskopf in the November 15, 2014, issue of Scientific American, "in most industrialized countries about 105 boys are born for every 100 girls, for a ratio of 1.05. ... This is often expressed as the percentage of boys among all births, or about 51.2 percent." The article goes on to identify adjustments in this percentage in different ethnic groups and according to the age of the parents.

To include this in the simulation of a couple, we change the function header to include a parameter fraction_boys in the parentheses after the procedure name. The revised procedure becomes:

/* procedure to simulate the number of children for one couple 
   parameter fraction_boys:  the percentage of boys born, 
                             expressed as a decimal fraction
*/
void simulate_couple (double fraction_boys)
{
       /* couple starts with no children */
       int boys = 0;
       int girls = 0;

       /* couple has children */
       while ((boys == 0) || (girls == 0))
         {  
           if ((((double) rand()) / ((double) RAND_MAX)) < fraction_boys) 
              boys++;
           else
              girls++;
         }

       /* reporting of family size */
       printf ("    boys: %2d    girls: %2d    total:  %2d\n", 
          boys, girls, boys + girls);
}

In this function,

A variable name, called a formal parameter, is placed in the parentheses after the function name, and the purpose of the parameter is described in the comment before the function.
Since all variables in C have a type, the specification of each formal parameter includes both its name (e.g., fraction_boys) and its type (e.g., double).
Once defined, the parameter may be used anywhere, as desired, in the body of the function.

Once defined, the function simulate_couple may be called as many times as desired. For example, the main program might contain any or all of the following:

simulate_couple (0.500); // run simulation with 50% chance of a boy
simulate_couple (0.512); // run simulation with 51.2% chance of a boy
simulate_couple (0.450); // run simulation with 45% chance of a boy

// call the function with successive values for fraction_boys
  double boy_fraction;
  for (boy_fraction = 0.5; boy_fraction <= 0.52; boy_fraction += 0.002)
    {
      simulate_couple (boy_fraction);
    }

In using this function,

Each time the function is called (e.g., in the main program), a value, called an actual parameter, is designated for fraction_boys, and the function is executed with that value.
These lines illustrate that, once defined in a function, a program can execute the code within the function several times — the same function body can be used with many different starting values.
Since the function is defined once, it can be tested thoroughly. Then a programmer can be confident that the function works properly, and there is no danger that copying the code from one place to another in a program might have introduced errors.

For this simulation, we might place the loop for multiple couples within a separate function as well:

/* procedure to conduct simulation for several couples 
   parameter numCouples:     the number of couples to be simulated
   parameter fraction_boys:  the percentage of boys born, 
                             expressed as a decimal fraction
*/
void simulate_several_couples (int numCouples, double fraction_boys)
{
  printf ("simulation with fraction of boys:  %6.3lf\n", fraction_boys);
  int couple;
  for (couple = 0; couple < numCouples; couple++)
    {
      simulate_couple (fraction_boys);
    }
}

The function simulate_several_couples handles all work for a specified number of couples and for a designated fraction of boys.

The function header has two formal parameters, numCouples for the number of couples to be studied and fraction_boys for the decimal fraction of boys.
- The two formal parameters are identified with their names and types.
- Multiple parameters in the procedure header are separated by commas.
Within a function, additional variables (e.g., couple) may be declared and used.

With simulate_several_couples defined, the main program can run the simulation for a range of decimal fractions for the likelihood of boys.

#define numberOfCouples 20

  ...

  /* run simulation for several couples */
  double boy_fraction;
  for (boy_fraction = 0.5; boy_fraction <= 0.52; boy_fraction += 0.002)
    {
      simulate_several_couples (numberOfCouples, boy_fraction);
    }

Program couple-4.c prints simulation results for 20 couples for each of the decimal fractions 0.500, 0.502, 0.504, ..., 0.520.

Part of the 232 lines of output from a sample run of this program follows:

Simulation of family size
simulation with fraction of boys:   0.500
    boys:  1    girls:  2    total:   3
    boys:  1    girls:  1    total:   2
    ...
    boys:  5    girls:  1    total:   6
    boys:  1    girls:  1    total:   2
simulation with fraction of boys:   0.502
    boys:  1    girls:  1    total:   2
    boys:  2    girls:  1    total:   3
    ...
    boys:  1    girls:  5    total:   6
simulation with fraction of boys:   0.504
    boys:  2    girls:  1    total:   3
    ...
simulation with fraction of boys:   0.520
    boys:  1    girls:  1    total:   2
    ...
    boys:  8    girls:  1    total:   9
    ...
    boys:  1    girls:  2    total:   3

Several details apply when using a function with several parameters.

As with specifying parameters with printf, the order of the parameters matters. For simulate_several_couples, the first value supplied in a function call will be assigned to the first parameter (numCouples), and the second value will be assigned to the second parameter (fraction_boys). Thus, for the call
```
simulate_several_couples (15, 0.512)
```
the value 15 (the first actual parameter) will be assigned to the formal parameter numCouples, and the value 0.512 (the second actual parameter) will be assigned to fraction_boys.
The names of any actual parameters in a function call (e.g., numberOfCouples) may be different from the names of the formal parameters (e.g., numCouples), but a variable name in the call also may be the same as a parameter name. The key is the order of the values in a function call, not their names.
The types of the values or variables in a call should match the types of the corresponding formal parameters. As with arithmetic expressions, the compiler converts an int value to a double if the parameter is declared as a double. However, if a float or double is supplied where the parameter is declared as an int, the compiler may generate a warning, and any fractional part of the value is lost in the conversion.

Functions with Return Values (Version 5)

Although the previous simulations provide considerable raw data to address the original question of family size, at least two difficulties may confront a user.

The volume of output may yield challenges in analyzing the results and reaching conclusions.
For each percentage of boys, the program reports results for only 20 couples. Without knowing the possible variability of family sizes, a user may not know whether a different simulation would produce substantially different results.

To address these challenges, the program might be modified as follows:

Rather than report the family size for each couple, the program might calculate the average number of children and the maximum number of children among all couples with a given percentage for boys.
The simulation might be expanded to 1000 or more couples, given the percentage for boys.

We consider each of these challenges in turn.

Return the Result of a Simulation

To determine an average or a maximum, the simulation for a couple should report the number of children rather than print the results of each simulation.

In C, a function can return a value. For example, in the family size simulation, simulate_couple might return the total number of children for a given couple.

/* procedure to simulate the number of children for one couple 
   parameter fraction_boys:  the percentage of boys born, 
                             expressed as a decimal fraction
   return:  the total number of children for the couple
*/
int simulate_couple (double fraction_boys)
{
       /* couple starts with no children */
       int boys = 0;
       int girls = 0;

       /* couple has children */
       while ((boys == 0) || (girls == 0))
         {  
           if ((((double) rand()) / ((double) RAND_MAX)) < fraction_boys) 
              boys++;
           else
              girls++;
         }

       /* report the family size */
      return boys + girls;
}

Since this function will return a value, the purpose of that value is reflected in the opening comment, and void is replaced by int in the function header:
```
int simulate_couple (double fraction_boys)
```
Here:
- int indicates the result of the function will be an integer that a calling environment can use (just as rand returned a pseudo-random integer).
- Previously, void indicated that the function would not return any type of result.
Within the function, a return statement indicates what value to send back to the calling function.
- The type of expression after return must agree with the specification in the header. In this case, boys+girls is an integer, agreeing with int in the function header.
- Once execution reaches return, the computer stops work in the function and goes back to where the function was called. Additional code might be present after the return statement, but it will not be executed once return is encountered.

With this adjustment to simulate_couple, the procedure simulate_several_couples can use the result in several ways.

/* procedure to conduct simulation for several couples 
   parameter numCouples:     the number of couples to be simulated
   parameter fraction_boys:  the percentage of boys born, 
                             expressed as a decimal fraction
*/
void simulate_several_couples (int numCouples, double fraction_boys)
{
  int couple;
  int total_children = simulate_couple (fraction_boys);
  int max_children = total_children;
  for (couple = 1; couple < numCouples; couple++)
    {
      int couple_children = simulate_couple (fraction_boys); 

      /* accumulate total number of children */
      total_children += couple_children;

      /* check for new maximum */
      if (max_children < couple_children)
        max_children = couple_children;
    }

  double avg_children = ((double) total_children) / numCouples;
  printf (" fraction boys:  %6.3lf     average:  %6.2lf     maximum:  %3d\n",
          fraction_boys, avg_children, max_children);
}

In the line
```
      int couple_children = simulate_couple (fraction_boys); 
```
the function simulate_couple is called, the total number of children computed, and the result returned and stored in the variable couple_children.
To determine the average number of children for many couples, we need to compute the total number of children over all couples and divide by the number of couples.
- Variable total_children keeps track of the total number of children for all couples.
- total_children starts at the number of children for the first couple.
- The number of children for each subsequent couple couple_children is added to total_children as the result of each couple is determined.
- After all couples are evaluated, the average is the total number of children divided by the number of couples. In this case, a decimal average is desired, even though both the total number of children and the number of couples are integers. Casting total_children to a double requires the division to be performed as a double.

The fully revised program is available as couple-5.c

The variable max_children is used to compute a maximum.
- At the start, the maximum is the number of children for the first couple.
- Thereafter, the number of children for the next couple is compared with the previous maximum. If the next couple had more children, the maximum is updated.
For simplicity, the results of the simulation of the couples are printed on one line.

Sample output from one run of the revised program follows.

Simulation of family size with 1000 couples
 fraction boys:   0.500     average:    3.10     maximum:   13
 fraction boys:   0.502     average:    3.04     maximum:   12
 fraction boys:   0.504     average:    3.05     maximum:   12
 fraction boys:   0.506     average:    2.98     maximum:   11
 fraction boys:   0.508     average:    2.99     maximum:   14
 fraction boys:   0.510     average:    3.05     maximum:   15
 fraction boys:   0.512     average:    3.03     maximum:   13
 fraction boys:   0.514     average:    2.95     maximum:   10
 fraction boys:   0.516     average:    2.94     maximum:   14
 fraction boys:   0.518     average:    3.07     maximum:   10
 fraction boys:   0.520     average:    2.97     maximum:   12

Although these results are open to additional exploration, on the surface it seems that small changes in the percentage of boys have little effect on family size; the average number of children seems to be about 3.

Also, many runs of the program suggest that the maximum number of children over 1000 couples typically is between 10 and 15. However, occasionally, numbers as high as 22 are reported!

Functions with Assertions (Version 6)

Before completing this discussion of functions with parameters and/or return values, some comments regarding correctness are in order.

The family size simulation depended upon a couple having some reasonable chance of having a boy and a girl. In particular, although nothing was stated explicitly, the formal parameter fraction_boys is assumed to be between 0.0 and 1.0, with neither 0.0 nor 1.0 allowed. Further, family sizes could be expected to be extremely large if fraction_boys was close to either of these boundary values.

With this in mind, it seems appropriate to place bounds on the allowed limits for fraction_boys. For example, we might want to require that fraction_boys is between 0.33 and 0.66. Toward this end, we might make such an assumption explicit in the header comment for simulate_couple:

/* library for the assert function to check 
   assertions/pre-conditions */
#include <assert.h>

#define numberOfCouples 1000

/* procedure to simulate the number of children for one couple 
   parameter fraction_boys:  the percentage of boys born, 
                             expressed as a decimal fraction
   return:  the total number of children for the couple
   pre-condition:  0.33 <= fraction_boys <= 0.66
*/
int simulate_couple (double fraction_boys)
{

`assert.h`

As a program runs, appropriate processing may depend upon certain assumptions being met. (In the family-size simulation, the probability of a girl must not be either 0.0 or 1.0, for in either case only one gender of children is possible.) As will be discussed later in this reading, C's assert.h library contains a capability assert that allows assumptions to be checked as the program runs.

Assertions, Pre-conditions, and Post-conditions

Generally in solving problems, it is important to develop clear statements of what initial information may be available, what results are wanted, and what circumstances should be true at various points in processing. In computing jargon,

An assertion is a statement about variables at any specified point in processing.
Pre-conditions are constraints on the types or values of variables at the start of processing or at the start of a function or procedure.
Post-conditions specify what should be true at the end of a program, function, or procedure.

Thus, a pre-condition is an assertion about variable values at the start of processing, and a post-condition is an assertion at the end of a code segment.

It is good programming practice to state the pre- and post-conditions for each procedure or function as comments.

Pre- and Post-Conditions as a Contract

One can think of pre- and post-conditions as a type of contract between the developer of a code segment or function and the user of that function.

The user of a function is obligated to meet a function's pre-conditions when calling the function.
Assuming the pre-conditions of a function are met, the developer is obligated to perform processing that will produce the specified post-conditions.

As with a contract, pre- and post-conditions also have implications concerning who to blame if something goes wrong.

The developer of a function should be able to assume that pre-conditions are met.
If the user of a function fails to satisfy one or more of its pre-conditions, the developer of a function has no obligations whatsoever — the developer is blameless if the function crashes or returns incorrect results.
If the user meets the pre-conditions, then any errors in processing or in the function's result are the sole fault of the developer.

To Test Pre-Conditions or Not?

Although the user of a function has the responsibility for meeting its pre-conditions, computer scientists continue to debate whether functions should check that the pre-conditions actually are met. Here, in summary, are the two arguments.

Pre-conditions should always be checked as a safety matter; a function should be sufficiently robust that it will detect variances in incoming data and respond in a controlled way.
Since meeting pre-conditions is a user's responsibility, a developer should not add complexity to a function by handling unnecessary cases; further, the execution time should not be increased for a responsible user just to check situations that might arise from careless users.

Actual practice tends to acknowledge both perspectives in differing contexts. More checking is done when applications are more critical. As an extreme example, in software to launch a missile or administer drugs to a patient, software may perform extensive tests of correctness before taking an action — the cost of checking may be much less than the consequences resulting from unmet pre-conditions.

As a less extreme position, it is common to check pre-conditions once — especially when checking is relatively easy and quick, but not to check repeatedly when the results of a check can be inferred.

The `assert` function in C

At various points in processing, we may want to check that various pre-conditions or assertions are being met. C's assert function in the assert.h library serves this purpose.

The assert function takes a Boolean expression as a parameter. If the expression is true, processing continues as planned. However, if the expression is false, assert discovers the undesired condition, and processing is halted with an error message.

Applying assert to the simulate_couple function in the family-size simulation, we might add the following at the very beginning of the function.

  /* actively enforce the pre-condition */
  assert ((0.33 <= fraction_boys) && (fraction_boys <= 0.66));

With this addition, simulate_couple will check its pre-condition as it starts processing. If this assertion is valid, the function will proceed as desired. However, if this function is used with fraction_boys outside the specified range, the program will terminate and an error will be reported.

In the case of this simulation, a value for fraction_boys outside the specified range might lead to a very long loop or possibly an infinite loop. The assert test avoids such circumstances.

Program couple-6.c includes the assert statement within the full family-size simulation.

For the family-size simulation, values for fraction_boys outside the specified range might be annoying, but a user can stop the program manually with "control-C". However, in some other applications, identified errors might be dangerous and require user action. For example, if a program were administering drugs in a hospital or launching a missile, any identified error likely should lead to a stoppage of the program and a careful review by people on the scene!

Acknowledgment

Material on assertions, pre-conditions, and post-conditions is based on material for CSC 161 at Grinnell College, created 18 May 2008 by Henry M. Walker, revised 7 February 2010, and edited with updated format 4 November 2014.

created 24 July 2016 by Henry M. Walker
revised 5 August 2016 by Henry M. Walker
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu .

CSC 115.005/006	Sonoma State University	Spring 2022
	CSC 115.005/006: Programming I
Instructor: Henry M. Walker Lecturer, Sonoma State University Professor Emeritus of Computer Science and Mathematics, Grinnell College
