CSC 115.005/006 Sonoma State University Spring 2022
CSC 115.005/006:
Programming I
Instructor: Henry M. Walker

Lecturer, Sonoma State University
Professor Emeritus of Computer Science and Mathematics, Grinnell College



Notes:

Strings in C

Single characters arise naturally in some common applications.

However, many [most?] applications involve sequences of characters: words, phrases, sentences, etc. Further, even the applications listed above sometimes use two- or three-character codes: grade "A-" or vitamin "B12". Often a sequence of characters is called a string.

As with many elements of C, the storage of strings is closely tied to the processing of string data. This reading examines both string storage and processing in several sections.

Notation in C

C uses different notation for characters and strings:

For example, in C, 'A' represents a single character, whereas "A" represents a string containing one letter. As this reading discusses, C stores these two entities differently.


String storage

Conceptually, a string may be considered as a sequence of characters. In C, this concept is implemented as an array of type char. For example, consider the literal string "Henry M. Walker". Behind the scenes, this string is stored in a char array. 'H' is the character with index 0; 'e' is the character with index 1, etc. Thus, the following code prints the letters H, r, and . on separate lines:

printf ("%c\n", "Henry M. Walker"[0]);
printf ("%c\n", "Henry M. Walker"[3]);
printf ("%c\n", "Henry M. Walker"[7]);

Within C, a literal string is enclosed in double quotes and cannot be changed. The string, "Henry M. Walker", is stored in a section of main memory reserved for constant data. Thus, this string can be accessed, characters within the string can be referenced, and the string can be copied. However, the string itself cannot be modified.


Since much text processing involves editing, a mechanism is needed to store sequences of characters in variables, for which data can be adjusted. Sometimes editing will change the number of characters in the sequence.

Some important capabilities for editing are illustrated in the following table.

starting string  | edited string       | notes
Henry M. Walker  | HENRY M. WALKER     | change some characters
Henry M. Walker  | Henry Walker        | remove characters
Henry M. Walker  | Henry MacKay Walker | add characters


Allocated space

Although flexibility may be needed for text editing, C requires a compiler to allocate space when each variable is declared. To accommodate both flexibility and space allocation, string storage for variables typically involves three basic actions:


In calculating array storage for strings, note that every array must contain space for the desired characters plus space for the NULL at the end. Thus, in working with the string "Computing", a program must allocate at least 10 char locations — 9 for the letters within "Computing" and one for the final NULL.

Previously, the discussion of arrays in C indicated that array variables always represent the base address or starting point of an array. Given only that address (for example, inside a function that receives the array as a parameter), however, there is no way to determine the array's size. By declaring a char array, we know the starting point for a sequence of characters, but we do not know either the logical end of the character sequence or the size of the array. The NULL provides a mechanism to determine both the end and the number of characters.

According to the 2011 draft C standard, the NULL character must be encoded as the integer 0. Thus, the following two assignment statements are equivalent.

char ch;
ch = NULL;  //may generate compiler warning, since the NULL macro is often defined as a pointer constant
ch = 0;

In supporting arrays, many modern languages store several descriptive elements, including both the starting point and the size. Although access to this information may be convenient, storing multiple descriptive pieces of data takes space. C chooses to save space and utilize other mechanisms (e.g., the NULL character) for possible descriptive data.


Example 1

The following program string-example-1.c illustrates the declaration, initialization, and simple processing for strings.

/* Program begins with the string "Cs"
           saves it in a relatively large char array,
           converts all letters to upper case
           edits the string to yield "CS FOR ALL"
   Throughout, printing is accomplished with %s format
*/

Example 1 commentary

With the program specification given, the actual output of this code is:

original string:  Cs
capitalized string:  CS
final string:  CS FOR ALL

#include <stdio.h>
#include <ctype.h>

int main ()
{

Both the stdio.h and ctype.h libraries are used.


  /* save "Cs" string in 14-character array */
  char text [14] = "Cs";
  printf ("original string:  %s\n", text);

Declaration and Initialization

As with arrays of any type, the declaration of a char array indicates the type (char), the variable name (e.g., text), and the array size (e.g., 14).

A string array need not be initialized, in which case the characters stored in the array can be anything.

A string array can be initialized as part of the declaration.




  /* convert all letters to upper case */
  int i;
  for (i = 0; i < 14; i++)
    text[i] = toupper (text[i]);
  printf ("capitalized string:  %s\n", text);

Since each element within a char array is a character, all functions in ctype.h apply to these array elements.

printf in the stdio.h library uses %s format to print strings for char arrays, starting at the given array base address and continuing through the NULL character.


  /* add " FOR ALL" to string */
  for (i = 0; i < 8; i++)
    text[i+2] = " FOR ALL"[i];

One approach to edit "CS" to yield "CS FOR ALL" is to copy the characters from the string " FOR ALL", character by character. In this example, the 8 characters of " FOR ALL" (indices 0 through 7) are copied into text[2] through text[9].


  /* place NULL character at end */
  text[10] = 0;

  printf ("final string:  %s\n", text);

  return 0;
}

Every string must conclude with a NULL character. This code explicitly marks the end of the string. Another approach would be to copy the NULL at the end of " FOR ALL" by extending the previous for loop to use the condition i < 9.


char arrays and char * variables

Consider the declaration

char text [14];

With this declaration,

Expanding this second point, text specifies the address of the first array element; in Example 1 above, text records the address of the character 'C' in the string "Cs". From our work on function parameters, the address of a character could be specified as having type char *.

Putting these pieces together, the declaration char text [14] yields three results:


Next, consider the declaration

char * place;

With this code, place is identified as the location of some character, but no space for the character has actually been allocated. We cannot use place directly in a program, until the address stored refers to space allocated separately. The following code segment illustrates both some limitations and some capabilities possible with the char * type.

char text1 [8] = "abcde";
char text2 [6] = "wxyz";

char * place = text1;
printf ("first string:  %s\n", place);

place = text2;
printf ("second string:  %s\n", place);

In this example, the characters "abcde" and "wxyz" are stored in arrays text1 and text2, respectively, and a NULL character is added to the end of each string.

When place is declared and initialized, it refers to the base address for the text1 array, so the first printf statement prints abcde. Later, the assignment place = text2 causes the place variable to refer to the base address for the text2 array, so the second printf statement prints wxyz.


To extend this example, we add the following lines after the second printf statement:

place[2] = '7';
printf ("final string:  %s\n", place);
printf ("final text1:  %s\n", text1);
printf ("final text2:  %s\n", text2);

When this code is executed, place has been assigned the text2 base address. Thus, place[2] references the 'y' character in the text2 array, and the assignment changes 'y' to '7'. Note there is only one copy of the text2 array, and both text2 and place access it.

Moving to the printf statements, the text1 array remains unchanged, so the printing for that line yields abcde. However, the text2 string has changed to "wx7z". Since both text2 and place access this storage, the printing of both of these variables yields the same wx7z.

With the statement place = text2, we say that place is an alias for the variable text2. After the assignment, both of these refer to the same locations in memory, so a change in either yields a change for both.


Potential array overflow

In an earlier module, we discovered that if arr is an array, then the computer does not check whether the expression arr[i] references a location within the array. Although this issue can arise with any array, the potential for trouble can be high when working with strings. As a simple example, we return to the first example in this reading.

  char text [14] = "CS";
  
  /* add " FOR ALL" to string, including the final NULL */
    for (i = 0; i < 9; i++)
        text[i+2] = " FOR ALL"[i];

In this example, all copied characters fit within the first 11 elements of the array. However, consider this variation:

  char text [14] = "Computing";
  
  /* add " FOR ALL" to string, including the final NULL */
    for (i = 0; i < 9; i++)
        text[i+9] = " FOR ALL"[i];

In this code, "Computing" has 9 characters (excluding the final NULL), and " FOR ALL" has 8 characters (excluding the NULL). Each of these strings fits nicely within the 14 characters allocated for text. However, the desired combined string will require 9 + 8 + 1 = 18 characters (including the NULL). Thus, the loop will store character data beyond the end of the text array. Although we can only speculate what might be stored in these additional locations, we know something else will be changed: perhaps the variable i, perhaps another variable, perhaps nothing of interest, or perhaps another part of the program.


As a second simple example, consider the following code segment:

char str [4] = "one";
str[3] = 's';
printf ("str:  %s\n", str);

In this code, all characters are stored within the array. However, printing the str array starts with the 'o' in array position 0, but then printing continues until a NULL — wherever that might be! In practice, string processing will cause array accesses beyond the allocated space, with unknown consequences!


Both of these examples illustrate that string processing has the potential to change memory beyond the intended strings. Throughout processing, therefore, there is a need to check that array accesses stay within the space actually allocated for string arrays!

Security Warning

Several security and privacy problems with software arise when array references extend beyond allocated space. If an outsider can place data at locations throughout main memory, then the outsider might be able to change the behavior of a program or might be able to access private information!


String functions in C

C contains numerous library functions that support string processing. Documentation for strings and library functions is widely available. Two basic sources are particularly helpful:

The following table identifies commonly-used functions for char arrays. (Check man pages for additional functions!)

Category | Functions for NULL-terminated strings | Functions limiting processing length | Functions for char arrays, ignoring NULLs
General: determine length of string | strlen | |
General: initialize block of memory | | | memset
String/character copying | strcpy | strncpy | memcpy, memmove
String concatenation | strcat | strncat |
String comparison | strcmp | strncmp | memcmp
Search for character; return location | strchr, strrchr, index, rindex, strpbrk | | memchr
Search for character; return index | strspn, strcspn | |
Search for substring | strstr | |
Break string into pieces | strtok | |

Function Notes

The description and use of various string functions build upon many of the ideas discussed throughout this reading!


Example 2

The following program string-example-2.c illustrates several elements of string processing, including the use of char arrays and string functions.


/* program to compile information about people in one family */

#include <string.h>
#include <stdio.h>

int main ()
{

Program notes


  /* initialize two given names, as character arrays 
     with NULL at end 
   */
  char given_name1 [10] = {'H', 'e', 'n', 'r', 'y', 0};
  char given_name2 [10] = {'T', 'h', 'e', 'r', 'e', 's', 'a', 0};

  /* initialize two more given names as strings
   */
  char given_name3 [10] = "Donna";
  char given_name4 [10] = "Barbara";

  /* initialize common last name in family */
  char surname [10] = "Walker";

  /* add space before surname */
  char space_surname [20] = " ";
  strcat (space_surname, surname);

  /* compute full names */
  char full_name1 [20];
  char full_name2 [20];
  char full_name3 [20];
  char full_name4 [20];

  /* copy given names */
  strcpy (full_name1, given_name1);
  strcpy (full_name2, given_name2);
  strcpy (full_name3, given_name3);
  strcpy (full_name4, given_name4);

  /* combine given and surname (with space) to obtain full name */
  strcat (full_name1, space_surname);
  strcat (full_name2, space_surname);
  strcat (full_name3, space_surname);
  strcat (full_name4, space_surname);

  /* print full names in family */
  printf ("People in this family\n");
  printf ("     %s\n", full_name1);
  printf ("     %s\n", full_name2);
  printf ("     %s\n", full_name3);
  printf ("     %s\n", full_name4);

  /* determine which given_name comes first in alphabetical order 
     check name by name; 
     before indicates earliest name found during processing
  */
  char * before = given_name1;

  if (strcmp (before, given_name2) > 0)
    before = given_name2;

  if (strcmp (before, given_name3) > 0)
    before = given_name3;

  if (strcmp (before, given_name4) > 0)
    before = given_name4;

  printf ("Given name coming first in alphabetical order: %s\n",
          before);

  printf ("Number of characters in first alphabetical name: %zu\n",
          strlen (before));

  return 0;
}


created 25 May 2016 by Henry M. Walker
expanded and edited 27 May 2016 by Henry M. Walker
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.