Character-by-Character Input
As noted in the previous session on using scanf for input, processing of user input often involves two steps:
-
read data as a sequence of characters
-
process the character sequence as needed (e.g., converting to a number)
C's scanf function combines these two steps and is useful in many contexts. However, sometimes processing in a program may allow varying types of input — for example, the next element to be read may be a number or a word. In such cases, a programmer may need to separate these two input steps.
This reading explores character-by-character input in three parts: reading characters, reading strings or lines of input, and processing strings.
Reading Characters
For some applications, a program reads one character and subsequent processing may depend upon the character read.
C provides three approaches for reading individual characters.
-
The C function getchar() reads and returns a single character, without skipping over white space.
Example: char ch; ch = getchar();
-
The c format for scanf reads an individual character, again without skipping over white space.
Example: char ch; scanf ("%c", &ch);
-
The c format for scanf, preceded by a space in the format string, reads an individual character after skipping any white space.
Example: char ch; scanf (" %c", &ch);
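The difference between the "%c" and " %c" formats can be seen without typing at a terminal by applying the same formats to a string with sscanf. The following is a minimal sketch; the helper names first_char_raw and first_char_skip are hypothetical and are not part of the course programs.

```c
#include <stdio.h>

/* Hypothetical helper: read one character with "%c".
   No white space is skipped, so a leading space is returned as is.
   Returns '\0' if the input string is empty. */
char first_char_raw (const char *input)
{
  char ch = '\0';
  sscanf (input, "%c", &ch);
  return ch;
}

/* Hypothetical helper: read one character with " %c".
   The space in the format skips any spaces, tabs, and newlines first.
   Returns '\0' if the input holds only white space. */
char first_char_skip (const char *input)
{
  char ch = '\0';
  sscanf (input, " %c", &ch);
  return ch;
}
```

For the input string "  x", first_char_raw returns a space, while first_char_skip returns 'x'.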
Data conversion
Some references declare int variables to hold the values returned by getchar, while the examples given here declare char variables. The functions getchar and getc return int values so that the special end-of-file value EOF can be distinguished from every valid character. When such a value is assigned to a char variable, C converts it to a char automatically, so the declarations shown here are valid. Just be aware that a char variable cannot reliably distinguish EOF from a valid character, so a program that must detect the end of input should store the result in an int.
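When a program needs to detect the end of input, the usual idiom stores the result of getc (or getchar) in an int and compares it with EOF before treating it as a character. The sketch below illustrates the idiom; the function name count_chars and the choice of a FILE * parameter (so the code can be tried on any open stream, including stdin) are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: count the characters remaining on a stream.
   The variable c is an int, so EOF cannot be confused with a character. */
long count_chars (FILE *in)
{
  int c;
  long count = 0;

  while ((c = getc (in)) != EOF)
    count++;

  return count;
}
```

Calling count_chars (stdin) would count every character typed, including newlines, until end of file.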
Example 1: A simple option-based program
Consider a program in which a user enters integers and real numbers, and the program computes sums in each category. In particular,
-
If a user enters 'i', then an integer should be typed on the same line, and the number added to an integer sum.
-
If a user enters 'r', then a real number should be typed on the same line, and the number should be added to the real sum.
-
If the user enters 'q', then the program should report the integer and real sums and quit.
In this application, user commands and input are on the same line, so a program must read an option and then decide what to do next.
Two versions of this program follow, illustrating alternatives in character processing.
Version 1: using getchar
The following program is available as option-prog-1.c.
/* program to read integers and real numbers and to compute sums in each category.
   if a user types 'i', an integer should follow on the same line
   if a user types 'r', a real number should follow on the same line
   if the user types 'q', the program should report the integer and real sums and quit

   version a: processing the user option with getchar
*/

#include <stdio.h>
#include <ctype.h>

int main ()
{
  /* variable declaration and initialization */
  int int_sum = 0;
  double real_sum = 0.0;
  char ch;
  int i;
  double d;

  printf ("program to compute integer and real sums\n");
  printf ("user options:\n");
  printf (" enter i and an integer on the same line\n");
  printf (" enter r and a real number on the line\n");
  printf (" enter q to quit\n");
Challenges in this application arise when processing the end of each line of user input. In particular, after the user enters an 'i' or 'r' option, the user will type a number. White space might follow the number on the same line, and then the user will type a newline character '\n'.
-
If the program reads the number and then uses getchar or scanf("%c", &ch), the character read will likely be the newline character (or any white space after the number). This white space must be read and discarded to find the desired option on the next line.
-
A user might decide to type a space or tab at the start of a line, before typing 'i', 'r', or 'q'. If this is to be allowed, then the program will need to skip over potential white space at the start of a line.
  /* strip initial white space */
  while (isspace (ch = getchar ()));

  /* allow both uppercase and lowercase options */
  ch = tolower (ch);
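C allows an assignment expression to appear as an argument in a function call. Here,
-
The assignment ch = getchar () reads a character and stores the result in the ch variable.
-
The isspace function is then called with the value stored in ch as its argument. The loop body is empty (just a semicolon), so the loop simply repeats until a non-white-space character has been read.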
  /* process line */
  while (ch != 'q')
    {
      if (ch == 'i')
        {
          scanf ("%i", &i);
          int_sum += i;
        }
      else if (ch == 'r')
        {
          scanf ("%lf", &d);
          real_sum += d;
        }
      else
        {
          printf ("invalid option: %c\n", ch);
        }
After the user option is identified, the program reads and processes an integer or a real number, as directed.
      /* strip newline and any other white space */
      while (isspace (ch = getchar ()));

      /* allow both uppercase and lowercase options */
      ch = tolower (ch);
    }

  printf ("totals:\n");
  printf (" integer sum: %d\n", int_sum);
  printf (" real sum: %lf\n", real_sum);

  return 0;
}
Once the user types 'q', the main loop exits, and results are printed.
Version 2: using scanf with white space in format string
Normally, the %c format in a scanf does not skip white space. However, as noted in the session on using scanf for input, white space within a format string directs the computer to skip over any amount of white space until a non-white-space character is encountered. With this observation, the lines
/* strip initial white space */ while (isspace (ch = getchar ()));
in version 1 of the program might be replaced with
/* strip initial white space */ scanf (" %c", &ch);
The revised program is available as option-prog-2.c.
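The following sketch shows how the " %c" format might drive the whole option loop. It is not the text of option-prog-2.c; the function name sum_options, its FILE * parameter, and its return codes are assumptions made so the logic can be tried on any input stream. It also uses "%d" rather than "%i" and checks the value returned by fscanf, so a missing number is not added to a sum.

```c
#include <stdio.h>
#include <ctype.h>

/* Hypothetical helper: read options from the stream in and accumulate sums.
   Returns 0 if the user entered 'q', or -1 if input ended before 'q'. */
int sum_options (FILE *in, int *int_sum, double *real_sum)
{
  char ch;
  int i;
  double d;

  *int_sum = 0;
  *real_sum = 0.0;

  /* " %c" skips spaces, tabs, and newlines before the option letter */
  while (fscanf (in, " %c", &ch) == 1)
    {
      /* allow both uppercase and lowercase options */
      ch = (char) tolower ((unsigned char) ch);

      if (ch == 'q')
        return 0;

      if (ch == 'i')
        {
          /* add the number only if an integer was actually read */
          if (fscanf (in, "%d", &i) == 1)
            *int_sum += i;
        }
      else if (ch == 'r')
        {
          if (fscanf (in, "%lf", &d) == 1)
            *real_sum += d;
        }
      else
        {
          fprintf (stderr, "invalid option: %c\n", ch);
        }
    }

  return -1;
}
```

A main function could call sum_options (stdin, &int_sum, &real_sum) and then print the two totals. Because the white space is skipped by the format string, no separate loop with isspace is needed.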
Reading Strings or Lines of Input
C also provides at least four approaches for reading strings of characters. Each function has its own special characteristics.
In what follows, each approach includes an example. The sample code utilizes the following macro and declaration:
#define MaxLen 10
char str [MaxLen];
These declarations allow room for 10 characters in the str array, including the null character at the end.
-
The s format for scanf reads a sequence of non-white-space characters. Initial white space is skipped. Once reading into the string begins, reading continues until white space is encountered. As with all strings, a null character ('\0') is added at the end of the string.
Example: scanf ("%s", str);
Notes
-
Since str is an array, the variable represents a base address and no ampersand & is added.
-
Recall that a newline character is considered white space. Thus, reading continues until a space, a horizontal tab, a vertical tab, or a newline is encountered.
-
Warning: For the code shown, scanf reads characters until white space, without regard for the size of the char array. If more characters are read than fit in the array, the extra characters may overflow the array and overwrite data stored in other variables.
-
To limit the number of characters read, include a field width after the %.
Example: scanf ("%9s", str);
In this approach, the field width (e.g., 9) should be no larger than the array size - 1 (leaving room for the null character at the end).
-
The C function gets() reads an entire line or until end of file. As with scanf, a null character is added at the end of the string.
Example: gets (str);
-
Reading starts immediately; no white space is skipped when reading begins.
-
Warning: Just as scanf reads characters until white space, gets reads until the end of line or end of file, without regard for the size of the char array. If more characters are read than fit in the array, the extra characters may overflow the array and overwrite data stored in other variables.
-
The C function fgets() reads up to n - 1 characters from a line, stopping after a newline (which is stored in the string) or at the end of a file. As with scanf, a null character is added at the end of the string.
Example: char str[MaxLen]; fgets (str, MaxLen, stdin);
-
stdin is the C variable for "standard input".
-
Up to MaxLen - 1 characters are read from the user, leaving room for a null character at the end.
-
Reading starts immediately; no white space is skipped when reading begins.
-
Since fgets controls the number of characters read, fgets (when properly used) can be considered "safe".
-
The C function getchar could be used in a loop to read character-by-character into an array until a specified condition (e.g., white space) holds. This approach allows explicit checking for array bounds.
Example:
int i = 0;
while ((i < MaxLen-1) && !isspace (ch = getchar ()))
  {
    str[i] = ch;
    i++;
  }
/* insert null character at end of string */
str[i] = 0;
-
This code starts reading immediately; an earlier loop would be needed to skip initial white space.
-
Since this approach limits the number of characters read, it can be considered "safe".
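The getchar loop above can be combined with an earlier loop that skips initial white space. The following is one possible sketch, not the only way to write it. The function name read_word and its parameters are assumptions; it reads from a FILE * (for example stdin) with getc, stores each result in an int so that end of file can be detected, and uses ungetc to return the character that stopped the loop.

```c
#include <stdio.h>
#include <ctype.h>

/* Hypothetical helper: read one word of at most max-1 characters into str.
   max must be at least 1. Returns the number of characters stored. */
int read_word (FILE *in, char *str, int max)
{
  int c;
  int i = 0;

  /* skip initial white space */
  do
    {
      c = getc (in);
    }
  while (c != EOF && isspace (c));

  /* copy characters until white space, end of input, or a full array */
  while (c != EOF && !isspace (c) && i < max - 1)
    {
      str[i] = (char) c;
      i++;
      c = getc (in);
    }

  /* return the character that stopped the loop, so later reads see it */
  if (c != EOF)
    ungetc (c, in);

  /* insert null character at end of string */
  str[i] = '\0';
  return i;
}
```

With the declarations used in this section, a call would be read_word (stdin, str, MaxLen). If a word is longer than MaxLen - 1 characters, the remaining characters stay in the input and are returned by the next call.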
Of the approaches identified, the warnings given indicate:
-
gets is dangerous and should never be used! (The function was removed from the language in the C11 standard.)
-
Similarly, scanf with %s, but without a field width, is dangerous and should never be used!
Altogether, in reading strings, consider using one of the following approaches:
-
scanf with %s and a field width
-
fgets
-
Writing your own loop that includes checking for access within array bounds.
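When fgets is used to read a line, the newline character is stored in the array if it fits, and a program often wants to remove it. The sketch below shows one common way to do so with strcspn; the function name read_line and its return values are assumptions chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line (or its first max-1 characters) into str.
   max must be at least 1. Returns 1 on success, or 0 at end of input. */
int read_line (FILE *in, char *str, int max)
{
  if (fgets (str, max, in) == NULL)
    {
      str[0] = '\0';
      return 0;
    }

  /* fgets keeps the newline when it fits; replace it with a null character */
  str[strcspn (str, "\n")] = '\0';
  return 1;
}
```

With the declarations used in this section, a call would be read_line (stdin, str, MaxLen). If a line is longer than MaxLen - 1 characters, the rest of the line remains in the input and is returned by the next call.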
Processing Strings
Once a string is read, processing can retrieve desired information. As with everything else in this reading, C provides at least two approaches.
-
C's function sscanf works much like scanf except that data are extracted from a string rather than from the terminal.
-
C contains two functions to extract numeric data from the start of a string:
-
atoi examines the first part of a string and returns an integer (here atoi stands for "ASCII to integer").
-
atof examines the first part of a string and returns a double (here atof stands for "ASCII to floating point", although the value returned is a double).
-
Use the online manual for details on each of these functions:
man sscanf
man atoi
man atof
To illustrate these two basic approaches, consider the problem of reading from the terminal the 2-letter abbreviation for a state and its population. For the purposes of illustration, assume the first 21 characters on a line will give the state's 2-letter abbreviation and the characters following on the line will specify the current population. For example, any of the following lines might be encountered as input:
IA                   3107000
NH                   1330608
NC                   10042802
TN                   6512027
(In this example, the 21-character field width for states provides substantial room for a state abbreviation; 19 of those characters are spaces.)
Assuming no line would exceed 100 characters, a program might read the entire line of input with fgets:
char line [101]; /* allow room for the terminating null character */
fgets (line, 101, stdin);
sscanf approach
Since the problem specifies a 2-letter abbreviation for each state, the desired data can be stored in a small char array together with an integer variable. With this set up, the sscanf then reads from the line in much the same way that scanf reads from the terminal:
char state [3];
int pop;
sscanf (line, "%2s %d", state, &pop);
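Like scanf, sscanf returns the number of items it successfully converted, so a program can check that both the abbreviation and the population were found. The sketch below wraps the call shown above; the function name parse_abbrev_line is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: extract a 2-letter abbreviation and a population.
   Returns 1 if both fields were read from line, or 0 otherwise. */
int parse_abbrev_line (const char *line, char state[3], int *pop)
{
  /* sscanf reports how many of the two fields it converted */
  return sscanf (line, "%2s %d", state, pop) == 2;
}
```

For the first sample line, the call stores "IA" in state and 3107000 in the population variable. A line with no number after the abbreviation yields 0.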
Using atoi (and/or atof)
Now suppose the 2-letter state abbreviation is replaced in the input by the full state name:
Iowa                 3107000
New Hampshire        1330608
North Carolina       10042802
Tennessee            6512027
As this data highlights, a state name may include one or two words. Thus, reading using scanf with a format string "%21s" is not adequate. The "21" width will restrict input to the specified 21 characters. However, "%s" reads just one word, and there is no way to know whether the state being read will have one word or two.
Since state names are guaranteed to fall within the first 21 characters, processing could copy the state and then focus on the remaining characters.
/* strncpy is declared in <string.h>, and atoi in <stdlib.h> */
char state [22];
int pop;

/* copy first 21 characters, with null character at end */
strncpy (state, line, 21);
state [21] = '\0';

/* convert the string characters, starting at position 21, to an integer */
pop = atoi (line + 21);
-
Since line marks the start of the input array, line + 21 specifies the address of the character after the initial 21-character state field. Thus, line + 21 gives the base address for the char array containing the population.
-
atoi skips any initial white space and converts the first number it encounters. If text continues after the first number, it is ignored.
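The copy-then-convert steps can be gathered into one function. The sketch below is one possible arrangement under two assumptions that go beyond the text: it first checks that the line is longer than the 21-character name field (otherwise line + 21 would point past the end of the data), and it trims the trailing spaces that pad the state name. The names parse_state_line and NAME_WIDTH are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define NAME_WIDTH 21

/* Hypothetical helper: split a line into a state name and a population.
   Returns 1 on success, or 0 if the line is too short to hold both fields. */
int parse_state_line (const char *line, char state[NAME_WIDTH + 1], int *pop)
{
  size_t len;

  /* make sure characters exist beyond the name field */
  if (strlen (line) <= NAME_WIDTH)
    return 0;

  /* copy the first 21 characters, with a null character at the end */
  memcpy (state, line, NAME_WIDTH);
  state[NAME_WIDTH] = '\0';

  /* trim the spaces that pad the name out to 21 characters */
  len = NAME_WIDTH;
  while (len > 0 && state[len - 1] == ' ')
    {
      len--;
      state[len] = '\0';
    }

  /* convert the characters starting at position 21 to an integer */
  *pop = atoi (line + NAME_WIDTH);
  return 1;
}
```

Note that atoi returns 0 when no number is found, so a program that must tell a population of 0 from a missing number could use sscanf or strtol for the conversion instead.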
created 10 April 2008 by Henry Walker
revised 6 March 2010 by Henry Walker
revised 5 August 2011 by April O'Neill
revised 28 October 2011 by Dilan Ustek
minor editing 29 October 2011 by Dilan Ustek
revised, expanded, and reformatted 1-2 June 2016 by Henry Walker
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.