Section 2.13 Storing Characters
Almost all programs perform a great deal of text string manipulation. Text strings are arrays of characters. The first program you wrote was probably a “Hello world” program. If you wrote it in C, you used a statement like:
printf("Hello world\n");
If you wrote it in C++, you used a statement like:
cout << "Hello world" << endl;
When translating either statement into machine code, the compiler must:
store each of the characters in a location in memory where the control unit can access them, and
generate the machine instructions to write the characters on the screen.
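Characters are stored using a code that assigns a bit pattern to each one. This book uses the ASCII code, shown in Table 2.13.1. On most Linux systems, the table can also be displayed with the command man ascii.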
Table 2.13.1. The ASCII code; bit patterns are shown in hexadecimal.
bit pat. | char | bit pat. | char | bit pat. | char | bit pat. | char |
00 | NUL (Null) | 20 | (space) | 40 | @ | 60 | ` |
01 | SOH (Start of Heading) | 21 | ! | 41 | A | 61 | a |
02 | STX (Start of Text) | 22 | " | 42 | B | 62 | b |
03 | ETX (End of Text) | 23 | # | 43 | C | 63 | c |
04 | EOT (End of Transmit) | 24 | $ | 44 | D | 64 | d |
05 | ENQ (Enquiry) | 25 | % | 45 | E | 65 | e |
06 | ACK (Acknowledge) | 26 | & | 46 | F | 66 | f |
07 | BEL (Bell) | 27 | ' | 47 | G | 67 | g |
08 | BS (Backspace) | 28 | ( | 48 | H | 68 | h |
09 | HT (Horizontal Tab) | 29 | ) | 49 | I | 69 | i |
0a | LF (Line Feed) | 2a | * | 4a | J | 6a | j |
0b | VT (Vertical Tab) | 2b | + | 4b | K | 6b | k |
0c | FF (Form Feed) | 2c | , | 4c | L | 6c | l |
0d | CR (Carriage Return) | 2d | - | 4d | M | 6d | m |
0e | SO (Shift Out) | 2e | . | 4e | N | 6e | n |
0f | SI (Shift In) | 2f | / | 4f | O | 6f | o |
10 | DLE (Data-Link Escape) | 30 | 0 | 50 | P | 70 | p |
11 | DC1 (Device Control 1) | 31 | 1 | 51 | Q | 71 | q |
12 | DC2 (Device Control 2) | 32 | 2 | 52 | R | 72 | r |
13 | DC3 (Device Control 3) | 33 | 3 | 53 | S | 73 | s |
14 | DC4 (Device Control 4) | 34 | 4 | 54 | T | 74 | t |
15 | NAK (Negative ACK) | 35 | 5 | 55 | U | 75 | u |
16 | SYN (Synchronous idle) | 36 | 6 | 56 | V | 76 | v |
17 | ETB (End of Trans. Block) | 37 | 7 | 57 | W | 77 | w |
18 | CAN (Cancel) | 38 | 8 | 58 | X | 78 | x |
19 | EM (End of Medium) | 39 | 9 | 59 | Y | 79 | y |
1a | SUB (Substitute) | 3a | : | 5a | Z | 7a | z |
1b | ESC (Escape) | 3b | ; | 5b | [ | 7b | { |
1c | FS (File Separator) | 3c | < | 5c | \ | 7c | | (vertical bar) |
1d | GS (Group Separator) | 3d | = | 5d | ] | 7d | } |
1e | RS (Record Separator) | 3e | > | 5e | ^ | 7e | ~ |
1f | US (Unit Separator) | 3f | ? | 5f | _ | 7f | DEL |
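The table contains some useful patterns. The numeric characters, 0–9, are in a contiguous sequence in the code, as are the lower case alphabetic characters, a–z, and the upper case characters, A–Z. Notice that the lower case alphabetic characters are numerically higher than the upper case.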
The codes in the left-hand column of Table 2.13.1 are control characters rather than printable characters. For example, typing ctrl-d generates an EOT (End of Transmission) character.
ASCII codes are usually stored in the rightmost seven bits of an eight-bit byte. The eighth bit (the highest-order bit) is called the parity bit. It can be used for error detection in the following way. The sender and receiver agree ahead of time whether to use even parity or odd parity. Even parity means that an even number of ones is always transmitted in each character; odd parity means that an odd number of ones is transmitted. Before transmitting a character in the ASCII code, the sender adjusts the eighth bit so that the total number of ones matches the even or odd agreement. When the byte is received, the receiver counts the ones in it. If the count does not match the agreement, the receiver knows that at least one of the bits in the byte was received incorrectly. Of course, if two bits (or any even number of bits) are received incorrectly, the error passes undetected, but when bit errors are rare, a double error within one byte is far less likely than a single error. Modern communication systems are much more reliable, and parity is seldom used when sending individual bytes.
In some environments the high-order bit is used to provide a code for special characters. A little thought will show you that even all eight bits will not support all languages, e.g., Greek, Russian, Chinese. Work on the Unicode character standard began in 1987, its first version was published in 1991, and it has evolved over the years. It defines far more characters than fit in a single byte, so it can handle other alphabets and writing systems. Unicode is backwards compatible with ASCII: its first 128 code points are the ASCII characters, and the UTF-8 encoding stores each of them as the same single byte. We will only use ASCII in this book.
A computer system that uses an ASCII video system can be programmed to send a byte to the screen. The video system interprets the bit pattern as an ASCII code (from Table 2.13.1) and displays the corresponding character on the screen.
Getting back to the text string, "Hello world\n", the compiler would store this as a constant array of characters. There needs to be a way to specify the length of this array. In a C-style string this is accomplished by using the sentinel character NUL at the end of the string. So the compiler must allocate thirteen bytes for this string. An example of how this string is stored in memory is shown in Figure 2.13.2. Notice that C uses the LF character as a single newline character even though the C syntax requires that the programmer write two characters, “\n”. The area of memory shown includes the three bytes immediately following the text string.
Address | Contents |