Computer Science
Character Coding Schemes

Character coding schemes use binary patterns to represent character data (text).

Using the same code on all computers ensures that text can easily be transferred between machines.

American Standard Code For Information Interchange (ASCII)

ASCII uses 7 bits, which allows 128 (2 to the power 7) different characters to be represented. Each character in the standard set is assigned a number from 0 to 127. The 7-bit binary representation of that number is used to represent the character.
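As a small illustration of this mapping, the Python sketch below prints the code number and 7-bit pattern for a few characters. The helper name ascii_bits is made up for this example; ord and format are standard Python built-ins.

```python
def ascii_bits(ch):
    """Return the 7-bit binary pattern for a single ASCII character."""
    code = ord(ch)              # the number assigned to the character
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    return format(code, "07b")  # zero-padded to 7 binary digits

for ch in ["A", "a", "0", " "]:
    print(repr(ch), ord(ch), ascii_bits(ch))
```

The patterns printed can be checked against the rows of the ASCII table.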

The first 32 codes (0 to 31) are control codes such as TAB or Line Feed. The remaining codes represent the space, digits, upper and lower case letters and standard symbols. Extended ASCII schemes use the eighth bit to code up to 128 further characters and symbols.

A number will be stored differently depending on whether it is being displayed or used for calculations. The ASCII representation of the character data "23" is not the same as the pure binary pattern for this number using the same number of bits. The representation of this data within the computer system depends on the context and use of the data.
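The difference can be seen in a short Python sketch. It assumes that each character is stored in one 8-bit byte (the 7-bit ASCII code with a leading 0), so the two characters of "23" occupy 16 bits, and it pads the pure binary number to 16 bits for comparison. The variable names are illustrative only.

```python
text = "23"

# As character data: one ASCII code per digit character ('2' is 50, '3' is 51).
as_text = " ".join(format(ord(ch), "08b") for ch in text)

# As a number for calculation: the value 23 in pure binary, padded to 16 bits
# so that both forms use the same number of bits.
as_number = format(int(text), "016b")

print(as_text)    # 00110010 00110011
print(as_number)  # 0000000000010111
```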

ASCII Table

It's worth looking carefully at the ASCII table. Look, for example, at the codes for upper case letters and compare them with the corresponding codes for lower case. Each pair differs in only one bit (the bit worth 32). Another nice feature of the codes chosen is that you only need to clear 2 bits to convert the character code of a digit into the binary representation of that digit: '0' is 0110000, '1' is 0110001, and so on.
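Both features can be tried out with bitwise operations. The following is a minimal Python sketch; the names CASE_BIT, DIGIT_BITS, toggle_case and digit_value are invented for this example, and the functions only make sense for the letters A-Z, a-z and the digits 0-9 respectively.

```python
CASE_BIT = 0b0100000    # bit worth 32: set for lower case, clear for upper case
DIGIT_BITS = 0b0110000  # the two bits set in every digit code ('0' is 48)

def toggle_case(letter):
    """Flip the single bit that separates 'A'-'Z' from 'a'-'z'."""
    return chr(ord(letter) ^ CASE_BIT)

def digit_value(digit):
    """Clear the two high bits of a digit character's code to get its value."""
    return ord(digit) & ~DIGIT_BITS

print(toggle_case("A"), toggle_case("q"))  # a Q
print(digit_value("7"))                    # 7
```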

Decimal  Binary   Hexadecimal  Character
0        0000000  0            NUL (null)
1        0000001  1            SOH (start of heading)
2        0000010  2            STX (start of text)
3        0000011  3            ETX (end of text)
4        0000100  4            EOT (end of transmission)
5        0000101  5            ENQ (enquiry)
6        0000110  6            ACK (acknowledge)
7        0000111  7            BEL (bell)
8        0001000  8            BS (backspace)
9        0001001  9            TAB (horizontal tab)
10       0001010  A            LF (NL line feed, new line)
11       0001011  B            VT (vertical tab)
12       0001100  C            FF (NP form feed, new page)
13       0001101  D            CR (carriage return)
14       0001110  E            SO (shift out)
15       0001111  F            SI (shift in)
16       0010000  10           DLE (data link escape)
17       0010001  11           DC1 (device control 1)
18       0010010  12           DC2 (device control 2)
19       0010011  13           DC3 (device control 3)
20       0010100  14           DC4 (device control 4)
21       0010101  15           NAK (negative acknowledge)
22       0010110  16           SYN (synchronous idle)
23       0010111  17           ETB (end of trans. block)
24       0011000  18           CAN (cancel)
25       0011001  19           EM (end of medium)
26       0011010  1A           SUB (substitute)
27       0011011  1B           ESC (escape)
28       0011100  1C           FS (file separator)
29       0011101  1D           GS (group separator)
30       0011110  1E           RS (record separator)
31       0011111  1F           US (unit separator)
32       0100000  20           SPACE
33       0100001  21           !
34       0100010  22           "
35       0100011  23           #
36       0100100  24           $
37       0100101  25           %
38       0100110  26           &
39       0100111  27           '
40       0101000  28           (
41       0101001  29           )
42       0101010  2A           *
43       0101011  2B           +
44       0101100  2C           ,
45       0101101  2D           -
46       0101110  2E           .
47       0101111  2F           /
48       0110000  30           0
49       0110001  31           1
50       0110010  32           2
51       0110011  33           3
52       0110100  34           4
53       0110101  35           5
54       0110110  36           6
55       0110111  37           7
56       0111000  38           8
57       0111001  39           9
58       0111010  3A           :
59       0111011  3B           ;
60       0111100  3C           <
61       0111101  3D           =
62       0111110  3E           >
63       0111111  3F           ?
64       1000000  40           @
65       1000001  41           A
66       1000010  42           B
67       1000011  43           C
68       1000100  44           D
69       1000101  45           E
70       1000110  46           F
71       1000111  47           G
72       1001000  48           H
73       1001001  49           I
74       1001010  4A           J
75       1001011  4B           K
76       1001100  4C           L
77       1001101  4D           M
78       1001110  4E           N
79       1001111  4F           O
80       1010000  50           P
81       1010001  51           Q
82       1010010  52           R
83       1010011  53           S
84       1010100  54           T
85       1010101  55           U
86       1010110  56           V
87       1010111  57           W
88       1011000  58           X
89       1011001  59           Y
90       1011010  5A           Z
91       1011011  5B           [
92       1011100  5C           \
93       1011101  5D           ]
94       1011110  5E           ^
95       1011111  5F           _
96       1100000  60           `
97       1100001  61           a
98       1100010  62           b
99       1100011  63           c
100      1100100  64           d
101      1100101  65           e
102      1100110  66           f
103      1100111  67           g
104      1101000  68           h
105      1101001  69           i
106      1101010  6A           j
107      1101011  6B           k
108      1101100  6C           l
109      1101101  6D           m
110      1101110  6E           n
111      1101111  6F           o
112      1110000  70           p
113      1110001  71           q
114      1110010  72           r
115      1110011  73           s
116      1110100  74           t
117      1110101  75           u
118      1110110  76           v
119      1110111  77           w
120      1111000  78           x
121      1111001  79           y
122      1111010  7A           z
123      1111011  7B           {
124      1111100  7C           |
125      1111101  7D           }
126      1111110  7E           ~
127      1111111  7F           DEL (delete)

Unicode

The ASCII codes can now be considered to be a subset of Unicode. In fact, the first 128 characters of both are the same and have the same code numbers. Many common character encoding systems, like UTF-8, are backwards-compatible with ASCII.
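This compatibility can be checked with a few lines of Python. The sketch below relies only on the built-in str.encode method and the ord function; the sample text is arbitrary.

```python
ascii_text = "Hello"

# For pure ASCII text, the ASCII and UTF-8 encodings give identical bytes.
print(ascii_text.encode("ascii") == ascii_text.encode("utf-8"))  # True

# The Unicode code numbers of ASCII characters match their ASCII codes.
print([ord(ch) for ch in ascii_text])  # [72, 101, 108, 108, 111]

# A character outside ASCII (here e with an acute accent, U+00E9)
# takes more than one byte in UTF-8.
print(len("\u00e9".encode("utf-8")))  # 2
```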

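Unicode is so named because of the intention that it describes a universal character set. That is, it aims to cover the characters of all languages and scripts. There are well over 100,000 characters in the Unicode character set, with code numbers running from 0 to 10FFFF in hexadecimal, so 16 bits are not enough to number every character. Encodings such as UTF-8, UTF-16 and UTF-32 store each character in between one and four bytes.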

The Unicode Consortium manages the standards for Unicode including the addition of new symbols where necessary.