Computer Science
Character Coding Schemes

Character coding schemes use binary patterns to represent character data (text).

A common code in all computers ensures that information can easily be transferred between machines.

American Standard Code For Information Interchange (ASCII)

ASCII uses 7 bits, which allows 128 (2^7) different characters to be represented. Each character in the standard set is assigned a number, and the 7-bit binary representation of that number is used to represent the character.
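As a minimal sketch of this mapping, Python's built-in `ord` and `format` can show the number assigned to a character and its 7-bit pattern. The helper name `to_ascii_bits` is an illustrative choice, not part of any standard.

```python
def to_ascii_bits(ch: str) -> str:
    """Return the 7-bit binary pattern for a single ASCII character."""
    code = ord(ch)  # the number assigned to the character
    if code > 127:
        raise ValueError(f"{ch!r} is not in the 7-bit ASCII set")
    return format(code, "07b")  # zero-padded to 7 binary digits


for ch in "Az0":
    print(ch, ord(ch), to_ascii_bits(ch))
```

For the letter A this prints the number 65 and the pattern 1000001, matching the table further down.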

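The first 32 characters (codes 0 to 31) are control codes such as TAB or Line Feed, and code 127 (DEL) is also a control code. The remaining codes represent the space, digits, upper and lower case letters and standard symbols. Extended ASCII uses the eighth bit to encode additional characters and symbols; which characters these are depends on the particular extension.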

A number will be stored differently depending on whether it is being displayed or used for calculations. The ASCII representation of the character data "23" is not the same as the pure binary pattern for this number using the same number of bits. The representation of this data within the computer system depends on the context and use of the data.
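The difference can be seen in a short sketch that stores "23" as character data and as a pure binary number, using 16 bits in both cases. The variable names are illustrative only.

```python
text = "23"
number = 23

# Character data: one byte per character, each holding an ASCII code.
text_bits = " ".join(format(b, "08b") for b in text.encode("ascii"))

# Pure binary: the value 23 stored in the same 16 bits.
number_bits = format(number, "016b")

print(text_bits)    # 00110010 00110011
print(number_bits)  # 0000000000010111
```

Converting between the two forms is what a call such as `int(text)` does before the value can be used in a calculation.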

ASCII Table

It's worth looking carefully at the ASCII table. Look, for example, at the codes for upper case letters and compare them with the corresponding codes for lower case: they differ in only one bit (the bit worth 32). Another nice feature of the codes chosen is that you only need to clear 2 bits to convert the character code of a digit into the binary representation of that digit.

Decimal	Binary	Hexadecimal	Character
0	0000000	0	NUL (null)
1	0000001	1	SOH (start of heading)
2	0000010	2	STX (start of text)
3	0000011	3	ETX (end of text)
4	0000100	4	EOT (end of transmission)
5	0000101	5	ENQ (enquiry)
6	0000110	6	ACK (acknowledge)
7	0000111	7	BEL (bell)
8	0001000	8	BS (backspace)
9	0001001	9	TAB (horizontal tab)
10	0001010	A	LF (NL line feed, new line)
11	0001011	B	VT (vertical tab)
12	0001100	C	FF (NP form feed, new page)
13	0001101	D	CR (carriage return)
14	0001110	E	SO (shift out)
15	0001111	F	SI (shift in)
16	0010000	10	DLE (data link escape)
17	0010001	11	DC1 (device control 1)
18	0010010	12	DC2 (device control 2)
19	0010011	13	DC3 (device control 3)
20	0010100	14	DC4 (device control 4)
21	0010101	15	NAK (negative acknowledge)
22	0010110	16	SYN (synchronous idle)
23	0010111	17	ETB (end of trans. block)
24	0011000	18	CAN (cancel)
25	0011001	19	EM (end of medium)
26	0011010	1A	SUB (substitute)
27	0011011	1B	ESC (escape)
28	0011100	1C	FS (file separator)
29	0011101	1D	GS (group separator)
30	0011110	1E	RS (record separator)
31	0011111	1F	US (unit separator)
32	0100000	20	SPACE
33	0100001	21	!
34	0100010	22	"
35	0100011	23	#
36	0100100	24	$
37	0100101	25	%
38	0100110	26	&
39	0100111	27	'
40	0101000	28	(
41	0101001	29	)
42	0101010	2A	*
43	0101011	2B	+
44	0101100	2C	,
45	0101101	2D	-
46	0101110	2E	.
47	0101111	2F	/
48	0110000	30	0
49	0110001	31	1
50	0110010	32	2
51	0110011	33	3
52	0110100	34	4
53	0110101	35	5
54	0110110	36	6
55	0110111	37	7
56	0111000	38	8
57	0111001	39	9
58	0111010	3A	:
59	0111011	3B	;
60	0111100	3C	<
61	0111101	3D	=
62	0111110	3E	>
63	0111111	3F	?
64	1000000	40	@
65	1000001	41	A
66	1000010	42	B
67	1000011	43	C
68	1000100	44	D
69	1000101	45	E
70	1000110	46	F
71	1000111	47	G
72	1001000	48	H
73	1001001	49	I
74	1001010	4A	J
75	1001011	4B	K
76	1001100	4C	L
77	1001101	4D	M
78	1001110	4E	N
79	1001111	4F	O
80	1010000	50	P
81	1010001	51	Q
82	1010010	52	R
83	1010011	53	S
84	1010100	54	T
85	1010101	55	U
86	1010110	56	V
87	1010111	57	W
88	1011000	58	X
89	1011001	59	Y
90	1011010	5A	Z
91	1011011	5B	[
92	1011100	5C	\
93	1011101	5D	]
94	1011110	5E	^
95	1011111	5F	_
96	1100000	60	`
97	1100001	61	a
98	1100010	62	b
99	1100011	63	c
100	1100100	64	d
101	1100101	65	e
102	1100110	66	f
103	1100111	67	g
104	1101000	68	h
105	1101001	69	i
106	1101010	6A	j
107	1101011	6B	k
108	1101100	6C	l
109	1101101	6D	m
110	1101110	6E	n
111	1101111	6F	o
112	1110000	70	p
113	1110001	71	q
114	1110010	72	r
115	1110011	73	s
116	1110100	74	t
117	1110101	75	u
118	1110110	76	v
119	1110111	77	w
120	1111000	78	x
121	1111001	79	y
122	1111010	7A	z
123	1111011	7B	{
124	1111100	7C	|
125	1111101	7D	}
126	1111110	7E	~
127	1111111	7F	DEL (delete)
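
The two bit-level patterns pointed out before the table can be tried with a small sketch. It assumes the input is an ASCII letter or digit; the constant and function names are hypothetical, chosen for illustration.

```python
CASE_BIT = 0b0100000    # 32: the single bit that differs between 'A' and 'a'
DIGIT_BITS = 0b0110000  # 48: the two bits set in every digit code


def toggle_case(letter: str) -> str:
    """Flip the case of an ASCII letter by inverting one bit."""
    return chr(ord(letter) ^ CASE_BIT)


def digit_value(digit: str) -> int:
    """Turn a digit character '0'..'9' into its value by clearing two bits."""
    return ord(digit) & ~DIGIT_BITS


print(toggle_case("A"), toggle_case("q"))  # a Q
print(digit_value("7"))                    # 7
```

Neither function checks its input, so passing a symbol or control code gives a meaningless result.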

Unicode

The ASCII codes can now be considered a subset of Unicode: the first 128 characters of both are the same. Many common character encodings, such as UTF-8, are backwards-compatible with ASCII.
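
Backwards compatibility can be checked with a short sketch using Python's built-in `str.encode`. The sample strings are arbitrary.

```python
ascii_text = "Hello"

# For text that uses only the first 128 characters, the bytes are identical.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)  # True

# A character outside ASCII (here e with an acute accent) needs more than one byte.
accented_length = len("\u00e9".encode("utf-8"))
print(accented_length)  # 2
```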

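Unicode is so named because it is intended to describe a universal character set. That is, it aims to include the characters of all languages and scripts. There are well over 100,000 characters in the Unicode character set. Each character is assigned a number called a code point, in the range 0 to 10FFFF hexadecimal, so 16 bits are not enough to represent every character. Encodings such as UTF-8, UTF-16 and UTF-32 store each code point using one or more 8-bit, 16-bit or 32-bit units.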
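
A rough sketch of how a character's Unicode number (its code point) relates to the bytes used to store it is given here. The three sample characters are arbitrary choices, and the function name `describe` is hypothetical. The last sample shows a code point that does not fit in 16 bits.

```python
def describe(ch: str) -> tuple[str, int, int, int]:
    """Return the code point label, bits needed, UTF-8 bytes and UTF-16 bytes."""
    code_point = ord(ch)
    return (
        f"U+{code_point:04X}",
        code_point.bit_length(),      # bits needed for the number itself
        len(ch.encode("utf-8")),      # bytes used by UTF-8
        len(ch.encode("utf-16-be")),  # bytes used by UTF-16 (no byte order mark)
    )


# Latin letter A, the euro sign, and an emoji
for ch in ["A", "\u20ac", "\U0001F600"]:
    print(describe(ch))
```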

The Unicode Consortium manages the standards for Unicode including the addition of new symbols where necessary.
