Computer Science
Real Numbers

Standard Form

Very large or very small denary numbers are often written in standard form. This clearly saves writing out a lot of digits. Standard form is a number between 1 and 10 multiplied by a power of 10.

For example,

5.67 x 10^3 = 5.67 x 1000 = 5670
6.23 x 10^-2 = 6.23 x 0.01 = 0.0623

To convert a decimal number to standard form, move the decimal point so that the number lies between 1 and 10. The power of 10 is the number of places the decimal point was moved, positive if moved to the left, negative if moved to the right.
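The rule above can be sketched in Python as a quick check. This is a minimal illustration, not part of the original notes: the function name `to_standard_form` is hypothetical, and it relies on the standard library `decimal` module to find the position of the leading digit.

```python
from decimal import Decimal


def to_standard_form(value: str) -> tuple:
    """Return (mantissa, power) so that value = mantissa x 10^power,
    with the mantissa between 1 and 10 (ignoring sign)."""
    d = Decimal(value)
    if d == 0:
        raise ValueError("zero has no standard form")
    # adjusted() gives the power of 10 of the most significant digit,
    # which is the number of places the decimal point has to move.
    power = d.adjusted()
    mantissa = d.scaleb(-power)  # shift the decimal point by that many places
    return mantissa, power


print(to_standard_form("5670"))    # mantissa 5.67, power 3
print(to_standard_form("0.0623"))  # mantissa 6.23, power -2
```

A positive power means the point moved to the left, a negative power means it moved to the right, matching the rule stated above.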

Real numbers are stored in the computer using a similar principle to standard form. Instead of a power of 10, however, they are stored using a power of 2. The fractional part of the number is known as the mantissa, and the power of 2 by which it is multiplied is known as the exponent. For simplicity, the examples given here use 16 bits. In practice real numbers are usually stored using at least 32 bits. The greater the number of bits for the mantissa, the greater the precision with which the number can be stored. The greater the number of bits for the exponent, the greater the range of numbers that can be stored.

Our 16 bit numbers will use 10 bits for the mantissa and 6 bits for the exponent.

Converting From Denary To Two's Complement Format

6.5

Convert the absolute value of the decimal number to fixed point binary: 110.1
Move the binary point so that it sits immediately before the first non-zero digit: .1101 (moved 3 places to the left)
Replace the binary point with a zero, pad out the right hand side of the number with 0s to make the number 10 digits: 0110100000
If the original number was negative, convert it to two's complement form. This makes the mantissa: 0110100000
Convert the number of places the binary point moved into a 6 bit binary number: 000011
If the point was moved to the right, convert the number to two's complement form. This makes the exponent: 000011
The whole floating point number is the mantissa followed by the exponent: 0110100000000011

0.125

Convert the absolute value of the decimal number to fixed point binary: 0.001
Move the binary point so that it sits immediately before the first non-zero digit: .1 (moved 2 places to the right)
Replace the binary point with a zero, pad out the right hand side of the number with 0s to make the number 10 digits: 0100000000
If the original number was negative, convert it to two's complement form. This makes the mantissa: 0100000000
Convert the number of places the binary point moved into a 6 bit binary number: 000010
If the point was moved to the right, convert the number to two's complement form. This makes the exponent: 111110
The whole floating point number is the mantissa followed by the exponent: 0100000000111110

-42.75

Convert the absolute value of the decimal number to fixed point binary: 101010.11
Move the binary point so that it sits immediately before the first non-zero digit: .10101011 (moved 6 places to the left)
Replace the binary point with a zero, pad out the right hand side of the number with 0s to make the number 10 digits: 0101010110
If the original number was negative, convert it to two's complement form. This makes the mantissa: 1010101010
Convert the number of places the binary point moved into a 6 bit binary number: 000110
If the point was moved to the right, convert the number to two's complement form. This makes the exponent: 000110
The whole floating point number is the mantissa followed by the exponent: 1010101010000110

-0.1875

Convert the absolute value of the decimal number to fixed point binary: 0.0011
Move the binary point so that it sits immediately before the first non-zero digit: .11 (moved 2 places to the right)
Replace the binary point with a zero, pad out the right hand side of the number with 0s to make the number 10 digits: 0110000000
If the original number was negative, convert it to two's complement form. This makes the mantissa: 1010000000
Convert the number of places the binary point moved into a 6 bit binary number: 000010
If the point was moved to the right, convert the number to two's complement form. This makes the exponent: 111110
The whole floating point number is the mantissa followed by the exponent: 1010000000111110
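The four worked examples above can be checked with a short Python sketch. It is an illustration under stated assumptions rather than a definitive implementation: the names `encode`, `MANTISSA_BITS` and `EXPONENT_BITS` are hypothetical, and the function rejects any value that does not fit exactly in the 10-bit mantissa instead of rounding it.

```python
from fractions import Fraction

MANTISSA_BITS = 10  # includes the leading sign bit
EXPONENT_BITS = 6


def encode(value) -> str:
    """Encode a denary value as a 10-bit mantissa followed by a 6-bit exponent."""
    x = Fraction(value)
    if x == 0:
        return "0" * (MANTISSA_BITS + EXPONENT_BITS)
    magnitude = abs(x)
    exponent = 0
    # Move the binary point until the magnitude lies in [0.5, 1).
    while magnitude >= 1:             # point moves left, exponent goes up
        magnitude /= 2
        exponent += 1
    while magnitude < Fraction(1, 2):  # point moves right, exponent goes down
        magnitude *= 2
        exponent -= 1
    # Nine bits follow the binary point, so scale by 2^9.
    scaled = magnitude * 2 ** (MANTISSA_BITS - 1)
    if scaled.denominator != 1:
        raise ValueError("value does not fit exactly in the mantissa")
    limit = 2 ** (EXPONENT_BITS - 1)
    if not -limit <= exponent < limit:
        raise OverflowError("exponent does not fit in 6 bits")
    mantissa = int(scaled) if x > 0 else -int(scaled)
    # Masking with 2^n - 1 yields the n-bit two's complement pattern.
    m_bits = format(mantissa & (2 ** MANTISSA_BITS - 1), "010b")
    e_bits = format(exponent & (2 ** EXPONENT_BITS - 1), "06b")
    return m_bits + e_bits


print(encode(6.5))      # 0110100000000011
print(encode(0.125))    # 0100000000111110
print(encode(-42.75))   # 1010101010000110
print(encode(-0.1875))  # 1010000000111110
```

The sketch tracks the exponent as a signed count of places moved, so the separate "convert to two's complement if moved right" step becomes a single mask at the end.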

Converting From Two's Complement Format To Denary

0100010000000011

Convert the exponent of the number to denary. Perform two's complement if the exponent starts with a 1: 000011 = 3
If the mantissa was negative, perform two's complement to convert to a positive number: 0100010000
Replace the first zero with a binary point: .100010000
Move the binary point the number of places indicated by the exponent (to the right if the exponent is positive, to the left if negative): 100.010000
Convert the fixed point binary number to denary. Remember to add the negative sign if the mantissa was negative: 4.25

0111000000111110

Convert the exponent of the number to denary. Perform two's complement if the exponent starts with a 1: 111110 = -2
If the mantissa was negative, perform two's complement to convert to a positive number: 0111000000
Replace the first zero with a binary point: .111000000
Move the binary point the number of places indicated by the exponent (to the right if the exponent is positive, to the left if negative): .00111000000
Convert the fixed point binary number to denary. Remember to add the negative sign if the mantissa was negative: 0.21875

1001111110000111

Convert the exponent of the number to denary. Perform two's complement if the exponent starts with a 1: 000111 = 7
If the mantissa was negative, perform two's complement to convert to a positive number: 0110000010
Replace the first zero with a binary point: .110000010
Move the binary point the number of places indicated by the exponent (to the right if the exponent is positive, to the left if negative): 1100000.1
Convert the fixed point binary number to denary. Remember to add the negative sign if the mantissa was negative: -96.5
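The decoding steps can be sketched the same way. This is a minimal illustration with hypothetical names (`twos_complement_value`, `decode`); it treats the mantissa as a signed 10-bit integer divided by 2^9, which is equivalent to placing the binary point after the first bit.

```python
from fractions import Fraction


def twos_complement_value(bits: str) -> int:
    """Interpret a bit string as a signed two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":           # a leading 1 marks a negative number
        value -= 1 << len(bits)
    return value


def decode(pattern: str) -> Fraction:
    """Decode a 16-bit pattern: 10-bit mantissa followed by 6-bit exponent."""
    if len(pattern) != 16 or set(pattern) - {"0", "1"}:
        raise ValueError("expected a string of 16 binary digits")
    mantissa = twos_complement_value(pattern[:10])
    exponent = twos_complement_value(pattern[10:])
    # Nine bits follow the binary point, hence the division by 2^9.
    return Fraction(mantissa, 2 ** 9) * Fraction(2) ** exponent


print(float(decode("0100010000000011")))  # 4.25
print(float(decode("0111000000111110")))  # 0.21875
print(float(decode("1001111110000111")))  # -96.5
```

Using `Fraction` keeps every intermediate value exact, so the printed results match the hand calculations digit for digit.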

IEEE Standard For Floating Point

In its single precision form, this system uses 32 bits to represent a number. The bit pattern is slightly different from the two's complement format described above. From the left, the bits represent:

1 bit - Sign bit
8 bits - Exponent stored in excess 127 mode (127 is added to the exponent before it is stored)
23 bits - Mantissa (a leading 1-bit is implied with a binary point after it)
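To see the three fields for a real value, the following Python sketch packs a number as a 32-bit float with the standard library `struct` module and slices the resulting bit string. The function name `float32_fields` is hypothetical and chosen only for this illustration.

```python
import struct


def float32_fields(value: float) -> tuple:
    """Return (sign, exponent, mantissa) bit strings of the 32-bit encoding."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]


sign, exponent, mantissa = float32_fields(6.5)
print(sign, exponent, mantissa)
# 6.5 is 1.101 x 2^2 in binary, so the stored exponent is 2 + 127 = 129
print(int(exponent, 2) - 127)  # 2
```

Note that the mantissa field holds only the bits after the implied leading 1, so 1.101 is stored as 101 followed by zeros.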

Minifloat Format

Minifloat format is a 16 bit representation of real numbers. It uses a sign bit, a 5-bit exponent stored in excess 15 mode, and 10 mantissa bits with an implied leading 1-bit and binary point.
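The layout described here (1 sign bit, 5 exponent bits in excess 15, 10 mantissa bits) is the same as IEEE 754 half precision, which Python's `struct` module supports through the `e` format code. Assuming that equivalence, a small sketch with the hypothetical name `float16_fields` shows the fields:

```python
import struct


def float16_fields(value: float) -> tuple:
    """Return (sign, exponent, mantissa) bit strings of the 16-bit encoding."""
    # 'e' is the struct format code for a 16-bit IEEE 754 half precision float.
    (raw,) = struct.unpack(">H", struct.pack(">e", value))
    bits = format(raw, "016b")
    return bits[0], bits[1:6], bits[6:]


print(float16_fields(6.5))     # ('0', '10001', '1010000000')
print(float16_fields(-42.75))  # ('1', '10100', '0101011000')
```

For 6.5 = 1.101 x 2^2 the stored exponent is 2 + 15 = 17, which is 10001 in binary.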

Normalisation Of Floating Point Numbers

Precision

The precision of a floating point number depends on the number of bits used to represent the mantissa. To illustrate this point, consider the following denary number,

42 012 000

We can express this in standard form as .42012 x 10^8 using 5 digits for the mantissa. If we only use 4 digits for the mantissa, we get .4201 x 10^8 and lose some accuracy.

If we put the decimal point in another place, say .042012 x 10^9, we need more digits for the mantissa. Systems for representing numbers need to allow the maximum precision for a given number of digits stored.
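The loss of accuracy can be demonstrated with a short Python sketch. The helper name `keep_digits` is hypothetical; it uses the standard library `decimal` module to cut a number down to a fixed count of significant digits, discarding the rest as in the example above.

```python
from decimal import Context, ROUND_DOWN


def keep_digits(value: int, digits: int) -> int:
    """Keep only the first `digits` significant digits of value."""
    ctx = Context(prec=digits, rounding=ROUND_DOWN)
    return int(ctx.create_decimal(value))


print(keep_digits(42012000, 5))  # 42012000, nothing lost
print(keep_digits(42012000, 4))  # 42010000, the final 2 is lost
```

A leading zero in the mantissa would use up one of the available digits without carrying any information, which is why its position matters.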

With binary floating point, numbers are normalised to allow this to happen.

Example 1

Place 0000100000000110 in normalised form.

0000100000000110 = .000100000 x 2^6

To normalise the number, the binary point should be moved in front of the first non-zero bit. If the binary point is moved n places to the right then the power of 2 is reduced by n.

.000100000 x 2^6 = .100000 x 2^3 = 0100000000000011

Example 2

Place 1110111000000011 in normalised form.

1110111000000011 = -.00100100 x 2^3

-.00100100 x 2^3 = -.100100 x 2^1

-.100100 x 2^1 = -0100100000000001

-0100100000000001 = 1011100000000001
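Both examples can be reproduced with a Python sketch of normalisation. This is one possible approach under stated assumptions, not the only method: instead of converting a negative mantissa to positive and back, it works on the signed value directly and doubles it until the first two bits of the 10-bit pattern differ. All names (`from_twos_complement`, `to_twos_complement`, `normalise`) are hypothetical.

```python
MANTISSA_BITS = 10
EXPONENT_BITS = 6


def from_twos_complement(bits: str) -> int:
    """Interpret a bit string as a signed two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value


def to_twos_complement(value: int, width: int) -> str:
    """Write a signed integer as a two's complement bit string of given width."""
    return format(value & ((1 << width) - 1), f"0{width}b")


def normalise(pattern: str) -> str:
    """Normalise a 16-bit pattern (10-bit mantissa, 6-bit exponent)."""
    mantissa = from_twos_complement(pattern[:MANTISSA_BITS])
    exponent = from_twos_complement(pattern[MANTISSA_BITS:])
    if mantissa == 0:
        return pattern  # zero cannot be normalised; returned unchanged here
    quarter = 1 << (MANTISSA_BITS - 2)  # the pattern 0100000000 as an integer
    # While the first two bits are equal, shift the mantissa left one place
    # (doubling it) and reduce the exponent by one to compensate.
    while -quarter <= mantissa < quarter:
        mantissa *= 2
        exponent -= 1
    if exponent < -(1 << (EXPONENT_BITS - 1)):
        raise OverflowError("exponent too small for 6 bits")
    return (to_twos_complement(mantissa, MANTISSA_BITS)
            + to_twos_complement(exponent, EXPONENT_BITS))


print(normalise("0000100000000110"))  # 0100000000000011
print(normalise("1110111000000011"))  # 1011100000000001
```

A pattern that is already normalised passes through unchanged, because the loop body never runs.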

Key Facts

Normalised numbers always start with two different bits (01 for positive, 10 for negative). The mantissa of a positive number always lies between 0.5 and 1, and the mantissa of a negative number always lies between -0.5 and -1.

Normalisation is used to,

  • ensure the maximum precision for a given number of bits
  • ensure that there is only one representation of a number