Data Presentation

1           Data representation

The representation of data types is always a problem, as different computer systems use different ways to store and represent data. For example, the PC, which is based on Intel microprocessors, uses the little endian approach to store multi-byte values such as integers and floating-point values. The little endian form starts with the least-significant byte in the lowest memory location and ends with the most-significant byte in the highest location. The big endian form, as used with Motorola-based systems, always starts with the high-order byte and ends with the lowest-order byte. For example, with little endian, the 16-bit integer values 4 (0000 0000 0000 0100b), 5,241 (0001 0100 0111 1001b) and 26,152 (0110 0110 0010 1000b) would be stored as:

Memory location  Contents (hex)  Contents (binary)  Value
00               04              0000 0100          4
01               00              0000 0000
02               79              0111 1001          5,241
03               14              0001 0100
04               28              0010 1000          26,152
05               66              0110 0110

In big endian, the same values would be stored as:

Memory location  Contents (hex)  Contents (binary)  Value
00               00              0000 0000          4
01               04              0000 0100
02               14              0001 0100          5,241
03               79              0111 1001
04               66              0110 0110          26,152
05               28              0010 1000
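As an illustration, Python's standard struct module can reproduce both tables: the format prefix '<' selects little-endian byte order, '>' selects big-endian, and 'H' is an unsigned 16-bit integer. This is a minimal sketch; the helper name dump is hypothetical.

```python
import struct

VALUES = [4, 5241, 26152]

def dump(order: str) -> list[str]:
    """Pack VALUES as unsigned 16-bit integers and return each byte as hex.

    order is a struct byte-order prefix: '<' little endian, '>' big endian.
    """
    packed = struct.pack(order + "H" * len(VALUES), *VALUES)
    return [f"{b:02X}" for b in packed]

print("little endian:", dump("<"))  # ['04', '00', '79', '14', '28', '66']
print("big endian:   ", dump(">"))  # ['00', '04', '14', '79', '66', '28']

# Bytes written little endian but read back as big endian give the wrong value
wrong = struct.unpack(">H", struct.pack("<H", 5241))[0]
print(wrong)  # 30996 (7914h) instead of 5241 (1479h)
```

The last lines show the portability problem in miniature: the same two bytes, read with the opposite byte order, give 30,996 rather than 5,241.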

Thus a program written for a PC would incorrectly read data written by a big endian program (typically one running on a UNIX workstation), and vice versa. A further problem is that different computer systems represent data (such as numeric values) in different formats. For example, an integer can be represented with 16, 32, 64 or even 128 bits. The more bits that are used, the larger the range of integer values that can be represented.

All these problems highlight the need for a conversion technique that knows how to read the value from memory, and convert it into a standard form that is independent of the operating system or the hardware of the computer. This is the function of eXternal Data Representation (XDR), which represents data in a standard format. In XDR the basic data types are:

  • Unsigned integer and signed integer. Both use a 32-bit value. The unsigned integer covers the range from 0 to $2^{32}-1$ (4,294,967,295), whereas the signed integer uses 2’s complement, which gives a range of –2,147,483,648 (1000 0000 0000 … 0000 0000) to +2,147,483,647 (0111 1111 1111 … 1111 1111).
  • Single-precision floating point. A single-precision floating-point value uses the 32-bit IEEE format, described in Section A1.1.3. Normalized values range in magnitude from about $1.2\times10^{-38}$ to about $3.4\times10^{38}$.
  • Double-precision floating point. A double-precision floating-point value uses the 64-bit IEEE format. Normalized values range in magnitude from about $2.2\times10^{-308}$ to about $1.8\times10^{308}$.
  • String. A string is represented as a sequence of bytes. The first four bytes give the number of ASCII characters in the string. For example, if there were four characters in the string, the first four bytes would be 0, 0, 0, 4, followed by the four characters. If the length is not a multiple of four, zero bytes are added at the end to make it up to the next multiple of four. Note that this differs from the way that the C programming language represents strings, as C uses the NULL ASCII character to mark the end of a string.
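XDR (specified in RFC 4506) encodes every item in big-endian byte order and rounds each item up to a multiple of four bytes. A minimal sketch of encoders for the types listed above, using Python's struct module; the function names are hypothetical, and the string encoder assumes pure ASCII text.

```python
import struct

def xdr_int(n: int) -> bytes:
    # Signed 32-bit integer, 2's complement, big endian
    return struct.pack(">i", n)

def xdr_uint(n: int) -> bytes:
    # Unsigned 32-bit integer, big endian
    return struct.pack(">I", n)

def xdr_float(x: float) -> bytes:
    # 32-bit IEEE single precision, big endian
    return struct.pack(">f", x)

def xdr_double(x: float) -> bytes:
    # 64-bit IEEE double precision, big endian
    return struct.pack(">d", x)

def xdr_string(s: str) -> bytes:
    # 4-byte length, then the characters, then zero bytes up to a multiple of 4
    data = s.encode("ascii")
    pad = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * pad

print(xdr_int(-2).hex())          # fffffffe
print(xdr_string("abcd").hex())   # 0000000461626364
print(xdr_string("hello").hex())  # 0000000568656c6c6f000000
```

Because the byte order is fixed by the format rather than by the machine, a little endian PC and a big endian workstation produce identical bytes for the same value.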

A1.1.1    Negative numbers

Signed integers use a notation called 2’s complement to represent negative values. In this representation the most significant bit is ‘1’ if the number is negative and ‘0’ otherwise. To convert a negative decimal value into 2’s complement notation, the magnitude of the number is first written in binary form. Next, all the bits are inverted and ‘1’ is added. For example, to determine the 16-bit 2’s complement of the value –65, the following steps are taken:

+65            00000000 01000001
invert         11111111 10111110
add 1          11111111 10111111

Thus, –65 is 11111111 10111111 in 16-bit 2’s complement notation. Table A1.4 shows that with 16 bits the range of values that can be represented in 2’s complement is from –32 768 to 32 767 (that is, 65 536 values).
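The invert-and-add-1 procedure can be sketched in Python as follows. The function name twos_complement and its bits parameter are hypothetical; since Python integers have no fixed width, the sketch masks the result to the requested number of bits.

```python
def twos_complement(value: int, bits: int = 16) -> str:
    """Return the two's complement bit pattern of value, bits wide, as a string."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} bits")
    mask = (1 << bits) - 1
    if value < 0:
        # Write the magnitude in binary, invert every bit, then add 1
        pattern = ((~abs(value) & mask) + 1) & mask
    else:
        pattern = value
    return format(pattern, f"0{bits}b")

print(twos_complement(-65))  # 1111111110111111
print(twos_complement(65))   # 0000000001000001
```

As a cross-check, Python's value & mask gives the same pattern directly, because Python treats a negative integer as having an unlimited run of leading 1s.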

Two’s complement is also useful in subtraction operations, where the value to be subtracted is converted into its negative form and then added to the value it is to be subtracted from. For example, to subtract 42 from 65, first 42 is converted into 2’s complement (that is, –42) and added to the binary equivalent of 65. The addition gives a carry into the sign bit and a carry-out from it; as these two carries are equal there is no overflow, and the carry-out is ignored.

  65        0100 0001
+ –42       1101 0110
= 23    (1) 0001 0111
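A minimal 8-bit sketch of subtraction by 2's complement addition, matching the 65 – 42 example. The names subtract and to_signed are hypothetical; the carry-out is returned separately so that it can be seen before it is discarded.

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0xFF: keep only the low 8 bits

def subtract(a: int, b: int) -> tuple[int, int]:
    """Compute a - b as a + (-b) on 8-bit patterns.

    Returns (8-bit result pattern, carry-out bit); the carry-out is ignored.
    """
    neg_b = (~b + 1) & MASK        # 2's complement of b: invert, then add 1
    total = (a & MASK) + neg_b     # plain binary addition, may set a 9th bit
    return total & MASK, total >> BITS

def to_signed(pattern: int) -> int:
    """Interpret an 8-bit pattern as a signed 2's complement value."""
    return pattern - (1 << BITS) if pattern & (1 << (BITS - 1)) else pattern

result, carry = subtract(65, 42)
print(format(65, "08b"), format((~42 + 1) & MASK, "08b"))  # 01000001 11010110
print(carry, format(result, "08b"), to_signed(result))    # 1 00010111 23
```

The same function handles a negative answer: subtract(42, 65) gives the pattern 11101001 with no carry-out, which to_signed reads as –23.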

A 16-bit signed integer can thus vary from –32 768 (10000000 00000000) to 32 767 (01111111 11111111).

Table A1.4 16-bit 2’s complement notation

Decimal   2’s complement
–32 768   10000000 00000000
–32 767   10000000 00000001
:         :
–2        11111111 11111110
–1        11111111 11111111
0         00000000 00000000
1         00000000 00000001
2         00000000 00000010
:         :
32 766    01111111 11111110
32 767    01111111 11111111

A1.1.2    Hexadecimal and octal numbers

It is often difficult to differentiate binary numbers from decimal numbers (for example, 101 could be one hundred and one in decimal, or five in binary). A typical convention is to append a b to binary numbers; for example, 010101111010b and 101111101010b are binary numbers. Hexadecimal and octal are often used to represent binary digits, as they are relatively easy to convert to and from binary. Table A1.5 shows the basic conversion between decimal, binary, octal and hexadecimal numbers. A similar convention is to append an ‘h’ to hexadecimal numbers (and an ‘o’ to octal numbers). For example, 43F1h is a hexadecimal value whereas 4310o is octal.

To represent a binary number as a hexadecimal value, the binary digits are split into groups of four bits (starting from the least significant bit), and each group is replaced by its hexadecimal equivalent. For example, to represent 0111 0101 1100 0000b, the bits are split into sections of four to give:

Binary  0111  0101  1100  0000
Hex        7     5     C     0

Thus, 75C0h represents the binary number 0111010111000000b. To convert from decimal to hexadecimal, the decimal value is divided by 16 repeatedly and each remainder noted. The first remainder gives the least significant digit and the final remainder the most significant digit. For example, the following shows the hexadecimal equivalent of the decimal number 1103:

1103 ÷ 16 = 68  r F   <<< LSD (least significant digit)
  68 ÷ 16 =  4  r 4
   4 ÷ 16 =  0  r 4   <<< MSD (most significant digit)

Thus, the decimal value 1103 is equivalent to 044Fh.
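Both conversions can be sketched in a few lines of Python. The helper names to_hex and bin_to_hex are hypothetical; bin_to_hex pads on the left because the grouping starts from the least significant bit.

```python
HEX_DIGITS = "0123456789ABCDEF"

def to_hex(n: int) -> str:
    """Convert a non-negative integer to hex by repeated division by 16."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, 16)
        digits.append(HEX_DIGITS[r])  # the first remainder is the LSD
    return "".join(reversed(digits))  # the last remainder is the MSD

def bin_to_hex(bits: str) -> str:
    """Convert a binary string to hex by replacing each 4-bit group."""
    bits = bits.replace(" ", "")
    bits = bits.zfill((len(bits) + 3) // 4 * 4)  # left-pad to a multiple of 4
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(g, 2)] for g in groups)

print(to_hex(1103))                       # 44F
print(bin_to_hex("0111 0101 1100 0000"))  # 75C0
```

Python's built-in format(1103, "X") gives the same result, which makes a convenient check on the hand method.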

Table A1.5 Decimal, binary, octal and hexadecimal conversions

Decimal Binary Octal Hex
0 0000 0 0
1 0001 1 1
2 0010 2 2
3 0011 3 3
4 0100 4 4
5 0101 5 5
6 0110 6 6
7 0111 7 7
8 1000 10 8
9 1001 11 9
10 1010 12 A
11 1011 13 B
12 1100 14 C
13 1101 15 D
14 1110 16 E
15 1111 17 F
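The rows of Table A1.5 can be regenerated with Python's built-in format() specifiers ('04b' for 4-bit binary, 'o' for octal, 'X' for upper-case hexadecimal). A small sketch; table_row is a hypothetical helper.

```python
def table_row(n: int) -> tuple[str, str, str, str]:
    """Return (decimal, 4-bit binary, octal, hex) strings for n."""
    return (str(n), format(n, "04b"), format(n, "o"), format(n, "X"))

print("Decimal Binary Octal Hex")
for n in range(16):
    print(" ".join(table_row(n)))  # e.g. "10 1010 12 A"
```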

A1.1.3    Floating-point representation

A single-precision floating-point value uses 32 bits: the most-significant bit is the sign bit (S), the next eight bits hold the exponent (E), stored as the base-2 exponent of the number plus a bias of 127, and the final 23 bits represent the base-2 fractional part of the number’s mantissa (F). The standard format is:

$\text{Value} = (-1)^{S} \times 2^{(E-127)} \times 1.F$

For example:

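1.23    = 3F9D 70A4h
        = 0 01111111 00111010111000010100100b
        = $(-1)^{0} \times 2^{(127-127)} \times (1 + 2^{-3} + 2^{-4} + 2^{-5} + 2^{-7} + 2^{-9} + 2^{-10} + 2^{-11} + 2^{-16} + 2^{-18} + 2^{-21})$

–5.67   = C0B5 70A4h
        = 1 10000001 01101010111000010100100b
        = $(-1)^{1} \times 2^{(129-127)} \times (1 + 2^{-2} + 2^{-3} + 2^{-5} + 2^{-7} + 2^{-9} + 2^{-10} + 2^{-11} + 2^{-16} + 2^{-18} + 2^{-21})$

100.442 = 42C8 E24Eh
        = 0 10000101 10010001110001001001110b
        = $(-1)^{0} \times 2^{(133-127)} \times (1 + 2^{-1} + 2^{-4} + 2^{-8} + 2^{-9} + 2^{-10} + 2^{-14} + 2^{-17} + 2^{-20} + 2^{-21} + 2^{-22})$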
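The three examples can be checked by letting Python's struct module round each value to IEEE single precision ('>f') and then reading the same four bytes back as an unsigned integer ('>I') to extract the S, E and F fields. A sketch with hypothetical helper names; float32_value only handles normalized values, as does the formula above.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x, rounded to IEEE single precision, into (S, E, F) bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 bits after the binary point
    return sign, exponent, fraction

def float32_value(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normalized value as (-1)^S * 2^(E-127) * 1.F."""
    return (-1) ** sign * 2.0 ** (exponent - 127) * (1 + fraction / 2 ** 23)

for x in (1.23, -5.67, 100.442):
    s, e, f = float32_fields(x)
    word = struct.pack(">f", x).hex().upper()
    print(x, word, s, format(e, "08b"), format(f, "023b"), float32_value(s, e, f))
```

The rebuilt values differ very slightly from the decimal inputs (1.23 comes back as about 1.2300000191), because these decimals have no exact binary representation and the stored pattern is only the nearest single-precision value.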

A double-precision floating-point value uses 64 bits: the most-significant bit is the sign bit (S), the next eleven bits hold the exponent (E), stored with a bias of 1023, and the final 52 bits represent the base-2 fractional part of the number’s mantissa (F), so that $\text{Value} = (-1)^{S} \times 2^{(E-1023)} \times 1.F$.
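The same approach works for double precision using the '>d' and '>Q' (unsigned 64-bit) struct formats. Python's float is already an IEEE double, so no rounding takes place. A sketch with a hypothetical helper name, again covering only normalized values:

```python
import struct

def float64_fields(x: float) -> tuple[int, int, int]:
    """Split a Python float (an IEEE double) into (S, E, F) bit fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF     # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)   # 52 bits after the binary point
    return sign, exponent, fraction

s, e, f = float64_fields(-5.67)
value = (-1) ** s * 2.0 ** (e - 1023) * (1 + f / 2 ** 52)
print(s, e, e - 1023, value)  # 1 1025 2 -5.67
```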

A1.1.4    ASCII

As we have seen, there are standard formats for integers and floating-point values. There are many standards for the representation of characters (known as character sets), but the most common one is ASCII. In its standard form it uses a 7-bit binary code to represent each character, giving 128 codes (0 to 127). This is rather limited, as it does not support symbols such as Greek letters. To increase the number of symbols that can be represented, extended ASCII uses an 8-bit code (256 codes).

Appendix 4 shows the standard ASCII character set (in binary, decimal, hexadecimal and also as a character). For example, the ‘a’ character has the ASCII binary representation 0110 0001b (61h), and the ‘A’ character has the binary representation 0100 0001b (41h). One thing that can be noticed is that the upper and lower case versions of the letters (‘a’ to ‘z’) differ by only a single bit (the 6th bit from the right-hand side, which has the value 20h).
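Because of this single-bit difference, the case of a letter can be flipped with an exclusive OR against 20h. A minimal Python sketch; toggle_case is a hypothetical helper that only changes ASCII letters, since other characters do not follow this pattern.

```python
def toggle_case(ch: str) -> str:
    """Flip the case of an ASCII letter by inverting the 20h bit."""
    if not ch.isascii() or not ch.isalpha():
        return ch  # digits, punctuation and non-ASCII text are left unchanged
    return chr(ord(ch) ^ 0x20)

print(format(ord("a"), "08b"), format(ord("A"), "08b"))  # 01100001 01000001
print(toggle_case("a"), toggle_case("Z"), toggle_case("5"))  # A z 5
```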

In 1963, the American Standards Association (now ANSI) defined the 7-bit ASCII standard code for characters. At around the same time IBM developed the 8-bit EBCDIC code, which allowed for up to 256 characters, rather than 128 characters for ASCII. It is thought that a 7-bit code was chosen for the standard because it was reckoned that eight holes in punched paper tape would weaken the tape. Thus the world has had to use the 7-bit ASCII standard, which is still popular in an age of global communications and large-scale disk storage.

Char  Dec  Oct  Hex | Char  Dec  Oct  Hex | Char  Dec  Oct  Hex | Char Dec  Oct   Hex
-------------------------------------------------------------------------------------
(nul)   0 0000 0x00 | (sp)   32 0040 0x20 | @      64 0100 0x40 | `      96 0140 0x60
(soh)   1 0001 0x01 | !      33 0041 0x21 | A      65 0101 0x41 | a      97 0141 0x61
(stx)   2 0002 0x02 | "      34 0042 0x22 | B      66 0102 0x42 | b      98 0142 0x62
(etx)   3 0003 0x03 | #      35 0043 0x23 | C      67 0103 0x43 | c      99 0143 0x63
(eot)   4 0004 0x04 | $      36 0044 0x24 | D      68 0104 0x44 | d     100 0144 0x64
(enq)   5 0005 0x05 | %      37 0045 0x25 | E      69 0105 0x45 | e     101 0145 0x65
(ack)   6 0006 0x06 | &      38 0046 0x26 | F      70 0106 0x46 | f     102 0146 0x66
(bel)   7 0007 0x07 | '      39 0047 0x27 | G      71 0107 0x47 | g     103 0147 0x67
(bs)    8 0010 0x08 | (      40 0050 0x28 | H      72 0110 0x48 | h     104 0150 0x68
(ht)    9 0011 0x09 | )      41 0051 0x29 | I      73 0111 0x49 | i     105 0151 0x69
(nl)   10 0012 0x0a | *      42 0052 0x2a | J      74 0112 0x4a | j     106 0152 0x6a
(vt)   11 0013 0x0b | +      43 0053 0x2b | K      75 0113 0x4b | k     107 0153 0x6b
(np)   12 0014 0x0c | ,      44 0054 0x2c | L      76 0114 0x4c | l     108 0154 0x6c
(cr)   13 0015 0x0d | -      45 0055 0x2d | M      77 0115 0x4d | m     109 0155 0x6d
(so)   14 0016 0x0e | .      46 0056 0x2e | N      78 0116 0x4e | n     110 0156 0x6e
(si)   15 0017 0x0f | /      47 0057 0x2f | O      79 0117 0x4f | o     111 0157 0x6f
(dle)  16 0020 0x10 | 0      48 0060 0x30 | P      80 0120 0x50 | p     112 0160 0x70
(dc1)  17 0021 0x11 | 1      49 0061 0x31 | Q      81 0121 0x51 | q     113 0161 0x71
(dc2)  18 0022 0x12 | 2      50 0062 0x32 | R      82 0122 0x52 | r     114 0162 0x72
(dc3)  19 0023 0x13 | 3      51 0063 0x33 | S      83 0123 0x53 | s     115 0163 0x73
(dc4)  20 0024 0x14 | 4      52 0064 0x34 | T      84 0124 0x54 | t     116 0164 0x74
(nak)  21 0025 0x15 | 5      53 0065 0x35 | U      85 0125 0x55 | u     117 0165 0x75
(syn)  22 0026 0x16 | 6      54 0066 0x36 | V      86 0126 0x56 | v     118 0166 0x76
(etb)  23 0027 0x17 | 7      55 0067 0x37 | W      87 0127 0x57 | w     119 0167 0x77
(can)  24 0030 0x18 | 8      56 0070 0x38 | X      88 0130 0x58 | x     120 0170 0x78
(em)   25 0031 0x19 | 9      57 0071 0x39 | Y      89 0131 0x59 | y     121 0171 0x79
(sub)  26 0032 0x1a | :      58 0072 0x3a | Z      90 0132 0x5a | z     122 0172 0x7a
(esc)  27 0033 0x1b | ;      59 0073 0x3b | [      91 0133 0x5b | {     123 0173 0x7b
(fs)   28 0034 0x1c | <      60 0074 0x3c | \      92 0134 0x5c | |     124 0174 0x7c
(gs)   29 0035 0x1d | =      61 0075 0x3d | ]      93 0135 0x5d | }     125 0175 0x7d
(rs)   30 0036 0x1e | >      62 0076 0x3e | ^      94 0136 0x5e | ~     126 0176 0x7e
(us)   31 0037 0x1f | ?      63 0077 0x3f | _      95 0137 0x5f | (del) 127 0177 0x7f

Base-64

When sending text, we can use ASCII. Unfortunately some of the ASCII codes are non-printable, so if we have a binary file, the characters within the file may be non-printing ones. On the Internet, some communication protocols, such as SMTP (which sends emails), require printable characters. Thus we often have to convert a binary file into Base-64. For this we take the data six bits at a time, so that every three bytes (24 bits) become four Base-64 characters. Each 6-bit value (0 to 63) is then mapped to a character using the Base-64 table given at the end of this section.

Example 1

If we take the example of “fred”, then we get:

ASCII      f       r         e        d
Binary 01100110 01110010 01100101 01100100

Next we group in 6-bits:

Binary 011001 100111 001001 100101 011001 00

and then map these using the Base-64 table, padding the final incomplete group with zero bits (00 becomes 000000):

Binary  011001 100111 001001 100101 011001 000000
Decimal   25     39     9      37     25     0
Base-64   Z      n      J      l      Z      A

The result is ZnJlZA

Hash signatures

One thing that we use Base-64 for is to represent the hash signature of some data. Base-64 works on 24-bit groups of the input bits, each of which becomes four Base-64 characters, so the output must always be a multiple of four characters. If the input does not fill the final 24-bit group, we pad at the end:

Binary  011001 100111 001001 100101 011001 00[0000] xxxxxx xxxxxx

Each padding-only group (xxxxxx) is represented with an “=” character to give:

ZnJlZA==
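The encoding steps above (regroup the bits six at a time, pad the last group with zero bits, map through the table, then add “=” characters) can be sketched in Python and checked against the standard base64 module. The function name b64_encode is hypothetical, and the SHA-256 digest is just an illustrative hash signature.

```python
import base64
import hashlib

# Standard Base-64 alphabet: index 0..63 -> character (as in the Base-64 table)
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_encode(data: bytes) -> str:
    """Encode bytes as Base-64 by regrouping the bits six at a time."""
    bits = "".join(format(byte, "08b") for byte in data)
    bits += "0" * (-len(bits) % 6)        # pad the final group with zero bits
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    chars += ["="] * (-len(chars) % 4)    # one '=' per padding-only group
    return "".join(chars)

print(b64_encode(b"fred"))                 # ZnJlZA==
print(base64.b64encode(b"fred").decode())  # ZnJlZA==

# A SHA-256 digest is 32 bytes (256 bits), so its Base-64 form is 44 characters
digest = hashlib.sha256(b"fred").digest()
print(b64_encode(digest))
```

A 256-bit digest fills 42 full 6-bit groups with 4 bits left over, giving 43 characters plus a single “=”.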

Example 2

If we take the example of “napier”, then we get:

ASCII      n        a        p        i        e        r
Binary 01101110 01100001 01110000 01101001 01100101 01110010

Next we group in 6-bits:

Binary 011011 100110 000101 110000 011010 010110 010101 110010

As 48 bits make two complete 24-bit groups (eight characters), we do not need any padding, and we can map these directly using the Base-64 table:

Binary  011011 100110 000101 110000 011010 010110 010101 110010
Decimal   27     38     5      48     26     22     21     50
Base-64   b      m      F      w      a      W      V      y

The result is bmFwaWVy
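Both examples can be checked with Python's standard base64 module (b64encode and b64decode), which also shows decoding, the reverse process of mapping each character back to six bits and regrouping them as bytes. A short sketch:

```python
import base64

encoded = base64.b64encode(b"napier")  # 6 bytes = 48 bits = two 24-bit groups
print(encoded)                         # b'bmFwaWVy' (no '=' needed)

# Decoding reverses the mapping and drops any '=' padding
print(base64.b64decode(encoded))       # b'napier'
print(base64.b64decode("ZnJlZA=="))    # b'fred'
```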

Base-64 table

The table is:

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y
