My first impression of computers was those zeros and ones. I still remember being told that computers are nothing but those two digits - and I, a young boy at the time, never fully understood what that meant.

Transplanted Post

After setting up this website, I am gradually moving all my previous blog posts, stored locally and in other blog systems, over here. This is one of them. You should therefore be aware that the date and time displayed on this post are not accurate.

Post Series

This post belongs to the series: Encoding Basics. You can see all posts in this series here.

Expected Reader Experience: None

This post is suitable for readers with zero or very limited knowledge of the topic: Encoding. However, since this post belongs to a series, you are expected to be comfortable with the previous posts in the series.

Machine representation and True value

It is not that hard to understand why computers are zeros and ones. If computers are machines that process data (more or less they are), they must have a way to represent that data: 1, 0, 4, 10, 999, -1, log(2), “ab”, “apple”, an article, a slide, an image of the sky, a piece of your favorite music, a video game about how a plumber saves the princess… To represent all of this data, computers use nothing but a great many zeros and ones.

How? In a computer, 0 represents the number 0, 1 represents the number 1, 10 represents the number 2, and 11 represents the number 3… Note that once we run out of available digit symbols (0 and 1), we add another digit position to represent a larger number. This is called the base-2 (or binary) numeral system, and it is really nothing foreign. Think about our mundane base-10 numeral system: we count from 0 to 9, and when we want to represent a number that is 1 larger than 9, we are out of digit symbols. So we simply add one more digit position - ta-da, we have “10”.

Any base-n numeral system uses positional notation, where the meaning of a digit depends on its position within the number. In base-10, the digit “5” in “50” represents a quantity that is ten times larger than the quantity represented by the “5” in “15”. If we look closely, the number 27 in base-10 is really \( 27 = 2 * 10^1 + 7 * 10^0 \). Similarly, in base-2, \( 11011 = 1 * 2^4 + 1 * 2^3 + 0 * 2^2 + 1 * 2^1 + 1 * 2^0 \), so the base-10 equivalent of 11011 is 27. 27 in base-10 and 11011 in base-2 have the same true value.

Pay close attention to the exponent (determined by the digit's position) and the base (determined by the numeral system). In mathematical terms, a base-n number \( S_{(n)} \), if expressed in base-10, would be \( S_{(10)} = \sum_i a_i * n^i \), where \( a_i \) is the \( i \)-th digit of \( S_{(n)} \) in its base-n form, counting from the right, starting from 0.
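To make the sum concrete, here is a minimal Python sketch of it. The function name `to_decimal` and the digit alphabet are my own choices for illustration, not anything standard.

```python
def to_decimal(digits, base):
    """Evaluate a base-n digit string using the positional-notation sum."""
    symbols = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed digit alphabet
    total = 0
    # Walk the digits from the right, so that position i starts at 0.
    for i, ch in enumerate(reversed(digits.upper())):
        a_i = symbols.index(ch)  # numeric value of the i-th digit
        if a_i >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        total += a_i * base ** i
    return total


print(to_decimal("11011", 2))  # 27
print(to_decimal("27", 10))    # 27
```

Python's built-in `int("11011", 2)` does the same job; the loop is spelled out only to mirror the formula term by term.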

Fractional numbers are handled in the same manner. For example, \(123.45 = 1 * 10^2 + 2 * 10^1 + 3 * 10^0 + 4 * 10^{-1} + 5 * 10^{-2}\). A binary fraction will look a bit strange to brains long used to base-10, but \(10.1_{(2)} = 1 * 2^1 + 0 * 2^0 + 1 * 2^{-1} = 2.5_{(10)}\).
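The same positional sum extends to digits after the point, which simply get negative exponents. A small sketch, assuming the input contains at most one "." and only digits valid in the given base (the helper name `fractional_to_decimal` is hypothetical):

```python
def fractional_to_decimal(text, base):
    """Evaluate a base-n string such as '10.1' as a floating-point value."""
    whole, _, frac = text.partition(".")
    value = 0.0
    # Digits left of the point: exponents 0, 1, 2, ... counting from the right.
    for i, ch in enumerate(reversed(whole)):
        value += int(ch, base) * base ** i
    # Digits right of the point: exponents -1, -2, ... counting from the left.
    for j, ch in enumerate(frac, start=1):
        value += int(ch, base) * base ** -j
    return value


print(fractional_to_decimal("10.1", 2))  # 2.5
```

The result is a Python float, so values that have no exact binary representation (such as 123.45) come back with a tiny rounding error.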

Yes, there are other bases for numeral systems. In fact, 45 in base-10, 55 in base-8 and 101101 in base-2 all represent the same value. That value is what we call the true value of the number, and a true value has many representations (it can be written in various numeral systems). Our computers chose the base-2 representation in particular; hence the famous “computers are nothing but zeros and ones”. (Why binary though? Because the transistor - the fundamental building block of modern computers - has two states: on and off. So it is really more of a practical choice. See here.)
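If you have Python at hand, you can check this yourself with built-ins only. `int(text, base)` parses a representation into its value, while `bin`, `oct` and `hex` go the other way:

```python
value = 45

# Three different representations, one true value.
same_value = int("45", 10) == int("55", 8) == int("101101", 2)
print(same_value)  # True

# Python marks the base with a prefix: 0b (binary), 0o (octal), 0x (hex).
print(bin(value), oct(value), hex(value))  # 0b101101 0o55 0x2d
```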

Here is a table of equivalent values in four commonly used positional numeral systems:

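| Base-10 (Decimal) | Base-2 (Binary) | Base-8 (Octal) | Base-16 (Hexadecimal) |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 |
| 2 | 10 | 2 | 2 |
| 3 | 11 | 3 | 3 |
| 4 | 100 | 4 | 4 |
| 5 | 101 | 5 | 5 |
| 6 | 110 | 6 | 6 |
| 7 | 111 | 7 | 7 |
| 8 | 1000 | 10 | 8 |
| 9 | 1001 | 11 | 9 |
| 10 | 1010 | 12 | A |
| 11 | 1011 | 13 | B |
| 12 | 1100 | 14 | C |
| 13 | 1101 | 15 | D |
| 14 | 1110 | 16 | E |
| 15 | 1111 | 17 | F |
| 16 | 10000 | 20 | 10 |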

Letters are used in the base-16 numeral system because it needs sixteen distinct digit symbols, while our familiar base-10 system only provides ten (0 to 9). The letters A to F stand for the values 10 to 15.
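The table can be reproduced in a few lines of Python. This sketch relies on the built-in `format` with the standard presentation types `d`, `b`, `o` and `X`; the helper name `table_row` is made up for this post.

```python
def table_row(n):
    """Return n written in base 10, 2, 8 and 16 (uppercase hex digits)."""
    return (format(n, "d"), format(n, "b"), format(n, "o"), format(n, "X"))


# Print one row per value, from 0 to 16.
for n in range(17):
    print(*table_row(n))
```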

Byte, and when Numbers are not just Numbers

Hence, when you see the number 45 on the screen (like now), you know there is a 0010 1101 sitting somewhere in your machine that causes your screen to render the shape 45.

Why 0010 1101 instead of 10 1101? Why eight digits? First, you have to know that any number of zeros at the front does not change the value. 101101, 0010 1101 and 0000 0010 1101 are all equal - just as 34, 0034 and 000034 are all equal in base-10. The choice of eight digits carries practical significance because eight binary digits make a Byte, which is an important unit of data in practice.
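A quick way to see the padding in action, sketched in Python. The helper `as_byte_string` is my own name; the `"08b"` format spec means "binary, zero-padded to a width of 8".

```python
def as_byte_string(n):
    """Write a value from 0 to 255 as eight bits, grouped four and four."""
    if not 0 <= n <= 255:
        raise ValueError("value does not fit in one byte")
    bits = format(n, "08b")  # pad with leading zeros up to 8 digits
    return bits[:4] + " " + bits[4:]


print(as_byte_string(45))                       # 0010 1101
print(int("101101", 2) == int("00101101", 2))   # True
```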

A bit is a binary digit, and a Byte equals 8 bits. Bits are rarely seen alone in computers. They are almost always bundled together into 8-bit collections, and we give these collections a name: Byte.

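| Unit | Quantity | Comments |
|---|---|---|
| bit (b) | | A binary digit |
| Byte (B) | 1 Byte = 8 bits | The smallest addressable unit in computer storage |
| WORD | 1 WORD = 2 Bytes | As defined in Windows; stores most Chinese characters there |
| DWORD | 1 DWORD = 2 WORDs | |
| 1 kB | 1 kB = 1024 Bytes | Differentiate kb and kB! 1 kB = 8 kb. Here k means 1024 (strictly speaking, that is a KiB) |
| 1 MB | 1 MB = 1024 kB | |
| 1 GB | 1 GB = 1024 MB | |
| 1 TB | 1 TB = 1024 GB | |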

With a bit you can represent two possible numbers:

0 = \( 0_{(10)}\)
1 = \( 1_{(10)}\)

With 2 bits the number of possibilities you can represent is 4:

00 = \( 0_{(10)}\)
01 = \( 1_{(10)}\)
10 = \( 2_{(10)}\)
11 = \( 3_{(10)}\)

With a byte you can represent 256 possible numbers:

00000000 = \( 0_{(10)}\)
00000001 = \( 1_{(10)}\)

11111110 = \( 254_{(10)}\)
11111111 = \( 255_{(10)}\)
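The pattern behind these counts is that n bits give \( 2^n \) possibilities, since each extra bit doubles the count. A short Python sketch that enumerates them (`patterns` is a name I made up):

```python
from itertools import product


def patterns(n_bits):
    """List every bit pattern of the given length, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]


print(patterns(1))       # ['0', '1']
print(patterns(2))       # ['00', '01', '10', '11']
print(len(patterns(8)))  # 256
```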

But wait - why should the 256 different byte-long bit patterns always represent numbers? What about letting them represent 256 different letters or symbols? What about this:

0000 0000 = letter “a”
0000 0001 = letter “b”

1111 1110 = symbol “,”
1111 1111 = symbol “.”

Does this one-to-one representation scheme remind you of anything?

And when you realize that 256 possibilities are not sufficient to represent all letters in all languages, you start to think about using two bytes for a character instead of one. Using more bits expands the number of possibilities that you can represent.
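The little table above is made up, but real schemes work the same way. One well-known example is ASCII, where “a” happens to be 97 rather than 0. The sketch below uses Python's built-in `ord` and `str.encode`; the choice of UTF-16 for the two-byte example is mine, picked only because it stores this particular character in exactly two bytes.

```python
# One byte per character: look up the number behind each symbol.
for ch in "ab,.":
    code = ord(ch)
    print(ch, code, format(code, "08b"))
# a 97 01100001
# b 98 01100010
# , 44 00101100
# . 46 00101110

# Two bytes per character: U+4E2D is a common Chinese character.
two_bytes = "\u4e2d".encode("utf-16-be")
print(len(two_bytes))  # 2
```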

Moreover, what about:

0000 0000 0000 0000 0000 0000 = black color
0000 0000 0000 0000 0000 0001 = black, but with a bit of red
...
0000 0000 0000 0001 0000 0000 = black, but with a bit of green
...
1111 1111 1111 1111 1111 1111 = white color
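Here is how such a 24-bit color could be packed and unpacked in Python. The layout is an assumption that follows the listing above (red in the lowest byte, green in the middle byte, which leaves blue for the highest byte); real image formats order the channels in various ways. The function names are hypothetical.

```python
def pack_color(red, green, blue):
    """Combine three 0-255 channel values into one 24-bit number."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in one byte")
    # Assumed layout: red = lowest byte, green = middle, blue = highest.
    return red | (green << 8) | (blue << 16)


def unpack_color(value):
    """Split a 24-bit number back into (red, green, blue)."""
    return value & 0xFF, (value >> 8) & 0xFF, (value >> 16) & 0xFF


print(format(pack_color(0, 1, 0), "024b"))  # 000000000000000100000000
print(unpack_color(0xFFFFFF))               # (255, 255, 255)
```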

This epiphany is a basic principle of encoding in computers: bits are not just numbers, but possibilities. The possibilities can be numerical (in which case bits represent numbers), lexical (in which case bits represent letters and symbols), chromatic (in which case bits represent colors)… Each of these ways of mapping bits to possibilities is called an Encoding Method. We give the computer these bits, and we just need some extra information to tell it which encoding method to use in the current context, so that it interprets the bits correctly. (Incidentally, this information is usually communicated through the file extension.)
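To see the principle at work, here is a small Python sketch that reads the same two bytes in two different ways. The byte values are chosen by me for illustration, and ASCII and big-endian byte order are just two of many possible interpretations.

```python
raw = bytes([0b01001000, 0b01101001])  # sixteen bits, nothing more

# Interpretation 1: one unsigned number, most significant byte first.
as_number = int.from_bytes(raw, byteorder="big")
print(as_number)  # 18537

# Interpretation 2: two ASCII characters.
as_text = raw.decode("ascii")
print(as_text)  # Hi
```

The bits never change; only the encoding method we tell the computer to apply does.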

Remember, computers only have zeros and ones - but with these bits we can represent numbers, text, audio, video and more. We just need different encoding methods.

Conclusion

Starting from the driving question “why are computers said to consist of zeros and ones”, we got to know the concept of number representation and extended it to the representation of different types of possibilities in general - this is the key spirit of computer encoding. The next post in the series will delve further into the binary representation of numbers.