Zero, Zero, What Are You? A Deep Dive into Type Conversion

Almost every computer uses a text input buffer for handling user input, providing feasibility and wide-ranging support across various applications. However, it may not be immediately apparent how the characters we type are converted internally for numeric manipulation. Let's dive into how computers store and process input from the keyboard.

When you type a number like "10" on your keyboard, your first assumption might be that it's stored as the integer 10 or its binary representation "00001010". However, that's not the case. Let's explore why.

Character Encoding Systems

Computers use character encoding systems to represent text in binary or hexadecimal format. Some popular encoding systems include ASCII, UTF-8, and UTF-16. For this example, we'll focus on ASCII (American Standard Code for Information Interchange).

ASCII Representation

When you type "10" on your keyboard, it's actually stored as two separate characters: "1" and "0". In ASCII:

"1" is represented as "0110001" (decimal 49)
"0" is represented as "0110000" (decimal 48)

This means we can't perform arithmetic operations directly on these values. If we were to add "1" and "0" in their ASCII representations, we'd be adding 49 and 48, which is clearly not what we want.

Let's verify this with a simple Python program:

from sys import stdin

if __name__ == "__main__":
    text = stdin.buffer.readline()
    print("you entered: ", text)
    print("ASCII codes:", [b for b in text])

> py test.py
10
you entered:  b'10\r\n'
ASCII codes: [49, 48, 13, 10]

So how can we get around this problem, if you take a close look at the above table, you'll see all numbers are in sequence not only in ascii but in all major character encoding system and that's not the coincidence.

We can get the actual number by subtracting the first representation of character, which is 0 in our case.

Example:
0 -> 48 = 48 - 48 = 0

1 -> 49 = 49 - 48 = 1

But what about multi-digit numbers, let's take 455 as an example. We know that it's a combination of units, tens and hundreds. So, in case of 455 its 5x(10^0) + 5x(10^1) + 5x(10^2).

So, the whole process would be like convert each character to their original number by subtracting its first representation and multiplying it by respective unit and adding it the result to the result of next character.

Example:

455:
4 -> 52 - 48 = 4 x 10^2
5 -> 53 - 48 = 4 x 10^1
5 -> 53 - 48 = 4 x 10^0

But obviously the whole thing would be in binary in the computer, so the accurate operation would be:

(0110100 - 0110000)*10^2 + (0110101 - 0110000)*10^1 + (0110101 - 0110000)*10^0

Let's look at the python equivalent of it:

result = 0
for chr_code in text:
    result = (chr_code - ord("0")) + result*10

from sys import stdin

if __name__ == "__main__":
    text = stdin.buffer.readline().strip()
    print("you entered: ", text)
    print("ASCII codes:", [b for b in text])

    result = 0
    for chr_code in text:
        result = (chr_code - ord("0")) + result*10

    print("Actual number is: ", result)

This code iterates through each character, converts it to its numeric value, and builds up the final number by multiplying the previous result by 10 and adding the new digit.

>py test.py
10
you entered:  b'10'
ASCII codes: [49, 48]
Actual number is:  11
> py test.py
455
you entered:  b'455'
ASCII codes: [52, 53, 53]
Actual number is:  455

That's all, we understood the native implementation of str to int in most of the programming languages.

Zero, Zero, What Are You? A Deep Dive into Type Conversion

Subscribe to my newsletter

Cosmicoppai

Cosmicoppai