Random Numbers: Are They Truly Random?

Rahul R KatarkiRahul R Katarki
7 min read

Imagine you are at a lottery shop and have to buy 1 ticket. The owner of the shop randomly picks a number between 1 to 1000 and the person with that ticket wins the lottery. Suppose he uses this following code to pick the random number.

#include <stdio.h> 
#include <stdlib.h> 

int main() 
{
    int x = rand()%1000 + 1;
    printf("%d\n", x);
    return 0; 
}

You are at luck since we can “predict” that the code will always return 384. Continue reading the article to know why this happens.

What are random numbers and why do we care about them?

Random numbers are numbers that are generated in a way that they cannot be predicted and follow no pattern. Each number has an equal probability of being chosen.

In the world of cryptography, random numbers are fundamental in generating encryption keys and securing communication. If the random numbers used in cryptographic systems aren’t truly random or unpredictable, the entire system becomes vulnerable to attacks.

How to generate them?

There are mainly 2 types of random number generators(RNGs):

  • Pseudo Random Number Generator (PRNG)

  • True Random Number Generator (TRNG)

Pseudo Random Number Generators

These are numbers that appear random but are generated by deterministic processes, usually algorithms. They are called pseudorandom because, while they can mimic randomness, they are based on an initial seed value, and the sequence can be reproduced if the seed is known. This is widely used because true randomness is hard to achieve purely using code.

Seed value is a number which uniquely characterizes the sequence of random numbers. We can use the seed value to get back the sequence thus enabling us to predict future random numbers.

Examples for PRNGs include LCG, Mersenne Twister, Xorshift, and Middle Square Method. Let’s look into LCG:

Linear Congruence Generator (LCG)

It is one of the oldest and simplest PRNG algorithms. It is based on the following recurrence relation:

$$X_{n+1} = (aX_n + c)\mod m$$

Where:

$$X_n \text{ is the current number}$$

$$X_{n+1} \text{ is the next number}$$

$$a \text{ is the multiplier constant}$$

$$c \text{ is the increment constant}$$

$$m \text{ is the prime modulus}$$

This recurrence generates numbers in the interval 0 to m-1. We start off by choosing the first number of the sequence which is the seed and get the subsequent numbers by the recurrence.

For example:

Let:

$$X_1 = 11 \text{, } a = 3\text{, } c = 13 \text{ and } m = 67$$

#include <stdio.h> 
#include <stdlib.h>

int seed = 11;
int a = 3;
int c = 13;
int m = 67;

int LCG_next_number(int current_number){
    int next_number = (a*current_number + c)%m;
    return next_number;
}


int main() 
{
    int current_number = seed;
    int length = 10;
    for (int i = 0; i < length; i++)
    {
        printf("%d, ", current_number);
        int next_number = LCG_next_number(current_number);
        current_number = next_number;
    }
    printf("\n");
    return 0;
}

The first 10 numbers are: 11, 46, 17, 64, 4, 25, 21, 9, 40, 66

There are a few issues with this type of generation:

  • Periodicity:

    Let’s look at the first 50 terms of this sequence:

    11, 46, 17, 64, 4, 25, 21, 9, 40, 66, 10, 43, 8, 37, 57, 50, 29, 33, 45, 14, 55, 44, 11, 46, 17, 64, 4, 25, 21, 9, 40, 66, 10, 43, 8, 37, 57, 50, 29, 33, 45, 14, 55, 44, 11, 46, 17, 64, 4, 25

    We can notice the numbers repeat after a certain point, i.e after the 22nd number, we repeat 11, 46, 17, 64, 4, 25, 21, 9, 40, 66, 10, 43, 8, 37, 57, 50, 29, 33, 45, 14, 55, 44 forever. This is clearly bad for a RNG.

    The period is the number of values generated before the sequence starts repeating. The maximum period of an LCG is m, but in practice, the actual period is usually much shorter depending on the choice of parameters a, c, and m.

  • Some numbers never appear:

    For the above parameters let’s count the frequency of numbers in the first 1000 numbers.

      #include <stdio.h> 
      #include <stdlib.h>
    
      int seed = 11;
      int a = 3;
      int c = 13;
      const int m = 67;
    
      int LCG_next_number(int current_number){
          int next_number = (a*current_number + c)%m;
          return next_number;
      }
    
      int frequency[m];
    
      int main() 
      {
          int current_number = seed;
          int length = 1000;
          for (int i = 0; i < length; i++)
          {
              frequency[current_number]++;
              int next_number = LCG_next_number(current_number);
              current_number = next_number;
          }
          for (int i = 0; i < m; i++)
          {
              printf("%d, ", frequency[i]);
          }
          printf("\n");
          return 0;
      }
    

    We get:

    0, 0, 0, 0, 46, 0, 0, 0, 45, 46, 45, 46, 0, 0, 45, 0, 0, 46, 0, 0, 0, 46, 0, 0, 0, 46, 0, 0, 0, 45, 0, 0, 0, 45, 0, 0, 0, 45, 0, 0, 46, 0, 0, 45, 45, 45, 46, 0, 0, 0, 45, 0, 0, 0, 0, 45, 0, 45, 0, 0, 0, 0, 0, 0, 46, 0, 46

    We notice a bunch of zeros which means in the first 1000 numbers we never get those numbers. The frequency count of these numbers will always be zero since the sequence in periodic in nature. Ideally, the frequency distribution should look something like this:

    11, 18, 11, 15, 16, 18, 20, 16, 10, 11, 15, 17, 11, 15, 12, 16, 14, 17, 19, 21, 18, 10, 17, 10, 13, 11, 11, 14, 19, 23, 14, 17, 11, 13, 12, 14, 16, 7, 11, 13, 17, 18, 14, 13, 4, 16, 15, 16, 12, 16, 18, 17, 12, 13, 20, 19, 20, 15, 18, 16, 15, 13, 24, 13, 19, 12, 18

    Each number should have a frequency count of around 1000/67 ≈ 15 since each number has an equal likelihood of being chosen.

Predictable:

If we have the parameters a, c and m, we can predict the next number. But what if we know m, and 3 consecutive terms of the sequence? We can solve for the parameters a and c in this case. Consider the sequence discussed above:

Assume we have the first 3 terms: 11, 46 and 17 and m = 67 . Thus:

$$46 = (11a + c)\mod 67$$

$$17 = (46a + c) \mod 67$$

Let’s forget about modulo operation for some time. This would change our equations to linear equation in 2 variables. Solving them we get:

$$a = \frac{-29}{35} \text{ and } c = \frac{1929}{35}$$

Now bringing in the modulo part by taking the modular inverse of denominator:

$$a = -29 \times 35^{-1} \mod 67 = 3$$

$$c = 1929 \times 35^{-1} \mod 67 = 13$$

Upon solving we get a = 3 and c = 13 which are precisely the parameters we chose to begin with. Python code for the same:

def lcg_solver(x_1, x_2, x_3, m):
    n1 = x_2 - x_3
    d1 = x_1-x_2
    a = n1*pow(d1, -1, m)
    a = a%m
    c = x_2 - x_1*a
    c = c%m
    return a, c

if __name__ == "__main__":
    x_1 = 11
    x_2 = 46
    x_3 = 17
    m = 67
    a, c = lcg_solver(x_1, x_2, x_3, m)
    print("a = {}\nc = {}\n".format(a, c))

Thus LCG is not a good PRNG. C uses LCG to as it’s random number generator. At the start of the article we saw we always got the same number generated. This is because C sets the seed to 1 by default if not initialized. You can use srand(time(0)) to get a random seed. But be careful since this may too give predictable results. In Python, we can use the Crypto.Random package to generate cryptographically secure random numbers. Mersenne Twister is a good choice for PRNG as it has an extremely long period and produces good random numbers. It works by maintaining an internal state that consists of 624 32-bit integers (19,937 bits total). These values are updated through a series of bitwise operations and modular arithmetic to produce the next random number in the sequence.

True Random Number Generator

A true random number generator (TRNG) uses a nondeterministic source to make randomness. It relies on capturing random events from physical processes that are fundamentally unpredictable such as thermal noise, human input like keystrokes or mouse movements, atmospheric noise, etc. These are suitable for cryptographic use as the entropy in these numbers is high.

6
Subscribe to my newsletter

Read articles from Rahul R Katarki directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Rahul R Katarki
Rahul R Katarki