Continuous Probability Distributions
Introduction
Probability distributions are a fundamental concept in statistics and data science. Continuous probability distributions are an essential subset of probability distributions, and they play a crucial role in statistical analysis, decision making, and inference. In this blog, we will discuss what continuous probability distributions are, different types of continuous probability distributions, their properties, and their practical applications. We will also include examples, exercises, and interview questions to help readers understand the topic better.
What are Continuous Probability Distributions?
A continuous probability distribution describes a random variable that can take any value in an interval, or on the whole real line, so it has uncountably many possible values. Discrete probability distributions describe random variables with a finite or countably infinite set of possible values, such as the outcome of a die roll or the number of emails received in an hour. Continuous distributions instead assign probability to ranges of values rather than to individual points: for a continuous random variable, the probability of any single exact value is zero. For example, a person's height can be any positive real number, which makes it a continuous random variable.
Types of Continuous Probability Distributions
Normal Distribution
The normal distribution is the most widely used continuous probability distribution. It is often called the bell curve because its probability density function (PDF) is bell-shaped and symmetric about the mean. It is used to model, at least approximately, a wide range of real-world phenomena, including human heights, IQ scores, and stock market returns (although real returns typically have heavier tails than a normal distribution predicts). The standard normal distribution is the special case with a mean of 0 and a standard deviation of 1.
The formulas for the probability density function (PDF), cumulative distribution function (CDF), mean, and variance of a normal distribution are:
Probability Density Function (PDF)
$$f(x | \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \, \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Cumulative Distribution Function (CDF)
$$F(x | \mu, \sigma^2) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$$
Here, erf denotes the error function, which is defined as:
$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} \mathrm{exp}(-t^2)\, dt$$
Mean
$$\mu = E[X] = \int_{-\infty}^{\infty} x\, f(x | \mu, \sigma^2)\, dx$$
Variance
$$\sigma^2 = \mathrm{Var}[X] = \int_{-\infty}^{\infty} (x-\mu)^2\, f(x | \mu, \sigma^2)\, dx$$
Note that the PDF and CDF are functions of the value x taken by the random variable X, while μ and σ² are the parameters of the distribution. For the normal distribution, these parameters are exactly its mean and variance. The vertical bar | separates the argument x from the parameters.
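To make the normal formulas concrete, here is a minimal Python sketch that uses only the standard library. It evaluates the PDF and CDF as written above, using `math.erf` for the error function, and checks the erf definition with a midpoint-rule integral. The helper names `normal_pdf`, `normal_cdf`, and `erf_by_integration` are illustrative and do not come from any library.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """f(x | mu, sigma^2) for a normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """F(x | mu, sigma^2) = 1/2 * [1 + erf((x - mu) / (sigma * sqrt(2)))]."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def erf_by_integration(x: float, n: int = 10_000) -> float:
    """Approximate erf(x) = 2/sqrt(pi) * integral_0^x exp(-t^2) dt (midpoint rule)."""
    h = x / n
    area = h * sum(math.exp(-((i + 0.5) * h) ** 2) for i in range(n))
    return 2.0 / math.sqrt(math.pi) * area


if __name__ == "__main__":
    print(normal_cdf(0.0))                          # 0.5: half the mass lies below the mean
    print(normal_cdf(1.0) - normal_cdf(-1.0))       # about 0.6827 within one sigma of the mean
    print(erf_by_integration(1.0), math.erf(1.0))   # the two values agree closely
```

The check on `normal_cdf(0.0)` works because erf(0) = 0, so the CDF at the mean is exactly one half.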
Exponential Distribution
The exponential distribution is used to model the time between events in a Poisson process. For example, the time between arrivals at a customer service desk can be modeled using the exponential distribution. The PDF of the exponential distribution is skewed to the right, and it has a long tail on the right side.
The formulas for the probability density function (PDF), cumulative distribution function (CDF), mean, and variance of an exponential distribution are:
PDF:
$$f(x | \lambda) = \begin{cases} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & x < 0 \end{cases}$$
CDF:
$$F(x | \lambda) = \begin{cases} 1 - e^{-\lambda x} & x \geq 0 \\ 0 & x < 0 \end{cases}$$
Mean:
$$\mu = E[X] = \int_{0}^{\infty} x\, f(x | \lambda)\, dx = \frac{1}{\lambda}$$
Variance:
$$\sigma^2 = \mathrm{Var}[X] = \int_{0}^{\infty} (x-\mu)^2\, f(x | \lambda)\, dx = \frac{1}{\lambda^2}$$
Here, λ > 0 is the rate parameter of the exponential distribution, which is the average number of events per unit time. The PDF and CDF are functions of the value x, and the vertical bar | separates x from the parameter λ. Unlike the normal distribution, the exponential distribution does not have its mean and variance as separate parameters: both are determined by λ, as 1/λ and 1/λ² respectively. The PDF and CDF are defined piecewise because the exponential distribution only assigns probability to non-negative values of x.
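Here is a minimal Python sketch of the exponential formulas. It adds a seeded Monte Carlo check that simulated draws have a mean close to 1/λ and a variance close to 1/λ². It uses only the standard library, where `random.Random.expovariate` draws exponential samples for a given rate. The helper names, the rate λ = 0.5, and the sample size are arbitrary illustrative choices.

```python
import math
import random


def exponential_pdf(x: float, rate: float) -> float:
    """f(x | lambda) = lambda * exp(-lambda * x) for x >= 0, else 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def exponential_cdf(x: float, rate: float) -> float:
    """F(x | lambda) = 1 - exp(-lambda * x) for x >= 0, else 0."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def simulated_mean_and_variance(rate: float, n: int = 200_000, seed: int = 0):
    """Estimate E[X] and Var[X] from n seeded exponential draws."""
    rng = random.Random(seed)
    xs = [rng.expovariate(rate) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var


if __name__ == "__main__":
    rate = 0.5                                # illustrative choice of lambda
    mean, var = simulated_mean_and_variance(rate)
    print(mean, 1 / rate)                     # estimate vs. exact mean 1/lambda = 2
    print(var, 1 / rate ** 2)                 # estimate vs. exact variance 1/lambda^2 = 4
```

The simulated values will not match 2 and 4 exactly. They converge toward them as `n` grows.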
Uniform Distribution
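The uniform distribution is used to model situations where all values in an interval [a, b] are equally likely. For example, suppose a bus arrives exactly every 10 minutes and you reach the stop at a random moment. Your waiting time is then uniformly distributed between 0 and 10 minutes. The roll of a fair die is the discrete counterpart, with each of the six outcomes equally likely. The PDF of the uniform distribution is constant on [a, b] and zero elsewhere.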
The formulas for the probability density function (PDF), cumulative distribution function (CDF), mean, and variance of a uniform distribution are:
PDF:
$$f(x | a, b) = \begin{cases} \frac{1}{b-a} & a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}$$
CDF:
$$F(x | a, b) = \begin{cases} 0 & x < a \\ \frac{x-a}{b-a} & a \leq x \leq b \\ 1 & x > b \end{cases}$$
Mean:
$$\mu = E[X] = \frac{a+b}{2}$$
Variance:
$$\sigma^2 = \mathrm{Var}[X] = \frac{(b-a)^2}{12}$$
Here, a and b (with a < b) are the lower and upper bounds of the uniform distribution, respectively. They are the parameters of the distribution, and the mean and variance follow from them. The PDF and CDF are functions of the value x, and the vertical bar | separates x from the parameters. The PDF and CDF are defined piecewise because the uniform distribution only assigns probability to values of x within the interval [a, b].
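The piecewise uniform formulas translate directly into code. The following sketch assumes the bounds a = 0 and b = 10 purely for illustration, and the function names are hypothetical. It compares the closed-form mean and variance with estimates from seeded draws made by the standard library's `random.Random.uniform`.

```python
import random


def uniform_pdf(x: float, a: float, b: float) -> float:
    """f(x | a, b): 1 / (b - a) on [a, b], 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x: float, a: float, b: float) -> float:
    """F(x | a, b): 0 below a, (x - a) / (b - a) on [a, b], 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


def uniform_mean(a: float, b: float) -> float:
    return (a + b) / 2


def uniform_variance(a: float, b: float) -> float:
    return (b - a) ** 2 / 12


if __name__ == "__main__":
    a, b = 0.0, 10.0                          # illustrative bounds
    rng = random.Random(42)
    xs = [rng.uniform(a, b) for _ in range(100_000)]
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    print(uniform_mean(a, b), m)              # exact mean 5.0 vs. estimate
    print(uniform_variance(a, b), v)          # exact variance 100/12 vs. estimate
    print(uniform_cdf(3.0, a, b))             # probability of a value of at most 3
```
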
Gamma Distribution
The gamma distribution is used to model waiting times and other positive, right-skewed quantities. When the shape parameter k is a positive integer, it describes the waiting time until the k-th event of a Poisson process whose events occur at rate 1/θ. For example, if a machine's failures occur as a Poisson process, the total time until its k-th failure follows a gamma distribution. The PDF of the gamma distribution is skewed to the right and has a long tail on the right side.
The formulas for the probability density function (PDF), cumulative distribution function (CDF), mean, and variance of a gamma distribution are:
PDF:
$$f(x | k, \theta) = \begin{cases} \frac{x^{k-1} e^{-x/\theta}}{\theta^k \Gamma(k)} & x \geq 0 \\ 0 & x < 0 \end{cases}$$
CDF:
$$F(x | k, \theta) = \begin{cases} 0 & x < 0 \\ \frac{\gamma(k, x/\theta)}{\Gamma(k)} & x \geq 0 \end{cases}$$
where Γ(k) is the gamma function and γ(k, x/θ) is the lower incomplete gamma function. They are defined as:
$$\Gamma(k) = \int_0^\infty t^{k-1} e^{-t}\, dt$$
$$\gamma(k, x/\theta) = \int_0^{x/\theta} t^{k-1} e^{-t}\, dt$$
Mean:
$$\mu = E[X] = k\theta$$
Variance:
$$\sigma^2 = \mathrm{Var}[X] = k\theta^2$$
Here, k > 0 and θ > 0 are the shape and scale parameters of the gamma distribution, respectively, and the mean kθ and variance kθ² are derived from them. The PDF and CDF are functions of the value x, and the vertical bar | separates x from the parameters. The gamma function Γ(k) generalizes the factorial: for a positive integer n, Γ(n) = (n − 1)!, and the integral above defines Γ(k) for all positive real numbers k.
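The gamma CDF requires the lower incomplete gamma function, which Python's `math` module does not provide. This sketch therefore computes it with the standard power series γ(k, z)/Γ(k) = z^k e^(−z) Σ z^n / Γ(k + n + 1), using `math.gamma` for Γ. It is meant for moderate values of x/θ, and very large values can overflow. The function names and the parameters k = 2 and θ = 1.5 are illustrative assumptions. Python's `random.gammavariate(alpha, beta)` treats `beta` as a scale parameter, so it matches the θ used here.

```python
import math
import random


def gamma_pdf(x: float, k: float, theta: float) -> float:
    """f(x | k, theta) = x^(k-1) e^(-x/theta) / (theta^k Gamma(k)).
    Returns 0 for x <= 0 (a single point does not affect probabilities)."""
    if x <= 0:
        return 0.0
    return x ** (k - 1) * math.exp(-x / theta) / (theta ** k * math.gamma(k))


def gamma_cdf(x: float, k: float, theta: float, tol: float = 1e-15) -> float:
    """F(x | k, theta) = gamma(k, x/theta) / Gamma(k), computed with the series
    gamma(k, z) / Gamma(k) = z^k e^(-z) * sum_{n>=0} z^n / Gamma(k + n + 1).
    Intended for moderate z = x / theta."""
    if x <= 0:
        return 0.0
    z = x / theta
    term = 1.0 / math.gamma(k + 1)   # n = 0 term of the series
    total = term
    for n in range(1, 10_000):
        term *= z / (k + n)          # uses Gamma(k + n + 1) = (k + n) * Gamma(k + n)
        total += term
        if term < tol * total:
            break
    return z ** k * math.exp(-z) * total


if __name__ == "__main__":
    k, theta = 2.0, 1.5                       # illustrative shape and scale
    rng = random.Random(1)
    xs = [rng.gammavariate(k, theta) for _ in range(100_000)]
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    print(m, k * theta)                       # estimate vs. mean k*theta = 3
    print(v, k * theta ** 2)                  # estimate vs. variance k*theta^2 = 4.5
    # For integer k = 2 the CDF has the closed form 1 - e^(-z)(1 + z); here z = 2
    print(gamma_cdf(3.0, k, theta), 1 - 3 * math.exp(-2.0))
```

Setting k = 1 in `gamma_cdf` recovers the exponential CDF 1 − e^(−x/θ). This gives a quick sanity check on the series.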
Properties of Continuous Probability Distributions
Probability Density Function (PDF): The PDF f(x) of a continuous probability distribution describes how probability is spread over the possible values. f(x) itself is a density, not a probability, and it can exceed 1. The probability that the random variable falls in a range [a, b] is the area under the PDF curve between a and b, and the total area under the curve is 1.
Cumulative Distribution Function (CDF): The CDF F(x) = P(X ≤ x) gives the probability that the random variable takes a value less than or equal to x. It is obtained by integrating the PDF from −∞ to x. Conversely, the PDF is the derivative of the CDF at points where the PDF is continuous. It follows that P(a < X ≤ b) = F(b) − F(a).
Mean: The mean (expected value) E[X] of a continuous probability distribution is the probability-weighted average of its values. It is obtained by integrating x f(x) over all possible values of x.
Variance: The variance Var[X] of a continuous probability distribution measures how spread out the values are around the mean μ. It is obtained by integrating (x − μ)² f(x) over all possible values of x, or equivalently as Var[X] = E[X²] − μ².
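The integral definitions in this list can be checked numerically for any density. The sketch below illustrates the idea rather than providing a general-purpose integrator. It assumes the support can be truncated to a finite interval [lo, hi] chosen by hand, so that the probability mass outside it is negligible, and it applies the composite midpoint rule. The function names are hypothetical.

```python
import math
from typing import Callable, Tuple

Pdf = Callable[[float], float]


def integrate(f: Callable[[float], float], lo: float, hi: float, n: int = 100_000) -> float:
    """Composite midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def moments(pdf: Pdf, lo: float, hi: float) -> Tuple[float, float, float]:
    """Return (total probability, mean, variance), integrating only over [lo, hi]."""
    total = integrate(pdf, lo, hi)                                # should be close to 1
    mean = integrate(lambda x: x * pdf(x), lo, hi)                # E[X] = integral of x f(x)
    var = integrate(lambda x: (x - mean) ** 2 * pdf(x), lo, hi)   # integral of (x - mu)^2 f(x)
    return total, mean, var


def cdf_from_pdf(pdf: Pdf, lo: float, x: float) -> float:
    """F(x) as the integral of the PDF from lo to x, assuming no mass below lo."""
    return integrate(pdf, lo, x)


if __name__ == "__main__":
    rate = 2.0
    pdf = lambda x: rate * math.exp(-rate * x)   # exponential density, truncated to [0, 40]
    print(moments(pdf, 0.0, 40.0))               # close to (1.0, 0.5, 0.25)
    print(cdf_from_pdf(pdf, 0.0, 1.0), 1 - math.exp(-rate))
```

The exponential density with λ = 2 is used because its exact moments (mean 1/λ = 0.5 and variance 1/λ² = 0.25) are known, which makes the numerical results easy to verify.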
Practical Applications of Continuous Probability Distributions
Continuous probability distributions are used in many fields, including finance, engineering, physics, and biology. Here are some examples of their practical applications:
In finance, continuous probability distributions are used to model stock market returns, bond yields, and interest rates.
In engineering, continuous probability distributions are used to model the failure rates of machines and the lifetime of products.
In physics, continuous probability distributions are used to model the velocities of particles in a gas and the time until a radioactive nucleus decays, which follows an exponential distribution.
In biology, continuous probability distributions are used to model the size of organisms and the time between mutations.
Examples of using PDFs
Let's work through some examples to see which distribution fits which type of problem.
Example 1
Suppose we want to model the time it takes to complete a task, and we know that the time follows an exponential distribution. What is the probability that the task will be completed in less than 10 minutes if the mean time to complete the task is 15 minutes?
Solution:
We know that the PDF of the exponential distribution is given by:
$$f(x) = \lambda e^{-\lambda x}$$
where λ is the rate parameter, which is equal to the reciprocal of the mean.
In this case, the mean time to complete the task is 15 minutes, so λ = 1/15. Therefore, the PDF of the exponential distribution is:
$$f(x) = (1/15) e^{-(1/15)x}$$
The probability of completing the task in less than 10 minutes is given by the cumulative distribution function (CDF), which is the integral of the PDF from 0 to 10:
$$P(X < 10) = \int_{0}^{10} f(x) dx = \int_{0}^{10} \frac{1}{15} e^{-\frac{1}{15}x} dx$$
Since an antiderivative of the integrand is −e^(−x/15), we can evaluate the integral directly:
$$P(X < 10) = \left[-e^{-\frac{x}{15}}\right]_{0}^{10} = 1 - e^{-\frac{2}{3}} \approx 0.4866$$
Therefore, the probability of completing the task in less than 10 minutes is approximately 0.4866.
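We can reproduce Example 1 in a few lines of Python. The sketch evaluates the closed-form CDF 1 − e^(−λx) at x = 10 with λ = 1/15, then cross-checks the result by integrating the PDF from 0 to 10 with a midpoint rule. The variable names are illustrative.

```python
import math

mean_minutes = 15.0
rate = 1.0 / mean_minutes                   # lambda = 1 / mean

# Closed form from the CDF: P(X < 10) = F(10) = 1 - exp(-lambda * 10)
p_closed = 1.0 - math.exp(-rate * 10.0)

# Cross-check: integrate the PDF from 0 to 10 with the midpoint rule
n = 100_000
h = 10.0 / n
p_numeric = h * sum(rate * math.exp(-rate * (i + 0.5) * h) for i in range(n))

print(round(p_closed, 4), round(p_numeric, 4))   # 0.4866 0.4866
```

For a continuous variable, P(X < 10) and P(X ≤ 10) are equal, so evaluating the CDF at 10 gives the answer directly.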
Example 2
Suppose we want to model the heights of a population, and we know that the heights follow a normal distribution with a mean of 170 cm and a standard deviation of 10 cm. What is the probability that a randomly selected person is taller than 180 cm?
Solution:
We know that the PDF of the normal distribution is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, \mathrm{exp}\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
where μ is the mean and σ is the standard deviation.
In this case, μ = 170 cm and σ = 10 cm. Therefore, the PDF of the normal distribution is:
$$f(x) = \frac{1}{10\sqrt{2\pi}}\, \exp\left(-\frac{(x-170)^2}{2 \cdot 10^2}\right)$$
The probability of selecting a person taller than 180 cm is given by the integral of the PDF from 180 to infinity:
$$P(X > 180) = \int_{180}^{\infty} f(x)\, dx = \int_{180}^{\infty} \frac{1}{10\sqrt{2\pi}}\, \mathrm{exp}\left(-\frac{(x-170)^2}{200}\right)\, dx$$
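This integral has no closed-form solution in terms of elementary functions. Instead, we standardize the value and look up the standard normal CDF, Φ, in a table or compute it with software. The z-score of 180 cm is z = (180 − 170) / 10 = 1, so:
$$P(X > 180) = P(Z > 1) = 1 - \Phi(1) \approx 1 - 0.8413 = 0.1587$$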
Therefore, the probability of selecting a person taller than 180 cm is approximately 0.1587.
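The "use software" step in Example 2 can be done with Python's standard library: `statistics.NormalDist` (available since Python 3.8) provides a `cdf` method. The sketch computes the z-score, then computes the tail probability both on the original scale and on the standard normal scale to show that standardizing gives the same answer.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)        # heights in cm
z = (180 - heights.mean) / heights.stdev      # z-score of 180 cm
p_taller = 1 - heights.cdf(180)               # P(X > 180) on the original scale
p_std = 1 - NormalDist().cdf(z)               # P(Z > z) on the standard normal

print(z, round(p_taller, 4), round(p_std, 4))   # 1.0 0.1587 0.1587
```
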
Exercises:
Suppose the weight of apples in a certain orchard follows a uniform distribution between 50 grams and 100 grams. What is the PDF of the distribution?
Suppose the lifetimes of a certain type of battery follow a gamma distribution with a shape parameter of 3 and a scale parameter of 2. What are the mean and variance of the distribution?
Suppose the IQ scores of a certain population follow a normal distribution with a mean of 100 and a standard deviation of 15. What is the probability that a randomly selected person has an IQ score between 110 and 130?
Interview Questions:
What is the difference between a continuous probability distribution and a discrete probability distribution?
What is the normal distribution, and what are its properties?
What is the difference between the PDF and the CDF of a continuous probability distribution?
How do you calculate the mean and variance of a continuous probability distribution?
What is the difference between the exponential distribution and the Poisson distribution?
What is the gamma distribution, and where is it commonly used?
What is the central limit theorem, and why is it important in statistics?
Conclusion:
Continuous probability distributions are an important concept in statistics and probability theory. They are used to model a wide range of phenomena, including the sizes of organisms, the time between mutations, and the heights of populations.
In this blog, we introduced the normal, exponential, uniform, and gamma distributions and discussed the properties of continuous probability distributions, including the PDF, CDF, mean, and variance. We also worked through examples that use the exponential and normal distributions to solve practical problems.
Finally, we provided some exercises and interview questions to help readers test their understanding of continuous probability distributions. By mastering the concepts discussed in this blog, readers will be well-equipped to apply continuous probability distributions to real-world problems in statistics, finance, and other fields.
Read articles from Smart Shock directly inside your inbox. Subscribe to the newsletter, and don't miss out.