A probability distribution is a mathematical description of a random phenomenon in terms of all its possible outcomes and their associated probabilities.
In this image, what do you expect to be:
The main types of distributions are discrete and continuous.
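Every line you draw along the horizontal axis can be interpreted as a continuous distribution, and every bar plot as a discrete distribution, as long as the values are non-negative and scaled so that the total probability is 1.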
All the distributions we are going to discuss are variations of these two basic types!
For parametric distributions, we have a formula that describes the line / bars. You just put in the parameters, and the output is the line / bars.
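As a minimal sketch of "parameters in, line / bars out", the snippet below evaluates two well-known formulas. The choice of the normal and binomial distributions, the function names `normal_pdf` and `binomial_pmf`, and the parameter values are assumptions made for illustration only.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for parameters mu and sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def binomial_pmf(k, n, p):
    """Height of the bar at k successes for n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Parameters in, line out: the curve evaluated on a small grid of x values.
line = [normal_pdf(x, mu=0.0, sigma=1.0) for x in (-2, -1, 0, 1, 2)]

# Parameters in, bars out: one bar per possible number of successes.
bars = [binomial_pmf(k, n=4, p=0.5) for k in range(5)]
```

Changing `mu`, `sigma`, `n` or `p` changes the shape of the line or the bars, but the formula stays the same.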
A probability mass function (pmf) describes the probability distribution of discrete variables.
Consider a coin toss:
\[ f(x) = \begin{cases} 0.5 & x \text{ is heads} \\ 0.5 & x \text{ is tails} \end{cases} \]
This is the pmf of the Bernoulli distribution with \(p = 0.5\).
The pmf is a function \(f\) over the sample space \(\mathscr{S}\) of a discrete random variable \(X\), which gives the probability that \(X\) is equal to a certain value: \[f(x) = P(X = x)\]
Each pmf satisfies these conditions: \[ \begin{aligned} f(x) &\geq 0 ,\, \forall x \in \mathscr{S}\\ \sum_{x \in \mathscr{S}} f(x) &= 1 \end{aligned} \]
For a collection of outcomes \(\mathscr{A} \subseteq \mathscr{S}\): \[P(X \in \mathscr{A}) = \sum_{x_i \in \mathscr{A}} f(x_i)\]
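The pmf conditions and the probability of a collection of outcomes can be checked in a few lines. This is a sketch only: the helper names `is_valid_pmf` and `prob_of` are hypothetical, and the six-sided die is an assumed second example that is not part of the slides.

```python
# pmf of a fair coin toss: each outcome in the sample space maps to its probability.
coin = {"heads": 0.5, "tails": 0.5}

def is_valid_pmf(f, tol=1e-12):
    """Check the two pmf conditions: non-negative values that sum to 1."""
    return all(p >= 0 for p in f.values()) and abs(sum(f.values()) - 1.0) <= tol

def prob_of(event, f):
    """P(X in event): sum f(x) over the outcomes x in the event."""
    return sum(f[x] for x in event)

# A fair six-sided die as a second discrete example (assumed for illustration).
die = {face: 1 / 6 for face in range(1, 7)}

p_heads = prob_of({"heads"}, coin)
p_even = prob_of({2, 4, 6}, die)
```

Here `p_even` adds up the three bars for the even faces, which is the sum over \(x_i \in \mathscr{A}\) in the formula above.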
For continuous distributions, we use a probability density function (pdf).
This might look like an unnecessary mathematical detail, but it is important to understand the difference.
Example: can you answer the question “What is the probability that your body temperature is 37.0 °C?”
The answer might be unexpected: 0!
Let’s say your answer is 25%. But what if your temperature is 37.1? Does that count? Or 37.01?
Because the distribution is continuous, you can only say something about a range: “What is the probability that your temperature is between 36.5 and 37.2 °C?”
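The body temperature question can be tried out with Python's `statistics.NormalDist`. The model below (normally distributed temperature with mean 36.8 and standard deviation 0.4) is an assumption made purely for illustration, not a measured fact, and `prob_between` is a hypothetical helper name.

```python
from statistics import NormalDist

# Assumed model for illustration only: mean 36.8 C, standard deviation 0.4 C.
temperature = NormalDist(mu=36.8, sigma=0.4)

def prob_between(dist, low, high):
    """P(low <= X <= high): the area under the pdf, computed from the cdf."""
    return dist.cdf(high) - dist.cdf(low)

p_range = prob_between(temperature, 36.5, 37.2)  # a proper, non-zero probability
p_exact = prob_between(temperature, 37.0, 37.0)  # zero-width range, so no area
density = temperature.pdf(37.0)                  # height of the curve, not a probability

# Narrowing the range around 37.0 shrinks the probability towards 0.
shrinking = [prob_between(temperature, 37.0 - w, 37.0 + w) for w in (0.1, 0.01, 0.001)]
```

The value of `density` is positive even though `p_exact` is zero: a pdf gives a height, and only the area over a range is a probability.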
The normal distribution is one of the most frequently used distributions.
A major reason for this is that if you keep sampling independent values from a population and adding them up, the sums end up approximately normally distributed (this is the central limit theorem).
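Take a person's height. It is the result of many small influences, such as genetic and environmental factors, that roughly add up.
Thus, height will be approximately normally distributed. So will the weight of wolves or the length of a penguin's wing.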
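The "keep sampling and adding" idea can be simulated. The sketch below assumes a uniform population between 0 and 1, 30 terms per sum and 20,000 sums; all of these numbers, and the helper name `sum_of_uniforms`, are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

def sum_of_uniforms(n_terms):
    """Add up n_terms independent draws from a uniform(0, 1) population."""
    return sum(random.random() for _ in range(n_terms))

sums = [sum_of_uniforms(30) for _ in range(20_000)]

mean = statistics.fmean(sums)   # theory: 30 * 0.5 = 15
stdev = statistics.stdev(sums)  # theory: sqrt(30 / 12), about 1.58

# For a normal distribution, about 68% of the values lie within one
# standard deviation of the mean.
within_one_sd = sum(abs(s - mean) <= stdev for s in sums) / len(sums)
```

Although every single draw comes from a flat distribution, the share of sums within one standard deviation comes out close to the 68% expected for a bell curve.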
However, multiplying values will give you a long tail!
This is the case when variables interact in some way, and are not independent.
\[4 + 4 + 4 + 4 = 16\]
but
\[4 \times 4 \times 4 \times 4 = 256\]
This is common if variables interact with each other. Examples are stock prices, failures of machines, ping times on a network, income distribution.
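To see the long tail appear, one can compare adding with multiplying the same kind of random factors. This is a sketch under assumptions: the factors are drawn between 0.5 and 1.5, there are 20 of them per value, and the multiplication itself stands in for the interaction. The names `growth_factors` and `mean_over_median` are hypothetical.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def growth_factors(n):
    """n positive random factors between 0.5 and 1.5 (an assumed range)."""
    return [random.uniform(0.5, 1.5) for _ in range(n)]

added = [sum(growth_factors(20)) for _ in range(20_000)]
multiplied = [math.prod(growth_factors(20)) for _ in range(20_000)]

def mean_over_median(values):
    """Close to 1 for a symmetric distribution, well above 1 for a long right tail."""
    return statistics.fmean(values) / statistics.median(values)

ratio_added = mean_over_median(added)
ratio_multiplied = mean_over_median(multiplied)
```

For the added values, mean and median nearly coincide. For the multiplied values, a few very large products pull the mean far above the median, which is the signature of a long right tail.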
Multiplying values will give you a fat-tailed distribution! This will typically be a log-normal distribution:
if \(X\) is log-normally distributed, then \(Y = \log(X)\) is normally distributed.
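The relation between \(X\) and \(Y = \log(X)\) can be checked numerically with `random.lognormvariate` from the standard library. The parameters `MU` and `SIGMA` below are assumed values chosen for illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

MU, SIGMA = 0.0, 0.5  # assumed parameters of the underlying normal distribution

x = [random.lognormvariate(MU, SIGMA) for _ in range(20_000)]  # log-normal X
y = [math.log(v) for v in x]                                    # Y = log(X)

y_mean = statistics.fmean(y)   # should be close to MU
y_stdev = statistics.stdev(y)  # should be close to SIGMA

# X has a long right tail (mean above median), Y is symmetric (mean close to median).
gap_x = statistics.fmean(x) - statistics.median(x)
gap_y = y_mean - statistics.median(y)
```

Taking the logarithm turns the products behind \(X\) into sums, which is why \(Y\) behaves like a normal distribution again.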
The shaded area is the 99% confidence interval.
Raise your hand if you think it is reasonable to say that increasing variable \(x\) will lower variable \(y\)
Now I tell you that the \(x\) axis is the amount of hours invested in study, and the \(y\) axis is the average grade of a student.
Raise your hand if you think it is reasonable to say that increasing variable \(x\) will lower variable \(y\)