Statistics

Statistics is the discipline concerned with the collection, analysis, interpretation and presentation of data. The data may describe any underlying phenomenon and are usually gathered by sampling observable values from a larger set, the population. In finance, the data are often the returns of a security or asset prices.


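For the statistics that follow, let n be the sample size and N the population size; \( x_{i} \) denotes the i-th observation (\( x_{ij} \) the i-th observation of variable j when several variables are involved), \( \bar{x} \) the sample mean and \( \mu \) the population mean.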

Single variable

Mean

The mean of a set of numbers is an average that summarizes its central value. One distinguishes between the arithmetic and the geometric mean:

Arithmetic mean: the sum of a collection of numbers divided by the count of numbers in the collection.
  Sample: \( \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i} = \frac{x_{1} + x_{2} + \cdots + x_{n}}{n} \)
  Population: \( \mu = \frac{1}{N} \sum_{i=1}^{N} x_{i} \)
  Probability: \( E(X) \)
Geometric mean: the nth root of the product of n (positive) numbers.
  Sample: \( \left( \prod_{i=1}^{n} x_{i} \right)^{\frac{1}{n}} = \sqrt[n]{x_{1} \times x_{2} \times \cdots \times x_{n}} \)
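As a minimal sketch (in Python, which the original does not use), both means can be computed directly from their definitions. The helper names and the growth factors are hypothetical illustrations; the results are cross-checked against `statistics.fmean` and `statistics.geometric_mean` from the standard library (Python 3.8+).

```python
import math
import statistics

def arithmetic_mean(xs):
    """Sum of the values divided by how many values there are."""
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """n-th root of the product of n positive values.

    Computed as exp(mean(log x)), which equals the n-th root of the product
    but avoids overflow/underflow when multiplying many values together.
    """
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical gross returns (1 + r) over four periods
growth = [1.10, 0.95, 1.20, 1.02]

am = arithmetic_mean(growth)
gm = geometric_mean(growth)
print(f"arithmetic: {am:.6f}, geometric: {gm:.6f}")

# Cross-check against the standard library (Python 3.8+)
assert math.isclose(am, statistics.fmean(growth))
assert math.isclose(gm, statistics.geometric_mean(growth))
```

For positive numbers the geometric mean never exceeds the arithmetic mean, and compounding it over the n periods reproduces the total growth of the series, which is why it is the natural average for gross returns.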

Variance, Standard deviation

Variance and standard deviation measure how scattered the data are, i.e. the dispersion of a dataset relative to its mean.

Variance: a measure of how far the values in a dataset lie from their mean, computed as the average of the squared deviations of the data points from the mean. Squaring the deviations makes them all non-negative, so deviations above and below the mean do not cancel out.
  Sample: \( s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_{i} - \bar{x})^2 \)
  Population: \( \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_{i} - \mu)^2 \)
  Probability: \( Var(X) = E[(X - \mu)^2] \)
The difference between the two denominators (n-1 versus N) is Bessel's correction, which removes the bias in the variance estimated from a finite sample. Because the population mean is unknown, deviations are measured from the sample mean, which is fitted to the same data, so the squared deviations are systematically too small: \( \sum_{i=1}^{n} (x_{i} - \bar{x})^2 = \sum_{i=1}^{n} (x_{i} - \mu)^2 - n(\bar{x} - \mu)^2 \). For independent observations, the expected value of the uncorrected sample variance \( \frac{1}{n} \sum_{i=1}^{n} (x_{i} - \bar{x})^2 \) is \( \frac{n-1}{n} \sigma^2 \), so multiplying it by the factor \( \frac{n}{n-1} \) removes the bias.
Standard deviation: the square root of the variance.
  Sample: \( s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_{i} - \bar{x})^2} \)
  Population: \( \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_{i} - \mu)^2} \)
  Probability: \( SD(X) = \sqrt{E[(X - \mu)^2]} \)
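The sketch below computes both variances for an illustrative dataset, checks that the 1/N formula applied to the same numbers and multiplied by \( \frac{n}{n-1} \) gives \( s^2 \), and compares with `statistics.variance` and `statistics.pvariance`. It then runs a small Monte Carlo experiment, assuming normally distributed draws with a known variance of 4, to show that the 1/n estimator averages about \( \frac{n-1}{n}\sigma^2 \) while the 1/(n-1) estimator averages about \( \sigma^2 \). Function names, data and simulation parameters are assumptions for illustration.

```python
import math
import random
import statistics

def sample_variance(xs):
    """s^2: squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def population_variance(xs):
    """sigma^2: squared deviations from the mean, divided by N.

    The list is treated as the entire population, so its own mean is mu.
    """
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
s2 = sample_variance(data)
sigma2 = population_variance(data)
s, sigma = math.sqrt(s2), math.sqrt(sigma2)

# Bessel's correction: for the same numbers, s^2 = sigma^2 * n / (n - 1)
n = len(data)
assert math.isclose(s2, sigma2 * n / (n - 1))
# Agreement with the standard library
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(sigma2, statistics.pvariance(data))

# Monte Carlo check of the bias: many samples of size 5 drawn from a normal
# distribution with known variance 4; average both estimators over the trials.
random.seed(0)
true_var, size, trials = 4.0, 5, 20_000
biased, unbiased = 0.0, 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, math.sqrt(true_var)) for _ in range(size)]
    biased += population_variance(sample) / trials   # divides by n
    unbiased += sample_variance(sample) / trials     # divides by n - 1
print(f"1/n estimator: {biased:.3f} (theory {true_var * (size - 1) / size})")
print(f"1/(n-1) estimator: {unbiased:.3f} (theory {true_var})")
```

Note that Bessel's correction makes \( s^2 \) unbiased for \( \sigma^2 \), but its square root \( s \) is still a slightly biased estimator of \( \sigma \).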

In finance, the dispersion of returns for a given asset, security or market index is called volatility. There are several ways to measure it. Historical volatility (statistical volatility) gauges the fluctuations of an underlying asset by measuring price changes over predetermined periods of time. Statistically, it is usually computed as the standard deviation of returns (the square root of their variance) and is often annualized. It gives an estimate of the uncertainty of future returns: the larger this value, the larger the risk or uncertainty associated with an asset, and the less predictable its prices.
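Historical volatility can be sketched as follows. The choices made here are common conventions rather than the only option, and are assumptions of this example: log returns, the sample standard deviation, and annualization by the square root of 252 trading days per year. The function name and the price series are hypothetical.

```python
import math
import statistics

def historical_volatility(prices, periods_per_year=252):
    """Annualized historical volatility from a series of closing prices.

    Assumed conventions: log returns ln(P_t / P_{t-1}), sample standard
    deviation (n - 1 denominator), annualization by sqrt(periods_per_year).
    """
    if len(prices) < 3:
        raise ValueError("need at least 3 prices to get 2 returns")
    returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]
    per_period = statistics.stdev(returns)
    return per_period * math.sqrt(periods_per_year)

# Hypothetical daily closing prices
closes = [100.0, 101.5, 100.8, 102.3, 101.9, 103.4, 102.7]
vol = historical_volatility(closes)
print(f"annualized volatility: {vol:.1%}")
```

A price series growing at a perfectly constant rate has identical returns and therefore zero volatility; the more the period-to-period returns fluctuate, the larger the result.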

Multiple variables

Covariance

Covariance is a measure of the joint variability of two random variables. The sign of the covariance shows the tendency in the linear relationship between the variables: positive (+) when they tend to move in the same direction, negative (-) when they tend to move in opposite directions. The magnitude of the covariance is not easy to interpret because it is not normalized and hence depends on the magnitudes of the variables. The normalized version of the covariance, the correlation coefficient, does show by its magnitude the strength of the linear relationship.

Covariance: the expected value (or mean) of the product of the two variables' deviations from their respective expected values. Here \( x_{ij} \) is the i-th observation of variable j.
  Sample: \( s_{x_{j}x_{k}} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_{j})(x_{ik} - \bar{x}_{k}) \)
  Population: \( \sigma_{x_{j}x_{k}} = \frac{1}{N} \sum_{i=1}^{N} (x_{ij} - \mu_{j})(x_{ik} - \mu_{k}) \)
  Probability: \( Cov(X_{j},X_{k}) = E[(X_{j} - E[X_{j}])(X_{k} - E[X_{k}])] \)
Pearson correlation coefficient: the covariance of two variables divided by the product of their standard deviations.
  Sample: \( r_{x_{j},x_{k}} = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_{j})(x_{ik} - \bar{x}_{k})}{\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar{x}_{j})^2} \sqrt{\sum_{i=1}^{n} (x_{ik} - \bar{x}_{k})^2}} \)
  Population / probability: \( \rho_{X_{j},X_{k}} = \frac{Cov(X_{j},X_{k})}{\sigma_{X_{j}} \sigma_{X_{k}}} \)