Statistics
Statistics is the discipline that concerns the collection, analysis, interpretation and presentation of data. The data relate to some underlying phenomenon and are usually gathered by sampling observable values from a larger set, called the population. In finance the data are often the returns or prices of a security or other asset.
- Descriptive statistics aims to summarize a sample and is solely concerned with properties of the observed data. The indexes calculated most often concern two sets of properties of a distribution (sample or population):
  - Central tendency (location) seeks to characterize the distribution's central or typical value.
  - Dispersion (variability) characterizes the extent to which members of the distribution depart from its center and from each other.
- Inferential statistics uses the data to learn about the larger population that the observed sample is assumed to represent. Data analysis is used to deduce properties of an underlying probability distribution, which makes it possible to draw conclusions from data that are subject to random variation.
The statistics that follow use this notation:
- \( X, X_{j}, X_{k} \) random variables
- \( x_{i} \) value of the ith element in the sample (\( x_{ij} \) is the ith observed value of variable \( X_{j} \))
- \( n \) sample size
- \( N \) population size
Single variable
Mean
The mean of a set of numbers is a simple mathematical average. One distinguishes between:
Mean
| Average | Definition | Sample | Population | Probability |
| --- | --- | --- | --- | --- |
| Arithmetic | Sum of a collection of numbers, divided by the count of numbers in the collection. | \( \bar{x} = \frac{1}{n} \sum_{i=1}^n x_{i} \space = \space \frac{x_{1} \space+\space x_{2} \space+\space ... \space+\space x_{n}}{n} \) | \( \mu = \frac{1}{N} \sum_{i=1}^N x_{i} \) | \( E(X) \) |
| Geometric | The nth root of the product of n numbers. | \( (\prod_{i=1}^n x_{i})^\frac{1}{n} \space = \space \sqrt[n]{x_{1} \times x_{2} \times ... \times x_{n}} \) | | |
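As a rough illustration of the two sample formulas above, the following Python sketch computes both means for a small made-up list of numbers. The data values and the function names `arithmetic_mean` and `geometric_mean` are assumptions chosen for this example, not part of the original text.

```python
import math
import statistics

# Hypothetical sample (illustrative values only).
sample = [1.0, 2.0, 4.0, 8.0, 16.0]


def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


def geometric_mean(values):
    """nth root of the product of n values (values assumed positive)."""
    return math.prod(values) ** (1.0 / len(values))


print("arithmetic mean:", arithmetic_mean(sample))
print("geometric mean: ", geometric_mean(sample))

# Cross-check the hand-written arithmetic mean against the standard library.
print("statistics.mean:", statistics.mean(sample))
```

For this sample the geometric mean is smaller than the arithmetic mean, because the large value 16 pulls the arithmetic mean upward more strongly than it does the geometric mean.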
Variance, Standard deviation
Variance and Standard deviation measure how data are scattered, the dispersion of a dataset relative to its mean.
Variance, Standard deviation
| Measure | Definition | Sample | Population | Probability |
| --- | --- | --- | --- | --- |
| Variance | A measure of how far the values in the data set lie from the mean. Computed as the average of the squared deviations of each data point from the mean. Squaring the deviations eliminates negative values. | \( s^2 = \frac{1}{n-1} \space \sum_{i=1}^n \space ( x_{i} \space - \space \bar{x} )^2 \) | \( \sigma^2 = \frac{1}{N} \space \sum_{i=1}^N \space ( x_{i} \space - \space \mu )^2 \) | \( Var(X) = E[(X - \mu)^2 ] \) |
| Standard deviation | The square root of the variance. | \( s = \sqrt{ \frac{1}{n-1} \space \sum_{i=1}^n \space ( x_{i} \space - \space \bar{x} )^2 } \) | \( \sigma = \sqrt {\frac{1}{N} \space \sum_{i=1}^N \space ( x_{i} \space - \space \mu )^2 } \) | \( SD(X) = \sqrt{ E[(X - \mu)^2 ] } \) |

The difference in denominators between the sample and population formulae (n-1 versus N) is Bessel's correction, an approach to reduce the bias due to finite sample size. Because the population mean \( \mu \) is unknown, deviations are measured from the sample mean \( \bar{x} \) instead, which makes the uncorrected sample variance (denominator n) underestimate the population variance on average. Multiplying that biased sample variance by the factor \( \frac{n}{n-1} \) corrects the bias.
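The effect of the two denominators can be checked numerically. The sketch below is only an illustration: the data values are made up, and the helper `variance` with its `ddof` parameter (the amount subtracted from the denominator) is a naming assumption for this example. It computes the variance with denominator n-1 and with denominator n, and shows that the factor n/(n-1) converts one into the other.

```python
import math
import statistics

# Hypothetical observations (illustrative values only); their mean is 5.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


def variance(values, ddof):
    """Average squared deviation from the mean.

    ddof=1 divides by n-1 (sample formula with Bessel's correction),
    ddof=0 divides by n (population formula, data treated as the whole population).
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - ddof)


n = len(data)
sample_var = variance(data, ddof=1)
population_var = variance(data, ddof=0)

print("sample variance:    ", sample_var)
print("population variance:", population_var)
print("sample std dev:     ", math.sqrt(sample_var))
print("population std dev: ", math.sqrt(population_var))

# Multiplying the n-denominator variance by n/(n-1) gives the corrected value.
print("corrected:          ", population_var * n / (n - 1))

# The standard library exposes both variants.
print(statistics.variance(data), statistics.pvariance(data))
```

The gap between the two results shrinks as n grows, since n/(n-1) approaches 1.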
In finance, the dispersion of returns for a given asset, security or market index is called volatility. There are several ways to measure it. Historical volatility (statistical volatility) gauges the fluctuations of an underlying asset by measuring price changes over predetermined periods of time. Statistically it measures the dispersion of returns, typically as their standard deviation (the square root of their variance). It gives an estimate of the uncertainty of future returns: the larger this value, the larger the risk or uncertainty associated with the asset. Higher volatility means less predictable prices.
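A minimal sketch of a historical volatility calculation follows. It rests on several assumptions that are not stated in the text: the price series is invented, returns are simple period-over-period returns, and volatility is taken as the sample standard deviation of those returns. The function names `simple_returns` and `historical_volatility` are hypothetical.

```python
import statistics

# Hypothetical closing prices for consecutive periods (illustrative values only).
prices = [100.0, 102.0, 101.0, 105.0, 103.0]


def simple_returns(price_series):
    """Period-over-period simple returns: p_t / p_(t-1) - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(price_series, price_series[1:])]


def historical_volatility(price_series):
    """Sample standard deviation (n-1 denominator) of the simple returns.

    Needs at least three prices, because statistics.stdev requires
    at least two return observations.
    """
    return statistics.stdev(simple_returns(price_series))


print("returns:   ", simple_returns(prices))
print("volatility:", historical_volatility(prices))

# A series with larger swings yields a larger value.
calm = [100.0, 101.0, 102.0, 103.0]
wild = [100.0, 110.0, 95.0, 120.0]
print(historical_volatility(calm) < historical_volatility(wild))
```

The result is expressed per period of the input series (per day for daily prices, for example). Other conventions, such as using logarithmic returns, are common and would change the numbers slightly.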
Multiple variables
Covariance
Covariance is a measure of the joint variability of two random variables. The sign of the covariance shows the direction of the linear relationship between the variables: whether they tend to show similar (+) or opposite (-) behavior. The magnitude of the covariance is not easy to interpret because it is not normalized and hence depends on the magnitudes of the variables. The normalized version of the covariance, the correlation coefficient, does show the strength of the linear relationship through its magnitude.
Covariance
| Measure | Definition | Sample | Population | Probability |
| --- | --- | --- | --- | --- |
| Covariance | Expected value (or mean) of the product of the two variables' deviations from their individual expected values. | \( s_{x_{j}x_{k}} = \frac{1}{n - 1} \space \sum_{i=1}^n ( x_{ij} - \bar{x}_{j} ) ( x_{ik} - \bar{x}_{k} ) \) | \( \sigma_{x_{j}x_{k}} = \frac{1}{N} \space \sum_{i=1}^N ( x_{ij} - \mu_{j} ) ( x_{ik} - \mu_{k} ) \) | \( Cov(X_{j},X_{k}) = E[ \space (X_{j} - E[X_{j}]) \space (X_{k} - E[X_{k}]) \space ] \) |
| Pearson correlation coefficient | Covariance of two variables, divided by the product of their standard deviations. | \( r_{x_{j},x_{k}} = \frac{ \sum_{i=1}^n ( x_{ij} \space - \space \bar{x}_{j} ) ( x_{ik} \space - \space \bar{x}_{k} )}{ \sqrt{ \sum_{i=1}^n ( x_{ij} \space - \space \bar{x}_{j} )^2 } \sqrt{ \sum_{i=1}^n ( x_{ik} \space - \space \bar{x}_{k} )^2 } } \) | \( \rho_{X_{j},X_{k}} = \frac{ Cov(X_{j},X_{k}) }{ \sigma_{X_{j}} \sigma_{X_{k}} } \) | |
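The sample formulas for covariance and the Pearson correlation coefficient can be sketched in a few lines of Python. The return series and the function names `sample_covariance` and `pearson_r` are assumptions made for this illustration. The example also shows why the covariance's magnitude is hard to interpret: rescaling the data changes the covariance but leaves the correlation untouched.

```python
import math

# Hypothetical return series (illustrative values only).
returns_a = [0.01, 0.02, -0.01, 0.03]
returns_b = [2.0 * r for r in returns_a]   # moves exactly with returns_a
returns_c = [-r for r in returns_a]        # moves exactly against returns_a


def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, divided by n-1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson_r(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations.

    Undefined (division by zero) if either series is constant.
    """
    sd_x = math.sqrt(sample_covariance(xs, xs))  # covariance with itself = variance
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)


print("cov(a, b):", sample_covariance(returns_a, returns_b))
print("r(a, b):  ", pearson_r(returns_a, returns_b))
print("r(a, c):  ", pearson_r(returns_a, returns_c))

# Expressing the same returns in percent multiplies the covariance by 100 * 100,
# while the correlation coefficient stays the same.
percent_a = [100.0 * r for r in returns_a]
percent_b = [100.0 * r for r in returns_b]
print("cov in percent units:", sample_covariance(percent_a, percent_b))
print("r in percent units:  ", pearson_r(percent_a, percent_b))
```

Because the n-1 factors cancel between numerator and denominator, `pearson_r` agrees with the sample formula in the table, which has no explicit denominator term.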