Part 1 - Section 4: Measures of Variability
This chapter covers basic information regarding data visualisation using R.
The means, the quantiles and the mode are measures of location: they describe one property of a frequency distribution, its location. Another important property is dispersion (variation), which we describe by several measures of variation.
The range of variation \( R \) is defined as the difference between the largest and the smallest value of the variable:
\[ R = X_{\max} - X_{\min} . \]
The interquartile range: \( R_Q = X_{0.75} - X_{0.25} . \)
The interdecile range: \( R_D = X_{0.90} - X_{0.10} . \)
The interpercentile range: \( X_{0.99} - X_{0.01} . \)
The interquartile range indicates the width of the interval that contains the middle 50 % of the values of the ordered sample. By analogy, the interdecile and the interpercentile range indicate the width of the interval that contains the middle 80 % and 98 % of the values of the ordered sample, respectively.
We have calculated quantiles of the data 2, 5, 7, 10, 12, 13, 18 and 21, with the following values:
\( X_{0.10} = 2, \; X_{0.25} = 6, \; X_{0.50} = 11, \; X_{0.75} = 15.5, \; X_{0.90} = 21 . \)
The range of variation is
\( R= X_{\max} - X_{\min} = 21 - 2 =19 . \)
The interquartile range is
\( R_Q = X_{0.75} - X_{0.25} = 15.5 - 6 =9.5 . \)
The interdecile range is
\( R_D = X_{0.90} - X_{0.10} = 21 - 2 =19 . \)
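The quantiles and ranges in this example can be checked with a short Python sketch. The helper `quantile` is hypothetical and rests on an assumption: if \( n \cdot p \) is an integer, it averages the ordered values at positions \( n \cdot p \) and \( n \cdot p + 1 \); otherwise it takes the value at position \( \lceil n \cdot p \rceil \). This convention reproduces the quantiles used in the calculations above, but the text does not state which rule was actually applied, and other conventions give slightly different values.

```python
import math


def quantile(data, p):
    """Return the p-quantile (0 < p < 1) under the assumed convention.

    Assumption: if n*p is an integer k, average the k-th and (k+1)-th
    ordered values; otherwise take the ceil(n*p)-th ordered value.
    """
    xs = sorted(data)
    n = len(xs)
    k = n * p
    if abs(k - round(k)) < 1e-9:        # n*p is (numerically) an integer
        k = int(round(k))
        return (xs[k - 1] + xs[k]) / 2  # positions k and k+1, 1-based
    return xs[math.ceil(k) - 1]         # position ceil(n*p), 1-based


data = [2, 5, 7, 10, 12, 13, 18, 21]

range_of_variation = max(data) - min(data)
interquartile_range = quantile(data, 0.75) - quantile(data, 0.25)
interdecile_range = quantile(data, 0.90) - quantile(data, 0.10)

print(range_of_variation, interquartile_range, interdecile_range)
```

For this data set the sketch gives 19, 9.5 and 19, in agreement with the hand calculation.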
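The quartile deviation is defined by the formula
\[ Q = \frac{X_{0.75} - X_{0.25}}{2} = \frac{R_Q}{2} . \]
The decile deviation is defined by the formula
\[ D = \frac{X_{0.90} - X_{0.10}}{8} = \frac{R_D}{8} . \]
The percentile deviation is defined by the formula
\[ \frac{X_{0.99} - X_{0.01}}{98} . \]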
Example:
Calculate the quartile and the decile deviation of the data 2, 5, 7, 10, 12, 13, 18 and 21.
The quartile deviation is
\( Q= R_Q /2 = 9.5/2 =4.75 . \)
The decile deviation is
\( D= R_D /8 = 19/8 =2.375 . \)
This means that the average width of the two middle quartile intervals is 4.75 and the average width of the eight middle decile intervals is 2.375.
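The interpretation as an average interval width can be illustrated with a small Python sketch. It takes the quantile values from the example as given inputs (they are not recomputed here), and the variable names are illustrative only.

```python
# Quantiles of the data 2, 5, 7, 10, 12, 13, 18, 21, taken from the example.
x10, x25, x50, x75, x90 = 2, 6, 11, 15.5, 21

# Deviations computed from the interquartile and interdecile ranges.
quartile_deviation = (x75 - x25) / 2
decile_deviation = (x90 - x10) / 8

# The two middle quartile intervals are [x25, x50] and [x50, x75];
# their mean width equals the quartile deviation.
widths = [x50 - x25, x75 - x50]
mean_quartile_width = sum(widths) / len(widths)

print(quartile_deviation, decile_deviation, mean_quartile_width)
```

The two middle quartile intervals have widths 5 and 4.5, so their mean width is 4.75, the same number as the quartile deviation.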
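The average deviation is defined as the arithmetic mean of the absolute deviations of the values from their arithmetic mean:
\[ \overline{d} = \frac{1}{n} \, \sum_{i=1}^n \left| X_i - \overline{X} \right| . \]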
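Example: Find the average deviation of the data set 1, 2, 5, 6, 7, 8, 8 and 9. Since the arithmetic mean is \( \overline{X} = 5.75 , \) we obtain
\[ \frac{1}{8} \, \sum_{i=1}^8 \left| X_i - 5.75 \right| = \frac{4.75 + 3.75 + 0.75 + 0.25 + 1.25 + 2.25 + 2.25 + 3.25}{8} = \frac{18.5}{8} = 2.3125 . \]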
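As a numerical check of the average deviation, the following Python sketch computes the arithmetic mean and then the mean of the absolute deviations from it. The variable names are illustrative only.

```python
data = [1, 2, 5, 6, 7, 8, 8, 9]
n = len(data)

# Arithmetic mean of the data.
mean = sum(data) / n

# Absolute deviations of the individual values from the mean.
abs_deviations = [abs(x - mean) for x in data]

# Average deviation: arithmetic mean of the absolute deviations.
average_deviation = sum(abs_deviations) / n

print(mean, average_deviation)
```

For this data set the mean is 5.75, the absolute deviations sum to 18.5, and the average deviation is 18.5 / 8 = 2.3125.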
Subtitle: Variance
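The variance \( s_n^2 \) is defined as the arithmetic mean of the squares of the deviations of the values from their arithmetic mean:
\[ s_n^2 = \frac{1}{n} \, \sum_{i=1}^n \left( X_i - \overline{X} \right)^2 . \]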
Elementary properties of the variance:
- if the variable is constant, then the variance is zero.
- if we add a constant \( c \) to the values of the variable, then the variance does not change:
\[ \frac{1}{n} \, \sum_{i=1}^n \left[ \left( X_i + c \right) - \left( \overline{X} + c \right) \right]^2 = \frac{1}{n} \, \sum_{i=1}^n \left( X_i - \overline{X} \right)^2 = s_n^2 . \]
- if we multiply the values of the variable by a constant \( c \), then the variance is multiplied by \( c^2 \):
\[ \frac{1}{n} \, \sum_{i=1}^n \left( c \cdot X_i - c \cdot \overline{X} \right)^2 = c^2 \cdot s_n^2 . \]
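The three properties can be checked numerically with a minimal Python sketch. It uses `statistics.pvariance` from the standard library, which divides by \( n \) and therefore corresponds to \( s_n^2 \). The data set and the constant `c = 3` are arbitrary choices made for illustration.

```python
import math
import statistics

data = [1, 2, 5, 6, 7, 8, 8, 9]
c = 3  # arbitrary constant, chosen only for this illustration

# Variance with divisor n, i.e. s_n^2.
var = statistics.pvariance(data)

# Property 1: a constant variable has zero variance.
var_constant = statistics.pvariance([4, 4, 4, 4])

# Property 2: adding a constant leaves the variance unchanged.
var_shifted = statistics.pvariance([x + c for x in data])

# Property 3: multiplying by a constant multiplies the variance by c**2.
var_scaled = statistics.pvariance([c * x for x in data])

print(var_constant == 0)
print(math.isclose(var_shifted, var))
print(math.isclose(var_scaled, c**2 * var))
```

All three comparisons evaluate to `True`; `math.isclose` is used instead of `==` for the last two to avoid depending on exact floating-point equality.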
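The square root of the variance is called the standard deviation:
\[ s_n = \sqrt{s_n^2} . \]
The sample variance \( s^2 \) is defined by the formula
\[ s^2 = \frac{1}{n-1} \, \sum_{i=1}^n \left( X_i - \overline{X} \right)^2 , \]
and its square root \( s = \sqrt{s^2} \) is called the sample standard deviation.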
Example: Calculate the variance, the standard deviation, the sample variance and the sample standard deviation of the data set 1, 2, 5, 6, 7, 8, 8 and 9.
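The arithmetic mean is \( \overline{X} = 5.75 \) and the sum of the squared deviations is \( \sum_{i=1}^8 \left( X_i - 5.75 \right)^2 = 59.5 . \) So we have
\[ s_n^2 = \frac{59.5}{8} = 7.4375 , \qquad s_n = \sqrt{7.4375} \approx 2.727 , \]
\[ s^2 = \frac{59.5}{7} = 8.5 , \qquad s = \sqrt{8.5} \approx 2.915 . \]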
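The four quantities in this example can be cross-checked with Python's standard `statistics` module. In this sketch `pvariance` and `pstdev` use the divisor \( n \) (variance and standard deviation), while `variance` and `stdev` use the divisor \( n - 1 \) (sample variance and sample standard deviation). The variable names are illustrative only.

```python
import statistics

data = [1, 2, 5, 6, 7, 8, 8, 9]

# Divisor n: variance s_n^2 and standard deviation s_n.
variance_n = statistics.pvariance(data)
std_n = statistics.pstdev(data)

# Divisor n - 1: sample variance s^2 and sample standard deviation s.
sample_variance = statistics.variance(data)
sample_std = statistics.stdev(data)

print(variance_n, round(std_n, 3))
print(sample_variance, round(sample_std, 3))
```

The variance is 7.4375 and the sample variance is 8.5; the sample variance is always the larger of the two for non-constant data, because the same sum of squares is divided by \( n - 1 \) instead of \( n \).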