
An R TUTORIAL for Statistics Applications

Part 1 - Section 4: Measures of Variability

This chapter covers basic information regarding data visualisation using R.

Email Vladimir Dobrushkin

Means, quantiles and the mode are measures of location; they describe one property of a frequency distribution, namely its location. Another important property is dispersion (variation), which we describe by several measures of variation.

The range of variation \( R \) is defined as the difference between the largest and the smallest values of the variable:

\[ R = X_{\max} - X_{\min} . \]
It is the simplest but also the crudest measure of variation: it gives the width of the interval that contains all the values.

The interquartile range:

\[ R_Q = X_{0.75} - X_{0.25} . \]

The interdecile range:

\[ R_D = X_{0.90} - X_{0.10} . \]

The interpercentile range:

\[ R_C = X_{0.99} - X_{0.01} . \]

The interquartile range is the width of the interval that contains the middle 50% of the values of the ordered sample. By analogy, the interdecile and the interpercentile ranges are the widths of the intervals that contain the middle 80% and the middle 98% of the values, respectively.

We have calculated the quantiles of the data 2, 5, 7, 10, 12, 13, 18 and 21 and obtained the following values:
\( X_{0.10} = 2, \ X_{0.25} = 6, \ X_{0.50} = 11, \ X_{0.75} = 15.5, \ X_{0.90} = 21 . \)

The range of variation is \( R= X_{\max} - X_{\min} = 21 - 2 =19 . \)
The interquartile range is \( R_Q = X_{0.75} - X_{0.25} = 15.5 - 6 =9.5 . \)
The interdecile range is \( R_D = X_{0.90} - X_{0.10} = 21 - 2 =19 . \)
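The quantiles above are reproduced by the convention "if \( np \) is an integer \( k \), average the \( k \)-th and \( (k+1) \)-th ordered values; otherwise take the \( \lceil np \rceil \)-th ordered value", which appears to be what R's `quantile(x, type = 2)` computes. The following minimal Python sketch assumes that convention (the helper name `quantile` is illustrative only) and then computes the three ranges of the example.

```python
import math

def quantile(data, p):
    """Sample quantile X_p for 0 < p < 1 (hypothetical helper).

    Assumed convention: if n*p is an integer k, average the k-th and
    (k+1)-th ordered values; otherwise take the ceil(n*p)-th ordered value.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    x = sorted(data)
    n = len(x)
    np_ = n * p
    k = round(np_)
    if math.isclose(np_, k):
        # n*p is an integer: average the k-th and (k+1)-th values (1-based)
        return (x[k - 1] + x[k]) / 2
    # otherwise: the ceil(n*p)-th ordered value (1-based)
    return x[math.ceil(np_) - 1]

data = [2, 5, 7, 10, 12, 13, 18, 21]

R = max(data) - min(data)                          # range of variation
R_Q = quantile(data, 0.75) - quantile(data, 0.25)  # interquartile range
R_D = quantile(data, 0.90) - quantile(data, 0.10)  # interdecile range
print(R, R_Q, R_D)  # 19 9.5 19
```

Other quantile conventions (R offers several `type` values) can give slightly different \( X_p \) for small samples, so the ranges depend on the convention chosen.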

The quartile deviation is defined by the formula

\[ Q = R_Q /2 . \]
The decile deviation is defined by the following formula:
\[ D = R_D /8 . \]

The percentile deviation is defined by the formula

\[ C = R_C /98 . \]

Example: Calculate the quartile deviation and the decile deviation of the data 2, 5, 7, 10, 12, 13, 18 and 21. The quartile deviation is
\( Q= R_Q /2 = 9.5/2 =4.75 . \)
The decile deviation is \( D= R_D /8 = 19/8 =2.375 . \)
This means that the average width of the two middle quartile intervals is 4.75, and the average width of the eight middle decile intervals is 2.375.
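Each deviation divides a range by the number of quantile intervals it spans: 2 quartile intervals, 8 decile intervals, or 98 percentile intervals. A small Python sketch with hypothetical helper names, taking the ranges of the example as inputs:

```python
def quartile_deviation(r_q):
    """Q = R_Q / 2: the interquartile range spans 2 quartile intervals."""
    return r_q / 2

def decile_deviation(r_d):
    """D = R_D / 8: the interdecile range spans 8 decile intervals."""
    return r_d / 8

def percentile_deviation(r_c):
    """C = R_C / 98: the interpercentile range spans 98 percentile intervals."""
    return r_c / 98

# Ranges of the data 2, 5, 7, 10, 12, 13, 18, 21 from the example above
R_Q, R_D = 9.5, 19
print(quartile_deviation(R_Q), decile_deviation(R_D))  # 4.75 2.375
```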

The average deviation is defined as the arithmetic mean of the absolute deviations from the arithmetic mean:

\[ d_{\overline{X}} = \frac{1}{n} \, \sum_{i=1}^n \left\vert X_i - \overline{X} \right\vert . \]

Example: Find the average deviation of the data set 1, 2, 5, 6, 7, 8, 8 and 9. Since the arithmetic mean is \( \overline{X} = 5.75 , \) we obtain

\begin{eqnarray*} d_{\overline{X}} &=& \frac{1}{8} \left[ |1 - 5.75| + |2- 5.75|+|5 - 5.75|+ |6 - 5.75| \right] + \\ && \frac{1}{8} \left[ |7 - 5.75| + |8-5.75| + |8-5.75| + |9-5.75| \right] = 2.3125 . \end{eqnarray*}
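A direct Python translation of the average-deviation formula (the function name `average_deviation` is illustrative), applied to the same data set:

```python
def average_deviation(data):
    """Arithmetic mean of the absolute deviations |X_i - mean|."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n

print(average_deviation([1, 2, 5, 6, 7, 8, 8, 9]))  # 2.3125
```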

Variance

The variance \( s_n^2 \) is defined as the arithmetic mean of the squared deviations from the arithmetic mean:

\[ s_n^2 = \frac{1}{n} \, \sum_{i=1}^n \left\vert X_i - \overline{X} \right\vert^2 . \]
Expanding the square and using \( \sum_{i=1}^n X_i = n\,\overline{X} , \) we get
\begin{eqnarray*} s_n^2 &=& \frac{1}{n} \left( \sum_{i=1}^n X_i^2 - 2\,\overline{X} \,\sum_{i=1}^n X_i + \sum_{i=1}^n \overline{X}^2 \right) \\ &=& \frac{1}{n} \left[ \sum_{i=1}^n X_i^2 - 2\,n\,\overline{X}^2 + n\,\overline{X}^2 \right] \\ &=& \frac{1}{n} \, \sum_{i=1}^n X_i^2 - \overline{X}^2 = \overline{X^2} -\overline{X}^2 . \end{eqnarray*}
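Both expressions for \( s_n^2 \) can be checked numerically. In this Python sketch (hypothetical function names) the first function follows the definition and the second uses \( \overline{X^2} - \overline{X}^2 \):

```python
def variance_definition(data):
    """s_n^2 as the mean of the squared deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def variance_shortcut(data):
    """s_n^2 as the mean of the squares minus the square of the mean."""
    n = len(data)
    mean = sum(data) / n
    mean_of_squares = sum(x * x for x in data) / n
    return mean_of_squares - mean ** 2

data = [1, 2, 5, 6, 7, 8, 8, 9]
print(variance_definition(data), variance_shortcut(data))  # 7.4375 7.4375
```

With floating-point data, the shortcut can lose accuracy when the mean is large compared with the spread, because it subtracts two nearly equal numbers; the definition-based form avoids that cancellation.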

Elementary properties of the variance:

  1. If the variable is constant, then the variance is zero.
  2. If we add a constant c to the values of the variable, then the mean also shifts by c and the variance does not change:
    \[ \frac{1}{n} \, \sum_{i=1}^n \left[ \left( X_i + c \right) - \left( \overline{X} + c \right) \right]^2 = \frac{1}{n} \, \sum_{i=1}^n \left( X_i - \overline{X} \right)^2 = s_n^2 . \]
  3. If we multiply the values of the variable by a constant c, then the variance is multiplied by \( c^2 \):
    \[ \frac{1}{n} \, \sum_{i=1}^n \left( c \cdot X_i - c \cdot \overline{X} \right)^2 = c^2 \cdot s_n^2 . \]
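The three properties can be illustrated numerically. This Python sketch uses a hypothetical helper `variance_n` for \( s_n^2 \), a constant variable for property 1, and the arbitrarily chosen constant \( c = 3 \) for properties 2 and 3:

```python
def variance_n(data):
    """s_n^2: mean of the squared deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

data = [1, 2, 5, 6, 7, 8, 8, 9]
c = 3

print(variance_n([4, 4, 4, 4]))           # 0.0     (property 1)
print(variance_n([x + c for x in data]))  # 7.4375  (property 2: unchanged)
print(variance_n([c * x for x in data]))  # 66.9375 (property 3: 3**2 * 7.4375)
```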

The square root of the variance is called the standard deviation:

\[ s_n = \sqrt{s_n^2} . \]

The sample variance \( s^2 \) is defined by the formula

\[ s^2 = \frac{1}{n-1} \, \sum_{i=1}^n \left( X_i - \overline{X} \right)^2 . \]
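The square root of the sample variance is called the sample standard deviation:
\[ s = \sqrt{s^2} . \]
Since \( s_n^2 \) and \( s^2 \) divide the same sum of squared deviations by \( n \) and by \( n-1 \), respectively, we have \( n\, s_n^2 = (n-1)\, s^2 , \) that is,
\[ s_n^2 = \frac{n-1}{n} \, s^2 . \]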
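A Python sketch (hypothetical helper names) that computes both variances for the data 2, 5, 7, 10, 12, 13, 18 and 21 and checks the relation numerically:

```python
import math

def variance_n(data):
    """s_n^2: sum of squared deviations from the mean, divided by n."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def sample_variance(data):
    """s^2: the same sum of squared deviations, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

data = [2, 5, 7, 10, 12, 13, 18, 21]
n = len(data)
s_n2 = variance_n(data)
s2 = sample_variance(data)

print(s_n2)                                  # 36.0
print(math.isclose(s_n2, (n - 1) / n * s2))  # True
```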

Example: Calculate the variance, the standard deviation, the sample variance and the sample standard deviation of the data set 1, 2, 5, 6, 7, 8, 8 and 9.

The arithmetic mean is \( \overline{X} = 5.75 . \) So we have

\begin{eqnarray*} s_n^2 &=& \frac{1}{8} \left[ |1 - 5.75|^2 + |2- 5.75|^2 +|5 - 5.75|^2 + |6 - 5.75|^2 \right] + \\ && \frac{1}{8} \left[ |7 - 5.75|^2 + |8-5.75|^2 + |8-5.75|^2 + |9-5.75|^2 \right] = 7.4375 . \end{eqnarray*}
The variance can also be calculated by the formula \( s_n^2 = \overline{X^2} - \overline{X}^2 . \)
\begin{eqnarray*} \overline{X^2} &=& \frac{1}{n}\, \sum_{i=1}^n X_i^2 = \frac{1}{8} \left[ 1^2 + 2^2 + 5^2 + 6^2 + 7^2 + 8^2 + 8^2 + 9^2 \right] = \frac{324}{8} = 40.5 , \\ s_n^2 &=& \overline{X^2} - \overline{X}^2 = 40.5 - 5.75^2 = 7.4375 . \end{eqnarray*}
The standard deviation is
\[ s_n = \sqrt{s_n^2} = \sqrt{7.4375} \approx 2.72718 . \]
To get the sample variance, we apply the formula
\[ s^2 = \frac{n}{n-1}\, s_n^2 = \frac{8}{7}\cdot 7.4375 = 8.5 . \]
The sample standard deviation is
\[ s = \sqrt{s^2} = \sqrt{8.5} \approx 2.91548 . \]
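The same four quantities can be obtained from Python's standard `statistics` module, where `pvariance` and `pstdev` divide by \( n \) while `variance` and `stdev` divide by \( n-1 \) (in R, `var` and `sd` likewise use the \( n-1 \) denominator). A minimal check against the hand calculation:

```python
import statistics

data = [1, 2, 5, 6, 7, 8, 8, 9]

s_n2 = statistics.pvariance(data)  # variance s_n^2, divides by n
s_n = statistics.pstdev(data)      # standard deviation s_n
s2 = statistics.variance(data)     # sample variance s^2, divides by n - 1
s = statistics.stdev(data)         # sample standard deviation s

print(s_n2, f"{s_n:.5f}")  # 7.4375 2.72718
print(s2, f"{s:.5f}")      # 8.5 2.91548
```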