An R TUTORIAL for Statistics Applications

Part 1 - Section 5: Measures of Concentration

This chapter covers basic information regarding data visualisation using R.

Section 5: Data dashboards

A data dashboard is a data-visualization tool that illustrates multiple metrics and automatically updates these metrics as new data become available. It is like an automobile's dashboard instrumentation, which provides information on the vehicle's current speed, fuel level, and engine temperature so that a driver can assess current operating conditions and take effective action. Similarly, a data dashboard provides the important metrics that managers need to quickly assess the performance of their organization and react accordingly.

We start with some basic definitions. Let X be a discrete random variable with n observed values \( X_1 , \ldots , X_n \). Its r-th moment is defined by the formula

\[ m'_r = \frac{1}{n}\, \sum_{i=1}^n X_i^r . \]

The r-th central moment is defined by the formula

\[ m_r = \frac{1}{n}\, \sum_{i=1}^n \left( X_i - \overline{X} \right)^r , \]

where \( \overline{X} = m'_1 \) is the mean of the n values of X. Note that the first central moment \( m_1 \) is always zero.

Moments can be calculated with R as follows:
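A minimal base-R sketch of the two definitions above. The sample `x` is made up for illustration, and the helper names `raw_moment` and `central_moment` are hypothetical, not functions from any package.

```r
# Made-up sample used only for illustration
x <- c(2, 4, 4, 5, 7, 9)

# r-th moment: (1/n) * sum(x^r)
raw_moment <- function(x, r) mean(x^r)

# r-th central moment: (1/n) * sum((x - mean(x))^r)
central_moment <- function(x, r) mean((x - mean(x))^r)

raw_moment(x, 1)       # the same value as mean(x)
central_moment(x, 1)   # zero, up to rounding error
central_moment(x, 2)   # variance with divisor n (var(x) uses n - 1)
central_moment(x, 3)   # third central moment, used for skewness
```

Because `mean()` already divides by n, each helper is a direct transcription of its formula.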

Another option is to use the function moment from the e1071 package. As it is not in the core R library, the package has to be installed and loaded into the R workspace.
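The package route might look like the sketch below, which compares `moment` from e1071 with a hand computation. The sample `x` is made up, and the `requireNamespace` guard is an addition so that the script still runs when the package is not installed.

```r
# install.packages("e1071")   # run once if the package is not installed
x <- c(2, 4, 4, 5, 7, 9)      # made-up sample used only for illustration

# Third central moment computed by hand
m3_manual <- mean((x - mean(x))^3)

if (requireNamespace("e1071", quietly = TRUE)) {
  library(e1071)
  # order = 3 selects the third moment; center = TRUE subtracts the mean first
  m3_pkg <- moment(x, order = 3, center = TRUE)
  print(all.equal(m3_pkg, m3_manual))
}
```

With the default `center = FALSE`, `moment` returns the moment about zero instead of the central moment.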

The sample skewness is defined by the formula

\[ a_3 = \frac{m_3}{m_2^{3/2}} = \frac{1}{n\,s_n^3} \, \sum_{i=1}^n \left( X_i - \overline{X} \right)^3 , \]

where \( s_n = \sqrt{m_2} \) is the standard deviation computed with the divisor n.

The skewness of a population is defined by the following formula, where μ₂ and μ₃ are the second and third central moments of the population.

\[ \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} . \]

Intuitively, the skewness is a measure of asymmetry. As a rule of thumb, negative skewness indicates that the mean of the data values is less than the median, and the data distribution is left-skewed. Positive skewness indicates that the mean of the data values is larger than the median, and the data distribution is right-skewed.

To calculate the skewness coefficient (of eruptions) one can use the function skewness from the e1071 package. As the package is not in the core R library, it has to be installed and loaded into the R workspace.
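The sketch below computes the skewness by hand and, if e1071 is available, with `skewness`. It assumes that "eruptions" refers to the `eruptions` column of R's built-in `faithful` data set; the text does not name the data set, so treat that choice as an assumption.

```r
# Assumption: "eruptions" is the eruptions column of the built-in faithful data
duration <- faithful$eruptions

# Sample skewness a_3 = m_3 / m_2^(3/2), with central moments using divisor n
m2 <- mean((duration - mean(duration))^2)
m3 <- mean((duration - mean(duration))^3)
a3 <- m3 / m2^(3/2)
a3

# Rule of thumb from the text: negative skewness goes with mean < median
c(mean = mean(duration), median = median(duration))

if (requireNamespace("e1071", quietly = TRUE)) {
  library(e1071)
  print(skewness(duration, type = 1))  # type = 1 corresponds to a_3 above
  print(skewness(duration))            # default type = 3 divides by s^3, s with divisor n - 1
}
```

The `type` argument matters when comparing results across tools, since the three variants differ by factors that depend on n.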

The kurtosis of a univariate population is defined by the following formula, ... moments. Intuitively, the kurtosis describes the tail shape of the data distribution. The normal distribution has zero kurtosis and thus the standard tail shape. It is said to be mesokurtic. ...

The sample kurtosis is defined by the formula

\[ a_4 = \frac{m_4}{m_2^{2}} - 3 = \frac{1}{n\,s_n^4} \, \sum_{i=1}^n \left( X_i - \overline{X} \right)^4 - 3 . \]
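As an illustration, the sample kurtosis can be computed directly from the central moments. The sketch again assumes the built-in `faithful` data set as example data, which is a choice made here and not one stated in the text.

```r
# Assumption: example data taken from the built-in faithful data set
duration <- faithful$eruptions

# Sample kurtosis a_4 = m_4 / m_2^2 - 3, with central moments using divisor n
m2 <- mean((duration - mean(duration))^2)
m4 <- mean((duration - mean(duration))^4)
a4 <- m4 / m2^2 - 3
a4

if (requireNamespace("e1071", quietly = TRUE)) {
  library(e1071)
  print(kurtosis(duration, type = 1))  # type = 1 corresponds to a_4 above
  print(kurtosis(duration))            # default type = 3 divides by s^4, s with divisor n - 1
}
```

Subtracting 3 is what makes the normal distribution come out with zero kurtosis; a negative value indicates lighter tails than the normal distribution.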

Note that the Excel functions SKEW and KURT calculate skewness and kurtosis by the formulas

\begin{eqnarray*} a_3^{\ast} &=& \frac{n}{(n-1)(n-2)} \,\sum_{i=1}^n \left( \frac{X_i - \overline{X}}{s} \right)^3 , \\ a_4^{\ast} &=& \frac{n(n+1)}{(n-1)(n-2)(n-3)} \,\sum_{i=1}^n \left( \frac{X_i - \overline{X}}{s} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} , \end{eqnarray*}

where \( s \) is the sample standard deviation computed with the divisor \( n-1 \). We can relate them to ours:

\begin{eqnarray*} a_3 &=& \frac{n-2}{\sqrt{n(n-1)}} \, a_3^{\ast} , \\ a_4 &=& \frac{(n-2)(n-3)}{n^2 -1} \,a_4^{\ast} - \frac{6}{n+1} . \end{eqnarray*}
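The relations between the two conventions can be checked numerically. The sketch below reproduces the Excel-style statistics in R, including the term \( 3(n-1)^2/((n-2)(n-3)) \) that KURT subtracts, and compares them with \( a_3 \) and \( a_4 \). The `faithful` data set is an assumed example, and the conversion factors are written out in the comments.

```r
# Assumption: example data taken from the built-in faithful data set
x <- faithful$eruptions
n <- length(x)

# Standardised values using the sample standard deviation (divisor n - 1)
z <- (x - mean(x)) / sd(x)

# Excel-style SKEW and KURT
a3_star <- n / ((n - 1) * (n - 2)) * sum(z^3)
a4_star <- n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(z^4) -
  3 * (n - 1)^2 / ((n - 2) * (n - 3))

# Moment-based skewness and kurtosis (divisor n)
m2 <- mean((x - mean(x))^2)
a3 <- mean((x - mean(x))^3) / m2^(3/2)
a4 <- mean((x - mean(x))^4) / m2^2 - 3

# a_3 = (n - 2) / sqrt(n (n - 1)) * a_3*
check_a3 <- all.equal(a3, (n - 2) / sqrt(n * (n - 1)) * a3_star)
# a_4 = (n - 2)(n - 3) / (n^2 - 1) * a_4* - 6 / (n + 1)
check_a4 <- all.equal(a4, (n - 2) * (n - 3) / (n^2 - 1) * a4_star - 6 / (n + 1))
c(check_a3, check_a4)
```

Both comparisons should agree up to floating-point tolerance, since the conversion is an algebraic identity that holds for any sample with n > 3.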