Statistics is the science and, arguably, also the art of learning from data. As a discipline
it is concerned
with the collection, analysis, and interpretation of data, as well as the effective
communication and presentation of
results relying on data. Statistics lies at the heart of the type of quantitative reasoning
necessary for making
important advances in the sciences, such as medicine and genetics, and for making important
decisions in business and
public policy.
The following is a true story. In 1973, the University of California, Berkeley got into some
trouble over its admissions of students into postgraduate courses. Specifically, the problem
was in the gender breakdown of its admissions, which looked like this:
|         | Number of applicants | Percent admitted |
|---------|----------------------|------------------|
| Males   | 8442                 | 44%              |
| Females | 4321                 | 35%              |
Because a difference of 9 percentage points in admission rates between males and females is
just too big to be a coincidence, the university was sued. Under these circumstances, people
started looking very carefully at the admissions data. Remarkably, they found that most of
the departments actually had a slightly higher success rate for female applicants than for
male applicants. The table below shows the admission figures for the six largest departments
(with the names of the departments removed for privacy reasons):
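| Males: applicants | Males: percent admitted | Females: applicants | Females: percent admitted |
|-------------------|-------------------------|---------------------|---------------------------|
| 825               | 62%                     | 108                 | 82%                       |
| 560               | 63%                     | 25                  | 68%                       |
| 325               | 37%                     | 593                 | 34%                       |
| 417               | 33%                     | 375                 | 35%                       |
| 191               | 28%                     | 393                 | 24%                       |
| 272               | 6%                      | 341                 | 7%                        |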
In other words, most departments had a higher rate of admission for females than for males,
yet the overall rate of admission across the university for females was lower than for males.
This is known as Simpson's paradox. To explain it, note first that the departments are not
equal to one another: their admission rates range from above 60% to below 10%. Second, males
and females tended to apply to different departments. On the whole, males tended to apply to
the departments that had high admission rates, while females tended to apply to the
departments that had low admission rates, which pulls the overall female rate down. ■
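The arithmetic behind the paradox can be checked directly from the six-department table
above. The following is a minimal sketch in Python, not part of the original analysis: the
variable and function names are made up for illustration, and because the table gives
rounded percentages, the numbers of admitted students it reconstructs are only approximate.

```python
# Figures copied from the six-department table, one tuple per department:
# (male applicants, male admission rate, female applicants, female admission rate)
departments = [
    (825, 0.62, 108, 0.82),
    (560, 0.63, 25, 0.68),
    (325, 0.37, 593, 0.34),
    (417, 0.33, 375, 0.35),
    (191, 0.28, 393, 0.24),
    (272, 0.06, 341, 0.07),
]

def pooled_rate(applicants, rates):
    """Overall admission rate: the applicant-weighted average of department rates."""
    admitted = sum(n * r for n, r in zip(applicants, rates))  # approximate counts
    return admitted / sum(applicants)

male_n = [d[0] for d in departments]
male_r = [d[1] for d in departments]
female_n = [d[2] for d in departments]
female_r = [d[3] for d in departments]

# In how many departments is the female admission rate the higher one?
female_higher = sum(fr > mr for mr, fr in zip(male_r, female_r))

# Pooled rates over the six departments combined.
male_overall = pooled_rate(male_n, male_r)
female_overall = pooled_rate(female_n, female_r)

# Share of each group applying to the two departments with the highest rates
# (the first two rows of the table).
male_share_easy = sum(male_n[:2]) / sum(male_n)
female_share_easy = sum(female_n[:2]) / sum(female_n)

print(f"departments where females do better: {female_higher} of {len(departments)}")
print(f"pooled rate, males: {male_overall:.0%}, females: {female_overall:.0%}")
print(f"applied to the two easiest departments, males: {male_share_easy:.0%}, "
      f"females: {female_share_easy:.0%}")
```

Under these assumptions, females have the higher rate in four of the six departments, yet the
pooled rate comes out at roughly 46% for males against 30% for females. The last line shows
why: about half of the male applicants, but well under a tenth of the female applicants,
applied to the two departments with the highest admission rates.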
From medical studies to research experiments, from satellites continuously orbiting the
globe to ubiquitous social network sites like Facebook or LinkedIn, from polling
organizations to United Nations observers, data are being collected everywhere and all the
time. Knowledge in statistics provides you with the necessary tools and conceptual
foundations in quantitative reasoning to extract information intelligently from this sea of
data. Statistics lets us draw specific conclusions while stating how much confidence we have
in our results, or how much error we have allowed for in our estimates. Knowing this allows
us to forecast events under specified conditions.
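As a small illustration of what "how much error" can mean in practice, the sketch below puts
a margin of error on the 9-point gap in the overall Berkeley admission rates quoted earlier.
It rests on assumptions the text does not make: it treats the applicants as if they were
independent random samples, and it uses the normal-approximation (Wald) interval, which is
one common textbook choice rather than a method prescribed by this course. The function name
is hypothetical.

```python
import math

def diff_confidence_interval(n1, p1, n2, p2, z=1.96):
    """Normal-approximation interval for p1 - p2; z = 1.96 gives about 95% confidence."""
    diff = p1 - p2
    # Standard error of the difference between two independent sample proportions.
    std_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * std_error  # the "estimated error" allowed on either side
    return diff - margin, diff + margin

# Overall figures from the first admissions table: 8442 males at 44%, 4321 females at 35%.
low, high = diff_confidence_interval(8442, 0.44, 4321, 0.35)
print(f"observed difference: {0.44 - 0.35:.2f}")
print(f"approximate 95% interval: ({low:.3f}, {high:.3f})")
```

Under these assumptions the interval runs from roughly 7 to 11 percentage points and does
not include zero, which is the quantitative sense in which the gap is too large to be a
coincidence. The interval says nothing about why the gap exists; it only describes the
aggregated figures.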
This course is concerned with data-driven decision making and the use of analytical
approaches in the decision-making process. Three developments spurred recent explosive
growth in the use of analytical methods in business applications and other areas. First,
technological advances---such as improved point-of-sale scanner technology and the
collection of data through e-commerce, Internet social networks, and data generated from
personal electronic devices---produce incredible amounts of data for various applications.
Naturally, businesses want to use these data to improve the efficiency and profitability of
their operations, better understand their customers, price their products more effectively,
and gain a competitive advantage. The same trend can be observed in other areas, including
the pharmaceutical industry, medical technologies, education, research, and many others. Second,
ongoing research has resulted in numerous methodological developments, including advances in
computational approaches to effectively handle and explore massive amounts of data, faster
algorithms for optimization and simulation, and more effective approaches for visualizing
data. Third, these methodological developments were paired with an explosion in computing
power and storage capability. Better computing hardware, parallel computing, and more
recently, cloud computing have enabled businesses to solve big problems more quickly and
more accurately than ever before.
In summary, the availability of massive amounts of data, improvements in analytic
methodologies, and substantial increases in computing power have all come together to result
in a dramatic upsurge in the use of analytic methods.