Lecture 2

Objectives

The aim of this lecture is to help you understand

  • that there exist mathematical functions that describe different distributions

  • what makes the normal distribution normal and what are its properties

  • how random fluctuations affect sampling and parameter estimates

  • the function of the sampling distribution and the standard error

  • the Central Limit Theorem

With this knowledge you’ll build a solid foundation for understanding all the statistics we will be learning in this programme!

It’s all Greek to me!

Before we start, let’s make clear an important distinction between sample statistics, population parameters, and parameter estimates.

What we really want when we’re analysing data using statistical methods is to know the value of the population parameter(s) of interest. This could be the population mean, or the difference between two populations. The problem is that we can’t directly measure these parameters because it’s not possible to observe the entire population.

What we do instead is collect a sample and observe the sample statistic, such as the sample mean. We then use this sample statistic as an estimate – the best guess – of the value of the population parameter.

To make this distinction clear in notation, we use Greek letters for population parameters, Latin letters for sample statistics, and letters with a hat for population estimates:

  • \(\mu\) is the population mean
  • \(\hat{x}\) is the sample mean
  • \(\hat{\mu}\) is the estimate of the population mean

Same goes for, for example, the standard deviation: \(\sigma\) is the population parameter, \(s\) is the sample statistic, and \(\hat{\sigma}\) is the parameter estimate.

Distributions again

Let’s briefly revisit the general topic of distributions. Numerically speaking, a distribution is the number of observations per each value of a variable. We could ask our sample whether they like cats, dogs, both, or neither and the resulting numbers would be the distribution of the “pet person” variable (or some other name).

Inspecting a variable’s distribution gives information about which values occur more often and which less often.

We can visualise a distribution as the shape formed by the bars of a bar chart/histogram.

hist(rnorm(1000))
Basic histogram

Figure 1: Basic histogram