2.5 Measures of the Center of the Data

Ram Subedi

The center of a data set is also a way of describing location. The two most widely used measures of the center of the data are the mean (average) and the median.

The words mean and average are often used interchangeably. The technical term is arithmetic mean; average, strictly speaking, refers to any measure of center location. In practice, however, non-statisticians commonly use average to mean the arithmetic mean.

The mode of a data set is the value that occurs with the highest frequency. Since the mode is the most frequently occurring value, it can be used to describe a typical (central) value of a data set. A data set with two modes is called bimodal; one with three modes is called trimodal; and, in general, a data set with more than one mode is called multimodal. A data set can also have no mode.
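The idea can be sketched in a few lines of Python. This is only an illustration: the function name `modes` and the sample lists are made up for this example, and it assumes the common convention that a data set in which no value repeats has no mode.

```python
from collections import Counter

def modes(data):
    """Return the list of modes, or an empty list when no value repeats."""
    counts = Counter(data)          # maps each value to its frequency
    if not counts:
        return []
    highest = max(counts.values())  # the highest frequency in the data
    # Assumed convention: if every value occurs only once, report no mode.
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([1, 2, 2, 3]))      # one mode
print(modes([1, 1, 2, 3, 3]))   # two modes (bimodal)
print(modes([4, 5, 6]))         # no mode
```

Some software follows a different convention and reports every value as a mode when all frequencies are equal, so it is worth checking how a particular tool handles that case.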

Review

Review: Statistical Language – Measures of Central Tendency
The linked resource discusses shapes of distribution that we’ll cover in the next section.

Optional: Finding mean, median, and mode @KhanAcademy

Review & Practice

Review: Statistics intro: Mean, median, and Mode

Please complete the practice exercise Calculating mean and median from data displays at the end of the review linked above.

Recall from UNIT 1 that a parameter is a numerical summary of a population, whereas a statistic is a numerical summary of a sample. When we compute a numerical summary from a data set, we need to identify whether that data set is a population or a sample. For example, if we compute the mean of a data set, that mean is called a parameter if the data set is a population and a statistic if the data set is a sample. Parameters are generally written using Greek letters. The lowercase [latex]\mu[/latex] (pronounced “mew”) is used to represent a population mean (parameter). If the mean is computed from a sample, then the symbol used for the sample mean (statistic) is [latex]\bar x[/latex] (pronounced “x-bar”).

Population mean: [latex]\mu[/latex]. Sample mean: [latex]\bar x[/latex].

Resistant measures: A numerical summary of data is said to be resistant if extreme values (very large or small outliers) relative to the data do not affect its value substantially.
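A small numerical experiment can make this definition concrete. The sketch below uses made-up values (they do not come from this section) and Python's standard `statistics` module to compare the mean and the median before and after one extreme value is added.

```python
from statistics import mean, median

data = [2, 3, 3, 4, 5]        # hypothetical data values
with_outlier = data + [100]   # the same data plus one extreme value

print("without outlier:", mean(data), median(data))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

In this example the mean moves from 3.4 to 19.5, while the median only moves from 3 to 3.5. That is the sense in which the median is a resistant measure of center and the mean is not.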

Video: Resistance & Measures of Center

Calculating the Mean/Median of a Frequency Distribution Table (FDT) or Grouped Frequency Distribution Tables (GFDT)

Let’s look at a Frequency Distribution Table (FDT) below.

Number of times teenager is reminded	Frequency
0	2
1	5
2	8
3	14
4	7
5	4

How many data observations are there?

5, 6, or way more?

Note that a frequency is the count of a data value, so if we add up all of the counts (frequencies), we get the total count ([latex]n[/latex], the total number of observations). In this case, [latex]n=40[/latex]. If there are [latex]40[/latex] values, the median must be the average of the data values in the [latex]20^{\text{th}}[/latex] and [latex]21^{\text{st}}[/latex] positions. But what are those values? We see from the table that the first two are [latex]0[/latex]’s and the next five are [latex]1[/latex]’s; that’s a total of [latex]7[/latex]. The next eight are [latex]2[/latex]’s, which brings us to [latex]15[/latex] values seen so far. If you thought that sounded like cumulative frequency, you’d be correct! The cumulative frequency of the data value [latex]2[/latex] is [latex]15[/latex]. The next 14 values, in positions 16 through 29, are all [latex]3[/latex]’s. That means the values in the [latex]20^{\text{th}}[/latex] and [latex]21^{\text{st}}[/latex] positions are both [latex]3[/latex]. So the median must be [latex]3[/latex] as well, since the average of two [latex]3[/latex]’s is [latex]3[/latex].
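The counting argument above can be sketched in Python using cumulative frequencies. The variable names and the helper `value_at` are chosen for this illustration only; the numbers are the ones from the table.

```python
from itertools import accumulate

values = [0, 1, 2, 3, 4, 5]      # number of reminders
freqs = [2, 5, 8, 14, 7, 4]      # frequency of each value

n = sum(freqs)                         # total number of observations
cumulative = list(accumulate(freqs))   # running totals of the frequencies

def value_at(position):
    """Return the data value at a 1-based position in the sorted data."""
    for value, cum in zip(values, cumulative):
        if position <= cum:
            return value
    raise ValueError("position is beyond the last observation")

if n % 2 == 0:
    # Even n: average the two middle values.
    median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2
else:
    # Odd n: take the single middle value.
    median = value_at((n + 1) // 2)

print(n, cumulative, median)
```

The first data value whose cumulative frequency reaches a given position is the value sitting at that position, which is the same reasoning used in the paragraph above.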

If we go back to the raw data, this data set can be written as [latex]0, 0, \underbrace{1, 1, 1, 1, 1}_{\text{5 times}}, \underbrace{2, 2, 2, 2, 2, 2, 2, 2}_{\text{8 times}}, ...[/latex] and so on. We can also look at the expanded raw data without having to worry about cumulative frequencies. To calculate the mean, we need to add up all [latex]40[/latex] data values. We know there are two 0’s, which add to 0; five 1’s, which add to 5; eight 2’s, which add to [latex]16 = 2\times 8[/latex]; and so on. We can get the total contributed by each data value by multiplying the data value by its frequency. So, the mean for the data set is:

\[\bar x = \frac{0 \times 2+1 \times 5+2 \times 8+ 3 \times 14+ 4 \times 7+ 5 \times 4}{40} = \frac{111}{40} = 2.775\]

Wait, can I do this on a calculator?

Use One Variable Statistics from SUBEDI Calculators to compute the mean for frequency tables. For data input, make sure to select Frequency Table and enter the data values in the first column and the corresponding frequencies in the second column before pressing the Calculate button.
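For readers who prefer to check the arithmetic in code, here is a minimal Python sketch of the same frequency-weighted mean. It is not the calculator mentioned above, just an illustration of the formula, and the variable names are chosen for this example.

```python
values = [0, 1, 2, 3, 4, 5]      # number of reminders
freqs = [2, 5, 8, 14, 7, 4]      # frequency of each value

n = sum(freqs)                                      # total number of observations
total = sum(v * f for v, f in zip(values, freqs))   # sum of value times frequency
mean = total / n

print(total, n, mean)
```

Multiplying each value by its frequency gives the same total as adding up all 40 raw data values one at a time.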

Grouped Frequency Distribution Tables (GFDT)

If a data set has many distinct data values, listing each one individually in an FDT may make the table too long. In such cases, we can group the data values into classes (groups) to create a Grouped Frequency Distribution Table (GFDT). Here’s an example of scores of students on an exam:

Scores	Frequency
30–39	5
40–49	13
50–59	11
60–69	15
70–79	20

Since the data values are given as ranges (classes) rather than individual values, we first find the midpoint of each class. We then treat each midpoint as the data value for its class and use the method from the FDT above to calculate estimates for the mean and median.
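The midpoint method can be sketched in Python for the exam-score table. This is an illustration under the assumption stated in the text, namely that every score in a class is treated as if it sat at the class midpoint; the names `classes`, `midpoints`, and `midpoint_at` are made up for this example.

```python
from itertools import accumulate

classes = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
freqs = [5, 13, 11, 15, 20]

# Midpoint of each class: average of its lower and upper limits.
midpoints = [(low + high) / 2 for low, high in classes]

n = sum(freqs)
est_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n

cumulative = list(accumulate(freqs))   # running totals of the frequencies

def midpoint_at(position):
    """Return the class midpoint covering a 1-based position."""
    for midpoint, cum in zip(midpoints, cumulative):
        if position <= cum:
            return midpoint
    raise ValueError("position is beyond the last observation")

if n % 2 == 0:
    est_median = (midpoint_at(n // 2) + midpoint_at(n // 2 + 1)) / 2
else:
    est_median = midpoint_at((n + 1) // 2)

print(midpoints, n, est_mean, est_median)
```

With these frequencies there are 64 scores, the estimated mean is 59.5, and the 32nd and 33rd scores both fall in the 60–69 class, so the estimated median is its midpoint, 64.5. These are estimates only, because the actual scores inside each class are unknown; some textbooks refine the median estimate by interpolating within the class, which is not done here.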

Practice

GFDT on a Ti Calculator

License

Creative Commons Attribution-ShareAlike 4.0 International License
